51
Benjelloun B, Boyer F, Streeter I, Zamani W, Engelen S, Alberti A, Alberto FJ, BenBati M, Ibnelbachyr M, Chentouf M, Bechchari A, Rezaei HR, Naderi S, Stella A, Chikhi A, Clarke L, Kijas J, Flicek P, Taberlet P, Pompanon F. An evaluation of sequencing coverage and genotyping strategies to assess neutral and adaptive diversity. Mol Ecol Resour 2019; 19:1497-1515. [PMID: 31359622 PMCID: PMC7115901 DOI: 10.1111/1755-0998.13070] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/27/2018] [Revised: 06/30/2019] [Accepted: 07/08/2019] [Indexed: 12/12/2022]
Abstract
Whole genome sequences (WGS) greatly increase our ability to precisely infer population genetic parameters, demographic processes, and selection signatures. However, WGS may still not be affordable for a representative number of individuals/populations. In this context, our goal was to assess the efficiency of several SNP genotyping strategies by testing their ability to accurately estimate parameters describing neutral diversity and to detect signatures of selection. We analysed 110 WGS at 12× coverage for four different species, i.e., sheep, goats and their wild counterparts. From these data we generated 946 data sets corresponding to random panels of 1K to 5M variants, commercial SNP chips and exome capture, for sample sizes of five to 48 individuals. We also extracted low-coverage genome resequencing of 1×, 2× and 5× by randomly subsampling reads from the 12× resequencing data. Globally, 5K to 10K random variants were enough for an accurate estimation of genome diversity. Conversely, commercial panels and exome capture displayed strong ascertainment biases. Besides the characterization of neutral diversity, the detection of the signature of selection and the accurate estimation of linkage disequilibrium (LD) required high-density panels of at least 1M variants. Finally, genotype likelihoods increased the quality of variant calling from low coverage resequencing but proportions of incorrect genotypes remained substantial, especially for heterozygote sites. Whole genome resequencing coverage of at least 5× appeared to be necessary for accurate assessment of genomic variations. These results have implications for studies seeking to deploy low-density SNP collections or genome scans across genetically diverse populations/species showing similar genetic characteristics and patterns of LD decay for a wide variety of purposes.
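As a rough illustration of the read-subsampling step mentioned in this abstract, the sketch below keeps each read independently with probability target/original coverage (for example 5/12 to approximate 5× from 12×). The function names, the toy read list and the fixed seed are assumptions made for this example; the abstract does not say which tool or exact procedure the authors used.

```python
import random


def subsample_fraction(original_cov, target_cov):
    """Fraction of reads to keep so that expected coverage drops to target_cov."""
    if target_cov <= 0 or target_cov > original_cov:
        raise ValueError("target coverage must be in (0, original coverage]")
    return target_cov / original_cov


def subsample_reads(reads, original_cov, target_cov, seed=0):
    """Keep each read independently with probability target_cov / original_cov."""
    rng = random.Random(seed)  # fixed seed so the toy example is reproducible
    keep = subsample_fraction(original_cov, target_cov)
    return [read for read in reads if rng.random() < keep]


# Toy stand-in for the reads of one 12x sample (hypothetical identifiers).
reads = [f"read_{i}" for i in range(12000)]

for target in (1, 2, 5):
    kept = subsample_reads(reads, original_cov=12, target_cov=target)
    print(f"{target}x: kept {len(kept)} of {len(reads)} reads")
```

Because each read is kept or dropped independently, the realised coverage fluctuates around the target; paired-end data would additionally need both mates of a pair to be kept or dropped together, which this sketch does not model.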
Affiliation(s)
- Badr Benjelloun
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
  - National Institute of Agronomic Research (INRA Maroc), Regional Centre of Agronomic Research, 23000 Beni-Mellal, Morocco
- Frédéric Boyer
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
- Ian Streeter
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Wahid Zamani
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
  - Department of Environmental Sciences, Faculty of Natural Resources and Marine Sciences, Tarbiat Modares University, 46417-76489 Noor, Mazandaran, Iran
- Stefan Engelen
  - CEA - Institut de biologie François-Jacob, Genoscope, 2 Rue Gaston Cremieux, 91057 Evry Cedex, France
- Adriana Alberti
  - CEA - Institut de biologie François-Jacob, Genoscope, 2 Rue Gaston Cremieux, 91057 Evry Cedex, France
- Florian J. Alberto
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
- Mohamed BenBati
  - National Institute of Agronomic Research (INRA Maroc), Regional Centre of Agronomic Research, 23000 Beni-Mellal, Morocco
- Mustapha Ibnelbachyr
  - National Institute of Agronomic Research (INRA Maroc), CRRA Errachidia, 52000 Errachidia, Morocco
- Mouad Chentouf
  - National Institute of Agronomic Research (INRA Maroc), CRRA Tangier, 90010 Tangier, Morocco
- Abdelmajid Bechchari
  - National Institute of Agronomic Research (INRA Maroc), CRRA Oujda, 60000 Oujda, Morocco
- Hamid R. Rezaei
  - Department of Environmental Sci, Gorgan University of Agricultural Sciences & Natural Resources, 41996-13776 Gorgan, Iran
- Saeid Naderi
  - Environmental Sciences Department, Natural Resources Faculty, University of Guilan, 49138-15749 Guilan, Iran
- Alessandra Stella
  - PTP Science Park, Bioinformatics Unit, Via Einstein-Loc. Cascina Codazza, 26900 Lodi, Italy
- Abdelkader Chikhi
  - National Institute of Agronomic Research (INRA Maroc), CRRA Errachidia, 52000 Errachidia, Morocco
- Laura Clarke
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- James Kijas
  - Commonwealth Scientific and Industrial Research Organisation Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Pierre Taberlet
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
- François Pompanon
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
52
Euclide PT, McKinney GJ, Bootsma M, Tarsa C, Meek MH, Larson WA. Attack of the PCR clones: Rates of clonality have little effect on RAD‐seq genotype calls. Mol Ecol Resour 2019; 20:66-78. [DOI: 10.1111/1755-0998.13087] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/02/2019] [Revised: 08/12/2019] [Accepted: 08/16/2019] [Indexed: 12/11/2022]
Affiliation(s)
- Peter T. Euclide
  - Wisconsin Cooperative Fishery Research Unit, College of Natural Resources, University of Wisconsin‐Stevens Point, Stevens Point, WI, USA
- Garrett J. McKinney
  - School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA, USA
- Matthew Bootsma
  - Wisconsin Cooperative Fishery Research Unit, College of Natural Resources, University of Wisconsin‐Stevens Point, Stevens Point, WI, USA
- Charlene Tarsa
  - Department of Integrative Biology and AgBio Research, Michigan State University, East Lansing, MI, USA
- Mariah H. Meek
  - Department of Integrative Biology and AgBio Research, Michigan State University, East Lansing, MI, USA
- Wesley A. Larson
  - U.S. Geological Survey, Wisconsin Cooperative Fishery Research Unit, College of Natural Resources, University of Wisconsin‐Stevens Point, Stevens Point, WI, USA
53
Single nucleotide polymorphism markers for genotyping hawksbill turtles (Eretmochelys imbricata). Conserv Genet Resour 2019. [DOI: 10.1007/s12686-019-01112-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
54
Ko A, Nielsen R. Joint Estimation of Pedigrees and Effective Population Size Using Markov Chain Monte Carlo. Genetics 2019; 212:855-868. [PMID: 31123041 PMCID: PMC6614905 DOI: 10.1534/genetics.119.302280] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/06/2018] [Accepted: 05/16/2019] [Indexed: 12/31/2022] Open
Abstract
Pedigrees provide the genealogical relationships among individuals at a fine resolution and serve an important function in many areas of genetic studies. One such use of pedigree information is in the estimation of the short-term effective population size (Ne), which is of great relevance in fields such as conservation genetics. Despite the usefulness of pedigrees, however, they are often an unknown parameter and must be inferred from genetic data. In this study, we present a Bayesian method to jointly estimate pedigrees and Ne from genetic markers using Markov Chain Monte Carlo. Our method supports analysis of a large number of markers and individuals within a single generation with the use of a composite likelihood, which significantly increases computational efficiency. We show, on simulated data, that our method is able to jointly estimate relationships up to first cousins and Ne with high accuracy. We also apply the method on a real dataset of house sparrows to reconstruct their previously unreported pedigree.
Affiliation(s)
- Amy Ko
  - Department of Integrative Biology, University of California, Berkeley, 94720 California
- Rasmus Nielsen
  - Department of Integrative Biology, University of California, Berkeley, 94720 California
  - Department of Statistics, University of California, Berkeley, 94720 California
  - Museum of Natural History, University of Copenhagen, 1123 Denmark
55
Gervais L, Perrier C, Bernard M, Merlet J, Pemberton JM, Pujol B, Quéméré E. RAD-sequencing for estimating genomic relatedness matrix-based heritability in the wild: A case study in roe deer. Mol Ecol Resour 2019; 19:1205-1217. [PMID: 31058463 DOI: 10.1111/1755-0998.13031] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/14/2018] [Revised: 04/19/2019] [Accepted: 04/23/2019] [Indexed: 01/02/2023]
Abstract
Estimating the evolutionary potential of quantitative traits and reliably predicting responses to selection in wild populations are important challenges in evolutionary biology. The genomic revolution has opened up opportunities for measuring relatedness among individuals with precision, enabling pedigree-free estimation of trait heritabilities in wild populations. However, until now, most quantitative genetic studies based on a genomic relatedness matrix (GRM) have focused on long-term monitored populations for which traditional pedigrees were also available, and have often had access to knowledge of genome sequence and variability. Here, we investigated the potential of RAD-sequencing for estimating heritability in a free-ranging roe deer (Capreolus capreolus) population for which no prior genomic resources were available. We propose a step-by-step analytical framework to optimize the quality and quantity of the genomic data and explore the impact of the single nucleotide polymorphism (SNP) calling and filtering processes on the GRM structure and GRM-based heritability estimates. As expected, our results show that sequence coverage strongly affects the number of recovered loci, the genotyping error rate and the amount of missing data. Ultimately, this had little effect on heritability estimates and their standard errors, provided that the GRM was built from a minimum number of loci (above 7,000). Genomic relatedness matrix-based heritability estimates thus appear robust to a moderate level of genotyping errors in the SNP data set. We also showed that quality filters, such as the removal of low-frequency variants, affect the relatedness structure of the GRM, generating lower h² estimates. Our work illustrates the huge potential of RAD-sequencing for estimating GRM-based heritability in virtually any natural population.
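To make the idea of a GRM concrete, here is a minimal sketch that builds one from 0/1/2 genotype counts using a common centred-and-scaled estimator (often attributed to VanRaden), with an optional minor-allele-frequency (MAF) filter. The estimator choice, the function names and the toy genotypes are assumptions for illustration; the abstract does not specify how the authors computed their GRM, and missing genotypes are not handled here.

```python
def allele_frequencies(genotypes):
    """Per-locus frequency of the counted allele; genotypes[i][j] is 0, 1 or 2."""
    n_ind = len(genotypes)
    n_loci = len(genotypes[0])
    return [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_loci)]


def grm(genotypes, min_maf=0.0):
    """Genomic relatedness matrix G = Z Z^T / (2 * sum_j p_j (1 - p_j)).

    Z is the genotype matrix centred by 2 * p_j at each locus. Monomorphic loci
    and loci with MAF below min_maf are dropped before G is computed.
    """
    freqs = allele_frequencies(genotypes)
    keep = [j for j, p in enumerate(freqs)
            if min(p, 1 - p) > 0 and min(p, 1 - p) >= min_maf]
    if not keep:
        raise ValueError("no polymorphic loci left after filtering")
    denom = 2 * sum(freqs[j] * (1 - freqs[j]) for j in keep)
    centred = [[ind[j] - 2 * freqs[j] for j in keep] for ind in genotypes]
    n_ind = len(genotypes)
    return [[sum(a * b for a, b in zip(centred[i], centred[k])) / denom
             for k in range(n_ind)]
            for i in range(n_ind)]


# Toy data: 4 individuals x 4 loci (hypothetical genotype counts).
genotypes = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [2, 0, 1, 1],
    [1, 2, 1, 0],
]

g_all = grm(genotypes)                   # all polymorphic loci
g_common = grm(genotypes, min_maf=0.2)   # drops the last locus (MAF 0.125)
print([round(g_all[i][i], 3) for i in range(4)])
print([round(g_common[i][i], 3) for i in range(4)])
```

Comparing `g_all` with `g_common` shows, on a toy scale, how a low-frequency filter changes the entries of the matrix; any effect on heritability would depend on the downstream model, which is outside this sketch.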
Affiliation(s)
- Laura Gervais
  - CEFS, INRA, Université de Toulouse, Castanet-Tolosan, Cedex, France
  - Laboratoire Évolution & Diversité Biologique (EDB UMR 5174), CNRS, IRD, UPS, Université Fédérale de Toulouse Midi-Pyrénées, Toulouse, France
- Maria Bernard
  - SIGENAE, INRA, Jouy-en-Josas, France
  - GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Joël Merlet
  - CEFS, INRA, Université de Toulouse, Castanet-Tolosan, Cedex, France
- Josephine M Pemberton
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Benoit Pujol
  - Laboratoire Évolution & Diversité Biologique (EDB UMR 5174), CNRS, IRD, UPS, Université Fédérale de Toulouse Midi-Pyrénées, Toulouse, France
  - PSL Université Paris: EPHE-UPVD-CNRS, Université de Perpignan, Perpignan, France
- Erwan Quéméré
  - CEFS, INRA, Université de Toulouse, Castanet-Tolosan, Cedex, France
56
Díaz-Arce N, Rodríguez-Ezpeleta N. Selecting RAD-Seq Data Analysis Parameters for Population Genetics: The More the Better? Front Genet 2019; 10:533. [PMID: 31191624 PMCID: PMC6549478 DOI: 10.3389/fgene.2019.00533] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 01/08/2019] [Accepted: 05/16/2019] [Indexed: 11/25/2022] Open
Abstract
Restriction site-associated DNA sequencing (RAD-seq) has become a powerful and widely used tool in molecular ecology studies as it allows the cost-effective recovery of thousands of polymorphic sites across individuals of non-model organisms. However, its successful implementation in population genetics relies on correct data processing that would minimize potential loci-assembly biases and consequent genotyping error rates. RAD-seq data processing when no reference genome is available involves the assembly of hundreds of thousands of high-throughput sequencing reads into orthologous loci, for which various key parameter values need to be selected by the researcher. Previous studies exploring the effect of these parameter values found or assumed that a larger number of recovered polymorphic loci is associated with a better assembly. Here, using three RAD-seq datasets from different species, we explore the effect of read filtering, loci assembly and polymorphic site selection on number of markers obtained and genetic differentiation inferred using the Stacks software. We find (i) that recovery of higher numbers of polymorphic loci is not necessarily associated with higher genetic differentiation, (ii) that the presence of PCR duplicates, selected loci assembly parameters and selected SNP filtering parameters affect the number of recovered polymorphic loci and degree of genetic differentiation, and (iii) that this effect is different in each dataset, meaning that defining a systematic universal protocol for RAD-seq data analysis may lead to missing relevant information about population differentiation.
57
Hard JJ. Robin S. Waples—Recipient of the 2018 Molecular Ecology Prize. Mol Ecol 2019; 28:29-32. [DOI: 10.1111/mec.14959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2018] [Accepted: 11/09/2018] [Indexed: 11/29/2022]
Affiliation(s)
- Jeffrey J. Hard
  - Conservation Biology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, Washington
58
Waples RS, Lindley ST. Genomics and conservation units: The genetic basis of adult migration timing in Pacific salmonids. Evol Appl 2018; 11:1518-1526. [PMID: 30344624 PMCID: PMC6183503 DOI: 10.1111/eva.12687] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 04/19/2018] [Revised: 07/18/2018] [Accepted: 07/20/2018] [Indexed: 01/01/2023] Open
Abstract
It is now routinely possible to generate genomics-scale datasets for nonmodel species; however, many questions remain about how best to use these data for conservation and management. Some recent genomics studies of anadromous Pacific salmonids have reported a strong association between alleles at one or a very few genes and a key life history trait (adult migration timing) that has played an important role in defining conservation units. Publication of these results has already spurred a legal challenge to the existing framework for managing these species, which was developed under the paradigm that most phenotypic traits are controlled by many genes of small effect, and that parallel evolution of life history traits is common. But what if a key life history trait can only be expressed if a specific allele is present? Does the current framework need to be modified to account for the new genomics results, as some now propose? Although this real-world example focuses on Pacific salmonids, the issues regarding how genomics can inform us about the genetic basis of phenotypic traits, and what that means for applied conservation, are much more general. In this perspective, we consider these issues and outline a general process that can be used to help generate the types of additional information that would be needed to make informed decisions about the adequacy of existing conservation and management frameworks.
Affiliation(s)
- Robin S. Waples
  - NOAA Fisheries, Northwest Fisheries Science Center, Seattle, Washington
- Steven T. Lindley
  - NOAA Fisheries, Southwest Fisheries Science Center, Santa Cruz, California
59
O'Leary SJ, Puritz JB, Willis SC, Hollenbeck CM, Portnoy DS. These aren't the loci you're looking for: Principles of effective SNP filtering for molecular ecologists. Mol Ecol 2018; 27:3193-3206. [PMID: 29987880 DOI: 10.1111/mec.14792] [Citation(s) in RCA: 180] [Impact Index Per Article: 30.0] [Received: 01/24/2018] [Revised: 06/23/2018] [Accepted: 06/26/2018] [Indexed: 12/16/2022]
Abstract
Sequencing reduced-representation libraries of restriction site-associated DNA (RADseq) to identify single nucleotide polymorphisms (SNPs) is quickly becoming a standard methodology for molecular ecologists. Because of the scale of RADseq data sets, putative loci cannot be assessed individually, making the process of filtering noise and correctly identifying biologically meaningful signal more difficult. Artefacts introduced during library preparation and/or bioinformatic processing of SNP data can create patterns that are incorrectly interpreted as indicative of population structure or natural selection. Therefore, it is crucial to carefully consider types of errors that may be introduced during laboratory work and data processing, and how to minimize, detect and remove these errors. Here, we discuss issues inherent to RADseq methodologies that can result in artefacts during library preparation and locus reconstruction resulting in erroneous SNP calls and, ultimately, genotyping error. Further, we describe steps that can be implemented to create a rigorously filtered data set consisting of markers accurately representing independent loci and compare the effect of different combinations of filters on four RAD data sets. Finally, we stress the importance of publishing raw sequence data along with final filtered data sets in addition to detailed documentation of filtering steps and quality control measures.
Affiliation(s)
- Shannon J O'Leary
  - Department of Life Sciences, Texas A&M University - Corpus Christi, Texas
- Jonathan B Puritz
  - Biological Sciences, University of Rhode Island, Kingston, Rhode Island
- Stuart C Willis
  - Department of Life Sciences, Texas A&M University - Corpus Christi, Texas
  - Department of Ichthyology, California Academy of Sciences, San Francisco, California
- David S Portnoy
  - Department of Life Sciences, Texas A&M University - Corpus Christi, Texas
60
Luikart G, Kardos M, Hand BK, Rajora OP, Aitken SN, Hohenlohe PA. Population Genomics: Advancing Understanding of Nature. Population Genomics 2018. [DOI: 10.1007/13836_2018_60] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 02/07/2023]