1
Hemstrom W, Grummer JA, Luikart G, Christie MR. Next-generation data filtering in the genomics era. Nat Rev Genet 2024:10.1038/s41576-024-00738-6. [PMID: 38877133 DOI: 10.1038/s41576-024-00738-6] [Accepted: 04/25/2024] [Indexed: 06/16/2024]
Abstract
Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering - removing sequencing bases, reads, genetic variants and/or individuals from a dataset - to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy-Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima's D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne).
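The minor allele frequency (MAF) and missing-data filters named in this abstract can be sketched in a few lines. The Python below is a minimal illustration, not the authors' code: it assumes diploid genotypes stored as alternate-allele counts (0, 1, 2) with `None` for a missing call, and the function names and default thresholds are hypothetical. It removes individuals first and then recomputes per-locus statistics on the retained samples; that is only one possible ordering, and a different order can retain a different set of loci.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding: alternate-allele count per diploid genotype, None = missing call.
GenotypeRow = Sequence[Optional[int]]


def individual_missingness(row: GenotypeRow) -> float:
    """Fraction of loci with no genotype call for one individual."""
    return sum(g is None for g in row) / len(row) if row else 0.0


def locus_missingness(matrix: Sequence[GenotypeRow], j: int) -> float:
    """Fraction of individuals with no genotype call at locus j."""
    return sum(row[j] is None for row in matrix) / len(matrix)


def minor_allele_frequency(matrix: Sequence[GenotypeRow], j: int) -> float:
    """Minor allele frequency at locus j, computed from called genotypes only."""
    called = [row[j] for row in matrix if row[j] is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # two allele copies per diploid call
    return min(alt_freq, 1.0 - alt_freq)


def filter_genotypes(
    matrix: Sequence[GenotypeRow],
    max_ind_missing: float = 0.5,
    max_locus_missing: float = 0.2,
    min_maf: float = 0.05,
) -> Tuple[List[int], List[int], List[List[Optional[int]]]]:
    """Drop high-missingness individuals, then loci failing missingness or MAF thresholds.

    Returns the retained individual indices, retained locus indices and filtered matrix.
    """
    kept_ind = [i for i, row in enumerate(matrix)
                if individual_missingness(row) <= max_ind_missing]
    subset = [matrix[i] for i in kept_ind]
    if not subset:
        return [], [], []
    n_loci = len(subset[0])
    kept_loci = [j for j in range(n_loci)
                 if locus_missingness(subset, j) <= max_locus_missing
                 and minor_allele_frequency(subset, j) >= min_maf]
    filtered = [[row[j] for j in kept_loci] for row in subset]
    return kept_ind, kept_loci, filtered


if __name__ == "__main__":
    genotypes = [
        [0, 1, 2, None],
        [0, 1, None, None],
        [0, 2, 1, 1],
        [None, None, None, 0],
    ]
    print(filter_genotypes(genotypes))  # → ([0, 1, 2], [1], [[1], [1], [2]])
```

In the toy matrix, the fourth individual fails the individual-missingness threshold, locus 0 is monomorphic (MAF 0), and loci 2 and 3 fail the per-locus missingness threshold once only the retained individuals are counted. Changing any threshold changes which loci survive, which is the kind of sensitivity the abstract describes for downstream statistics.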
Affiliation(s)
- William Hemstrom
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
- Jared A Grummer
- Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Gordon Luikart
- Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Mark R Christie
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
- Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA.

2
Arantes LS, Caccavo JA, Sullivan JK, Sparmann S, Mbedi S, Höner OP, Mazzoni CJ. Scaling-up RADseq methods for large datasets of non-invasive samples: Lessons for library construction and data preprocessing. Mol Ecol Resour 2023. [PMID: 37646753 DOI: 10.1111/1755-0998.13859] [Received: 04/12/2023] [Revised: 08/12/2023] [Accepted: 08/16/2023] [Indexed: 09/01/2023]
Abstract
Genetic non-invasive sampling (gNIS) is a critical tool for population genetics studies, supporting conservation efforts while imposing minimal impacts on wildlife. However, gNIS often presents variable levels of DNA degradation and non-endogenous contamination, which can incur considerable processing costs. Furthermore, the use of restriction-site-associated DNA sequencing methods (RADseq) for assessing thousands of genetic markers introduces the challenge of obtaining large sets of shared loci with similar coverage across multiple individuals. Here, we present an approach to handling large-scale gNIS-based datasets using data from the spotted hyena population inhabiting the Ngorongoro Crater in Tanzania. We generated 3RADseq data for more than a thousand individuals, mostly from faecal mucus samples collected non-invasively and varying in DNA degradation and contamination level. Using small-scale sequencing, we screened samples for endogenous DNA content, removed highly contaminated samples, confirmed overlap fragment length between libraries, and balanced individual representation in a sequencing pool. We evaluated the impact of (1) DNA degradation and contamination of non-invasive samples, (2) PCR duplicates and (3) different SNP filters on genotype accuracy based on Mendelian error estimated for parent-offspring trio datasets. Our results showed that when balanced for sequencing depth, contaminated samples presented similar genotype error rates to those of non-contaminated samples. We also showed that PCR duplicates and different SNP filters impact genotype accuracy. In summary, we showed the potential of using gNIS for large-scale genetic monitoring based on SNPs and demonstrated how to improve control over library preparation by using a weighted re-pooling strategy that considers the endogenous DNA content.
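This study uses Mendelian errors in parent-offspring trios as its measure of genotype accuracy. The Python below is a minimal sketch of how such an error rate could be computed, not the authors' pipeline: it assumes biallelic SNPs coded as alternate-allele counts (0, 1, 2), skips loci where any trio member is uncalled, and the function names are hypothetical. Some genotyping errors still yield a Mendelian-consistent trio, so this rate is a lower bound on the true genotype error rate.

```python
from typing import Optional, Sequence

# Alternate-allele copies a parent with genotype g (0, 1 or 2) can transmit to a child.
TRANSMISSIBLE = {0: (0,), 1: (0, 1), 2: (1,)}


def mendelian_consistent(mother: int, father: int, child: int) -> bool:
    """True if the child's genotype can arise from one allele of each parent."""
    return any(m + f == child
               for m in TRANSMISSIBLE[mother]
               for f in TRANSMISSIBLE[father])


def mendelian_error_rate(mother: Sequence[Optional[int]],
                         father: Sequence[Optional[int]],
                         child: Sequence[Optional[int]]) -> float:
    """Fraction of fully genotyped loci at which the trio is Mendelian-inconsistent."""
    checked = errors = 0
    for m, f, c in zip(mother, father, child):
        if m is None or f is None or c is None:
            continue  # only loci called in all three individuals are informative
        checked += 1
        if not mendelian_consistent(m, f, c):
            errors += 1
    return errors / checked if checked else float("nan")


if __name__ == "__main__":
    mother = [0, 2, 1, 0, None]
    father = [0, 2, 1, 2, 1]
    child = [1, 2, 0, 1, 1]
    print(mendelian_error_rate(mother, father, child))  # → 0.25
```

In the example, only the first locus is inconsistent (two homozygous-reference parents cannot produce a heterozygous child), and the last locus is skipped because the mother is uncalled.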
Affiliation(s)
- Larissa S Arantes
- Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany
- Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany
- Jilda A Caccavo
- Laboratoire des Sciences du Climat et de l'Environnement, LSCE/IPSL, CEA-CNRS-UVSQ, Université Paris-Saclay, Gif-sur-Yvette, France
- Laboratoire d'Océanographie et du Climat: Expérimentations et Approches Numériques, LOCEAN/IPSL, UPMC-CNRS-IRD-MNHN, Sorbonne Université, Paris, France
- James K Sullivan
- Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany
- Freie Universität, Berlin, Germany
- Sarah Sparmann
- Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany
- Leibniz-Institut für Gewässerökologie und Binnenfischerei (IGB), Berlin, Germany
- Susan Mbedi
- Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany
- Museum für Naturkunde, Berlin, Germany
- Oliver P Höner
- Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany
- Camila J Mazzoni
- Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany
- Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany

3
Ruperao P, Bajaj P, Subramani R, Yadav R, Reddy Lachagari VB, Lekkala SP, Rathore A, Archak S, Angadi UB, Singh R, Singh K, Mayes S, Rangan P. A pilot-scale comparison between single and double-digest RAD markers generated using GBS strategy in sesame (Sesamum indicum L.). PLoS One 2023; 18:e0286599. [PMID: 37267340 DOI: 10.1371/journal.pone.0286599] [Received: 02/21/2023] [Accepted: 05/19/2023] [Indexed: 06/04/2023]
Abstract
To reduce the genome sequence representation, restriction site-associated DNA sequencing (RAD-seq) protocols are widely used with either single-digest or double-digest methods. In this study, we genotyped a sesame population (sample size 48) at pilot scale to compare single and double-digest RAD-seq (sd and ddRAD-seq) methods. We analysed the resulting short-read data generated from both protocols and assessed how their performance affected downstream analyses using various parameters. The distinct k-mer count and gene presence absence variation (PAV) showed a significant difference between the sesame samples studied. Additionally, the variants called from the two datasets (sdRAD-seq and ddRAD-seq) differed significantly. The combined variants from both datasets helped in identifying the most diverse samples and possible sub-groups in the sesame population. The most diverse samples identified from each analysis (k-mer, gene PAV, SNP count, heterozygosity, NJ and PCA) can possibly be representative samples holding major diversity of the small sesame population used in this study. The best possible strategies, with suggested modifications, for using the RAD-seq strategy efficiently on a large dataset containing thousands of samples for molecular analyses such as diversity, population structure and core development studies are discussed.
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Prasad Bajaj
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Rajkumar Subramani
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi, India
- Rashmi Yadav
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi, India
- Sunil Archak
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi, India
- Ulavappa B Angadi
- ICAR-Indian Agricultural Statistical Research Institute, New Delhi, India
- Rakesh Singh
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi, India
- Kuldeep Singh
- Genebank, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Sean Mayes
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi, India
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Australia

4
Chambers EA, Tarvin RD, Santos JC, Ron SR, Betancourth-Cundar M, Hillis DM, Matz MV, Cannatella DC. 2b or not 2b? 2bRAD is an effective alternative to ddRAD for phylogenomics. Ecol Evol 2023; 13:e9842. [PMID: 36911313 PMCID: PMC9994478 DOI: 10.1002/ece3.9842] [Received: 01/24/2023] [Revised: 02/02/2023] [Accepted: 02/03/2023] [Indexed: 03/10/2023]
Abstract
Restriction-site-associated DNA sequencing (RADseq) has become an accessible way to obtain genome-wide data in the form of single-nucleotide polymorphisms (SNPs) for phylogenetic inference. Nonetheless, how differences in RADseq methods influence phylogenetic estimation is poorly understood because most comparisons have largely relied on conceptual predictions rather than empirical tests. We examine how differences in ddRAD and 2bRAD data influence phylogenetic estimation in two non-model frog groups. We compare the impact of method choice on phylogenetic information, missing data, and allelic dropout, considering different sequencing depths. Given that researchers must balance input (funding, time) with output (amount and quality of data), we also provide comparisons of laboratory effort, computational time, monetary costs, and the repeatability of library preparation and sequencing. Both 2bRAD and ddRAD methods estimated well-supported trees, even at low sequencing depths, and had comparable amounts of missing data, patterns of allelic dropout, and phylogenetic signal. Compared to ddRAD, 2bRAD produced more repeatable datasets, had simpler laboratory protocols, and had an overall faster bioinformatics assembly. However, many fewer parsimony-informative sites per SNP were obtained from 2bRAD data when using native pipelines, highlighting a need for further investigation into the effects of each pipeline on resulting datasets. Our study underscores the importance of comparing RADseq methods, such as expected results and theoretical performance using empirical datasets, before undertaking costly experiments.
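The abstract compares methods by the number of parsimony-informative sites they yield. Under the standard definition, an alignment column is parsimony-informative when at least two character states each occur in at least two taxa. The Python below is a minimal sketch that counts such columns. It is not the pipeline used in the study. It assumes a nucleotide alignment of equal-length strings in which only A, C, G and T count as states, so gaps, N, ? and IUPAC ambiguity codes are ignored. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence

STATES = frozenset("ACGT")  # ambiguity codes, gaps and N are treated as missing


def is_parsimony_informative(column: str) -> bool:
    """At least two nucleotide states, each present in at least two taxa."""
    counts = Counter(base for base in column.upper() if base in STATES)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative_sites(alignment: Sequence[str]) -> int:
    """Count parsimony-informative columns in an alignment (one string per taxon)."""
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    return sum(
        is_parsimony_informative("".join(seq[i] for seq in alignment))
        for i in range(length)
    )


if __name__ == "__main__":
    alignment = ["AACGT", "AATGT", "AGCGA", "AGTG-"]
    print(count_informative_sites(alignment))  # → 2
```

Columns 2 and 3 (A/G and C/T, each state seen twice) are informative. The last column is not, because its A is a singleton and the gap is ignored. Constant columns are also uninformative. This distinction separates SNP count from phylogenetically useful sites.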
Affiliation(s)
- E Anne Chambers
- Department of Integrative Biology and Biodiversity Center, University of Texas at Austin, Austin, Texas, USA
- Department of Environmental Science, Policy, and Management and Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, California, USA
- Rebecca D Tarvin
- Department of Integrative Biology and Biodiversity Center, University of Texas at Austin, Austin, Texas, USA
- Department of Integrative Biology and Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, California, USA
- Juan C Santos
- Department of Biological Sciences, St John's University, New York, New York, USA
- Santiago R Ron
- Museo de Zoología, Escuela de Ciencias Biológicas, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
- David M Hillis
- Department of Integrative Biology and Biodiversity Center, University of Texas at Austin, Austin, Texas, USA
- Mikhail V Matz
- Department of Integrative Biology and Biodiversity Center, University of Texas at Austin, Austin, Texas, USA
- David C Cannatella
- Department of Integrative Biology and Biodiversity Center, University of Texas at Austin, Austin, Texas, USA

5
Meuser AV, Pyne CB, Mandeville EG. Limited evidence of a genetic basis for sex determination in the common creek chub, Semotilus atromaculatus. J Evol Biol 2022; 35:1635-1645. [PMID: 35411987 DOI: 10.1111/jeb.14006] [Received: 08/23/2021] [Accepted: 03/15/2022] [Indexed: 12/16/2022]
Abstract
Sexual reproduction is almost universal in vertebrates; therefore, each animal species which uses it must have a mechanism for designating sex as male or female. Fish, especially, have a wide range of sex determining systems. In the present study, we aimed to identify a genetic basis for sex determination in the common creek chub (Semotilus atromaculatus) using genotyping-by-sequencing data. No sex-associated markers were found by RADSex or a GWAS using GEMMA; however, Weir and Cockerham locus-specific FST analysis and discriminant analysis of principal components revealed genetic differentiation between the sexes at several loci. While no explicit sex determination mechanism has yet been discovered in creek chub, these loci are potential candidates for future studies. Incompatible systems are thought to increase reproductive isolation, but interspecific hybridization is common among groups such as cyprinid minnows; thus, studies such as ours can provide insight into hybridization and evolutionary diversification of this clade. We also highlight technical challenges involved in studying sex determination in evolutionary groups with extremely variable mechanisms and without heteromorphic sex chromosomes.
Affiliation(s)
- Amanda V Meuser
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
- Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada
- Cassandre B Pyne
- Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada

6
Lotterhos KE, Fitzpatrick MC, Blackmon H. Simulation Tests of Methods in Evolution, Ecology, and Systematics: Pitfalls, Progress, and Principles. Annu Rev Ecol Evol Syst 2022; 53:113-136. [PMID: 38107485 PMCID: PMC10723108 DOI: 10.1146/annurev-ecolsys-102320-093722] [Indexed: 12/19/2023]
Abstract
Complex statistical methods are continuously developed across the fields of ecology, evolution, and systematics (EES). These fields, however, lack standardized principles for evaluating methods, which has led to high variability in the rigor with which methods are tested, a lack of clarity regarding their limitations, and the potential for misapplication. In this review, we illustrate the common pitfalls of method evaluations in EES, the advantages of testing methods with simulated data, and best practices for method evaluations. We highlight the difference between method evaluation and validation and review how simulations, when appropriately designed, can refine the domain in which a method can be reliably applied. We also discuss the strengths and limitations of different evaluation metrics. The potential for misapplication of methods would be greatly reduced if funding agencies, reviewers, and journals required principled method evaluation.
Affiliation(s)
- Katie E Lotterhos
- Department of Marine and Environmental Sciences, Northeastern University, Nahant, Massachusetts, USA
- Matthew C Fitzpatrick
- Appalachian Lab, University of Maryland Center for Environmental Science, Frostburg, Maryland, USA
- Heath Blackmon
- Department of Biology, Texas A&M University, College Station, Texas, USA

7
Zhou W, Jenny Xiang QY. Phylogenomics and Biogeography of Castanea (Chestnut) and Hamamelis (Witch-hazel) - Choosing between RAD-seq and Hyb-Seq Approaches. Mol Phylogenet Evol 2022; 176:107592. [DOI: 10.1016/j.ympev.2022.107592] [Received: 08/26/2021] [Revised: 06/18/2022] [Accepted: 07/20/2022] [Indexed: 10/31/2022]

8
Gileta AF, Gao J, Chitre AS, Bimschleger HV, St Pierre CL, Gopalakrishnan S, Palmer AA. Adapting Genotyping-by-Sequencing and Variant Calling for Heterogeneous Stock Rats. G3 (Bethesda) 2020; 10:2195-2205. [PMID: 32398234 PMCID: PMC7341140 DOI: 10.1534/g3.120.401325] [Received: 07/23/2019] [Accepted: 05/01/2020] [Indexed: 02/06/2023]
Abstract
The heterogeneous stock (HS) is an outbred rat population derived from eight inbred rat strains. HS rats are ideally suited for genome wide association studies; however, only a few genotyping microarrays have ever been designed for rats and none of them are currently in production. To address the need for an efficient and cost effective method of genotyping HS rats, we have adapted genotype-by-sequencing (GBS) to obtain genotype information at large numbers of single nucleotide polymorphisms (SNPs). In this paper, we have outlined the laboratory and computational steps we took to optimize double digest genotype-by-sequencing (ddGBS) for use in rats. We evaluated multiple existing computational tools and explain the workflow we have used to call and impute over 3.7 million SNPs. We have also compared various rat genetic maps, which are necessary for imputation, including a recently developed map specific to the HS. Using our approach, we obtained concordance rates of 99% with data obtained using data from a genotyping array. The principles and computational pipeline that we describe could easily be adapted for use in other species for which reliable reference genome sets are available.
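The 99% figure above is a genotype concordance between sequencing-derived calls and array calls. The Python below is a minimal sketch of one way to compute such a rate and is not the authors' workflow. It assumes both call sets use the same allele orientation and alternate-allele coding (0, 1, 2) and are keyed by a hypothetical `(sample_id, snp_id)` pair, with `None` for a missing call. Because agreement at sites homozygous for the reference allele can inflate overall concordance, the optional `non_reference_only` flag, an illustrative variant of my own, skips sites where both calls are 0.

```python
from typing import Dict, Hashable, Optional, Tuple

# Hypothetical layout: {(sample_id, snp_id): alt-allele count 0/1/2, or None if missing}.
Calls = Dict[Hashable, Optional[int]]


def genotype_concordance(calls_a: Calls, calls_b: Calls,
                         non_reference_only: bool = False) -> Tuple[float, int]:
    """Return (fraction of matching genotypes, number of genotypes compared).

    Only keys called (non-missing) in both sets are compared. With
    non_reference_only=True, sites where both calls are homozygous reference (0)
    are skipped, giving a stricter measure of agreement.
    """
    compared = matches = 0
    for key, a in calls_a.items():
        b = calls_b.get(key)
        if a is None or b is None:
            continue
        if non_reference_only and a == 0 and b == 0:
            continue
        compared += 1
        matches += a == b  # bool counts as 0 or 1
    return (matches / compared if compared else float("nan")), compared


if __name__ == "__main__":
    gbs = {("s1", "snp1"): 0, ("s1", "snp2"): 1, ("s1", "snp3"): 2, ("s1", "snp4"): None}
    array = {("s1", "snp1"): 0, ("s1", "snp2"): 2, ("s1", "snp3"): 2,
             ("s1", "snp4"): 1, ("s1", "snp5"): 0}
    print(genotype_concordance(gbs, array))
    print(genotype_concordance(gbs, array, non_reference_only=True))  # → (0.5, 2)
```

Reporting the number of compared genotypes alongside the rate makes it clear how many shared, non-missing sites the concordance estimate rests on.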
Affiliation(s)
- Alexander F Gileta
- Department of Psychiatry
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California, 92093
- Shyam Gopalakrishnan
- Department of Human Genetics, University of Chicago, Chicago, Illinois, 60637
- Abraham A Palmer
- Department of Psychiatry
- Natural History Museum of Denmark, University of Copenhagen, 2200 København N, Denmark

9
Graham CF, Boreham DR, Manzon RG, Stott W, Wilson JY, Somers CM. How "simple" methodological decisions affect interpretation of population structure based on reduced representation library DNA sequencing: A case study using the lake whitefish. PLoS One 2020; 15:e0226608. [PMID: 31978053 PMCID: PMC6980518 DOI: 10.1371/journal.pone.0226608] [Received: 01/25/2019] [Accepted: 12/01/2019] [Indexed: 12/30/2022]
Abstract
Reduced representation (RRL) sequencing approaches (e.g., RADSeq, genotyping by sequencing) require decisions about how much to invest in genome coverage and sequencing depth, as well as choices of values for adjustable bioinformatics parameters. To empirically explore the importance of these “simple” methodological decisions, we generated two independent sequencing libraries for the same 142 individual lake whitefish (Coregonus clupeaformis) using a nextRAD RRL approach: (1) a larger number of loci at low sequencing depth based on a 9mer (library A); and (2) fewer loci at higher sequencing depth based on a 10mer (library B). The fish were selected from populations with different levels of expected genetic subdivision. Each library was analyzed using the STACKS pipeline followed by three types of population structure assessment (FST, DAPC and ADMIXTURE) with iterative increases in the stringency of sequencing depth and missing data requirements, as well as more specific a priori population maps. Library B was always able to resolve strong population differentiation in all three types of assessment regardless of the selected parameters, largely due to retention of more loci in analyses. In contrast, library A produced more variable results; increasing the minimum sequencing depth threshold (-m) resulted in a reduced number of retained loci, and therefore lost resolution at high -m values for FST and ADMIXTURE, but not DAPC. When detecting fine population differentiation, the population map influenced the number of loci and missing data, which generated artefacts in all downstream analyses tested. Similarly, when examining fine scale population subdivision, library B was robust to changing parameters but library A lost resolution depending on the parameter set. We used library B to examine actual subdivision in our study populations. 
All three types of analysis found complete subdivision among populations in Lake Huron, ON and Dore Lake, SK, Canada using 10,640 SNP loci. Weak population subdivision was detected in Lake Huron with fish from sites in the north-west, Search Bay, North Point and Hammond Bay, showing slight differentiation. Overall, we show that apparently simple decisions about library construction and bioinformatics parameters can have important impacts on the interpretation of population subdivision. Although potentially more costly on a per-locus basis, early investment in striking a balance between the number of loci and sequencing effort is well worth the reduced genomic coverage for population genetics studies. More conservative stringency settings on STACKS parameters led to a final dataset that was more consistent and robust when examining both weak and strong population differentiation. Overall, we recommend that researchers approach “simple” methodological decisions with caution, especially when working on non-model species for the first time.
Affiliation(s)
- Carly F. Graham
- Department of Biology, University of Regina, Regina, Saskatchewan, Canada
- Douglas R. Boreham
- Medical Sciences, Northern Ontario School of Medicine, Greater Sudbury, Ontario, Canada
- Richard G. Manzon
- Department of Biology, University of Regina, Regina, Saskatchewan, Canada
- Wendylee Stott
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan, USA
- Joanna Y. Wilson
- Department of Biology, McMaster University, Hamilton, Ontario, Canada

10
LaCava MEF, Aikens EO, Megna LC, Randolph G, Hubbard C, Buerkle CA. Accuracy of de novo assembly of DNA sequences from double-digest libraries varies substantially among software. Mol Ecol Resour 2019; 20:360-370. [PMID: 31665547 DOI: 10.1111/1755-0998.13108] [Received: 07/22/2019] [Revised: 10/21/2019] [Accepted: 10/23/2019] [Indexed: 11/29/2022]
Abstract
Advances in DNA sequencing have made it feasible to gather genomic data for non-model organisms and large sets of individuals, often using methods for sequencing subsets of the genome. Several of these methods sequence DNA associated with endonuclease restriction sites (various RAD and GBS methods). For use in taxa without a reference genome, these methods rely on de novo assembly of fragments in the sequencing library. Many of the software options available for this application were originally developed for other assembly types and we do not know their accuracy for reduced representation libraries. To address this important knowledge gap, we simulated data from the Arabidopsis thaliana and Homo sapiens genomes and compared de novo assemblies by six software programs that are commonly used or promising for this purpose (ABySS, CD-HIT, Stacks, Stacks2, Velvet and VSEARCH). We simulated different mutation rates and types of mutations, and then applied the six assemblers to the simulated data sets, varying assembly parameters. We found substantial variation in software performance across simulations and parameter settings. ABySS failed to recover any true genome fragments, and Velvet and VSEARCH performed poorly for most simulations. Stacks and Stacks2 produced accurate assemblies of simulations containing SNPs, but the addition of insertion and deletion mutations decreased their performance. CD-HIT was the only assembler that consistently recovered a high proportion of true genome fragments. Here, we demonstrate the substantial difference in the accuracy of assemblies from different software programs and the importance of comparing assemblies that result from different parameter settings.
Affiliation(s)
- Melanie E F LaCava
- Program in Ecology, University of Wyoming, Laramie, WY, USA
- Wildlife Genomics and Disease Ecology Laboratory, Department of Veterinary Sciences, University of Wyoming, Laramie, WY, USA
- Ellen O Aikens
- Program in Ecology, University of Wyoming, Laramie, WY, USA
- Wyoming Cooperative Fish and Wildlife Research Unit, Department of Zoology and Physiology, University of Wyoming, Laramie, WY, USA
- Libby C Megna
- Program in Ecology, University of Wyoming, Laramie, WY, USA
- Department of Zoology and Physiology, University of Wyoming, Laramie, WY, USA
- Gregg Randolph
- Genome Technologies Lab, University of Wyoming, Laramie, WY, USA
- Charley Hubbard
- Program in Ecology, University of Wyoming, Laramie, WY, USA
- Department of Botany, University of Wyoming, Laramie, WY, USA
- C Alex Buerkle
- Program in Ecology, University of Wyoming, Laramie, WY, USA
- Department of Botany, University of Wyoming, Laramie, WY, USA

11
Euclide PT, McKinney GJ, Bootsma M, Tarsa C, Meek MH, Larson WA. Attack of the PCR clones: Rates of clonality have little effect on RAD-seq genotype calls. Mol Ecol Resour 2019; 20:66-78. [DOI: 10.1111/1755-0998.13087] [Received: 04/02/2019] [Revised: 08/12/2019] [Accepted: 08/16/2019] [Indexed: 12/11/2022]
Affiliation(s)
- Peter T. Euclide
- Wisconsin Cooperative Fishery Research Unit, College of Natural Resources, University of Wisconsin-Stevens Point, Stevens Point, WI, USA
- Garrett J. McKinney
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA, USA
- Matthew Bootsma
- Wisconsin Cooperative Fishery Research Unit, College of Natural Resources, University of Wisconsin-Stevens Point, Stevens Point, WI, USA
- Charlene Tarsa
- Department of Integrative Biology and AgBio Research, Michigan State University, East Lansing, MI, USA
- Mariah H. Meek
- Department of Integrative Biology and AgBio Research, Michigan State University, East Lansing, MI, USA
- Wesley A. Larson
- U.S. Geological Survey, Wisconsin Cooperative Fishery Research Unit, College of Natural Resources, University of Wisconsin-Stevens Point, Stevens Point, WI, USA

12
Species-diagnostic SNP markers for the black basses (Micropterus spp.): a new tool for black bass conservation and management. Conserv Genet Resour 2019. [DOI: 10.1007/s12686-019-01109-8] [Indexed: 12/16/2022]

13
Wright BR, Grueber CE, Lott MJ, Belov K, Johnson RN, Hogg CJ. Impact of reduced-representation sequencing protocols on detecting population structure in a threatened marsupial. Mol Biol Rep 2019; 46:5575-5580. [DOI: 10.1007/s11033-019-04966-6] [Received: 01/25/2019] [Accepted: 07/03/2019] [Indexed: 10/26/2022]

14
Flanagan SP, Jones AG. The future of parentage analysis: From microsatellites to SNPs and beyond. Mol Ecol 2019; 28:544-567. [PMID: 30575167 DOI: 10.1111/mec.14988] [Received: 09/12/2018] [Revised: 11/30/2018] [Accepted: 12/03/2018] [Indexed: 12/14/2022]
Abstract
Parentage analysis is a cornerstone of molecular ecology that has delivered fundamental insights into behaviour, ecology and evolution. Microsatellite markers have long been the king of parentage, their hypervariable nature conferring sufficient power to correctly assign offspring to parents. However, microsatellite markers have seen a sharp decline in use with the rise of next-generation sequencing technologies, especially in the study of population genetics and local adaptation. The time is ripe to review the current state of parentage analysis and see how it stands to be affected by the emergence of next-generation sequencing approaches. We find that single nucleotide polymorphisms (SNPs), the typical next-generation sequencing marker, remain underutilized in parentage analysis but are gaining momentum, with 58 SNP-based parentage analyses published thus far. Many of these papers, particularly the earlier ones, compare the power of SNPs and microsatellites in a parentage context. In virtually every case, SNPs are at least as powerful as microsatellite markers. As few as 100-500 SNPs are sufficient to resolve parentage completely in most situations. We also provide an overview of the analytical programs that are commonly used and compatible with SNP data. As the next-generation parentage enterprise grows, a reliance on likelihood and Bayesian approaches, as opposed to strict exclusion, will become increasingly important. We discuss some of the caveats surrounding the use of next-generation sequencing data for parentage analysis and conclude that the future is bright for this important realm of molecular ecology.
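The abstract contrasts strict exclusion with likelihood and Bayesian approaches. For biallelic SNPs, the simplest single-parent exclusion test counts opposing homozygotes, that is, loci where a candidate parent and the offspring are homozygous for different alleles (0 versus 2), which is impossible under Mendelian inheritance. The Python below is a minimal sketch of that exclusion baseline, not a method from the review. It assumes alternate-allele coding (0, 1, 2) with `None` for missing calls. The `max_mismatches` tolerance and all names are hypothetical. Strict exclusion with zero tolerance can wrongly reject a true parent when a single genotyping error occurs, which the tolerance parameter is meant to absorb.

```python
from typing import Dict, List, Optional, Sequence

Genotypes = Sequence[Optional[int]]  # alt-allele counts 0/1/2, None = missing


def opposing_homozygotes(parent: Genotypes, offspring: Genotypes) -> int:
    """Count loci where parent and offspring are homozygous for different alleles."""
    return sum(
        1 for p, o in zip(parent, offspring)
        if p is not None and o is not None and {p, o} == {0, 2}
    )


def compatible_parents(offspring: Genotypes,
                       candidates: Dict[str, Genotypes],
                       max_mismatches: int = 1) -> List[str]:
    """Names of candidates not excluded as a parent of this offspring."""
    return sorted(
        name for name, genotypes in candidates.items()
        if opposing_homozygotes(genotypes, offspring) <= max_mismatches
    )


if __name__ == "__main__":
    offspring = [0, 1, 2, 2, 0]
    candidates = {
        "A": [0, 1, 2, 1, 0],
        "B": [2, 1, 0, 0, 2],
        "C": [0, 0, 0, 2, None],
    }
    print(compatible_parents(offspring, candidates))                    # → ['A', 'C']
    print(compatible_parents(offspring, candidates, max_mismatches=0))  # → ['A']
```

Candidate C has one opposing-homozygote locus, so whether it is excluded depends entirely on the chosen tolerance. Likelihood-based methods avoid this hard cutoff by weighing all loci under an explicit error model.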
Affiliation(s)
- Sarah P Flanagan
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Adam G Jones
- Department of Biological Sciences, University of Idaho, Moscow, Idaho

15
Jensen EL, Edwards DL, Garrick RC, Miller JM, Gibbs JP, Cayot LJ, Tapia W, Caccone A, Russello MA. Population genomics through time provides insights into the consequences of decline and rapid demographic recovery through head-starting in a Galapagos giant tortoise. Evol Appl 2018; 11:1811-1821. [PMID: 30459831 PMCID: PMC6231475 DOI: 10.1111/eva.12682] [Received: 04/27/2018] [Revised: 07/09/2018] [Accepted: 07/16/2018] [Indexed: 12/26/2022]
Abstract
Population genetic theory related to the consequences of rapid population decline is well-developed, but there are very few empirical studies where sampling was conducted before and after a known bottleneck event. Such knowledge is of particular importance for species restoration, given links between genetic diversity and the probability of long-term persistence. To directly evaluate the relationship between current genetic diversity and past demographic events, we collected genome-wide single nucleotide polymorphism data from prebottleneck historical (c. 1906) and postbottleneck contemporary (c. 2014) samples of Pinzón giant tortoises (Chelonoidis duncanensis; n = 25 and 149 individuals, respectively) endemic to a single island in the Galapagos. Pinzón giant tortoises had a historically large population size that was reduced to just 150-200 individuals in the mid-20th century. Since then, Pinzón's tortoise population has recovered through an ex situ head-start programme in which eggs or pre-emergent individuals were collected from natural nests on the island, reared ex situ in captivity until they were 4-5 years old and subsequently repatriated. We found that the extent and distribution of genetic variation in the historical and contemporary samples were very similar, with the latter group not exhibiting the characteristic genetic patterns of recent population decline. No population structure was detected either spatially or temporally. We estimated an effective population size (Ne) of 58 (95% CI = 50-69) for the postbottleneck population; no prebottleneck Ne point estimate was attainable (95% CI = 39-infinity), likely due to the sample size being lower than the true Ne. Overall, the historical sample provided a valuable benchmark for evaluating the head-start captive breeding programme, revealing high retention of genetic variation and no skew in representation despite the documented bottleneck event. Moreover, this work demonstrates the effectiveness of head-starting in rescuing the Pinzón giant tortoise from almost certain extinction.
Affiliation(s)
- Evelyn L. Jensen
- Department of Biology, University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Present address: Department of Biology, Queen's University, Kingston, Ontario, Canada
- Ryan C. Garrick
- Department of Biology, University of Mississippi, Oxford, Mississippi
- Joshua M. Miller
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut
- James P. Gibbs
- College of Environmental Science and Forestry, State University of New York, Syracuse, New York
- Washington Tapia
- Department of Applied Research, Galapagos National Park Service, Puerto Ayora, Ecuador
- Galapagos Conservancy, Santa Cruz, Ecuador
- Adalgisa Caccone
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut
- Michael A. Russello
- Department of Biology, University of British Columbia Okanagan, Kelowna, British Columbia, Canada

16
Campbell EO, Brunet BMT, Dupuis JR, Sperling FAH. Would an RRS by any other name sound as RAD? Methods Ecol Evol 2018. [DOI: 10.1111/2041-210x.13038] [Indexed: 12/25/2022]
Affiliation(s)
- Erin O. Campbell
- Department of Biological Sciences, CW405 Biosciences Centre, University of Alberta, Edmonton, Alberta, Canada
- Bryan M. T. Brunet
- Department of Biological Sciences, CW405 Biosciences Centre, University of Alberta, Edmonton, Alberta, Canada
- Julian R. Dupuis
- Department of Plant and Environmental Protection Sciences, University of Hawai'i at Mānoa, Honolulu, Hawai'i
- Felix A. H. Sperling
- Department of Biological Sciences, CW405 Biosciences Centre, University of Alberta, Edmonton, Alberta, Canada