1
Lajmi A, Glinka F, Privman E. Optimizing ddRAD sequencing for population genomic studies with ddgRADer. Mol Ecol Resour 2023. [PMID: 37732396 DOI: 10.1111/1755-0998.13870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 08/04/2023] [Accepted: 08/31/2023] [Indexed: 09/22/2023]
Abstract
Double-digest Restriction-site Associated DNA sequencing (ddRADseq) is widely used to generate genomic data for non-model organisms in evolutionary and ecological studies. Along with affordable paired-end sequencing, this method makes population genomic analyses more accessible. However, multiple factors should be considered when designing a ddRADseq experiment, which can be challenging for new users. The generated data often suffer from substantial read overlaps and adaptor contamination, severely reducing sequencing efficiency and affecting data quality. Here, we analyse diverse datasets from the literature and carry out controlled experiments to understand the effects of enzyme choice and size selection on sequencing efficiency. The empirical data reveal that size selection is imprecise and has limited efficacy. In certain scenarios, a substantial proportion of short fragments pass below the lower size-selection cut-off resulting in low sequencing efficiency. However, enzyme choice can considerably mitigate inadvertent inclusion of these shorter fragments. A simple model based on these experiments is implemented to predict the number of genomic fragments generated after digestion and size selection, number of SNPs genotyped, number of samples that can be multiplexed and the expected sequencing efficiency. We developed ddgRADer - http://ddgrader.haifa.ac.il/ - a user-friendly webtool and incorporated these calculations to aid in ddRADseq experimental design while optimizing sequencing efficiency. This tool can also be used for single enzyme protocols such as Genotyping-by-Sequencing. Given user-defined study goals, ddgRADer recommends enzyme pairs and allows users to compare and choose enzymes and size-selection criteria. ddgRADer improves the accessibility and ease of designing ddRADseq experiments and increases the probability of success of the first population genomic study conducted in labs with no prior experience in genomics.
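The fragment-count prediction mentioned in the abstract can be illustrated with a minimal in silico double digest. This is only a sketch under stated assumptions, not the model implemented in ddgRADer: the function name `double_digest` is hypothetical, cut positions are approximated by the start of each recognition site, and size selection is treated as a hard window even though the abstract reports that it is imprecise in practice. The recognition sequences used are those of EcoRI (GAATTC) and MseI (TTAA).

```python
import re


def double_digest(seq, site_a, site_b, min_len, max_len):
    """Return (start, end) pairs for fragments flanked by one cut from each enzyme.

    Assumptions (illustrative only): the cut is placed at the start of the
    recognition site, and size selection is an exact [min_len, max_len] window.
    """
    # Collect every occurrence of each recognition site; the lookahead
    # lets overlapping occurrences of the same site be found.
    cuts = [(m.start(), "A") for m in re.finditer(f"(?={re.escape(site_a)})", seq)]
    cuts += [(m.start(), "B") for m in re.finditer(f"(?={re.escape(site_b)})", seq)]
    cuts.sort()

    kept = []
    # Adjacent cuts delimit one fragment each.
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        length = p2 - p1
        # Keep only fragments with different enzymes at the two ends
        # that fall inside the size-selection window.
        if e1 != e2 and min_len <= length <= max_len:
            kept.append((p1, p2))
    return kept


# Toy sequence: EcoRI sites at 0, 40 and 50; one MseI site at 20.
toy = "GAATTC" + "C" * 14 + "TTAA" + "G" * 16 + "GAATTC" + "C" * 4 + "GAATTC"
fragments = double_digest(toy, "GAATTC", "TTAA", min_len=15, max_len=30)
print(len(fragments), fragments)
```

On a real genome, the number of retained fragments would feed into downstream estimates such as expected SNP counts or multiplexing capacity; those steps depend on parameters the abstract does not specify and are not sketched here.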
Affiliation(s)
- Aparna Lajmi
- Department of Evolutionary and Environmental Biology, Institute of Evolution, University of Haifa, Haifa, Israel
- Felix Glinka
- Department of Evolutionary and Environmental Biology, Institute of Evolution, University of Haifa, Haifa, Israel
- Eyal Privman
- Department of Evolutionary and Environmental Biology, Institute of Evolution, University of Haifa, Haifa, Israel
2
Narváez-Barandica JC, Quintero-Galvis JF, Aguirre-Pabón JC, Castro LR, Betancur R, Acero Pizarro A. A Comparative Phylogeography of Three Marine Species with Different PLD Modes Reveals Two Genetic Breaks across the Southern Caribbean Sea. Animals (Basel) 2023; 13:2528. [PMID: 37570336 PMCID: PMC10417521 DOI: 10.3390/ani13152528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 07/02/2023] [Accepted: 07/19/2023] [Indexed: 08/13/2023] Open
Abstract
The comparative phylogeography of marine species with contrasting dispersal potential across the southern Caribbean Sea was evaluated by the presence of two putative barriers: the Magdalena River plume (MRP) and the combination of the absence of a rocky bottom and the almost permanent upwelling in the La Guajira Peninsula (ARB + PUG). Three species with varying biological and ecological characteristics (i.e., dispersal potentials) that inhabit shallow rocky bottoms were selected: Cittarium pica (PLD < 6 days), Acanthemblemaria rivasi (PLD < 22 days), and Nerita tessellata (PLD > 60 days). We generated a set of SNPs for the three species using the ddRad-seq technique. Samples of each species were collected in five locations from Capurganá to La Guajira. For the first time, evidence of a phylogeographic break caused by the MRP is provided, mainly for A. rivasi (AMOVA: ΦCT = 0.420). The ARB + PUG barrier causes another break for A. rivasi (ΦCT = 0.406) and C. pica (ΦCT = 0.224). Three populations (K = 3) were identified for A. rivasi and C. pica, while N. tessellata presented one population (K = 1). The Mantel correlogram indicated that A. rivasi and C. pica fit the hierarchical population model, and only the A. rivasi and C. pica comparisons showed phylogeographic congruence. Our results demonstrate how the biological traits of these three species and the biogeographic barriers have influenced their phylogeographic structure.
Affiliation(s)
- Juan Carlos Narváez-Barandica
- Centro de Genética y Biología Molecular, Universidad del Magdalena, Carrera 32 No 22–08, Santa Marta 470004, Colombia
- Julián F. Quintero-Galvis
- Instituto de Ciencias Ambientales y Evolutivas, Facultad de Ciencias, Universidad Austral de Chile, Valdivia 5110566, Chile
- Juan Carlos Aguirre-Pabón
- Centro de Genética y Biología Molecular, Universidad del Magdalena, Carrera 32 No 22–08, Santa Marta 470004, Colombia
- Lyda R. Castro
- Centro de Genética y Biología Molecular, Universidad del Magdalena, Carrera 32 No 22–08, Santa Marta 470004, Colombia
- Ricardo Betancur
- Biology Department, University of Oklahoma, Norman, OK 73019, USA
- Arturo Acero Pizarro
- Instituto de Estudios en Ciencias del Mar (CECIMAR), Universidad Nacional de Colombia sede Caribe, Santa Marta 470006, Colombia
3
Gregorio Martínez J, David Rangel-Medrano J, Johanna Yepes-Acevedo A, Restrepo-Escobar N, Judith Márquez E. Species limits and introgression in Pimelodus from the Magdalena-Cauca River basin. Mol Phylogenet Evol 2022; 173:107517. [DOI: 10.1016/j.ympev.2022.107517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 03/20/2022] [Accepted: 04/05/2022] [Indexed: 11/26/2022]
4
Parra-Salazar A, Gomez J, Lozano-Arce D, Reyes-Herrera PH, Duitama J. Robust and efficient software for reference-free genomic diversity analysis of genotyping-by-sequencing data on diploid and polyploid species. Mol Ecol Resour 2021; 22:439-454. [PMID: 34288487 DOI: 10.1111/1755-0998.13477] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/05/2021] [Revised: 07/08/2021] [Accepted: 07/13/2021] [Indexed: 12/14/2022]
Abstract
Genotyping-by-sequencing (GBS) is a widely used and cost-effective technique for obtaining large numbers of genetic markers from populations by sequencing regions adjacent to restriction cut sites. Although a standard reference-based pipeline can be followed to analyse GBS reads, a reference genome is still not available for a large number of species. Hence, reference-free approaches are required to generate the genetic variability information that can be obtained from a GBS experiment. Unfortunately, available tools to perform de novo analysis of GBS reads face issues of usability, accuracy and performance. Furthermore, few available tools are suitable for analysing data sets from polyploid species. In this manuscript, we describe a novel algorithm to perform reference-free variant detection and genotyping from GBS reads. Nonexact searches on a dynamic hash table of consensus sequences allow for efficient read clustering and sorting. This algorithm was integrated in the Next Generation Sequencing Experience Platform (NGSEP) to integrate the state-of-the-art variant detector already implemented in this tool. We performed benchmark experiments with three different empirical data sets of plants and animals with different population structures and ploidies, and sequenced with different GBS protocols at different read depths. These experiments show that NGSEP has comparable and in some cases better accuracy and always better computational efficiency compared to existing solutions. We expect that this new development will be useful for many research groups conducting population genetic studies in a wide variety of species.
Affiliation(s)
- Andrea Parra-Salazar
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
- Jorge Gomez
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
- Daniela Lozano-Arce
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
- Jorge Duitama
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
5
Hohenlohe PA, Funk WC, Rajora OP. Population genomics for wildlife conservation and management. Mol Ecol 2020; 30:62-82. [PMID: 33145846 PMCID: PMC7894518 DOI: 10.1111/mec.15720] [Citation(s) in RCA: 162] [Impact Index Per Article: 40.5] [Received: 03/20/2020] [Revised: 10/02/2020] [Accepted: 10/29/2020] [Indexed: 12/21/2022]
Abstract
Biodiversity is under threat worldwide. Over the past decade, the field of population genomics has developed across nonmodel organisms, and the results of this research have begun to be applied in conservation and management of wildlife species. Genomics tools can provide precise estimates of basic features of wildlife populations, such as effective population size, inbreeding, demographic history and population structure, that are critical for conservation efforts. Moreover, population genomics studies can identify particular genetic loci and variants responsible for inbreeding depression or adaptation to changing environments, allowing for conservation efforts to estimate the capacity of populations to evolve and adapt in response to environmental change and to manage for adaptive variation. While connections from basic research to applied wildlife conservation have been slow to develop, these connections are increasingly strengthening. Here we review the primary areas in which population genomics approaches can be applied to wildlife conservation and management, highlight examples of how they have been used, and provide recommendations for building on the progress that has been made in this field.
Affiliation(s)
- Paul A Hohenlohe
- Department of Biological Sciences and Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, Idaho, USA
- W Chris Funk
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, Colorado, USA
- Om P Rajora
- Faculty of Forestry and Environmental Management, University of New Brunswick, Fredericton, New Brunswick, Canada
6
LaCava MEF, Aikens EO, Megna LC, Randolph G, Hubbard C, Buerkle CA. Accuracy of de novo assembly of DNA sequences from double-digest libraries varies substantially among software. Mol Ecol Resour 2019; 20:360-370. [PMID: 31665547 DOI: 10.1111/1755-0998.13108] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/22/2019] [Revised: 10/21/2019] [Accepted: 10/23/2019] [Indexed: 11/29/2022]
Abstract
Advances in DNA sequencing have made it feasible to gather genomic data for non-model organisms and large sets of individuals, often using methods for sequencing subsets of the genome. Several of these methods sequence DNA associated with endonuclease restriction sites (various RAD and GBS methods). For use in taxa without a reference genome, these methods rely on de novo assembly of fragments in the sequencing library. Many of the software options available for this application were originally developed for other assembly types and we do not know their accuracy for reduced representation libraries. To address this important knowledge gap, we simulated data from the Arabidopsis thaliana and Homo sapiens genomes and compared de novo assemblies by six software programs that are commonly used or promising for this purpose (ABySS, CD-HIT, Stacks, Stacks2, Velvet and VSEARCH). We simulated different mutation rates and types of mutations, and then applied the six assemblers to the simulated data sets, varying assembly parameters. We found substantial variation in software performance across simulations and parameter settings. ABySS failed to recover any true genome fragments, and Velvet and VSEARCH performed poorly for most simulations. Stacks and Stacks2 produced accurate assemblies of simulations containing SNPs, but the addition of insertion and deletion mutations decreased their performance. CD-HIT was the only assembler that consistently recovered a high proportion of true genome fragments. Here, we demonstrate the substantial difference in the accuracy of assemblies from different software programs and the importance of comparing assemblies that result from different parameter settings.
Affiliation(s)
- Melanie E F LaCava
- Program in Ecology, University of Wyoming, Laramie, WY, USA; Wildlife Genomics and Disease Ecology Laboratory, Department of Veterinary Sciences, University of Wyoming, Laramie, WY, USA
- Ellen O Aikens
- Program in Ecology, University of Wyoming, Laramie, WY, USA; Wyoming Cooperative Fish and Wildlife Research Unit, Department of Zoology and Physiology, University of Wyoming, Laramie, WY, USA
- Libby C Megna
- Program in Ecology, University of Wyoming, Laramie, WY, USA; Department of Zoology and Physiology, University of Wyoming, Laramie, WY, USA
- Gregg Randolph
- Genome Technologies Lab, University of Wyoming, Laramie, WY, USA
- Charley Hubbard
- Program in Ecology, University of Wyoming, Laramie, WY, USA; Department of Botany, University of Wyoming, Laramie, WY, USA
- C Alex Buerkle
- Program in Ecology, University of Wyoming, Laramie, WY, USA; Department of Botany, University of Wyoming, Laramie, WY, USA
7
Díaz-Arce N, Rodríguez-Ezpeleta N. Selecting RAD-Seq Data Analysis Parameters for Population Genetics: The More the Better? Front Genet 2019; 10:533. [PMID: 31191624 PMCID: PMC6549478 DOI: 10.3389/fgene.2019.00533] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 01/08/2019] [Accepted: 05/16/2019] [Indexed: 11/25/2022] Open
Abstract
Restriction site-associated DNA sequencing (RAD-seq) has become a powerful and widely used tool in molecular ecology studies as it allows to cost-effectively recover thousands of polymorphic sites across individuals of non-model organisms. However, its successful implementation in population genetics relies on correct data processing that would minimize potential loci-assembly biases and consequent genotyping error rates. RAD-seq data processing when no reference genome is available involves the assembly of hundreds of thousands high-throughput sequencing reads into orthologous loci, for which various key parameter values need to be selected by the researcher. Previous studies exploring the effect of these parameter values found or assumed that a larger number of recovered polymorphic loci is associated with a better assembly. Here, using three RAD-seq datasets from different species, we explore the effect of read filtering, loci assembly and polymorphic site selection on number of markers obtained and genetic differentiation inferred using the Stacks software. We find (i) that recovery of higher numbers of polymorphic loci is not necessarily associated with higher genetic differentiation, (ii) that the presence of PCR duplicates, selected loci assembly parameters and selected SNP filtering parameters affect the number of recovered polymorphic loci and degree of genetic differentiation, and (iii) that this effect is different in each dataset, meaning that defining a systematic universal protocol for RAD-seq data analysis may lead to missing relevant information about population differentiation.