1
Lorenzana GP, Figueiró HV, Coutinho LL, Villela PMS, Eizirik E. Comparative assessment of genotyping-by-sequencing and whole-exome sequencing for estimating genetic diversity and geographic structure in small sample sizes: insights from wild jaguar populations. Genetica 2024; 152:133-144. [PMID: 39322785 DOI: 10.1007/s10709-024-00212-5] [Received: 06/06/2024] [Accepted: 09/12/2024] [Indexed: 09/27/2024]
Abstract
Biologists currently have an assortment of high-throughput sequencing techniques allowing the study of population dynamics in increasing detail. The utility of genetic estimates depends on their ability to recover meaningful approximations while filtering out noise produced by artifacts. In this study, we empirically compared the congruence of two reduced representation approaches (genotyping-by-sequencing, GBS, and whole-exome sequencing, WES) in estimating genetic diversity and population structure using SNP markers typed in a small number of wild jaguar (Panthera onca) samples from South America. Due to its targeted nature, WES allowed for a more straightforward reconstruction of loci compared to GBS, facilitating the identification of true polymorphisms across individuals. We therefore used WES-derived metrics as a benchmark against which GBS-derived indicators were compared, adjusting parameters for locus assembly and SNP filtering in the latter. We observed significant variation in SNP call rates across samples in GBS datasets, leading to a recurrent miscalling of heterozygous sites. This issue was further amplified by small sample sizes, ultimately impacting the consistency of summary statistics between genotyping methods. Recognizing that the genetic markers obtained from GBS and WES are intrinsically different due to varying evolutionary pressures, particularly selection, we consider that our empirical comparison offers valuable insights and highlights critical considerations for estimating population genetic attributes using reduced representation datasets. Our results emphasize the critical need for careful evaluation of missing data and stringent filtering to achieve reliable estimates of genetic diversity and differentiation in elusive wildlife species.
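The abstract's point about uneven per-sample call rates and the need to evaluate missing data can be illustrated with a small filtering sketch. It assumes a genotype matrix coded as 0/1/2 copies of the alternate allele, with `None` for a missing call, and uses hypothetical 80% call-rate thresholds per sample and per locus. The function names are illustrative; this is not the authors' pipeline.

```python
from typing import List, Optional

# Genotypes coded as 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing calls in a list of genotypes."""
    return sum(g is not None for g in calls) / len(calls)


def filter_matrix(matrix: List[List[Genotype]],
                  min_sample_rate: float = 0.8,
                  min_locus_rate: float = 0.8) -> List[List[Genotype]]:
    """Drop low-call-rate samples (rows) first, then poorly typed loci (columns).

    The 0.8 defaults are hypothetical, not values taken from the study.
    """
    kept = [row for row in matrix if call_rate(row) >= min_sample_rate]
    if not kept:
        return []
    good_loci = [j for j in range(len(kept[0]))
                 if call_rate([row[j] for row in kept]) >= min_locus_rate]
    return [[row[j] for j in good_loci] for row in kept]


def observed_heterozygosity(row: List[Genotype]) -> float:
    """Share of called sites in one individual that are heterozygous."""
    called = [g for g in row if g is not None]
    if not called:
        return float("nan")
    return sum(g == 1 for g in called) / len(called)


if __name__ == "__main__":
    example = [
        [0, 1, 2, 1, 0],
        [1, 1, None, 0, 0],
        [None, None, None, 1, None],  # sample with a 20% call rate
    ]
    filtered = filter_matrix(example)
    print(filtered)  # third sample and third locus are removed
    print([observed_heterozygosity(row) for row in filtered])
```

Because samples are filtered before loci, locus call rates are computed only over the retained individuals. With small sample sizes, a single poorly sequenced individual can otherwise push many loci below the locus threshold.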
Affiliation(s)
- Gustavo P Lorenzana
- Laboratório de Biologia Genômica e Molecular, Escola de Ciências da Saúde e da Vida, PUCRS, Porto Alegre, Brazil.
- School of Forestry, Northern Arizona University, Flagstaff, AZ, USA.
- Henrique V Figueiró
- Laboratório de Biologia Genômica e Molecular, Escola de Ciências da Saúde e da Vida, PUCRS, Porto Alegre, Brazil
- Environmental Genomics Group, Vale Institute of Technology, Belém, Brazil
- Priscilla M S Villela
- Centro de Genômica Funcional, ESALQ-USP, Piracicaba, Brazil
- EcoMol Consultoria e Projetos, Piracicaba, Brazil
- Eduardo Eizirik
- Laboratório de Biologia Genômica e Molecular, Escola de Ciências da Saúde e da Vida, PUCRS, Porto Alegre, Brazil
- Instituto Pró-Carnívoros, Atibaia, Brazil
2
Mussmann SM. Assembly and annotation of a chromosome-level reference genome for the endangered Colorado pikeminnow (Ptychocheilus lucius). G3 (Bethesda) 2024; 14:jkae217. [PMID: 39268723 PMCID: PMC11540322 DOI: 10.1093/g3journal/jkae217] [Received: 07/29/2024] [Accepted: 09/02/2024] [Indexed: 09/15/2024]
Abstract
Advancements in genome sequencing technology have brought unprecedented accessibility of high-throughput sequencing to species of conservation interest. The potential knowledge gained from application of these techniques is maximized by availability of high-quality, annotated reference genomes for endangered species. However, these vital resources are often lacking for endangered minnows of North America (Cypriniformes: Leuciscidae). One such endangered species, Colorado pikeminnow (Ptychocheilus lucius), is the largest North American minnow and the top-level native aquatic predator in the Colorado River Basin of the southwestern United States and northwestern Mexico. Over the past century, Colorado pikeminnow has suffered habitat loss and population declines due to anthropogenic habitat modifications and invasive species introductions. The lack of genetic resources for Colorado pikeminnow has hindered conservation genomic study of this unique organism. This study seeks to remedy this issue by presenting a high-quality reference genome for Colorado pikeminnow developed from Pacific Biosciences HiFi sequencing and Hi-C scaffolding. The final assembly was a 1.1 Gb genome composed of 305 contigs, including 25 chromosome-sized scaffolds. Measures of quality, contiguity, and completeness met or exceeded those observed for Danio rerio (Danionidae) and two other Colorado River Basin leuciscids (Meda fulgida and Tiaroga cobitis). Comparative genomic analyses identified enrichment of gene families for growth, development, immune activity, and gene transcription, all of which are important for a large-bodied piscivorous fish living in a dynamic environment. This reference genome will provide a basis for important conservation genomic study of Colorado pikeminnow and help efforts to better understand the evolution of desert fishes.
Affiliation(s)
- Steven M Mussmann
- Southwestern Native Aquatic Resources and Recovery Center, U.S. Fish and Wildlife Service, 7116 Hatchery Road, Dexter, NM 88230, USA
3
Aguirre NC, Villalba PV, García MN, Filippi CV, Rivas JG, Martínez MC, Acuña CV, López AJ, López JA, Pathauer P, Palazzini D, Harrand L, Oberschelp J, Marcó MA, Cisneros EF, Carreras R, Martins Alves AM, Rodrigues JC, Hopp HE, Grattapaglia D, Cappa EP, Paniego NB, Marcucci Poltri SN. Comparison of ddRADseq and EUChip60K SNP genotyping systems for population genetics and genomic selection in Eucalyptus dunnii (Maiden). Front Genet 2024; 15:1361418. [PMID: 38606359 PMCID: PMC11008695 DOI: 10.3389/fgene.2024.1361418] [Received: 12/26/2023] [Accepted: 02/19/2024] [Indexed: 04/13/2024]
Abstract
Eucalyptus dunnii is one of the most important Eucalyptus species for short-fiber pulp production in regions where other species of the genus are affected by poor soil and climatic conditions. In this context, E. dunnii holds promise as a resource to address and adapt to the challenges of climate change. Despite its rapid growth and favorable wood properties for solid wood products, its genetic improvement is still at an early stage. In this work, we evaluated the performance of two single nucleotide polymorphism (SNP) genotyping methods for population genetics analysis and genomic selection (GS) in E. dunnii. Double digest restriction-site associated DNA sequencing (ddRADseq) was compared with the EUChip60K array in 308 individuals from a provenance-progeny trial. The compared SNP sets included 8,011 (ddRADseq) and 19,008 (EUChip60K) informative SNPs distributed along the 11 chromosomes. Although the two datasets differed in the percentage of missing data, genome coverage, minor allele frequency and estimated genetic diversity parameters, they revealed a similar genetic structure, showing two subpopulations with little differentiation between them, and low linkage disequilibrium. GS analyses were performed for eleven traits using Genomic Best Linear Unbiased Prediction (GBLUP) and a conventional pedigree-based model (ABLUP). Regardless of the SNP dataset, the predictive ability (PA) of GBLUP was better than that of ABLUP for six traits (cellulose content, total and ethanolic extractives, total and Klason lignin content, and syringyl and guaiacyl lignin monomer ratio). When the SNP datasets used to estimate PAs were contrasted, the GBLUP-EUChip60K model gave higher and significant PA values for six traits, whereas ddRADseq gave higher values for three other traits. The PAs correlated positively with narrow-sense heritabilities, with the highest correlations shown by the ABLUP and GBLUP-EUChip60K models.
The two genotyping methods, ddRADseq and EUChip60K, are generally comparable for population genetics and genomic prediction, demonstrating the utility of the former when subjected to rigorous SNP filtering. The results of this study provide a basis for future whole-genome studies using ddRADseq in non-model forest species for which SNP arrays have not yet been developed.
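GBLUP replaces the pedigree relationship matrix used by ABLUP with a genomic relationship matrix built from SNP genotypes. The sketch below computes that matrix with VanRaden's method 1 and computes predictive ability as the Pearson correlation between predicted and observed values, a common definition that is assumed here. It does not fit the mixed model itself (variance components and breeding-value prediction are omitted), and it is not the authors' code.

```python
from math import sqrt
from typing import List, Sequence


def vanraden_g(m: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix, VanRaden method 1.

    m: individuals x SNPs, genotypes coded 0/1/2 with no missing values.
    Assumes at least one polymorphic SNP (otherwise the scale is zero).
    """
    n, p = len(m), len(m[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in m) / (2 * n) for j in range(p)]
    # Centre genotypes: Z = M - 2p.
    z = [[row[j] - 2 * freqs[j] for j in range(p)] for row in m]
    # Scale so that G is comparable to a pedigree relationship matrix.
    scale = 2 * sum(f * (1 - f) for f in freqs)
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, used here as the predictive ability (PA)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    print(vanraden_g([[0, 2], [2, 0]]))   # [[2.0, -2.0], [-2.0, 2.0]]
    print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```

Missing calls would have to be imputed or mean-filled before building G, which is one route by which the missing-data differences noted above can reach the predictive abilities.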
Affiliation(s)
- Martín Nahuel García
- Instituto de Agrobiotecnología y Biología Molecular, UEDD INTA-CONICET, Hurlingham, Argentina
- Carla Valeria Filippi
- Instituto de Agrobiotecnología y Biología Molecular, UEDD INTA-CONICET, Hurlingham, Argentina
- Laboratorio de Bioquímica, Departamento de Biología Vegetal, Facultad de Agronomía, Universidad de la República, Montevideo, Uruguay
- Juan Gabriel Rivas
- Instituto de Agrobiotecnología y Biología Molecular, UEDD INTA-CONICET, Hurlingham, Argentina
- María Carolina Martínez
- Instituto de Agrobiotecnología y Biología Molecular, UEDD INTA-CONICET, Hurlingham, Argentina
- Cintia Vanesa Acuña
- Instituto de Agrobiotecnología y Biología Molecular, UEDD INTA-CONICET, Hurlingham, Argentina
- Augusto J. López
- Estación Experimental Agropecuaria de Bella Vista, Instituto Nacional de Tecnología Agropecuaria, Bella Vista, Argentina
- Juan Adolfo López
- Estación Experimental Agropecuaria de Bella Vista, Instituto Nacional de Tecnología Agropecuaria, Bella Vista, Argentina
- Pablo Pathauer
- Instituto de Recursos Biológicos, Instituto Nacional de Tecnología Agropecuaria, Hurlingham, Argentina
- Dino Palazzini
- Instituto de Recursos Biológicos, Instituto Nacional de Tecnología Agropecuaria, Hurlingham, Argentina
- Leonel Harrand
- Estación Experimental Agropecuaria de Concordia, Instituto Nacional de Tecnología Agropecuaria, Concordia, Argentina
- Javier Oberschelp
- Estación Experimental Agropecuaria de Concordia, Instituto Nacional de Tecnología Agropecuaria, Concordia, Argentina
- Martín Alberto Marcó
- Estación Experimental Agropecuaria de Concordia, Instituto Nacional de Tecnología Agropecuaria, Concordia, Argentina
- Esteban Felipe Cisneros
- Facultad de Ciencias Forestales, Universidad Nacional de Santiago del Estero (UNSE), Santiago del Estero, Argentina
- Rocío Carreras
- Facultad de Ciencias Forestales, Universidad Nacional de Santiago del Estero (UNSE), Santiago del Estero, Argentina
- Ana Maria Martins Alves
- Centro de Estudos Florestais e Laboratório Associado TERRA, Instituto Superior de Agronomia, Universidade de Lisboa, Tapada da Ajuda, Lisboa, Portugal
- José Carlos Rodrigues
- Centro de Estudos Florestais e Laboratório Associado TERRA, Instituto Superior de Agronomia, Universidade de Lisboa, Tapada da Ajuda, Lisboa, Portugal
- H. Esteban Hopp
- Instituto de Agrobiotecnología y Biología Molecular, UEDD INTA-CONICET, Hurlingham, Argentina
- Dario Grattapaglia
- Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA), Recursos Genéticos e Biotecnologia, Brasilia, Brazil
- Eduardo Pablo Cappa
- Instituto de Recursos Biológicos, Instituto Nacional de Tecnología Agropecuaria, Hurlingham, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Norma Beatriz Paniego
- Instituto de Agrobiotecnología y Biología Molecular, UEDD INTA-CONICET, Hurlingham, Argentina
4
San Jose M, Doorenweerd C, Geib S, Barr N, Dupuis JR, Leblanc L, Kauwe A, Morris KY, Rubinoff D. Interspecific gene flow obscures phylogenetic relationships in an important insect pest species complex. Mol Phylogenet Evol 2023; 188:107892. [PMID: 37524217 DOI: 10.1016/j.ympev.2023.107892] [Received: 05/02/2023] [Revised: 07/07/2023] [Accepted: 07/28/2023] [Indexed: 08/02/2023]
Abstract
As genomic data proliferate, the prevalence of post-speciation gene flow is making species boundaries and relationships increasingly ambiguous. Although current approaches that infer fully bifurcating phylogenies from concatenated datasets provide simple and robust answers to many species relationships, they may be inaccurate because the models ignore interspecific gene flow and incomplete lineage sorting. To examine the potential error resulting from ignoring gene flow, we generated both a RAD-seq dataset and a highly multiplexed amplicon (HiMAP) dataset of 500 protein-coding loci for a monophyletic group of 12 species defined as the Bactrocera dorsalis sensu lato clade. Because the clade includes some of the world's worst agricultural pests, the taxonomy of the B. dorsalis s.l. clade is important for trade and quarantines. However, taxonomic resolution is confounded by intra- and interspecific phenotypic variation and convergence, mitochondrial introgression across half of the species, and viable hybrids. We compared the topological convergence of our datasets using concatenated phylogenetic and various multispecies coalescent approaches, some of which account for gene flow. All analyses agreed on species delimitation, but there was incongruence between species relationships. Under concatenation, both datasets suggest identical species relationships with mostly high statistical support. However, multispecies coalescent and multispecies network approaches suggest markedly different hypotheses and detected significant gene flow. We suggest that the network approaches are likely more accurate because gene flow violates the assumptions of the concatenated phylogenetic analyses, but the data-reductive requirements of network approaches resulted in reduced statistical support and could not unambiguously resolve gene flow directions.
Our study highlights the importance of testing for gene flow, particularly with phylogenomic datasets, even when concatenated approaches receive high statistical support.
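One compact way to test for gene flow is Patterson's D (the ABBA-BABA test). It is not one of the approaches named in this abstract and is swapped in here only as a small illustration. The sketch assumes biallelic sites scored in four taxa ordered (P1, P2, P3, outgroup), with the outgroup allele taken as ancestral; a real analysis would also need a block jackknife to judge significance, which is not implemented.

```python
from typing import Iterable, Tuple

# Alleles observed at one site in (P1, P2, P3, outgroup); the outgroup allele is
# treated as ancestral ("A") and any other allele as derived ("B").
Site = Tuple[str, str, str, str]


def d_statistic(sites: Iterable[Site]) -> float:
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA)."""
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if len({p1, p2, p3, out}) != 2:
            continue  # skip invariant and multiallelic sites
        if p1 == out and p2 == p3 != out:
            abba += 1  # P2 and P3 share the derived allele
        elif p2 == out and p1 == p3 != out:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    sites = [("A", "C", "C", "A"), ("A", "C", "C", "A"),
             ("C", "A", "C", "A"), ("A", "A", "A", "A")]
    print(d_statistic(sites))  # two ABBA, one BABA: (2 - 1) / 3
```

Under incomplete lineage sorting alone, ABBA and BABA patterns are expected in equal numbers, so D near zero is the null expectation; a consistent excess of one pattern points to allele sharing consistent with gene flow.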
Affiliation(s)
- Michael San Jose
- University of Hawaii, College of Tropical Agriculture and Human Resources, Department of Plant and Environmental Protection Sciences, Entomology Section, 3050 Maile Way, Honolulu, HI, 96822-2231, USA.
- Camiel Doorenweerd
- University of Hawaii, College of Tropical Agriculture and Human Resources, Department of Plant and Environmental Protection Sciences, Entomology Section, 3050 Maile Way, Honolulu, HI, 96822-2231, USA
- Scott Geib
- Tropical Crop and Commodity Protection Research Unit, Daniel K Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, HI, USA
- Norman Barr
- United States Department of Agriculture, Animal and Plant Health Inspection Service, Plant Protection and Quarantine, Science & Technology, Insect Management and Molecular Diagnostics Laboratory, 22675 N. Moorefield Road, Edinburg, TX 78541, USA
- Julian R Dupuis
- University of Kentucky, Department of Entomology, S225 Ag Science Center North, 1100 South Limestone, Lexington, KY, 40546-0091, USA
- Luc Leblanc
- University of Idaho, Department of Entomology, Plant Pathology and Nematology, 875 Perimeter Drive, MS2329, Moscow, ID, 83844-2329, USA
- Angela Kauwe
- Tropical Crop and Commodity Protection Research Unit, Daniel K Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, HI, USA
- Kimberley Y Morris
- University of Hawaii, College of Tropical Agriculture and Human Resources, Department of Plant and Environmental Protection Sciences, Entomology Section, 3050 Maile Way, Honolulu, HI, 96822-2231, USA; Tropical Crop and Commodity Protection Research Unit, Daniel K Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, HI, USA
- Daniel Rubinoff
- University of Hawaii, College of Tropical Agriculture and Human Resources, Department of Plant and Environmental Protection Sciences, Entomology Section, 3050 Maile Way, Honolulu, HI, 96822-2231, USA
5
Aguirre NC, Filippi CV, Vera PA, Puebla AF, Zaina G, Lia VV, Marcucci Poltri SN, Paniego NB. Double Digest Restriction-Site Associated DNA Sequencing (ddRADseq) Technology. Methods Mol Biol 2023; 2638:37-57. [PMID: 36781634 DOI: 10.1007/978-1-0716-3024-2_4] [Indexed: 02/15/2023]
Abstract
Double digest restriction-site associated DNA sequencing (ddRADseq) technology combines reduced representation of the genome, obtained by digestion with two restriction enzymes, with next generation sequencing (NGS) to obtain thousands of markers (SNPs, SSRs, and InDels) and genotype tens to hundreds of samples simultaneously. In this chapter, we describe a derived 96-plex ddRADseq protocol that can be set up to obtain different depths of coverage per locus and can be applied to model and non-model plant species.
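The reduction step in ddRADseq keeps fragments that carry one enzyme's cut at one end and the other enzyme's cut at the other end, and that fall inside a size-selection window. A minimal in-silico sketch follows. The PstI/MspI pair and the size window are hypothetical choices (the chapter abstract does not name the enzymes), only the top strand is scanned (both sites are palindromic), and overhangs are ignored, so fragment lengths are approximate.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical enzyme pair: recognition site and cut offset on the top strand.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "PstI": ("CTGCAG", 5),  # CTGCA^G
    "MspI": ("CCGG", 1),    # C^CGG
}


def double_digest(seq: str, min_len: int, max_len: int) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, left_enzyme, right_enzyme) for size-selected fragments
    whose two ends were cut by different enzymes."""
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        # Zero-width lookahead so overlapping sites are all reported.
        for match in re.finditer(f"(?={site})", seq.upper()):
            cuts.append((match.start() + offset, name))
    cuts.sort()
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end, left, right))
    return fragments


if __name__ == "__main__":
    seq = "AAAA" + "CTGCAG" + "T" * 20 + "CCGG" + "AAAA" + "CCGG" + "TT"
    print(double_digest(seq, min_len=10, max_len=100))
    # [(9, 31, 'PstI', 'MspI')]: the short MspI-MspI fragment is discarded
```

Run over a reference or draft genome, counting the returned fragments for different enzyme pairs and windows gives a rough expected locus number, which in turn sets how many samples can share a sequencing run at a given depth.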
Affiliation(s)
- Natalia Cristina Aguirre
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), Unidad Ejecutora de Doble Dependencia Instituto Nacional de Tecnología Agropecuaria (INTA) - Consejo Nacional de Ciencia y Técnica (CONICET), Hurlingham, Argentina.
- Carla Valeria Filippi
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), Unidad Ejecutora de Doble Dependencia Instituto Nacional de Tecnología Agropecuaria (INTA) - Consejo Nacional de Ciencia y Técnica (CONICET), Hurlingham, Argentina
- Laboratorio de Bioquímica, Departamento de Biología Vegetal, Facultad de Agronomía, Universidad de la República, Montevideo, Uruguay
- Pablo Alfredo Vera
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), Unidad Ejecutora de Doble Dependencia Instituto Nacional de Tecnología Agropecuaria (INTA) - Consejo Nacional de Ciencia y Técnica (CONICET), Hurlingham, Argentina
- Andrea Fabiana Puebla
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), Unidad Ejecutora de Doble Dependencia Instituto Nacional de Tecnología Agropecuaria (INTA) - Consejo Nacional de Ciencia y Técnica (CONICET), Hurlingham, Argentina
- Giusi Zaina
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Udine, Italy
- Verónica Viviana Lia
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), Unidad Ejecutora de Doble Dependencia Instituto Nacional de Tecnología Agropecuaria (INTA) - Consejo Nacional de Ciencia y Técnica (CONICET), Hurlingham, Argentina
- Susana Noemí Marcucci Poltri
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), Unidad Ejecutora de Doble Dependencia Instituto Nacional de Tecnología Agropecuaria (INTA) - Consejo Nacional de Ciencia y Técnica (CONICET), Hurlingham, Argentina
- Norma Beatriz Paniego
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), Unidad Ejecutora de Doble Dependencia Instituto Nacional de Tecnología Agropecuaria (INTA) - Consejo Nacional de Ciencia y Técnica (CONICET), Hurlingham, Argentina
6
Butler BO, Smith LL, Flores-Villela O. Phylogeography and taxonomy of Coleonyx elegans Gray 1845 (Squamata: Eublepharidae) in Mesoamerica: The Isthmus of Tehuantepec as an environmental barrier. Mol Phylogenet Evol 2023; 178:107632. [PMID: 36182052 DOI: 10.1016/j.ympev.2022.107632] [Received: 05/07/2022] [Revised: 08/05/2022] [Accepted: 09/23/2022] [Indexed: 12/14/2022]
Abstract
Population divergence leading to speciation is often explained by physical barriers causing allopatric distributions of historically connected populations. Environmental barriers have increasingly been shown to cause population divergence through local adaptation to distinct ecological characteristics. In this study, we evaluate population structuring and phylogeographic history within the Yucatán banded gecko Coleonyx elegans Gray 1845 to assess the role of both physical and environmental barriers in shaping the spatio-genetic distribution of a Mesoamerican tropical forest taxon. We generated RADseq and multi-locus Sanger datasets that included sampling across the entire species' range. The results support two distinct evolutionary lineages that diverged during the late Pliocene and show recent population expansions. Furthermore, these genetic lineages largely align with subspecies boundaries defined by morphology. Several mountain ranges identified as phylogeographic barriers in other taxa act as physical barriers to gene flow between the two clades. Despite the absence of a physical barrier between lineages across the lowland Isthmus of Tehuantepec, no introgression was observed. Here, a steep environmental cline associated with seasonality of precipitation corresponds exactly with the distributional limits of the lineages, whose closest samples are only 30 km apart. The combination of molecular and environmental evidence, together with previous morphological evidence, allows us to reassess the current taxonomy in an integrative framework. Based on our findings, we elevate the previously recognized subspecies from the Pacific versant, the Colima banded gecko C. nemoralis Klauber 1945, to full species status and comment on conservation implications.
Affiliation(s)
- Brett O Butler
- Museo de Zoología "Alfonso L. Herrera", Facultad de Ciencias, Universidad Nacional Autónoma de México, Av. Ciudad Universitaria 3000, C. P. 04510 Coyoacán, CDMX, Mexico; Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Av. Ciudad Universitaria 3000, C. P. 04510 Coyoacán, CDMX, Mexico.
- Lydia L Smith
- Museum of Vertebrate Zoology, 3101 Valley Life Sciences Building, University of California, Berkeley, CA 94720, USA
- Oscar Flores-Villela
- Museo de Zoología "Alfonso L. Herrera", Facultad de Ciencias, Universidad Nacional Autónoma de México, Av. Ciudad Universitaria 3000, C. P. 04510 Coyoacán, CDMX, Mexico
7
Vernygora OV, Campbell EO, Grishin NV, Sperling FA, Dupuis JR. Gauging ages of tiger swallowtail butterflies using alternate SNP analyses. Mol Phylogenet Evol 2022; 171:107465. [DOI: 10.1016/j.ympev.2022.107465] [Received: 01/24/2022] [Revised: 02/26/2022] [Accepted: 03/15/2022] [Indexed: 10/18/2022]
8
Abstract
Restriction enzymes have been one of the primary tools in the population genetics toolkit for 50 years, being coupled with each new generation of technology to provide a more detailed view into the genetics of natural populations. Restriction site-associated DNA (RAD) protocols, which joined enzymes with short-read sequencing technology, have democratized the field of population genomics, providing a means to assay the underlying alleles in scores of populations. More than 10 years on, the technique has been widely applied across the tree of life and has served as the basis for many different analysis techniques. Here, we provide a detailed protocol to conduct a RAD analysis from experimental design to de novo analysis (including parameter optimization), as well as reference-based analysis, all in Stacks version 2, which is designed to work with paired-end reads to assemble RAD loci up to 1000 nucleotides in length. The protocol focuses on major points of friction in the molecular approaches and downstream analysis, with special attention given to validating experimental analyses. Finally, the protocol provides several points of departure for further analysis.
Affiliation(s)
- Angel G Rivera-Colón
- Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Julian Catchen
- Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
9
Sinn BT, Simon SJ, Santee MV, DiFazio SP, Fama NM, Barrett CF. ISSRseq: An extensible method for reduced representation sequencing. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13784] [Indexed: 12/28/2022]
Affiliation(s)
- Brandon T. Sinn
- Department of Biology and Earth Science, Otterbein University, Westerville, OH, USA
- Department of Biology, West Virginia University, Morgantown, WV, USA
- Sandra J. Simon
- Department of Biology, West Virginia University, Morgantown, WV, USA
- Institute for Sustainability, Energy, and Environment (ISEE), University of Illinois at Urbana‐Champaign, Urbana, IL, USA
- Department of Biology, West Virginia University Institute of Technology, Beckley, WV, USA
- Nicole M. Fama
- Department of Biology, West Virginia University, Morgantown, WV, USA
- Genetic Immunotherapy Section, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Craig F. Barrett
- Department of Biology, West Virginia University, Morgantown, WV, USA
10
Christiansen H, Heindler FM, Hellemans B, Jossart Q, Pasotti F, Robert H, Verheye M, Danis B, Kochzius M, Leliaert F, Moreau C, Patel T, Van de Putte AP, Vanreusel A, Volckaert FAM, Schön I. Facilitating population genomics of non-model organisms through optimized experimental design for reduced representation sequencing. BMC Genomics 2021; 22:625. [PMID: 34418978 PMCID: PMC8380342 DOI: 10.1186/s12864-021-07917-3] [Received: 04/02/2021] [Accepted: 07/26/2021] [Indexed: 12/26/2022]
Abstract
BACKGROUND Genome-wide data are invaluable for characterizing differentiation and adaptation of natural populations. Reduced representation sequencing (RRS) subsamples a genome repeatedly across many individuals. However, RRS requires careful optimization and fine-tuning to deliver high marker density while remaining cost-efficient. The number of genomic fragments created through restriction enzyme digestion and the sequencing library setup must match to achieve sufficient sequencing coverage per locus. Here, we present a workflow based on published information and computational and experimental procedures to investigate and streamline the applicability of RRS. RESULTS In an iterative process, genome size estimates, restriction enzymes, and size selection windows were tested and scaled in six classes of Antarctic animals (Ostracoda, Malacostraca, Bivalvia, Asteroidea, Actinopterygii, Aves). Achieving high marker density would be expensive in amphipods, the malacostracan target taxon, due to their large genome size. We propose alternative approaches such as mitogenome or target capture sequencing for this group. Pilot libraries were sequenced for all other target taxa. Ostracods, bivalves, sea stars, and fish showed overall good coverage and marker numbers for downstream population genomic analyses. In contrast, the bird test library produced low coverage and few polymorphic loci, likely due to degraded DNA. CONCLUSIONS Prior testing and optimization are important to identify which groups are amenable to RRS and where alternative methods may currently offer better cost-benefit ratios. The steps outlined here are easy to follow for other non-model taxa with few genomic resources, thus stimulating efficient resource use for the many pressing research questions in molecular ecology.
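The requirement that fragment number and library setup match can be made concrete with back-of-envelope arithmetic: if reads were spread evenly, mean depth per locus equals usable reads divided by (samples × loci). The sketch below assumes even coverage and a hypothetical usable-read fraction; the function names and the numbers in the example are illustrative, not taken from the study.

```python
def expected_depth(total_reads: float, n_samples: int, n_loci: int,
                   usable_fraction: float = 1.0) -> float:
    """Mean reads per locus per sample, assuming reads are spread evenly."""
    return total_reads * usable_fraction / (n_samples * n_loci)


def max_loci(total_reads: float, n_samples: int, target_depth: float,
             usable_fraction: float = 1.0) -> int:
    """Largest number of loci that still reaches the target mean depth."""
    return int(total_reads * usable_fraction / (n_samples * target_depth))


if __name__ == "__main__":
    # Hypothetical run: 200 million reads, 96 samples, 80% of reads usable.
    print(round(expected_depth(200e6, 96, 20_000, 0.8), 1))  # 83.3
    print(max_loci(200e6, 96, 20, 0.8))                      # 83333
```

Real coverage is uneven across loci and samples, so a mean depth computed this way overstates the share of loci that reach a given threshold; pilot libraries, as sequenced in the study, show how far the two diverge in practice.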
Affiliation(s)
- Henrik Christiansen
- Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium.
- Franz M Heindler
- Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium
- Bart Hellemans
- Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium
- Quentin Jossart
- Marine Biology Group, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Henri Robert
- OD Nature, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Marie Verheye
- OD Nature, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Bruno Danis
- Marine Biology Laboratory, Université Libre de Bruxelles (ULB), Brussels, Belgium
- Marc Kochzius
- Marine Biology Group, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Frederik Leliaert
- Marine Biology Research Group, Ghent University, Ghent, Belgium
- Meise Botanic Garden, Meise, Belgium
- Camille Moreau
- Marine Biology Laboratory, Université Libre de Bruxelles (ULB), Brussels, Belgium
- Université de Bourgogne Franche-Comté (UBFC) UMR CNRS 6282 Biogéosciences, Dijon, France
- Tasnim Patel
- OD Nature, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Anton P Van de Putte
- Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium
- OD Nature, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Marine Biology Laboratory, Université Libre de Bruxelles (ULB), Brussels, Belgium
- Ann Vanreusel
- Marine Biology Research Group, Ghent University, Ghent, Belgium
- Filip A M Volckaert
- Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium
- Isa Schön
- OD Nature, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
11
Double-digest RAD-sequencing: do pre- and post-sequencing protocol parameters impact biological results? Mol Genet Genomics 2021; 296:457-471. [PMID: 33469716 DOI: 10.1007/s00438-020-01756-9] [Received: 07/10/2020] [Accepted: 12/14/2020] [Indexed: 02/06/2023]
Abstract
Next-generation sequencing technologies have opened a new era of research in population genetics. Following these new sequencing opportunities, the use of restriction enzyme-based genotyping techniques, such as restriction site-associated DNA sequencing (RAD-seq) or double-digest RAD-sequencing (ddRAD-seq), has dramatically increased in the last decade. From DNA sampling to SNP calling, the laboratory and bioinformatic parameters of enzyme-based techniques have been investigated in the literature. However, the impact of those parameters on downstream analyses and biological results remains less documented. In this study, we investigated the effects of several pre- and post-sequencing settings on ddRAD-seq results for two biological systems: a complex of butterfly species (Coenonympha sp.) and several populations of common beech (Fagus sylvatica). Our results suggest that pre-sequencing parameters (i.e., DNA quantity, number of PCR cycles during library preparation) have a significant impact on the number of recovered reads and SNPs, on the number of unique alleles and on individual heterozygosity. Similarly, we found that post-sequencing settings (i.e., clustering and minimum coverage thresholds) influenced loci reconstruction (e.g., number of loci, mean coverage) and SNP calling (e.g., number of SNPs; heterozygosity) but had only a marginal impact on downstream analyses (e.g., measure of genetic differentiation, estimation of individual admixture, and demographic inferences). In addition, replication analyses confirmed the reproducibility of the ddRAD-seq procedure. Overall, this study assesses the degree of sensitivity of ddRAD-seq data to pre- and post-sequencing protocols, and illustrates its robustness when studying population genetics.
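Why a minimum coverage threshold moves heterozygosity can be seen with a deliberately naive genotype caller. The sketch assumes per-site (reference, alternate) read counts for one individual and a hypothetical 0.2 minor-allele fraction cut-off for calling a heterozygote. Pipelines such as Stacks use likelihood-based genotype models instead, so this shows only the direction of the effect, not any tool's behavior.

```python
from typing import List, Optional, Tuple


def call_genotype(ref: int, alt: int, min_depth: int,
                  min_minor_frac: float = 0.2) -> Optional[str]:
    """Naive caller: 'het', 'hom', or None when depth is zero or below min_depth.

    The 0.2 minor-allele fraction cut-off is a hypothetical value.
    """
    depth = ref + alt
    if depth == 0 or depth < min_depth:
        return None
    return "het" if min(ref, alt) / depth >= min_minor_frac else "hom"


def summarize(counts: List[Tuple[int, int]], min_depth: int) -> Tuple[int, float]:
    """Number of called sites and observed heterozygosity at a depth threshold."""
    calls = [call_genotype(r, a, min_depth) for r, a in counts]
    called = [c for c in calls if c is not None]
    if not called:
        return 0, float("nan")
    return len(called), sum(c == "het" for c in called) / len(called)


if __name__ == "__main__":
    # (reference reads, alternate reads) at six sites in one individual
    counts = [(1, 0), (2, 0), (1, 1), (10, 9), (12, 0), (6, 5)]
    for threshold in (1, 5):
        print(threshold, summarize(counts, threshold))
```

At a depth of one read a heterozygote can only ever be called homozygous, so permissive thresholds tend to retain more sites while biasing heterozygosity downward; stricter thresholds trade site number for more reliable calls.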
12
Torkamaneh D, Laroche J, Boyle B, Belzile F. DepthFinder: a tool to determine the optimal read depth for reduced-representation sequencing. Bioinformatics 2020; 36:26-32. [PMID: 31173057 DOI: 10.1093/bioinformatics/btz473] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/21/2019] [Revised: 05/29/2019] [Accepted: 06/01/2019] [Indexed: 12/27/2022]
Abstract
MOTIVATION Identification of DNA sequence variations such as single nucleotide polymorphisms (SNPs) is a fundamental step in genetic studies. Reduced-representation sequencing methods have been developed as alternatives to whole genome sequencing to reduce costs and enable the analysis of many more individuals. Amongst these methods, restriction site associated sequencing (RSAS) methodologies have been widely used for rapid and cost-effective discovery of SNPs and for high-throughput genotyping in a wide range of species. Despite extensive improvements to RSAS methods in the last decade, the estimation of the number of reads (i.e. read depth) required per sample for efficient and effective genotyping remains mostly based on trial and error. RESULTS Herein we describe a bioinformatics tool, DepthFinder, designed to estimate the required read counts for RSAS methods. To illustrate its performance, we estimated required read counts in six different species (human, cattle, spruce budworm, salmon, barley and soybean) that cover a range of biological (genome size, level of genome complexity, level of DNA methylation and ploidy) and technical (library preparation protocol and sequencing platform) factors. To assess the prediction accuracy of DepthFinder, we compared DepthFinder-derived results with independent datasets obtained from an RSAS experiment. This analysis yielded estimated accuracies of nearly 94%. Moreover, we present DepthFinder as a powerful tool to predict the most effective size selection interval in RSAS work. We conclude that DepthFinder constitutes an efficient, reliable and useful tool for a broad array of users in different research communities. AVAILABILITY AND IMPLEMENTATION https://bitbucket.org/jerlar73/DepthFinder. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
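A rough version of the read-depth question can be framed as arithmetic on an in-silico digest: count the fragments that fall inside a size-selection window and multiply by a target depth. This is not DepthFinder's algorithm. It is a hypothetical sketch assuming a palindromic recognition site (PstI, CTGCA^G, is used only as an example), single-end reads with one read per fragment counted toward depth, and a user-chosen fraction of usable reads; the function names and toy sequence are illustrative.

```python
import math
import re

# Hypothetical sketch, not DepthFinder: digest -> size window -> reads needed.


def internal_fragments(seq: str, site: str, cut_offset: int) -> list[int]:
    """Lengths of fragments bounded by restriction cuts on both ends.

    `cut_offset` is the cut position within the recognition site on the top
    strand, e.g. 5 for PstI (CTGCA^G). Overlapping sites are found with a
    lookahead; the site is assumed palindromic, so one strand is scanned."""
    pattern = "(?=" + re.escape(site.upper()) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq.upper())]
    return [b - a for a, b in zip(cuts, cuts[1:])]


def reads_needed(lengths, size_min, size_max, target_depth, usable_fraction):
    """Reads per sample so that each size-selected fragment gets
    `target_depth` reads after discarding the non-usable fraction."""
    n_loci = sum(size_min <= n <= size_max for n in lengths)
    return math.ceil(n_loci * target_depth / usable_fraction)


# Toy sequence with three PstI sites (hypothetical, for illustration only).
toy = "A" * 10 + "CTGCAG" + "T" * 20 + "CTGCAG" + "G" * 50 + "CTGCAG" + "C" * 5
frags = internal_fragments(toy, "CTGCAG", cut_offset=5)
print(frags)  # [26, 56]
print(reads_needed(frags, size_min=20, size_max=60, target_depth=10, usable_fraction=0.5))  # 40
```

Narrowing the size window reduces the number of loci and therefore the reads required per sample, which is the lever the abstract refers to when it mentions predicting the most effective size-selection interval.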
Affiliation(s)
- Davoud Torkamaneh
- Département de Phytologie, Québec City, QC G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada; Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada
- Jérôme Laroche
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada
- Brian Boyle
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada
- François Belzile
- Département de Phytologie, Québec City, QC G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada
13
Bubac CM, Miller JM, Coltman DW. The genetic basis of animal behavioural diversity in natural populations. Mol Ecol 2020; 29:1957-1971. [PMID: 32374914 DOI: 10.1111/mec.15461] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/07/2020] [Revised: 04/17/2020] [Accepted: 05/01/2020] [Indexed: 12/30/2022]
Abstract
Individual differences in animal behaviour influence ecological and evolutionary processes. Much behavioural variation has a heritable component, suggesting that genetics may play a role in its development. Yet research describing the mechanisms that link genes to behaviour in nature remains in its infancy, and such research is considered a challenge in contemporary biology. Here, we performed a literature review and meta-analysis to assess trends in analytical approaches used to investigate the relationship between genes and behaviour in natural systems, specifically candidate gene approaches, quantitative trait locus (QTL) mapping, and genome-wide association studies (GWAS). We aimed to determine the efficacy and success of each approach, while also describing which behaviours and species were examined by researchers most often. We found that the majority of QTL mapping and GWAS results revealed a significant or suggestive effect (Zr = 0.3 [95% CI: 0.25:0.35] and Zr = 0.39 [0.33:0.46], respectively) between the trait of interest and the genetic marker(s) tested, while over half of candidate gene accounts (Zr = 0.16 [0.11:0.21]) did not find a significant association. Approximately a third of all study estimates investigated animal personality traits, though reproductive and migratory behaviours were also well represented. Our findings show that despite the widespread accessibility of molecular approaches given current sequencing technologies, efforts to elucidate the genetic basis of behaviour in free-ranging systems have been limited to relatively few species. We discuss challenges encountered by researchers, and recommend integration of novel genomic methods with longitudinal studies to usher in the next wave of behavioural genomic research.
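The Zr values quoted above are presumably Fisher z-transformed correlation coefficients, the usual effect size in meta-analyses of correlations; the abstract itself does not define them, so this is an assumption. Under that reading they can be back-transformed with tanh, giving correlations of roughly 0.29 (QTL mapping), 0.37 (GWAS) and 0.16 (candidate genes). The confidence-interval helper is a hypothetical addition that uses the standard SE of 1/sqrt(n - 3) for a single correlation; it will not reproduce the pooled meta-analytic intervals.

```python
import math

# Assumes Zr denotes Fisher's z-transformed correlation coefficient.


def r_to_zr(r: float) -> float:
    """Fisher's z transformation: Zr = 0.5 * ln((1 + r) / (1 - r)) = atanh(r)."""
    return math.atanh(r)


def zr_to_r(zr: float) -> float:
    """Back-transform a Fisher z value to the correlation scale."""
    return math.tanh(zr)


def zr_ci(zr: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for one correlation's Zr, using SE = 1/sqrt(n - 3)."""
    se = 1.0 / math.sqrt(n - 3)
    return zr - z_crit * se, zr + z_crit * se


# Back-transform the pooled estimates quoted in the abstract.
for label, zr in [("QTL mapping", 0.30), ("GWAS", 0.39), ("candidate gene", 0.16)]:
    print(f"{label}: Zr = {zr:.2f} -> r = {zr_to_r(zr):.3f}")
```

Working on the z scale makes sampling variance approximately independent of the true correlation, which is why meta-analyses pool Zr rather than r.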
Affiliation(s)
- Christine M Bubac
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
- Joshua M Miller
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
- David W Coltman
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
14
Rivera-Colón AG, Rochette NC, Catchen JM. Simulation with RADinitio improves RADseq experimental design and sheds light on sources of missing data. Mol Ecol Resour 2020; 21:363-378. [PMID: 32275349 DOI: 10.1111/1755-0998.13163] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/10/2019] [Accepted: 03/25/2020] [Indexed: 12/20/2022]
Abstract
Restriction-site associated DNA sequencing (RADseq) has become a powerful and versatile tool in modern population genomics, enabling large-scale evolutionary and genomic analyses in otherwise inaccessible biological systems. With its widespread use, different variants of the protocol have been developed to suit specific experimental needs. Researchers face the challenge of choosing the optimal molecular and sequencing protocols for their reduced representation experimental design, an often complicated process. Strategic errors can lead to biased data that have reduced power to answer biological questions. Here, we present RADinitio, simulation software for the selection and optimization of RADseq experiments via the generation of sequencing data that behave similarly to empirical sources. RADinitio provides an evolutionary simulation of populations, implementation of various RADseq protocols with customizable parameters, and thorough assessment of missing data. We test the efficacy of the software using different RAD protocols across several organisms, highlighting the importance of protocol selection for the magnitude and quality of data acquired. Additionally, we test the effects of RAD library preparation and sequencing on allelic dropout, observing that library preparation and sequencing often contribute more to missing alleles than population-level variation.
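The distinction between population-level and technical sources of allelic dropout can be sketched with a toy per-allele model. This is not RADinitio's simulation. It assumes independent variation at each base of the recognition site (`p_var` is a hypothetical per-base probability that a haplotype differs from the reference, and any such difference is assumed to abolish the cut) and Poisson read counts per allele; the names and parameter values are illustrative only.

```python
import math

# Toy model, not RADinitio: site-loss dropout vs zero-read dropout per allele.


def cut_site_loss(p_var: float, site_len: int) -> float:
    """P(a haplotype carries at least one variant in the recognition site),
    assuming independent bases, each variant with probability `p_var`."""
    return 1.0 - (1.0 - p_var) ** site_len


def allele_dropout(p_var: float, site_len: int, mean_cov_per_allele: float) -> dict:
    """Per-allele dropout split by source:
    'population': the haplotype lost the restriction site, so it is never cut;
    'sequencing': the site is intact but the allele gets zero reads (Poisson)."""
    p_pop = cut_site_loss(p_var, site_len)
    p_seq = (1.0 - p_pop) * math.exp(-mean_cov_per_allele)
    return {"population": p_pop, "sequencing": p_seq, "total": p_pop + p_seq}


# Hypothetical values: 6-bp site, p_var = 0.001, low vs high coverage.
for cov in (2.0, 15.0):
    print(cov, allele_dropout(p_var=0.001, site_len=6, mean_cov_per_allele=cov))
```

At low per-allele coverage the zero-read term dominates in this toy setting, while at high coverage dropout is almost entirely due to cut-site variation; the actual balance in real data depends on the protocol and organism.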
Affiliation(s)
- Angel G Rivera-Colón
- Department of Evolution, Ecology and Behavior, University of Illinois, Urbana, Illinois, USA
- Nicolas C Rochette
- Department of Evolution, Ecology and Behavior, University of Illinois, Urbana, Illinois, USA
- Julian M Catchen
- Department of Evolution, Ecology and Behavior, University of Illinois, Urbana, Illinois, USA
15
Jenkins TL, Ellis CD, Triantafyllidis A, Stevens JR. Single nucleotide polymorphisms reveal a genetic cline across the north-east Atlantic and enable powerful population assignment in the European lobster. Evol Appl 2019; 12:1881-1899. [PMID: 31700533 PMCID: PMC6824076 DOI: 10.1111/eva.12849] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/06/2019] [Revised: 07/09/2019] [Accepted: 07/11/2019] [Indexed: 12/11/2022]
Abstract
Resolving stock structure is crucial for fisheries conservation to ensure that the spatial implementation of management is commensurate with that of biological population units. To address this in the economically important European lobster (Homarus gammarus), genetic structure was explored across the species' range using a small panel of single nucleotide polymorphisms (SNPs) previously isolated from restriction-site-associated DNA sequencing; these SNPs were selected to maximize differentiation at both broad and fine scales. After quality control and filtering, 1,278 lobsters from 38 sampling sites were genotyped at 79 SNPs. The results revealed a pronounced phylogeographic break between the Atlantic and Mediterranean basins, while structure within the Mediterranean was also apparent, partitioned between lobsters from the central Mediterranean and the Aegean Sea. In addition, a genetic cline across the north-east Atlantic was revealed using both putatively neutral and outlier SNPs, but the precise driver(s) of this clinal pattern (isolation by distance, secondary contact, selection across an environmental gradient, or a combination of these factors) remain undetermined. Putatively neutral markers differentiated lobsters from Oosterschelde, an estuary on the Dutch coast, a finding likely explained by past bottlenecks and limited gene flow with adjacent North Sea populations. Building on the findings of our spatial genetic analysis, we tested the accuracy of assigning lobsters at various spatial scales: to basin of origin (Atlantic or Mediterranean), to region of origin and to sampling location. The predictive model built from the 79 SNPs correctly assigned 99.7% of lobsters not used to build the model to their basin of origin, but accuracy decreased for region of origin and decreased further for sampling location. These results are of direct relevance to managers of lobster fisheries and hatcheries, and provide the basis for a genetic tool for tracing the origin of European lobsters in the food supply chain.
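For comparison with frequency-based population assignment in general, the sketch below scores an individual's SNP genotypes against reference populations using Hardy-Weinberg genotype probabilities and picks the most likely source. The abstract does not say what kind of predictive model the study used, so this simple likelihood approach is a stand-in rather than the authors' method, and the allele frequencies are hypothetical.

```python
import math

# Stand-in method: Hardy-Weinberg likelihood assignment (not the study's model).


def genotype_loglik(genotypes, alt_freqs, floor=1e-3):
    """Log-likelihood of a multilocus genotype under Hardy-Weinberg proportions.

    genotypes: alt-allele counts (0, 1, 2) per SNP, or None if missing.
    alt_freqs: alt-allele frequencies for the same SNPs in one reference population.
    """
    ll = 0.0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # missing genotypes simply do not contribute
        p = min(max(p, floor), 1.0 - floor)  # avoid log(0) for unseen alleles
        hwe = ((1.0 - p) ** 2, 2.0 * p * (1.0 - p), p ** 2)
        ll += math.log(hwe[g])
    return ll


def assign(genotypes, reference):
    """Return the reference population with the highest likelihood, plus all scores."""
    scores = {pop: genotype_loglik(genotypes, f) for pop, f in reference.items()}
    return max(scores, key=scores.get), scores


# Hypothetical allele frequencies at three SNPs (not data from the study).
reference = {"Atlantic": [0.9, 0.1, 0.8], "Mediterranean": [0.1, 0.9, 0.2]}
best, scores = assign([2, 0, None], reference)
print(best, scores)
```

Accuracy of any such scheme drops as reference populations become less differentiated, consistent with the pattern the abstract reports from basin to region to sampling site.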
Affiliation(s)
- Tom L. Jenkins
- Department of Biosciences, College of Life and Environmental Sciences, University of Exeter, Exeter, UK
- Charlie D. Ellis
- Department of Biosciences, College of Life and Environmental Sciences, University of Exeter, Exeter, UK
- National Lobster Hatchery, South Quay, Padstow, UK
- Jamie R. Stevens
- Department of Biosciences, College of Life and Environmental Sciences, University of Exeter, Exeter, UK
16
Zan Y, Payen T, Lillie M, Honaker CF, Siegel PB, Carlborg Ö. Genotyping by low-coverage whole-genome sequencing in intercross pedigrees from outbred founders: a cost-efficient approach. Genet Sel Evol 2019; 51:44. [PMID: 31412777 PMCID: PMC6694510 DOI: 10.1186/s12711-019-0487-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/19/2018] [Accepted: 08/07/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Experimental intercrosses between outbred founder populations are powerful resources for mapping loci that contribute to complex traits, i.e., quantitative trait loci (QTL). Here, we present an approach and its accompanying software for high-resolution reconstruction of founder mosaic genotypes in the intercross offspring from such populations using whole-genome high-coverage sequence data on founder individuals (~ 30×) and very low-coverage sequence data on intercross individuals (< 0.5×). Sets of founder-line informative markers were selected for each full-sib family and used to infer the founder mosaic genotypes of the intercross individuals. The application of this approach and the quality of the estimated genome-wide genotypes are illustrated in a large F2 pedigree between two divergently selected lines of chickens. RESULTS We describe how we obtained whole-genome genotype data for hundreds of individuals in a cost- and time-efficient manner by using a Tn5-based library preparation protocol and an imputation algorithm that was optimized for this application. In total, 7.6 million markers segregated in this pedigree and, within each full-sib family, between 10.0 and 13.7% of these were fully informative, i.e., fixed for alternative alleles in the founders from the divergent lines, and were used for reconstruction of the offspring mosaic genotypes. The genotypes estimated from the low-coverage sequence data were highly consistent (> 95% agreement) with those obtained using individual single nucleotide polymorphism (SNP) genotyping. The estimated resolution of the inferred recombination breakpoints was relatively high, with 50% of them localized to regions shorter than 10 kb. CONCLUSIONS A method and software for inferring founder mosaic genotypes in intercross offspring from low-coverage whole-genome sequencing in pedigrees from heterozygous founders are described. They provide high-quality, high-resolution genotypes in a time- and cost-efficient manner. The software is freely available at https://github.com/CarlborgGenomics/Stripes.
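The selection of fully informative markers, those fixed for alternative alleles in the founders of the two divergent lines, can be written as a per-marker filter. This is a hypothetical sketch, not code from the Stripes software; it assumes genotypes coded as alternative-allele counts (0, 1, 2) with no missing calls, and the founder data are made up.

```python
# Hypothetical sketch (not Stripes): per-family filter for fully informative markers.


def fully_informative(founders_a, founders_b):
    """Indices of markers fixed for alternative alleles between two founder lines.

    founders_a, founders_b: genotype vectors (alt-allele counts 0/1/2) for the
    line-A and line-B founders of one full-sib family.
    """
    n_markers = len(founders_a[0])
    keep = []
    for i in range(n_markers):
        a = {ind[i] for ind in founders_a}
        b = {ind[i] for ind in founders_b}
        # Fully informative: every A founder homozygous for one allele and
        # every B founder homozygous for the other.
        if (a == {0} and b == {2}) or (a == {2} and b == {0}):
            keep.append(i)
    return keep


# Hypothetical founders (two per line) genotyped at four markers.
line_a = [[0, 0, 1, 2], [0, 1, 1, 2]]
line_b = [[2, 2, 1, 0], [2, 2, 1, 0]]
print(fully_informative(line_a, line_b))  # [0, 3]
```

Because founders differ between families, the filter is applied family by family, which is why the abstract reports a range (10.0 to 13.7%) of informative markers rather than a single value.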
Affiliation(s)
- Yanjun Zan
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Thibaut Payen
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Mette Lillie
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Christa F Honaker
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Paul B Siegel
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Örjan Carlborg
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
17
Meek MH, Larson WA. The future is now: Amplicon sequencing and sequence capture usher in the conservation genomics era. Mol Ecol Resour 2019; 19:795-803. [PMID: 30681776 DOI: 10.1111/1755-0998.12998] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 09/21/2018] [Revised: 01/17/2019] [Accepted: 01/18/2019] [Indexed: 01/21/2023]
Abstract
The genomics revolution has initiated a new era of population genetics where genome-wide data are frequently used to understand complex patterns of population structure and selection. However, the application of genomic tools to inform management and conservation has been somewhat rare outside a few well studied species. Fortunately, two recently developed approaches, amplicon sequencing and sequence capture, have the potential to significantly advance the field of conservation genomics. In this context, amplicon sequencing refers to highly multiplexed PCR followed by high-throughput sequencing (e.g., GTseq), and sequence capture refers to using capture probes to isolate loci from reduced-representation libraries (e.g., Rapture). Both approaches allow sequencing of thousands of individuals at relatively low costs, do not require any specialized equipment for library preparation, and generate data that can be analyzed without sophisticated computational infrastructure. Here, we discuss the advantages and disadvantages of each method and provide a decision framework for geneticists who are looking to integrate these methods into their research programme. While it will always be important to consider the specifics of the biological question and system, we believe that amplicon sequencing is best suited for projects aiming to genotype <500 loci on many individuals (>1,500) or for species where continued monitoring is anticipated (e.g., long-term pedigrees). Sequence capture, on the other hand, is best applied to projects including fewer individuals or where >500 loci are required. Both of these techniques should smooth the transition from traditional genetic techniques to genomics, helping to usher in the conservation genomics era.
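The thresholds in this abstract can be read as a simple decision rule. The helper below is a hypothetical simplification, not the authors' full decision framework: it gives the number of loci priority when the criteria conflict, and it assigns exactly 500 loci, or exactly 1,500 individuals, to sequence capture. Both tie-breaking choices are arbitrary because the abstract only states "<500" and ">1,500".

```python
# Hypothetical simplification of the thresholds quoted in the abstract.


def suggest_method(n_loci: int, n_individuals: int, long_term_monitoring: bool = False) -> str:
    """Suggest amplicon sequencing or sequence capture from project size."""
    if n_loci >= 500:
        # Many loci: sequence capture (e.g. Rapture) is the suggested route.
        return "sequence capture"
    if n_individuals > 1500 or long_term_monitoring:
        # Few loci on many individuals, or continued monitoring: amplicon panels (e.g. GTseq).
        return "amplicon sequencing"
    # Few loci and fewer individuals: the abstract favours capture.
    return "sequence capture"


print(suggest_method(200, 3000))
print(suggest_method(2000, 300))
print(suggest_method(200, 100, long_term_monitoring=True))
```

In practice the authors stress that the biological question and study system should also weigh in, which a threshold function cannot capture.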
Affiliation(s)
- Mariah H Meek
- Department of Integrative Biology and AgBio Research, Michigan State University, East Lansing, Michigan
- Wesley A Larson
- U.S. Geological Survey, Wisconsin Cooperative Fishery Research Unit, College of Natural Resources, University of Wisconsin-Stevens Point, Stevens Point, Wisconsin