1
Hemstrom W, Grummer JA, Luikart G, Christie MR. Next-generation data filtering in the genomics era. Nat Rev Genet 2024; 25:750-767. [PMID: 38877133 DOI: 10.1038/s41576-024-00738-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2024] [Indexed: 06/16/2024]
Abstract
Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering - removing sequencing bases, reads, genetic variants and/or individuals from a dataset - to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy-Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima's D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne).
Affiliation(s)
- William Hemstrom
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
- Jared A Grummer
  - Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Gordon Luikart
  - Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Mark R Christie
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA.
2
Schiebelhut LM, Guillaume AS, Kuhn A, Schweizer RM, Armstrong EE, Beaumont MA, Byrne M, Cosart T, Hand BK, Howard L, Mussmann SM, Narum SR, Rasteiro R, Rivera-Colón AG, Saarman N, Sethuraman A, Taylor HR, Thomas GWC, Wellenreuther M, Luikart G. Genomics and conservation: Guidance from training to analyses and applications. Mol Ecol Resour 2024; 24:e13893. [PMID: 37966259 DOI: 10.1111/1755-0998.13893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/16/2023]
Abstract
Environmental change is intensifying the biodiversity crisis and threatening species across the tree of life. Conservation genomics can help inform conservation actions and slow biodiversity loss. However, more training, appropriate use of novel genomic methods and communication with managers are needed. Here, we review practical guidance to improve applied conservation genomics. We share insights aimed at ensuring effectiveness of conservation actions around three themes: (1) improving pedagogy and training in conservation genomics including for online global audiences, (2) conducting rigorous population genomic analyses properly considering theory, marker types and data interpretation and (3) facilitating communication and collaboration between managers and researchers. We aim to update students and professionals and expand their conservation toolkit with genomic principles and recent approaches for conserving and managing biodiversity. The biodiversity crisis is a global problem and, as such, requires international involvement, training, collaboration and frequent reviews of the literature and workshops as we do here.
Affiliation(s)
- Lauren M Schiebelhut
  - Life and Environmental Sciences, University of California, Merced, California, USA
- Annie S Guillaume
  - Geospatial Molecular Epidemiology group (GEOME), Laboratory for Biological Geochemistry (LGB), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Arianna Kuhn
  - Department of Biological Sciences, University of Lethbridge, Lethbridge, Alberta, Canada
  - Virginia Museum of Natural History, Martinsville, Virginia, USA
- Rena M Schweizer
  - Division of Biological Sciences, University of Montana, Missoula, Montana, USA
- Mark A Beaumont
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Margaret Byrne
  - Department of Biodiversity, Conservation and Attractions, Biodiversity and Conservation Science, Perth, Western Australia, Australia
- Ted Cosart
  - Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
- Brian K Hand
  - Flathead Lake Biological Station, University of Montana, Polson, Montana, USA
- Leif Howard
  - Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
- Steven M Mussmann
  - Southwestern Native Aquatic Resources and Recovery Center, U.S. Fish & Wildlife Service, Dexter, New Mexico, USA
- Shawn R Narum
  - Hagerman Genetics Lab, University of Idaho, Hagerman, Idaho, USA
- Rita Rasteiro
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Angel G Rivera-Colón
  - Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
- Norah Saarman
  - Department of Biology and Ecology Center, Utah State University, Logan, Utah, USA
- Arun Sethuraman
  - Department of Biology, San Diego State University, San Diego, California, USA
- Helen R Taylor
  - Royal Zoological Society of Scotland, Edinburgh, Scotland
- Gregg W C Thomas
  - Informatics Group, Harvard University, Cambridge, Massachusetts, USA
- Maren Wellenreuther
  - Plant and Food Research, Nelson, New Zealand
  - University of Auckland, Auckland, New Zealand
- Gordon Luikart
  - Division of Biological Sciences, University of Montana, Missoula, Montana, USA
  - Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
3
Wong WLE, Arathimos R, Lewis CM, Young AH, Dawe GS. Investigating the role of the relaxin-3/RXFP3 system in neuropsychiatric disorders and metabolic phenotypes: A candidate gene approach. PLoS One 2023; 18:e0294045. [PMID: 37967073 PMCID: PMC10651050 DOI: 10.1371/journal.pone.0294045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 10/19/2023] [Indexed: 11/17/2023] Open
Abstract
The relaxin-3/RXFP3 system has been implicated in the modulation of depressive- and anxiety-like behaviour in the animal literature; however, there is a lack of human studies investigating this signalling system. We seek to bridge this gap by leveraging the large UK Biobank study to retrospectively assess genetic risk variants linked with this neuropeptidergic system. Specifically, we conducted a candidate gene study in the UK Biobank to test for potential associations between a set of functional, candidate single nucleotide polymorphisms (SNPs) pertinent to relaxin-3 signalling, determined using in silico tools, and several outcomes, including depression, atypical depression, anxiety and metabolic syndrome. For each outcome, we used several rigorously defined phenotypes, culminating in subsample sizes ranging from 85,881 to 386,769 participants. Across all outcomes, there were no associations between any candidate SNP and any outcome phenotype, following corrections for multiple testing burden. Regression models comprising several SNPs per relevant candidate gene as exploratory variables further exhibited no prediction of outcome. Our findings corroborate conclusions from previous literature about the limitations of candidate gene approaches, even when based on firm biological hypotheses, in the domain of genetic research for neuropsychiatric disorders.
Affiliation(s)
- Win Lee Edwin Wong
  - Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Healthy Longevity Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, United Kingdom
- Ryan Arathimos
  - Institute of Psychiatry, Psychology and Neuroscience, Social, Genetic and Developmental Psychiatry Centre, King’s College London, London, United Kingdom
- Cathryn M. Lewis
  - Institute of Psychiatry, Psychology and Neuroscience, Social, Genetic and Developmental Psychiatry Centre, King’s College London, London, United Kingdom
  - Faculty of Life Sciences and Medicine, Department of Medical and Molecular Genetics, King’s College London, London, United Kingdom
- Allan H. Young
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, United Kingdom
  - South London & Maudsley NHS Foundation Trust, Bethlem Royal Hospital, London, United Kingdom
- Gavin S. Dawe
  - Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Healthy Longevity Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Life Sciences Institute, Neurobiology Programme, National University of Singapore, Singapore, Singapore
4
Pearman WS, Urban L, Alexander A. Commonly used Hardy-Weinberg equilibrium filtering schemes impact population structure inferences using RADseq data. Mol Ecol Resour 2022; 22:2599-2613. [PMID: 35593534 PMCID: PMC9541430 DOI: 10.1111/1755-0998.13646] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 06/16/2021] [Accepted: 05/13/2022] [Indexed: 11/29/2022]
Abstract
Reduced representation sequencing (RRS) is a widely used method to assay the diversity of genetic loci across the genome of an organism. The dominant class of RRS approaches assay loci associated with restriction sites within the genome (restriction site associated DNA sequencing, or RADseq). RADseq is frequently applied to non‐model organisms since it enables population genetic studies without relying on well‐characterized reference genomes. However, RADseq requires the use of many bioinformatic filters to ensure the quality of genotyping calls. These filters can have direct impacts on population genetic inference, and therefore require careful consideration. One widely used filtering approach is the removal of loci that do not conform to expectations of Hardy–Weinberg equilibrium (HWE). Despite being widely used, we show that this filtering approach is rarely described in sufficient detail to enable replication. Furthermore, through analyses of in silico and empirical data sets we show that some of the most widely used HWE filtering approaches dramatically impact inference of population structure. In particular, the removal of loci exhibiting departures from HWE after pooling across samples significantly reduces the degree of inferred population structure within a data set (despite this approach being widely used). Based on these results, we provide recommendations for best practice regarding the implementation of HWE filtering for RADseq data sets.
Affiliation(s)
- William S Pearman
  - Department of Marine Science, University of Otago, Dunedin, New Zealand
  - Department of Anatomy, University of Otago, Dunedin, New Zealand
- Lara Urban
  - Department of Anatomy, University of Otago, Dunedin, New Zealand
- Alana Alexander
  - Department of Anatomy, University of Otago, Dunedin, New Zealand
5
Hauser SS, Athrey G, Leberg PL. Waste not, want not: Microsatellites remain an economical and informative technology for conservation genetics. Ecol Evol 2021; 11:15800-15814. [PMID: 34824791 PMCID: PMC8601879 DOI: 10.1002/ece3.8250] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/27/2021] [Revised: 09/07/2021] [Accepted: 09/16/2021] [Indexed: 11/07/2022] Open
Abstract
Comparisons of microsatellites and single-nucleotide polymorphisms (SNPs) have found that SNPs outperform microsatellites in population genetic analyses, questioning the continued utility of microsatellites in population and landscape genetics. Yet, highly polymorphic markers may be of value in species that have reduced genetic variation. This study repeated previous analyses that used microsatellites with SNPs developed from ddRAD sequencing in the black-capped vireo source-sink system. SNPs provided greater resolution of genetic diversity, population differentiation, and migrant detection but could not reconstruct parentage relationships due to insufficient heterozygosities. The biological inferences made by both sets of markers were similar: asymmetrical gene flow from source sites to the remaining sink sites. With the landscape genetic analyses, we found different results between the two molecular markers, but associations of the top environmental features (riparian, open habitat, agriculture, and human development) with dispersal estimates were shared between marker types. Despite the higher precision of SNPs, we find that microsatellites effectively uncover population processes and patterns and are superior for parentage analyses in this species with reduced genetic diversity. This study illustrates the continued applicability and relevance of microsatellites in population genetic research.
Affiliation(s)
- Samantha S. Hauser
  - Department of Biology, University of Louisiana at Lafayette, Lafayette, Louisiana, USA
- Giridhar Athrey
  - Faculty of Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
- Paul L. Leberg
  - Department of Biology, University of Louisiana at Lafayette, Lafayette, Louisiana, USA
6
Mayoke A, Ouma JO, Mireji PO, Omondi SF, Muya SM, Itoua A, Okoth SO, Bateta R. Population Structure and Migration Patterns of the Tsetse Fly Glossina fuscipes in Congo-Brazzaville. Am J Trop Med Hyg 2020; 104:917-927. [PMID: 33372648 PMCID: PMC7941806 DOI: 10.4269/ajtmh.20-0774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2020] [Accepted: 11/17/2020] [Indexed: 11/07/2022] Open
Abstract
Tsetse flies of the palpalis group, particularly Glossina fuscipes, are the main vectors of human African trypanosomiasis or sleeping sickness in Congo-Brazzaville. They transmit the deadly human parasite, Trypanosoma brucei gambiense and other trypanosomes that cause animal trypanosomiasis. Knowledge on diversity, population structure, population size, and gene flow is a prerequisite for designing effective tsetse control strategies. There is limited published information on these parameters including migration patterns of G. fuscipes in Congo-Brazzaville. We genotyped 288 samples of G. fuscipes from Bomassa (BMSA), Bouemba (BEMB), and Talangai (TLG) locations at 10 microsatellite loci and determined levels of genetic diversity, differentiation, structuring, and gene flow among populations. We observed high genetic diversity in all three localities. Mean expected heterozygosity was 0.77 ± 0.04, and mean allelic richness was 11.2 ± 1.35. Deficiency of heterozygosity was observed in all populations with positive and significant FIS values (0.077-0.149). Structure analysis revealed three clusters with genetic admixtures, evidence of closely related but potentially different taxa within G. fuscipes. Genetic differentiation indices were low but significant (FST = 0.049, P < 0.05), indicating ongoing gene flow countered with a stronger force of drift. We recorded significant migration from all the three populations, suggesting exchange of genetic information between and among locations. Ne estimates revealed high and infinite population sizes in BEMB and TLG. These critical factors should be considered when planning area-wide tsetse control interventions in the country to prevent resurgence of tsetse from relict populations and/or reinvasion of cleared habitats.
Affiliation(s)
- Abraham Mayoke
  - Department of Molecular Biology and Biotechnology, Pan African University Institute for Basic Sciences, Technology and Innovation, Nairobi, Kenya
  - Kenya Forestry Research Institute, Nairobi, Kenya
  - Biotechnology Research Institute, Kenya Agricultural and Livestock Research Organization, Kikuyu, Kenya
  - Marien Ngouabi University, Brazzaville, Congo
- Johnson O. Ouma
  - African Technical Research Centre, Vector Health International, Arusha, Tanzania
- Paul O. Mireji
  - Biotechnology Research Institute, Kenya Agricultural and Livestock Research Organization, Kikuyu, Kenya
- Shadrack M. Muya
  - School of Biological Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
- Andre Itoua
  - Laboratoire de Parasitologie, Centre de Recherche Veterinaire et Zootechniques, Brazzaville, Congo
- Sylvance O. Okoth
  - Biotechnology Research Institute, Kenya Agricultural and Livestock Research Organization, Kikuyu, Kenya
- Rosemary Bateta
  - Biotechnology Research Institute, Kenya Agricultural and Livestock Research Organization, Kikuyu, Kenya
7
Baird HP, Moon KL, Janion‐Scheepers C, Chown SL. Springtail phylogeography highlights biosecurity risks of repeated invasions and intraregional transfers among remote islands. Evol Appl 2020; 13:960-973. [PMID: 32431746 PMCID: PMC7232766 DOI: 10.1111/eva.12913] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/24/2019] [Revised: 12/08/2019] [Accepted: 12/13/2019] [Indexed: 12/13/2022] Open
Abstract
Human-mediated transport of species outside their natural range is a rapidly growing threat to biodiversity, particularly for island ecosystems that have evolved in isolation. The genetic structure underpinning island populations will largely determine their response to increased transport and thus help to inform biosecurity management. However, this information is severely lacking for some groups, such as the soil fauna. We therefore analysed the phylogeographic structure of an indigenous and an invasive springtail species (Collembola: Poduromorpha), each distributed across multiple remote sub-Antarctic islands, where human activity is currently intensifying. For both species, we generated a genome-wide SNP data set and additionally analysed all available COI barcodes. Genetic differentiation in the indigenous springtail Tullbergia bisetosa is substantial among (and, to a lesser degree, within) islands, reflecting low dispersal and historic population fragmentation, while COI patterns reveal ancestral signatures of postglacial recolonization. This pronounced geographic structure demonstrates the key role of allopatric divergence in shaping the region's diversity and highlights the vulnerability of indigenous populations to genetic homogenization via human transport. For the invasive species Hypogastrura viatica, nuclear genetic structure is much less apparent, particularly for islands linked by regular shipping, while diverged COI haplotypes indicate multiple independent introductions to each island. Thus, human transport has likely facilitated this species' persistence since its initial colonization, through the ongoing introduction and inter-island spread of genetic variation. These findings highlight the different evolutionary consequences of human transport for indigenous and invasive soil species. 
Crucially, both outcomes demonstrate the need for improved intraregional biosecurity among remote island systems, where the policy focus to date has been on external introductions.
Affiliation(s)
- Helena P. Baird
  - School of Biological Sciences, Monash University, Clayton, Victoria, Australia
- Katherine L. Moon
  - School of Biological Sciences, Monash University, Clayton, Victoria, Australia
- Charlene Janion‐Scheepers
  - Iziko Museums of South Africa, Cape Town, South Africa
  - Department of Zoology & Entomology, University of the Free State, Bloemfontein, South Africa
- Steven L. Chown
  - School of Biological Sciences, Monash University, Clayton, Victoria, Australia