1
Dallaire X, Bouchard R, Hénault P, Ulmo-Diaz G, Normandeau E, Mérot C, Bernatchez L, Moore JS. Widespread Deviant Patterns of Heterozygosity in Whole-Genome Sequencing Due to Autopolyploidy, Repeated Elements, and Duplication. Genome Biol Evol 2023; 15:evad229. [PMID: 38085037 PMCID: PMC10752349 DOI: 10.1093/gbe/evad229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/30/2023] [Indexed: 12/28/2023]
Abstract
Most population genomic tools rely on accurate single nucleotide polymorphism (SNP) calling and filtering to meet their underlying assumptions. However, genomic complexity, resulting from structural variants, paralogous sequences, and repetitive elements, presents significant challenges in assembling contiguous reference genomes. Consequently, short-read resequencing studies can encounter mismapping issues, leading to SNPs that deviate from Mendelian expected patterns of heterozygosity and allelic ratio. In this study, we employed the ngsParalog software to identify such deviant SNPs in whole-genome sequencing (WGS) data with low (1.5×) to intermediate (4.8×) coverage for four species: Arctic Char (Salvelinus alpinus), Lake Whitefish (Coregonus clupeaformis), Atlantic Salmon (Salmo salar), and the American Eel (Anguilla rostrata). The analyses revealed that deviant SNPs accounted for 22% to 62% of all SNPs in salmonid datasets and approximately 11% in the American Eel dataset. These deviant SNPs were particularly concentrated within repetitive elements and genomic regions that had recently undergone rediploidization in salmonids. Additionally, narrow peaks of elevated coverage were ubiquitous along all four reference genomes, encompassed most deviant SNPs, and could be partially associated with transposons and tandem repeats. Including these deviant SNPs in genomic analyses led to highly distorted site frequency spectra, underestimated pairwise FST values, and overestimated nucleotide diversity. Considering the widespread occurrence of deviant SNPs arising from a variety of sources, their important impact in estimating population parameters, and the availability of effective tools to identify them, we propose that excluding deviant SNPs from WGS datasets is required to improve genomic inferences for a wide range of taxa and sequencing depths.
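To make the idea of a deviation from Mendelian expected heterozygosity concrete, here is a minimal sketch that compares observed heterozygosity at one biallelic SNP with the Hardy-Weinberg expectation 2pq. It is not the ngsParalog procedure used in the study; the function name `heterozygote_excess` and the genotype counts are hypothetical, and the sketch assumes hard genotype calls, which low-coverage data usually do not support.

```python
def heterozygote_excess(n_ref_hom, n_het, n_alt_hom):
    """Return (observed_het, expected_het, excess) for one biallelic SNP.

    Illustrative only: assumes called genotypes and Hardy-Weinberg
    proportions as the null expectation.
    """
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    # Reference allele frequency from genotype counts
    p = (2 * n_ref_hom + n_het) / (2 * n)
    expected_het = 2 * p * (1 - p)
    observed_het = n_het / n
    return observed_het, expected_het, observed_het - expected_het


# A SNP in Hardy-Weinberg proportions shows no excess
balanced = heterozygote_excess(25, 50, 25)

# A site where every individual appears heterozygous, as can happen when
# reads from two paralogous copies map to the same reference position
collapsed = heterozygote_excess(0, 100, 0)
```

A large positive excess, as in the second call, is the kind of pattern that flags a SNP as potentially deviant; a real analysis would also need to account for read depth and allelic ratio.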
Affiliation(s)
- Xavier Dallaire
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Centre d'Études Nordiques, Université Laval, Québec, Canada
- Raphael Bouchard
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
- Philippe Hénault
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
- Gabriela Ulmo-Diaz
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
- Eric Normandeau
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
  - Plateforme de bio-informatique de l’IBIS, Université Laval, Québec, Canada
- Claire Mérot
  - CNRS, UMR 6553 ECOBIO, Université de Rennes, Rennes, France
- Louis Bernatchez
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
- Jean-Sébastien Moore
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Centre d'Études Nordiques, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
2
García-Meseguer AJ, Villastrigo A, Mirón-Gatón JM, Millán A, Velasco J, Muñoz I. Novel Microsatellite Loci, Cross-Species Validation of Multiplex Assays, and By-Catch Mitochondrial Genomes on Ochthebius Beetles from Supratidal Rockpools. Insects 2023; 14:881. [PMID: 37999080 PMCID: PMC10672297 DOI: 10.3390/insects14110881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 11/06/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023]
Abstract
Here we focus on designing, for the first time, microsatellite markers for evolutionary and ecological research on aquatic beetles from the genus Ochthebius (Coleoptera, Hydraenidae). Some of these non-model species, with high cryptic diversity, exclusively inhabit supratidal rockpools, extreme and highly dynamic habitats with important anthropogenic threats. We analysed 15 individuals of four species (O. lejolisii, O. subinteger, O. celatus, and O. quadricollis) across 10 localities from the Mediterranean coasts of Spain and Malta. Using next-generation sequencing technology, two libraries were constructed to interpret the species of the two subgenera present consistently (Ochthebius s. str., O. quadricollis; and Cobalius, the rest of the species). Finally, 20 markers (10 for each subgenus) were obtained and successfully tested by cross-validation in the four species under study. As a by-catch, we could retrieve the complete mitochondrial genomes of O. lejolisii, O. quadricollis, and O. subinteger. Interestingly, the mitochondrial genome of O. quadricollis exhibited high genetic variability compared to already published data. The novel SSR panels and mitochondrial genomes for Ochthebius will be valuable in future research on species identification, diversity, genetic structure, and population connectivity in highly dynamic and threatened habitats such as supratidal coastal rockpools.
Affiliation(s)
- Adrián Villastrigo
  - Division of Entomology, SNSB-Zoologische Staatssammlung München, 81247 Munich, Germany
- Juana María Mirón-Gatón
  - Ecology and Hydrology Department, University of Murcia, 30100 Murcia, Spain
- Andrés Millán
  - Ecology and Hydrology Department, University of Murcia, 30100 Murcia, Spain
- Josefa Velasco
  - Ecology and Hydrology Department, University of Murcia, 30100 Murcia, Spain
- Irene Muñoz
  - Department of Biodiversity, Ecology and Evolution, Complutense University of Madrid, 28040 Madrid, Spain
3
Wyngaard GA, Skern-Mauritzen R, Malde K, Prendergast R, Peruzzi S. The salmon louse genome may be much larger than sequencing suggests. Sci Rep 2022; 12:6616. [PMID: 35459797 PMCID: PMC9033869 DOI: 10.1038/s41598-022-10585-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/07/2021] [Accepted: 04/08/2022] [Indexed: 12/30/2022]
Abstract
The genome size of organisms impacts their evolution and biology and is often assumed to be characteristic of a species. Here we present the first published estimates of genome size of the ecologically and economically important ectoparasite, Lepeophtheirus salmonis (Copepoda, Caligidae). Four independent L. salmonis genome assemblies of the North Atlantic subspecies Lepeophtheirus salmonis salmonis, including two chromosome level assemblies, yield assemblies ranging from 665 to 790 Mbps. These genome assemblies are congruent in their findings, and appear very complete with Benchmarking Universal Single-Copy Orthologs analyses finding > 92% of expected genes and transcriptome datasets routinely mapping > 90% of reads. However, two cytometric techniques, flow cytometry and Feulgen image analysis densitometry, yield measurements of 1.3-1.6 Gb in the haploid genome. Interestingly, earlier cytometric measurements reported genome sizes of 939 and 567 Mbps in L. salmonis salmonis samples from Bay of Fundy and Norway, respectively. Available data thus suggest that the genome sizes of salmon lice are variable. Current understanding of eukaryotic genome dynamics suggests that the most likely explanation for such variability involves repetitive DNA, which for L. salmonis makes up ≈ 60% of the genome assemblies.
Affiliation(s)
- Grace A Wyngaard
  - Department of Biology, James Madison University, Harrisonburg, VA, USA
- Ketil Malde
  - Institute of Marine Research, Bergen, Norway
  - Department of Informatics, University of Bergen, Bergen, Norway
- Stefano Peruzzi
  - Department of Arctic Marine Biology, UiT-the Arctic University of Norway, Tromsø, Norway
4
Gwiazdowska A, Karpińska O, Kamionka-Kanclerska K, Rowiński P, Panagiotopoulou H, Pomorski JJ, Broughton RK, da Silva LFP, Rutkowski R. First microsatellite markers for the European Robin (Erithacus rubecula) and their application in analysis of parentage and genetic diversity. Sci Rep 2021; 11:18962. [PMID: 34556712 PMCID: PMC8460626 DOI: 10.1038/s41598-021-98364-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2021] [Accepted: 09/06/2021] [Indexed: 11/21/2022]
Abstract
The European Robin is a small passerine bird associated with woodlands of Eurasia and North Africa. Despite being relatively widespread and common, little is known of the species’ breeding biology and genetic diversity. We used Next Generation Sequencing (NGS) to develop and characterize microsatellite markers for the European Robin, designing three multiplex panels to amplify 14 microsatellite loci. The level of polymorphism and its value for assessing parentage and genetic structure was estimated based on 119 individuals, including seven full families and 69 unrelated individuals from Poland’s Białowieża Primaeval Forest and an additional location in Portugal. All markers appeared to be highly variable. Analysis at the family level confirmed a Mendelian manner of inheritance in the investigated loci. Genetic data also revealed evidence for extra-pair paternity in one family. The set of markers that we developed is proven to be valuable for analysis of the breeding biology and population genetics of the European Robin.
Affiliation(s)
- Aleksandra Gwiazdowska
  - Museum and Institute of Zoology, Polish Academy of Sciences, Wilcza 64, 00-679, Warsaw, Poland
- Oliwia Karpińska
  - Institute of Forest Sciences, Warsaw University of Life Sciences, Nowoursynowska 159, 02-776, Warsaw, Poland
- Patryk Rowiński
  - Institute of Forest Sciences, Warsaw University of Life Sciences, Nowoursynowska 159, 02-776, Warsaw, Poland
- Hanna Panagiotopoulou
  - Museum and Institute of Zoology, Polish Academy of Sciences, Wilcza 64, 00-679, Warsaw, Poland
- Jan J Pomorski
  - Museum and Institute of Zoology, Polish Academy of Sciences, Wilcza 64, 00-679, Warsaw, Poland
- Richard K Broughton
  - UK Centre for Ecology and Hydrology, Maclean Building, Benson Lane, Crowmarsh Gifford, Wallingford, OX10 8BB, UK
- Luis F P da Silva
  - CBIO-InBIO Campus Agrário de Vairão Rua Padre Armando Quintas, nº7, 4485-661, Vila do Conde, Portugal
- Robert Rutkowski
  - Museum and Institute of Zoology, Polish Academy of Sciences, Wilcza 64, 00-679, Warsaw, Poland
5
DNA-nanopore technology: a human perspective. Emerg Top Life Sci 2021; 5:455-463. [PMID: 34282838 DOI: 10.1042/etls20200282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2021] [Revised: 06/09/2021] [Accepted: 06/11/2021] [Indexed: 11/17/2022]
Abstract
The purpose of this article is to give a brief overview of the current state of nanopore sequencing in relation to forensic science, with a brief outline of where it stands in relation to current methods, its potential uses in forensic science, and factors which may influence acceptance of this technology by forensic practitioners, the judiciary and law enforcement. Perhaps most importantly, consideration is also given to concerns which may influence the acceptance of the technology by the general public.
6
Development of microsatellite loci and optimization of a multiplex assay for Latibulus argiolus (Hymenoptera: Ichneumonidae), the specialized parasitoid of paper wasps. Sci Rep 2020; 10:16068. [PMID: 32999353 PMCID: PMC7527953 DOI: 10.1038/s41598-020-72923-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2020] [Accepted: 09/09/2020] [Indexed: 11/09/2022]
Abstract
Microsatellite loci are commonly used markers in population genetic studies. In this study, we present 40 novel and polymorphic microsatellite loci elaborated for the ichneumonid parasitoid Latibulus argiolus (Rossi, 1790). Reaction condition optimisation procedures allowed 14 of these loci to be co-amplified in two PCRs and loaded in two multiplex panels onto a genetic analyser. The assay was tested on 197 individuals of L. argiolus originating from ten natural populations obtained from the host nests of paper wasps. The validated loci were polymorphic with high allele numbers ranging from eight to 27 (average 17.6 alleles per locus). Both observed and expected heterozygosity values were high, ranging between 0.75 and 0.92 for HO (mean 0.83) and from 0.70 to 0.90 for HE (mean 0.85). The optimized assay showed low genotyping error rate and negligible null allele frequency. The designed multiplex panels could be successfully applied in relatedness analyses and genetic variability studies of L. argiolus populations, which would be particularly interesting considering the coevolutionary context of this species with its social host.
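The HO and HE values reported above can be illustrated with a short sketch. It computes observed heterozygosity as the fraction of heterozygous individuals and expected heterozygosity as 1 minus the sum of squared allele frequencies at one locus. The abstract does not say which estimator the authors used, so the uncorrected form here is an assumption, and the function name `heterozygosity` and the toy genotypes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (H_O, H_E) for one locus.

    genotypes: list of diploid genotypes, each a pair of allele labels
    (for microsatellites, typically fragment sizes in base pairs).
    H_E is the uncorrected gene diversity, 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("need at least one genotype")
    # Observed heterozygosity: share of individuals carrying two different alleles
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies pooled over both copies in every individual
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return h_obs, h_exp


# Toy locus with three alleles scored in four individuals
toy_locus = [(150, 154), (150, 150), (154, 158), (150, 158)]
h_obs, h_exp = heterozygosity(toy_locus)
```

With small samples, a sample-size correction (multiplying H_E by 2n / (2n - 1)) is commonly applied; it is left out here to keep the sketch minimal.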
7
Vasiliskov VA, Shershov VE, Miftahov RA, Kuznetsova VE, Radko SP, Lisitsa AV, Lapa SA, Surzhikov SA, Timofeev EN, Zasedatelev AS, Chudinov AV. Slippage of the Primer Strand in the Primer Extension Reaction with Modified 2'-Deoxyuridine Triphosphates. Russian Journal of Bioorganic Chemistry 2020. [DOI: 10.1134/s106816202003022x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
8
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 159] [Impact Index Per Article: 31.8] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022]
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Bastiaan Star
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton. CB10 1SD, UK
- Patryk Jarnot
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Aleksandra Gruca
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Marcin Grynberg
  - Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Universite Montpellier 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
  - Institut de Biologie Computationnelle, 34095 Montpellier, France
- Vasilis J Promponas
  - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
- Maria Anisimova
  - Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Dirk Linke
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
9
Burkholder AB, Lujan SA, Lavender CA, Grimm SA, Kunkel TA, Fargo DC. Muver, a computational framework for accurately calling accumulated mutations. BMC Genomics 2018; 19:345. [PMID: 29743009 PMCID: PMC5944071 DOI: 10.1186/s12864-018-4753-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/22/2018] [Accepted: 05/02/2018] [Indexed: 01/01/2023]
Abstract
BACKGROUND: Identification of mutations from next-generation sequencing data typically requires a balance between sensitivity and accuracy. This is particularly true of DNA insertions and deletions (indels), which can impart significant phenotypic consequences on cells but are harder to call than substitution mutations from whole genome mutation accumulation experiments. To overcome these difficulties, we present muver, a computational framework that integrates established bioinformatics tools with novel analytical methods to generate mutation calls with the extremely low false positive rates and high sensitivity required for accurate mutation rate determination and comparison. RESULTS: Muver uses statistical comparison of ancestral and descendant allelic frequencies to identify variant loci and assigns genotypes with models that include per-sample assessments of sequencing errors by mutation type and repeat context. Muver identifies maximally parsimonious mutation pathways that connect these genotypes, differentiating potential allelic conversion events and delineating ambiguities in mutation location, type, and size. Benchmarking with a human gold standard father-son pair demonstrates muver's sensitivity and low false positive rates. In DNA mismatch repair (MMR) deficient Saccharomyces cerevisiae, muver detects multi-base deletions in homopolymers longer than the replicative polymerase footprint at rates greater than predicted for sequential single-base deletions, implying a novel multi-repeat-unit slippage mechanism. CONCLUSIONS: Benchmarking results demonstrate the high accuracy and sensitivity achieved with muver, particularly for indels, relative to available tools. Applied to an MMR-deficient Saccharomyces cerevisiae system, muver mutation calls facilitate mechanistic insights into DNA replication fidelity.
Affiliation(s)
- Adam B Burkholder
  - Integrative Bioinformatics, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- Scott A Lujan
  - Laboratory of Genomic Integrity and Structural Biology, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- Christopher A Lavender
  - Integrative Bioinformatics, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- Sara A Grimm
  - Integrative Bioinformatics, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- Thomas A Kunkel
  - Laboratory of Genomic Integrity and Structural Biology, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA
- David C Fargo
  - Integrative Bioinformatics, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, Durham, NC, 27709, USA