1
Mira-Jover A, Graciá E, Giménez A, Fritz U, Rodríguez-Caro RC, Bourgeois Y. Taking advantage of reference-guided assembly in a slowly-evolving lineage: Application to Testudo graeca. PLoS One 2024; 19:e0303408. [PMID: 39121089 PMCID: PMC11315351 DOI: 10.1371/journal.pone.0303408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 07/22/2024] [Indexed: 08/11/2024] Open
Abstract
BACKGROUND Obtaining de novo chromosome-level genome assemblies greatly enhances conservation and evolutionary biology studies. For many research teams, long-read sequencing technologies (that produce highly contiguous assemblies) remain unaffordable or unpractical. For the groups that display high synteny conservation, these limitations can be overcome by a reference-guided assembly using a close relative genome. Among chelonians, tortoises (Testudinidae) are considered one of the most endangered taxa, which calls for more genomic resources. Here we make the most of high synteny conservation in chelonians to produce the first chromosome-level genome assembly of the genus Testudo with one of the most iconic tortoise species in the Mediterranean basin: Testudo graeca. RESULTS We used high-quality, paired-end Illumina sequences to build a reference-guided assembly with the chromosome-level reference of Gopherus evgoodei. We reconstructed a 2.29 Gb haploid genome with a scaffold N50 of 107.598 Mb and 5.37% gaps. We sequenced 25,998 protein-coding genes, and identified 41.2% of the assembly as repeats. Demographic history reconstruction based on the genome revealed two events (population decline and recovery) that were consistent with previously suggested phylogeographic patterns for the species. This outlines the value of such reference-guided assemblies for phylogeographic studies. CONCLUSIONS Our results highlight the value of using close relatives to produce de novo draft assemblies in species where such resources are unavailable. Our annotated genome of T. graeca paves the way to delve deeper into the species' evolutionary history and provides a valuable resource to enhance direct conservation efforts on their threatened populations.
Affiliation(s)
- Andrea Mira-Jover
  - Ecology Area, University Institute for Agro-food and Agro-environmental Research and Innovation (CIAGRO), Miguel Hernández University, Elche, Carretera de Beniel, Orihuela (Alicante), Spain
- Eva Graciá
  - Ecology Area, University Institute for Agro-food and Agro-environmental Research and Innovation (CIAGRO), Miguel Hernández University, Elche, Carretera de Beniel, Orihuela (Alicante), Spain
- Andrés Giménez
  - Ecology Area, University Institute for Agro-food and Agro-environmental Research and Innovation (CIAGRO), Miguel Hernández University, Elche, Carretera de Beniel, Orihuela (Alicante), Spain
- Uwe Fritz
  - Museum of Zoology, Senckenberg Dresden, Dresden, Germany
2
Marcionetti A, Bertrand JAM, Cortesi F, Donati GFA, Heim S, Huyghe F, Kochzius M, Pellissier L, Salamin N. Recurrent gene flow events occurred during the diversification of clownfishes of the skunk complex. Mol Ecol 2024; 33:e17347. [PMID: 38624248 DOI: 10.1111/mec.17347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 03/15/2024] [Accepted: 03/26/2024] [Indexed: 04/17/2024]
Abstract
Clownfish (subfamily Amphiprioninae) are an iconic group of coral reef fish that evolved a mutualistic interaction with sea anemones, which triggered the adaptive radiation of the clade. Within clownfishes, the "skunk complex" is particularly interesting. Besides ecological speciation, interspecific gene flow and hybrid speciation are thought to have shaped the evolution of the group. We investigated the mechanisms characterizing the diversification of this complex. By taking advantage of their disjunct geographical distribution, we obtained whole-genome data of sympatric and allopatric populations of the three main species of the complex (Amphiprion akallopisos, A. perideraion and A. sandaracinos). We examined population structure, genomic divergence and introgression signals and performed demographic modelling to identify the most realistic diversification scenario. We excluded scenarios of strict isolation or hybrid origin of A. sandaracinos. We discovered moderate gene flow from A. perideraion to the ancestor of A. akallopisos + A. sandaracinos and weak gene flow between the species in the Indo-Australian Archipelago throughout the diversification of the group. We identified introgressed regions in A. sandaracinos and detected in A. perideraion two large regions of high divergence from the two other species. While we found that gene flow has occurred throughout the species' diversification, we also observed that recent admixture was less pervasive than initially thought, suggesting a role of host repartition or behavioural barriers in maintaining the genetic identity of the species in sympatry.
Affiliation(s)
- Anna Marcionetti
  - Department of Computational Biology, Génopode, University of Lausanne, Lausanne, Switzerland
- Joris A M Bertrand
  - Department of Computational Biology, Génopode, University of Lausanne, Lausanne, Switzerland
  - Laboratoire Génome et Développement Des Plantes (UMR 5096 UPVD/CNRS), University of Perpignan via Domitia, Perpignan, France
- Fabio Cortesi
  - School of the Environment and Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
- Giulia F A Donati
  - EAWAG Swiss Federal Institute of Aquatic Science & Technology, Dübendorf, Switzerland
  - Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland
- Sara Heim
  - Department of Computational Biology, Génopode, University of Lausanne, Lausanne, Switzerland
- Filip Huyghe
  - Marine Biology - Ecology, Evolution and Genetics, Vrije Universiteit Brussel (VUB), Pleinlaan 2, Brussels, Belgium
- Marc Kochzius
  - Marine Biology - Ecology, Evolution and Genetics, Vrije Universiteit Brussel (VUB), Pleinlaan 2, Brussels, Belgium
- Loïc Pellissier
  - Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland
  - Ecosystems and Landscape Evolution, Department of Environmental System Science, Institute of Terrestrial Ecosystems, ETH Zürich, Zurich, Switzerland
- Nicolas Salamin
  - Department of Computational Biology, Génopode, University of Lausanne, Lausanne, Switzerland
3
Rick JA, Brock CD, Lewanski AL, Golcher-Benavides J, Wagner CE. Reference Genome Choice and Filtering Thresholds Jointly Influence Phylogenomic Analyses. Syst Biol 2024; 73:76-101. [PMID: 37881861 DOI: 10.1093/sysbio/syad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 09/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023] Open
Abstract
Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that significant biases may be introduced into downstream phylogenetic analyses from processing genomic data; however, it remains unclear whether there are interactions among bioinformatic parameters or biases introduced through the choice of reference genome for sequence alignment and variant calling. We address these knowledge gaps by employing a combination of simulated and empirical data sets to investigate the extent to which the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, as well as the way that reference genome choice interacts with bioinformatic filtering choices and phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and have a higher center of gravity than the true trees. We find the greatest topological accuracy when filtering sites for minor allele count (MAC) >3-4 in our 51-taxa data sets, while tree center of gravity was closest to the true value when filtering for sites with MAC >1-2. In contrast, filtering for missing data increased accuracy in the inferred topologies; however, this effect was small in comparison to the effect of minor allele filters and may be undesirable due to a subsequent mutation spectrum distortion. The bias introduced by these filters differs based on the reference genome used in short read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses. 
These results demonstrate that attributes of the study system and dataset (and their interaction) add important nuance for how best to assemble and filter short-read genomic data for phylogenetic inference.
Affiliation(s)
- Jessica A Rick
  - School of Natural Resources & the Environment, University of Arizona, Tucson, AZ 85719, USA
- Chad D Brock
  - Department of Biological Sciences, Tarleton State University, Stephenville, TX 76401, USA
- Alexander L Lewanski
  - Department of Integrative Biology and W.K. Kellogg Biological Station, Michigan State University, East Lansing, MI 48824, USA
- Jimena Golcher-Benavides
  - Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA 50011, USA
- Catherine E Wagner
  - Program in Ecology and Evolution, University of Wyoming, Laramie, WY 82071, USA
  - Department of Botany, University of Wyoming, Laramie, WY 82071, USA
4
Sopniewski J, Catullo RA. Estimates of heterozygosity from single nucleotide polymorphism markers are context-dependent and often wrong. Mol Ecol Resour 2024; 24:e13947. [PMID: 38433491 DOI: 10.1111/1755-0998.13947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 02/18/2024] [Accepted: 02/21/2024] [Indexed: 03/05/2024]
Abstract
Genetic diversity is frequently described using heterozygosity, particularly in a conservation context. Often, it is estimated using single nucleotide polymorphisms (SNPs); however, it has been shown that heterozygosity values calculated from SNPs can be biased by both study design and filtering parameters. Though solutions have been proposed to address these issues, our own work has found them to be inadequate in some circumstances. Here, we aimed to improve the reliability and comparability of heterozygosity estimates, specifically by investigating how sample size and missing data thresholds influenced the calculation of autosomal heterozygosity (heterozygosity calculated from across the genome, i.e. fixed and variable sites). We also explored how the standard practice of tri- and tetra-allelic site exclusion could bias heterozygosity estimates and influence eventual conclusions relating to genetic diversity. Across three distinct taxa (a frog, Litoria rubella; a tree, Eucalyptus microcarpa; and a grasshopper, Keyacris scurra), we found heterozygosity estimates to be meaningfully affected by sample size and missing data thresholds, partly due to the exclusion of tri- and tetra-allelic sites. These biases were inconsistent both between species and populations, with more diverse populations tending to have their estimates more severely affected, thus having potential to dramatically alter interpretations of genetic diversity. We propose a modified framework for calculating heterozygosity that reduces bias and improves the utility of heterozygosity as a measure of genetic diversity, whilst also highlighting the need for existing population genetic pipelines to be adjusted such that tri- and tetra-allelic sites be included in calculations.
Affiliation(s)
- Jarrod Sopniewski
  - School of Biological Sciences, University of Western Australia, Crawley, Western Australia, Australia
- Renee A Catullo
  - School of Biological Sciences, University of Western Australia, Crawley, Western Australia, Australia
5
Schmidt TL, Thia JA, Hoffmann AA. How Can Genomics Help or Hinder Wildlife Conservation? Annu Rev Anim Biosci 2024; 12:45-68. [PMID: 37788416 DOI: 10.1146/annurev-animal-021022-051810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Genomic data are becoming increasingly affordable and easy to collect, and new tools for their analysis are appearing rapidly. Conservation biologists are interested in using this information to assist in management and planning but are typically limited financially and by the lack of genomic resources available for non-model taxa. It is therefore important to be aware of the pitfalls as well as the benefits of applying genomic approaches. Here, we highlight recent methods aimed at standardizing population assessments of genetic variation, inbreeding, and forms of genetic load and methods that help identify past and ongoing patterns of genetic interchange between populations, including those subjected to recent disturbance. We emphasize challenges in applying some of these methods and the need for adequate bioinformatic support. We also consider the promises and challenges of applying genomic approaches to understand adaptive changes in natural populations to predict their future adaptive capacity.
Affiliation(s)
- Thomas L Schmidt
  - School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, Victoria, Australia
- Joshua A Thia
  - School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, Victoria, Australia
- Ary A Hoffmann
  - School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, Victoria, Australia
6
Deng X, Frandsen PB, Dikow RB, Favre A, Shah DN, Shah RDT, Schneider JV, Heckenhauer J, Pauls SU. The impact of sequencing depth and relatedness of the reference genome in population genomic studies: A case study with two caddisfly species (Trichoptera, Rhyacophilidae, Himalopsyche). Ecol Evol 2022; 12:e9583. [PMID: 36523526 PMCID: PMC9745013 DOI: 10.1002/ece3.9583] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/22/2022] [Revised: 11/10/2022] [Accepted: 11/16/2022] [Indexed: 12/15/2022] Open
Abstract
Whole genome sequencing for generating SNP data is increasingly used in population genetic studies. However, obtaining genomes for massive numbers of samples is still not within the budgets of many researchers. It is thus imperative to select an appropriate reference genome and sequencing depth to ensure the accuracy of the results for a specific research question, while balancing cost and feasibility. To evaluate the effect of the choice of the reference genome and sequencing depth on downstream analyses, we used five confamilial reference genomes of variable relatedness and three levels of sequencing depth (3.5×, 7.5× and 12×) in a population genomic study on two caddisfly species: Himalopsyche digitata and H. tibetana. Using these 30 datasets (five reference genomes × three depths × two target species), we estimated population genetic indices (inbreeding coefficient, nucleotide diversity, pairwise FST, and genome-wide distribution of FST) based on variants and population structure (PCA and admixture) based on genotype likelihood estimates. The results showed that both distantly related reference genomes and lower sequencing depth lead to degradation of resolution. In addition, choosing a more closely related reference genome may significantly remedy the defects caused by low depth. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high-quality reference genome continue to decrease. However, to determine a cost-efficient strategy for a specific population genomic study, a trade-off between reference genome relatedness and sequencing depth can be considered.
Affiliation(s)
- Xi‐Ling Deng
  - Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
  - Institute of Insect Biotechnology, Justus‐Liebig‐University Gießen, Gießen, Germany
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
- Paul B. Frandsen
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
  - Department of Plant & Wildlife Sciences, Brigham Young University, Provo, Utah, USA
  - Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Rebecca B. Dikow
  - Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Adrien Favre
  - Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
  - Regional Nature Park of the Trient Valley, Salvan, Switzerland
- Deep Narayan Shah
  - Central Department of Environmental Science, Tribhuvan University, Kirtipur, Nepal
- Ram Devi Tachamo Shah
  - Aquatic Ecology Centre, School of Science, Kathmandu University, Dhulikhel, Nepal
  - Department of Life Sciences, School of Science, Kathmandu University, Dhulikhel, Nepal
- Julio V. Schneider
  - Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Jacqueline Heckenhauer
  - Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
- Steffen U. Pauls
  - Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
  - Institute of Insect Biotechnology, Justus‐Liebig‐University Gießen, Gießen, Germany
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
7
Prasad A, Lorenzen ED, Westbury MV. Evaluating the role of reference-genome phylogenetic distance on evolutionary inference. Mol Ecol Resour 2021; 22:45-55. [PMID: 34176238 DOI: 10.1111/1755-0998.13457] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/03/2021] [Revised: 05/26/2021] [Accepted: 06/23/2021] [Indexed: 12/15/2022]
Abstract
When a high-quality genome assembly of a target species is unavailable, an option to avoid the costly de novo assembly process is a mapping-based assembly. However, mapping shotgun data to a distant relative may lead to biased or erroneous evolutionary inference. Here, we used short-read data from a mammal (beluga whale) and a bird species (rowi kiwi) to evaluate whether reference genome phylogenetic distance can impact downstream demographic (Pairwise Sequentially Markovian Coalescent) and genetic diversity (heterozygosity, runs of homozygosity) analyses. We mapped to assemblies of species of varying phylogenetic distance (from conspecific to genome-wide divergence of >7%), and de novo assemblies created using cross-species scaffolding. We show that while reference genome phylogenetic distance has an impact on demographic analyses, it is not pronounced until using a reference genome with >3% divergence from the target species. When mapping to cross-species scaffolded assemblies, we are unable to replicate the original beluga demographic results, but are able with the rowi kiwi, presumably reflecting the more fragmented nature of the beluga assemblies. We find that increased phylogenetic distance has a pronounced impact on genetic diversity estimates; heterozygosity estimates deviate incrementally with increasing phylogenetic distance. Moreover, runs of homozygosity are largely undetectable when mapping to any nonconspecific assembly. However, these biases can be reduced when mapping to a cross-species scaffolded assembly. Taken together, our results show that caution should be exercised when selecting reference genomes. Cross-species scaffolding may offer a way to avoid a costly, traditional de novo assembly, while still producing robust, evolutionary inference.
Affiliation(s)
- Aparna Prasad
  - GLOBE Institute, University of Copenhagen, Copenhagen, Denmark