1
Benham PM, Cicero C, Escalona M, Beraut E, Fairbairn C, Marimuthu MPA, Nguyen O, Sahasrabudhe R, King BL, Thomas WK, Kovach AI, Nachman MW, Bowie RCK. Remarkably High Repeat Content in the Genomes of Sparrows: The Importance of Genome Assembly Completeness for Transposable Element Discovery. Genome Biol Evol 2024; 16:evae067. [PMID: 38566597 PMCID: PMC11088854 DOI: 10.1093/gbe/evae067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/01/2024] [Accepted: 03/23/2024] [Indexed: 04/04/2024]
Abstract
Transposable elements (TEs) play critical roles in shaping genome evolution. Highly repetitive TE sequences are also a major source of assembly gaps making it difficult to fully understand the impact of these elements on host genomes. The increased capacity of long-read sequencing technologies to span highly repetitive regions promises to provide new insights into patterns of TE activity across diverse taxa. Here we report the generation of highly contiguous reference genomes using PacBio long-read and Omni-C technologies for three species of Passerellidae sparrow. We compared these assemblies to three chromosome-level sparrow assemblies and nine other sparrow assemblies generated using a variety of short- and long-read technologies. All long-read based assemblies were longer (range: 1.12 to 1.41 Gb) than short-read assemblies (0.91 to 1.08 Gb) and assembly length was strongly correlated with the amount of repeat content. Repeat content for Bell's sparrow (31.2% of genome) was the highest level ever reported within the order Passeriformes, which comprises over half of avian diversity. The highest levels of repeat content (79.2% to 93.7%) were found on the W chromosome relative to other regions of the genome. Finally, we show that proliferation of different TE classes varied even among species with similar levels of repeat content. These patterns support a dynamic model of TE expansion and contraction even in a clade where TEs were once thought to be fairly depauperate and static. Our work highlights how the resolution of difficult-to-assemble regions of the genome with new sequencing technologies promises to transform our understanding of avian genome evolution.
Affiliation(s)
- Phred M Benham
- Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Carla Cicero
- Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, CA 94720, USA
- Merly Escalona
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Eric Beraut
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Colin Fairbairn
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Mohan P A Marimuthu
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California-Davis, Davis, CA 95616, USA
- Oanh Nguyen
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California-Davis, Davis, CA 95616, USA
- Ruta Sahasrabudhe
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California-Davis, Davis, CA 95616, USA
- Benjamin L King
- Department of Molecular and Biomedical Sciences, University of Maine, Orono, ME 04469, USA
- W Kelley Thomas
- Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824, USA
- Adrienne I Kovach
- Department of Natural Resources and the Environment, University of New Hampshire, Durham, NH 03824, USA
- Michael W Nachman
- Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Rauri C K Bowie
- Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
2
Clark JD, Benham PM, Maldonado JE, Luther DA, Lim HC. Maintenance of local adaptation despite gene flow in a coastal songbird. Evolution 2022; 76:1481-1494. [PMID: 35700208 PMCID: PMC9545442 DOI: 10.1111/evo.14538] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/15/2021] [Revised: 03/09/2022] [Accepted: 03/19/2022] [Indexed: 01/22/2023]
Abstract
Adaptation to local environments is common in widespread species and the basis of ecological speciation. The song sparrow (Melospiza melodia) is a widespread, polytypic passerine that occurs in shrubland habitats throughout North America. We examined the population structure of two parapatric subspecies that inhabit different environments: the Atlantic song sparrow (M. m. atlantica), a coastal specialist, and the eastern song sparrow (M. m. melodia), a shrubland generalist. These populations lacked clear mitochondrial population structure, yet coastal birds formed a distinct nuclear genetic cluster. We found weak overall genomic differentiation between these subspecies, suggesting either recent divergence, extensive gene flow, or a combination thereof. There was a steep genetic cline at the transition to coastal habitats, consistent with isolation by environment, not isolation by distance. A phenotype under divergent selection, bill size, varied with the amount of coastal ancestry in transitional areas, but larger bill size was maintained in coastal habitats regardless of ancestry, further supporting a role for selection in the maintenance of these subspecies. Demographic modeling suggested a divergence history of limited gene flow followed by secondary contact, which has emerged as a common theme in adaptive divergence across taxa.
Affiliation(s)
- Jonathan D. Clark
- Department of Environmental Science and Policy, George Mason University, Fairfax, Virginia 22030
- Current Address: Department of Natural Resources and the Environment, University of New Hampshire, Durham, New Hampshire 03824
- Phred M. Benham
- Museum of Vertebrate Zoology, University of California, Berkeley, Berkeley, California 94720
- Jesus E. Maldonado
- Department of Environmental Science and Policy, George Mason University, Fairfax, Virginia 22030
- Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Washington, D.C. 20013
- David A. Luther
- Department of Biology, George Mason University, Fairfax, Virginia 22030
- Haw Chuan Lim
- Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Washington, D.C. 20013
- Department of Biology, George Mason University, Fairfax, Virginia 22030
3
Friis G, Vizueta J, Ketterson ED, Milá B. A high-quality genome assembly and annotation of the dark-eyed junco Junco hyemalis, a recently diversified songbird. G3 (BETHESDA, MD.) 2022; 12:jkac083. [PMID: 35404451 PMCID: PMC9157146 DOI: 10.1093/g3journal/jkac083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/30/2022] [Accepted: 03/31/2022] [Indexed: 11/26/2022]
Abstract
The dark-eyed junco (Junco hyemalis) is one of the most common passerines of North America, and has served as a model organism in studies related to ecophysiology, behavior, and evolutionary biology for over a century. It is composed of at least 6 distinct, geographically structured forms of recent evolutionary origin, presenting remarkable variation in phenotypic traits, migratory behavior, and habitat. Here, we report a high-quality genome assembly and annotation of the dark-eyed junco generated using a combination of shotgun libraries and proximity ligation Chicago and Dovetail Hi-C libraries. The final assembly is ∼1.03 Gb in size, with 98.3% of the sequence located in 30 full or nearly full chromosome scaffolds, and with an N50/L50 of 71.3 Mb/5 scaffolds. We identified 19,026 functional genes combining gene prediction and similarity approaches, of which 15,967 were associated with GO terms. The genome assembly and the set of annotated genes yielded 95.4% and 96.2% completeness scores, respectively, when compared with the BUSCO avian dataset. This new assembly for J. hyemalis provides a valuable resource for genome evolution analysis, and for identifying functional genes involved in adaptive processes and speciation.
Affiliation(s)
- Guillermo Friis
- Department of Biodiversity and Evolutionary Biology, National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid 28006, Spain
- Joel Vizueta
- Centre for Social Evolution, University of Copenhagen, Copenhagen 1165, Denmark
- Ellen D Ketterson
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Borja Milá
- Department of Biodiversity and Evolutionary Biology, National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid 28006, Spain
4
Goubert C, Craig RJ, Bilat AF, Peona V, Vogan AA, Protasio AV. A beginner's guide to manual curation of transposable elements. Mob DNA 2022; 13:7. [PMID: 35354491 PMCID: PMC8969392 DOI: 10.1186/s13100-021-00259-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 11/01/2021] [Accepted: 12/17/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND In the study of transposable elements (TEs), the generation of a high confidence set of consensus sequences that represent the diversity of TEs found in a given genome is a key step in the path to investigate these fascinating genomic elements. Many algorithms and pipelines are available to automatically identify putative TE families present in a genome. Despite the availability of these valuable resources, producing a library of high-quality full-length TE consensus sequences largely remains a process of manual curation. This know-how is often passed on from mentor to mentee within research groups, making it difficult for those outside the field to access this highly specialised skill. RESULTS Our manuscript attempts to fill this gap by providing a set of detailed computer protocols, software recommendations and video tutorials for those aiming to manually curate TEs. Detailed step-by-step protocols, aimed at the complete beginner, are presented in the Supplementary Methods. CONCLUSIONS The proposed set of programs and tools presented here will make the process of manual curation achievable and amenable to all researchers, and especially to those new to the field of TEs.
Affiliation(s)
- Clement Goubert
- Canadian Center for Computational Genomics, McGill University, Montreal, Québec Canada
- Department of Human Genetics, McGill University, Montreal, Québec Canada
- Rory J. Craig
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3FL UK
- Agustin F. Bilat
- Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Valentina Peona
- Department of Organismal Biology, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
- Aaron A. Vogan
- Department of Organismal Biology, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
- Anna V. Protasio
- Department of Pathology, Tennis Court Road, Cambridge, CB1 2PQ UK
- Christ’s College, St Andrews Street, Cambridge, CB2 3BU UK
5
Gamboa MP, Ghalambor CK, Sillett TS, Morrison SA, Funk WC. Adaptive divergence in bill morphology and other thermoregulatory traits is facilitated by restricted gene flow in song sparrows on the California Channel Islands. Mol Ecol 2021; 31:603-619. [PMID: 34704295 DOI: 10.1111/mec.16253] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/16/2021] [Revised: 09/20/2021] [Accepted: 09/27/2021] [Indexed: 02/06/2023]
Abstract
Disentangling the effects of neutral and adaptive processes in maintaining phenotypic variation across environmental gradients is challenging in natural populations. Song sparrows (Melospiza melodia) on the California Channel Islands occupy a pronounced east-west climate gradient within a small spatial scale, providing a unique opportunity to examine the interaction of genetic isolation (reduced gene flow) and the environment (selection) in driving variation. We used reduced representation genomic libraries to infer the role of neutral processes (drift and restricted gene flow) and divergent selection in driving variation in thermoregulatory traits with an emphasis on the mechanisms that maintain bill divergence among islands. Analyses of 22,029 neutral SNPs confirm distinct population structure by island with restricted gene flow and relatively large effective population sizes, suggesting bill differences are probably not a product of genetic drift. Instead, we found strong support for local adaptation using 3294 SNPs in differentiation-based and environmental association analyses coupled with genome-wide association tests. Specifically, we identified several putatively adaptive and candidate loci in or near genes involved in bill development pathways (e.g., BMP, CaM, Wnt), confirming the highly complex and polygenic architecture underlying bill morphology. Furthermore, we found divergence in genes associated with other thermoregulatory traits (i.e., feather structure, plumage colour, and physiology). Collectively, these results suggest strong divergent selection across an island archipelago results in genomic changes in a suite of traits associated with climate adaptation over small spatial scales. Future research should move beyond studying univariate traits to better understand multidimensional responses to complex environmental conditions.
Affiliation(s)
- Maybellene P Gamboa
- Department of Organismal Biology and Ecology, Colorado College, Colorado Springs, Colorado, USA
- Cameron K Ghalambor
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, Colorado, USA
- Department of Biology, Centre for Biodiversity Dynamics (CBD), Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- T Scott Sillett
- Migratory Bird Center, Smithsonian Conservation Biology Institute, National Zoological Park, Washington, District of Columbia, USA
- W Chris Funk
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, Colorado, USA
6
Boyd RJ, Denommé MR, Grieves LA, MacDougall-Shackleton EA. Stronger population differentiation at infection-sensing than infection-clearing innate immune loci in songbirds: Different selective regimes for different defenses. Evolution 2021; 75:2736-2746. [PMID: 34596241 DOI: 10.1111/evo.14368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2020] [Revised: 08/30/2021] [Accepted: 09/14/2021] [Indexed: 10/20/2022]
Abstract
Parasite-mediated selection is widespread at loci involved in immune defense, but different defenses may experience different selective regimes. For defenses involved in clearing infections, purifying selection favoring a single most efficacious allele likely predominates. However, for defenses involved in sensing and recognizing infections, evolutionary arms races may make positive selection particularly important. This could manifest primarily within populations (e.g., balancing selection maintaining variation) or among them (e.g., spatially varying selection enhancing population differences in allele frequencies). We genotyped three toll-like receptors (TLR; involved in sensing infections) and three avian beta-defensins (involved in clearing infections) in 96 song sparrows (Melospiza melodia) from three breeding populations that differ in disease resistance. Variation-based indicators of selection (proportion of variable sites, proportion of nonsynonymous SNPs, proportion of sites bearing signatures of positive or purifying selection, rare allele frequencies) did not differ appreciably between the two locus types. However, differentiation was generally higher at infection-sensing than infection-clearing loci. Allele frequencies differed markedly at TLR3, driven by a variant predicted to alter protein function. Geographically structured variants at infection-sensing loci may reflect local adaptation to spatially heterogeneous parasite communities. Selective regimes experienced by infection-sensing versus infection-clearing loci may differ primarily due to parasite-mediated population differentiation.
Affiliation(s)
- Rachel J Boyd
- Department of Biology, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, 21205
- Melanie R Denommé
- Department of Biology, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- Department of Biological Sciences, Brock University Faculty of Mathematics & Science, St. Catharines, Ontario, L2S 3A1, Canada
- Leanne A Grieves
- Department of Biology, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, Ontario, L8S 4M4, Canada
7
Recuerda M, Vizueta J, Cuevas-Caballé C, Blanco G, Rozas J, Milá B. Chromosome-Level Genome Assembly of the Common Chaffinch (Aves: Fringilla coelebs): A Valuable Resource for Evolutionary Biology. Genome Biol Evol 2021; 13:evab034. [PMID: 33616654 PMCID: PMC8046334 DOI: 10.1093/gbe/evab034] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 02/16/2021] [Indexed: 12/26/2022]
Abstract
The common chaffinch, Fringilla coelebs, is one of the most common, widespread, and well-studied passerines in Europe, with a broad distribution encompassing Western Europe and parts of Asia, North Africa, and the Macaronesian archipelagos. We present a high-quality genome assembly of the common chaffinch generated using Illumina shotgun sequencing in combination with Chicago and Hi-C libraries. The final genome is a 994.87-Mb chromosome-level assembly, with 98% of the sequence data located in chromosome scaffolds and an N50 statistic of 69.73 Mb. Our genome assembly shows high completeness, with a complete BUSCO score of 93.9% using the avian data set. Around 7.8% of the genome contains interspersed repetitive elements. The structural annotation yielded 17,703 genes, 86.5% of which have a functional annotation, including 7,827 complete universal single-copy orthologs out of 8,338 genes represented in the BUSCO avian data set. This new annotated genome assembly will be a valuable resource as a reference for comparative and population genomic analyses of passerine, avian, and vertebrate evolution.
Affiliation(s)
- María Recuerda
- National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid, Spain
- Joel Vizueta
- National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid, Spain
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Cristian Cuevas-Caballé
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Guillermo Blanco
- National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid, Spain
- Julio Rozas
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Borja Milá
- National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid, Spain
8
Wiley G, Miller MJ. A Highly Contiguous Genome for the Golden-Fronted Woodpecker (Melanerpes aurifrons) via Hybrid Oxford Nanopore and Short Read Assembly. G3 (BETHESDA, MD.) 2020; 10:1829-1836. [PMID: 32317270 PMCID: PMC7263694 DOI: 10.1534/g3.120.401059] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/09/2020] [Accepted: 04/17/2020] [Indexed: 12/31/2022]
Abstract
Woodpeckers are found in nearly every part of the world and have been important for studies of biogeography, phylogeography, and macroecology. Woodpecker hybrid zones are often studied to understand the dynamics of introgression between bird species. Notably, woodpeckers are gaining attention for their enriched levels of transposable elements (TEs) relative to most other birds. This enrichment of TEs may have substantial effects on molecular evolution. However, comparative studies of woodpecker genomes are hindered by the fact that no high-contiguity genome exists for any woodpecker species. Using hybrid assembly methods combining long-read Oxford Nanopore and short-read Illumina sequencing data, we generated a highly contiguous genome assembly for the Golden-fronted Woodpecker (Melanerpes aurifrons). The final assembly is 1.31 Gb and comprises 441 contigs plus a full mitochondrial genome. Half of the assembly is represented by 28 contigs (contig L50); each of these contigs is at least 16 Mb in size (contig N50). High recovery (92.6%) of bird-specific BUSCO genes suggests our assembly is both relatively complete and relatively accurate. Over a quarter (25.8%) of the genome consists of repetitive elements, with 287 Mb (21.9%) of those elements assignable to the CR1 superfamily of transposable elements, the highest proportion of CR1 repeats reported for any bird genome to date. Our assembly should improve comparative studies of molecular evolution and genomics in woodpeckers and allies. Additionally, the sequencing and bioinformatic resources used to generate this assembly were relatively low-cost and should provide a direction for development of high-quality genomes for studies of animal biodiversity.
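The abstract above defines contig L50 and N50 in words: L50 is the number of longest contigs needed to cover half the assembly, and N50 is the length of the shortest contig in that set. As a minimal sketch of that definition (not the authors' pipeline or any specific assembly-statistics tool), the Python below sorts sequence lengths from longest to shortest and stops once the running total reaches at least half of the total length. The function name `n50_l50` and the toy lengths are hypothetical illustrations, not data from this assembly; real tools may differ slightly in how they treat ties and the exact 50% boundary.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    Hypothetical helper. N50 is the length of the shortest sequence in the
    smallest set of longest sequences whose combined length is at least half
    of the total; L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length, count


if __name__ == "__main__":
    toy_lengths_mb = [10, 8, 6, 4, 2]  # hypothetical contig lengths in Mb
    n50, l50 = n50_l50(toy_lengths_mb)
    print(f"N50 = {n50} Mb, L50 = {l50}")  # N50 = 8 Mb, L50 = 2
```

With these toy values the total is 30 Mb, so the two longest contigs (10 + 8 = 18 Mb) are the first set to pass 15 Mb, giving L50 = 2 and N50 = 8 Mb, mirroring the "28 contigs, each at least 16 Mb" reading in the abstract.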
Affiliation(s)
- Graham Wiley
- Clinical Genomics Center, Oklahoma Medical Research Foundation, Oklahoma City, Oklahoma
- Matthew J Miller
- Sam Noble Oklahoma Museum of Natural History and Department of Biology, University of Oklahoma, Norman, Oklahoma