1
Baldwin-Brown JG, Villa SM, Vickrey AI, Johnson KP, Bush SE, Clayton DH, Shapiro MD. The assembled and annotated genome of the pigeon louse Columbicola columbae, a model ectoparasite. G3 (Bethesda) 2021; 11:jkab009. [PMID: 33604673 PMCID: PMC8022949 DOI: 10.1093/g3journal/jkab009] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/14/2020] [Accepted: 12/13/2020] [Indexed: 01/01/2023]
Abstract
The pigeon louse Columbicola columbae is a longstanding and important model for studies of ectoparasitism and host-parasite coevolution. However, a deeper understanding of its evolution and capacity for rapid adaptation is limited by a lack of genomic resources. Here, we present a high-quality draft assembly of the C. columbae genome, produced using a combination of Oxford Nanopore, Illumina, and Hi-C technologies. The final assembly is 208 Mb in length, with 12 chromosome-size scaffolds representing 98.1% of the assembly. For gene model prediction, we used a novel clustering method (wavy_choose) for Oxford Nanopore RNA-seq reads to feed into the MAKER annotation pipeline. High recovery of conserved single-copy orthologs (BUSCOs) suggests that our assembly and annotation are both highly complete and highly accurate. Consistent with the results of the only other assembled louse genome, Pediculus humanus, we find that C. columbae has a relatively low density of repetitive elements, the majority of which are DNA transposons. Also similar to P. humanus, we find a reduced number of genes encoding opsins, G protein-coupled receptors, odorant receptors, insulin signaling pathway components, and detoxification proteins in the C. columbae genome, relative to other insects. We propose that such losses might characterize the genomes of obligate, permanent ectoparasites with predictable habitats, limited foraging complexity, and simple dietary regimes. The sequencing and analysis for this genome were relatively low cost, and took advantage of a new clustering technique for Oxford Nanopore RNAseq reads that will be useful to future genome projects.
Affiliation(s)
- Scott M Villa
  - School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
  - Department of Biology, O. Wayne Rollins Research Center, Emory University, Atlanta, GA 30322, USA
- Anna I Vickrey
  - School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Kevin P Johnson
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL 61820, USA
- Sarah E Bush
  - School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Dale H Clayton
  - School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Michael D Shapiro
  - School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
2
Abstract
Vernal pools are unique in their isolation and the strong selection acting on their resident species. Vernal pool clam shrimp (Eulimnadia texana) are a promising model due to ease of culturing, short generation time, small genomes, and obligate desiccated diapaused eggs. Clam shrimp are also androdioecious (sexes include males and hermaphrodites), and here we use population-scaled recombination rates to support the hypothesis that the heterogametic sex is recombination free in these shrimp. We collected short-read sequence data from pooled samples from different vernal pools to gain insights into local adaptation. We identify genomic regions in which some populations have allele frequencies that differ significantly from the metapopulation. BayPass (Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201(4):1555-1579.) detected 19 such genomic regions showing an excess of population subdivision. These regions are on average 550 bp in size and have 2.5 genes within 5 kb of them. Genes located near these regions are involved in Malpighian tubule function and osmoregulation, an essential function in vernal pools. It is likely that salinity profiles vary between pools and over time, and that variants at these genes are adapted to local salinity conditions.
Affiliation(s)
- Anthony D Long
  - Department of Ecology and Evolutionary Biology, University of California Irvine
3
Apitanyasai K, Huang SW, Ng TH, He ST, Huang YH, Chiu SP, Tseng KC, Lin SS, Chang WC, Baldwin-Brown JG, Long AD, Lo CF, Yu HT, Wang HC. The gene structure and hypervariability of the complete Penaeus monodon Dscam gene. Sci Rep 2019; 9:16595. [PMID: 31719551 PMCID: PMC6851185 DOI: 10.1038/s41598-019-52656-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/23/2019] [Accepted: 10/17/2019] [Indexed: 12/19/2022]
Abstract
Using two advanced sequencing approaches, Illumina and PacBio, we derive the entire Dscam gene from an M2 assembly of the complete Penaeus monodon genome. The P. monodon Dscam (PmDscam) gene is ~266 kbp, with a total of 44 exons, 5 of which are subject to alternative splicing. PmDscam has a conserved architectural structure consisting of an extracellular region with hypervariable Ig domains, a transmembrane domain, and a cytoplasmic tail. We show that, contrary to a previous report, there are in fact 26, 81 and 26 alternative exons in N-terminal Ig2, N-terminal Ig3 and the entirety of Ig7, respectively. We also identified two alternatively spliced exons in the cytoplasmic tail, with transmembrane domains in exon variants 32.1 and 32.2, and stop codons in exon variants 44.1 and 44.2. This means that alternative splicing is involved in the selection of the stop codon. There are also 7 non-constitutive cytoplasmic tail exons that can either be included or skipped. Alternative splicing and the non-constitutive exons together produce more than 21 million isoform combinations from one PmDscam locus in the P. monodon gene. A public-facing database that allows BLAST searches of all 175 exons in the PmDscam gene has been established at http://pmdscam.dbbs.ncku.edu.tw/.
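As a rough check on the scale of these numbers, the short sketch below multiplies the alternative-exon counts given in the abstract. It assumes that every choice is made independently, which the abstract does not state, so the result is a naive upper bound rather than the authors' own calculation. The variable names are illustrative.

```python
from math import prod

# Alternative exon counts reported in the abstract.
ig_exon_choices = {"Ig2 (N-terminal)": 26, "Ig3 (N-terminal)": 81, "Ig7": 26}

# Extracellular combinations: one exon picked from each Ig cluster.
ig_combinations = prod(ig_exon_choices.values())

# Assumption (not from the source): the two exon-32 variants, the two exon-44
# variants, and the 7 skippable tail exons all vary independently.
tail_combinations = 2 * 2 * 2 ** 7

naive_total = ig_combinations * tail_combinations
print(f"Ig combinations: {ig_combinations:,}")
print(f"Naive total:     {naive_total:,}")
```

Under these assumptions the Ig exons alone give 54,756 combinations and the naive total is 28,035,072. That is consistent with the abstract's "more than 21 million", although the published figure presumably reflects constraints among the tail exons that the abstract does not spell out.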
Affiliation(s)
- Kantamas Apitanyasai
  - Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan
  - International Center for the Scientific Development of Shrimp Aquaculture, National Cheng Kung University, Tainan, Taiwan
- Shiao-Wei Huang
  - Department of Life Sciences, National Taiwan University, Taipei, Taiwan
- Tze Hann Ng
  - Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan
  - International Center for the Scientific Development of Shrimp Aquaculture, National Cheng Kung University, Tainan, Taiwan
- Shu-Ting He
  - Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan
  - International Center for the Scientific Development of Shrimp Aquaculture, National Cheng Kung University, Tainan, Taiwan
- Yu-Hsun Huang
  - Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan
  - International Center for the Scientific Development of Shrimp Aquaculture, National Cheng Kung University, Tainan, Taiwan
- Shen-Po Chiu
  - Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan
- Kuan-Chien Tseng
  - Department of Life Sciences, National Cheng Kung University, Tainan, Taiwan
- Shih-Shun Lin
  - Institute of Biotechnology, National Taiwan University, Taipei, Taiwan
- Wen-Chi Chang
  - Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan, Taiwan
- James G Baldwin-Brown
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, California, USA
- Anthony D Long
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, California, USA
- Chu-Fang Lo
  - Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan
  - International Center for the Scientific Development of Shrimp Aquaculture, National Cheng Kung University, Tainan, Taiwan
- Hon-Tsen Yu
  - Department of Life Sciences, National Taiwan University, Taipei, Taiwan
- Han-Ching Wang
  - Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan
  - International Center for the Scientific Development of Shrimp Aquaculture, National Cheng Kung University, Tainan, Taiwan
4
Baldwin-Brown JG, Weeks SC, Long AD. A New Standard for Crustacean Genomes: The Highly Contiguous, Annotated Genome Assembly of the Clam Shrimp Eulimnadia texana Reveals HOX Gene Order and Identifies the Sex Chromosome. Genome Biol Evol 2018; 10:143-156. [PMID: 29294012 PMCID: PMC5765565 DOI: 10.1093/gbe/evx280] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Accepted: 12/23/2017] [Indexed: 02/06/2023]
Abstract
Vernal pool clam shrimp (Eulimnadia texana) are a promising model system due to their ease of lab culture, short generation time, modest-sized genome, a somewhat rare stable androdioecious sex determination system, and a requirement to reproduce via desiccated diapaused eggs. We generated a highly contiguous genome assembly using 46× of PacBio long read data and 216× of Illumina short reads, and annotated it using Illumina RNAseq obtained from adult males or hermaphrodites. Of the 120 Mb genome, 85% is contained in the largest eight contigs, the smallest of which is 4.6 Mb. The assembly contains 98% of transcripts predicted via RNAseq. This assembly is qualitatively different from scaffolded Illumina assemblies: it is produced from long reads that contain sequence data along their entire length, and is thus gap free. The contiguity of the assembly allows us to order the HOX genes within the genome, identifying two loci that contain HOX gene orthologs, and which approximately maintain the order observed in other arthropods. We identified a partial duplication of the Antennapedia complex adjacent to the few genes homologous to the Bithorax locus. Because the sex chromosome of an androdioecious species is of special interest, we used existing allozyme and microsatellite markers to identify the E. texana sex chromosome, and find that it comprises nearly half of the genome of this species. Linkage patterns indicate that recombination is extremely rare and perhaps absent in hermaphrodites, and as a result the location of the sex determining locus will be difficult to refine using recombination mapping.
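The contiguity figure in this abstract (85% of the genome in the largest eight contigs) is a cumulative-length statistic. A minimal sketch of how such a figure can be computed from a list of contig lengths follows. The lengths are invented for illustration and are not the E. texana contigs, and the function name is hypothetical.

```python
def fraction_in_largest(contig_lengths, k):
    """Return the fraction of total assembly length held by the k largest contigs."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    ordered = sorted(contig_lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical contig lengths in Mb (illustrative only, not the real assembly).
example_lengths_mb = [30, 20, 15, 10, 8, 7, 6, 4, 3, 3, 2, 2]

share = fraction_in_largest(example_lengths_mb, 8)
print(f"Largest 8 contigs hold {share:.1%} of the assembly")
```

In practice the lengths would come from the assembly itself, for example from the sequence lengths in a FASTA index.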
Affiliation(s)
- Anthony D Long
  - Department of Ecology and Evolutionary Biology, University of California Irvine
5
Chakraborty M, Baldwin-Brown JG, Long AD, Emerson JJ. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res 2016; 44:e147. [PMID: 27458204 PMCID: PMC5100563 DOI: 10.1093/nar/gkw654] [Citation(s) in RCA: 185] [Impact Index Per Article: 23.1] [Received: 12/11/2015] [Accepted: 07/09/2016] [Indexed: 01/19/2023]
Abstract
Genome assemblies that are accurate, complete and contiguous are essential for identifying important structural and functional elements of genomes and for identifying genetic variation. Nevertheless, most recent genome assemblies remain incomplete and fragmented. While long molecule sequencing promises to deliver more complete genome assemblies with fewer gaps, concerns about error rates, low yields, stringent DNA requirements and uncertainty about best practices may discourage many investigators from adopting this technology. Here, in conjunction with the platinum standard Drosophila melanogaster reference genome, we analyze recently published long molecule sequencing data to identify what governs completeness and contiguity of genome assemblies. We also present a hybrid meta-assembly approach that achieves remarkable assembly contiguity for both Drosophila and human assemblies with only modest long molecule sequencing coverage. Our results motivate a set of preliminary best practices for obtaining accurate and contiguous assemblies, a ‘missing manual’ that guides key decisions in building high quality de novo genome assemblies, from DNA isolation to polishing the assembly.
Affiliation(s)
- Mahul Chakraborty
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA 92697, USA
- James G Baldwin-Brown
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA 92697, USA
- Anthony D Long
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA 92697, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, CA 92697, USA
- J J Emerson
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA 92697, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, CA 92697, USA
6
Baldwin-Brown JG, Long AD, Thornton KR. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol 2014; 31:1040-55. [PMID: 24441104 PMCID: PMC3969567 DOI: 10.1093/molbev/msu048] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Indexed: 01/10/2023]
Abstract
A novel approach for dissecting complex traits is to experimentally evolve laboratory populations under a controlled environment shift, resequence the resulting populations, and identify single nucleotide polymorphisms (SNPs) and/or genomic regions highly diverged in allele frequency. To better understand the power and localization ability of such an evolve and resequence (E&R) approach, we carried out forward-in-time population genetics simulations of 1 Mb genomic regions under a large combination of experimental conditions, then attempted to detect significantly diverged SNPs. Our analysis indicates that the ability to detect differentiation between populations is primarily affected by selection coefficient, population size, number of replicate populations, and number of founding haplotypes. We estimate that E&R studies can detect and localize causative sites with 80% success or greater when the number of founder haplotypes is over 500, experimental populations are replicated at least 25-fold, population size is at least 1,000 diploid individuals, and the selection coefficient on the locus of interest is at least 0.1. More achievable experimental designs (less replicated, fewer founder haplotypes, smaller effective population size, and smaller selection coefficients) can have power of greater than 50% to identify a handful of SNPs of which one is likely causative. Similarly, in cases where s ≥ 0.2, less demanding experimental designs can yield high power.
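To make the evolve and resequence setup concrete, here is a minimal forward-in-time sketch of a single biallelic locus evolving under selection and drift in replicate populations. It is a plain Wright-Fisher model with genic selection written for illustration, not the simulation framework used in the paper, and every parameter value (population size, starting frequency, number of generations, replicate count) is an assumption.

```python
import random


def simulate_locus(pop_size, start_freq, s, generations, rng):
    """Track one allele's frequency under a Wright-Fisher model with genic selection."""
    n_copies = 2 * pop_size  # diploid population
    freq = start_freq
    for _ in range(generations):
        # Selection: the favored allele has relative fitness 1 + s.
        expected = freq * (1 + s) / (1 + s * freq)
        # Drift: binomial sampling of 2N gene copies for the next generation.
        count = sum(1 for _ in range(n_copies) if rng.random() < expected)
        freq = count / n_copies
    return freq


def mean(values):
    return sum(values) / len(values)


rng = random.Random(1)  # fixed seed so the run is repeatable
control = [simulate_locus(500, 0.1, 0.0, 50, rng) for _ in range(5)]
selected = [simulate_locus(500, 0.1, 0.1, 50, rng) for _ in range(5)]

print(f"mean final frequency, s = 0:   {mean(control):.3f}")
print(f"mean final frequency, s = 0.1: {mean(selected):.3f}")
```

Comparing final allele frequencies between selected and control replicates is the basic signal such a study relies on. The abstract's power estimates come from far larger simulations of 1 Mb regions with many linked SNPs.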