1
|
Deng X, Frandsen PB, Dikow RB, Favre A, Shah DN, Shah RDT, Schneider JV, Heckenhauer J, Pauls SU. The impact of sequencing depth and relatedness of the reference genome in population genomic studies: A case study with two caddisfly species (Trichoptera, Rhyacophilidae, Himalopsyche). Ecol Evol 2022; 12:e9583. [PMID: 36523526 PMCID: PMC9745013 DOI: 10.1002/ece3.9583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Revised: 11/10/2022] [Accepted: 11/16/2022] [Indexed: 12/15/2022] [Open Access]
Abstract
Whole genome sequencing for generating SNP data is increasingly used in population genetic studies. However, obtaining genomes for massive numbers of samples is still not within the budgets of many researchers. It is thus imperative to select an appropriate reference genome and sequencing depth to ensure the accuracy of the results for a specific research question, while balancing cost and feasibility. To evaluate the effect of the choice of the reference genome and sequencing depth on downstream analyses, we used five confamilial reference genomes of variable relatedness and three levels of sequencing depth (3.5×, 7.5× and 12×) in a population genomic study on two caddisfly species: Himalopsyche digitata and H. tibetana. Using these 30 datasets (five reference genomes × three depths × two target species), we estimated population genetic indices (inbreeding coefficient, nucleotide diversity, pairwise FST, and genome-wide distribution of FST) based on variants and population structure (PCA and admixture) based on genotype likelihood estimates. The results showed that both distantly related reference genomes and lower sequencing depth lead to degradation of resolution. In addition, choosing a more closely related reference genome may significantly remedy the defects caused by low depth. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high-quality reference genome continue to decrease. However, to determine a cost-efficient strategy for a specific population genomic study, a trade-off between reference genome relatedness and sequencing depth can be considered.
Affiliation(s)
- Xi‐Ling Deng
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Institute of Insect Biotechnology, Justus‐Liebig‐University Gießen, Gießen, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
| | - Paul B. Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
- Department of Plant & Wildlife Sciences, Brigham Young University, Provo, Utah, USA
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
| | - Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
| | - Adrien Favre
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Regional Nature Park of the Trient Valley, Salvan, Switzerland
| | - Deep Narayan Shah
- Central Department of Environmental Science, Tribhuvan University, Kirtipur, Nepal
| | - Ram Devi Tachamo Shah
- Aquatic Ecology Centre, School of Science, Kathmandu University, Dhulikhel, Nepal
- Department of Life Sciences, School of Science, Kathmandu University, Dhulikhel, Nepal
| | - Julio V. Schneider
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
| | - Jacqueline Heckenhauer
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
| | - Steffen U. Pauls
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Institute of Insect Biotechnology, Justus‐Liebig‐University Gießen, Gießen, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
| |
|
2
|
Stewart RJ, Frandsen PB, Pauls SU, Heckenhauer J. Conservation of Three-Dimensional Structure of Lepidoptera and Trichoptera L-Fibroins for 290 Million Years. Molecules 2022; 27:molecules27185945. [PMID: 36144689 PMCID: PMC9504780 DOI: 10.3390/molecules27185945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Revised: 09/04/2022] [Accepted: 09/09/2022] [Indexed: 11/23/2022] [Open Access]
Abstract
The divergence of sister orders Trichoptera (caddisflies) and Lepidoptera (moths and butterflies) from a silk-spinning ancestor occurred around 290 million years ago. Trichoptera larvae are mainly aquatic, and Lepidoptera larvae are almost entirely terrestrial—distinct habitats that required molecular adaptation of their silk for deployment in water and air, respectively. The major protein components of their silks are heavy chain and light chain fibroins. In an effort to identify molecular changes in L-fibroins that may have contributed to the divergent use of silk in water and air, we used the ColabFold implementation of AlphaFold2 to predict three-dimensional structures of L-fibroins from both orders. A comparison of the structures revealed that despite the ancient divergence, profoundly different habitats, and low sequence conservation, a novel 10-helix core structure was strongly conserved in L-fibroins from both orders. Previously known intra- and intermolecular disulfide linkages were accurately predicted. Structural variations outside of the core may represent molecular changes that contributed to the evolution of insect silks adapted to water or air. The distributions of electrostatic potential, for example, were not conserved and present distinct order-specific surfaces for potential interactions with or modulation by external factors. Additionally, the interactions of L-fibroins with the H-fibroin C-termini are different for these orders; lepidopteran L-fibroins have N-terminal insertions that are not present in trichopteran L-fibroins, which form an unstructured ribbon in isolation but become part of an intermolecular β-sheet when folded with their corresponding H-fibroin C-termini. The results are an example of protein structure prediction from deep sequence data of understudied proteins made possible by AlphaFold2.
Affiliation(s)
- Russell J. Stewart
- Department of Biomedical Engineering, University of Utah, Salt Lake City, UT 84112, USA
- Correspondence:
| | - Paul B. Frandsen
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA
| | - Steffen U. Pauls
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
- Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
- Institute for Insect Biotechnology, Justus-Liebig-University, 35392 Gießen, Germany
| | - Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
- Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
| |
|
3
|
Rouhová L, Sehadová H, Pauchová L, Hradilová M, Žurovcová M, Šerý M, Rindoš M, Žurovec M. Using the multi-omics approach to reveal the silk composition in Plectrocnemia conspersa. Front Mol Biosci 2022; 9:945239. [PMID: 36060257 PMCID: PMC9432349 DOI: 10.3389/fmolb.2022.945239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Accepted: 07/19/2022] [Indexed: 11/29/2022] [Open Access]
Abstract
Like those of Lepidoptera, the larvae of Trichoptera are capable of producing silk. Plectrocnemia conspersa, a predatory species belonging to the suborder Annulipalpia, builds massive silken retreats with prey-capturing nets. In this study, we describe the silk glands of P. conspersa and use multi-omics methods to obtain a complete picture of the fiber composition. A combination of silk gland-specific transcriptome and proteomic analyses of the spun-out fibers yielded 27 significant candidates whose full-length sequences and gene structures were retrieved from the publicly available genome database. About one-third of the candidates were completely novel proteins for which there are no described homologs, including a group of five pseudofibroins, proteins with a composition similar to fibroin heavy chain. The rest were homologs of lepidopteran silk proteins, although some had a larger number of paralogs. On the other hand, P. conspersa fibers lacked some proteins that are regular components in moth silk. In summary, the multi-omics approach provides an opportunity to compare the overall composition of silk with other insect species. A sufficient number of such studies will make it possible to distinguish between the basic components of all silks and the proteins that represent the adaptation of the fibers for specific purposes or environments.
Affiliation(s)
- Lenka Rouhová
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Ceske Budejovice, Czechia
- Faculty of Science, University of South Bohemia, Ceske Budejovice, Czechia
| | - Hana Sehadová
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Ceske Budejovice, Czechia
- Faculty of Science, University of South Bohemia, Ceske Budejovice, Czechia
| | - Lucie Pauchová
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Ceske Budejovice, Czechia
| | - Miluše Hradilová
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Praha, Czechia
| | - Martina Žurovcová
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Ceske Budejovice, Czechia
| | - Michal Šerý
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Ceske Budejovice, Czechia
| | - Michal Rindoš
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Ceske Budejovice, Czechia
- Faculty of Forestry and Wood Sciences, Czech University of Life Sciences Prague, Prague, Czechia
| | - Michal Žurovec
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Ceske Budejovice, Czechia
- Faculty of Science, University of South Bohemia, Ceske Budejovice, Czechia
- *Correspondence: Michal Žurovec,
| |
|
4
|
Kawahara AY, Storer CG, Markee A, Heckenhauer J, Powell A, Plotkin D, Hotaling S, Cleland TP, Dikow RB, Dikow T, Kuranishi RB, Messcher R, Pauls SU, Stewart RJ, Tojo K, Frandsen PB. Long-read HiFi sequencing correctly assembles repetitive heavy fibroin silk genes in new moth and caddisfly genomes. GIGABYTE 2022; 2022:gigabyte64. [PMID: 36824508 PMCID: PMC9693786 DOI: 10.46471/gigabyte.64] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/07/2022] [Accepted: 06/24/2022] [Indexed: 11/09/2022] [Open Access]
Abstract
Insect silk is a versatile biomaterial. Lepidoptera and Trichoptera display some of the most diverse uses of silk, with varying strength, adhesive qualities, and elastic properties. Silk fibroin genes are long (>20 Kbp), with many repetitive motifs that make them challenging to sequence. Most research thus far has focused on conserved N- and C-terminal regions of fibroin genes because a full comparison of repetitive regions across taxa has not been possible. Using the PacBio Sequel II system and SMRT sequencing, we generated high fidelity (HiFi) long-read genomic and transcriptomic sequences for the Indianmeal moth (Plodia interpunctella) and genomic sequences for the caddisfly Eubasilissa regina. Both genomes were highly contiguous (N50 = 9.7 Mbp/32.4 Mbp, L50 = 13/11) and complete (BUSCO complete = 99.3%/95.2%), with complete and contiguous recovery of silk heavy fibroin gene sequences. We show that HiFi long-read sequencing is helpful for understanding genes with long, repetitive regions.
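The contiguity figures quoted in this abstract (N50 and L50) follow a standard definition: sort the contigs from longest to shortest and accumulate lengths until half of the total assembly length is reached. N50 is the length of the contig that crosses that threshold, and L50 is how many contigs were needed. The sketch below is only an illustration of that definition; the function name `n50_l50` and the toy contig lengths are assumptions and are not taken from the paper.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in base pairs."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly length defines N50; its rank defines L50.
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Toy example (hypothetical lengths): total = 30, half = 15.
# 10 + 8 = 18 >= 15, so N50 = 8 and L50 = 2.
print(n50_l50([10, 8, 6, 4, 2]))
```

A larger N50 together with a smaller L50 indicates a more contiguous assembly, which is how the paired values in the abstract should be read.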
Affiliation(s)
- Akito Y. Kawahara
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA, Corresponding authors. E-mail: ;
| | - Caroline G. Storer
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA,Pacific Biosciences, 1305 O’Brien Dr., Menlo Park, CA 94025, USA
| | - Amanda Markee
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA,School of Natural Resources and the Environment, University of Florida, Gainesville, FL 32611, USA
| | - Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany,Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
| | - Ashlyn Powell
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA
| | - David Plotkin
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
| | - Scott Hotaling
- School of Biological Sciences, Washington State University, Pullman, WA, USA
| | - Timothy P. Cleland
- Museum Conservation Institute, Smithsonian Institution, Suitland, MD 20746, USA
| | - Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC 20002, USA
| | - Torsten Dikow
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
| | - Ryoichi B. Kuranishi
- Graduate School of Science, Chiba University, Chiba 263-8522, Japan,Kanagawa Institute of Technology, Kanagawa 243-0292, Japan
| | - Rebeccah Messcher
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
| | - Steffen U. Pauls
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany,Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany,Institute for Insect Biotechnology, Justus-Liebig-University, Gießen 35390, Germany
| | - Russell J. Stewart
- Department of Biomedical Engineering, University of Utah, Salt Lake City, UT 84112, USA
| | - Koji Tojo
- Department of Biology, Shinshu University, Matsumoto, Nagano 390-8621, Japan
| | - Paul B. Frandsen
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA,Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC 20002, USA, Corresponding authors. E-mail: ;
| |
|
5
|
Angelova N, Danis T, Lagnel J, Tsigenopoulos CS, Manousaki T. SnakeCube: containerized and automated pipeline for de novo genome assembly in HPC environments. BMC Res Notes 2022; 15:98. [PMID: 35255960 PMCID: PMC8900408 DOI: 10.1186/s13104-022-05978-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Accepted: 02/17/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
Objective
The rapid progress in sequencing technology and related bioinformatics tools aims at disentangling diversity and conservation issues through genome analyses. The foremost challenges of the field involve coping with questions emerging from the swift development and application of new algorithms, as well as the establishment of standardized analysis approaches that promote transparency and transferability in research.
Results
Here, we present SnakeCube, an automated and containerized whole de novo genome assembly pipeline that runs within isolated, secured environments and scales for use in High Performance Computing (HPC) domains. SnakeCube was optimized for its performance and tested for its effectiveness with various inputs, highlighting its safe and robust universal use in the field.
|
6
|
Heckenhauer J, Frandsen PB, Sproul JS, Li Z, Paule J, Larracuente AM, Maughan PJ, Barker MS, Schneider JV, Stewart RJ, Pauls SU. Genome size evolution in the diverse insect order Trichoptera. Gigascience 2022; 11:6537159. [PMID: 35217860 PMCID: PMC8881205 DOI: 10.1093/gigascience/giac011] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/18/2021] [Revised: 11/25/2021] [Accepted: 01/21/2022] [Indexed: 12/30/2022] [Open Access]
Abstract
Background: Genome size is implicated in the form, function, and ecological success of a species. Two principally different mechanisms are proposed as major drivers of eukaryotic genome evolution and diversity: polyploidy (i.e., whole-genome duplication) or smaller duplication events and bursts in the activity of repetitive elements. Here, we generated de novo genome assemblies of 17 caddisflies covering all major lineages of Trichoptera. Using these and previously sequenced genomes, we use caddisflies as a model for understanding genome size evolution in diverse insect lineages. Results: We detect a ∼14-fold variation in genome size across the order Trichoptera. We find strong evidence that repetitive element expansions, particularly those of transposable elements (TEs), are important drivers of large caddisfly genome sizes. Using an innovative method to examine TEs associated with universal single-copy orthologs (i.e., BUSCO genes), we find that TE expansions have a major impact on protein-coding gene regions, with TE-gene associations showing a linear relationship with increasing genome size. Intriguingly, we find that expanded genomes preferentially evolved in caddisfly clades with a higher ecological diversity (i.e., various feeding modes, diversification in variable, less stable environments). Conclusion: Our findings provide a platform to test hypotheses about the potential evolutionary roles of TE activity and TE-gene associations, particularly in groups with high species, ecological, and functional diversities.
Affiliation(s)
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany.,Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
| | - Paul B Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany.,Department of Plant & Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA.,Data Science Lab, Smithsonian Institution, Washington, DC 20560, USA
| | - John S Sproul
- Department of Biology, University of Rochester, Rochester, NY 14620, USA.,Department of Biology, University of Nebraska Omaha, Omaha, NE 68182, USA
| | - Zheng Li
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Juraj Paule
- Department of Botany and Molecular Evolution, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
| | | | - Peter J Maughan
- Department of Plant & Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA
| | - Michael S Barker
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Julio V Schneider
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
| | - Russell J Stewart
- Department of Biomedical Engineering, University of Utah, Salt Lake City, UT 84112, USA
| | - Steffen U Pauls
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany.,Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany.,Institute for Insect Biotechnology, Justus-Liebig-University, Gießen 35390, Germany
| |
|
7
|
Ríos-Touma B, Holzenthal RW, Rázuri-Gonzales E, Heckenhauer J, Pauls SU, Storer CG, Frandsen PB. De Novo Genome Assembly and Annotation of an Andean Caddisfly, Atopsyche davidsoni Sykora, 1991, a Model for Genome Research of High-Elevation Adaptations. Genome Biol Evol 2022; 14:evab286. [PMID: 34962985 PMCID: PMC8767365 DOI: 10.1093/gbe/evab286] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Accepted: 12/20/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
We sequence, assemble, and annotate the genome of Atopsyche davidsoni Sykora, 1991, the first whole-genome assembly for the caddisfly family Hydrobiosidae. This free-living and predatory caddisfly inhabits streams in the high-elevation Andes and is separated by more than 200 Myr of evolutionary history from the most closely related caddisfly species with genome assemblies available. We demonstrate the promise of PacBio HiFi reads by assembling the most contiguous caddisfly genome assembly to date with a contig N50 of 14 Mb, which is more than 6× more contiguous than the current most contiguous assembly for a caddisfly (Hydropsyche tenuis). We recover 98.8% of insect BUSCO genes indicating a high level of gene completeness. We also provide a genome annotation of 12,232 annotated proteins. This new genome assembly provides an important new resource for studying genomic adaptation of aquatic insects to harsh, high-altitude environments.
Affiliation(s)
- Blanca Ríos-Touma
- Facultad de Ingenierías y Ciencias Aplicadas, Ingeniería Ambiental, Grupo de Investigación en Biodiversidad, Medio Ambiente y Salud (BIOMAS), Universidad de las Américas, Quito, Ecuador
| | - Ralph W Holzenthal
- Department of Entomology, University of Minnesota, St. Paul, Minnesota, USA
| | - Ernesto Rázuri-Gonzales
- Department of Entomology, University of Minnesota, St. Paul, Minnesota, USA
- Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Germany
| | - Jacqueline Heckenhauer
- Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
| | - Steffen U Pauls
- Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Institute of Insect Biotechnology, Justus-Liebig University, Gießen, Germany
| | - Caroline G Storer
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA
| | - Paul B Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA
| |
|
8
|
Pfenninger M, Schönnenbeck P, Schell T. ModEst: Accurate estimation of genome size from next generation sequencing data. Mol Ecol Resour 2021; 22:1454-1464. [PMID: 34882987 DOI: 10.1111/1755-0998.13570] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/19/2021] [Revised: 11/29/2021] [Accepted: 11/30/2021] [Indexed: 01/11/2023]
Abstract
Accurate estimates of genome sizes are important parameters for both theoretical and practical biodiversity genomics. Here we present a fast, easy-to-implement and accurate method to estimate genome size from the number of bases sequenced and the mean sequencing depth. To estimate the latter, we take advantage of the fact that an accurate estimation of the Poisson distribution parameter lambda is possible from truncated data, restricted to the part of the sequencing depth distribution representing the true underlying distribution. With simulations we show that reasonable genome size estimates can be gained even from low-coverage (10×), highly discontinuous genome drafts. Comparison of estimates from a wide range of taxa and sequencing strategies with flow cytometry estimates of the same individuals showed a very good fit and suggested that both methods yield comparable, interchangeable results.
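The relationship this abstract relies on can be written as genome size ≈ (number of bases sequenced) / (mean sequencing depth), with the mean depth taken as the Poisson parameter lambda fitted to a truncated part of the depth distribution. The following is a minimal sketch of that idea only, not the ModEst implementation: the function names, the grid-search fit, and the choice of truncation window `[lo, hi]` are all assumptions made for illustration. It also assumes `lo >= 1` and that the window brackets the true mean depth.

```python
import math


def log_pmf(k, lam):
    """Log of the Poisson probability mass function at depth k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def truncated_loglik(lam, depth_counts, lo, hi):
    """Log-likelihood of a Poisson truncated to depths lo..hi (inclusive).

    depth_counts maps a depth value to the number of positions with that depth.
    """
    log_norm = math.log(sum(math.exp(log_pmf(k, lam)) for k in range(lo, hi + 1)))
    return sum(n * (log_pmf(k, lam) - log_norm)
               for k, n in depth_counts.items() if lo <= k <= hi)


def estimate_lambda(depth_counts, lo, hi, step=0.01):
    """Grid-search maximum-likelihood estimate of lambda inside [lo, hi]."""
    if lo < 1:
        raise ValueError("lo must be at least 1 in this sketch")
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return max(candidates, key=lambda lam: truncated_loglik(lam, depth_counts, lo, hi))


def genome_size(total_bases, mean_depth):
    """Genome size as bases sequenced divided by mean sequencing depth."""
    return total_bases / mean_depth
```

In use, `depth_counts` would come from a per-position depth histogram, and the window would be chosen to exclude the low-depth and high-depth tails that do not follow the underlying distribution (for example, collapsed repeats). Real depth distributions are often more dispersed than a Poisson, so the estimate from such a sketch should be treated as approximate.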
Affiliation(s)
- Markus Pfenninger
- Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany.,LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany.,Institute for Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
| | - Philipp Schönnenbeck
- Institute for Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
| | - Tilman Schell
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
| |
|
9
|
Li X, Ellis E, Plotkin D, Imada Y, Yago M, Heckenhauer J, Cleland TP, Dikow RB, Dikow T, Storer CG, Kawahara AY, Frandsen PB. First Annotated Genome of a Mandibulate Moth, Neomicropteryx cornuta, Generated Using PacBio HiFi Sequencing. Genome Biol Evol 2021; 13:6380144. [PMID: 34599325 PMCID: PMC8557830 DOI: 10.1093/gbe/evab229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/27/2021] [Indexed: 11/14/2022] [Open Access]
Abstract
We provide a new, annotated genome assembly of Neomicropteryx cornuta, a species of the so-called mandibulate archaic moths (Lepidoptera: Micropterigidae). These moths belong to a lineage that is thought to have split from all other Lepidoptera more than 300 Ma and are consequently vital to understanding the early evolution of superorder Amphiesmenoptera, which contains the order Lepidoptera (butterflies and moths) and its sister order Trichoptera (caddisflies). Using PacBio HiFi sequencing reads, we assembled a highly contiguous genome with a contig N50 of nearly 17 Mb. The assembled genome length of 541,115,538 bp is about half the length of the largest published Amphiesmenoptera genome (Limnephilus lunatus, Trichoptera) and double the length of the smallest (Papilio polytes, Lepidoptera). We find high recovery of universal single copy orthologs with 98.1% of BUSCO genes present and provide a genome annotation of 15,643 genes aided by resolved isoforms from PacBio IsoSeq data. This high-quality genome assembly provides an important resource for studying ecological and evolutionary transitions in the early evolution of Amphiesmenoptera.
Affiliation(s)
- Xuankun Li
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
| | - Emily Ellis
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
| | - David Plotkin
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
| | - Yume Imada
- Graduate School of Science and Engineering, Ehime University, Matsuyama, Japan
| | - Masaya Yago
- The University Museum, The University of Tokyo, Hongo, Bunkyo-ku, Japan
| | - Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany.,Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
| | - Timothy P Cleland
- Museum Conservation Institute, Smithsonian Institution, Suitland, Maryland, USA
| | - Rebecca B Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
| | - Torsten Dikow
- Department of Entomology, National Museum of Natural History (USNM), Smithsonian Institution, Washington, District of Columbia, USA
| | - Caroline G Storer
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
| | - Akito Y Kawahara
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
| | - Paul B Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany.,Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA.,Department of Plant and Wildlife Sciences, Brigham Young University, USA
| |
|
10
|
Harper JR, Sripada N, Kher P, Whittall JB, Edgerly JS. Interpreting nature's finest insect silks (Order Embioptera): hydropathy, interrupted repetitive motifs, and fiber-to-film transformation for two neotropical species. ZOOLOGY 2021; 146:125923. [PMID: 33901836 DOI: 10.1016/j.zool.2021.125923] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/31/2020] [Revised: 03/14/2021] [Accepted: 03/19/2021] [Indexed: 10/21/2022]
Abstract
Silks produced by webspinners (Order Embioptera) interact with water by transforming from fiber to film, which then becomes slippery and capable of shedding water. We chose to explore this mechanism by analyzing and comparing the silk protein transcripts of two species with overlapping distributions in Trinidad but from different taxonomic families. The transcript of one, Antipaluria urichi (Clothodidae), was partially characterized in 2009, providing a control for our methods to characterize a second species: Pararhagadochir trinitatis (Scelembiidae), a family that adds to the taxon sampling for this little known order of insects. Previous reports showed that embiopteran silk protein (dubbed Efibroin) consists of a protein core of repetitive motifs largely composed of glycine (Gly), serine (Ser), and alanine (Ala) and a highly conserved C-terminal region. Based on mRNA extracted from silk glands, Next Generation sequencing, and de novo assembly, P. trinitatis silk can be characterized by repetitive motifs of Gly-Ser followed periodically by Gly-Asparagine (Asn, an unusual amino acid for Efibroins) and by a lack of Ala, which is otherwise common in Efibroins. The putative N-terminal domain, composed mostly of polar, charged and bulky amino acids, is ten amino acids long with cysteine in the 10th position, a feature likely related to stabilization of the silk fibers. The 29 amino acids of the C-terminus for P. trinitatis silk closely resemble that of other Efibroin sequences, which show 74% shared identity on average. Examination of hydropathicity of Efibroins of both P. trinitatis and An. urichi revealed that these proteins are largely hydrophilic despite having a thin lipid coating on each nano-fiber. We deduced that the hydrophilic quality differs for the two species: due to Ser and Asn for P. trinitatis silk and to previously undetected spacers in An. urichi silk. Spacers are known from some spider and silkworm silks but this is the first report of such for Embioptera. Analysis of hydropathicity revealed the largely hydrophilic quality of these silks and this feature likely explains why water causes the transformation from fiber to film. We compared spun silk to the transcript and detected not insignificant differences between the two measurements, implying that as yet undetermined post-translational modifications of their silk may occur. In addition, we found evidence for codon bias in the nucleotides of the putative silk transcript for P. trinitatis, a feature also known for other embiopteran silk genes.
Affiliation(s)
- J René Harper
  - Department of Biology, 500 El Camino Real, Santa Clara University, Santa Clara, California, 95053, USA.
- Neeraja Sripada
  - Department of Biology, 500 El Camino Real, Santa Clara University, Santa Clara, California, 95053, USA.
- Pooja Kher
  - Department of Biology, 500 El Camino Real, Santa Clara University, Santa Clara, California, 95053, USA.
- Justen B Whittall
  - Department of Biology, 500 El Camino Real, Santa Clara University, Santa Clara, California, 95053, USA.
- Janice S Edgerly
  - Department of Biology, 500 El Camino Real, Santa Clara University, Santa Clara, California, 95053, USA.
|
11
|
Genome Size Estimation of Callipogon relictus Semenov (Coleoptera: Cerambycidae), an Endangered Species and a Korea Natural Monument. INSECTS 2021; 12:insects12020111. [PMID: 33513896 PMCID: PMC7910860 DOI: 10.3390/insects12020111] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/27/2020] [Revised: 01/21/2021] [Accepted: 01/25/2021] [Indexed: 11/24/2022]
Abstract
Simple Summary: The longhorned beetle Callipogon relictus has been listed as a Class I endangered species in Korea since 2012. To support conservation of this beetle, we estimated its genome size at 1.8 ± 0.2 Gb, one of the largest cerambycid genomes. This study provides useful insight at the genome level and facilitates the development of an effective conservation strategy.
Abstract: We estimated the genome size of a relict longhorn beetle, Callipogon relictus Semenov (Cerambycidae: Prioninae), which is Korean natural monument no. 218 and a Class I endangered species, using a combination of flow cytometry and k-mer analysis. The two independent methods enabled accurate estimation of the genome size in Cerambycidae for the first time. The genome size of C. relictus was 1.8 ± 0.2 Gb, one of the largest cerambycid genomes studied to date. An accurate estimate of the genome size of a critically endangered longhorned beetle is a major milestone in our understanding and characterization of the C. relictus genome. Ultimately, the findings provide useful insight into insect genomics and genome size evolution, particularly among beetles.
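The k-mer side of such an estimate rests on a simple relation: genome size is approximately the total number of k-mers in the reads divided by the k-mer depth at the main peak of the k-mer frequency histogram. The sketch below is a minimal illustration of that relation only, not the authors' pipeline. The function name, the `min_depth` error cutoff, and the toy histogram are all hypothetical, and dedicated tools additionally model heterozygosity and repeats.

```python
from typing import Mapping


def estimate_genome_size(histogram: Mapping[int, int], min_depth: int = 3) -> float:
    """Estimate genome size from a k-mer frequency histogram.

    histogram maps a k-mer depth (how often a k-mer occurs in the reads)
    to the number of distinct k-mers observed at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (the cutoff value is an assumption made for this sketch).
    """
    usable = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth at which the most distinct k-mers are observed.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences among the retained depths.
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth


# Invented toy histogram: an error tail at depths 1-2, a main peak at
# depth 20, and a small repeat peak at depth 40.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100, 40: 50}
print("Estimated genome size (toy units):", estimate_genome_size(toy_histogram))
```

Comparing a k-mer estimate like this against an independent flow cytometry measurement is what lets the two methods cross-check each other.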
|
12
|
Olsen LK, Heckenhauer J, Sproul JS, Dikow RB, Gonzalez VL, Kweskin MP, Taylor AM, Wilson SB, Stewart RJ, Zhou X, Holzenthal R, Pauls SU, Frandsen PB. Draft Genome Assemblies and Annotations of Agrypnia vestita Walker, and Hesperophylax magnus Banks Reveal Substantial Repetitive Element Expansion in Tube Case-Making Caddisflies (Insecta: Trichoptera). Genome Biol Evol 2021; 13:6121109. [PMID: 33501983 PMCID: PMC7936034 DOI: 10.1093/gbe/evab013] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 01/16/2021] [Indexed: 12/20/2022] Open
Abstract
Trichoptera (caddisflies) play an essential role in freshwater ecosystems; for instance, larvae process organic material from the water and are food for a variety of predators. Knowledge of the genomic diversity of caddisflies can facilitate comparative and phylogenetic studies, thereby allowing scientists to better understand the evolutionary history of caddisflies. Although Trichoptera are the most diverse aquatic insect order, they remain poorly represented in terms of genomic resources. To date, all long-read based genomes have been sequenced from individuals in the retreat-making suborder, Annulipalpia, leaving ∼275 Ma of evolution without high-quality genomic resources. Here, we report the first long-read based de novo genome assemblies of two tube case-making Trichoptera from the suborder Integripalpia, Agrypnia vestita Walker and Hesperophylax magnus Banks. We find that these tube case-making caddisflies have genome sizes that are at least 3-fold larger than those of currently sequenced annulipalpian genomes and that this pattern is at least partly driven by major expansion of repetitive elements. In H. magnus, long interspersed nuclear elements alone exceed the entire genome size of some annulipalpian counterparts, suggesting that caddisflies have high potential as a model for understanding genome size evolution in diverse insect lineages.
Affiliation(s)
- Lindsey K Olsen
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA
- Jacqueline Heckenhauer
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
  - Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
- John S Sproul
  - Department of Biology, University of Rochester, New York, USA
- Rebecca B Dikow
  - Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
- Vanessa L Gonzalez
  - Global Genome Initiative, National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, USA
- Matthew P Kweskin
  - Laboratories of Analytical Biology, National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, USA
- Adam M Taylor
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA
- Seth B Wilson
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA
- Russell J Stewart
  - Department of Biomedical Engineering, University of Utah, Salt Lake City, Utah, USA
- Xin Zhou
  - Department of Entomology, China Agricultural University, Beijing, China
- Ralph Holzenthal
  - Department of Entomology, University of Minnesota, St. Paul, Minnesota, USA
- Steffen U Pauls
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
  - Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
  - Institute of Insect Biotechnology, Justus-Liebig University, Gießen, Germany
- Paul B Frandsen
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
  - Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
|
13
|
Hotaling S, Kelley JL, Frandsen PB. Aquatic Insects Are Dramatically Underrepresented in Genomic Research. INSECTS 2020; 11:E601. [PMID: 32899516 PMCID: PMC7563230 DOI: 10.3390/insects11090601] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/20/2020] [Revised: 09/01/2020] [Accepted: 09/03/2020] [Indexed: 02/06/2023]
Abstract
Aquatic insects comprise 10% of all insect diversity, can be found on every continent except Antarctica, and are key components of freshwater ecosystems. However, aquatic insect genome biology lags dramatically behind that of terrestrial insects. If genomic effort were spread evenly, one aquatic insect genome would be sequenced for every ~9 terrestrial insect genomes. Instead, ~24 terrestrial insect genomes have been sequenced for every aquatic insect genome. This discrepancy is even more dramatic if the quality of genomic resources is considered; for instance, while no aquatic insect genome has been assembled to the chromosome level, 29 terrestrial insect genomes spanning four orders have been. We argue that the lack of aquatic insect genomes is not due to any underlying difficulty (e.g., small body sizes or unusually large genomes), yet it is severely hampering aquatic insect research at both fundamental and applied scales. By expanding the availability of aquatic insect genomes, we will gain key insight into insect diversification and empower future research for a globally important taxonomic group.
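The "~9 versus ~24" comparison follows from simple arithmetic on the 10% share, which the short sketch below reproduces. It treats the abstract's approximate figures (10% and ~24) as exact for illustration. The function name and the derived shortfall factor are not from the abstract; they are added here only to make the reasoning explicit.

```python
from fractions import Fraction


def expected_terrestrial_per_aquatic(aquatic_share: Fraction) -> Fraction:
    """Terrestrial genomes expected per aquatic genome if sequencing effort
    were proportional to each group's share of insect diversity."""
    return (1 - aquatic_share) / aquatic_share


# Figures taken from the abstract, treated as exact for this sketch.
aquatic_share = Fraction(1, 10)   # aquatic insects: 10% of insect diversity
observed_ratio = Fraction(24)     # ~24 terrestrial genomes per aquatic genome

expected_ratio = expected_terrestrial_per_aquatic(aquatic_share)
# Illustrative derived quantity (not reported in the abstract): how many
# times fewer aquatic genomes exist than proportional effort would yield.
shortfall_factor = observed_ratio / expected_ratio

print("Expected terrestrial genomes per aquatic genome:", expected_ratio)
print("Observed terrestrial genomes per aquatic genome:", observed_ratio)
print("Shortfall factor:", float(shortfall_factor))
```

Under these assumptions, aquatic insects have between two and three times fewer sequenced genomes than an even spread of effort would produce.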
Affiliation(s)
- Scott Hotaling
  - School of Biological Sciences, Washington State University, Pullman, WA 99164, USA
- Joanna L. Kelley
  - School of Biological Sciences, Washington State University, Pullman, WA 99164, USA
- Paul B. Frandsen
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84062, USA
  - Data Science Lab, Smithsonian Institution, Washington, DC 20002, USA
|