1
Nuss AB, Lomas JS, Reyes JB, Garcia-Cruz O, Lei W, Sharma A, Pham MN, Beniwal S, Swain ML, McVicar M, Hinne IA, Zhang X, Yim WC, Gulia-Nuss M. The highly improved genome of Ixodes scapularis with X and Y pseudochromosomes. Life Sci Alliance 2023; 6:e202302109. [PMID: 37813487 PMCID: PMC10561763 DOI: 10.26508/lsa.202302109] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/21/2023] [Revised: 09/21/2023] [Accepted: 09/22/2023] [Indexed: 10/12/2023] Open
Abstract
Ixodes scapularis, the black-legged tick, is the principal vector of the Lyme disease spirochete, Borrelia burgdorferi, and is responsible for most of the ∼470,000 estimated Lyme disease cases annually in the USA. Ixodes scapularis can transmit six additional pathogens of human health significance. Because of its medical importance, I. scapularis was the first tick genome to be sequenced and annotated. However, the first assembly, I. scapularis Wikel (IscaW), was highly fragmented because of the technical challenges posed by the long, repetitive genome sequences characteristic of arthropod genomes and the lack of long-read sequencing techniques. Although I. scapularis has emerged as a model for tick research because of the availability of new tools such as embryo injection and CRISPR-Cas9-mediated gene editing, the lack of chromosome-scale scaffolds has slowed progress in tick biology and the development of tools for their control. Here we combine diverse technologies to produce the I. scapularis Gulia-Nuss (IscGN) genome assembly and gene set. We used DNA from eggs and male and female adult ticks and took advantage of Hi-C, PacBio HiFi sequencing, and Illumina short-read sequencing technologies to produce a chromosome-level assembly. In this work, we present the predicted pseudochromosomes, consisting of 13 autosomes and the sex pseudochromosomes X and Y, and a markedly improved genome annotation compared with the existing assemblies and annotations.
Affiliation(s)
- Andrew B Nuss
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
  - https://ror.org/01keh0577 Department of Agriculture, Veterinary, and Rangeland Sciences, The University of Nevada, Reno, NV, USA
- Johnathan S Lomas
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Jeremiah B Reyes
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
  - https://ror.org/01keh0577 Nevada Bioinformatics Center, University of Nevada, Reno, NV, USA
- Omar Garcia-Cruz
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Wenlong Lei
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Arvind Sharma
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Michael N Pham
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Saransh Beniwal
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
  - https://ror.org/01keh0577 Department of Computer Science and Engineering, The University of Nevada, Reno, NV, USA
- Mia L Swain
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Molly McVicar
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Isaac Amankona Hinne
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Xingtan Zhang
  - https://ror.org/01keh0577 Nevada Bioinformatics Center, University of Nevada, Reno, NV, USA
- Won C Yim
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Monika Gulia-Nuss
  - https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
2
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mob DNA 2023; 14:8. [PMID: 37452430 PMCID: PMC10347736 DOI: 10.1186/s13100-023-00296-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Accepted: 06/09/2023] [Indexed: 07/18/2023] Open
Abstract
BACKGROUND Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors. RESULTS We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ∼50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.
CONCLUSION McClintock ( https://github.com/bergmanlab/mcclintock/ ) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
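A minimal sketch of how recall and precision could be scored against simulated insertions, as the RESULTS above describe in outline. It is not McClintock's evaluation code: the `Insertion` record, the `precision_recall` function, and the 5 bp matching window are hypothetical choices made for illustration.

```python
from typing import Iterable, NamedTuple, Tuple


class Insertion(NamedTuple):
    """One TE insertion call (hypothetical record, not a McClintock type)."""
    chrom: str
    pos: int
    family: str


def precision_recall(
    predicted: Iterable[Insertion],
    truth: Iterable[Insertion],
    window: int = 5,  # assumed tolerance in bp; not taken from the paper
) -> Tuple[float, float]:
    """Score predictions against simulated insertions.

    A prediction is a true positive if it has the same chromosome and TE
    family as a not-yet-matched simulated insertion and lies within
    `window` bp of it. Each simulated insertion can be matched only once.
    """
    predicted = list(predicted)
    truth = list(truth)
    matched = set()  # indices of simulated insertions already claimed
    true_pos = 0
    for p in predicted:
        for i, t in enumerate(truth):
            if i in matched:
                continue
            same_site = p.chrom == t.chrom and abs(p.pos - t.pos) <= window
            if same_site and p.family == t.family:
                matched.add(i)
                true_pos += 1
                break
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall
```

With two simulated insertions and two predictions of which one falls within the window, both precision and recall come out at 0.5. Tightening `window` rewards detectors that locate insertions precisely, which is the property the abstract emphasizes.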
Affiliation(s)
- Jingxuan Chen
  - Institute of Bioinformatics, University of Georgia, Athens, GA USA
- Shunhua Han
  - Institute of Bioinformatics, University of Georgia, Athens, GA USA
- David J. Garfinkel
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA USA
- Casey M. Bergman
  - Institute of Bioinformatics, University of Georgia, Athens, GA USA
  - Department of Genetics, University of Georgia, Athens, GA USA
3
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of short-read transposable element detectors and species-wide data mining of insertion patterns in yeast. bioRxiv 2023:2023.02.13.528343. [PMID: 36824955 PMCID: PMC9948991 DOI: 10.1101/2023.02.13.528343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Abstract
Background Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute or evaluate multiple TE insertion detectors. Results We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide a consistent and biologically meaningful view of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes, as evaluated by coverage-based abundance estimates and expected patterns of tRNA promoter targeting. Finally, we show that best-in-class predictors for yeast have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences first revealed experimentally for Ty1 to natural insertions and related copia-superfamily retrotransposons in yeast.
Conclusion McClintock ( https://github.com/bergmanlab/mcclintock/ ) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors for other species.
4
Lewerentz J, Johansson AM, Stenberg P. The path to immortalization of cells starts by managing stress through gene duplications. Exp Cell Res 2023; 422:113431. [PMID: 36423660 DOI: 10.1016/j.yexcr.2022.113431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Revised: 11/18/2022] [Accepted: 11/19/2022] [Indexed: 11/23/2022]
Abstract
The genomes of immortalized cell lines (and cancer cells) are characterized by multiple types of aberrations, ranging from single nucleotide polymorphisms (SNPs) to structural rearrangements that have accumulated over time. Consequently, it is difficult to estimate the relative impact of different aberrations, the order of events, and which gene functions were under selective pressure at the early stage towards cellular immortalization. Here, we have established novel cell cultures derived from Drosophila melanogaster embryos that were sampled at multiple time points over a one-year period. Using short-read DNA sequencing, we show that copy-number gains, preferentially in stress-related genes, were acquired in a dominant fraction of cells in 300-day-old cultures. Furthermore, transposable elements were active in cells of all cultures. Only a few (<1%) SNPs could be followed over time, and these showed no trend to increase or decrease. We conclude that the early cellular responses of a novel culture comprise sequence duplication and transposable element activity. During immortalization, positive selection first occurs on genes that are related to stress response before shifting to genes that are related to growth.
Affiliation(s)
- Jacob Lewerentz
  - Department of Molecular Biology, Umeå University, Umeå, Västerbotten, SE-901 87, Sweden
- Anna-Mia Johansson
  - Department of Molecular Biology, Umeå University, Umeå, Västerbotten, SE-901 87, Sweden
- Per Stenberg
  - Department of Ecology and Environmental Sciences, Umeå University, Umeå, Västerbotten, SE-901 87, Sweden
5
Han S, Dias GB, Basting PJ, Viswanatha R, Perrimon N, Bergman C. Local assembly of long reads enables phylogenomics of transposable elements in a polyploid cell line. Nucleic Acids Res 2022; 50:e124. [PMID: 36156149 PMCID: PMC9757076 DOI: 10.1093/nar/gkac794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022] Open
Abstract
Animal cell lines often undergo extreme genome restructuring events, including polyploidy and segmental aneuploidy that can impede de novo whole-genome assembly (WGA). In some species like Drosophila, cell lines also exhibit massive proliferation of transposable elements (TEs). To better understand the role of transposition during animal cell culture, we sequenced the genome of the tetraploid Drosophila S2R+ cell line using long-read and linked-read technologies. WGAs for S2R+ were highly fragmented and generated variable estimates of TE content across sequencing and assembly technologies. We therefore developed a novel WGA-independent bioinformatics method called TELR that identifies, locally assembles, and estimates allele frequency of TEs from long-read sequence data (https://github.com/bergmanlab/telr). Application of TELR to a ∼130x PacBio dataset for S2R+ revealed many haplotype-specific TE insertions that arose by transposition after initial cell line establishment and subsequent tetraploidization. Local assemblies from TELR also allowed phylogenetic analysis of paralogous TEs, which revealed that proliferation of TE families in vitro can be driven by single or multiple source lineages. Our work provides a model for the analysis of TEs in complex heterozygous or polyploid genomes that are recalcitrant to WGA and yields new insights into the mechanisms of genome evolution in animal cell culture.
Affiliation(s)
- Preston J Basting
  - Institute of Bioinformatics, University of Georgia, 120 E. Green St., Athens, GA, USA
- Raghuvir Viswanatha
  - Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA
- Norbert Perrimon
  - Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA; Howard Hughes Medical Institute, Boston, MA, USA
- Casey M Bergman
  - To whom correspondence should be addressed. Tel: +1 706 542 1764; Fax: +1 706 542 3910;
6
Lewerentz J, Johansson AM, Larsson J, Stenberg P. Transposon activity, local duplications and propagation of structural variants across haplotypes drive the evolution of the Drosophila S2 cell line. BMC Genomics 2022; 23:276. [PMID: 35392795 PMCID: PMC8991648 DOI: 10.1186/s12864-022-08472-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/25/2021] [Accepted: 03/15/2022] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Immortalized cell lines are widely used model systems whose genomes are often highly rearranged and polyploid. However, their genome structure is seldom deciphered and is thus not accounted for during analyses. We therefore used linked short- and long-read sequencing to perform haplotype-level reconstruction of the genome of a Drosophila melanogaster cell line (S2-DRSC) with a complex genome structure. RESULTS Using a custom implementation (that is designed to use ultra-long reads in complex genomes with nested rearrangements) to call structural variants (SVs), we found that the most common SV was repetitive sequence insertion or deletion (>80% of SVs), with Gypsy retrotransposon insertions dominating. The second most common SV was local sequence duplication. SNPs and other SVs were rarer, but several large chromosomal translocations and mitochondrial genome insertions were observed. Haplotypes were highly similar at the nucleotide level but structurally very different. Insertion SVs existed at various haplotype frequencies and were unlinked on chromosomes, demonstrating that haplotypes have different structures and suggesting the existence of a mechanism that allows SVs to propagate across haplotypes. Finally, using public short-read data, we found that transposable element insertions and local duplications are common in other D. melanogaster cell lines. CONCLUSIONS The S2-DRSC cell line evolved through retrotransposon activity and vast local sequence duplications, which we hypothesize were the products of DNA re-replication events. Additionally, mutations can propagate across haplotypes (possibly explained by mitotic recombination), which enables fine-tuning of mutational impact and prevents accumulation of deleterious events, an inherent problem of clonal reproduction. We conclude that traditional linear homozygous genome representation conceals the complexity when dealing with rearranged and heterozygous clonal cells.
Affiliation(s)
- Jacob Lewerentz
  - Department of Molecular Biology, Umeå University, SE-901 87, Umeå, Västerbotten, Sweden
- Anna-Mia Johansson
  - Department of Molecular Biology, Umeå University, SE-901 87, Umeå, Västerbotten, Sweden
- Jan Larsson
  - Department of Molecular Biology, Umeå University, SE-901 87, Umeå, Västerbotten, Sweden
- Per Stenberg
  - Department of Ecology and Environmental Sciences, Umeå University, SE-901 87, Umeå, Västerbotten, Sweden
7
Mariyappa D, Rusch DB, Han S, Luhur A, Overton D, Miller DFB, Bergman CM, Zelhof AC. A novel transposable element-based authentication protocol for Drosophila cell lines. G3 (Bethesda) 2022; 12:6440050. [PMID: 34849844 PMCID: PMC9210319 DOI: 10.1093/g3journal/jkab403] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/13/2021] [Accepted: 11/11/2021] [Indexed: 11/23/2022]
Abstract
Drosophila cell lines are used by researchers to investigate various cell biological phenomena. It is crucial to exercise good cell culture practice. Poor handling can lead to both inter- and intra-species cross-contamination. Prolonged culturing can lead to introduction of large- and small-scale genomic changes. These factors, therefore, make it imperative that methods to authenticate Drosophila cell lines are developed to ensure reproducibility. Mammalian cell line authentication is reliant on short tandem repeat (STR) profiling; however, the relatively low STR mutation rate in Drosophila melanogaster at the individual level is likely to preclude the value of this technique. In contrast, transposable elements (TEs) are highly polymorphic among individual flies and abundant in Drosophila cell lines. Therefore, we investigated the utility of TE insertions as markers to discriminate Drosophila cell lines derived from the same or different donor genotypes, divergent sub-lines of the same cell line, and from other insect cell lines. We developed a PCR-based next-generation sequencing protocol to cluster cell lines based on the genome-wide distribution of a limited number of diagnostic TE families. We determined the distribution of five TE families in S2R+, S2-DRSC, S2-DGRC, Kc167, ML-DmBG3-c2, mbn2, CME W1 Cl.8+, and ovarian somatic sheath Drosophila cell lines. Two independent downstream analyses of the next-generation sequencing data yielded similar clustering of these cell lines. Double-blind testing of the protocol reliably identified various Drosophila cell lines. In addition, our data indicate minimal changes with respect to the genome-wide distribution of these five TE families when cells are passaged at least 50 times. The protocol developed can accurately identify and distinguish the numerous Drosophila cell lines available to the research community, thereby aiding reproducible Drosophila cell culture research.
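One way to picture clustering cell lines by their TE insertion distribution is a pairwise distance over sets of insertion sites. The sketch below uses Jaccard distance, which is an assumption for illustration and not the analysis the authors report; the function names and the toy coordinates are hypothetical, and the coordinates are invented rather than real insertion sites.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Site = Tuple[str, int]  # (chromosome arm, position); hypothetical encoding


def jaccard_distance(sites_a: FrozenSet[Site], sites_b: FrozenSet[Site]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B|; two empty profiles count as identical."""
    union = sites_a | sites_b
    if not union:
        return 0.0
    return 1.0 - len(sites_a & sites_b) / len(union)


def pairwise_distances(
    profiles: Dict[str, FrozenSet[Site]]
) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of cell lines, keyed by sorted names."""
    return {
        (a, b): jaccard_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Toy profiles: the cell line names come from the abstract, the
# coordinates are made up purely to exercise the functions.
profiles = {
    "S2R+": frozenset({("2L", 100), ("3R", 500), ("X", 42)}),
    "S2-DRSC": frozenset({("2L", 100), ("3R", 500), ("X", 99)}),
    "Kc167": frozenset({("2R", 7)}),
}
distances = pairwise_distances(profiles)
```

In this toy input the two S2 sub-lines share two of four distinct sites, so they sit closer to each other than either does to Kc167. A distance matrix of this kind could then be passed to any standard hierarchical clustering routine.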
Affiliation(s)
- Daniel Mariyappa
  - Biology Department, Drosophila Genomics Resource Center, Indiana University, Bloomington, IN 47405, USA
- Douglas B Rusch
  - Biology Department, Center for Genetics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
- Shunhua Han
  - Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Arthur Luhur
  - Biology Department, Drosophila Genomics Resource Center, Indiana University, Bloomington, IN 47405, USA
- Danielle Overton
  - Biology Department, Drosophila Genomics Resource Center, Indiana University, Bloomington, IN 47405, USA
- David F B Miller
  - Biology Department, Center for Genetics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
- Casey M Bergman
  - Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; Department of Genetics, University of Georgia, Athens, GA 30602, USA
- Andrew C Zelhof
  - Biology Department, Drosophila Genomics Resource Center, Indiana University, Bloomington, IN 47405, USA