101
Woronzow V, Möhner J, Remane D, Zischler H. Generation of somatic de novo structural variation as a hallmark of cellular senescence in human lung fibroblasts. Front Cell Dev Biol 2023; 11:1274807. [PMID: 38152346 PMCID: PMC10751365 DOI: 10.3389/fcell.2023.1274807] [Received: 08/08/2023] [Accepted: 11/29/2023] [Indexed: 12/29/2023] [Open access]
Abstract
Cellular senescence is characterized by replication arrest in response to stress stimuli. Senescent cells accumulate in aging tissues and can trigger organ-specific and possibly systemic dysfunction. Although senescent cell populations are heterogeneous, a key feature is that they exhibit epigenetic changes. Epigenetic changes such as loss of repressive constitutive heterochromatin could lead to subsequent LINE-1 derepression, a phenomenon often described in the context of senescence or somatic evolution. LINE-1 elements encode the retroposition machinery, and reverse transcription generates cDNA from autonomous and non-autonomous transposable elements (TEs) that can potentially reintegrate into genomes and cause structural variants. Another feature of cellular senescence is mitochondrial dysfunction caused by mitochondrial damage. In combination with impaired mitophagy, which is characteristic of senescent cells, this could lead to cytosolic mtDNA accumulation and, as a genomic consequence, integrations of mtDNA into nuclear DNA (nDNA), resulting in mitochondrial pseudogenes called numts. Thus, both phenomena could cause structural variants in aging genomes that go beyond epigenetic changes. We therefore compared proliferating and senescent IMR-90 cells in terms of somatic de novo numts and integrations of non-autonomous composite retrotransposons, the so-called SVA elements, which hijack the retropositional machinery of LINE-1. We applied a subtractive and kinetic enrichment technique using proliferating cell DNA as a driver and senescent genomes as a tester for the detection of nuclear flanks of de novo SVA integrations. Coupled with deep sequencing, we obtained a genomic readout for SVA retrotransposition possibly linked to cellular senescence in the IMR-90 model. Furthermore, we compared the genomes of proliferative and senescent IMR-90 cells by deep sequencing or after enrichment of nuclear DNA using AluScan technology.
A total of 1,695 de novo SVA integrations were detected in senescent IMR-90 cells, of which 333 were unique. Moreover, we identified a total of 81 de novo numts with perfect identity to both mtDNA and nuclear hg38 flanks. In summary, we present evidence for possible age-dependent structural genomic changes by paralogization that go beyond epigenetic modifications. We hypothesize that the structural variants we observe potentially impact processes associated with replicative aging of IMR-90 cells.
Affiliation(s)
- Valentina Woronzow
  - Division of Anthropology, Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Mainz, Germany
- Jonas Möhner
  - Division of Anthropology, Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Mainz, Germany
- Daniel Remane
  - Division of Anthropology, Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Mainz, Germany
  - HOX Life Science GmbH, Frankfurt, Hessen, Germany
- Hans Zischler
  - Division of Anthropology, Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Mainz, Germany
102
Chaung K, Baharav TZ, Henderson G, Zheludev IN, Wang PL, Salzman J. SPLASH: A statistical, reference-free genomic algorithm unifies biological discovery. Cell 2023; 186:5440-5456.e26. [PMID: 38065078 PMCID: PMC10861363 DOI: 10.1016/j.cell.2023.10.028] [Received: 08/16/2022] [Revised: 08/31/2023] [Accepted: 10/26/2023] [Indexed: 12/18/2023]
Abstract
Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), which directly analyzes raw sequencing data, using a statistical test to detect a signature of regulation: sample-specific sequence variation. SPLASH detects many types of variation and can be efficiently run at scale. We show that SPLASH identifies complex mutation patterns in SARS-CoV-2, discovers regulated RNA isoforms at the single-cell level, detects the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a unifying approach to genomic analysis that enables expansive discovery without metadata or references.
Affiliation(s)
- Kaitlin Chaung
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Tavor Z Baharav
  - Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
- George Henderson
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Ivan N Zheludev
  - Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Peter L Wang
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Julia Salzman
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; Department of Biochemistry, Stanford University, Stanford, CA 94305, USA; Department of Statistics (by courtesy), Stanford University, Stanford, CA 94305, USA; Department of Biology (by courtesy), Stanford University, Stanford, CA 94305, USA
103
Marino A, Reboud EL, Chevalier E, Tilak MK, Contreras-Garduño J, Nabholz B, Condamine FL. Genomics of the relict species Baronia brevicornis sheds light on its demographic history and genome size evolution across swallowtail butterflies. G3 (Bethesda) 2023; 13:jkad239. [PMID: 37847748 PMCID: PMC10700114 DOI: 10.1093/g3journal/jkad239] [Received: 05/22/2023] [Revised: 05/22/2023] [Accepted: 10/09/2023] [Indexed: 10/19/2023]
Abstract
Relict species, such as the coelacanth, ginkgo, and tuatara, are the remnants of formerly more ecologically and taxonomically diverse lineages. This raises the questions of why they are currently species-poor, have restricted ecology, and are often vulnerable to extinction. Estimating heterozygosity level and demographic history can guide our understanding of the evolutionary history and conservation status of relict species. However, few studies have focused on relict invertebrates compared to vertebrates. We sequenced the genome of Baronia brevicornis (Lepidoptera: Papilionidae), which is an endangered species, the sister species of all swallowtail butterflies, and the oldest lineage of all extant butterflies. From a dried specimen, we were able to generate both long-read and short-read data and assembled a genome of 406 Mb for Baronia. We found a fairly high level of heterozygosity (0.58%) compared to other swallowtail butterflies, which contrasts with its endangered and relict status. Taking into account the high ratio of recombination over mutation, demographic analyses indicated a sharp decline of the effective population size initiated in the last million years. Moreover, the Baronia genome was used to study genome size variation in Papilionidae. Genome sizes are mostly explained by transposable element activity, suggesting that large genomes are a derived feature in swallowtail butterflies, as transposable element activity is recent and involves different transposable element classes among species. This first Baronia genome provides a resource for assisting conservation in a flagship and relict insect species as well as for understanding swallowtail genome evolution.
Affiliation(s)
- Alba Marino
  - Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier | CNRS | IRD | EPHE), Place Eugène Bataillon, 34095 Montpellier, France
- Eliette L Reboud
  - Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier | CNRS | IRD | EPHE), Place Eugène Bataillon, 34095 Montpellier, France
- Emmanuelle Chevalier
  - Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier | CNRS | IRD | EPHE), Place Eugène Bataillon, 34095 Montpellier, France
- Marie-Ka Tilak
  - Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier | CNRS | IRD | EPHE), Place Eugène Bataillon, 34095 Montpellier, France
- Jorge Contreras-Garduño
  - Universidad Nacional Autónoma de México, Escuela Nacional de Estudios Superiores, campus Morelia, Antigua Carretera a Pátzcuaro #8701, Col. Ex-Hacienda San José de la Huerta, 58190 Morelia, Michoacán, Mexico
- Benoit Nabholz
  - Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier | CNRS | IRD | EPHE), Place Eugène Bataillon, 34095 Montpellier, France
  - Institut Universitaire de France (IUF), Paris, France
- Fabien L Condamine
  - Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier | CNRS | IRD | EPHE), Place Eugène Bataillon, 34095 Montpellier, France
104
Zuo Z. The successive emergence of ERVL-MaLRs in primates. Virus Evol 2023; 9:vead072. [PMID: 38131004 PMCID: PMC10735291 DOI: 10.1093/ve/vead072] [Received: 09/06/2023] [Revised: 11/01/2023] [Accepted: 12/01/2023] [Indexed: 12/23/2023] [Open access]
Abstract
Although the ERVL-mammalian-apparent LTR retrotransposons (MaLRs) are the fourth largest family of transposable elements in the human genome, their evolutionary history and relationships have not been thoroughly studied. In this study, through RepeatMasker annotations of some representative species and construction of a phylogenetic tree by sequence similarity, all primate-specific MaLR members are found to descend from the MLT1A1 retrotransposon. Comparative genomic analysis, transposition-in-transposition inference, and sequence feature comparisons consistently show that each MaLR member evolved from its predecessor successively and had a limited activity period during primate evolution. Accordingly, a novel MaLR member was discovered as the successor of MSTB1 in Tarsiiformes. Finally, the identification of candidate precursor and intermediate THE1A elements provides further evidence for the previously proposed arms race model between ZNF430/ZNF100 and THE1B/THE1A. Taken together, this study sheds light on the evolutionary history of MaLRs and can serve as a foundation for future research on their interactions with zinc finger genes, gene regulation, and human health implications.
Affiliation(s)
- Zheng Zuo
  - School of Life Science and Technology, Southeast University, Nanjing 210096, China
105
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJ, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PG, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O’Neill RJ, Eichler E, Phillippy AM. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. bioRxiv 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution.
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
Affiliation(s)
- Brandon D. Pickett
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Monika Cechova
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Karol Pal
  - Penn State University, University Park, PA, USA
- Sergey Nurk
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- DongAhn Yoo
  - University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
  - Johns Hopkins University, Baltimore, MD, USA
- Prajna Hebbar
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Erich Bomberg
  - University of Münster, Münster, Germany
  - MPI for Developmental Biology, Tübingen, Germany
- Gerard G. Bouffard
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shelise Y. Brooks
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Lucia Carbone
  - Oregon Health & Science University, Portland, OR, USA
  - Oregon National Primate Research Center, Hillsboro, OR, USA
- Laura Carrel
  - Penn State University School of Medicine, Hershey, PA, USA
- Chen-Shan Chin
  - Foundation of Biological Data Sciences, Belmont, CA, USA
- Mark Diekhans
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Amalia Dutra
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Gage H. Garcia
  - University of Washington School of Medicine, Seattle, WA, USA
- Diana Haddad
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glenn Hickey
  - University of California Santa Cruz, Santa Cruz, CA, USA
- David A. Hillis
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Hyeonsoo Jeong
  - University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Patrick Masterson
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Karen H. Miga
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Evgenia Pak
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Benedict Paten
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Arang Rhie
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Fedor Ryabov
  - Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Samuel Sacco
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Steven J. Solar
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sweetalana
  - Penn State University, University Park, PA, USA
- Alex Sweeten
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Johns Hopkins University, Baltimore, MD, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Alice C. Young
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Xinru Zhang
  - Penn State University, University Park, PA, USA
- Soojin V. Yi
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Sergey Koren
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan Eichler
  - University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Adam M. Phillippy
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
106
Monzon AM, Arrías PN, Elofsson A, Mier P, Andrade-Navarro MA, Bevilacqua M, Clementel D, Bateman A, Hirsh L, Fornasari MS, Parisi G, Piovesan D, Kajava AV, Tosatto SCE. A STRP-ed definition of Structured Tandem Repeats in Proteins. J Struct Biol 2023; 215:108023. [PMID: 37652396 DOI: 10.1016/j.jsb.2023.108023] [Received: 04/29/2023] [Revised: 07/31/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. Yet many of their salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on TRPs by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Affiliation(s)
- Alexander Miguel Monzon
  - Dept. of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Paula Nazarena Arrías
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Arne Elofsson
  - Dept. of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Tomtebodavägen 23, 171 21 Solna, Sweden
- Pablo Mier
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Martina Bevilacqua
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Clementel
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Layla Hirsh
  - Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Maria Silvina Fornasari
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Damiano Piovesan
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
107
Salcedo-Sánchez R, Cruz-Zamora Y, Cruz-García F. The SC10-RNase promoter displays changes in DNA methylation patterns through pistil development in self-incompatible Nicotiana alata. Plant Physiol Biochem 2023; 205:108161. [PMID: 37956612 DOI: 10.1016/j.plaphy.2023.108161] [Received: 06/06/2023] [Revised: 09/15/2023] [Accepted: 11/02/2023] [Indexed: 11/15/2023]
Abstract
In Solanaceae, self-incompatibility is a genetic mechanism that prevents endogamy in plant populations. Expression of the S-determinants, S-RNase and SLF, is tightly regulated during pistil and pollen development. However, the molecular mechanism of gene expression regulation in S-RNase-based self-incompatibility systems needs to be better understood. Here, we identified a 1.3 kbp sequence upstream of the coding region of the functional SC10-RNase allele from the self-incompatible Nicotiana alata, which directs SC10-RNase expression in mature pistils. This SC10-RNase promoter includes a 300 bp region with minimal elements that sustain SC10-RNase expression. Likewise, a fragment of a transposable element from the Gypsy family of retrotransposons is also present at the -320 bp position. Nevertheless, its presence does not affect the expression of the SC10-RNase in mature pistils. Additionally, we determined that the SC10-RNase promoter undergoes different DNA methylation states during pistil development, with the mCHH methylation context being the most frequent close to the transcription start site at pistil maturity. We hypothesize that the Gypsy element at the SC10-RNase promoter might contribute to the DNA methylation remodeling in the three sequence contexts analyzed here. We propose that mCHH methylation enrichment and other regulatory elements in the S-RNase coding region regulate the specific and abundant SC10-RNase expression in mature pistils in N. alata.
Affiliation(s)
- Renata Salcedo-Sánchez
  - Departamento de Bioquímica, Facultad de Química, Universidad Nacional Autónoma de México, Cd. Mx, 04510, México
- Yuridia Cruz-Zamora
  - Departamento de Bioquímica, Facultad de Química, Universidad Nacional Autónoma de México, Cd. Mx, 04510, México
- Felipe Cruz-García
  - Departamento de Bioquímica, Facultad de Química, Universidad Nacional Autónoma de México, Cd. Mx, 04510, México
108
Stepankiw N, Yang AWH, Hughes TR. The human genome contains over a million autonomous exons. Genome Res 2023; 33:1865-1878. [PMID: 37945377 PMCID: PMC10760453 DOI: 10.1101/gr.277792.123] [Received: 02/19/2023] [Accepted: 10/27/2023] [Indexed: 11/12/2023]
Abstract
Mammalian mRNA and lncRNA exons are often small compared to introns. The exon definition model predicts that exons splice autonomously, dependent on proximal exon sequence features, explaining their delineation within large introns. This model has not been examined on a genome-wide scale, however, leaving open the question of how often mRNA and lncRNA exons are autonomous. It is also unknown how frequently such exons can arise by chance. Here, we directly assayed large fragments (500-1000 bp) of the human genome by exon trapping, which detects exons spliced into a heterologous transgene, here designed with a large intron context. We define the trapped exons as "autonomous." We obtained ∼1.25 million trapped exons, including most known mRNA and well-annotated lncRNA internal exons, demonstrating that human exons are predominantly autonomous. mRNA exons are trapped with the highest efficiency. Nearly a million of the trapped exons are unannotated, most located in intergenic regions and antisense to mRNA, with depletion from the forward strand of introns. These exons are not conserved, suggesting they are nonfunctional and arose from random mutations. They are nonetheless highly enriched with known splicing promoting sequence features that delineate known exons. Novel autonomous exons are more numerous than annotated lncRNA exons, and computational models also indicate they will occur with similar frequency in any randomly generated sequence. These results show that most human coding exons splice autonomously, and provide an explanation for the existence of many unconserved lncRNAs, as well as a new annotation and inclusion levels of spliceable loci in the human genome.
Affiliation(s)
- Nicholas Stepankiw
  - Donnelly Centre, University of Toronto, Toronto, Ontario, Canada M5S 3E1
- Ally W H Yang
  - Donnelly Centre, University of Toronto, Toronto, Ontario, Canada M5S 3E1
- Timothy R Hughes
  - Donnelly Centre, University of Toronto, Toronto, Ontario, Canada M5S 3E1
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
109
Cuenca-Guardiola J, Morena-Barrio BDL, Navarro-Manzano E, Stevens J, Ouwehand WH, Gleadall NS, Corral J, Fernández-Breis JT. Detection and annotation of transposable element insertions and deletions on the human genome using nanopore sequencing. iScience 2023; 26:108214. [PMID: 37953943 PMCID: PMC10638045 DOI: 10.1016/j.isci.2023.108214] [Received: 02/06/2023] [Revised: 07/28/2023] [Accepted: 10/11/2023] [Indexed: 11/14/2023] [Open access]
Abstract
Repetitive sequences represent about 45% of the human genome. Some are transposable elements (TEs) with the ability to change their position in the genome, creating genetic variability as both insertions and deletions, with potential pathogenic consequences. We used long-read nanopore sequencing to identify TE variants in the genomes of 24 patients with antithrombin deficiency. We identified 7,344 TE insertions and 3,056 TE deletions, of which 2,926 were not previously described in publicly available databases. The insertions affected 3,955 genes, with 6 insertions located in exons, 3,929 in introns, and 147 in promoters. Potential functional impact was evaluated with gene annotation and enrichment analysis, which suggested a strong relationship with neuron-related functions and autism. We conclude that this study encourages the generation of a complete map of TEs in the human genome, which will be useful for identifying new TEs involved in genetic disorders.
Affiliation(s)
- Javier Cuenca-Guardiola
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Pascual Parrilla, Facultad de Informática, Campus de Espinardo, Murcia 30100, Spain
- Belén de la Morena-Barrio
  - Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-III, Ronda de Garay S/N, Murcia 30003, Spain
- Esther Navarro-Manzano
  - Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-III, Ronda de Garay S/N, Murcia 30003, Spain
- Jonathan Stevens
  - Department of Haematology, University of Cambridge, CB2 0PT, Cambridge Biomedical Campus, Cambridge, Cambridge, England, UK
  - Blood and Transplant, National Health Service (NHS), CB2 0QQ, Cambridge Biomedical Campus, Cambridge, England, UK
- Willem H. Ouwehand
  - Department of Haematology, University of Cambridge, CB2 0PT, Cambridge Biomedical Campus, Cambridge, Cambridge, England, UK
  - Blood and Transplant, National Health Service (NHS), CB2 0QQ, Cambridge Biomedical Campus, Cambridge, England, UK
  - British Heart Foundation Cambridge Centre of Excellence, Division of Cardiovascular Medicine, Cambridge Heart and Lung Research Institute, Cambridge Biomedical Campus, Cambridge, England CB2 0AY, UK
  - University College London Hospitals, NHS Foundation Trust, London, England, UK
- Nicholas S. Gleadall
  - Department of Haematology, University of Cambridge, CB2 0PT, Cambridge Biomedical Campus, Cambridge, Cambridge, England, UK
  - Blood and Transplant, National Health Service (NHS), CB2 0QQ, Cambridge Biomedical Campus, Cambridge, England, UK
- Javier Corral
  - Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-III, Ronda de Garay S/N, Murcia 30003, Spain
- Jesualdo Tomás Fernández-Breis
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Pascual Parrilla, Facultad de Informática, Campus de Espinardo, Murcia 30100, Spain
|
110
|
Baril T, Croll D. A pangenome-guided manually curated library of transposable elements for Zymoseptoria tritici. BMC Res Notes 2023; 16:335. [PMID: 37974222 PMCID: PMC10652580 DOI: 10.1186/s13104-023-06613-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Accepted: 11/03/2023] [Indexed: 11/19/2023] Open
Abstract
OBJECTIVES High-quality species-specific transposable element (TE) libraries are required for studies to elucidate the evolutionary dynamics of TEs and gain an understanding of their impacts on host genomes. Such high-quality TE resources are severely lacking for species in the fungal kingdom. To facilitate future studies on the putative role of TEs in rapid adaptation observed in the fungal wheat pathogen Zymoseptoria tritici, we produced a manually curated TE library. This was generated by detecting TEs in 19 reference genome assemblies representing the global diversity of the species supplemented by multiple sister species genomes. Improvements over previous TE libraries have been made on TE boundary resolution, detection of ORFs, TE domains, terminal inverted repeats, and class-specific motifs. DATA DESCRIPTION A TE consensus library for Z. tritici formatted for use with RepeatMasker. This data is relevant to other researchers investigating TE-host evolutionary dynamics in Z. tritici or who are interested in comparative studies of the fungal kingdom. Further, this TE library can be used to improve gene annotation. Finally, this TE library increases the number of manually curated TE datasets, providing resources to further our understanding of TE diversity.
Affiliation(s)
- Tobias Baril
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchatel, Rue -Argand 11, 2000, Neuchatel, Switzerland
| | - Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchatel, Rue -Argand 11, 2000, Neuchatel, Switzerland.
|
111
|
Zhang X, Celic I, Mitchell H, Stuckert S, Vedula L, Han JS. Comprehensive profiling of L1 retrotransposons in mouse. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.13.566638. [PMID: 38014156 PMCID: PMC10680791 DOI: 10.1101/2023.11.13.566638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2023]
Abstract
L1 elements are retrotransposons currently active in mammals. Although L1s are typically silenced in most normal tissues, elevated L1 expression is associated with a variety of conditions, including cancer, aging, infertility, and neurological disease. These associations have raised interest in the mapping of human endogenous de novo L1 insertions, and a variety of methods have been developed for this purpose. Adapting these methods to mouse genomes would allow us to monitor endogenous in vivo L1 activity in controlled, experimental conditions using mouse disease models. Here we use a modified version of transposon insertion profiling, called nanoTIPseq, to selectively enrich young mouse L1s. By linking this amplification step with nanopore sequencing, we identified >95% of annotated L1s from C57BL/6 genomic DNA using only 200,000 sequencing reads. In the process, we discovered 82 unannotated L1 insertions from a single C57BL/6 genome. Most of these unannotated L1s were near repetitive sequence and were not found with short-read TIPseq. We used nanoTIPseq on individual mouse breast cancer cells and were able to identify the annotated and unannotated L1s, as well as new insertions specific to individual cells, providing proof of principle for using nanoTIPseq to interrogate retrotransposition activity at the single cell level in vivo.
|
112
|
Pulver C, Grun D, Duc J, Sheppard S, Planet E, Coudray A, de Fondeville R, Pontis J, Trono D. Statistical learning quantifies transposable element-mediated cis-regulation. Genome Biol 2023; 24:258. [PMID: 37950299 PMCID: PMC10637000 DOI: 10.1186/s13059-023-03085-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2022] [Accepted: 10/09/2023] [Indexed: 11/12/2023] Open
Abstract
BACKGROUND Transposable elements (TEs) have colonized the genomes of most metazoans, and many TE-embedded sequences function as cis-regulatory elements (CREs) for genes involved in a wide range of biological processes from early embryogenesis to innate immune responses. Because of their repetitive nature, TEs have the potential to form CRE platforms enabling the coordinated and genome-wide regulation of protein-coding genes by only a handful of trans-acting transcription factors (TFs). RESULTS Here, we directly test this hypothesis through mathematical modeling and demonstrate that differences in expression at protein-coding genes alone are sufficient to estimate the magnitude and significance of TE-contributed cis-regulatory activities, even in contexts where TE-derived transcription fails to do so. We leverage hundreds of overexpression experiments and estimate that, overall, gene expression is influenced by TE-embedded CREs situated within approximately 500 kb of promoters. Focusing on the cis-regulatory potential of TEs within the gene regulatory network of human embryonic stem cells, we find that pluripotency-specific and evolutionarily young TE subfamilies can be reactivated by TFs involved in post-implantation embryogenesis. Finally, we show that TE subfamilies can be split into truly regulatorily active versus inactive fractions based on additional information such as matched epigenomic data, observing that TF binding may better predict TE cis-regulatory activity than differences in histone marks. CONCLUSION Our results suggest that TE-embedded CREs contribute to gene regulation during and beyond gastrulation. On a methodological level, we provide a statistical tool that infers TE-dependent cis-regulation from RNA-seq data alone, thus facilitating the study of TEs in the next-generation sequencing era.
Affiliation(s)
- Cyril Pulver
- School of Life Sciences, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015, Lausanne, Switzerland
| | - Delphine Grun
- School of Life Sciences, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015, Lausanne, Switzerland
| | - Julien Duc
- School of Life Sciences, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015, Lausanne, Switzerland
| | - Shaoline Sheppard
- School of Life Sciences, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015, Lausanne, Switzerland
| | - Evarist Planet
- School of Life Sciences, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015, Lausanne, Switzerland
| | - Alexandre Coudray
- School of Life Sciences, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015, Lausanne, Switzerland
| | - Raphaël de Fondeville
- Swiss Data Science Center, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015, Lausanne, Switzerland.
| | - Julien Pontis
- School of Life Sciences, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015, Lausanne, Switzerland.
- SOPHiA GENETICS SA, La Pièce 12, CH-1180, Rolle, Switzerland.
| | - Didier Trono
- School of Life Sciences, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015, Lausanne, Switzerland.
|
113
|
Pennance T, Calvelo J, Tennessen JA, Burd R, Cayton J, Bollmann SR, Blouin MS, Spaan JM, Hoffmann FG, Ogara G, Rawago F, Andiego K, Mulonga B, Odhiambo M, Loker ES, Laidemitt MR, Lu L, Iriarte A, Odiere M, Steinauer ML. The genome and transcriptome of the snail Biomphalaria sudanica s.l.: Immune gene diversification and highly polymorphic genomic regions in an important African vector of Schistosoma mansoni. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.01.565203. [PMID: 37961413 PMCID: PMC10635097 DOI: 10.1101/2023.11.01.565203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Abstract
Background Control and elimination of schistosomiasis is an arduous task, with current strategies proving inadequate to break transmission. Exploration of genetic approaches to interrupt Schistosoma mansoni transmission, the causative agent for human intestinal schistosomiasis in sub-Saharan Africa and South America, has led to genomic research of the snail vector hosts of the genus Biomphalaria. Few complete genomic resources exist, with African Biomphalaria species being particularly underrepresented despite this being where the majority of S. mansoni infections occur. Here we generate and annotate the first genome assembly of Biomphalaria sudanica sensu lato, a species responsible for S. mansoni transmission in lake and marsh habitats of the African Rift Valley. Supported by whole-genome diversity data among five inbred lines, we describe orthologs of immune-relevant gene regions in the South American vector B. glabrata and present a bioinformatic pipeline to identify candidate novel pathogen recognition receptors (PRRs). Results De novo genome and transcriptome assembly of inbred B. sudanica originating from the shoreline of Lake Victoria (Kisumu, Kenya) resulted in a haploid genome size of ~944.2 Mb (6732 fragments, N50=1.067 Mb), comprising 23,598 genes (BUSCO=93.6% complete). The B. sudanica genome contains orthologues to all described immune genes/regions tied to protection against S. mansoni in B. glabrata. The B. sudanica PTC2 candidate immune genomic region contained many PRR-like genes across a much wider genomic region than has been shown in B. glabrata, as well as a large inversion between species. High levels of intra-species nucleotide diversity were seen in PTC2, as well as in regions linked to PTC1 and RADres orthologues. Immune related and putative PRR gene families were significantly over-represented in the sub-set of B. sudanica genes determined as hyperdiverse, including high extracellular diversity in transmembrane genes, which could be under pathogen-mediated balancing selection. However, no overall expansion in immunity related genes was seen in African compared to South American lineages. Conclusions The B. sudanica genome and analyses presented here will facilitate future research in vector immune defense mechanisms against pathogens. This genomic/transcriptomic resource provides necessary data for the future development of molecular snail vector control/surveillance tools, facilitating schistosome transmission interruption mechanisms in Africa.
Affiliation(s)
- Tom Pennance
- College of Osteopathic Medicine of the Pacific - Northwest, Western University of Health Sciences, Lebanon OR, USA
| | - Javier Calvelo
- Laboratorio Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo 11600, Uruguay
| | | | - Ryan Burd
- College of Osteopathic Medicine of the Pacific - Northwest, Western University of Health Sciences, Lebanon OR, USA
| | - Jared Cayton
- College of Osteopathic Medicine of the Pacific - Northwest, Western University of Health Sciences, Lebanon OR, USA
| | | | | | - Johannie M Spaan
- College of Osteopathic Medicine of the Pacific - Northwest, Western University of Health Sciences, Lebanon OR, USA
| | - Federico G Hoffmann
- Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University, Starkville, MS USA
| | - George Ogara
- Centre for Global Health Research, Kenya Medical Research Institute (KEMRI), P. O. Box 1578-40100, Kisumu, Kenya
| | - Fredrick Rawago
- Centre for Global Health Research, Kenya Medical Research Institute (KEMRI), P. O. Box 1578-40100, Kisumu, Kenya
| | - Kennedy Andiego
- Centre for Global Health Research, Kenya Medical Research Institute (KEMRI), P. O. Box 1578-40100, Kisumu, Kenya
| | - Boaz Mulonga
- Centre for Global Health Research, Kenya Medical Research Institute (KEMRI), P. O. Box 1578-40100, Kisumu, Kenya
| | - Meredith Odhiambo
- Centre for Global Health Research, Kenya Medical Research Institute (KEMRI), P. O. Box 1578-40100, Kisumu, Kenya
| | - Eric S Loker
- Department of Biology, Center for Evolutionary and Theoretical Immunology, Parasite Division Museum of Southwestern Biology, University of New Mexico, Albuquerque, New Mexico 87131, U.S.A
| | - Martina R Laidemitt
- Department of Biology, Center for Evolutionary and Theoretical Immunology, Parasite Division Museum of Southwestern Biology, University of New Mexico, Albuquerque, New Mexico 87131, U.S.A
| | - Lijun Lu
- Department of Biology, Center for Evolutionary and Theoretical Immunology, Parasite Division Museum of Southwestern Biology, University of New Mexico, Albuquerque, New Mexico 87131, U.S.A
| | - Andrés Iriarte
- Laboratorio Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo 11600, Uruguay
| | - Maurice Odiere
- Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University, Starkville, MS USA
| | - Michelle L Steinauer
- College of Osteopathic Medicine of the Pacific - Northwest, Western University of Health Sciences, Lebanon OR, USA
|
114
|
Bramsiepe J, Krabberød AK, Bjerkan KN, Alling RM, Johannessen IM, Hornslien KS, Miller JR, Brysting AK, Grini PE. Structural evidence for MADS-box type I family expansion seen in new assemblies of Arabidopsis arenosa and A. lyrata. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2023; 116:942-961. [PMID: 37517071 DOI: 10.1111/tpj.16401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/22/2023] [Revised: 05/24/2023] [Accepted: 07/13/2023] [Indexed: 08/01/2023]
Abstract
Arabidopsis thaliana diverged from A. arenosa and A. lyrata at least 6 million years ago. The three species differ by genome-wide polymorphisms and morphological traits. The species are to a high degree reproductively isolated, but hybridization barriers are incomplete. A special type of hybridization barrier is based on the triploid endosperm of the seed, where embryo lethality is caused by endosperm failure to support the developing embryo. The MADS-box type I family of transcription factors is specifically expressed in the endosperm and has been proposed to play a role in endosperm-based hybridization barriers. The gene family is well known for its high evolutionary duplication rate, as well as being regulated by genomic imprinting. Here we address MADS-box type I gene family evolution and the role of type I genes in the context of hybridization. Using two de-novo assembled and annotated chromosome-level genomes of A. arenosa and A. lyrata ssp. petraea we analyzed the MADS-box type I gene family in Arabidopsis to predict orthologs, copy number, and structural genomic variation related to the type I loci. Our findings were compared to gene expression profiles sampled before and after the transition to endosperm cellularization in order to investigate the involvement of MADS-box type I loci in endosperm-based hybridization barriers. We observed substantial differences in type-I expression in the endosperm of A. arenosa and A. lyrata ssp. petraea, suggesting a genetic cause for the endosperm-based hybridization barrier between A. arenosa and A. lyrata ssp. petraea.
Affiliation(s)
- Jonathan Bramsiepe
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
- CEES, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Anders K Krabberød
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Katrine N Bjerkan
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
- CEES, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Renate M Alling
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
- CEES, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Ida M Johannessen
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Karina S Hornslien
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Jason R Miller
- College of STEM, Shepherd University, Shepherdstown, West Virginia, 25443-5000, USA
| | - Anne K Brysting
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
- CEES, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Paul E Grini
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316, Oslo, Norway
|
115
|
Van Etten J, Stephens TG, Bhattacharya D. A k-mer-Based Approach for Phylogenetic Classification of Taxa in Environmental Genomic Data. Syst Biol 2023; 72:1101-1118. [PMID: 37314057 DOI: 10.1093/sysbio/syad037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2022] [Revised: 03/20/2023] [Accepted: 06/12/2023] [Indexed: 06/15/2023] Open
Abstract
In the age of genome sequencing, whole-genome data is readily and frequently generated, leading to a wealth of new information that can be used to advance various fields of research. New approaches, such as alignment-free phylogenetic methods that utilize k-mer-based distance scoring, are becoming increasingly popular given their ability to rapidly generate phylogenetic information from whole-genome data. However, these methods have not yet been tested using environmental data, which often tends to be highly fragmented and incomplete. Here, we compare the results of one alignment-free approach (which utilizes the D2 statistic) to traditional multi-gene maximum likelihood trees in 3 algal groups that have high-quality genome data available. In addition, we simulate lower-quality, fragmented genome data using these algae to test method robustness to genome quality and completeness. Finally, we apply the alignment-free approach to environmental metagenome assembled genome data of unclassified Saccharibacteria and Trebouxiophyte algae, and single-cell amplified data from uncultured marine stramenopiles to demonstrate its utility with real datasets. We find that in all instances, the alignment-free method produces phylogenies that are comparable to, and often more informative than, those created using the traditional multi-gene approach. The k-mer-based method performs well even when there are significant missing data that include marker genes traditionally used for tree reconstruction. Our results demonstrate the value of alignment-free approaches for classifying novel, often cryptic or rare, species that may not be culturable or are difficult to access using single-cell methods, but fill important gaps in the tree of life.
Affiliation(s)
- Julia Van Etten
- Graduate Program in Ecology and Evolution, Rutgers, The State University of New Jersey, 14 College Farm Road, New Brunswick, NJ 08901, USA
| | - Timothy G Stephens
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, 59 Dudley Road, New Brunswick, NJ 08901, USA
| | - Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, 59 Dudley Road, New Brunswick, NJ 08901, USA
|
116
|
Dionisio JF, Pezenti LF, de Souza RF, Sosa-Gómez DR, da Rosa R. Annotation of transposable elements in the transcriptome of the Neotropical brown stink bug Euschistus heros and its chromosomal distribution. Mol Genet Genomics 2023; 298:1377-1388. [PMID: 37646857 DOI: 10.1007/s00438-023-02063-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2023] [Accepted: 08/17/2023] [Indexed: 09/01/2023]
Abstract
Transposable elements (TEs) are DNA sequences capable of moving within the genome. Their distribution is very dynamic among organisms, and despite advances, there are still gaps in the understanding of the diversity and evolution of TEs in many insect species. In the case of Euschistus heros, considered the main stink bug in the soybean crop in Brazil, little is known about the participation of these elements. Therefore, the objective of the current work was to identify the different groups of transposable elements present in the E. heros transcriptome, evidencing their chromosomal distribution. Through RNA-Seq and de novo assembly, 60,009 transcripts were obtained, which were annotated locally via Blastn against specific databases. Of the 367 transcripts identified as TEs, 202 belong to Class II, with emphasis on the TIR order. Among Class I elements or retrotransposons, most were characterized as LINE. Phylogenetic analyses were performed with the protein domains, evidencing differences between Tc1-mariner sequences, which may be related to possible horizontal transfer events. The transposable elements that stood out in the transcriptome were selected for fluorescent in situ hybridization. DNA transposon probes hAT, Helitron, and Tc1-mariner showed mostly scattered signals, with the presence of some blocks. Retrotransposon probes Copia, Gypsy, Jockey, and RTE showed a more pulverized hybridization pattern, with the presence of small interstitial and/or terminal blocks. Studies like this one, integrating functional genomics and molecular cytogenetic tools, are essential to expanding knowledge about transcriptionally active mobile elements, and their behavior in the chromosomes.
Affiliation(s)
- Jaqueline Fernanda Dionisio
- Laboratório de Citogenética e Entomologia Molecular, Departamento de Biologia Geral, Universidade Estadual de Londrina, Rodovia Celso Garcia Cid, PR 445 Km 350, Campus Universitário, Caixa Postal: 10.011, Londrina, PR, CEP:86.057-970, Brazil
| | - Larissa Forim Pezenti
- Laboratório de Citogenética e Entomologia Molecular, Departamento de Biologia Geral, Universidade Estadual de Londrina, Rodovia Celso Garcia Cid, PR 445 Km 350, Campus Universitário, Caixa Postal: 10.011, Londrina, PR, CEP:86.057-970, Brazil
- Laboratório de Bioinformática, Departamento de Biologia Geral, Universidade Estadual de Londrina, Caixa Postal: 10.011, Londrina, PR, CEP:86.057-970, Brazil
| | - Rogério Fernandes de Souza
- Laboratório de Bioinformática, Departamento de Biologia Geral, Universidade Estadual de Londrina, Caixa Postal: 10.011, Londrina, PR, CEP:86.057-970, Brazil
| | - Daniel Ricardo Sosa-Gómez
- Empresa Brasileira de Pesquisa Agropecuária/Centro Nacional de Pesquisa de Soja (Embrapa Soja), Caixa Postal: 4006, Londrina, PR, CEP: 86085-981, Brazil
| | - Renata da Rosa
- Laboratório de Citogenética e Entomologia Molecular, Departamento de Biologia Geral, Universidade Estadual de Londrina, Rodovia Celso Garcia Cid, PR 445 Km 350, Campus Universitário, Caixa Postal: 10.011, Londrina, PR, CEP:86.057-970, Brazil.
|
117
|
Tselika M, Belmezos N, Kallemi P, Andronis C, Chiumenti M, Navarro B, Lavigne M, Di Serio F, Kalantidis K, Katsarou K. PSTVd infection in Nicotiana benthamiana plants has a minor yet detectable effect on CG methylation. FRONTIERS IN PLANT SCIENCE 2023; 14:1258023. [PMID: 38023875 PMCID: PMC10645062 DOI: 10.3389/fpls.2023.1258023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Accepted: 10/13/2023] [Indexed: 12/01/2023]
Abstract
Viroids are small circular RNAs infecting a wide range of plants. They do not code for any protein or peptide and therefore rely on their structure for their biological cycle. Observed phenotypes of viroid-infected plants are thought to occur through changes at the transcriptional/translational level of the host. A mechanism involved in such changes is RNA-directed DNA methylation (RdDM). To date, reports on viroid interference with RdDM have been contradictory. In this study, we investigated the epigenetic effect of viroid infection in Nicotiana benthamiana plants. Using potato spindle tuber viroid (PSTVd) as the triggering pathogen and via bioinformatic analyses, we identified endogenous gene promoters and transposable elements targeted by 24 nt host siRNAs that differentially accumulated in PSTVd-infected and healthy plants. The methylation status of these targets was evaluated following digestion with methylation-sensitive restriction enzymes coupled with PCR amplification, and bisulfite sequencing. In addition, we used Methylation Sensitive Amplification Polymorphism (MSAP) followed by sequencing (MSAP-seq) to study genomic DNA methylation of 5-methylcytosine (5mC) in CG sites upon viroid infection. In this study we identified a limited number of target loci differentially methylated upon PSTVd infection. These results enhance our understanding of the epigenetic host changes as a result of pospiviroid infection.
Affiliation(s)
- Martha Tselika
- Department of Biology, University of Crete, Heraklion, Crete, Greece
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece
| | | | - Paraskevi Kallemi
- Department of Biology, University of Crete, Heraklion, Crete, Greece
| | - Christos Andronis
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece
| | - Michela Chiumenti
- Istituto per la Protezione Sostenibile delle Piante, Consiglio Nazionale delle Ricerche, Bari, Italy
| | - Beatriz Navarro
- Istituto per la Protezione Sostenibile delle Piante, Consiglio Nazionale delle Ricerche, Bari, Italy
| | - Matthieu Lavigne
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece
| | - Francesco Di Serio
- Istituto per la Protezione Sostenibile delle Piante, Consiglio Nazionale delle Ricerche, Bari, Italy
| | - Kriton Kalantidis
- Department of Biology, University of Crete, Heraklion, Crete, Greece
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece
| | - Konstantina Katsarou
- Department of Biology, University of Crete, Heraklion, Crete, Greece
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece
|
118
|
Abante J, Wang PL, Salzman J. DIVE: a reference-free statistical approach to diversity-generating and mobile genetic element discovery. Genome Biol 2023; 24:240. [PMID: 37864197 PMCID: PMC10589994 DOI: 10.1186/s13059-023-03038-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2023] [Accepted: 08/14/2023] [Indexed: 10/22/2023] Open
Abstract
Diversity-generating and mobile genetic elements are key to microbial and viral evolution and can result in evolutionary leaps. State-of-the-art algorithms to detect these elements have limitations. Here, we introduce DIVE, a new reference-free approach to overcome these limitations using information contained in sequencing reads alone. We show that DIVE has improved detection power compared to existing reference-based methods using simulations and real data. We use DIVE to rediscover and characterize the activity of known and novel elements and generate new biological hypotheses about the mobilome. Building on DIVE, we develop a reference-free framework capable of de novo discovery of mobile genetic elements.
Affiliation(s)
- Jordi Abante
- Biomedical Data Science, Stanford University, 1265 Welch Rd, Palo Alto, 94305, CA, USA
- Center for Computational, Evolutionary and Human Genomics, Stanford University, 327 Campus Drive, Stanford, 94305, CA, USA
- Current address: Department of Biomedical Sciences, Universitat de Barcelona, Casanova 143, Barcelona, 08036, Spain
| | - Peter L Wang
- Biomedical Data Science, Stanford University, 1265 Welch Rd, Palo Alto, 94305, CA, USA
- Department of Biochemistry, Stanford University, 279 Campus Drive, Stanford, 94305, CA, USA
| | - Julia Salzman
- Biomedical Data Science, Stanford University, 1265 Welch Rd, Palo Alto, 94305, CA, USA.
- Department of Biochemistry, Stanford University, 279 Campus Drive, Stanford, 94305, CA, USA.
- Department of Statistics, Stanford University, 390 Serra Mall, Stanford, 94305, CA, USA.
|
119
|
Oliveira DS, Fablet M, Larue A, Vallier A, Carareto CA, Rebollo R, Vieira C. ChimeraTE: a pipeline to detect chimeric transcripts derived from genes and transposable elements. Nucleic Acids Res 2023; 51:9764-9784. [PMID: 37615575 PMCID: PMC10570057 DOI: 10.1093/nar/gkad671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Revised: 07/25/2023] [Accepted: 08/09/2023] [Indexed: 08/25/2023] Open
Abstract
Transposable elements (TEs) produce structural variants and are considered an important source of genetic diversity. Notably, TE-gene fusion transcripts, i.e. chimeric transcripts, have been associated with adaptation in several species. However, the identification of these chimeras remains hindered due to the lack of detection tools at a transcriptome-wide scale, and to the reliance on a reference genome, even though different individuals/cells/strains have different TE insertions. Therefore, we developed ChimeraTE, a pipeline that uses paired-end RNA-seq reads to identify chimeric transcripts through two different modes. Mode 1 is the reference-guided approach that employs canonical genome alignment, and Mode 2 identifies chimeras derived from fixed or insertionally polymorphic TEs without any reference genome. We have validated both modes using RNA-seq data from four Drosophila melanogaster wild-type strains. We found ∼1.12% of all genes generating chimeric transcripts, most of them from TE-exonized sequences. Approximately 23% of all detected chimeras were absent from the reference genome, indicating that TEs belonging to chimeric transcripts may be recent, polymorphic insertions. ChimeraTE is the first pipeline able to automatically uncover chimeric transcripts without a reference genome, consisting of two running Modes that can be used as a tool to investigate the contribution of TEs to transcriptome plasticity.
Affiliation(s)
- Daniel S Oliveira
- São Paulo State University (Unesp), Institute of Biosciences, Humanities and Exact Sciences, São José do Rio Preto, SP, Brazil
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR5558, Villeurbanne, Rhone-Alpes, 69100, France
| | - Marie Fablet
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR5558, Villeurbanne, Rhone-Alpes, 69100, France
- Institut Universitaire de France (IUF), Paris, Île-de-France F-75231, France
| | - Anaïs Larue
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR5558, Villeurbanne, Rhone-Alpes, 69100, France
- Univ Lyon, INRAE, INSA-Lyon, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Agnès Vallier
- Univ Lyon, INRAE, INSA-Lyon, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Claudia M A Carareto
- São Paulo State University (Unesp), Institute of Biosciences, Humanities and Exact Sciences, São José do Rio Preto, SP, Brazil
| | - Rita Rebollo
- Univ Lyon, INRAE, INSA-Lyon, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Cristina Vieira
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR5558, Villeurbanne, Rhone-Alpes, 69100, France
120
Chaturvedi A, Li X, Dhandapani V, Marshall H, Kissane S, Cuenca-Cambronero M, Asole G, Calvet F, Ruiz-Romero M, Marangio P, Guigó R, Rago D, Mirbahai L, Eastwood N, Colbourne J, Zhou J, Mallon E, Orsini L. The hologenome of Daphnia magna reveals possible DNA methylation and microbiome-mediated evolution of the host genome. Nucleic Acids Res 2023; 51:9785-9803. [PMID: 37638757 PMCID: PMC10570034 DOI: 10.1093/nar/gkad685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 07/07/2023] [Accepted: 08/09/2023] [Indexed: 08/29/2023] Open
Abstract
Properties that make organisms ideal laboratory models in developmental and medical research are often the ones that also make them less representative of wild relatives. The waterflea Daphnia magna is an exception, by both sharing many properties with established laboratory models and being a keystone species, a sentinel species for assessing water quality, an indicator of environmental change and an established ecotoxicology model. Yet, Daphnia's potential has not been fully exploited because of the challenges associated with assembling and annotating its gene-rich genome. Here, we present the first hologenome of Daphnia magna, consisting of a chromosomal-level assembly of the D. magna genome and the draft assembly of its metagenome. By sequencing and mapping transcriptomes from exposures to environmental conditions and from developmental morphological landmarks, we expand the previously annotated gene set for this species. We also provide evidence for the potential role of gene-body DNA methylation as a mutagen mediating genome evolution. For the first time, our study shows that the gut microbes provide resistance to commonly used antibiotics and virulence factors, potentially mediating Daphnia's environmental-driven rapid evolution. Key findings in this study improve our understanding of the contribution of DNA methylation and gut microbiota to genome evolution in response to rapidly changing environments.
Affiliation(s)
- Anurag Chaturvedi
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
- Xiaojing Li
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
- Vignesh Dhandapani
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
- Hollie Marshall
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
  - Department of Genetics and Genome Biology, the University of Leicester, Leicester LE1 7RH, UK
- Stephen Kissane
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
- Maria Cuenca-Cambronero
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
  - Aquatic Ecology Group, University of Vic - Central University of Catalonia, 08500 Vic, Spain
- Giovanni Asole
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Barcelona, Catalonia, Spain
- Ferriol Calvet
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Barcelona, Catalonia, Spain
- Marina Ruiz-Romero
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Barcelona, Catalonia, Spain
- Paolo Marangio
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Barcelona, Catalonia, Spain
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Barcelona, Catalonia, Spain
- Daria Rago
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
- Leda Mirbahai
  - Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
- Niamh Eastwood
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
- John K Colbourne
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
- Jiarui Zhou
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
- Eamonn Mallon
  - Department of Genetics and Genome Biology, the University of Leicester, Leicester LE1 7RH, UK
- Luisa Orsini
  - Environmental Genomics Group, School of Biosciences, and Institute for Interdisciplinary Data Science and AI, the University of Birmingham, Birmingham B15 2TT, UK
  - The Alan Turing Institute, British Library, London NW1 2DB, UK
121
Sahoo RK, Manu S, Chandrakumaran NK, Vasudevan K. Nuclear and Mitochondrial Genome Assemblies of the Beetle, Zygogramma bicolorata, a Globally Important Biocontrol Agent of Invasive Weed Parthenium hysterophorus. Genome Biol Evol 2023; 15:evad188. [PMID: 37831427 PMCID: PMC10603765 DOI: 10.1093/gbe/evad188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 10/05/2023] [Accepted: 10/09/2023] [Indexed: 10/14/2023] Open
Abstract
Implementing a genetic-based approach to achieve the full potential of classical biocontrol programs has been advocated for decades. The availability of genome-level information brings the opportunity to scrutinize biocontrol traits for their efficacy and evolvability. However, implementation of this advocacy remains limited to a few instances. Biocontrol of a globally noxious weed, Parthenium hysterophorus, by the leaf-feeding beetle, Zygogramma bicolorata, has been in place for more than four decades now, with varying levels of success. As the first step in providing genetic-based improvement to this biocontrol program, we describe the nuclear and mitochondrial assemblies of Z. bicolorata. We assembled the genome from long-read sequence data, error-corrected it with high-throughput short reads, and checked it for contaminants and sequence duplication to produce a 936 Mb nuclear genome. With 96.5% Benchmarking Universal Single-Copy Orthologs completeness and a long terminal repeat assembly index of 12.91, we present a reference-quality assembly that appears to be repeat rich at 62.7% genome-wide and consists of 29,437 protein-coding regions. We detected signatures of nuclear insertion of mitochondrial fragments at 80 nuclear positions, comprising 13 kb of the 17.9 kb mitochondrial genome sequence. This genome, along with its annotations, provides a valuable resource to gain further insights into the biocontrol traits of Z. bicolorata for improving the control of the invasive weed P. hysterophorus.
Affiliation(s)
- Ranjit Kumar Sahoo
  - Laboratory for the Conservation of Endangered Species (LaCONES), CSIR-Centre for Cellular and Molecular Biology (CCMB), Hyderabad, India
- Shivakumara Manu
  - Laboratory for the Conservation of Endangered Species (LaCONES), CSIR-Centre for Cellular and Molecular Biology (CCMB), Hyderabad, India
- Naveen Kumar Chandrakumaran
  - Laboratory for the Conservation of Endangered Species (LaCONES), CSIR-Centre for Cellular and Molecular Biology (CCMB), Hyderabad, India
- Karthikeyan Vasudevan
  - Laboratory for the Conservation of Endangered Species (LaCONES), CSIR-Centre for Cellular and Molecular Biology (CCMB), Hyderabad, India
122
Sproul JS, Hotaling S, Heckenhauer J, Powell A, Marshall D, Larracuente AM, Kelley JL, Pauls SU, Frandsen PB. Analyses of 600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges. Genome Res 2023; 33:1708-1717. [PMID: 37739812 PMCID: PMC10691545 DOI: 10.1101/gr.277387.122] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/06/2022] [Accepted: 09/20/2023] [Indexed: 09/24/2023]
Abstract
Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes, yet remain understudied in most taxonomic groups. We investigated REs across 601 insect species and report wide variation in RE dynamics across groups. Analysis of associations between REs and protein-coding genes revealed dynamic evolution at the interface between REs and coding regions across insects, including notably elevated RE-gene associations in lineages with abundant long interspersed nuclear elements (LINEs). We leveraged this large, empirical data set to quantify impacts of long-read technology on RE detection and investigate fundamental challenges to RE annotation in diverse groups. In long-read assemblies, we detected ∼36% more REs than short-read assemblies, with long terminal repeats (LTRs) showing 162% increased detection, whereas DNA transposons and LINEs showed less respective technology-related bias. In most insect lineages, 25%-85% of repetitive sequences were "unclassified" following automated annotation, compared with only ∼13% in Drosophila species. Although the diversity of available insect genomes has rapidly expanded, we show the rate of community contributions to RE databases has not kept pace, preventing efficient annotation and high-resolution study of REs in most groups. We highlight the tremendous opportunity and need for the biodiversity genomics field to embrace REs and suggest collective steps for making progress toward this goal.
Affiliation(s)
- John S Sproul
  - Department of Biology, Brigham Young University, Provo, Utah 84602, USA
  - Department of Biology, University of Nebraska Omaha, Omaha, Nebraska 68182, USA
  - Department of Biology, University of Rochester, Rochester, New York 14627, USA
- Scott Hotaling
  - School of Biological Sciences, Washington State University, Pullman, Washington 99163, USA
  - Department of Watershed Sciences, Utah State University, Logan, Utah 84322, USA
- Jacqueline Heckenhauer
  - LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
  - Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
- Ashlyn Powell
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah 84602, USA
- Dez Marshall
  - Department of Biology, University of Nebraska Omaha, Omaha, Nebraska 68182, USA
- Joanna L Kelley
  - School of Biological Sciences, Washington State University, Pullman, Washington 99163, USA
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Steffen U Pauls
  - LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
  - Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
  - Department of Insect Biotechnology, Justus-Liebig-University Gießen, 35392 Gießen, Germany
- Paul B Frandsen
  - LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah 84602, USA
  - Data Science Lab, Smithsonian Institution, Washington, District of Columbia 20560, USA
123
Simpson J, Kozak CA, Boso G. Evolutionary conservation of an ancient retroviral gagpol gene in Artiodactyla. J Virol 2023; 97:e0053523. [PMID: 37668369 PMCID: PMC10537755 DOI: 10.1128/jvi.00535-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 06/28/2023] [Indexed: 09/06/2023] Open
Abstract
The genomes of mammals contain fingerprints of past infections by ancient retroviruses that invaded the germline of their ancestors. Most of these endogenous retroviruses (ERVs) contain only remnants of the original retrovirus; however, on rare occasions, ERV genes can be co-opted for a beneficial host function. While most studies of co-opted ERVs have focused on envelope genes, including the syncytins that function in placentation, there are examples of co-opted gag genes including one we recently discovered in simian primates. Here, we searched for other intact gag genes in non-primate mammalian lineages. We began by examining the genomes of extant camel species, which represent a basal lineage in the order Artiodactyla. This identified a gagpol gene with a large open reading frame (ORF) (>3,500 bp) in the same orthologous location in Artiodactyla species but that is absent in other mammals. Thus, this ERV was fixed in the common ancestor of all Artiodactyla at least 64 million years ago. The amino acid sequence of this gene, termed ARTgagpol, contains recognizable matrix, capsid, nucleocapsid, and reverse transcriptase domains in ruminants, with an RNase H domain in camels and pigs. Phylogenetic analysis and structural prediction of its reverse transcriptase and RNase H domains group ARTgagpol with gammaretroviruses. Transcriptomic analysis shows ARTgagpol expression in multiple tissues suggestive of a co-opted host function. These findings identify the oldest and largest ERV-derived gagpol gene with an intact ORF in mammals, an intriguing milestone in the co-evolution of mammals and retroviruses.
IMPORTANCE Retroviruses are unique among viruses that infect animals as they integrate their reverse-transcribed double-stranded DNA into host chromosomes. When this happens in a germline cell, such as sperm, egg, or their precursors, the integrated retroviral copies can be passed on to the next generation as endogenous retroviruses (ERVs). On rare occasions, the genes of these ERVs can be domesticated by the host. In this study, we used computational similarity searches to identify an ancient ERV with an intact viral gagpol gene in the genomes of camels that is also found in the same genomic location in other even-toed ungulates, suggesting that it is at least 64 million years old. Broad tissue expression and predicted preservation of the reverse transcriptase fold of this protein suggest that it may be domesticated for a host function. This is the oldest known intact gagpol gene of an ancient retrovirus in mammals.
Affiliation(s)
- J'Zaria Simpson
  - Laboratory of Molecular Microbiology, National Institute of Allergy and Infectious Diseases, Bethesda, Maryland, USA
- Christine A. Kozak
  - Laboratory of Molecular Microbiology, National Institute of Allergy and Infectious Diseases, Bethesda, Maryland, USA
- Guney Boso
  - Laboratory of Molecular Microbiology, National Institute of Allergy and Infectious Diseases, Bethesda, Maryland, USA
124
Yang J, Cook L, Chen Z. Systematic Perturbation of Thousands of Retroviral LTRs in Mouse Embryos. bioRxiv 2023:2023.09.19.558531. [PMID: 37781606 PMCID: PMC10541133 DOI: 10.1101/2023.09.19.558531] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/03/2023]
Abstract
In mammals, many retrotransposons are de-repressed during zygotic genome activation (ZGA). However, their functions in early development remain elusive, largely owing to the challenge of simultaneously manipulating thousands of retrotransposon insertions in embryos. Here, we employed epigenome editing to perturb the long terminal repeat (LTR) MT2_Mm, a well-known ZGA and totipotency marker that exists in ~2667 insertions throughout the mouse genome. CRISPRi robustly repressed 2485 (~93%) MT2_Mm insertions and 1090 (~55%) insertions of the closely related MT2C_Mm in 2-cell embryos. Remarkably, such perturbation caused down-regulation of hundreds of ZGA genes at the 2-cell stage and embryonic arrest mostly at the morula stage. Mechanistically, MT2_Mm/MT2C_Mm primarily served as alternative ZGA promoters activated by OBOX proteins. Thus, through unprecedented large-scale epigenome editing, we addressed to what extent MT2_Mm/MT2C_Mm regulates ZGA and preimplantation development. Our approach could be adapted to systematically perturb retrotransposons in other mammalian embryos, as it does not require transgenic animals.
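The insertion counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below recomputes the repressed fraction of MT2_Mm insertions from the stated counts and back-calculates the MT2C_Mm total implied by the rounded 55% figure. The abstract does not state an MT2C_Mm total, so `implied_mt2c_total` is only an inference from rounded numbers, and all variable names are illustrative.

```python
# Counts taken from the abstract (the MT2_Mm total is approximate, "~2667").
mt2_total = 2667
mt2_repressed = 2485
mt2c_repressed = 1090
mt2c_reported_fraction = 0.55  # rounded value from the abstract


def repressed_fraction(repressed: int, total: int) -> float:
    """Return the fraction of insertions that were repressed."""
    return repressed / total


mt2_fraction = repressed_fraction(mt2_repressed, mt2_total)

# Assumption: the MT2C_Mm total is inferred from the rounded percentage,
# so it is an estimate rather than a number reported by the authors.
implied_mt2c_total = mt2c_repressed / mt2c_reported_fraction

print(f"MT2_Mm repressed: {mt2_fraction:.1%}")
print(f"Implied MT2C_Mm insertions: about {implied_mt2c_total:.0f}")
```

The recomputed MT2_Mm fraction agrees with the reported ~93%; the implied MT2C_Mm total (roughly two thousand insertions) should be read only as an order-of-magnitude estimate.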
Affiliation(s)
- Jian Yang
  - Reproductive Sciences Center, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, 45229, Ohio, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, 45229, Ohio, USA
- Lauryn Cook
  - Reproductive Sciences Center, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, 45229, Ohio, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, 45229, Ohio, USA
- Zhiyuan Chen
  - Reproductive Sciences Center, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, 45229, Ohio, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, 45229, Ohio, USA
125
Lai H, Feng N, Zhai Q. Discovery of the major 15-30 nt mammalian small RNAs, their biogenesis and function. Nat Commun 2023; 14:5796. [PMID: 37723159 PMCID: PMC10507107 DOI: 10.1038/s41467-023-41554-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 09/08/2023] [Indexed: 09/20/2023] Open
Abstract
Small RNAs (sRNAs) of 15-30 nt with a 3'-OH, such as miRNA, tsRNA and srRNA, have been identified. However, whether these sRNAs are the major 15-30 nt sRNAs is still unknown. Here we show that about 90% of mammalian sRNAs within 15-30 nt end with a 2',3'-cyclic phosphate (3'-cP). TANT-seq was developed to simultaneously profile sRNAs with 3'-cP (sRNA-cPs) and sRNA-OHs, and a large number of sRNA-cPs were detected. Surprisingly, sRNA-cPs and sRNA-OHs usually have distinct sequences. The data from TANT-seq were validated by a novel method termed TE-qPCR and by Northern blot. Furthermore, we found that Angiogenin and RNase 4 contribute to the biogenesis of sRNA-cPs. Moreover, many more sRNA-cPs than sRNA-OHs bind to Ago2 and can regulate gene expression. In particular, snR-2-cP regulates Bcl2 by targeting its 3'UTR in an Ago2-dependent manner, and subsequently regulates apoptosis. In addition, sRNA-cPs can guide the cleavage of target RNAs in the Ago2 complex, as miRNAs do, without the requirement of 3'-cP. Our discovery greatly expands the repertoire of mammalian sRNAs, and provides strategies and powerful tools for further investigation of sRNA-cPs.
Affiliation(s)
- Hejin Lai
  - CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Ning Feng
  - CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Qiwei Zhai
  - CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
126
Pasquesi GIM, Allen H, Ivancevic A, Barbachano-Guerrero A, Joyner O, Guo K, Simpson DM, Gapin K, Horton I, Nguyen L, Yang Q, Warren CJ, Florea LD, Bitler BG, Santiago ML, Sawyer SL, Chuong EB. Regulation of human interferon signaling by transposon exonization. bioRxiv 2023:2023.09.11.557241. [PMID: 37745311 PMCID: PMC10515820 DOI: 10.1101/2023.09.11.557241] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 09/26/2023]
Abstract
Innate immune signaling is essential for clearing pathogens and damaged cells, and must be tightly regulated to avoid excessive inflammation or autoimmunity. Here, we found that the alternative splicing of exons derived from transposable elements is a key mechanism controlling immune signaling in human cells. By analyzing long-read transcriptome datasets, we identified numerous transposon exonization events predicted to generate functional protein variants of immune genes, including the type I interferon receptor IFNAR2. We demonstrated that the transposon-derived isoform of IFNAR2 is more highly expressed than the canonical isoform in almost all tissues, and functions as a decoy receptor that potently inhibits interferon signaling including in cells infected with SARS-CoV-2. Our findings uncover a primate-specific axis controlling interferon signaling and show how a transposon exonization event can be co-opted for immune regulation.
Affiliation(s)
- Giulia Irene Maria Pasquesi
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
  - Crnic Institute Boulder Branch, BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, 80303
- Holly Allen
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Atma Ivancevic
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Arturo Barbachano-Guerrero
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Olivia Joyner
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Kejun Guo
  - Division of Infectious Diseases, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045
- David M. Simpson
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Keala Gapin
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Isabella Horton
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Lily Nguyen
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
  - Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045
- Qing Yang
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
  - Fred Hutchinson Cancer Research Center, Seattle, WA, 98109
- Cody J. Warren
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
  - The Ohio State University College of Veterinary Medicine, Columbus, OH, 43210
- Liliana D. Florea
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205
- Benjamin G. Bitler
  - Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045
- Mario L. Santiago
  - Division of Infectious Diseases, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045
- Sara L. Sawyer
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Edward B. Chuong
  - BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
  - Crnic Institute Boulder Branch, BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, 80303
127
Blaz J, Galindo LJ, Heiss AA, Kaur H, Torruella G, Yang A, Alexa Thompson L, Filbert A, Warring S, Narechania A, Shiratori T, Ishida KI, Dacks JB, López-García P, Moreira D, Kim E, Eme L. One high quality genome and two transcriptome datasets for new species of Mantamonas, a deep-branching eukaryote clade. Sci Data 2023; 10:603. [PMID: 37689692 PMCID: PMC10492846 DOI: 10.1038/s41597-023-02488-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 08/18/2023] [Indexed: 09/11/2023] Open
Abstract
Mantamonads were long considered to represent an "orphan" lineage in the tree of eukaryotes, likely branching near the most frequently assumed position for the root of eukaryotes. Recent phylogenomic analyses have placed them as part of the "CRuMs" supergroup, along with collodictyonids and rigifilids. This supergroup appears to branch at the base of Amorphea, making it of special importance for understanding the deep evolutionary history of eukaryotes. However, the lack of representative species and complete genomic data associated with them has hampered the investigation of their biology and evolution. Here, we isolated and described two new species of mantamonads, Mantamonas vickermani sp. nov. and Mantamonas sphyraenae sp. nov., for each of which we generated transcriptomic sequence data, as well as a high-quality genome for the latter. The estimated size of the M. sphyraenae genome is 25 Mb; our de novo assembly appears to be highly contiguous and complete with 9,416 predicted protein-coding genes. This near-chromosome-scale genome assembly is the first described for the CRuMs supergroup.
Affiliation(s)
- Jazmin Blaz
  - Unité d'Ecologie Systématique et Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
- Luis Javier Galindo
  - Unité d'Ecologie Systématique et Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
  - Department of Biology, University of Oxford, Oxford, United Kingdom
- Aaron A Heiss
  - Institute of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
  - Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
  - Department of Oceanography, Kyungpook National University, Daegu, South Korea
- Harpreet Kaur
  - Division of Infectious Disease, Department of Medicine, University of Alberta and Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Guifré Torruella
  - Unité d'Ecologie Systématique et Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
- Ashley Yang
  - Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- L Alexa Thompson
  - Division of Infectious Disease, Department of Medicine, University of Alberta and Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Alexander Filbert
  - Division of Infectious Disease, Department of Medicine, University of Alberta and Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Sally Warring
  - Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
  - Earlham Institute, Norwich Research Park, Norwich, United Kingdom
- Apurva Narechania
  - Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Takashi Shiratori
  - Institute of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
- Ken-Ichiro Ishida
  - Institute of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
- Joel B Dacks
  - Department of Oceanography, Kyungpook National University, Daegu, South Korea
  - Centre for Life's Origin and Evolution, Department of Genetics, Evolution & Environment, University College London, London, United Kingdom
- Purificación López-García
  - Unité d'Ecologie Systématique et Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
- David Moreira
  - Unité d'Ecologie Systématique et Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
- Eunsoo Kim
  - Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
  - Division of EcoScience, Ewha Womans University, Seoul, South Korea
- Laura Eme
  - Unité d'Ecologie Systématique et Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
128
Zuo Z. Quantifying the arms race between LINE-1 and KRAB-zinc finger genes through TECookbook. NAR Genom Bioinform 2023; 5:lqad078. [PMID: 37680368 PMCID: PMC10480687 DOI: 10.1093/nargab/lqad078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 07/13/2023] [Accepted: 08/22/2023] [Indexed: 09/09/2023] Open
Abstract
To defend against the invasion of transposons, hundreds of KRAB-zinc finger genes (ZNFs) evolved to recognize and silence various repeat families specifically. However, most repeat elements reside in the human genome with high copy numbers, making the ChIP-seq reads of ZNFs targeting these repeats predominantly multi-mapping reads. This complicates downstream data analysis and signal quantification. To better visualize and quantify the arms race between transposons and ZNFs, the R package TECookbook has been developed to lift ChIP-seq data into reference repeat coordinates with proper normalization and extract all putative ZNF binding sites from defined loci of reference repeats for downstream analysis. In conjunction with specificity profiles derived from in vitro Spec-seq data, human ZNF10 has been found to bind to a conserved ORF2 locus of selected LINE-1 subfamilies. This provides insight into how LINE-1 evaded capture at least twice and was subsequently recaptured by ZNF10 during evolutionary history. Through similar analyses, ZNF382 and ZNF248 were shown to be broad-spectrum LINE-1 binders. Overall, this work establishes a general analysis workflow to decipher the arms race between ZNFs and transposons through nucleotide substitutions rather than structural variations, particularly in the protein-coding region of transposons.
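To make the idea of lifting genomic positions into reference repeat coordinates concrete, here is a minimal Python sketch. It is not TECookbook's code or API (TECookbook is an R package, and its internals are not described in this abstract). The `RepeatCopy` record and `lift_to_consensus` function are hypothetical names, and the mapping assumes a gap-free alignment between each genomic repeat copy and the consensus, which real annotations do not guarantee. Normalization is not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepeatCopy:
    """Hypothetical annotation of one genomic repeat copy (0-based, half-open)."""
    chrom: str
    g_start: int   # genomic start of the copy
    g_end: int     # genomic end of the copy (exclusive)
    strand: str    # '+' or '-' relative to the repeat consensus
    c_start: int   # consensus start covered by this copy
    c_end: int     # consensus end covered by this copy (exclusive)


def lift_to_consensus(copy: RepeatCopy, chrom: str, pos: int) -> Optional[int]:
    """Map a genomic position inside a repeat copy to a consensus coordinate.

    Returns None when the position lies outside the copy. Indels between the
    copy and the consensus are ignored (simplifying assumption).
    """
    if chrom != copy.chrom or not (copy.g_start <= pos < copy.g_end):
        return None
    offset = pos - copy.g_start
    if copy.strand == "+":
        return copy.c_start + offset
    # Minus-strand copies run backwards along the consensus.
    return copy.c_end - 1 - offset


plus_copy = RepeatCopy("chr1", 1000, 1100, "+", 500, 600)
minus_copy = RepeatCopy("chr2", 2000, 2100, "-", 500, 600)

print(lift_to_consensus(plus_copy, "chr1", 1010))   # 510
print(lift_to_consensus(minus_copy, "chr2", 2000))  # 599
print(lift_to_consensus(plus_copy, "chr1", 999))    # None
```

Pooling lifted positions from many copies of the same family onto one consensus axis is what allows signal from multi-mapping reads to be summarized per repeat family rather than per genomic locus.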
Affiliation(s)
- Zheng Zuo
- Shenzhen University, Shenzhen, China
| |
|
129
|
Parey E, Fernandez-Aroca D, Frost S, Uribarren A, Park TJ, Zöttl M, St John Smith E, Berthelot C, Villar D. Phylogenetic modeling of enhancer shifts in African mole-rats reveals regulatory changes associated with tissue-specific traits. Genome Res 2023; 33:1513-1526. [PMID: 37625847 PMCID: PMC10620049 DOI: 10.1101/gr.277715.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 08/24/2023] [Indexed: 08/27/2023]
Abstract
Changes in gene regulation are thought to underlie most phenotypic differences between species. For subterranean rodents such as the naked mole-rat, proposed phenotypic adaptations include hypoxia tolerance, metabolic changes, and cancer resistance. However, it is largely unknown what regulatory changes may associate with these phenotypic traits, and whether these are unique to the naked mole-rat, the mole-rat clade, or are also present in other mammals. Here, we investigate regulatory evolution in the heart and liver from two African mole-rat species and two rodent outgroups using genome-wide epigenomic profiling. First, we adapted and applied a phylogenetic modeling approach to quantitatively compare epigenomic signals at orthologous regulatory elements and identified thousands of promoter and enhancer regions with differential epigenomic activity in mole-rats. These elements associate with known mole-rat adaptations in metabolic and functional pathways and suggest candidate genetic loci that may underlie mole-rat innovations. Second, we evaluated ancestral and species-specific regulatory changes in the study phylogeny and report several candidate pathways experiencing stepwise remodeling during the evolution of mole-rats, such as the insulin and hypoxia response pathways. Third, we report nonorthologous regulatory elements overlap with lineage-specific repetitive elements and appear to modify metabolic pathways by rewiring of HNF4 and RAR/RXR transcription factor binding sites in mole-rats. These comparative analyses reveal how mole-rat regulatory evolution informs previously reported phenotypic adaptations. Moreover, the phylogenetic modeling framework we propose here improves upon the state of the art by addressing known limitations of inter-species comparisons of epigenomic profiles and has broad implications in the field of comparative functional genomics.
Affiliation(s)
- Elise Parey
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Diego Fernandez-Aroca
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, United Kingdom
| | - Stephanie Frost
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, United Kingdom
| | - Ainhoa Uribarren
- Cambridge Institute, Cancer Research UK and University of Cambridge, Cambridge CB2 0RE, United Kingdom
| | - Thomas J Park
- Department of Biological Sciences and Laboratory of Integrative Neuroscience, University of Illinois at Chicago, Chicago, Illinois 60607, USA
| | - Markus Zöttl
- Department of Biology and Environmental Science, Linnaeus University, 44054 Kalmar, Sweden
| | - Ewan St John Smith
- Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, United Kingdom
| | - Camille Berthelot
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France;
- Institut Pasteur, Université Paris Cité, CNRS UMR 3525, INSERM UA12, Comparative Functional Genomics Group, F-75015 Paris, France
| | - Diego Villar
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, United Kingdom
| |
|
130
|
Hallast P, Ebert P, Loftus M, Yilmaz F, Audano PA, Logsdon GA, Bonder MJ, Zhou W, Höps W, Kim K, Li C, Hoyt SJ, Dishuck PC, Porubsky D, Tsetsos F, Kwon JY, Zhu Q, Munson KM, Hasenfeld P, Harvey WT, Lewis AP, Kordosky J, Hoekzema K, O'Neill RJ, Korbel JO, Tyler-Smith C, Eichler EE, Shi X, Beck CR, Marschall T, Konkel MK, Lee C. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 2023; 621:355-364. [PMID: 37612510 PMCID: PMC10726138 DOI: 10.1038/s41586-023-06425-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 11/30/2022] [Accepted: 07/11/2023] [Indexed: 08/25/2023]
Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date [1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes [2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established [1] boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.
Affiliation(s)
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Mark Loftus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marc Jan Bonder
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Wolfram Höps
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Kwondo Kim
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Fotios Tsetsos
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Jee Young Kwon
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Patrick Hasenfeld
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Jan O Korbel
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | | | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Miriam K Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
| |
|
131
|
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, Allen J, Asri M, Bzikadze AV, Chen NC, Chin CS, Diekhans M, Flicek P, Formenti G, Fungtammasan A, Garcia Giron C, Garrison E, Gershman A, Gerton JL, Grady PGS, Guarracino A, Haggerty L, Halabian R, Hansen NF, Harris R, Hartley GA, Harvey WT, Haukness M, Heinz J, Hourlier T, Hubley RM, Hunt SE, Hwang S, Jain M, Kesharwani RK, Lewis AP, Li H, Logsdon GA, Lucas JK, Makalowski W, Markovic C, Martin FJ, Mc Cartney AM, McCoy RC, McDaniel J, McNulty BM, Medvedev P, Mikheenko A, Munson KM, Murphy TD, Olsen HE, Olson ND, Paulin LF, Porubsky D, Potapova T, Ryabov F, Salzberg SL, Sauria MEG, Sedlazeck FJ, Shafin K, Shepelev VA, Shumate A, Storer JM, Surapaneni L, Taravella Oill AM, Thibaud-Nissen F, Timp W, Tomaszkiewicz M, Vollger MR, Walenz BP, Watwood AC, Weissensteiner MH, Wenger AM, Wilson MA, Zarate S, Zhu Y, Zook JM, Eichler EE, O'Neill RJ, Schatz MC, Miga KH, Makova KD, Phillippy AM. The complete sequence of a human Y chromosome. Nature 2023; 621:344-354. [PMID: 37612512 PMCID: PMC10752217 DOI: 10.1038/s41586-023-06457-y] [Citation(s) in RCA: 92] [Impact Index Per Article: 92.0] [Received: 12/02/2022] [Accepted: 07/19/2023] [Indexed: 08/25/2023]
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies Inc., Oxford, UK
| | - Monika Cechova
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ivan A Alexandrov
- Federal Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Chen-Shan Chin
- GeneDX Holdings Corp, Stamford, CT, USA
- Foundation of Biological Data Science, Belmont, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | | | | | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ariel Gershman
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer L Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical Center, Kansas City, MO, USA
| | - Patrick G S Grady
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Reza Halabian
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Nancy F Hansen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Robert Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Gabrielle A Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jakob Heinz
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Stephen Hwang
- XDBio Program, Johns Hopkins University, Baltimore, MD, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian K Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Christopher Markovic
- Genome Technology Access Center at the McDonnell Genome Institute, Washington University, St. Louis, MO, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ann M Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brandy M McNulty
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Hugh E Olsen
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Nathan D Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Steven L Salzberg
- Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | | | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | | | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Angela M Taravella Oill
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Department of Biomedical Engineering, Pennsylvania State University, State College, PA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison C Watwood
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | | | | | - Melissa A Wilson
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Investigator, Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| | - Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
|
132
|
Sun C, Qi Y, Fowlkes N, Lazic N, Su X, Lozano G, Wasylishen AR. The histone chaperone function of Daxx is dispensable for embryonic development. Cell Death Dis 2023; 14:565. [PMID: 37633949 PMCID: PMC10460429 DOI: 10.1038/s41419-023-06089-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/05/2023] [Revised: 08/14/2023] [Accepted: 08/18/2023] [Indexed: 08/28/2023]
Abstract
Daxx functions as a histone chaperone for the histone H3 variant, H3.3, and is essential for embryonic development. Daxx interacts with Atrx to form a protein complex that deposits H3.3 into heterochromatic regions of the genome, including centromeres, telomeres, and repeat loci. To advance our understanding of histone chaperone activity in vivo, we developed two Daxx mutant alleles in the mouse germline, which abolish the interactions between Daxx and Atrx (DaxxY130A), and Daxx and H3.3 (DaxxS226A). We found that the interaction between Daxx and Atrx is dispensable for viability; mice are born at the expected Mendelian ratio and are fertile. The loss of Daxx-Atrx interaction, however, does cause dysregulated expression of endogenous retroviruses. In contrast, the interaction between Daxx and H3.3, while not required for embryonic development, is essential for postnatal viability. Transcriptome analysis of embryonic tissues demonstrates that this interaction is important for silencing endogenous retroviruses and for maintaining proper immune cell composition. Overall, these results clearly demonstrate that Daxx has both Atrx-dependent and independent functions in vivo, advancing our understanding of this epigenetic regulatory complex.
Affiliation(s)
- Chang Sun
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Genetics and Epigenetics Program, The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 77030, USA
| | - Yuan Qi
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Natalie Fowlkes
- Department of Veterinary Medicine and Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Nina Lazic
- Department of Cancer Biology, University of Cincinnati, Cincinnati, OH, 45267, USA
| | - Xiaoping Su
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Guillermina Lozano
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
- Genetics and Epigenetics Program, The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 77030, USA.
| | - Amanda R Wasylishen
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
- Department of Cancer Biology, University of Cincinnati, Cincinnati, OH, 45267, USA.
| |
|
133
|
Zaytsev K, Fedorov A, Korotkov E. Classification of Promoter Sequences from Human Genome. Int J Mol Sci 2023; 24:12561. [PMID: 37628742 PMCID: PMC10454140 DOI: 10.3390/ijms241612561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 07/28/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023] Open
Abstract
We have developed a new method for promoter sequence classification based on a genetic algorithm and the MAHDS sequence alignment method. We have created four classes of human promoters, combining 17,310 sequences out of the 29,598 present in the EPD database. We searched the human genome for potential promoter sequences (PPSs) using dynamic programming and position weight matrices representing each of the promoter sequence classes. A total of 3,065,317 potential promoter sequences were found. Only 1,241,206 of them were located in unannotated parts of the human genome. Every other PPS found intersected with either true promoters, transposable elements, or interspersed repeats. We found a strong intersection between PPSs and Alu elements as well as transcript start sites. The number of false positive PPSs is estimated to be 3 × 10⁻⁸ per nucleotide, which is several orders of magnitude lower than for any other promoter prediction method. The developed method can be used to search for PPSs in various eukaryotic genomes.
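As a rough illustration of what scoring a sequence against a position weight matrix involves, the sketch below builds a log-odds matrix from a handful of aligned toy sites and slides it along a sequence. It is a simplified, ungapped scan and is not the MAHDS-based dynamic-programming search described in the abstract. The function names, the toy sites, the uniform background, the pseudocount, and the threshold are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def build_log_odds_pwm(sequences, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    length = len(sequences[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in BASES})
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for every ungapped window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical toy data: four TATA-like sites and a short query sequence.
toy_sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
pwm = build_log_odds_pwm(toy_sites)
hits = scan("GGCGTATAAAGGC", pwm, threshold=6.0)
print(hits)
```

In a genome-scale search the threshold controls the trade-off between sensitivity and the per-nucleotide false positive rate. A gapped alignment, as the abstract's use of dynamic programming implies, would additionally allow insertions and deletions relative to the matrix.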
Affiliation(s)
- Konstantin Zaytsev
- Bach Institute of Biochemistry, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
| | - Alexey Fedorov
- Bach Institute of Biochemistry, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
| | - Eugene Korotkov
- Institute of Bioengineering, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
| |
|
134
|
Alejo-Jacuinde G, Nájera-González HR, Chávez Montes RA, Gutierrez Reyes CD, Barragán-Rosillo AC, Perez Sanchez B, Mechref Y, López-Arredondo D, Yong-Villalobos L, Herrera-Estrella L. Multi-omic analyses reveal the unique properties of chia (Salvia hispanica) seed metabolism. Commun Biol 2023; 6:820. [PMID: 37550387 PMCID: PMC10406817 DOI: 10.1038/s42003-023-05192-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/28/2023] [Accepted: 07/28/2023] [Indexed: 08/09/2023] Open
Abstract
Chia (Salvia hispanica) is an emerging crop considered a functional food containing important substances with multiple potential applications. However, the molecular basis of some relevant chia traits, such as seed mucilage and polyphenol content, remains to be discovered. This study generates an improved chromosome-level reference of the chia genome, resolving some highly repetitive regions, describing methylation patterns, and refining genome annotation. Transcriptomic analysis shows that seeds exhibit a unique expression pattern compared to other organs and tissues. Thus, a metabolic and proteomic approach is implemented to study seed composition and seed-produced mucilage. The chia genome exhibits a significant expansion in mucilage synthesis genes (compared to Arabidopsis), and gene network analysis reveals potential regulators controlling seed mucilage production. Rosmarinic acid, a compound with enormous therapeutic potential, was classified as the most abundant polyphenol in seeds, and candidate genes for its complex pathway are described. Overall, this study provides important insights into the molecular basis for the unique characteristics of chia seeds.
Affiliation(s)
- Gerardo Alejo-Jacuinde
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
| | - Héctor-Rogelio Nájera-González
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
| | - Ricardo A Chávez Montes
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
| | | | - Alfonso Carlos Barragán-Rosillo
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
| | - Benjamin Perez Sanchez
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
| | - Yehia Mechref
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, TX, 79409, USA
| | - Damar López-Arredondo
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
| | - Lenin Yong-Villalobos
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA.
| | - Luis Herrera-Estrella
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA.
- Unidad de Genómica Avanzada/Langebio, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Irapuato, Gto., 36821, Mexico.
|
135
|
Perez M, Aroh O, Sun Y, Lan Y, Juniper SK, Young CR, Angers B, Qian PY. Third-Generation Sequencing Reveals the Adaptive Role of the Epigenome in Three Deep-Sea Polychaetes. Mol Biol Evol 2023; 40:msad172. [PMID: 37494294 PMCID: PMC10414810 DOI: 10.1093/molbev/msad172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 06/16/2023] [Accepted: 07/17/2023] [Indexed: 07/28/2023] Open
Abstract
The roles of DNA methylation in invertebrates are poorly characterized, and critical data are missing for the phylum Annelida. We fill this knowledge gap by conducting the first genome-wide survey of DNA methylation in the deep-sea polychaetes dominant in deep-sea vents and seeps: Paraescarpia echinospica, Ridgeia piscesae, and Paralvinella palmiformis. DNA methylation calls were inferred from Oxford Nanopore sequencing after assembling high-quality genomes of these animals. The genomes of these worms encode all the key enzymes of the DNA methylation metabolism and possess a mosaic methylome similar to that of other invertebrates. Transcriptomic data of these polychaetes support the hypotheses that gene body methylation strengthens the expression of housekeeping genes and that promoter methylation acts as a silencing mechanism but not the hypothesis that DNA methylation suppresses the activity of transposable elements. The conserved epigenetic profiles of genes responsible for maintaining homeostasis under extreme hydrostatic pressure suggest DNA methylation plays an important adaptive role in these worms.
Affiliation(s)
- Maeva Perez
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Department of Ocean Science, The Hong Kong University of Science and Technology, Kowloon, China
- Department of Biological Sciences, Université de Montréal, Montréal, Canada
| | - Oluchi Aroh
- Department of Biological Sciences, Auburn University, Auburn, AL, USA
| | - Yanan Sun
- Laboratory of Marine Organism Taxonomy and Phylogeny, Chinese Academy of Sciences, Institute of Oceanology, Qingdao, China
| | - Yi Lan
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Department of Ocean Science, The Hong Kong University of Science and Technology, Kowloon, China
| | - Stanley Kim Juniper
- School of Earth and Ocean Sciences, University of Victoria, Victoria, Canada
| | | | - Bernard Angers
- Department of Biological Sciences, Université de Montréal, Montréal, Canada
| | - Pei-Yuan Qian
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Department of Ocean Science, The Hong Kong University of Science and Technology, Kowloon, China
|
136
|
Chaung K, Baharav TZ, Henderson G, Zheludev IN, Wang PL, Salzman J. [WITHDRAWN] SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery. bioRxiv 2023:2023.07.17.549408. [PMID: 37503014 PMCID: PMC10370119 DOI: 10.1101/2023.07.17.549408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
The authors have withdrawn this manuscript due to a duplicate posting of manuscript number BIORXIV/2022/497555. Therefore, the authors do not wish this work to be cited as reference for the project. If you have any questions, please contact the corresponding author. The correct preprint can be found at doi: https://doi.org/10.1101/2022.06.24.497555.
|
137
|
Chaung K, Baharav TZ, Henderson G, Zheludev IN, Wang PL, Salzman J. SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery. bioRxiv 2023:2022.06.24.497555. [PMID: 35794890 PMCID: PMC9258296 DOI: 10.1101/2022.06.24.497555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/09/2022]
Abstract
Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a new unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), an approach that directly analyzes raw sequencing data to detect a signature of regulation: sample-specific sequence variation. The approach, which includes a new statistical test, is computationally efficient and can be run at scale. SPLASH unifies detection of myriad forms of sequence variation. We demonstrate that SPLASH identifies complex mutation patterns in SARS-CoV-2 strains, discovers regulated RNA isoforms at the single cell level, documents the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a new unifying approach to genomic analysis that enables an expansive scope of discovery without metadata or references.
Affiliation(s)
- Kaitlin Chaung
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, USA
- Department of Biochemistry, Stanford University, Stanford, 94305, USA
| | - Tavor Z. Baharav
- Department of Electrical Engineering, Stanford University, Stanford, 94305, USA
| | - George Henderson
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, USA
- Department of Biochemistry, Stanford University, Stanford, 94305, USA
| | - Ivan N. Zheludev
- Department of Biochemistry, Stanford University, Stanford, 94305, USA
| | - Peter L. Wang
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, USA
- Department of Biochemistry, Stanford University, Stanford, 94305, USA
| | - Julia Salzman
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, USA
- Department of Biochemistry, Stanford University, Stanford, 94305, USA
- Department of Statistics (by courtesy), Stanford University, Stanford, 94305, USA
|
138
|
Zhao P, Peng C, Fang L, Wang Z, Liu GE. Taming transposable elements in livestock and poultry: a review of their roles and applications. Genet Sel Evol 2023; 55:50. [PMID: 37479995 PMCID: PMC10362595 DOI: 10.1186/s12711-023-00821-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Accepted: 06/30/2023] [Indexed: 07/23/2023] Open
Abstract
Livestock and poultry play a significant role in human nutrition by converting agricultural by-products into high-quality proteins. To meet the growing demand for safe animal protein, genetic improvement of livestock must be done sustainably while minimizing negative environmental impacts. Transposable elements (TE) are important components of livestock and poultry genomes, contributing to their genetic diversity, chromatin states, gene regulatory networks, and complex traits of economic value. However, compared to other species, research on TE in livestock and poultry is still in its early stages. In this review, we analyze 72 studies published in the past 20 years, summarize the TE composition in livestock and poultry genomes, and focus on their potential roles in functional genomics. We also discuss bioinformatic tools and strategies for integrating multi-omics data with TE, and explore future directions, feasibility, and challenges of TE research in livestock and poultry. In addition, we suggest strategies to apply TE in basic biological research and animal breeding. Our goal is to provide a new perspective on the importance of TE in livestock and poultry genomes.
Affiliation(s)
- Pengju Zhao
- Hainan Institute of Zhejiang University, Hainan Sanya, 572000, China
- College of Animal Sciences, Zhejiang University, Zhejiang, Hangzhou, People's Republic of China
| | - Chen Peng
- Hainan Institute of Zhejiang University, Hainan Sanya, 572000, China
- College of Animal Sciences, Zhejiang University, Zhejiang, Hangzhou, People's Republic of China
| | - Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000, Aarhus, Denmark.
| | - Zhengguang Wang
- Hainan Institute of Zhejiang University, Hainan Sanya, 572000, China.
- College of Animal Sciences, Zhejiang University, Zhejiang, Hangzhou, People's Republic of China.
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA.
|
139
|
Shin H, Leung A, Costello KR, Senapati P, Kato H, Moore RE, Lee M, Lin D, Tang X, Pirrotte P, Bouman Chen Z, Schones DE. Inhibition of DNMT1 methyltransferase activity via glucose-regulated O-GlcNAcylation alters the epigenome. eLife 2023; 12:e85595. [PMID: 37470704 PMCID: PMC10390045 DOI: 10.7554/elife.85595] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/15/2022] [Accepted: 07/19/2023] [Indexed: 07/21/2023] Open
Abstract
The DNA methyltransferase activity of DNMT1 is vital for genomic maintenance of DNA methylation. We report here that DNMT1 function is regulated by O-GlcNAcylation, a protein modification that is sensitive to glucose levels, and that elevated O-GlcNAcylation of DNMT1 in a high-glucose environment leads to alterations to the epigenome. Using mass spectrometry and complementary alanine mutation experiments, we identified S878 as the major residue that is O-GlcNAcylated on human DNMT1. Functional studies in human and mouse cells further revealed that O-GlcNAcylation of DNMT1-S878 inhibits methyltransferase activity, resulting in a general loss of DNA methylation that preferentially occurs at partially methylated domains (PMDs). This loss of methylation corresponds with an increase in DNA damage and apoptosis. These results establish O-GlcNAcylation of DNMT1 as a mechanism through which the epigenome is regulated by glucose metabolism and implicate glycosylation of DNMT1 in metabolic diseases characterized by hyperglycemia.
Affiliation(s)
- Heon Shin
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
| | - Amy Leung
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
| | - Kevin R Costello
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
- Irell and Manella Graduate School of Biological Sciences, City of Hope, Duarte, United States
| | - Parijat Senapati
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
| | - Hiroyuki Kato
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
| | - Roger E Moore
- Integrated Mass Spectrometry Shared Resource, City of Hope Comprehensive Cancer Center Duarte, Duarte, United States
| | - Michael Lee
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
- Irell and Manella Graduate School of Biological Sciences, City of Hope, Duarte, United States
| | - Dimitri Lin
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
| | - Xiaofang Tang
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
| | - Patrick Pirrotte
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
- Integrated Mass Spectrometry Shared Resource, City of Hope Comprehensive Cancer Center Duarte, Duarte, United States
- Cancer & Cell Biology Division, Translational Genomics Research Institute, Phoenix, United States
| | - Zhen Bouman Chen
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
- Irell and Manella Graduate School of Biological Sciences, City of Hope, Duarte, United States
| | - Dustin E Schones
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, United States
- Irell and Manella Graduate School of Biological Sciences, City of Hope, Duarte, United States
|
140
|
Chiu KP, Stuart L, Ooi HS, Yu J, Smith DG, Pei KJC. Genome sequencing and application of Taiwanese macaque Macaca cyclopis. Sci Rep 2023; 13:11545. [PMID: 37460589 DOI: 10.1038/s41598-023-38402-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 07/07/2023] [Indexed: 07/20/2023] Open
Abstract
The Formosan macaque (Macaca cyclopis) is the only non-human primate on the island of Taiwan. We performed de novo hybrid assembly for M. cyclopis using Illumina paired-end short reads, mate-pair reads and Nanopore long reads and obtained 5065 contigs with an N50 of 2.66 megabases. M. cyclopis contigs ≥ 10 kb were assigned to chromosomes using the Indian rhesus macaque (Macaca mulatta mulatta) genome assembly Mmul_10 as reference, resulting in a draft M. cyclopis genome of 2,846,042,475 bases, distributed across 21 chromosomes. The draft genome contains 23,462 transcriptional origins (genes), capable of expressing 716,231 exons in 59,484 transcripts. A genome-based phylogenetic study using the assembled M. cyclopis genome together with the genomes of four other macaque species, human, orangutan and chimpanzee gave results similar to those previously reported. However, M. cyclopis was found to have diverged from Chinese M. mulatta lasiota about 1.8 million years ago. Fossil gene analysis detected the presence of gap and pol endogenous viral elements of simian retrovirus in all macaques tested, including M. fascicularis, M. m. mulatta and M. cyclopis. However, in M. cyclopis these elements were about 2 times fewer in number and more uniform in chromosomal location. This constraint on foreign genome disturbance, presumably due to geographical isolation, should simplify genomics-related investigations, making M. cyclopis an ideal primate species for medical research.
Affiliation(s)
- Kuo-Ping Chiu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan.
- Top Science Biotechnologies, Inc., 4F, 50-2 Dingping Rd., Sec. 1, Shiding District, New Taipei City, 223002, Taiwan.
| | - Lutimba Stuart
- Top Science Biotechnologies, Inc., 4F, 50-2 Dingping Rd., Sec. 1, Shiding District, New Taipei City, 223002, Taiwan
| | - Hong Sain Ooi
- Top Science Biotechnologies, Inc., 4F, 50-2 Dingping Rd., Sec. 1, Shiding District, New Taipei City, 223002, Taiwan
| | - John Yu
- Institute of Stem Cell and Translational Cancer Research, Chang Gung Memorial Hospital at Linkou, No.5, Fu-Shin St., Kuei Shang, Taoyuan, 333, Taiwan
| | - David Glenn Smith
- Department of Anthropology, University of California Davis, Davis, CA, USA
| | - Kurtis Jai-Chyi Pei
- Institute of Wildlife Conservation, College of Veterinary Medicine, National Pingtung University of Science and Technology, Pingtung, Taiwan
|
141
|
Gable SM, Mendez JM, Bushroe NA, Wilson A, Byars MI, Tollis M. The State of Squamate Genomics: Past, Present, and Future of Genome Research in the Most Speciose Terrestrial Vertebrate Order. Genes (Basel) 2023; 14:1387. [PMID: 37510292 PMCID: PMC10379679 DOI: 10.3390/genes14071387] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/06/2023] [Revised: 06/28/2023] [Accepted: 06/29/2023] [Indexed: 07/30/2023] Open
Abstract
Squamates include more than 11,000 extant species of lizards, snakes, and amphisbaenians, and display a dazzling diversity of phenotypes across their over 200-million-year evolutionary history on Earth. Here, we introduce and define squamates (Order Squamata) and review the history and promise of genomic investigations into the patterns and processes governing squamate evolution, given recent technological advances in DNA sequencing, genome assembly, and evolutionary analysis. We survey the most recently available whole genome assemblies for squamates, including the taxonomic distribution of available squamate genomes, and assess their quality metrics and usefulness for research. We then focus on disagreements in squamate phylogenetic inference, how methods of high-throughput phylogenomics affect these inferences, and demonstrate the promise of whole genomes to settle or sustain persistent phylogenetic arguments for squamates. We review the role transposable elements play in vertebrate evolution, methods of transposable element annotation and analysis, and further demonstrate that through the understanding of the diversity, abundance, and activity of transposable elements in squamate genomes, squamates can be an ideal model for the evolution of genome size and structure in vertebrates. We discuss how squamate genomes can contribute to other areas of biological research such as venom systems, studies of phenotypic evolution, and sex determination. Because they represent more than 30% of the living species of amniote, squamates deserve a genome consortium on par with recent efforts for other amniotes (i.e., mammals and birds) that aim to sequence most of the extant families in a clade.
Affiliation(s)
- Simone M Gable
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Jasmine M Mendez
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Nicholas A Bushroe
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Adam Wilson
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Michael I Byars
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Marc Tollis
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
|
142
|
Gonzalez‐García LN, Lozano‐Arce D, Londoño JP, Guyot R, Duitama J. Efficient homology-based annotation of transposable elements using minimizers. Appl Plant Sci 2023; 11:e11520. [PMID: 37601317 PMCID: PMC10439823 DOI: 10.1002/aps3.11520] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/06/2022] [Revised: 03/02/2023] [Accepted: 03/04/2023] [Indexed: 08/22/2023]
Abstract
Premise: Transposable elements (TEs) make up more than half of the genomes of complex plant species and can modulate the expression of neighboring genes, producing significant variability of agronomically relevant traits. The availability of long-read sequencing technologies allows the building of genome assemblies for plant species with large and complex genomes. Unfortunately, TE annotation currently represents a bottleneck in the annotation of genome assemblies. Methods and Results: We present a new functionality of the Next-Generation Sequencing Experience Platform (NGSEP) to perform efficient homology-based TE annotation. Sequences in a reference library are treated as long reads and mapped to an input genome assembly. A hierarchical annotation is then assigned by homology using the annotation of the reference library. We tested the performance of our algorithm on genome assemblies of different plant species, including Arabidopsis thaliana, Oryza sativa, Coffea humblotiana, and Triticum aestivum (bread wheat). Our algorithm outperforms traditional homology-based annotation tools in speed by a factor of three to >20, reducing the annotation time of the T. aestivum genome from months to hours, and recovering up to 80% of TEs annotated with RepeatMasker with a precision of up to 0.95. Conclusions: NGSEP allows rapid analysis of TEs, especially in very large and TE-rich plant genomes.
Affiliation(s)
- Laura Natalia Gonzalez‐García
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- UMR DIADE, Institut de Recherche pour le Développement, Université de Montpellier, CIRAD, 34394 Montpellier, France
| | - Daniela Lozano‐Arce
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
| | | | - Romain Guyot
- UMR DIADE, Institut de Recherche pour le Développement, Université de Montpellier, CIRAD, 34394 Montpellier, France
| | - Jorge Duitama
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
|
143
|
Perera OP, Saha S, Glover J, Parys KA, Allen KC, Grozeva S, Kurtz R, Reddy GVP, Johnston JS, Daly M, Swale T. A chromosome scale assembly of the tarnished plant bug, Lygus lineolaris (Palisot de Beauvois), genome. BMC Res Notes 2023; 16:125. [PMID: 37370172 DOI: 10.1186/s13104-023-06408-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 06/19/2023] [Indexed: 06/29/2023] Open
Abstract
OBJECTIVE: The tarnished plant bug (TPB), Lygus lineolaris (Palisot de Beauvois) (Hemiptera: Miridae), is a pest damaging many cultivated crops in North America. Although partial transcriptome data are available for this pest, a genome assembly was not available for this species. This assembly of a high-quality chromosome-length genome of TPB aims to develop the genetic resources that can provide the foundation required for advancing research on this species. RESULTS: The initial genome of TPB assembled with paired-end nucleotide sequences generated with Illumina technology was scaffolded with Illumina HiSeq X reads generated from a proximity-ligated (Hi-C) library to obtain a high-quality genome assembly. The final assembly contained 3963 scaffolds longer than 1 kbp to yield a genome of 599.96 Mbp. The N50 of the TPB genome assembly was 35.64 Mbp and 98.68% of the genome was assembled into 17 scaffolds larger than 1 Mbp. This megabase scaffold number is the same as the number of chromosomes observed in karyotyping of this insect. The TPB genome is known to have high repetitive DNA content, and the reduced assembled genome size compared to flow cytometric estimates of approximately 860 Mbp may be due to the collapsed assembly of highly similar regions.
Affiliation(s)
- O P Perera
- Southern Insect Management Research Unit, USDA ARS, 141 Experiment Station Road, Stoneville, MS, 38776, USA.
| | - Surya Saha
- Boyce Thompson Institute, 533 Tower Rd, Ithaca, NY, 14853, USA
| | - James Glover
- Southern Insect Management Research Unit, USDA ARS, 141 Experiment Station Road, Stoneville, MS, 38776, USA
| | - Katherine A Parys
- Pollinator Health in Southern Crop Ecosystems Research Unit, USDA ARS, 141 Experiment Station Road, Stoneville, MS, 38776, USA
| | - K Clint Allen
- Southern Insect Management Research Unit, USDA ARS, 141 Experiment Station Road, Stoneville, MS, 38776, USA
| | - Snejana Grozeva
- Institute of Zoology, Bulgarian Academy of Sciences, 1 Tsar Osvoboditel, Sofia, 1000, Bulgaria
| | - Ryan Kurtz
- Cotton, Incorporated, Cary, NC, 27513, USA
| | - Gadi V P Reddy
- Southern Insect Management Research Unit, USDA ARS, 141 Experiment Station Road, Stoneville, MS, 38776, USA
| | - J Spencer Johnston
- Department of Entomology, Texas A&M University, College Station, TX, 77843, USA
| | - Mark Daly
- Dovetail Genomics, LLC, 100 Enterprise Way, Suite A101, Scotts Valley, CA, 95066, USA
| | - Thomas Swale
- Dovetail Genomics, LLC, 100 Enterprise Way, Suite A101, Scotts Valley, CA, 95066, USA
|
144
|
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. bioRxiv 2023:2023.03.13.532420. [PMID: 37425675 PMCID: PMC10326970 DOI: 10.1101/2023.03.13.532420] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/11/2023]
Abstract
Although previously thought to be unlikely, recent studies have shown that de novo gene origination from previously non-genic sequences is a relatively common mechanism for gene innovation in many species and taxa. These young genes provide a unique set of candidates to study the structural and functional origination of proteins. However, our understanding of their protein structures and how these structures originate and evolve is still limited, due to a lack of systematic studies. Here, we combined high-quality base-level whole genome alignments, bioinformatic analysis, and computational structure modeling to study the origination, evolution, and protein structure of lineage-specific de novo genes. We identified 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. We found a gradual shift in sequence composition, evolutionary rates, and expression patterns with their gene ages, which indicates possible gradual shifts or adaptations of their functions. Surprisingly, we found little overall protein structural change for de novo genes in the Drosophilinae lineage. Using AlphaFold2, ESMFold, and molecular dynamics, we identified a number of de novo gene candidates with protein products that are potentially well-folded, many of which are more likely to contain transmembrane and signal proteins compared to other annotated protein-coding genes. Using ancestral sequence reconstruction, we found that most potentially well-folded proteins are often born folded. Interestingly, we observed one case where disordered ancestral proteins become ordered within a relatively short evolutionary time. Single-cell RNA-seq analysis in testis showed that although most de novo genes are enriched in spermatocytes, several young de novo genes are biased in the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
| | - Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
|
145
|
Otsuka K, Sakashita A, Maezawa S, Schultz RM, Namekawa SH. KRAB-zinc-finger proteins regulate endogenous retroviruses to sculpt germline transcriptomes and genome evolution. bioRxiv 2023:2023.06.24.546405. [PMID: 37720031 PMCID: PMC10503828 DOI: 10.1101/2023.06.24.546405] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/19/2023]
Abstract
As transposable elements (TEs) coevolved with the host genome, the host genome exploited TEs as functional regulatory elements. What remains largely unknown is how the activity of TEs, namely, endogenous retroviruses (ERVs), is regulated and how TEs evolved in the germline. Here we show that KRAB domain-containing zinc-finger proteins (KZFPs), which are highly expressed in mitotically dividing spermatogonia, bind to suppressed ERVs that function following entry into meiosis as active enhancers. These features are observed for independently evolved KZFPs and ERVs in mice and humans, i.e., are evolutionarily conserved in mammals. Further, we show that meiotic sex chromosome inactivation (MSCI) antagonizes the coevolution of KZFPs and ERVs in mammals. Our study uncovers a mechanism by which KZFPs regulate ERVs to sculpt germline transcriptomes. We propose that epigenetic programming in the mammalian germline during the mitosis-to-meiosis transition facilitates coevolution of KZFPs and TEs on autosomes and is antagonized by MSCI.
Affiliation(s)
- Kai Otsuka
- Department of Microbiology and Molecular Genetics, University of California, Davis, California, 95616, USA
| | - Akihiko Sakashita
- Reproductive Sciences Center, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, 45229, USA
- Department of Molecular Biology, Keio University School of Medicine, Tokyo, 160-8582, Japan
| | - So Maezawa
- Department of Applied Biological Science, Faculty of Science and Technology, Tokyo University of Science, Noda, Chiba, 278-8510, Japan
| | - Richard M. Schultz
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104 USA
- Department of Anatomy, Physiology and Cell Biology, School of Veterinary Medicine, University of California, Davis, Davis, California 95616, USA
| | - Satoshi H. Namekawa
- Department of Microbiology and Molecular Genetics, University of California, Davis, California, 95616, USA
- Reproductive Sciences Center, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, 45229, USA
|
146
|
Flack N, Drown M, Walls C, Pratte J, McLain A, Faulk C. Chromosome-level, nanopore-only genome and allele-specific DNA methylation of Pallas's cat, Otocolobus manul. NAR Genom Bioinform 2023; 5:lqad033. [PMID: 37025970 PMCID: PMC10071556 DOI: 10.1093/nargab/lqad033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Revised: 02/10/2023] [Accepted: 03/17/2023] [Indexed: 04/07/2023] Open
Abstract
Pallas's cat, or the manul cat (Otocolobus manul), is a small felid native to the grasslands and steppes of central Asia. Population strongholds in Mongolia and China face growing challenges from climate change, habitat fragmentation, poaching, and other sources. These threats, combined with O. manul's zoo collection popularity and value in evolutionary biology, necessitate improvement of species genomic resources. We used standalone nanopore sequencing to assemble a 2.5 Gb, 61-contig nuclear assembly and 17097 bp mitogenome for O. manul. The primary nuclear assembly had 56× sequencing coverage, a contig N50 of 118 Mb, and a 94.7% BUSCO completeness score for Carnivora-specific genes. High genome collinearity within Felidae permitted alignment-based scaffolding onto the fishing cat (Prionailurus viverrinus) reference genome. Manul contigs spanned all 19 felid chromosomes with an inferred total gap length of less than 400 kilobases. Modified basecalling and variant phasing produced an alternate pseudohaplotype assembly and allele-specific DNA methylation calls; 61 differentially methylated regions were identified between haplotypes. Nearest features included classical imprinted genes, non-coding RNAs, and putative novel imprinted loci. The assembled mitogenome successfully resolved existing discordance between Felinae nuclear and mtDNA phylogenies. All assembly drafts were generated from 158 Gb of sequence using seven MinION flow cells.
Affiliation(s)
- Nicole Flack
  - Department of Veterinary and Biomedical Sciences, University of Minnesota, Saint Paul, MN 55108, USA
- Melissa Drown
  - Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN 55108, USA
- Carrie Walls
  - Department of Animal Science, University of Minnesota, Saint Paul, MN 55108, USA
- Jay Pratte
  - Bloomington Parks and Recreation, Miller Park Zoo, Bloomington, IL 61701, USA
- Adam McLain
  - Department of Biology and Chemistry, SUNY Polytechnic Institute, Utica, NY 13502, USA
- Christopher Faulk
  - Department of Animal Science, University of Minnesota, Saint Paul, MN 55108, USA
|
147
|
Zuo Z. THE1B may have no role in human pregnancy due to ZNF430-mediated silencing. Mob DNA 2023; 14:6. [PMID: 37217947 DOI: 10.1186/s13100-023-00294-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/06/2023] [Accepted: 05/12/2023] [Indexed: 05/24/2023] Open
Abstract
THE1-family retroviruses invaded the primate genome more than 40 million years ago. Dunn-Fletcher et al. reported that one THE1B element upstream of the CRH gene alters gestation length by upregulating corticotropin-releasing hormone expression in transgenic mice and concluded that it has the same role in humans. However, no promoter or enhancer mark has been detected around this CRH-proximal element in any human tissue or cell, so some antiviral factor probably exists in primates to prevent it from wreaking havoc. Here I report two paralogous zinc finger genes, ZNF430 and ZNF100, that emerged in the simian lineage to specifically silence THE1B and THE1A, respectively. Contact residue changes in one finger confer on each ZNF the unique ability to preferentially repress one THE1 subfamily over the other. The reported THE1B element contains an intact ZNF430 binding site and is therefore under ZNF430 repression in most tissues, including placenta, so it is questionable whether this retrovirus has any role in human pregnancy. Overall, this analysis highlights the need to study the functions of human retroviruses in suitable model systems.
Affiliation(s)
- Zheng Zuo
  - Shenzhen University, Shenzhen, Guangdong, China
|
148
|
Horton I, Kelly CJ, Dziulko A, Simpson DM, Chuong EB. Mouse B2 SINE elements function as IFN-inducible enhancers. eLife 2023; 12:e82617. [PMID: 37158599 PMCID: PMC10229128 DOI: 10.7554/elife.82617] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/11/2022] [Accepted: 05/08/2023] [Indexed: 05/10/2023] Open
Abstract
Regulatory networks underlying innate immunity continually face selective pressures to adapt to new and evolving pathogens. Transposable elements (TEs) can affect immune gene expression as a source of inducible regulatory elements, but the significance of these elements in facilitating evolutionary diversification of innate immunity remains largely unexplored. Here, we investigated the mouse epigenomic response to type II interferon (IFN) signaling and discovered that elements from a subfamily of B2 SINE (B2_Mm2) contain STAT1 binding sites and function as IFN-inducible enhancers. CRISPR deletion experiments in mouse cells demonstrated that a B2_Mm2 element has been co-opted as an enhancer driving IFN-inducible expression of Dicer1. The rodent-specific B2 SINE family is highly abundant in the mouse genome and elements have been previously characterized to exhibit promoter, insulator, and non-coding RNA activity. Our work establishes a new role for B2 elements as inducible enhancer elements that influence mouse immunity, and exemplifies how lineage-specific TEs can facilitate evolutionary turnover and divergence of innate immune regulatory networks.
Affiliation(s)
- Isabella Horton
  - Department of Molecular, Cellular, and Developmental Biology and BioFrontiers Institute, University of Colorado Boulder, Boulder, United States
- Conor J Kelly
  - Department of Molecular, Cellular, and Developmental Biology and BioFrontiers Institute, University of Colorado Boulder, Boulder, United States
- Adam Dziulko
  - Department of Molecular, Cellular, and Developmental Biology and BioFrontiers Institute, University of Colorado Boulder, Boulder, United States
- David M Simpson
  - Department of Molecular, Cellular, and Developmental Biology and BioFrontiers Institute, University of Colorado Boulder, Boulder, United States
- Edward B Chuong
  - Department of Molecular, Cellular, and Developmental Biology and BioFrontiers Institute, University of Colorado Boulder, Boulder, United States
|
149
|
Zhou S, Xia T, Gao X, Lyu T, Wang L, Wang X, Shi L, Dong Y, Zhang H. A high-quality chromosomal-level genome assembly of Greater Scaup (Aythya marila). Sci Data 2023; 10:254. [PMID: 37142629 PMCID: PMC10160052 DOI: 10.1038/s41597-023-02142-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 04/11/2023] [Indexed: 05/06/2023] Open
Abstract
Aythya marila is one of the few Anatidae species, and the only Aythya species, that lives in the circumpolar region. However, there is a relative lack of research on the genetics of this species. In this study, we report the first high-quality chromosome-level genome assembly of A. marila. The genome was assembled from Nanopore long reads, with errors corrected using Illumina short reads, giving a final genome size of 1.14 Gb, a scaffold N50 of 85.44 Mb, and a contig N50 of 32.46 Mb. A total of 106 contigs were clustered and ordered onto 35 chromosomes based on Hi-C data, covering approximately 98.28% of the genome. BUSCO assessment showed that 97.0% of the highly conserved genes in aves_odb10 were present intact in the genome assembly. In addition, a total of 154.94 Mb of repetitive sequences were identified. A total of 15,953 protein-coding genes were predicted in the genome, and 98.96% of these genes were functionally annotated. This genome will be a valuable resource for future genetic diversity and genomics studies of A. marila.
Affiliation(s)
- Shengyang Zhou
  - College of Life Sciences, Qufu Normal University, Qufu, 273165, Shandong, China
- Tian Xia
  - College of Life Sciences, Qufu Normal University, Qufu, 273165, Shandong, China
- Xiaodong Gao
  - College of Life Sciences, Qufu Normal University, Qufu, 273165, Shandong, China
- Tianshu Lyu
  - College of Life Sciences, Qufu Normal University, Qufu, 273165, Shandong, China
- Lidong Wang
  - College of Life Sciences, Qufu Normal University, Qufu, 273165, Shandong, China
- Xibao Wang
  - College of Life Sciences, Qufu Normal University, Qufu, 273165, Shandong, China
- Lupeng Shi
  - College of Life Sciences, Qufu Normal University, Qufu, 273165, Shandong, China
- Yuehuan Dong
  - College of Life Sciences, Qufu Normal University, Qufu, 273165, Shandong, China
- Honghai Zhang
  - College of Life Sciences, Qufu Normal University, Qufu, 273165, Shandong, China
|
150
|
Yao Y, Frith MC. Improved DNA-Versus-Protein Homology Search for Protein Fossils. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1691-1699. [PMID: 35617174 DOI: 10.1109/tcbb.2022.3177855] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Protein fossils, i.e., noncoding DNA descended from coding DNA, arise frequently from transposable elements (TEs), decayed genes, and viral integrations. They can reveal, and mislead about, evolutionary history and relationships. They have been detected by comparing DNA to protein sequences, but current methods are not optimized for this task. We describe a powerful DNA-protein homology search method. We use a 64×21 substitution matrix, which is fitted to sequence data, automatically learning the genetic code. We detect subtly homologous regions by considering alternative possible alignments between them, and calculate significance (probability of occurring by chance between random sequences). Our method detects TE protein fossils much more sensitively than blastx, and faster. Of the ~7 major categories of eukaryotic TE, three were long thought absent in mammals: we find two of them in the human genome, polinton and DIRS/Ngaro. This method increases our power to find ancient fossils, and perhaps to detect non-standard genetic codes. The alternative-alignments and significance paradigm is not specific to DNA-protein comparison, and could benefit homology search generally. This is an extended version of a conference paper (Yao & Frith, 2021).
|