51
Alharbi AB, Schmitz U, Marshall AD, Vanichkina D, Nagarajah R, Vellozzi M, Wong JJ, Bailey CG, Rasko JE. Ctcf haploinsufficiency mediates intron retention in a tissue-specific manner. RNA Biol 2020; 18:93-103. [PMID: 32816606 PMCID: PMC7834090 DOI: 10.1080/15476286.2020.1796052] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 02/07/2023] [Open access]
Abstract
CTCF is a master regulator of gene transcription and chromatin organisation with occupancy at thousands of DNA target sites genome-wide. While CTCF is essential for cell survival, CTCF haploinsufficiency is associated with tumour development and hypermethylation. Increasing evidence demonstrates CTCF as a key player in several mechanisms regulating alternative splicing (AS), however, the genome-wide impact of Ctcf dosage on AS has not been investigated. We examined the effect of Ctcf haploinsufficiency on gene expression and AS in five tissues from Ctcf hemizygous (Ctcf+/-) mice. Reduced Ctcf levels caused distinct tissue-specific differences in gene expression and AS in all tissues. An increase in intron retention (IR) was observed in Ctcf+/- liver and kidney. In liver, this specifically impacted genes associated with cytoskeletal organisation, splicing and metabolism. Strikingly, most differentially retained introns were short, with a high GC content and enriched in Ctcf binding sites in their proximal upstream genomic region. This study provides new insights into the effects of CTCF haploinsufficiency on organ transcriptomes and the role of CTCF in AS regulation.
Affiliation(s)
- Adel B Alharbi
  - Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, Australia; Computational BioMedicine Laboratory Centenary Institute, The University of Sydney, Camperdown, Australia; Faculty of Medicine and Health, The University of Sydney, Camperdown, Australia; Department of Laboratory Medicine, Faculty of Applied Medical Sciences, Umm Al-Qura University, Makkah, Saudi Arabia
- Ulf Schmitz
  - Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, Australia; Computational BioMedicine Laboratory Centenary Institute, The University of Sydney, Camperdown, Australia; Faculty of Medicine and Health, The University of Sydney, Camperdown, Australia
- Amy D Marshall
  - Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, Australia
- Darya Vanichkina
  - Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, Australia; Faculty of Medicine and Health, The University of Sydney, Camperdown, Australia; Sydney Informatics Hub, University of Sydney, Darlington, Australia
- Rajini Nagarajah
  - Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, Australia
- Melissa Vellozzi
  - Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, Australia; Computational BioMedicine Laboratory Centenary Institute, The University of Sydney, Camperdown, Australia
- Justin JL Wong
  - Faculty of Medicine and Health, The University of Sydney, Camperdown, Australia; Epigenetics and RNA Biology Program Centenary Institute, The University of Sydney, Camperdown, Australia
- Charles G Bailey
  - Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, Australia; Faculty of Medicine and Health, The University of Sydney, Camperdown, Australia
- John EJ Rasko
  - Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, Australia; Faculty of Medicine and Health, The University of Sydney, Camperdown, Australia; Cell & Molecular Therapies, Royal Prince Alfred Hospital, Camperdown, Australia
52
Transcriptional activity and strain-specific history of mouse pseudogenes. Nat Commun 2020; 11:3695. [PMID: 32728065 PMCID: PMC7392758 DOI: 10.1038/s41467-020-17157-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/02/2018] [Accepted: 06/08/2020] [Indexed: 01/07/2023] [Open access]
Abstract
Pseudogenes are ideal markers of genome remodelling. In turn, the mouse is an ideal platform for studying them, particularly with the recent availability of strain-sequencing and transcriptional data. Here, combining both manual curation and automatic pipelines, we present a genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains (available via the mouse.pseudogene.org resource). We also annotate 165 unitary pseudogenes in mouse, and 303, in human. The overall pseudogene repertoire in mouse is similar to that in human in terms of size, biotype distribution, and family composition (e.g. with GAPDH and ribosomal proteins being the largest families). Notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of all pseudogenes are unique, reflecting strain-specific evolution. Finally, we find that ~15% of the mouse pseudogenes are transcribed, and that highly transcribed parent genes tend to give rise to many processed pseudogenes.
53
Strain-Specific Epigenetic Regulation of Endogenous Retroviruses: The Role of Trans-Acting Modifiers. Viruses 2020; 12:810. [PMID: 32727076 PMCID: PMC7472028 DOI: 10.3390/v12080810] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/30/2020] [Revised: 07/21/2020] [Accepted: 07/24/2020] [Indexed: 02/07/2023] [Open access]
Abstract
Approximately 10 percent of the mouse genome consists of endogenous retroviruses (ERVs), relics of ancient retroviral infections that are classified based on their relatedness to exogenous retroviral genera. Because of the ability of ERVs to retrotranspose, as well as their cis-acting regulatory potential due to functional elements located within the elements, mammalian ERVs are generally subject to epigenetic silencing by DNA methylation and repressive histone modifications. The mobilisation and expansion of ERV elements is strain-specific, leading to ERVs being highly polymorphic between inbred mouse strains, hinting at the possibility of the strain-specific regulation of ERVs. In this review, we describe the existing evidence of mouse strain-specific epigenetic control of ERVs and discuss the implications of differential ERV regulation on epigenetic inheritance models. We consider Krüppel-associated box domain (KRAB) zinc finger proteins as likely candidates for strain-specific ERV modifiers, drawing on insights gained from the study of the strain-specific behaviour of transgenes. We conclude by considering the coevolution of KRAB zinc finger proteins and actively transposing ERV elements, and highlight the importance of cross-strain studies in elucidating the mechanisms and consequences of strain-specific ERV regulation.
54
Kaaij LJT, Mohn F, van der Weide RH, de Wit E, Bühler M. The ChAHP Complex Counteracts Chromatin Looping at CTCF Sites that Emerged from SINE Expansions in Mouse. Cell 2019; 178:1437-1451.e14. [PMID: 31491387 DOI: 10.1016/j.cell.2019.08.007] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 01/18/2019] [Revised: 05/29/2019] [Accepted: 08/02/2019] [Indexed: 12/27/2022]
Abstract
CCCTC-binding factor (CTCF) and cohesin are key players in three-dimensional chromatin organization. The topologically associating domains (TADs) demarcated by CTCF are remarkably well conserved between species, although genome-wide CTCF binding has diverged substantially following transposon-mediated motif expansions. Therefore, the CTCF consensus motif poorly predicts TADs, and additional factors must modulate CTCF binding and subsequent TAD formation. Here, we demonstrate that the ChAHP complex (CHD4, ADNP, HP1) competes with CTCF for a common set of binding motifs. In Adnp knockout cells, novel insulated regions are formed at sites normally bound by ChAHP, whereas proximal canonical boundaries are weakened. These data reveal that CTCF-mediated loop formation is modulated by a distinct zinc-finger protein complex. Strikingly, ChAHP-bound loci are mainly situated within less diverged SINE B2 transposable elements. This implicates ChAHP in maintenance of evolutionarily conserved spatial chromatin organization by buffering novel CTCF binding sites that emerged through SINE expansions.
Affiliation(s)
- Lucas J T Kaaij
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Fabio Mohn
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Robin H van der Weide
  - Division of Gene Regulation, Oncode Institute, the Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands
- Elzo de Wit
  - Division of Gene Regulation, Oncode Institute, the Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands
- Marc Bühler
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland; University of Basel, Petersplatz 10, 4003 Basel, Switzerland
55
Yap MW, Young GR, Varnaite R, Morand S, Stoye JP. Duplication and divergence of the retrovirus restriction gene Fv1 in Mus caroli allows protection from multiple retroviruses. PLoS Genet 2020; 16:e1008471. [PMID: 32525879 PMCID: PMC7313476 DOI: 10.1371/journal.pgen.1008471] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/08/2019] [Revised: 06/23/2020] [Accepted: 05/13/2020] [Indexed: 12/29/2022] [Open access]
Abstract
Viruses and their hosts are locked in an evolutionary race where resistance to infection is acquired by the hosts while viruses develop strategies to circumvent these host defenses. Forming one arm of the host defense armory are cell autonomous restriction factors like Fv1. Originally described as protecting laboratory mice from infection by murine leukemia virus (MLV), Fv1s from some wild mice have also been found to restrict non-MLV retroviruses, suggesting an important role in the protection against viruses in nature. We surveyed the Fv1 genes of wild mice trapped in Thailand and characterized their restriction activities against a panel of retroviruses. An extra copy of the Fv1 gene, named Fv7, was found on chromosome 6 of three closely related Asian species of mice: Mus caroli, M. cervicolor, and M. cookii. The presence of flanking repeats suggested it arose by LINE-mediated retroduplication within their most recent common ancestor. A high degree of natural variation was observed in both Fv1 and Fv7 and, on top of positive selection at certain residues, insertions and deletions were present that changed the length of the reading frames. These genes exhibited a range of restriction phenotypes, with activities directed against gamma-, spuma-, and lentiviruses. It seems likely, at least in the case of M. caroli, that the observed gene duplication may expand the breadth of restriction beyond the capacity of Fv1 alone and that one or more such viruses have recently driven or continue to drive the evolution of the Fv1 and Fv7 genes.
Affiliation(s)
- Serge Morand
  - Centre National de la Recherche Scientifique-Centre de coopération Internationale en Recherche Agronomique pour le Développement Animal et Gestion Intégrée des Risques, Faculty of Veterinary Technology, Kasetsart University, Bangkok, Thailand
- Jonathan P. Stoye
  - The Francis Crick Institute, London, United Kingdom
  - Faculty of Medicine, Imperial College London, London, United Kingdom
56
Transposon Reactivation in the Germline May Be Useful for Both Transposons and Their Host Genomes. Cells 2020; 9:1172. [PMID: 32397241 PMCID: PMC7290860 DOI: 10.3390/cells9051172] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/27/2020] [Revised: 05/05/2020] [Accepted: 05/07/2020] [Indexed: 12/29/2022] [Open access]
Abstract
Transposable elements (TEs) are long-term residents of eukaryotic genomes that make up a large portion of these genomes. They can be considered as perfectly fine members of genomes replicating with resident genes and being transmitted vertically to the next generation. However, unlike regular genes, TEs have the ability to send new copies to new sites. As such, they have been considered as parasitic members ensuring their own replication. In another view, TEs may also be considered as symbiotic sequences providing shared benefits after mutualistic interactions with their host genome. In this review, we recall the relationship between TEs and their host genome and discuss why transient relaxation of TE silencing within specific developmental windows may be useful for both.
57
Bult CJ, Blake JA, Smith CL, Kadin JA, Richardson JE. Mouse Genome Database (MGD) 2019. Nucleic Acids Res 2019; 47:D801-D806. [PMID: 30407599 PMCID: PMC6323923 DOI: 10.1093/nar/gky1056] [Citation(s) in RCA: 456] [Impact Index Per Article: 114.0] [Received: 10/01/2018] [Accepted: 10/30/2018] [Indexed: 01/19/2023] [Open access]
Abstract
The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the community model organism genetic and genome resource for the laboratory mouse. MGD is the authoritative source for biological reference data sets related to mouse genes, gene functions, phenotypes, and mouse models of human disease. MGD is the primary outlet for official gene, allele and mouse strain nomenclature based on the guidelines set by the International Committee on Standardized Nomenclature for Mice. In this report we describe significant enhancements to MGD, including two new graphical user interfaces: (i) the Multi Genome Viewer for exploring the genomes of multiple mouse strains and (ii) the Phenotype-Gene Expression matrix which was developed in collaboration with the Gene Expression Database (GXD) and allows researchers to compare gene expression and phenotype annotations for mouse genes. Other recent improvements include enhanced efficiency of our literature curation processes and the incorporation of Transcriptional Start Site (TSS) annotations from RIKEN's FANTOM 5 initiative.
Affiliation(s)
- Carol J Bult
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Judith A Blake
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Cynthia L Smith
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- James A Kadin
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
58
Tigano A, Colella JP, MacManes MD. Comparative and population genomics approaches reveal the basis of adaptation to deserts in a small rodent. Mol Ecol 2020; 29:1300-1314. [PMID: 32130752 PMCID: PMC7204510 DOI: 10.1111/mec.15401] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 11/30/2019] [Revised: 02/19/2020] [Accepted: 02/27/2020] [Indexed: 12/30/2022]
Abstract
Organisms that live in deserts offer the opportunity to investigate how species adapt to environmental conditions that are lethal to most plants and animals. In the hot deserts of North America, high temperatures and lack of water are conspicuous challenges for organisms living there. The cactus mouse (Peromyscus eremicus) displays several adaptations to these conditions, including low metabolic rate, heat tolerance, and the ability to maintain homeostasis under extreme dehydration. To investigate the genomic basis of desert adaptation in cactus mice, we built a chromosome‐level genome assembly and resequenced 26 additional cactus mouse genomes from two locations in southern California (USA). Using these data, we integrated comparative, population, and functional genomic approaches. We identified 16 gene families exhibiting significant contractions or expansions in the cactus mouse compared to 17 other Myodontine rodent genomes, and found 232 sites across the genome associated with selective sweeps. Functional annotations of candidate gene families and selective sweeps revealed a pervasive signature of selection at genes involved in the synthesis and degradation of proteins, consistent with the evolution of cellular mechanisms to cope with protein denaturation caused by thermal and hyperosmotic stress. Other strong candidate genes included receptors for bitter taste, suggesting a dietary shift towards chemically defended desert plants and insects, and a growth factor involved in lipid metabolism, potentially involved in prevention of dehydration. Understanding how species adapted to deserts will provide an important foundation for predicting future evolutionary responses to increasing temperatures, droughts and desertification in the cactus mouse and other species.
Affiliation(s)
- Anna Tigano
  - Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA; Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA
- Jocelyn P Colella
  - Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA; Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA
- Matthew D MacManes
  - Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA; Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA
59
Sundaram V, Wysocka J. Transposable elements as a potent source of diverse cis-regulatory sequences in mammalian genomes. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190347. [PMID: 32075564 PMCID: PMC7061989 DOI: 10.1098/rstb.2019.0347] [Citation(s) in RCA: 103] [Impact Index Per Article: 25.8] [Indexed: 12/24/2022] [Open access]
Abstract
Eukaryotic gene regulation is mediated by cis-regulatory elements, which are embedded within the vast non-coding genomic space and recognized by the transcription factors in a sequence- and context-dependent manner. A large proportion of eukaryotic genomes, including at least half of the human genome, are composed of transposable elements (TEs), which in their ancestral form carried their own cis-regulatory sequences able to exploit the host trans environment to promote TE transcription and facilitate transposition. Although not all present-day TE copies have retained this regulatory function, the preexisting regulatory potential of TEs can provide a rich source of cis-regulatory innovation for the host. Here, we review recent evidence documenting diverse contributions of TE sequences to gene regulation by functioning as enhancers, promoters, silencers and boundary elements. We discuss how TE-derived enhancer sequences can rapidly facilitate changes in existing gene regulatory networks and mediate species- and cell-type-specific regulatory innovations, and we postulate a unique contribution of TEs to species-specific gene expression divergence in pluripotency and early embryogenesis. With advances in genome-wide technologies and analyses, systematic investigation of TEs' cis-regulatory potential is now possible and our understanding of the biological impact of genomic TEs is increasing. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
Affiliation(s)
- Vasavi Sundaram
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Joanna Wysocka
  - Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, USA; Department of Developmental Biology, Stanford University School of Medicine, Stanford, USA; Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, USA
60
Seibt KM, Schmidt T, Heitkam T. The conserved 3' Angio-domain defines a superfamily of short interspersed nuclear elements (SINEs) in higher plants. Plant J 2020; 101:681-699. [PMID: 31610059 DOI: 10.1111/tpj.14567] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/16/2019] [Revised: 09/13/2019] [Accepted: 09/17/2019] [Indexed: 06/10/2023]
Abstract
Repetitive sequences are ubiquitous components of eukaryotic genomes affecting genome size and evolution as well as gene regulation. Among them, short interspersed nuclear elements (SINEs) are non-coding retrotransposons usually shorter than 1000 bp. They contain only few short conserved structural motifs, in particular an internal promoter derived from cellular RNAs and a mostly AT-rich 3' tail, whereas the remaining regions are highly variable. SINEs emerge and vanish during evolution, and often diversify into numerous families and subfamilies that are usually specific for only a limited number of species. In contrast, at the 3' end of multiple plant SINEs we detected the highly conserved 'Angio-domain'. This 37 bp segment defines the Angio-SINE superfamily, which encompasses 24 plant SINE families widely distributed across 13 orders within the plant kingdom. We retrieved 28 433 full-length Angio-SINE copies from genome assemblies of 46 plant species, frequently located in genes. Compensatory mutations in and adjacent to the Angio-domain imply selective restraints maintaining its RNA structure. Angio-SINE families share segmental sequence similarities, indicating a modular evolution with strong Angio-domain preservation. We suggest that the conserved domain contributes to the evolutionary success of Angio-SINEs through either structural interactions between SINE RNA and proteins increasing their transpositional efficiency, or by enhancing their accumulation in genes.
Affiliation(s)
- Kathrin M Seibt
  - Faculty of Biology, Technische Universität Dresden, Zellescher Weg 20b, Dresden, 01217, Germany
- Thomas Schmidt
  - Faculty of Biology, Technische Universität Dresden, Zellescher Weg 20b, Dresden, 01217, Germany
- Tony Heitkam
  - Faculty of Biology, Technische Universität Dresden, Zellescher Weg 20b, Dresden, 01217, Germany
61
Gamage AM, Zhu F, Ahn M, Foo RJH, Hey YY, Low DHW, Mendenhall IH, Dutertre CA, Wang LF. Immunophenotyping monocytes, macrophages and granulocytes in the Pteropodid bat Eonycteris spelaea. Sci Rep 2020; 10:309. [PMID: 31941952 PMCID: PMC6962400 DOI: 10.1038/s41598-019-57212-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/17/2019] [Accepted: 12/12/2019] [Indexed: 02/07/2023] [Open access]
Abstract
Bats are asymptomatic reservoir hosts for several highly pathogenic viruses. Understanding this enigmatic relationship between bats and emerging zoonotic viruses requires tools and approaches which enable the comparative study of bat immune cell populations and their functions. We show that bat genomes have a conservation of immune marker genes which delineate phagocyte populations in humans, while lacking key mouse surface markers such as Ly6C and Ly6G. Cross-reactive antibodies against CD44, CD11b, CD14, MHC II, and CD206 were multiplexed to characterize circulating monocytes, granulocytes, bone-marrow derived macrophages (BMDMs) and lung alveolar macrophages (AMs) in the cave nectar bat Eonycteris spelaea. Transcriptional profiling of bat monocytes and BMDMs identified additional markers – including MARCO, CD68, CD163, CD172α, and CD88 – which can be used to further characterize bat myeloid populations. Bat cells often resembled their human counterparts when comparing immune parameters that are divergent between humans and mice, such as the expression patterns of certain immune cell markers. A genome-wide comparison of immune-related genes also revealed a much closer phylogenetic relationship between bats and humans compared to rodents. Taken together, this study provides a set of tools and a comparative framework which will be important for unravelling viral disease tolerance mechanisms in bats.
Affiliation(s)
- Akshamal M Gamage
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore
- Feng Zhu
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore
- Matae Ahn
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore
- Randy Jee Hiang Foo
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore
- Ying Ying Hey
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore
- Dolyce H W Low
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore
- Ian H Mendenhall
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore
- Charles-Antoine Dutertre
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore; Singapore Immunology Network (SIgN), Agency for Science Technology and Research (A*STAR), Singapore, Singapore
- Lin-Fa Wang
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore
62
Kentepozidou E, Aitken SJ, Feig C, Stefflova K, Ibarra-Soria X, Odom DT, Roller M, Flicek P. Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains. Genome Biol 2020; 21:5. [PMID: 31910870 PMCID: PMC6945661 DOI: 10.1186/s13059-019-1894-x] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 06/27/2019] [Accepted: 11/21/2019] [Indexed: 02/06/2023] [Open access]
Abstract
BACKGROUND: CTCF binding contributes to the establishment of a higher-order genome structure by demarcating the boundaries of large-scale topologically associating domains (TADs). However, despite the importance and conservation of TADs, the role of CTCF binding in their evolution and stability remains elusive.
RESULTS: We carry out an experimental and computational study that exploits the natural genetic variation across five closely related species to assess how CTCF binding patterns stably fixed by evolution in each species contribute to the establishment and evolutionary dynamics of TAD boundaries. We perform CTCF ChIP-seq in multiple mouse species to create genome-wide binding profiles and associate them with TAD boundaries. Our analyses reveal that CTCF binding is maintained at TAD boundaries by a balance of selective constraints and dynamic evolutionary processes. Regardless of their conservation across species, CTCF binding sites at TAD boundaries are subject to stronger sequence and functional constraints compared to other CTCF sites. TAD boundaries frequently harbor dynamically evolving clusters containing both evolutionarily old and young CTCF sites as a result of the repeated acquisition of new species-specific sites close to conserved ones. The overwhelming majority of clustered CTCF sites colocalize with cohesin and are significantly closer to gene transcription start sites than nonclustered CTCF sites, suggesting that CTCF clusters particularly contribute to cohesin stabilization and transcriptional regulation.
CONCLUSIONS: Dynamic conservation of CTCF site clusters is an apparently important feature of CTCF binding evolution that is critical to the functional stability of a higher-order chromatin structure.
Affiliation(s)
- Elissavet Kentepozidou
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD UK
- Sarah J. Aitken
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
  - Department of Histopathology, Addenbrooke’s Hospital, Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, CB2 0QQ UK
- Christine Feig
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
- Klara Stefflova
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
- Ximena Ibarra-Soria
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
- Duncan T. Odom
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
  - Division Regulatory Genomics and Cancer Evolution, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Maša Roller
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD UK
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA UK
63
Shepard KA, Korsak LIT, DeBartolo D, Akins MR. Axonal localization of the fragile X family of RNA binding proteins is conserved across mammals. J Comp Neurol 2019; 528:502-519. [PMID: 31502255 DOI: 10.1002/cne.24772] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/16/2019] [Revised: 09/04/2019] [Accepted: 09/05/2019] [Indexed: 11/05/2022]
Abstract
Spatial segregation of proteins to neuronal axons arises in part from local translation of mRNAs that are first transported into axons in ribonucleoprotein particles (RNPs), complexes containing mRNAs and RNA binding proteins. Understanding the importance of local translation for a particular circuit requires not only identifying axonal RNPs and their mRNA cargoes, but also whether these RNPs are broadly conserved or restricted to only a few species. Fragile X granules (FXGs) are axonal RNPs containing the fragile X related family of RNA binding proteins along with ribosomes and specific mRNAs. FXGs were previously identified in mouse, rat, and human brains in a conserved subset of neuronal circuits but with species-dependent developmental profiles. Here, we asked whether FXGs are a broadly conserved feature of the mammalian brain and sought to better understand the species-dependent developmental expression pattern. We found FXGs in a conserved subset of neurons and circuits in the brains of every examined species that together include mammalian taxa separated by up to 160 million years of divergent evolution. A developmental analysis of rodents revealed that FXG expression in frontal cortex and olfactory bulb followed consistent patterns in all species examined. In contrast, FXGs in hippocampal mossy fibers increased in abundance across development for most species but decreased across development in guinea pigs and members of the Mus genus, animals that navigate particularly small home ranges in the wild. The widespread conservation of FXGs suggests that axonal translation is an ancient, conserved mechanism for regulating the proteome of mammalian axons.
Affiliation(s)
- Lulu I T Korsak
- Department of Biology, Drexel University, Philadelphia, Pennsylvania
- Michael R Akins
- Department of Biology, Drexel University, Philadelphia, Pennsylvania.,Department of Neurobiology and Anatomy, Drexel University, Philadelphia, Pennsylvania
|
64
|
Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, Ullrich KK, Tautz D. A de novo evolved gene in the house mouse regulates female pregnancy cycles. eLife 2019; 8:e44392. [PMID: 31436535 PMCID: PMC6760900 DOI: 10.7554/elife.44392] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/13/2018] [Accepted: 08/21/2019] [Indexed: 12/16/2022] Open
Abstract
The de novo emergence of new genes has been well documented through genomic analyses. However, a functional analysis, especially of very young protein-coding genes, is still largely lacking. Here, we identify a set of house mouse-specific protein-coding genes and assess their translation by ribosome profiling and mass spectrometry data. We functionally analyze one of them, Gm13030, which is specifically expressed in females in the oviduct. The interruption of the reading frame affects the transcriptional network in the oviducts at a specific stage of the estrous cycle. This includes the upregulation of Dcpp genes, which are known to stimulate the growth of preimplantation embryos. As a consequence, knockout females have their second litters after shorter times and have a higher infanticide rate. Given that Gm13030 shows no signs of positive selection, our findings support the hypothesis that a de novo evolved gene can directly adopt a function without much sequence adaptation.
Different species have specific genes that set them apart from other species. Yet exactly how these species-specific genes originate is not fully known. The traditional view is that existing old genes are duplicated to make a ‘spare’ copy, which can change through mutations into a new gene with a new role gradually over time. Despite there being lots of evidence supporting this theory, not all new genes found in recent years can be traced back to older genes. This led to an alternative view – that recently evolved genes can also appear ‘de novo’, and come from regions of random DNA sequences that did not previously code for a protein. So far, the possibility of genes forming de novo during evolution has largely been supported by comparing and analyzing the genomes of related species. However, very little is known about the biological role these de novo genes play. Now, Xie et al. have generated a list of recently evolved de novo mouse genes, and carried out a detailed analysis of one de novo gene expressed in females at the time when embryos implant into the uterus wall. To study the role of this gene, Xie et al. created a strain of knock-out mice that have a defunct version of the protein coded by the gene. Loss of this protein caused female mice to have their second litter after a shorter period of time and increased the likelihood that female mice would terminate their newborn pups. This suggests that this newly discovered de novo gene is involved in regulating the female reproductive cycles of mice. Further analysis showed that this de novo gene counteracts the action of an older gene that promotes the implantation of embryos. This gene has therefore likely evolved due to the benefit it offers mothers, as it protects them from experiencing the increased physiological stress caused by a premature second pregnancy. These findings support the idea that genes which have evolved de novo can have an essential biological purpose despite coming from random DNA sequences. This establishes that de novo evolution of genes is the second major mechanism of how new genes with significant biological roles can form in the genome.
Affiliation(s)
- Chen Xie
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Cemalettin Bekpen
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Sven Künzel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Maryam Keshavarz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Rebecca Krebs-Wheaton
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Neva Skrabar
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Kristian Karsten Ullrich
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
|
65
|
Sheehan MJ, Campbell P, Miller CH. Evolutionary patterns of major urinary protein scent signals in house mice and relatives. Mol Ecol 2019; 28:3587-3601. [DOI: 10.1111/mec.15155] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/04/2019] [Revised: 06/10/2019] [Accepted: 06/12/2019] [Indexed: 01/04/2023]
Affiliation(s)
- Polly Campbell
- Evolution, Ecology and Organismal Biology, University of California – Riverside, Riverside, CA, USA
|
66
|
Morgan AP, Bell TA, Crowley JJ, Pardo-Manuel de Villena F. Instability of the Pseudoautosomal Boundary in House Mice. Genetics 2019; 212:469-487. [PMID: 31028113 PMCID: PMC6553833 DOI: 10.1534/genetics.119.302232] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/04/2019] [Accepted: 04/23/2019] [Indexed: 12/14/2022] Open
Abstract
Faithful segregation of homologous chromosomes at meiosis requires pairing and recombination. In taxa with dimorphic sex chromosomes, pairing between them in the heterogametic sex is limited to a narrow interval of residual sequence homology known as the pseudoautosomal region (PAR). Failure to form the obligate crossover in the PAR is associated with male infertility in house mice (Mus musculus) and humans. Yet despite this apparent functional constraint, the boundary and organization of the PAR are highly variable in mammals, and even between subspecies of mice. Here, we estimate the genetic map in a previously documented expansion of the PAR in the M. musculus castaneus subspecies and show that the local recombination rate is 100-fold higher than the autosomal background. We identify an independent shift in the PAR boundary in the M. musculus musculus subspecies and show that it involves a complex rearrangement, but still recombines in heterozygous males. Finally, we demonstrate pervasive copy-number variation at the PAR boundary in wild populations of M. m. domesticus, M. m. musculus, and M. m. castaneus. Our results suggest that the intensity of recombination activity in the PAR, coupled with relatively weak constraints on its sequence, permits the generation and maintenance of unusual levels of polymorphism in the population of unknown functional significance.
Affiliation(s)
- Andrew P Morgan
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27514
- Timothy A Bell
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27514
- James J Crowley
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27514
- Department of Psychiatry, University of North Carolina, Chapel Hill, North Carolina 27514
- Department of Clinical Neuroscience, Karolinska Institutet, 171 77 Stockholm, Sweden
- Fernando Pardo-Manuel de Villena
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27514
|
67
|
Todd CD, Deniz Ö, Taylor D, Branco MR. Functional evaluation of transposable elements as enhancers in mouse embryonic and trophoblast stem cells. eLife 2019; 8:e44344. [PMID: 31012843 PMCID: PMC6544436 DOI: 10.7554/elife.44344] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 12/12/2018] [Accepted: 04/20/2019] [Indexed: 12/18/2022] Open
Abstract
Transposable elements (TEs) are thought to have helped establish gene regulatory networks. Both the embryonic and extraembryonic lineages of the early mouse embryo have seemingly co-opted TEs as enhancers, but there is little evidence that they play significant roles in gene regulation. Here we tested a set of long terminal repeat TE families for roles as enhancers in mouse embryonic and trophoblast stem cells. Epigenomic and transcriptomic data suggested that a large number of TEs helped to establish tissue-specific gene expression programmes. Genetic editing of individual TEs confirmed a subset of these regulatory relationships. However, a wider survey via CRISPR interference of RLTR13D6 elements in embryonic stem cells revealed that only a minority play significant roles in gene regulation. Our results suggest that a subset of TEs are important for gene regulation in early mouse development, and highlight the importance of functional experiments when evaluating gene regulatory roles of TEs.
Affiliation(s)
- Christopher D Todd
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Centre for Genomic Health, Life Sciences Institute, Queen Mary University of London, London, United Kingdom
- Özgen Deniz
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Centre for Genomic Health, Life Sciences Institute, Queen Mary University of London, London, United Kingdom
- Darren Taylor
- Centre for Genomic Health, Life Sciences Institute, Queen Mary University of London, London, United Kingdom
- Miguel R Branco
- Centre for Genomic Health, Life Sciences Institute, Queen Mary University of London, London, United Kingdom
|
68
|
Foy SG, Wilson BA, Bertram J, Cordes MHJ, Masel J. A Shift in Aggregation Avoidance Strategy Marks a Long-Term Direction to Protein Evolution. Genetics 2019; 211:1345-1355. [PMID: 30692195 PMCID: PMC6456324 DOI: 10.1534/genetics.118.301719] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/23/2018] [Accepted: 01/25/2019] [Indexed: 01/06/2023] Open
Abstract
To detect a direction to evolution, without the pitfalls of reconstructing ancestral states, we need to compare "more evolved" to "less evolved" entities. But because all extant species have the same common ancestor, none are chronologically more evolved than any other. However, different gene families were born at different times, allowing us to compare young protein-coding genes to those that are older and hence have been evolving for longer. To be retained during evolution, a protein must not only have a function, but must also avoid toxic dysfunction such as protein aggregation. There is conflict between the two requirements: hydrophobic amino acids form the cores of protein folds, but also promote aggregation. Young genes avoid strongly hydrophobic amino acids, which is presumably the simplest solution to the aggregation problem. Here we show that young genes' few hydrophobic residues are clustered near one another along the primary sequence, presumably to assist folding. The higher aggregation risk created by the higher hydrophobicity of older genes is counteracted by more subtle effects in the ordering of the amino acids, including a reduction in the clustering of hydrophobic residues until they eventually become more interspersed than if distributed randomly. This interspersion has previously been reported to be a general property of proteins, but here we find that it is restricted to old genes. Quantitatively, the index of dispersion delineates a gradual trend, i.e., a decrease in the clustering of hydrophobic amino acids over billions of years.
Affiliation(s)
- Scott G Foy
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
- Benjamin A Wilson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
- Jason Bertram
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
- Matthew H J Cordes
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona 85721
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
|
69
|
Abstract
Affordable, high-throughput DNA sequencing has accelerated the pace of genome assembly over the past decade. Genome assemblies from high-throughput, short-read sequencing, however, are often not as contiguous as the first generation of genome assemblies. Whereas early genome assembly projects were often aided by clone maps or other mapping data, many current assembly projects forego these scaffolding data and only assemble genomes into smaller segments. Recently, new technologies have been invented that allow chromosome-scale assembly at a lower cost and faster speed than traditional methods. Here, we give an overview of the problem of chromosome-scale assembly and traditional methods for tackling this problem. We then review new technologies for chromosome-scale assembly and recent genome projects that used these technologies to create highly contiguous genome assemblies at low cost.
Affiliation(s)
- Edward S. Rice
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Richard E. Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Dovetail Genomics, LLC, Santa Cruz, California 95060, USA
|
70
|
Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvák Z, Levin HL, Macfarlan TS, Mager DL, Feschotte C. Ten things you should know about transposable elements. Genome Biol 2018; 19:199. [PMID: 30454069 PMCID: PMC6240941 DOI: 10.1186/s13059-018-1577-z] [Citation(s) in RCA: 617] [Impact Index Per Article: 102.8] [Indexed: 02/07/2023] Open
Abstract
Transposable elements (TEs) are major components of eukaryotic genomes. However, the extent of their impact on genome evolution, function, and disease remains a matter of intense interrogation. The rise of genomics and large-scale functional assays has shed new light on the multi-faceted activities of TEs and implies that they should no longer be marginalized. Here, we introduce the fundamental properties of TEs and their complex interactions with their cellular environment, which are crucial to understanding their impact and manifold consequences for organismal biology. While we draw examples primarily from mammalian systems, the core concepts outlined here are relevant to a broad range of organisms.
Affiliation(s)
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, H3A 0G1, Canada.
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec, H3A 0G1, Canada.
- Kathleen H Burns
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Mary Gehring
- Whitehead Institute for Biomedical Research and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Vera Gorbunova
- Department of Biology, University of Rochester, Rochester, NY, 14627, USA
- Andrei Seluanov
- Department of Biology, University of Rochester, Rochester, NY, 14627, USA
- Molly Hammell
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Michaël Imbeault
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Zsuzsanna Izsvák
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Henry L Levin
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health, Bethesda, Maryland, USA
- Todd S Macfarlan
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health, Bethesda, Maryland, USA
- Dixie L Mager
- Terry Fox Laboratory, British Columbia Cancer Agency and Department of Medical Genetics, University of BC, Vancouver, BC, V5Z1L3, Canada
- Cédric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, 14850, USA.
|
71
|
Kolmogorov M, Armstrong J, Raney BJ, Streeter I, Dunn M, Yang F, Odom D, Flicek P, Keane TM, Thybert D, Paten B, Pham S. Chromosome assembly of large and complex genomes using multiple references. Genome Res 2018; 28:1720-1732. [PMID: 30341161 PMCID: PMC6211643 DOI: 10.1101/gr.236273.118] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 03/06/2018] [Accepted: 09/24/2018] [Indexed: 11/25/2022]
Abstract
Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large and complex genomes. By taking one or more target assemblies (generated from an NGS assembler) and one or multiple related reference genomes, Ragout 2 infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. By using Ragout 2, we transformed NGS assemblies of 16 laboratory mouse strains into sets of complete chromosomes, leaving <5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realigning of long Pacific Biosciences (PacBio) reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. We applied Ragout 2 to the Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared with other genomes from the Muridae family. Chromosome painting maps confirmed most large-scale rearrangements that Ragout 2 detected. We applied Ragout 2 to improve draft sequences of three ape genomes that have recently been published. Ragout 2 transformed three sets of contigs (generated using PacBio reads only) into chromosome-scale assemblies with accuracy comparable to chromosome assemblies generated in the original study using BioNano maps, Hi-C, BAC clones, and FISH.
Affiliation(s)
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA
- Joel Armstrong
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
- Brian J Raney
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
- Ian Streeter
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Matthew Dunn
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Fengtang Yang
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Duncan Odom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE Cambridge, United Kingdom
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Thomas M Keane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- School of Life Sciences, University of Nottingham, Nottingham NG7 2NR, United Kingdom
- David Thybert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Earlham Institute, Norwich Research Park, Norwich NR4 7UG, United Kingdom
- Benedict Paten
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
- Son Pham
- BioTuring Incorporated, San Diego, California 92121, USA
|
72
|
Rogers J. Adding resolution and dimensionality to comparative genomics: moving from reference genomes to clade genomics. Genome Biol 2018; 19:115. [PMID: 30107805 PMCID: PMC6090731 DOI: 10.1186/s13059-018-1500-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open
Abstract
The main goal and promise of comparative genomics has been to create a comprehensive catalog of genomic information and function across the phenomenal diversity of living systems. A recent study has demonstrated the evolutionary insights possible by generating high-quality whole-genome assemblies from multiple species of a clade.
Affiliation(s)
- Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.,Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
|
73
|
Aitken SJ, Ibarra-Soria X, Kentepozidou E, Flicek P, Feig C, Marioni JC, Odom DT. CTCF maintains regulatory homeostasis of cancer pathways. Genome Biol 2018; 19:106. [PMID: 30086769 PMCID: PMC6081938 DOI: 10.1186/s13059-018-1484-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 11/29/2017] [Accepted: 07/16/2018] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: CTCF binding to DNA helps partition the mammalian genome into discrete structural and regulatory domains. Complete removal of CTCF from mammalian cells causes catastrophic genome dysregulation, likely due to widespread collapse of 3D chromatin looping and alterations to inter- and intra-TAD interactions within the nucleus. In contrast, Ctcf hemizygous mice with lifelong reduction of CTCF expression are viable, albeit with increased cancer incidence. Here, we exploit chronic Ctcf hemizygosity to reveal its homeostatic roles in maintaining genome function and integrity. RESULTS: We find that Ctcf hemizygous cells show modest but robust changes in almost a thousand sites of genomic CTCF occupancy; these are enriched for lower affinity binding events with weaker evolutionary conservation across the mouse lineage. Furthermore, we observe dysregulation of the expression of several hundred genes, which are concentrated in cancer-related pathways, and are caused by changes in transcriptional regulation. Chromatin structure is preserved but some loop interactions are destabilized; these are often found around differentially expressed genes and their enhancers. Importantly, the transcriptional alterations identified in vitro are recapitulated in mouse tumors and also in human cancers. CONCLUSIONS: This multi-dimensional genomic and epigenomic profiling of a Ctcf hemizygous mouse model system shows that chronic depletion of CTCF dysregulates steady-state gene expression by subtly altering transcriptional regulation, changes which can also be observed in primary tumors.
Affiliation(s)
- Sarah J. Aitken
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
- Department of Histopathology, Addenbrooke’s Hospital, Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, CB2 0QQ UK
- Ximena Ibarra-Soria
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
- Elissavet Kentepozidou
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD UK
- Christine Feig
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
- John C. Marioni
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA UK
- Duncan T. Odom
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
|
74
|
Fiddes IT, Armstrong J, Diekhans M, Nachtweide S, Kronenberg ZN, Underwood JG, Gordon D, Earl D, Keane T, Eichler EE, Haussler D, Stanke M, Paten B. Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res 2018; 28:1029-1038. [PMID: 29884752 PMCID: PMC6028123 DOI: 10.1101/gr.233460.117] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 12/07/2017] [Accepted: 05/03/2018] [Indexed: 01/13/2023]
Abstract
The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants, even in genomes as well studied as rat and the great apes, and how these annotations improve cross-species RNA expression experiments.
Affiliation(s)
- Ian T Fiddes
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA.,10x Genomics, Pleasanton, California 94566, USA
- Joel Armstrong
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
- Stefanie Nachtweide
- Institute of Mathematics and Computer Science, University of Greifswald, 17489 Greifswald, Germany
- Zev N Kronenberg
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Jason G Underwood
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.,Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- David Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.,Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Dent Earl
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
- Thomas Keane
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.,Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
- Mario Stanke
- Institute of Mathematics and Computer Science, University of Greifswald, 17489 Greifswald, Germany
- Benedict Paten
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, California 95064, USA
|
75
|
Comparing Apples to Apples and Oranges to Oranges. Trends Genet 2018; 34:571-572. [PMID: 29853203 DOI: 10.1016/j.tig.2018.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2018] [Accepted: 05/15/2018] [Indexed: 11/23/2022]
Abstract
A new study sequenced and assembled two rodent genomes to better understand the evolutionary forces shaping mammalian genomes. Their results suggest multiple roles for genomic repeats.
|