1
Kojima S, Koyama S, Ka M, Saito Y, Parrish EH, Endo M, Takata S, Mizukoshi M, Hikino K, Takeda A, Gelinas AF, Heaton SM, Koide R, Kamada AJ, Noguchi M, Hamada M, Kamatani Y, Murakawa Y, Ishigaki K, Nakamura Y, Ito K, Terao C, Momozawa Y, Parrish NF. Mobile element variation contributes to population-specific genome diversification, gene regulation and disease risk. Nat Genet 2023:10.1038/s41588-023-01390-2. [PMID: 37169872 DOI: 10.1038/s41588-023-01390-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/13/2022] [Accepted: 04/04/2023] [Indexed: 05/13/2023]
Abstract
Mobile genetic elements (MEs) are heritable mutagens that recursively generate structural variants (SVs). ME variants (MEVs) are difficult to genotype and integrate in statistical genetics, obscuring their impact on genome diversification and traits. We developed a tool that accurately genotypes MEVs using short-read whole-genome sequencing (WGS) and applied it to global human populations. We find unexpected population-specific MEV differences, including an Alu insertion distribution distinguishing Japanese from other populations. Integrating MEVs with expression quantitative trait loci (eQTL) maps shows that MEV classes regulate tissue-specific gene expression by shared mechanisms, including creating or attenuating enhancers and recruiting post-transcriptional regulators, supporting class-wide interpretability. MEVs more often associate with gene expression changes than SNVs, thus plausibly impacting traits. Performing genome-wide association study (GWAS) with MEVs pinpoints potential causes of disease risk, including a LINE-1 insertion associated with keloid and fasciitis. This work implicates MEVs as drivers of human divergence and disease risk.
Affiliation(s)
- Shohei Kojima
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Satoshi Koyama
  - Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
- Mirei Ka
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
  - Next-Generation Precision Medicine Development, Integrative Genomics Laboratory, Graduate School of Medicine, Department of Medical Science, The University of Tokyo, Tokyo, Japan
- Yuka Saito
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
  - Graduate School of Medical Life Science, Yokohama City University, Yokohama, Japan
- Erica H Parrish
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Mikiko Endo
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Sadaaki Takata
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Misaki Mizukoshi
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Keiko Hikino
  - Laboratory for Pharmacogenomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Atsushi Takeda
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Asami F Gelinas
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Steven M Heaton
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Rie Koide
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
- Anselmo J Kamada
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
  - Paleovirology Lab, Department of Biology, University of Oxford, Oxford, UK
- Michiya Noguchi
  - Cell Engineering Division, BioResource Research Center, RIKEN, Tsukuba, Japan
- Michiaki Hamada
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Yoichiro Kamatani
  - Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yasuhiro Murakawa
  - RIKEN-IFOM Joint Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto, Japan
  - IFOM ETS - the AIRC Institute of Molecular Oncology, Milan, Italy
- Kazuyoshi Ishigaki
  - Laboratory for Human Immunogenetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yukio Nakamura
  - Cell Engineering Division, BioResource Research Center, RIKEN, Tsukuba, Japan
- Kaoru Ito
  - Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
- Yukihide Momozawa
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Nicholas F Parrish
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
2
Riba A, Fumagalli MR, Caselle M, Osella M. A Model-Driven Quantitative Analysis of Retrotransposon Distributions in the Human Genome. Genome Biol Evol 2021; 12:2045-2059. [PMID: 32986810 PMCID: PMC7750997 DOI: 10.1093/gbe/evaa201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 09/19/2020] [Indexed: 12/21/2022]
Abstract
Retrotransposons, DNA sequences capable of creating copies of themselves, compose about half of the human genome and played a central role in the evolution of mammals. Their current position in the host genome is the result of the retrotranscription process and of the following host genome evolution. We apply a model from statistical physics to show that the genomic distribution of the two most populated classes of retrotransposons in human deviates from random placement, and that this deviation increases with time. The time dependence suggests a major role of the host genome dynamics in shaping the current retrotransposon distributions. Focusing on a neutral scenario, we show that a simple model based on random placement followed by genome expansion and sequence duplications can reproduce the empirical retrotransposon distributions, even though more complex and possibly selective mechanisms can have contributed. Besides the inherent interest in understanding the origin of current retrotransposon distributions, this work sets a general analytical framework to analyze quantitatively the effects of genome evolutionary dynamics on the distribution of genomic elements.
Affiliation(s)
- Maria Rita Fumagalli
  - Institute of Biophysics - CNR, National Research Council, Genova, Italy
  - Department of Environmental Science and Policy, Center for Complexity and Biosystems, University of Milan, Milano, Italy
- Michele Caselle
  - Department of Physics and INFN, University of Torino, Torino, Italy
- Matteo Osella
  - Department of Physics and INFN, University of Torino, Torino, Italy
3
Integration of HIV in the Human Genome: Which Sites Are Preferential? A Genetic and Statistical Assessment. Int J Genomics 2016; 2016:2168590. [PMID: 27294106 PMCID: PMC4880676 DOI: 10.1155/2016/2168590] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 02/11/2016] [Accepted: 04/24/2016] [Indexed: 12/17/2022]
Abstract
Chromosomal fragile sites (FSs) are loci where gaps and breaks may occur and are preferential integration targets for some viruses, for example, Hepatitis B, Epstein-Barr virus, HPV16, HPV18, and MLV vectors. However, the integration of the human immunodeficiency virus (HIV) in Giemsa bands and in FSs is not yet completely clear. This study aimed to assess the integration preferences of HIV in FSs and in Giemsa bands using an in silico study. HIV integration positions from Jurkat cells were used and two nonparametric tests were applied to compare HIV integration in dark versus light bands and in FS versus non-FS (NFSs). The results show that light bands are preferential targets for integration of HIV-1 in Jurkat cells and also that it integrates with equal intensity in FSs and in NFSs. The data indicates that HIV displays different preferences for FSs compared to other viruses. The aim was to develop and apply an approach to predict the conditions and constraints of HIV insertion in the human genome which seems to adequately complement empirical data.
4
Chandra T, Ewels PA, Schoenfelder S, Furlan-Magaril M, Wingett SW, Kirschner K, Thuret JY, Andrews S, Fraser P, Reik W. Global reorganization of the nuclear landscape in senescent cells. Cell Rep 2015; 10:471-83. [PMID: 25640177 PMCID: PMC4542308 DOI: 10.1016/j.celrep.2014.12.055] [Citation(s) in RCA: 219] [Impact Index Per Article: 21.9] [Received: 07/31/2014] [Revised: 11/13/2014] [Accepted: 12/22/2014] [Indexed: 02/03/2023]
Abstract
Cellular senescence has been implicated in tumor suppression, development, and aging and is accompanied by large-scale chromatin rearrangements, forming senescence-associated heterochromatic foci (SAHF). However, how the chromatin is reorganized during SAHF formation is poorly understood. Furthermore, heterochromatin formation in senescence appears to contrast with loss of heterochromatin in Hutchinson-Gilford progeria. We mapped architectural changes in genome organization in cellular senescence using Hi-C. Unexpectedly, we find a dramatic sequence- and lamin-dependent loss of local interactions in heterochromatin. This change in local connectivity resolves the paradox of opposing chromatin changes in senescence and progeria. In addition, we observe a senescence-specific spatial clustering of heterochromatic regions, suggesting a unique second step required for SAHF formation. Comparison of embryonic stem cells (ESCs), somatic cells, and senescent cells shows a unidirectional loss in local chromatin connectivity, suggesting that senescence is an endpoint of the continuous nuclear remodelling process during differentiation.
Affiliation(s)
- Tamir Chandra
  - Epigenetics Programme, The Babraham Institute, Cambridge CB22 3AT, UK; The Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
- Kristina Kirschner
  - Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK
- Jean-Yves Thuret
  - CEA, iBiTec-S, SBIGeM/CNRS FRE3377 I2BC/Université Paris-Sud, Gif-sur-Yvette 91191, France
- Simon Andrews
  - Bioinformatics Group, The Babraham Institute, Cambridge CB22 3AT, UK
- Peter Fraser
  - Nuclear Dynamics Programme, The Babraham Institute, Cambridge CB22 3AT, UK
- Wolf Reik
  - Epigenetics Programme, The Babraham Institute, Cambridge CB22 3AT, UK; The Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
5
Hellen EH, Brookfield JF. Alu elements in primates are preferentially lost from areas of high GC content. PeerJ 2013; 1:e78. [PMID: 23717800 PMCID: PMC3661076 DOI: 10.7717/peerj.78] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/28/2013] [Accepted: 05/02/2013] [Indexed: 11/20/2022]
Abstract
The currently-accepted dogma when analysing human Alu transposable elements is that ‘young’ Alu elements are found in low GC regions and ‘old’ Alus in high GC regions. The correlation between high GC regions and high gene frequency regions make this observation particularly difficult to explain. Although a number of studies have tackled the problem, no analysis has definitively explained the reason for this trend. These observations have been made by relying on the subfamily as a proxy for age of an element. In this study, we suggest that this is a misleading assumption and instead analyse the relationship between the taxonomic distribution of an individual element and its surrounding GC environment. An analysis of 103906 Alu elements across 6 human chromosomes was carried out, using the presence of orthologous Alu elements in other primate species as a proxy for age. We show that the previously-reported effect of GC content correlating with subfamily age is not reflected by the ages of the individual elements. Instead, elements are preferentially lost from areas of high GC content over time. The correlation between GC content and subfamily may be due to a change in insertion bias in the young subfamilies. The link between Alu subfamily age and GC region was made due to an over-simplification of the data and is incorrect. We suggest that use of subfamilies as a proxy for age is inappropriate and that the analysis of ortholog presence in other primate species provides a deeper insight into the data.
Affiliation(s)
- Elizabeth Hb Hellen
  - Centre for Genetics and Genomics, School of Biology, University of Nottingham, University Park, Nottingham, UK
6
Li W, Sosa D, Jose MV. Human repetitive sequence densities are mostly negatively correlated with R/Y-based nucleosome-positioning motifs and positively correlated with W/S-based motifs. Genomics 2013; 101:125-33. [DOI: 10.1016/j.ygeno.2012.10.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/09/2012] [Revised: 10/28/2012] [Accepted: 10/29/2012] [Indexed: 01/25/2023]
7
Carels N, Frías D. A Statistical Method without Training Step for the Classification of Coding Frame in Transcriptome Sequences. Bioinform Biol Insights 2013; 7:35-54. [PMID: 23400232 PMCID: PMC3561939 DOI: 10.4137/bbi.s10053] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
Abstract
In this study, we investigated the modalities of coding open reading frame (cORF) classification of expressed sequence tags (EST) by using the universal feature method (UFM). The UFM algorithm is based on the scoring of purine bias (Rrr) and stop codon frequencies. UFM classifies ORFs as coding or non-coding through a score based on 5 factors: (i) stop codon frequency; (ii) the product of the probabilities of purines occurring in the three positions of nucleotide triplets; (iii) the product of the probabilities of Cytosine (C), Guanine (G), and Adenine (A) occurring in the 1st, 2nd, and 3rd positions of triplets, respectively; (iv) the probabilities of a G occurring in the 1st and 2nd positions of triplets; and (v) the probabilities of a T occurring in the 1st and an A in the 2nd position of triplets. Because UFM is based on primary determinants of coding sequences that are conserved throughout the biosphere, it is suitable for cORF classification of any sequence in eukaryote transcriptomes without prior knowledge. Considering the protein sequences of the Protein Data Bank (RCSB PDB or more simply PDB) as a reference, we found that UFM classifies cORFs of ≥200 bp (if the coding strand is known) and cORFs of ≥300 bp (if the coding strand is unknown), and releases them in their coding strand and coding frame, which allows their automatic translation into protein sequences with a success rate equal to or higher than 95%. We first established the statistical parameters of UFM using ESTs from Plasmodium falciparum, Arabidopsis thaliana, Oryza sativa, Zea mays, Drosophila melanogaster, Homo sapiens and Chlamydomonas reinhardtii in reference to the protein sequences of PDB. Second, we showed that the success rate of cORF classification using UFM is expected to apply to approximately 95% of higher eukaryote genes that encode for proteins. 
Third, we used UFM in combination with CAP3 to assemble large EST samples into cORFs that we used to analyze transcriptome phenotypes in rice, maize, and humans. We discuss the error rate and the interference of noisy sequences such as pseudogenes, transposons, and retrotransposons. This method is suitable for rapid cORF extraction from transcriptome data and allows correct description of the genome phenotypes of plant genomes without prior knowledge. Additional care is necessary when addressing the human transcriptome due to the interference caused by large amounts of noisy sequences. UFM can be regarded as a low complexity tool for prior knowledge extraction concerning the coding fraction of the transcriptome of any eukaryote. Due to its low level of complexity, UFM is also very robust to variations of codon usage.
Affiliation(s)
- Nicolas Carels
  - Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil
8
van der Kuyl AC, Berkhout B. The biased nucleotide composition of the HIV genome: a constant factor in a highly variable virus. Retrovirology 2012; 9:92. [PMID: 23131071 PMCID: PMC3511177 DOI: 10.1186/1742-4690-9-92] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 07/17/2012] [Accepted: 10/14/2012] [Indexed: 01/09/2023]
Abstract
Viruses often deviate from their hosts in the nucleotide composition of their genomes. The RNA genome of the lentivirus family of retroviruses, including human immunodeficiency virus (HIV), contains e.g. an above average percentage of adenine (A) nucleotides, while being extremely poor in cytosine (C). Such a deviant base composition has implications for the amino acids that are encoded by the open reading frames (ORFs), both in the requirement of specific tRNA species and in the preference for amino acids encoded by e.g. A-rich codons. Nucleotide composition does obviously affect the secondary and tertiary structure of the RNA genome and its biological functions, but it does also influence phylogenetic analysis of viral genome sequences, and possibly the activity of the integrated DNA provirus. Over time, the nucleotide composition of the HIV-1 genome is exceptionally conserved, varying by less than 1% per base position per isolate within either group M, N, or O during 1983–2009. This extreme stability of the nucleotide composition may possibly be achieved by negative selection, perhaps conserving semi-stable RNA secondary structure as reverse transcription would be significantly affected for a less A-rich genome where secondary structures are expected to be more stable and thus more difficult to unfold. This review will discuss all aspects of the lentiviral genome composition, both of the RNA and of its derived double-stranded DNA genome, with a focus on HIV-1, the nucleotide composition over time, the effects of artificially humanized codons as well as contributions of immune system pressure on HIV nucleotide bias.
Affiliation(s)
- Antoinette C van der Kuyl
  - Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam, Academic Medical Center of the University of Amsterdam, Meibergdreef 15, Amsterdam, AZ 1105, The Netherlands
9
Abstract
Repetitive sequences, especially transposon-derived interspersed repetitive elements, account for a large fraction of the genome in most eukaryotes. Despite the repetitive nature, these transposable elements display quantitative and qualitative differences even among species of the same lineage. Although transposable elements contribute greatly as a driving force to the biological diversity during evolution, they can induce embryonic lethality and genetic disorders as a result of insertional mutagenesis and genomic rearrangement. Temporary relaxation of the epigenetic control of retrotransposons during early germline development opens a risky window that can allow retrotransposons to escape from host constraints and to propagate abundantly in the host genome. Because germline mutations caused by retrotransposon activation are heritable and thus can be deleterious to the offspring, an adaptive strategy has evolved in host cells, especially in the germline. In this review, we will attempt to summarize general defense mechanisms deployed by the eukaryotic genome, with an emphasis on pathways utilized by the male germline to confer retrotransposon silencing.
Affiliation(s)
- Jianqiang Bao
  - Department of Physiology and Cell Biology, University of Nevada School of Medicine, Reno, Nevada, USA