1
Steely CJ, Watkins WS, Baird L, Jorde LB. The mutational dynamics of short tandem repeats in large, multigenerational families. Genome Biol 2022; 23:253. [PMID: 36510265 PMCID: PMC9743774 DOI: 10.1186/s13059-022-02818-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/30/2022] [Accepted: 11/17/2022] [Indexed: 12/14/2022] [Open access]
Abstract
BACKGROUND Short tandem repeats (STRs) compose approximately 3% of the genome, and mutations at STR loci have been linked to dozens of human diseases including amyotrophic lateral sclerosis, Friedreich ataxia, Huntington disease, and fragile X syndrome. Improving our understanding of these mutations would increase our knowledge of the mutational dynamics of the genome and may uncover additional loci that contribute to disease. To estimate the genome-wide pattern of mutations at STR loci, we analyze blood-derived whole-genome sequencing data for 544 individuals from 29 three-generation CEPH pedigrees. These pedigrees contain both sets of grandparents, the parents, and an average of 9 grandchildren per family. RESULTS We use HipSTR to identify de novo STR mutations in the 2nd generation of these pedigrees and require transmission to the third generation for validation. Analyzing approximately 1.6 million STR loci, we estimate the empirical de novo STR mutation rate to be 5.24 × 10⁻⁵ mutations per locus per generation. Perfect repeats mutate about twice as often as imperfect repeats. De novo STRs are significantly enriched in Alu elements. CONCLUSIONS Approximately 30% of new STR mutations occur within Alu elements, which compose only 11% of the genome, but only 10% are found in LINE-1 insertions, which compose 17% of the genome. Phasing these mutations to the parent of origin shows that parental transmission biases vary among families. We estimate the average number of de novo genome-wide STR mutations per individual to be approximately 85, which is similar to the average number of observed de novo single nucleotide variants.
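The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers quoted above; the variable and function names are illustrative, and the fold-enrichment is a naive ratio of shares that ignores how STR loci themselves are distributed across the genome, so it is not necessarily the statistic the authors tested.

```python
# Figures quoted in the abstract (rounded as reported there).
N_LOCI = 1.6e6   # STR loci analyzed
RATE = 5.24e-5   # de novo mutations per locus per generation

# Expected number of de novo STR mutations across the analyzed loci.
expected_mutations = N_LOCI * RATE


def fold_enrichment(share_of_mutations: float, share_of_genome: float) -> float:
    """Naive ratio of the observed share to the share expected from genome coverage."""
    return share_of_mutations / share_of_genome


alu_fold = fold_enrichment(0.30, 0.11)  # Alu: 30% of mutations, 11% of genome
l1_fold = fold_enrichment(0.10, 0.17)   # LINE-1: 10% of mutations, 17% of genome

print(f"expected mutations across analyzed loci: {expected_mutations:.1f}")
print(f"Alu fold: {alu_fold:.2f}, LINE-1 fold: {l1_fold:.2f}")
```

The product of the two rounded inputs comes out near 84, in line with the reported figure of approximately 85; the authors' own estimate may rest on a different calculation. The ratios indicate over-representation in Alu elements (above 1) and under-representation in LINE-1 (below 1).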
Affiliation(s)
- Cody J. Steely
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
- W. Scott Watkins
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
- Lisa Baird
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
- Lynn B. Jorde
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112 USA
2
Balachandran P, Walawalkar IA, Flores JI, Dayton JN, Audano PA, Beck CR. Transposable element-mediated rearrangements are prevalent in human genomes. Nat Commun 2022; 13:7115. [PMID: 36402840 PMCID: PMC9675761 DOI: 10.1038/s41467-022-34810-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/26/2022] [Accepted: 11/08/2022] [Indexed: 11/21/2022] [Open access]
Abstract
Transposable elements constitute about half of human genomes, and their role in generating human variation through retrotransposition is broadly studied and appreciated. Structural variants mediated by transposons, which we call transposable element-mediated rearrangements (TEMRs), are less well studied, and the mechanisms leading to their formation as well as their broader impact on human diversity are poorly understood. Here, we identify 493 unique TEMRs across the genomes of three individuals. While homology directed repair is the dominant driver of TEMRs, our sequence-resolved TEMR resource allows us to identify complex inversion breakpoints, triplications or other high copy number polymorphisms, and additional complexities. TEMRs are enriched in genic loci and can create potentially important risk alleles such as a deletion in TRIM65, a known cancer biomarker and therapeutic target. These findings expand our understanding of this important class of structural variation, the mechanisms responsible for their formation, and establish them as an important driver of human diversity.
Affiliation(s)
- Jacob I Flores
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jacob N Dayton
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA.
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA.
3
Navarro-Lafuente F, Adoamnei E, Arense-Gonzalo JJ, Prieto-Sánchez MT, Sánchez-Ferrer ML, Parrado A, Fernández MF, Suarez B, López-Acosta A, Sánchez-Guillamón A, García-Marcos L, Morales E, Mendiola J, Torres-Cantero AM. Maternal urinary concentrations of bisphenol A during pregnancy are associated with global DNA methylation in cord blood of newborns in the "NELA" birth cohort. The Science of the Total Environment 2022; 838:156540. [PMID: 35688234 DOI: 10.1016/j.scitotenv.2022.156540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/17/2022] [Revised: 06/03/2022] [Accepted: 06/03/2022] [Indexed: 06/15/2023]
Abstract
Endocrine disrupting chemicals (EDCs) pose a public health risk through disruption of normal physiological processes. The toxicoepigenetic mechanisms of developmental exposure to common EDCs, such as bisphenol A (BPA), are poorly understood. The present study aimed to evaluate associations between perinatal maternal urinary concentrations of BPA, bisphenol S (BPS) and bisphenol F (BPF) and LINE-1 (long interspersed nuclear elements) and Alu (short interspersed nuclear elements, SINEs) DNA methylation levels in newborns, as surrogate markers of global DNA methylation. Data come from 318 mother-child pairs of the 'Nutrition in Early Life and Asthma' (NELA) birth cohort. Urinary bisphenol concentration was measured by dispersive liquid-liquid microextraction and ultrahigh performance liquid chromatography with tandem mass spectrometry detection. DNA methylation was quantitatively assessed by bisulphite pyrosequencing on 3 LINEs and 5 SINEs. Unadjusted linear regression analyses showed that a higher concentration of maternal urinary BPA in the 24th week of pregnancy was associated with an increase in LINE-1 methylation in all newborns (p = 0.01) and, particularly, in male newborns (p = 0.03). These associations remained in fully adjusted models [beta = 0.09 (95% CI = 0.03; 0.14) for all newborns; and beta = 0.10 (95% CI = 0.03; 0.17) for males], including a non-linear association for female newborns as well (p-trend = 0.003). No associations were found between maternal concentrations of bisphenol and Alu sequences. Our results suggest that exposure to environmental levels of BPA may be associated with a modest increase in LINE-1 methylation, a relevant marker of epigenomic stability, during human fetal development. However, any effects on global DNA methylation are likely to be small, and of uncertain biological significance.
Affiliation(s)
- Evdochia Adoamnei
- University of Murcia, Murcia, Spain; Biomedical Research Institute of Murcia (IMIB), Murcia, Spain.
- Julián J Arense-Gonzalo
- University of Murcia, Murcia, Spain; Biomedical Research Institute of Murcia (IMIB), Murcia, Spain
- María T Prieto-Sánchez
- University of Murcia, Murcia, Spain; Biomedical Research Institute of Murcia (IMIB), Murcia, Spain; "Virgen de la Arrixaca" University Clinical Hospital, Murcia, Spain
- María L Sánchez-Ferrer
- University of Murcia, Murcia, Spain; Biomedical Research Institute of Murcia (IMIB), Murcia, Spain; "Virgen de la Arrixaca" University Clinical Hospital, Murcia, Spain
- Antonio Parrado
- Biomedical Research Institute of Murcia (IMIB), Murcia, Spain
- Mariana F Fernández
- University of Granada, Centro de Investigación Biomédica, Granada, Spain; Instituto de Investigación Biosanitaria Ibs. Granada, Granada, Spain; Consortium for Biomedical Research in Epidemiology and Public Health (CIBER Epidemiología y Salud Pública, CIBERESP), Instituto de Salud Carlos III, Madrid, Spain
- Beatriz Suarez
- University of Granada, Centro de Investigación Biomédica, Granada, Spain; Instituto de Investigación Biosanitaria Ibs. Granada, Granada, Spain; Consortium for Biomedical Research in Epidemiology and Public Health (CIBER Epidemiología y Salud Pública, CIBERESP), Instituto de Salud Carlos III, Madrid, Spain
- Luis García-Marcos
- University of Murcia, Murcia, Spain; Biomedical Research Institute of Murcia (IMIB), Murcia, Spain; "Virgen de la Arrixaca" University Clinical Hospital, Murcia, Spain; Network of Asthma and Adverse and Allergic Reactions (ARADyAL), Spain
- Eva Morales
- University of Murcia, Murcia, Spain; Biomedical Research Institute of Murcia (IMIB), Murcia, Spain
- Jaime Mendiola
- University of Murcia, Murcia, Spain; Biomedical Research Institute of Murcia (IMIB), Murcia, Spain; Consortium for Biomedical Research in Epidemiology and Public Health (CIBER Epidemiología y Salud Pública, CIBERESP), Instituto de Salud Carlos III, Madrid, Spain
- Alberto M Torres-Cantero
- University of Murcia, Murcia, Spain; Biomedical Research Institute of Murcia (IMIB), Murcia, Spain; "Virgen de la Arrixaca" University Clinical Hospital, Murcia, Spain; Consortium for Biomedical Research in Epidemiology and Public Health (CIBER Epidemiología y Salud Pública, CIBERESP), Instituto de Salud Carlos III, Madrid, Spain
4
Do Ty3/Gypsy Transposable Elements Play Preferential Roles in Sex Chromosome Differentiation? Life (Basel) 2022; 12:life12040522. [PMID: 35455013 PMCID: PMC9025612 DOI: 10.3390/life12040522] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/15/2022] [Revised: 03/13/2022] [Accepted: 03/30/2022] [Indexed: 12/16/2022] [Open access]
Abstract
Transposable elements (TEs) comprise a substantial portion of eukaryotic genomes. They have the unique ability to integrate into new locations and serve as the main source of genomic novelties by mediating chromosomal rearrangements and regulating portions of functional genes. Recent studies have revealed that TEs are abundant in sex chromosomes. In this review, we propose evolutionary relationships between specific TEs, such as Ty3/Gypsy, and sex chromosomes in different lineages based on the hypothesis that these elements contributed to sex chromosome differentiation processes. We highlight how TEs can drive the dynamics of sex-determining regions via suppression recombination under a selective force to affect the organization and structural evolution of sex chromosomes. The abundance of TEs in the sex-determining regions originates from TE-poor genomic regions, suggesting a link between TE accumulation and the emergence of the sex-determining regions. TEs are generally considered to be a hallmark of chromosome degeneration. Finally, we outline recent approaches to identify TEs and study their sex-related roles and effects in the differentiation and evolution of sex chromosomes.
5
Riba A, Fumagalli MR, Caselle M, Osella M. A Model-Driven Quantitative Analysis of Retrotransposon Distributions in the Human Genome. Genome Biol Evol 2021; 12:2045-2059. [PMID: 32986810 PMCID: PMC7750997 DOI: 10.1093/gbe/evaa201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 09/19/2020] [Indexed: 12/21/2022] [Open access]
Abstract
Retrotransposons, DNA sequences capable of creating copies of themselves, compose about half of the human genome and played a central role in the evolution of mammals. Their current position in the host genome is the result of the retrotranscription process and of the following host genome evolution. We apply a model from statistical physics to show that the genomic distribution of the two most populated classes of retrotransposons in human deviates from random placement, and that this deviation increases with time. The time dependence suggests a major role of the host genome dynamics in shaping the current retrotransposon distributions. Focusing on a neutral scenario, we show that a simple model based on random placement followed by genome expansion and sequence duplications can reproduce the empirical retrotransposon distributions, even though more complex and possibly selective mechanisms can have contributed. Besides the inherent interest in understanding the origin of current retrotransposon distributions, this work sets a general analytical framework to analyze quantitatively the effects of genome evolutionary dynamics on the distribution of genomic elements.
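The "random placement" baseline mentioned in this abstract can be illustrated with a short simulation. This is a minimal sketch of the null scenario only, not the authors' statistical-physics model: the function name, genome length, and element count are all illustrative assumptions. Under uniform random placement, gaps between neighbouring elements are approximately exponentially distributed, so roughly exp(-1), about 37%, of gaps should exceed the mean gap; a clear departure from that in real data would point away from random placement.

```python
import random


def random_placement_gaps(genome_length: int, n_elements: int, seed: int = 0) -> list[int]:
    """Place elements uniformly at random and return the gaps between neighbours."""
    rng = random.Random(seed)
    # Sample distinct integer coordinates, then sort them along the genome.
    positions = sorted(rng.sample(range(genome_length), n_elements))
    return [b - a for a, b in zip(positions, positions[1:])]


# Hypothetical toy genome: 1 Mb with 1,000 inserted elements.
gaps = random_placement_gaps(1_000_000, 1_000)
mean_gap = sum(gaps) / len(gaps)

# Fraction of gaps longer than the mean; near exp(-1) for an exponential distribution.
frac_above_mean = sum(g > mean_gap for g in gaps) / len(gaps)

print(f"mean gap: {mean_gap:.1f} bp, fraction above mean: {frac_above_mean:.3f}")
```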
Affiliation(s)
- Maria Rita Fumagalli
- Institute of Biophysics - CNR, National Research Council, Genova, Italy; Department of Environmental Science and Policy, Center for Complexity and Biosystems, University of Milan, Milano, Italy
- Michele Caselle
- Department of Physics and INFN, University of Torino, Torino, Italy
- Matteo Osella
- Department of Physics and INFN, University of Torino, Torino, Italy
6
Metze K, Adam R, Florindo JB. The fractal dimension of chromatin - a potential molecular marker for carcinogenesis, tumor progression and prognosis. Expert Rev Mol Diagn 2019; 19:299-312. [DOI: 10.1080/14737159.2019.1597707] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Affiliation(s)
- Konradin Metze
- Department of Pathology, Faculty of Medical Sciences, State University of Campinas (UNICAMP), Campinas, Brazil
- Randall Adam
- Department of Pathology, Faculty of Medical Sciences, State University of Campinas (UNICAMP), Campinas, Brazil
- João Batista Florindo
- Department of Applied Mathematics, Institute of Mathematics, Statistics and Scientific Computing, State University of Campinas, Campinas, Brazil
7
Shin W, Mun S, Kim J, Lee W, Park DG, Choi S, Lee TY, Cha S, Han K. Novel Discovery of LINE-1 in a Korean Individual by a Target Enrichment Method. Mol Cells 2019; 42:87-95. [PMID: 30699287 PMCID: PMC6354063 DOI: 10.14348/molcells.2018.0351] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/17/2018] [Revised: 10/10/2018] [Accepted: 10/26/2018] [Indexed: 11/27/2022] [Open access]
Abstract
Long interspersed element-1 (LINE-1 or L1) is an autonomous retrotransposon, which is capable of inserting into a new region of genome. Previous studies have reported that these elements lead to genomic variations and altered functions by affecting gene expression and genetic networks. Mounting evidence strongly indicates that genetic diseases or various cancers can occur as a result of retrotransposition events that involve L1s. Therefore, the development of methodologies to study the structural variations and interpersonal insertion polymorphisms by L1 element-associated changes in an individual genome is invaluable. In this study, we applied a systematic approach to identify human-specific L1s (i.e., L1Hs) through the bioinformatics analysis of high-throughput next-generation sequencing data. We identified 525 candidates that could be inferred to carry non-reference L1Hs in a Korean individual genome (KPGP9). Among them, we randomly selected 40 candidates and validated that approximately 92.5% of non-reference L1Hs were inserted into a KPGP9 genome. In addition, unlike conventional methods, our relatively simple and expedited approach was highly reproducible in confirming the L1 insertions. Taken together, our findings strongly support that the identification of non-reference L1Hs by our novel target enrichment method demonstrates its future application to genomic variation studies on the risk of cancer and genetic disorders.
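The validation figures in this abstract translate into simple counts. The sketch below converts the reported 92.5% of 40 tested candidates into a number of confirmed insertions, and adds two things the abstract does not report: a Wilson 95% interval for that proportion and a naive extrapolation to all 525 candidates. Both are illustrative assumptions, and the extrapolation holds only if the 40 tested loci are representative of the full candidate set.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


n_tested = 40                             # randomly selected candidates
validated = round(n_tested * 0.925)       # candidates confirmed experimentally
low, high = wilson_interval(validated, n_tested)

# Naive scaling of the validation rate to all 525 candidates.
naive_true_insertions = 525 * validated / n_tested

print(f"validated: {validated}/{n_tested}")
print(f"Wilson 95% interval: {low:.3f} to {high:.3f}")
print(f"naive estimate of true insertions among 525: {naive_true_insertions:.0f}")
```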
Affiliation(s)
- Wonseok Shin
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Korea
- Seyoung Mun
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Korea
- Junse Kim
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Korea
- Wooseok Lee
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Korea
- Dong-Guk Park
- Department of Surgery, Dankook University College of Medicine, Cheonan 31116, Korea
- Seungkyu Choi
- Department of Pathology, Dankook University College of Medicine, Cheonan 31116, Korea
- Tae Yoon Lee
- Department of Technology Education and Department of Biomedical Engineering, Chungnam National University, Daejeon 34134, Korea
- Seunghee Cha
- Department of Oral and Maxillofacial Diagnostic Sciences, University of Florida College of Dentistry, Gainesville, FL 32610, USA
- Kyudong Han
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Korea
8
ALUminating the Path of Atherosclerosis Progression: Chaos Theory Suggests a Role for Alu Repeats in the Development of Atherosclerotic Vascular Disease. Int J Mol Sci 2018; 19:ijms19061734. [PMID: 29895733 PMCID: PMC6032270 DOI: 10.3390/ijms19061734] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/20/2018] [Revised: 06/04/2018] [Accepted: 06/09/2018] [Indexed: 12/12/2022] [Open access]
Abstract
Atherosclerosis (ATH) and coronary artery disease (CAD) are chronic inflammatory diseases with an important genetic background; they derive from the cumulative effect of multiple common risk alleles, most of which are located in genomic noncoding regions. These complex diseases behave as nonlinear dynamical systems that show a high dependence on their initial conditions; thus, long-term predictions of disease progression are unreliable. One likely possibility is that the nonlinear nature of ATH could be dependent on nonlinear correlations in the structure of the human genome. In this review, we show how chaos theory analysis has highlighted genomic regions that have shared specific structural constraints, which could have a role in ATH progression. These regions were shown to be enriched with repetitive sequences of the Alu family, genomic parasites that have colonized the human genome, which show a particular secondary structure and are involved in the regulation of gene expression. Here, we show the impact of Alu elements on the mechanisms that regulate gene expression, especially highlighting the molecular mechanisms via which the Alu elements alter the inflammatory response. We devote special attention to their relationship with the long noncoding RNA (lncRNA); antisense noncoding RNA in the INK4 locus (ANRIL), a risk factor for ATH; their role as microRNA (miRNA) sponges; and their ability to interfere with the regulatory circuitry of the (nuclear factor kappa B) NF-κB response. We aim to characterize ATH as a nonlinear dynamic system, in which small initial alterations in the expression of a number of repetitive elements are somehow amplified to reach phenotypic significance.
9
Manzardo AM, Butler MG. Examination of Global Methylation and Targeted Imprinted Genes in Prader-Willi Syndrome. ACTA ACUST UNITED AC 2017; 2. [PMID: 28111641 DOI: 10.21767/2472-1158.100026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2023]
Abstract
CONTEXT Methylation changes observed in Prader-Willi syndrome (PWS) may impact global methylation as well as regional methylation status of imprinted genes on chromosome 15 (in cis) or other imprinted obesity-related genes on other chromosomes (in trans), leading to differential effects on gene expression impacting the obesity phenotype unique to PWS. OBJECTIVE Characterize the global methylation profiles and methylation status for select imprinted genes associated with obesity phenotype in a well-characterized imprinted, obesity-related syndrome (PWS) relative to a cohort of obese and non-obese individuals. DESIGN Global methylation was assayed using two methodologies: 1) enriched LINE-1 repeat sequences by EpigenDx and 2) ELISA-based immunoassay method sensitive to genomic 5-methylcytosine by Epigentek. Target gene methylation patterns at selected candidate obesity gene loci were determined using methylation-specific PCR. SETTING Study participants were recruited as part of an ongoing research program on obesity-related genomics and Prader-Willi syndrome. PARTICIPANTS Individuals with non-syndromic obesity (N=26), leanness (N=26) and PWS (N=39). RESULTS A detailed characterization of the imprinting status of select target genes within the critical PWS 15q11-q13 genomic region showed enhanced cis but not trans methylation of imprinted genes. No significant differences in global methylation were found between non-syndromic obese, PWS or non-obese controls. INTERVENTION None. MAIN OUTCOME MEASURES Percentage methylation and the methylation index. CONCLUSION The methylation abnormality in PWS due to errors of genomic imprinting affects both upstream and downstream effectors in the 15q11-q13 region, showing enhanced cis but not trans methylation of imprinted genes. Obesity in our subject cohorts did not appear to impact global methylation levels using the described methodology.
Affiliation(s)
- A M Manzardo
- Department of Psychiatry and Behavioral Sciences, University of Kansas Medical Center, 3901 Rainbow Blvd, MS 4015, Kansas City, Kansas, USA
- M G Butler
- Department of Psychiatry and Behavioral Sciences, University of Kansas Medical Center, 3901 Rainbow Blvd, MS 4015, Kansas City, Kansas, USA; Department of Pediatrics, University of Kansas Medical Center, Kansas City, Kansas, USA
10
Mezzasalma M, Visone V, Petraccioli A, Odierna G, Capriglione T, Guarino FM. Non-random accumulation of LINE1-like sequences on differentiated snake W chromosomes. J Zool (1987) 2016. [DOI: 10.1111/jzo.12355] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/21/2022]
Affiliation(s)
- M. Mezzasalma
- Department of Biology, University of Naples Federico II, Naples, Italy
- V. Visone
- Department of Biology, University of Naples Federico II, Naples, Italy
- A. Petraccioli
- Department of Biology, University of Naples Federico II, Naples, Italy
- G. Odierna
- Department of Biology, University of Naples Federico II, Naples, Italy
- T. Capriglione
- Department of Biology, University of Naples Federico II, Naples, Italy
- F. M. Guarino
- Department of Biology, University of Naples Federico II, Naples, Italy
11
Polychronopoulos D, Athanasopoulou L, Almirantis Y. Fractality and entropic scaling in the chromosomal distribution of conserved noncoding elements in the human genome. Gene 2016; 584:148-60. [DOI: 10.1016/j.gene.2016.02.022] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/07/2015] [Revised: 01/22/2016] [Accepted: 02/14/2016] [Indexed: 11/15/2022]
12
Colliva A, Pellegrini R, Testori A, Caselle M. Ising-model description of long-range correlations in DNA sequences. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2015; 91:052703. [PMID: 26066195 DOI: 10.1103/physreve.91.052703] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/29/2014] [Indexed: 06/04/2023]
Abstract
We model long-range correlations of nucleotides in the human DNA sequence using the long-range one-dimensional (1D) Ising model. We show that, for distances between 10³ and 10⁶ bp, the correlations show a universal behavior and may be described by the non-mean-field limit of the long-range 1D Ising model. This allows us to make some testable hypotheses on the nature of the interaction between distant portions of the DNA chain which led to the DNA structure that we observe today in higher eukaryotes.
Affiliation(s)
- A Colliva
- Dipartimento di Fisica dell'Università di Torino and I.N.F.N. sez. di Torino, Via Pietro Giuria 1, I-10125 Torino, Italy
- R Pellegrini
- Physics Department, Swansea University, Singleton Park, Swansea SA2 8PP, UK
- A Testori
- Dipartimento di Fisica dell'Università di Torino and I.N.F.N. sez. di Torino, Via Pietro Giuria 1, I-10125 Torino, Italy
- M Caselle
- Dipartimento di Fisica dell'Università di Torino and I.N.F.N. sez. di Torino, Via Pietro Giuria 1, I-10125 Torino, Italy
14
Tsiagkas G, Nikolaou C, Almirantis Y. Orphan and gene related CpG Islands follow power-law-like distributions in several genomes: evidence of function-related and taxonomy-related modes of distribution. Comput Biol Chem 2014; 53 Pt A:84-96. [PMID: 25242375 DOI: 10.1016/j.compbiolchem.2014.08.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
CpG Islands (CGIs) are compositionally defined short genomic stretches, which have been studied in the human, mouse, chicken and later in several other genomes. Initially, they were assigned the role of transcriptional regulation of protein-coding genes, especially the house-keeping ones, while more recently there is found evidence that they are involved in several other functions as well, which might include regulation of the expression of RNA genes, DNA replication etc. Here, an investigation of their distributional characteristics in a variety of genomes is undertaken for both whole CGI populations as well as for CGI subsets that lie away from known genes (gene-unrelated or "orphan" CGIs). In both cases power-law-like linearity in double logarithmic scale is found. An evolutionary model, initially put forward for the explanation of a similar pattern found in gene populations is implemented. It includes segmental duplication events and eliminations of most of the duplicated CGIs, while a moderate rate of non-duplicated CGI eliminations is also applied in some cases. Simulations reproduce all the main features of the observed inter-CGI chromosomal size distributions. Our results on power-law-like linearity found in orphan CGI populations suggest that the observed distributional pattern is independent of the analogous pattern that protein coding segments were reported to follow. The power-law-like patterns in the genomic distributions of CGIs described herein are found to be compatible with several other features of the composition, abundance or functional role of CGIs reported in the current literature across several genomes, on the basis of the proposed evolutionary model.
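"Power-law-like linearity in double logarithmic scale" can be checked numerically: rank the distances between consecutive elements, compute the fraction of distances at least as large as each value, and fit a straight line in log-log coordinates. The sketch below is a generic version of that procedure under stated assumptions; the function name and the ordinary least-squares fit are illustrative choices, not the authors' pipeline.

```python
import math


def loglog_slope(sizes: list[float]) -> float:
    """Least-squares slope of log10 P(S >= s) against log10 s."""
    data = sorted(sizes)
    n = len(data)
    xs = [math.log10(s) for s in data]
    # After sorting, the value at index `rank` has (n - rank) values >= it.
    ys = [math.log10((n - rank) / n) for rank in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic inter-element distances built so that P(S >= s) is proportional to 1/s.
synthetic = [100 / k for k in range(1, 101)]
slope = loglog_slope(synthetic)
print(f"fitted log-log slope: {slope:.3f}")
```

A slope that stays roughly constant over a wide range of sizes is what the abstract calls power-law-like behaviour. With real data, all sizes must be positive, and the extent of the linear region matters as much as the fitted slope.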
Affiliation(s)
- Giannis Tsiagkas
- Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310 Athens, Greece
- Christoforos Nikolaou
- Computational Genomics Group, Department of Biology, University of Crete, 71409 Heraklion, Greece
- Yannis Almirantis
- Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310 Athens, Greece.
15
Dios F, Barturen G, Lebrón R, Rueda A, Hackenberg M, Oliver JL. DNA clustering and genome complexity. Comput Biol Chem 2014; 53 Pt A:71-8. [PMID: 25182383 DOI: 10.1016/j.compbiolchem.2014.08.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 07/11/2014] [Indexed: 01/08/2023]
Abstract
Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the formed clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located their clusters in the human genome, then quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although considerable variation occurs among different categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, as compared to DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found here that the clusters generated by GenomeCluster can be in turn clustered into high-level super-clusters. The observation of 'clusters-within-clusters' parallels the 'domains within domains' phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering.
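The "clustering level" defined in this abstract (the proportion of elements that fall inside clusters) can be illustrated with a toy coordinate-based clusterer. The sketch below is not the GenomeCluster algorithm, whose details are not given here; it assumes a simple fixed gap threshold, and `cluster_by_gap`, `max_gap`, and `min_size` are hypothetical names and parameters.

```python
def cluster_by_gap(positions: list[int], max_gap: int, min_size: int = 2) -> list[list[int]]:
    """Group sorted coordinates into clusters whose neighbours lie within max_gap."""
    clusters: list[list[int]] = []
    current: list[int] = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current group.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


def clustering_level(positions: list[int], clusters: list[list[int]]) -> float:
    """Proportion of all elements that belong to some cluster."""
    return sum(len(c) for c in clusters) / len(positions)


# Hypothetical element start coordinates on one chromosome.
starts = [100, 150, 180, 5000, 9000, 9020]
found = cluster_by_gap(starts, max_gap=100)
print(found, clustering_level(starts, found))
```

Applying the same function to cluster midpoints with a larger threshold gives a crude picture of the clusters-within-clusters hierarchy described above.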
Affiliation(s)
- Francisco Dios
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
- Guillermo Barturen
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
- Ricardo Lebrón
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
- Antonio Rueda
- Plataforma Andaluza de Genómica y Bioinformática (GBPA), Edificio INSUR, Calle Albert Einstein, 41092 Sevilla, Spain
- Michael Hackenberg
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
- José L Oliver
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain.
| |
Collapse
|
16
|
Li W, Freudenberg J. Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. Comput Biol Chem 2014; 53 Pt A:108-17. [PMID: 25241312 DOI: 10.1016/j.compbiolchem.2014.08.015] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Accepted: 07/11/2014] [Indexed: 12/31/2022]
Abstract
Repetitive and redundant regions of a genome are particularly problematic for mapping sequencing reads. In the present paper, we compile a list of the unmappable regions in the human genome based on the following definition: hypothetical reads with length 1 kb which cannot be uniquely mapped with zero-mismatch alignment for the described regions, considering both the forward and reverse strand. The respective collection of unmappable regions covers 0.77% of the sequence of human autosomes and 8.25% of the sex chromosomes in the reference genome GRCh37/hg19 (overall 1.23%). Not surprisingly, our unmappable regions overlap greatly with segmental duplication, transposable elements, and structural variants. About 99.8% of bases in our unmappable regions are part of either segmental duplication or transposable elements and 98.3% overlap structural variant annotations. Notably, some of these regions overlap units with important biological functions, including 4% of protein-coding genes. In contrast, these regions have zero intersection with the ultraconserved elements, very low overlap with microRNAs, tRNAs, pseudogenes, CpG islands, tandem repeats, microsatellites, sensitive non-coding regions, and the mapping blacklist regions from the ENCODE project.
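The definition used in this abstract (a read is unmappable when it has no unique zero-mismatch placement on either strand) can be illustrated with a toy sketch. This is not the authors' pipeline: the function names, the in-memory string genome, and the short read length in the usage note are assumptions made purely for illustration, and a real analysis at 1 kb on GRCh37 would need an index-based approach.

```python
from collections import Counter

# Complement table for the four unambiguous bases; other symbols (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(read: str) -> str:
    """Strand-independent key: the smaller of a read and its reverse complement."""
    return min(read, reverse_complement(read))


def unmappable_starts(genome: str, read_len: int) -> list[int]:
    """Start positions whose exact read occurs more than once on either strand.

    Hypothetical helper: counts every window of length read_len by its
    canonical form, then reports windows whose count exceeds one.
    """
    n_windows = len(genome) - read_len + 1
    counts = Counter(canonical(genome[i:i + read_len]) for i in range(n_windows))
    return [
        i for i in range(n_windows)
        if counts[canonical(genome[i:i + read_len])] > 1
    ]
```

For example, with a read length of 3, `unmappable_starts("AACGGTT", 3)` flags positions 0 and 4, because the read `GTT` at position 4 is the reverse complement of `AAC` at position 0.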
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, NY 11030, USA.
- Jan Freudenberg
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, NY 11030, USA
|
17
|
Accelerated Evolution of Fetuin Family Proteins in Protobothrops flavoviridis (Habu Snake) Serum and the Discovery of an L1-Like Genomic Element in the Intronic Sequence of a Fetuin-Encoding Gene. Biosci Biotechnol Biochem 2014; 77:582-90. [DOI: 10.1271/bbb.120829] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
|
18
|
Polychronopoulos D, Sellis D, Almirantis Y. Conserved noncoding elements follow power-law-like distributions in several genomes as a result of genome dynamics. PLoS One 2014; 9:e95437. [PMID: 24787386 PMCID: PMC4008492 DOI: 10.1371/journal.pone.0095437] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 11/20/2013] [Accepted: 03/26/2014] [Indexed: 12/31/2022] Open
Abstract
Conserved, ultraconserved and other classes of constrained elements (collectively referred as CNEs here), identified by comparative genomics in a wide variety of genomes, are non-randomly distributed across chromosomes. These elements are defined using various degrees of conservation between organisms and several thresholds of minimal length. We here investigate the chromosomal distribution of CNEs by studying the statistical properties of distances between consecutive CNEs. We find widespread power-law-like distributions, i.e. linearity in double logarithmic scale, in the inter-CNE distances, a feature which is connected with fractality and self-similarity. Given that CNEs are often found to be spatially associated with genes, especially with those that regulate developmental processes, we verify by appropriate gene masking that a power-law-like pattern emerges irrespectively of whether elements found close or inside genes are excluded or not. An evolutionary model is put forward for the understanding of these findings that includes segmental or whole genome duplication events and eliminations (loss) of most of the duplicated CNEs. Simulations reproduce the main features of the observed size distributions. Power-law-like patterns in the genomic distributions of CNEs are in accordance with current knowledge about their evolutionary history in several genomes.
Affiliation(s)
- Dimitris Polychronopoulos
- Institute of Biosciences and Applications, National Center for Scientific Research “Demokritos”, Athens, Greece
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Diamantis Sellis
- Department of Biology, Stanford University, Stanford, California, United States of America
- Yannis Almirantis
- Institute of Biosciences and Applications, National Center for Scientific Research “Demokritos”, Athens, Greece
|
19
|
Implications of human genome structural heterogeneity: functionally related genes tend to reside in organizationally similar genomic regions. BMC Genomics 2014; 15:252. [PMID: 24684786 PMCID: PMC4234528 DOI: 10.1186/1471-2164-15-252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/15/2012] [Accepted: 03/21/2014] [Indexed: 01/30/2023] Open
Abstract
Background In an earlier study, we hypothesized that genomic segments with different sequence organization patterns (OPs) might display functional specificity despite their similar GC content. Here we tested this hypothesis by dividing the human genome into 100 kb segments, classifying these segments into five compositional groups according to GC content, and then characterizing each segment within the five groups by oligonucleotide counting (k-mer analysis; also referred to as compositional spectrum analysis, or CSA), to examine the distribution of sequence OPs in the segments. We performed the CSA on the entire DNA, i.e., its coding and non-coding parts, the latter being much more abundant in the genome than the former. Results We identified 38 OP-type clusters of segments that differ in their compositional spectrum (CS) organization. Many of the segments that shared the same OP type were enriched with genes related to the same biological processes (developmental, signaling, etc.), components of biochemical complexes, or organelles. Thirteen OP-type clusters showed significant enrichment in genes connected to specific gene-ontology terms. Some of these clusters seemed to reflect certain events during periods of horizontal gene transfer and genome expansion, and subsequent evolution of genomic regions requiring coordinated regulation. Conclusions There may be a tendency for genes that are involved in the same biological process, complex or organelle to use the same OP, even at a distance of ~100 kb from the genes. Although the intergenic DNA is non-coding, the general pattern of sequence organization (e.g., reflected in over-represented oligonucleotide “words”) may be important and was protected, to some extent, in the course of evolution.
|
20
|
A Study of Fractality and Long-Range Order in the Distribution of Transposable Elements in Eukaryotic Genomes Using the Scaling Properties of Block Entropy and Box-Counting. ENTROPY 2014. [DOI: 10.3390/e16041860] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/19/2023]
|
21
|
Wong JYY, De Vivo I, Lin X, Grashow R, Cavallari J, Christiani DC. The association between global DNA methylation and telomere length in a longitudinal study of boilermakers. Genet Epidemiol 2014; 38:254-64. [PMID: 24616077 DOI: 10.1002/gepi.21796] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 09/19/2013] [Revised: 11/24/2013] [Accepted: 01/07/2014] [Indexed: 01/18/2023]
Abstract
The objectives of this study were to determine if global DNA methylation, as reflected in LINE-1 and Alu elements, is associated with telomere length and whether it modifies the rate of telomeric change. A repeated-measures longitudinal study was performed with a panel of 87 boilermaker subjects. The follow-up period was 29 months. LINE-1 and Alu methylation was determined using pyrosequencing. Leukocyte relative telomere length was assessed via real-time qPCR. Linear-mixed models were used to estimate the association between DNA methylation and telomere length. A structural equation model (SEM) was used to explore the hypothesized relationship between DNA methylation, proxies of particulate matter exposure, and telomere length at baseline. There appeared to be a positive association between both LINE-1 and Alu methylation levels and telomere length. For every incremental increase in LINE-1 methylation, there was a statistically significant 1.0 × 10⁻¹ (95% CI: 4.6 × 10⁻², 1.5 × 10⁻¹, P < 0.01) unit increase in relative telomere length, controlling for age at baseline, current and past smoking status, work history, BMI (log kg/m²) and leukocyte differentials. Furthermore, for every incremental increase in Alu methylation, there was a statistically significant 6.2 × 10⁻² (95% CI: 1.0 × 10⁻², 1.1 × 10⁻¹, P = 0.02) unit increase in relative telomere length. The interaction between LINE-1 methylation and follow-up time was statistically significant, with an estimate of -9.8 × 10⁻³ (95% CI: -1.8 × 10⁻², -1.9 × 10⁻³, P = 0.02), suggesting that the rate of telomeric change was modified by the degree of LINE-1 methylation. No statistically significant association was found between the cumulative PM exposure construct and global DNA methylation or telomere length at baseline.
Affiliation(s)
- Jason Y Y Wong
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Department of Environmental Health, Harvard School of Public Health, Boston, Massachusetts, United States of America; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
|
22
|
Lisanti S, Omar WAW, Tomaszewski B, De Prins S, Jacobs G, Koppen G, Mathers JC, Langie SAS. Comparison of methods for quantification of global DNA methylation in human cells and tissues. PLoS One 2013; 8:e79044. [PMID: 24260150 PMCID: PMC3832524 DOI: 10.1371/journal.pone.0079044] [Citation(s) in RCA: 121] [Impact Index Per Article: 11.0] [Received: 07/12/2013] [Accepted: 09/26/2013] [Indexed: 12/25/2022] Open
Abstract
DNA methylation is a key epigenetic modification which, in mammals, occurs mainly at CpG dinucleotides. Most of the CpG methylation in the genome is found in repetitive regions, rich in dormant transposons and endogenous retroviruses. Global DNA hypomethylation, which is a common feature of several conditions such as ageing and cancer, can cause the undesirable activation of dormant repeat elements and lead to altered expression of associated genes. DNA hypomethylation can cause genomic instability and may contribute to mutations and chromosomal recombinations. Various approaches for quantification of global DNA methylation are widely used. Several of these approaches measure a surrogate for total genomic methyl cytosine and there is uncertainty about the comparability of these methods. Here we have applied 3 different approaches (luminometric methylation assay, pyrosequencing of the methylation status of the Alu repeat element and of the LINE1 repeat element) for estimating global DNA methylation in the same human cell and tissue samples and have compared these estimates with the "gold standard" of methyl cytosine quantification by HPLC. Next to HPLC, the LINE1 approach shows the smallest variation between samples, followed by Alu. Pearson correlations and Bland-Altman analyses confirmed that global DNA methylation estimates obtained via the LINE1 approach corresponded best with HPLC-based measurements. Although we did not find compelling evidence that the gold standard measurement by HPLC could be substituted with confidence by any of the surrogate assays for detecting global DNA methylation investigated here, the LINE1 assay seems likely to be an acceptable surrogate in many cases.
Affiliation(s)
- Sofia Lisanti
- Human Nutrition Research Centre, Institute for Ageing and Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
- Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
- Wan A. W. Omar
- Human Nutrition Research Centre, Institute for Ageing and Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
- Advance Medical and Dental Institute, Universiti Sains Malaysia, Penang, Malaysia
- Bartłomiej Tomaszewski
- Human Nutrition Research Centre, Institute for Ageing and Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
- Centre for Brain Ageing and Vitality, Institute for Ageing & Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
- Sofie De Prins
- Environmental Risk and Health unit, Flemish Institute for Technological Research (VITO), Mol, Belgium
- Faculty of Pharmaceutical, Biomedical and Veterinary Sciences, University of Antwerp, Antwerp, Belgium
- Griet Jacobs
- Environmental Risk and Health unit, Flemish Institute for Technological Research (VITO), Mol, Belgium
- Gudrun Koppen
- Environmental Risk and Health unit, Flemish Institute for Technological Research (VITO), Mol, Belgium
- John C. Mathers
- Human Nutrition Research Centre, Institute for Ageing and Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
- Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
- Centre for Brain Ageing and Vitality, Institute for Ageing & Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
- Sabine A. S. Langie
- Human Nutrition Research Centre, Institute for Ageing and Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
- Environmental Risk and Health unit, Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Brain Ageing and Vitality, Institute for Ageing & Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne, United Kingdom
|
23
|
Benard A, van de Velde CJH, Lessard L, Putter H, Takeshima L, Kuppen PJK, Hoon DSB. Epigenetic status of LINE-1 predicts clinical outcome in early-stage rectal cancer. Br J Cancer 2013; 109:3073-83. [PMID: 24220694 PMCID: PMC3859941 DOI: 10.1038/bjc.2013.654] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 07/16/2013] [Revised: 09/26/2013] [Accepted: 10/01/2013] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND We evaluated the clinical prognostic value of methylation of two non-coding repeat sequences, long interspersed element 1 (LINE-1) and Alu, in rectal tumour tissues. In addition to DNA methylation, expression of histone modifications H3K27me3 and H3K9Ac was studied in this patient cohort. METHODS LINE-1 and Alu methylation were assessed in DNA extracted from formalin-fixed paraffin-embedded tissues. A pilot (30 tumour and 25 normal tissues) and validation study (189 tumour and 53 normal tissues) were performed. Histone modifications H3K27me3 and H3K9Ac were immunohistochemically stained on tissue microarrays of the study cohort. RESULTS In early-stage rectal cancer (stage I-II), hypomethylation of LINE-1 was an independent clinical prognostic factor, showing shorter patient survival (P=0.014; HR: 4.6) and a higher chance of tumour recurrence (P=0.001; HR: 9.6). Alu methylation did not show any significant correlation with clinical parameters, suggesting an active role of LINE-1 in tumour development. Expression of H3K27me3 (silencing gene expression) and H3K9Ac (activating gene expression) in relation to methylation status of LINE-1 and Alu supported this specific role of LINE-1 methylation. CONCLUSION The epigenetic status of LINE-1, but not of Alu, is prognostic in rectal cancer, indicating an active role for LINE-1 in determining clinical outcome.
Affiliation(s)
- A Benard
- [1] Department of Molecular Oncology, John Wayne Cancer Institute, Santa Monica, CA 90404, USA [2] Department of Surgery, Leiden University Medical Center, Leiden 2300RC, The Netherlands
|
24
|
Abstract
Fractal characteristics of chromatin, revealed by light or electron microscopy, have been reported during the last 20 years. Fractal features can easily be estimated in digitalized microscopic images and are helpful for diagnosis and prognosis of neoplasias. During carcinogenesis and tumor progression, an increase of the fractal dimension (FD) of stained nuclei has been shown in intraepithelial lesions of the uterine cervix and the anus, oral squamous cell carcinomas or adenocarcinomas of the pancreas. Furthermore, an increased FD of chromatin is an unfavorable prognostic factor in squamous cell carcinomas of the oral cavity and the larynx, melanomas and multiple myelomas. High goodness-of-fit of the regression line of the FD is a favorable prognostic factor in acute leukemias and multiple myelomas. The nucleus has fractal and power-law organization in several different levels, which might in part be interrelated. Some possible relations between modifications of the chromatin organization during carcinogenesis and tumor progression and an increase of the FD of stained chromatin are suggested. Furthermore, increased complexity of the chromatin structure, loss of heterochromatin and a less-perfect self-organization of the nucleus in aggressive neoplasias are discussed.
Affiliation(s)
- Konradin Metze
- Department of Pathology, Faculty of Medical Sciences Research Group, 'Analytical Cellular Pathology' and National Institute of Photonics Applied to Cell Biology, University of Campinas, Campinas, Brazil +55 19 32893897 kmetze.at.fcm.unicamp.br
|
25
|
Wagstaff BJ, Hedges DJ, Derbes RS, Campos Sanchez R, Chiaromonte F, Makova KD, Roy-Engel AM. Rescuing Alu: recovery of new inserts shows LINE-1 preserves Alu activity through A-tail expansion. PLoS Genet 2012; 8:e1002842. [PMID: 22912586 PMCID: PMC3415434 DOI: 10.1371/journal.pgen.1002842] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 12/05/2011] [Accepted: 05/30/2012] [Indexed: 12/15/2022] Open
Abstract
Alu elements are trans-mobilized by the autonomous non-LTR retroelement, LINE-1 (L1). Alu-induced insertion mutagenesis contributes to about 0.1% of human genetic disease and is responsible for the majority of the documented instances of human retroelement insertion-induced disease. Here we introduce a SINE recovery method that provides a complementary approach for comprehensive analysis of the impact and biological mechanisms of Alu retrotransposition. Using this approach, we recovered 226 de novo tagged Alu inserts in HeLa cells. Our analysis reveals that in human cells marked Alu inserts driven by either exogenously supplied full length L1 or ORF2 protein are indistinguishable. Four percent of de novo Alu inserts were associated with genomic deletions and rearrangements and lacked the hallmarks of retrotransposition. In contrast to L1 inserts, 5′ truncations of Alu inserts are rare, as most of the recovered inserts (96.5%) are full length. De novo Alus show a random pattern of insertion across chromosomes, but further characterization revealed that an Alu insertion bias exists favoring insertion near other SINEs and highly conserved elements, with almost 60% landing within genes. De novo Alu inserts show no evidence of RNA editing. Priming for reverse transcription rarely occurred within the first 20 bp (most 5′) of the A-tail. The A-tails of recovered inserts show significant expansion, with many at least doubling in length. Sequence manipulation of the construct led to the demonstration that the A-tail expansion likely occurs during insertion due to slippage by the L1 ORF2 protein. We postulate that the A-tail expansion directly impacts Alu evolution by reintroducing new active source elements to counteract the natural loss of active Alus and minimizing Alu extinction.
SINEs are mobile elements that are found ubiquitously throughout a large diversity of genomes from plants to mammals. The human SINE, Alu, is among the most successful mobile elements, with more than one million copies in the genome. Due to its high activity and ability to insert throughout the genome, Alu retrotransposition is responsible for the majority of diseases reported to be caused by mobile element activity. To further evaluate the genomic impact of SINEs, we recovered and characterized over 200 de novo Alu inserts under controlled conditions. Our data reinforce observations on the mutagenic potential of Alu, with newly retrotransposed Alu elements favoring insertion into genic and highly conserved elements. Alu-mediated deletions and rearrangements are infrequent and lack the typical hallmarks of TPRT retrotransposition, suggesting the use of an alternate method for resolving retrotransposition intermediates or an atypical insertion mechanism. Our data also provide novel insights into SINE retrotransposition biology. We found that slippage of L1 ORF2 protein during reverse transcription expands the A-tails of de novo insertions. We propose that the L1 ORF2 protein plays a major role in minimizing Alu extinction by reintroducing active Alu elements to counter the natural loss of Alu source elements.
Affiliation(s)
- Bradley J. Wagstaff
- Tulane Cancer Center, Department of Epidemiology, Tulane University, New Orleans, Louisiana, United States of America
- Dale J. Hedges
- Hussman Institute for Human Genomics, Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, United States of America
- Rebecca S. Derbes
- Tulane Cancer Center, Department of Epidemiology, Tulane University, New Orleans, Louisiana, United States of America
- Rebeca Campos Sanchez
- Department of Biology, Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Francesca Chiaromonte
- Department of Biology, Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Kateryna D. Makova
- Department of Biology, Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Astrid M. Roy-Engel
- Tulane Cancer Center, Department of Epidemiology, Tulane University, New Orleans, Louisiana, United States of America
|
26
|
Frenkel S, Kirzhner V, Korol A. Organizational heterogeneity of vertebrate genomes. PLoS One 2012; 7:e32076. [PMID: 22384143 PMCID: PMC3288070 DOI: 10.1371/journal.pone.0032076] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/15/2011] [Accepted: 01/23/2012] [Indexed: 01/06/2023] Open
Abstract
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
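The core idea of scoring fixed-length oligonucleotide occurrences can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published CS method: it counts exact k-mers only, normalizes them to frequencies, and compares two spectra with an L1 distance chosen here for simplicity. The function names `kmer_spectrum` and `spectrum_distance` are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every A/C/G/T word of length k in seq.

    Windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def spectrum_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """L1 distance between two spectra built with the same k (an assumed metric)."""
    return sum(abs(a[w] - b[w]) for w in a)
```

Two segments with identical k-mer usage give a distance of 0, while segments sharing no k-mers, such as `AAAA` and `CCCC` at k = 2, give the maximum of 2. Comparing each segment against a reshuffled copy of itself would be one way to approach the reshuffling contrast the abstract mentions, although the exact procedure is not described here.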
Affiliation(s)
- Abraham Korol
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Mount Carmel, Haifa, Israel
|
27
|
Klimopoulos A, Sellis D, Almirantis Y. Widespread occurrence of power-law distributions in inter-repeat distances shaped by genome dynamics. Gene 2012; 499:88-98. [PMID: 22370293 DOI: 10.1016/j.gene.2012.02.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/02/2011] [Revised: 02/05/2012] [Accepted: 02/06/2012] [Indexed: 11/25/2022]
Abstract
Repetitive DNA sequences derived from transposable elements (TE) are distributed in a non-random way, co-clustering with other classes of repeat elements, genes and other genomic components. In a previous work we reported power-law-like size distributions (linearity in log-log scale) in the spatial arrangement of Alu and LINE1 elements in the human genome. Here we investigate the large-scale features of the spatial arrangement of all principal classes of TEs in 14 genomes from phylogenetically distant organisms by studying the size distribution of inter-repeat distances. Power-law-like size distributions are found to be widespread, extending up to several orders of magnitude. In order to understand the emergence of this distributional pattern, we introduce an evolutionary scenario, which includes (i) Insertions of DNA segments (e.g., more recent repeats) into the considered sequence and (ii) Eliminations of members of the studied TE family. In the proposed model we also incorporate the potential for transposition events (characteristic of the DNA transposons' life-cycle) and segmental duplications. Simulations reproduce the main features of the observed size distributions. Furthermore, we investigate the effects of various genomic features on the presence and extent of power-law size distributions including TE class and age, mode of parental TE transmission, GC content, deletion and recombination rates in the studied genomic region, etc. Our observations corroborate the hypothesis that insertions of genomic material and eliminations of repeats are at the basis of power-laws in inter-repeat distances. The existence of these power-laws could facilitate the formation of the recently proposed "fractal globule" for the confined chromatin organization.
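A minimal sketch of the measurement described above: collect the distances between consecutive repeats on one chromosome, then check for linearity in log-log scale. Several choices here are assumptions for illustration rather than the authors' procedure: intervals are half-open (start, end) pairs, the distribution is summarized as the cumulative count N(≥ x), and the slope comes from an ordinary least-squares fit, which is only a rough diagnostic of power-law-like behavior.

```python
import math


def inter_element_distances(intervals: list[tuple[int, int]]) -> list[int]:
    """Gaps between consecutive elements given half-open (start, end) pairs.

    Overlapping or nested elements are merged implicitly by tracking the
    furthest end seen so far, so only positive gaps are reported.
    """
    ordered = sorted(intervals)
    if not ordered:
        return []
    gaps = []
    current_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start > current_end:
            gaps.append(start - current_end)
        current_end = max(current_end, end)
    return gaps


def loglog_slope(gaps: list[int]) -> float:
    """Least-squares slope of log10 N(>= x) against log10 x."""
    xs, ys = [], []
    for value in sorted(set(gaps)):
        count_ge = sum(1 for g in gaps if g >= value)
        xs.append(math.log10(value))
        ys.append(math.log10(count_ge))
    if len(xs) < 2:
        raise ValueError("need at least two distinct gap sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A roughly constant slope over several orders of magnitude of gap size would correspond to the power-law-like pattern reported; on real annotation data one would typically restrict the fit to the range where the points look linear.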
Affiliation(s)
- Alexandros Klimopoulos
- National Center for Scientific Research "Demokritos," Institute of Biology, 153 10 Athens, Greece.
|
28
|
Chromatin Organization by Repetitive Elements (CORE): A Genomic Principle for the Higher-Order Structure of Chromosomes. Genes (Basel) 2011; 2:502-15. [PMID: 24710208 PMCID: PMC3927610 DOI: 10.3390/genes2030502] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 06/24/2011] [Revised: 07/21/2011] [Accepted: 07/25/2011] [Indexed: 12/01/2022] Open
Abstract
Eukaryotic genomes contain a large amount of DNA repeats (also known as repetitive DNA, repetitive elements, and repetitive sequences). Here, I propose a role of repetitive DNA in the formation of higher-order structures of chromosomes. The central idea of this theory is that chromatin regions with repetitive sequences pair with regions harboring homologous repeats and that such somatic repeat pairing (RP) assembles repetitive DNA chromatin into compact chromosomal domains that specify chromatin folding in a site-directed manner. According to this theory, DNA repeats are not randomly distributed in the genome. Instead, they form a core framework that coordinates the architecture of chromosomes. In contrast to the viewpoint that DNA repeats are genomic ‘junk’, this theory advocates that repetitive sequences are chromatin organizer modules that determine chromatin-chromatin contact points within chromosomes. This novel concept, if correct, would suggest that DNA repeats in the linear genome encode a blueprint for higher-order chromosomal organization.
|
29
|
Lim SP, Neilsen P, Kumar R, Abell A, Callen DF. The Application of Delivery Systems for DNA Methyltransferase Inhibitors. BioDrugs 2011; 25:227-42. [DOI: 10.2165/11592770-000000000-00000] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]
|
30
|
A genome-wide analysis of FRT-like sequences in the human genome. PLoS One 2011; 6:e18077. [PMID: 21448289 PMCID: PMC3063242 DOI: 10.1371/journal.pone.0018077] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 10/19/2010] [Accepted: 02/24/2011] [Indexed: 11/19/2022] Open
Abstract
Efficient and precise genome manipulations can be achieved by the Flp/FRT system of site-specific DNA recombination. Applications of this system are limited, however, to cases when target sites for Flp recombinase, FRT sites, are pre-introduced into a genome locale of interest. To expand use of the Flp/FRT system in genome engineering, variants of Flp recombinase can be evolved to recognize pre-existing genomic sequences that resemble FRT and thus can serve as recombination sites. To understand the distribution and sequence properties of genomic FRT-like sites, we performed a genome-wide analysis of FRT-like sites in the human genome using the experimentally-derived parameters. Out of 642,151 identified FRT-like sequences, 581,157 sequences were unique and 12,452 sequences had at least one exact duplicate. Duplicated FRT-like sequences are located mostly within LINE1, but also within LTRs of endogenous retroviruses, Alu repeats and other repetitive DNA sequences. The unique FRT-like sequences were classified based on the number of matches to FRT within the first four proximal base pairs of the Flp binding elements of FRT and the nature of mismatched base pairs in the same region. The data obtained will be useful for the emerging field of genome engineering.
Collapse
|
31
|
Zhang W, Edwards A, Fan W, Deininger P, Zhang K. Alu distribution and mutation types of cancer genes. BMC Genomics 2011; 12:157. [PMID: 21429208 PMCID: PMC3074553 DOI: 10.1186/1471-2164-12-157] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 11/03/2010] [Accepted: 03/23/2011] [Indexed: 12/24/2022] Open
Abstract
Background Alu elements are the most abundant retrotransposable elements comprising ~11% of the human genome. Many studies have highlighted the role that Alu elements have in genetic instability and how their contribution to the assortment of mutagenic events can lead to cancer. As of yet, little has been done to quantitatively assess the association between Alu distribution and genes that are causally implicated in oncogenesis. Results We have investigated the effect of various Alu densities on the mutation type based classifications of cancer genes. In order to establish the direct relationship between Alus and the cancer genes of interest, genome wide Alu-related densities were measured using genes rather than the sliding windows of fixed length as the units. Several novel genomic features, such as the density of the adjacent Alu pairs and the number of Alu-Exon-Alu triplets, were developed in order to extend the investigation via the multivariate statistical analysis toward more advanced biological insight. In addition, we characterized the genome-wide intron Alu distribution with a mixture model that distinguished genes containing Alu elements from those with no Alus, and evaluated the gene-level effect of the 5'-TTAAAA motif associated with Alu insertion sites using a two-step regression analysis method. Conclusions The study resulted in several novel findings worthy of further investigation. 
They include: (1) Recessive cancer genes (tumor suppressor genes) are enriched with Alu elements (p < 0.01) compared to dominant cancer genes (oncogenes) and the entire set of genes in the human genome; (2) Alu-related genomic features can be used to cluster cancer genes into biologically meaningful groups; (3) The retention of exon Alus has been restricted in the human genome development, and an upper limit to the chromosome-level exon Alu densities is suggested by the distribution profile; (4) For the genes with at least one intron Alu repeat in individual chromosomes, the intron Alu densities can be well fitted by a Gamma distribution; (5) The effect of the 5'-TTAAAA motif on Alu densities varies across different chromosomes.
Affiliation(s)
- Wensheng Zhang
- Department of Computer Science, Xavier University of Louisiana, 1 Drexel Drive, New Orleans, LA 70125, USA
|
32
|
Athanasopoulou L, Athanasopoulos S, Karamanos K, Almirantis Y. Scaling properties and fractality in the distribution of coding segments in eukaryotic genomes revealed through a block entropy approach. Phys Rev E Stat Nonlin Soft Matter Phys 2010; 82:051917. [PMID: 21230510 DOI: 10.1103/physreve.82.051917] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 07/28/2010] [Revised: 09/19/2010] [Indexed: 05/30/2023]
Abstract
Statistical methods, including block entropy based approaches, have already been used in the study of long-range features of genomic sequences seen as symbol series, either considering the full alphabet of the four nucleotides or the binary purine or pyrimidine character set. Here we explore the alternation of short protein-coding segments with long noncoding spacers in entire chromosomes, focusing on the scaling properties of block entropy. In previous studies, it has been shown that the sizes of noncoding spacers follow power-law-like distributions in most chromosomes of eukaryotic organisms from distant taxa. We have developed a simple evolutionary model based on well-known molecular events (segmental duplications followed by elimination of most of the duplicated genes) which reproduces the observed linearity in log-log plots. The scaling properties of block entropy H(n) have been studied in several works. Their findings suggest that linearity in semilogarithmic scale characterizes symbol sequences which exhibit fractal properties and long-range order, while this linearity has been shown in the case of the logistic map at the Feigenbaum accumulation point. The present work starts with the observation that the block entropy of the Cantor-like binary symbol series scales in a similar way. Then, we perform the same analysis for the full set of human chromosomes and for several chromosomes of other eukaryotes. A similar but less extended linearity in semilogarithmic scale, indicating fractality, is observed, while randomly formed surrogate sequences clearly lack this type of scaling. Genomic sequences always present entropy values much lower than their random surrogates. 
Symbol sequences produced by the aforementioned evolutionary model follow the scaling found in genomic sequences, thus corroborating the conjecture that "segmental duplication-gene elimination" dynamics may have contributed to the observed long rangeness in the coding or noncoding alternation in genomes.
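For readers unfamiliar with the quantity, the following is a minimal Python sketch of a block entropy H(n) for a symbol sequence. It is an illustration only, not the authors' implementation: the function name `block_entropy`, the `step` parameter, and the default of reading non-overlapping blocks are assumptions made here, and the entropy is reported in bits.

```python
from collections import Counter
from math import log2


def block_entropy(symbols, n, step=None):
    """Shannon entropy (in bits) of the length-n blocks of a symbol sequence.

    `step` is a hypothetical parameter for this sketch: step=n (the default)
    reads non-overlapping blocks, step=1 reads a sliding window.
    """
    step = n if step is None else step
    # Collect every complete block of length n at the chosen stride.
    blocks = [tuple(symbols[i:i + n])
              for i in range(0, len(symbols) - n + 1, step)]
    if not blocks:
        raise ValueError("sequence is shorter than the block length")
    counts = Counter(blocks)
    total = len(blocks)
    # H(n) = -sum over observed blocks of p * log2(p).
    return -sum((c / total) * log2(c / total) for c in counts.values())


# A binary coding/noncoding mask could be passed as a string of "0" and "1".
print(block_entropy("00011011", 2))
```

Evaluating the function over a range of n and plotting H(n) against log n is one way to look for the semilogarithmic linearity the abstract describes; the choice of block reading (non-overlapping or sliding) affects the values and should match whatever convention the analysis at hand uses.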
|
33
|
Power-laws in the genomic distribution of coding segments in several organisms: an evolutionary trace of segmental duplications, possible paleopolyploidy and gene loss. Gene 2009; 447:18-28. [PMID: 19591912 DOI: 10.1016/j.gene.2009.04.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/23/2008] [Revised: 03/18/2009] [Accepted: 04/08/2009] [Indexed: 02/02/2023]
Abstract
Large-scale features of the spatial arrangement of protein-coding segments (PCS) are investigated by means of the inter-PCS spacers' size distributions, which have been found to follow power-laws. Linearity in double-logarithmic scale extends to several orders of magnitude in the genomes of organisms as disparate as mammals, insects and plants. This feature is also present in the most compact eukaryotic genomes and in half of the examined bacteria, despite their very limited non-coding space. We have tried to determine the sequence of events in the course of genomes' evolution which may account for the formation of the observed size distributions. The proposed mechanism essentially includes two types of events: (i) segmental duplications (and possibly paleopolyploidy), and (ii) the subsequent loss of most of the duplicated genes. It is shown by computer simulations that the formulated scenario generates power-law-like inter-PCS spacers' size distributions, which remain robust for a variety of parameter choices, even if insertion of external sequences, such as viruses or proliferating retroelements is included. Moreover, power-laws are preserved after most of the non-coding DNA has been removed, thus explaining the finding of this pattern in genomes as compact as that of Takifugu rubripes.
|
34
|
Abstract
The fact that promoters are essential for the function of all genes forms the basis of the general idea that retrotranspositions give rise to processed pseudogenes. However, recent studies have demonstrated that some retrotransposed genes are transcriptionally active. Because promoters are not thought to be retrotransposed along with exonic sequences, these transcriptionally active genes must have acquired a functional promoter by mechanisms that are yet to be determined. Hence, comparison between a retrotransposed gene and its source gene appears to provide a unique opportunity to investigate the promoter creation for a new gene. Here, we identified 29 gene pairs in the human genome, consisting of a functional retrotransposed gene and its parental gene, and compared their respective promoters. In more than half of these cases, we unexpectedly found that a large part of the core promoter had been transcribed, reverse transcribed, and then integrated to be operative at the transposed locus. This observation can be ascribed to the recent discovery that transcription start sites tend to be interspersed rather than situated at one specific site. This propensity could confer retrotransposability to promoters per se. Accordingly, the retrotransposability can explain the genesis of some alternative promoters.
Affiliation(s)
- Kohji Okamura
- Human Genome Centre, Institute of Medical Science, University of Tokyo, Tokyo, Japan
|