1
Wang H, Wu P, Xiong L, Kim HS, Kim JH, Ki JS. Nuclear genome of dinoflagellates: Size variation and insights into evolutionary mechanisms. Eur J Protistol 2024; 93:126061. [PMID: 38394997 DOI: 10.1016/j.ejop.2024.126061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 01/29/2024] [Accepted: 01/30/2024] [Indexed: 02/25/2024]
Abstract
Recent progress in high-throughput sequencing technologies has dramatically increased the availability of genome data for prokaryotes and eukaryotes. Dinoflagellates have distinct chromosomes and a huge genome size, which complicate their genomic analysis. Here, we reviewed the nuclear genomes of core dinoflagellates, focusing on genome and cell size. To date, the genome sizes of more than 25 dinoflagellates have been measured (e.g., by flow cytometry), showing a range of 3-250 pg of genomic DNA per cell. In contrast to their relatively small cell size, their genomes are huge (about 1-80 times the human haploid genome). In the present study, we collected genome and cell size data of dinoflagellates and compared their relationships. We found that dinoflagellate genome size exhibits a positive correlation with cell size. On the other hand, we found that genome size is not correlated with phylogenetic relatedness. These patterns may be caused by genome duplication, increased gene copy number, repetitive non-coding DNA, transposon expansion, horizontal gene transfer, organelle-to-nucleus gene transfer, and/or mRNA reintegration into the genome. Ultimate verification of these factors as potential causative mechanisms will require sequencing of more dinoflagellate genomes in the future.
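The "1-80 times the human haploid genome" figure in the abstract above can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes the commonly used conversion of roughly 978 Mbp per picogram of DNA and an approximate human haploid genome of 3.1 Gbp, and the function names are hypothetical.

```python
# Rough sanity check of the genome-size comparison in the abstract.
# Both constants are assumptions, not values stated in the abstract.
MBP_PER_PG = 978.0        # assumed: ~978 million base pairs per picogram of DNA
HUMAN_HAPLOID_GBP = 3.1   # assumed: approximate human haploid genome size in Gbp


def pg_to_gbp(picograms: float) -> float:
    """Convert a DNA mass in picograms to gigabase pairs."""
    return picograms * MBP_PER_PG / 1000.0


def fold_vs_human(picograms: float) -> float:
    """Express a DNA mass as a multiple of the human haploid genome."""
    return pg_to_gbp(picograms) / HUMAN_HAPLOID_GBP


if __name__ == "__main__":
    # The two ends of the 3-250 pg range reported in the abstract.
    for pg in (3.0, 250.0):
        print(f"{pg:6.1f} pg = {pg_to_gbp(pg):7.1f} Gbp "
              f"= {fold_vs_human(pg):5.1f}x human haploid")
```

Under these assumptions, the two ends of the reported range come out just under 1 and just under 80 times the human haploid genome, consistent with the abstract.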
Affiliation(s)
- Hui Wang
  - Hunan Province Key Laboratory of Typical Environmental Pollution and Health Hazards, School of Public Health, Hengyang Medical School, University of South China, Hengyang 421001, China; Department of Life Science, Sangmyung University, Seoul 03016, Republic of Korea
- Peiling Wu
  - Hunan Province Key Laboratory of Typical Environmental Pollution and Health Hazards, School of Public Health, Hengyang Medical School, University of South China, Hengyang 421001, China
- Lu Xiong
  - Hunan Province Key Laboratory of Typical Environmental Pollution and Health Hazards, School of Public Health, Hengyang Medical School, University of South China, Hengyang 421001, China
- Han-Sol Kim
  - Department of Life Science, Sangmyung University, Seoul 03016, Republic of Korea
- Jin Ho Kim
  - Department of Earth and Marine Science, College of Ocean Sciences, Jeju National University, Jeju 63243, Republic of Korea
- Jang-Seu Ki
  - Department of Life Science, Sangmyung University, Seoul 03016, Republic of Korea; Department of Biotechnology, Sangmyung University, Seoul 03016, Republic of Korea.
2
Polyploidy as a Fundamental Phenomenon in Evolution, Development, Adaptation and Diseases. Int J Mol Sci 2022; 23:ijms23073542. [PMID: 35408902 PMCID: PMC8998937 DOI: 10.3390/ijms23073542] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 02/24/2022] [Revised: 03/21/2022] [Accepted: 03/22/2022] [Indexed: 02/02/2023] Open
Abstract
DNA replication during cell proliferation is 'vertical' copying, which reproduces an initial amount of genetic information. Polyploidy, which results from whole-genome duplication, is a fundamental complement to vertical copying. Both organismal and cell polyploidy can emerge via premature cell cycle exit or via cell-cell fusion, the latter giving rise to polyploid hybrid organisms and epigenetic hybrids of somatic cells. Polyploidy-related increase in biological plasticity, adaptation, and stress resistance manifests in evolution, development, regeneration, aging, oncogenesis, and cardiovascular diseases. Despite the prevalence in nature and importance for medicine, agri- and aquaculture, biological processes and epigenetic mechanisms underlying these fundamental features largely remain unknown. The evolutionarily conserved features of polyploidy include activation of transcription, response to stress, DNA damage and hypoxia, and induction of programs of morphogenesis, unicellularity, and longevity, suggesting that these common features confer adaptive plasticity, viability, and stress resistance to polyploid cells and organisms. By increasing cell viability, polyploidization can provide survival under stressful conditions where diploid cells cannot survive. However, in somatic cells it occurs at the expense of specific function, thus promoting developmental programming of adult cardiovascular diseases and increasing the risk of cancer. Notably, genes arising via evolutionary polyploidization are heavily involved in cancer and other diseases. Ploidy-related changes of gene expression presumably originate from chromatin modifications and the derepression of bivalent genes. The provided evidence elucidates the role of polyploidy in evolution, development, aging, and carcinogenesis, and may contribute to the development of new strategies for promoting regeneration and preventing cardiovascular diseases and cancer.
3
Growth of Biological Complexity from Prokaryotes to Hominids Reflected in the Human Genome. Int J Mol Sci 2021; 22:ijms222111640. [PMID: 34769071 PMCID: PMC8583824 DOI: 10.3390/ijms222111640] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/31/2021] [Revised: 10/20/2021] [Accepted: 10/25/2021] [Indexed: 12/12/2022] Open
Abstract
The growth of complexity in evolution is a most intriguing phenomenon. Using gene phylostratigraphy, we showed this growth (as reflected in regulatory mechanisms) in the human genome, tracing the path from prokaryotes to hominids. Generally, the different regulatory gene families expanded at different times, yet only up to the Euteleostomi (bony vertebrates). The only exception was the expansion of transcription factors (TF) in placentals; however, we argue that this was not related to an increase in general complexity. Surprisingly, although TF originated in the Prokaryota while chromatin appeared only in the Eukaryota, the expansion of epigenetic factors predated the expansion of TF. Signaling receptors, tumor suppressors, oncogenes, and aging- and disease-associated genes (indicating vulnerabilities in terms of complex organization and strong enrichment in regulatory genes) also expanded only up to the Euteleostomi. The complexity-related gene properties (protein size, number of alternative splicing mRNA, length of untranslated mRNA, number of biological processes per gene, number of disordered regions in a protein, and density of TF–TF interactions) rose in multicellular organisms and declined after the Euteleostomi, and possibly earlier. At the same time, the speed of protein sequence evolution sharply increased in the genes that originated after the Euteleostomi. Thus, several lines of evidence indicate that molecular mechanisms of complexity growth were changing with time, and in the phyletic lineage leading to humans, the most salient shift occurred after the basic vertebrate body plan was fixed with bony skeleton. The obtained results can be useful for evolutionary medicine.
4
Anatskaya OV, Vinogradov AE, Vainshelbaum NM, Giuliani A, Erenpreisa J. Phylostratic Shift of Whole-Genome Duplications in Normal Mammalian Tissues towards Unicellularity Is Driven by Developmental Bivalent Genes and Reveals a Link to Cancer. Int J Mol Sci 2020; 21:ijms21228759. [PMID: 33228223 PMCID: PMC7699474 DOI: 10.3390/ijms21228759] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/24/2020] [Revised: 11/15/2020] [Accepted: 11/17/2020] [Indexed: 12/17/2022] Open
Abstract
Tumours were recently revealed to undergo a phylostratic and phenotypic shift to unicellularity. As well, aggressive tumours are characterized by an increased proportion of polyploid cells. In order to investigate a possible shared causation of these two features, we performed a comparative phylostratigraphic analysis of ploidy-related genes, obtained from transcriptomic data for polyploid and diploid human and mouse tissues using pairwise cross-species transcriptome comparison and principal component analysis. Our results indicate that polyploidy shifts the evolutionary age balance of the expressed genes from the late metazoan phylostrata towards the upregulation of unicellular and early metazoan phylostrata. The upregulation of unicellular metabolic and drug-resistance pathways and the downregulation of pathways related to circadian clock were identified. This evolutionary shift was associated with the enrichment of ploidy with bivalent genes (p < 10^-16). The protein interactome of activated bivalent genes revealed the increase of the connectivity of unicellulars and (early) multicellulars, while circadian regulators were depressed. The mutual polyploidy-c-MYC-bivalent genes-associated protein network was organized by gene-hubs engaged in both embryonic development and metastatic cancer including driver (proto)-oncogenes of viral origin. Our data suggest that, in cancer, the atavistic shift goes hand-in-hand with polyploidy and is driven by epigenetic mechanisms impinging on development-related bivalent genes.
Affiliation(s)
- Olga V. Anatskaya
  - Department of Bioinformatics and Functional Genomics, Institute of Cytology, Russian Academy of Sciences, 194064 St. Petersburg, Russia
  - Correspondence: (O.V.A.); (A.E.V.); (J.E.)
- Alexander E. Vinogradov
  - Department of Bioinformatics and Functional Genomics, Institute of Cytology, Russian Academy of Sciences, 194064 St. Petersburg, Russia
  - Correspondence: (O.V.A.); (A.E.V.); (J.E.)
- Ninel M. Vainshelbaum
  - Department of Oncology, Latvian Biomedical Research and Study Centre, Cancer Research Division, LV-1067 Riga, Latvia
  - Faculty of Biology, University of Latvia, LV-1586 Riga, Latvia
- Jekaterina Erenpreisa
  - Department of Oncology, Latvian Biomedical Research and Study Centre, Cancer Research Division, LV-1067 Riga, Latvia
  - Correspondence: (O.V.A.); (A.E.V.); (J.E.)
5
Vinogradov AE, Anatskaya OV. Systemic evolutionary changes in mammalian gene expression. Biosystems 2020; 198:104256. [PMID: 32976926 DOI: 10.1016/j.biosystems.2020.104256] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/08/2020] [Revised: 09/18/2020] [Accepted: 09/18/2020] [Indexed: 12/16/2022]
Abstract
Changes in gene expression play an important role in evolution and can be relevant to evolutionary medicine. In this work, a strong relationship was found between the statistical significance of evolutionary changes in the expression of orthologous genes in the five or six homologous mammalian tissues and the across-tissues unidirectionality of changes (i.e., they occur in the same direction in different tissues -- all upward or all downward). In the area of highly significant changes, the fraction of unidirectionally changed genes (UCG) was above 0.9 (random expectation is 0.03). This observation indicates that the most pronounced evolutionary changes in mammalian gene expression are systemic (i.e., they operate at the whole-organism level). The UCG are strongly enriched in the housekeeping genes. More specifically, in the human-chimpanzee comparison, the UCG are enriched in the pathways belonging to gene expression (translation is prominent), cell cycle control, ubiquitin-dependent protein degradation (mostly related to cell cycle control), apoptosis, and Parkinson's disease. In the human-macaque comparison, the two other neurodegenerative diseases (Alzheimer's and Huntington's) are added to the enriched pathways. The consolidation of gene expression changes at the level of pathways indicates that they are not neutral but functional. The systemic expression changes probably maintain the across-tissues balance of basic physiological processes in the course of evolution (e.g., during the movement along the fast-slow life axis). These results can be useful for understanding the variation in longevity and susceptibility to cancer and widespread neurodegenerative diseases. This approach can also guide the choice of prospective genes for studies aiming to decipher cis-regulatory code (the gene list is provided).
Affiliation(s)
- Olga V Anatskaya
  - Institute of Cytology, Russian Academy of Sciences, St. Petersburg, 194064, Russia
6
Oti M, Falck J, Huynen MA, Zhou H. CTCF-mediated chromatin loops enclose inducible gene regulatory domains. BMC Genomics 2016; 17:252. [PMID: 27004515 PMCID: PMC4804521 DOI: 10.1186/s12864-016-2516-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 10/16/2015] [Accepted: 02/23/2016] [Indexed: 11/10/2022] Open
Abstract
Background: The CCCTC-binding factor (CTCF) protein is involved in genome organization, including mediating three-dimensional chromatin interactions. Human patient lymphocytes with mutations in a single copy of the CTCF gene have reduced expression of enhancer-associated genes involved in response to stimuli. We hypothesize that CTCF interactions stabilize enhancer-promoter chromatin interaction domains, facilitating increased expression of genes in response to stimuli. Here we systematically investigate this model using computational analyses. Results: We use CTCF ChIA-PET data from the ENCODE project to show that CTCF-associated chromatin loops have a tendency to enclose regions of enhancer-regulated stimulus responsive genes, insulating them from neighboring regions of constitutively expressed housekeeping genes. To facilitate cell type-specific CTCF loop identification, we develop an algorithm to predict CTCF loops from ChIP-seq data alone by exploiting the CTCF motif directionality in loop anchors. We apply this algorithm to a hundred ENCODE cell line datasets, confirming the universality of our observations as well as identifying a general distinction between primary and immortal cells in loop-enclosed gene content. Finally, we combine the existing evidence to propose a model for the formation of CTCF loops in which partner sites are brought together by chromatin template reeling through stationary RNA polymerases, consistent with the transcription factory hypothesis. Conclusions: We provide computational evidence that CTCF-mediated chromatin interactions enclose domains of stimulus responsive enhancer-regulated genes, insulating them from nearby housekeeping genes.
Affiliation(s)
- Martin Oti
  - Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands; Present address: Institute of Biophysics Carlos Chagas Filho (IBCCF), Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil.
- Jonas Falck
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud university medical center, Nijmegen, The Netherlands
- Martijn A Huynen
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud university medical center, Nijmegen, The Netherlands
- Huiqing Zhou
  - Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands; Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud university medical center, Nijmegen, The Netherlands.
7
Vinogradov AE. Consolidation of slow or fast but not moderately evolving genes at the level of pathways and processes. Gene 2015; 561:30-4. [PMID: 25707747 DOI: 10.1016/j.gene.2015.01.066] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/02/2014] [Revised: 01/04/2015] [Accepted: 01/09/2015] [Indexed: 11/15/2022]
Abstract
Conservatism versus innovation is probably the most important dichotomy of all evolving systems. In molecular evolution the distinction between conservative (negative) selection, innovative (positive) selection and unconstrained evolution (drift) is usually ambiguous at the gene level. Only rare cases with the ratio of nonsynonymous to synonymous nucleotide substitutions above unity (dN/dS>1) are thought to be due to positive selection, whereas the lower dN/dS ratio may indicate negative selection in combination with drift. The density of the dN/dS ratio for orthologous genes forms a unimodal distribution where no particular regions can be discerned. Here it is shown that at the level of overrepresented pathways and processes the picture is strikingly different. The distribution is strongly polarized with a wide completely depressed middle part. This three-phase distribution is very robust. It is observed with various substitution models and remains at very low significance of overrepresentation (up to p<0.99). This fact suggests consolidation of either negative or positive selection but not of unconstrained evolution at the level of pathways/processes. The effect is demonstrated for different phylogenetic distances: from human to other primates, mammals and vertebrates. This approach suggests estimating the boundaries for conservative and innovative selection using the pathway/process level. Emphasizing the role of a critical mass of negatively or positively selected genes in a pathway/process, it can elucidate how the bridge between 'tinkering' at the gene level and 'design' at the higher levels is forming.
8
Pingault L, Choulet F, Alberti A, Glover N, Wincker P, Feuillet C, Paux E. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome. Genome Biol 2015; 16:29. [PMID: 25853487 PMCID: PMC4355351 DOI: 10.1186/s13059-015-0601-9] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 09/24/2014] [Accepted: 01/28/2015] [Indexed: 12/19/2022] Open
Abstract
Background: Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. Results: By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Conclusions: Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.
9
Park SG, Hannenhalli S, Choi SS. Conservation in first introns is positively associated with the number of exons within genes and the presence of regulatory epigenetic signals. BMC Genomics 2014; 15:526. [PMID: 24964727 PMCID: PMC4085337 DOI: 10.1186/1471-2164-15-526] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 01/07/2014] [Accepted: 06/18/2014] [Indexed: 01/04/2023] Open
Abstract
Background: Genomes of higher eukaryotes have surprisingly long first introns and in some cases, the first introns have been shown to have higher conservation relative to other introns. However, the functional relevance of conserved regions in the first introns is poorly understood. Leveraging the recent ENCODE data, here we assess potential regulatory roles of conserved regions in the first intron of human genes. Results: We first show that relative to other downstream introns, the first introns are enriched for blocks of highly conserved sequences. We also found that the first introns are enriched for several chromatin marks indicative of active regulatory regions and this enrichment of regulatory marks is correlated with enrichment of conserved blocks in the first intron; the enrichments of conservation and regulatory marks in first intron are not entirely explained by a general, albeit variable, bias for certain marks toward the 5’ end of introns. Interestingly, conservation as well as proportions of active regulatory chromatin marks in the first intron of a gene correlates positively with the numbers of exons in the gene but the correlation is significantly weakened in second introns and negligible beyond the second intron. The first intron conservation is also positively correlated with the gene’s expression level in several human tissues. Finally, a gene-wise analysis shows significant enrichments of active chromatin marks in conserved regions of first introns, relative to the conserved regions in other introns of the same gene. Conclusions: Taken together, our analyses strongly suggest that first introns are enriched for active transcriptional regulatory signals under purifying selection.
Affiliation(s)
- Sridhar Hannenhalli
  - Department of Cell Biology and Molecular Genetics, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, MD 20742, USA.
10
Minority of mammalian orthologs can be regarded as physiologically closest genes. Gene X 2012; 509:201-5. [DOI: 10.1016/j.gene.2012.08.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2012] [Revised: 07/31/2012] [Accepted: 08/19/2012] [Indexed: 11/18/2022] Open
11
Comparative analysis of the structural and expressional parameters of microRNA target genes. Gene 2012; 497:103-9. [PMID: 22305979 DOI: 10.1016/j.gene.2012.01.033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/05/2012] [Accepted: 01/18/2012] [Indexed: 02/02/2023]
Abstract
MicroRNAs (miRNAs) generally pair with the 3'UTRs of their target mRNAs to repress gene expression. It has been reported that miRNA targets (TGs) are longer and evolve more slowly than non-targets (NTGs). We confirmed the observation and also found novel structural and expressional characteristics of TGs. The length difference between TGs and NTGs was greatest for the 3'UTRs, although a difference was also observed for CDSs and introns. Widely expressed genes were shorter for both TGs and NTGs; however, TGs were significantly longer than NTGs in all ranges of expression. TGs were more likely than NTGs to be widely expressed, which might explain why TGs evolve more slowly than NTGs. Finally, we found that TG mRNAs have faster decay rates. In addition, the decay rate of a TG mRNA transcript was found to be positively correlated with the number or density of target sites located in that TG's mRNA transcript.
12
Park J, Xu K, Park T, Yi SV. What are the determinants of gene expression levels and breadths in the human genome? Hum Mol Genet 2011; 21:46-56. [PMID: 21945885 PMCID: PMC3235009 DOI: 10.1093/hmg/ddr436] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 12/25/2022] Open
Abstract
In complex organisms, different tissues express different genes, which ultimately shape the function and phenotype of each tissue. An important goal of modern biology is to understand how some genes are turned on and off in specific tissues and how the numbers of different gene expression products are determined. These aspects are named ‘expression breadth’ (or ‘tissue specificity’) and ‘expression level’, respectively. Here, we show that we can predict a substantial amount of variation in levels and breadths of gene expression using genomic information of each gene. Interestingly, many genomic traits are correlated with both aspects of gene expression in similar directions, suggesting shared molecular pathways. However, to elucidate distinctive molecular mechanisms governing gene expression levels and breadths, we need to identify the relative significance of each genomic trait on these two aspects of gene expression. To this end, we developed a novel multivariate multiple regression method. Using this new method, we show that gene compactness (in particular, the mean size of exons), codon usage bias and non-synonymous rates have a stronger influence on expression levels compared with their effects on expression breadths. In contrast, the propensity of promoter DNA methylation is a stronger indicator of expression breadths than of expression levels. Interestingly, intron DNA methylation exhibits an opposite pattern to the promoter DNA methylation in the human genome, suggesting that DNA methylation may play multiple roles depending upon its genomic targets. Furthermore, synonymous rates have stronger associations with expression breadths than with expression levels in the human genome. These findings provide clues toward distinctive molecular mechanisms regulating different aspects of gene expression.
Affiliation(s)
- Jungsun Park
  - Bioinformatics and Biostatistics Laboratory, Department of Statistics, Seoul National University, Seoul 151-742, Korea
13
Abstract
CpG islands mark CpG-enriched regions in otherwise CpG-depleted vertebrate genomes. While the regulatory importance of CpG islands is widely accepted, it is little appreciated that CpG islands vary greatly in lengths. For example, CpG islands in the human genome vary ∼30-fold in their lengths. Here we report findings suggesting that the lengths of CpG islands have functional consequences. Specifically, we show that promoters associated with long CpG islands (long-CGI promoters) are distinct from other promoters. First, long-CGI promoters are uniquely associated with genes with an intermediate level of gene expression breadths. Notably, intermediate expression breadths require the most complex mode of gene regulation, from the standpoint of information content. Second, long-CGI promoters encode more RNA polymerase II (Polr2a) binding sites than other promoters. Third, the actual binding patterns of Polr2a occur in a more tissue-specific manner in long-CGI promoters compared to other CGI promoters. Moreover, long-CGI promoters contain the largest numbers of experimentally characterized transcription start sites compared to other promoters, and the types of transcription start sites in them are biased toward tissue-specific patterns of gene expression. Finally, long-CGI promoters are preferentially associated with genes involved in development and regulation. Together, these findings indicate that functionally relevant variations of CpG islands exist. By investigating consequences of certain CpG island traits, we can gain additional insights into the mechanism and evolution of regulatory complexity of gene expression.
14
Woody JL, Severin AJ, Bolon YT, Joseph B, Diers BW, Farmer AD, Weeks N, Muehlbauer GJ, Nelson RT, Grant D, Specht JE, Graham MA, Cannon SB, May GD, Vance CP, Shoemaker RC. Gene expression patterns are correlated with genomic and genic structure in soybean. Genome 2011; 54:10-8. [PMID: 21217801 DOI: 10.1139/g10-090] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
Studies have indicated that exon and intron size and intergenic distance are correlated with gene expression levels and expression breadth. Previous reports on these correlations in plants and animals have been conflicting. In this study, next-generation sequence data, which has been shown to be more sensitive than previous expression profiling technologies, were generated and analyzed from 14 tissues. Our results revealed a novel dichotomy. At the low expression level, an increase in expression breadth correlated with an increase in transcript size because of an increase in the number of exons and introns. No significant changes in intron or exon sizes were noted. Conversely, genes expressed at the intermediate to high expression levels displayed a decrease in transcript size as their expression breadth increased. This was due to smaller exons, with no significant change in the number of exons. Taking advantage of the known gene space of soybean, we evaluated the positioning of genes and found significant clustering of similarly expressed genes. Identifying the correlations between the physical parameters of individual genes could lead to uncovering the role of regulation owing to nucleotide composition, which might have potential impacts in discerning the role of the noncoding regions.
Affiliation(s)
- Jenna L Woody
  - Department of Agronomy, Iowa State University, Ames, 50011, USA.
15
Fuertes MA, Pérez JM, Zuckerkandl E, Alonso C. Introns form compositional clusters in parallel with the compositional clusters of the coding sequences to which they pertain. J Mol Evol 2010; 72:1-13. [PMID: 21132282 DOI: 10.1007/s00239-010-9411-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/22/2009] [Accepted: 11/10/2010] [Indexed: 11/29/2022]
Abstract
This report deals with the study of compositional properties of human gene sequences, evaluating similarities and differences among functionally distinct sectors of the gene independently of the reading frame. To retrieve the compositional information of DNA, we present a neighbor-base-dependent coding system in which the alphabet of 64 letters (DNA triplets) is compressed to an alphabet of 14 letters, here termed triplet composons. The triplets containing the same set of distinct bases, in whatever order and number, form a triplet composon. The reading of the DNA sequence is performed starting at any letter of the initial triplet and then moving, triplet-to-triplet, until the end of the sequence. The readings were made in an overlapping way along the length of the sequences. The analysis of the compositional content in terms of the composon usage frequencies of the gene sequences shows that: (i) the compositional content of the sequences is far from that of random sequences, even in the case of non-protein-coding sequences; (ii) coding sequences can be classified as components of compositional clusters; and (iii) intron sequences in a cluster have the same composon usage frequencies, even as their base composition differs notably from that of their home coding sequences. A comparison of the composon usage frequencies between human and mouse homologous genes indicated that two clusters found in humans do not have their counterpart in mouse, whereas the other clusters are stable in both species with respect to their composon usage frequencies in both coding and noncoding sequences.
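The composon coding described in this abstract can be illustrated with a minimal Python sketch. It assumes that a composon is identified by the set of distinct bases in a triplet (4 single-base, 6 two-base and 4 three-base sets, giving 14 classes) and that "overlapping" reading means a sliding window advancing one base at a time. The names `composon`, `composon_usage` and `ALL_COMPOSONS` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import product


def composon(triplet: str) -> str:
    """Return a label for the set of distinct bases in a triplet.

    Order and multiplicity are ignored, so "GAG", "AGG" and "GGA"
    all map to the same label "AG" (assumed reading of the abstract).
    """
    return "".join(sorted(set(triplet.upper())))


def composon_usage(seq: str, step: int = 1) -> dict[str, float]:
    """Relative composon frequencies over triplets read along the sequence.

    step=1 gives overlapping triplets (an assumption about what the
    abstract calls reading "in an overlapping way").
    """
    seq = seq.upper()
    counts = Counter(
        composon(seq[i:i + 3]) for i in range(0, len(seq) - 2, step)
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Enumerate every composon reachable from the 64 DNA triplets.
ALL_COMPOSONS = sorted({composon("".join(t)) for t in product("ACGT", repeat=3)})

if __name__ == "__main__":
    print(len(ALL_COMPOSONS), ALL_COMPOSONS)
    print(composon_usage("ATGGCGAAATTT"))
```

Comparing the resulting frequency vectors between sequences (for example, an intron and its host coding sequence) would be one way to look for the kind of clustering the abstract reports; the distance measure used by the authors is not stated here.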
Affiliation(s)
- Miguel A Fuertes
- Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain.
|
16
|
Zeng J, Yi SV. DNA methylation and genome evolution in honeybee: gene length, expression, functional enrichment covary with the evolutionary signature of DNA methylation. Genome Biol Evol 2010; 2:770-80. [PMID: 20924039 PMCID: PMC2975444 DOI: 10.1093/gbe/evq060] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 01/06/2023]
Abstract
A growing body of evidence suggests that DNA methylation is functionally divergent among different taxa. The recently discovered functional methylation system in the honeybee Apis mellifera presents an attractive invertebrate model system to study evolution and function of DNA methylation. In the honeybee, DNA methylation is mostly targeted toward transcription units (gene bodies) of a subset of genes. Here, we report an intriguing covariation of length and epigenetic status of honeybee genes. Hypermethylated and hypomethylated genes in honeybee are dramatically different in their lengths for both exons and introns. By analyzing orthologs in Drosophila melanogaster, Acyrthosiphon pisum, and Ciona intestinalis, we show genes that were short and long in the past are now preferentially situated in hyper- and hypomethylated classes, respectively, in the honeybee. Moreover, we demonstrate that a subset of high-CpG genes are conspicuously longer than expected under the evolutionary relationship alone and that they are enriched in specific functional categories. We suggest that gene length evolution in the honeybee is partially driven by evolutionary forces related to regulation of gene expression, which in turn is associated with DNA methylation. However, lineage-specific patterns of gene length evolution suggest that there may exist additional forces underlying the observed interaction between DNA methylation and gene lengths in the honeybee.
Affiliation(s)
- Jia Zeng
- School of Biology, Georgia Institute of Technology, USA
|
17
|
Park SG, Choi SS. Expression breadth and expression abundance behave differently in correlations with evolutionary rates. BMC Evol Biol 2010; 10:241. [PMID: 20691101 PMCID: PMC2924872 DOI: 10.1186/1471-2148-10-241] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 04/19/2010] [Accepted: 08/07/2010] [Indexed: 01/12/2023]
Abstract
Background One of the main objectives of the molecular evolution and evolutionary systems biology field is to reveal the underlying principles that dictate protein evolutionary rates. Several studies argue that expression abundance is the most critical component in determining the rate of evolution, especially in unicellular organisms. However, the expression breadth also needs to be considered for multicellular organisms. Results In the present paper, we analyzed the relationship between the two expression variables and rates using two different genome-scale expression datasets, microarrays and ESTs. A significant positive correlation between the expression abundance (EA) and expression breadth (EB) was revealed by Kendall's rank correlation tests. A novel random shuffling approach was applied for EA and EB to compare the correlation coefficients obtained from real data sets to those estimated based on random chance. A novel method called a Fixed Group Analysis (FGA) was designed and applied to investigate the correlations between expression variables and rates when one of the two expression variables was evenly fixed. Conclusions In conclusion, all of these analyses and tests consistently showed that the breadth rather than the abundance of gene expression is tightly linked with the evolutionary rate in multicellular organisms.
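The comparison of an observed rank correlation against values expected by chance can be sketched as a generic permutation test. This is not the authors' shuffling procedure, whose details the abstract does not give. The sketch assumes Kendall's tau-a (no tie correction) and a two-sided test; `kendall_tau` and `shuffle_test` are hypothetical names, and the input lists stand in for per-gene expression abundance and breadth.

```python
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)


def shuffle_test(x, y, n_shuffles=1000, seed=0):
    """Compare the observed tau with taus obtained after shuffling y.

    Returns the observed tau and an empirical two-sided p-value
    (with the usual +1 correction so the estimate is never zero).
    """
    rng = random.Random(seed)
    observed = kendall_tau(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_perm)
        if abs(kendall_tau(x, y_perm)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    abundance = [1.0, 2.5, 3.1, 4.8, 5.2, 6.9, 7.4, 8.0]  # made-up values
    breadth = [2, 3, 3, 5, 6, 8, 9, 12]                   # made-up values
    print(shuffle_test(abundance, breadth))
```

The pairwise loop is quadratic in the number of genes, so a genome-scale analysis would normally use an optimized library routine instead.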
Affiliation(s)
- Seung Gu Park
- Department of Medical Biotechnology, College of Biomedical Science, and Institute of Bioscience & Biotechnology, Kangwon National University, Chunchon 200-701, Korea
|
18
|
Shen-Orr SS, Pilpel Y, Hunter CP. Composition and regulation of maternal and zygotic transcriptomes reflects species-specific reproductive mode. Genome Biol 2010; 11:R58. [PMID: 20515465 PMCID: PMC2911106 DOI: 10.1186/gb-2010-11-6-r58] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/24/2009] [Revised: 04/23/2010] [Accepted: 06/01/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Early embryos contain mRNA transcripts expressed from two distinct origins; those expressed from the mother's genome and deposited in the oocyte (maternal) and those expressed from the embryo's genome after fertilization (zygotic). The transition from maternal to zygotic control occurs at different times in different animals according to the extent and form of maternal contributions, which likely reflect evolutionary and ecological forces. Maternally deposited transcripts rely on post-transcriptional regulatory mechanisms for precise spatial and temporal expression in the embryo, whereas zygotic transcripts can use both transcriptional and post-transcriptional regulatory mechanisms. The differences in maternal contributions between animals may be associated with gene regulatory changes detectable by the size and complexity of the associated regulatory regions. RESULTS We have used genomic data to identify and compare maternal and/or zygotic expressed genes from six different animals and find evidence for selection acting to shape gene regulatory architecture in thousands of genes. We find that mammalian maternal genes are enriched for complex regulatory regions, suggesting an increase in expression specificity, while egg-laying animals are enriched for maternal genes that lack transcriptional specificity. CONCLUSIONS We propose that this lack of specificity for maternal expression in egg-laying animals indicates that a large fraction of maternal genes are expressed non-functionally, providing only supplemental nutritional content to the developing embryo. These results provide clear predictive criteria for analysis of additional genomes.
Affiliation(s)
- Shai S Shen-Orr
- Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Ave, Cambridge, MA 02138, USA
|
19
|
Rao YS, Wang ZF, Chai XW, Wu GZ, Zhou M, Nie QH, Zhang XQ. Selection for the compactness of highly expressed genes in Gallus gallus. Biol Direct 2010; 5:35. [PMID: 20465857 PMCID: PMC2883972 DOI: 10.1186/1745-6150-5-35] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/26/2009] [Accepted: 05/14/2010] [Indexed: 11/10/2022]
Abstract
Background Coding sequence (CDS) length, gene size, and intron length vary within a genome and among genomes. Previous studies in diverse organisms, including human, D. melanogaster, C. elegans, S. cerevisiae, and Arabidopsis thaliana, indicated that there are negative relationships between expression level and gene size, CDS length, and intron length. Different models, such as the selection-for-economy model, the genomic design model, and mutational bias hypotheses, have been proposed to explain these observations. The debate over which model best explains the observations has not been settled. The chicken (Gallus gallus) is an important model organism that bridges the evolutionary gap between mammals and other vertebrates. Because chicken, like D. melanogaster, has a larger effective population size, selection on the chicken genome is expected to be more effective in increasing protein synthesis efficiency. Therefore, in this study the chicken was used as a model organism to elucidate the interaction between gene features and expression pattern under selection pressure. Results Based on different technologies, we gathered expression data for nuclear protein-coding, single-splicing genes from the Gallus gallus genome and compared them with gene parameters. We found that gene size, CDS length, first intron length, average intron length, and total intron length are significantly negatively correlated with expression level and expression breadth. Tissue specificity is positively correlated with first intron length but negatively correlated with average intron length, and is not correlated with CDS length or protein domain number. Comparison analyses showed that ubiquitously expressed genes and narrowly expressed genes with similar expression levels do not differ in compactness. Our data provided evidence that the genomic design model cannot, at least in part, explain our observations.
We grouped all somatic-tissue-specific genes (n = 1105) and compared the first intron length and the average intron length between highly expressed genes (top 5% of expressed genes) and weakly expressed genes (bottom 5% of expressed genes). We found that the first intron length and the average intron length in highly expressed genes do not differ from those in weakly expressed genes. We also made a comparison between ubiquitously expressed genes and narrowly expressed somatic genes with similar expression levels. Our data demonstrated that ubiquitously expressed genes are less compact than narrowly expressed genes with similar expression levels. These observations cannot be explained by mutational bias hypotheses either. We also found that the significant trend between gene compactness and expression level is not affected by local mutational biases. We argue that the selection-for-economy model is the most likely explanation for the relationship between gene expression and gene characteristics in the chicken genome. Conclusion Natural selection appears to favor the compactness of highly expressed genes in the chicken genome. This observation can be explained by the selection-for-economy model. Reviewers This article was reviewed by Dr. Gavin Huttley, Dr. Liran Carmel (nominated by Dr. Eugene V. Koonin) and Dr. Araxi Urrutia (nominated by Dr. Laurence D. Hurst).
Affiliation(s)
- You S Rao
- Department of Biological Technology, Jiangxi Educational Institute, Nanchang, Jiangxi, China
|
20
|
Adaptive Evolution Hotspots at the GC-Extremes of the Human Genome: Evidence for Two Functionally Distinct Pathways of Positive Selection. Adv Bioinformatics 2010:856825. [PMID: 20454629 PMCID: PMC2862947 DOI: 10.1155/2010/856825] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/25/2009] [Revised: 12/31/2009] [Accepted: 02/10/2010] [Indexed: 11/21/2022]
Abstract
We recently reported that the human genome is "splitting" into two gene subgroups characterised by polarised GC content (Tang et al, 2007), and that such evolutionary change may be accelerated by programmed genetic instability (Zhao et al, 2008). Here we extend this work by mapping the presence of two separate high-evolutionary-rate (Ka/Ks) hotspots in the human genome: one characterized by low GC content, high intron length, and low gene expression, and the other by high GC content, high exon number, and high gene expression. This finding suggests that at least two different mechanisms mediate adaptive genetic evolution in higher organisms: (1) intron lengthening and reduced repair in hypermethylated lowly-transcribed genes, and (2) duplication and/or insertion events affecting highly-transcribed genes, creating low-essentiality satellite daughter genes in nearby regions of active chromatin. Since the latter mechanism is expected to be far more efficient than the former in generating variant genes that increase fitness, these results also provide a potential explanation for the controversial value of sequence analysis in defining positively selected genes.
|
21
|
Vinogradov AE. Human transcriptome nexuses: basic-eukaryotic and metazoan. Genomics 2010; 95:345-54. [PMID: 20298777 DOI: 10.1016/j.ygeno.2010.03.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 09/15/2009] [Revised: 03/01/2010] [Accepted: 03/08/2010] [Indexed: 01/10/2023]
Abstract
Using a new approach, I analysed human transcriptome coexpression network and revealed two large-scale nexuses. Besides gene coexpression, each nexus is characterized by a combination of gene evolutionary origin, function and among-tissues expression breadth. The first nexus contains mostly genes of pre-metazoan origin, which are widely expressed and have cell-centred functions. The second nexus is enriched in genes of metazoan origin, which are expressed more narrowly and have organism-centred functions. The revealed nexuses are supported by asymmetry in distribution of transcription factor targets between them. Within the metazoan nexus, there is a subnexus that is more pronounced in the nervous tissues and is enriched in gene regulatory complexity. It mostly contains genes related to nervous system, cell communication and multicellular organism processes and development. The revealed nexuses indicate a dichotomy in the transcriptional regulation and can provide a framework for further functional genomics studies.
|
22
|
Cenik C, Derti A, Mellor JC, Berriz GF, Roth FP. Genome-wide functional analysis of human 5' untranslated region introns. Genome Biol 2010; 11:R29. [PMID: 20222956 PMCID: PMC2864569 DOI: 10.1186/gb-2010-11-3-r29] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 01/22/2010] [Accepted: 03/11/2010] [Indexed: 12/19/2022]
Abstract
BACKGROUND Approximately 35% of human genes contain introns within the 5' untranslated region (UTR). Introns in 5'UTRs differ from those in coding regions and 3'UTRs with respect to nucleotide composition, length distribution and density. Despite their presumed impact on gene regulation, the evolution and possible functions of 5'UTR introns remain largely unexplored. RESULTS We performed a genome-scale computational analysis of 5'UTR introns in humans. We discovered that the most highly expressed genes tended to have short 5'UTR introns rather than having long 5'UTR introns or lacking 5'UTR introns entirely. Although we found no correlation in 5'UTR intron presence or length with variance in expression across tissues, which might have indicated a broad role in expression-regulation, we observed an uneven distribution of 5'UTR introns amongst genes in specific functional categories. In particular, genes with regulatory roles were surprisingly enriched in having 5'UTR introns. Finally, we analyzed the evolution of 5'UTR introns in non-receptor protein tyrosine kinases (NRTK), and identified a conserved DNA motif enriched within the 5'UTR introns of human NRTKs. CONCLUSIONS Our results suggest that human 5'UTR introns enhance the expression of some genes in a length-dependent manner. While many 5'UTR introns are likely to be evolving neutrally, their relationship with gene expression and overrepresentation among regulatory genes, taken together, suggest that complex evolutionary forces are acting on this distinct class of introns.
Affiliation(s)
- Can Cenik
- Harvard Medical School, Department of Biological Chemistry and Molecular Pharmacology, 250 Longwood Avenue, SGMB-322, Boston, MA 02115, USA.
|
23
|
Yang H. In plants, expression breadth and expression level distinctly and non-linearly correlate with gene structure. Biol Direct 2009; 4:45; discussion 45. [PMID: 19930585 PMCID: PMC2794262 DOI: 10.1186/1745-6150-4-45] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 10/30/2009] [Accepted: 11/21/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Compactness of highly/broadly expressed genes in human has been explained as selection for efficiency, regional mutation biases or genomic design. However, highly expressed genes in flowering plants were shown to be less compact than lowly expressed ones. On the other hand, opposite findings have also been documented: pollen-expressed Arabidopsis genes tend to contain shorter introns, and highly expressed moss genes are compact. This issue is important because it provides a chance to compare the selectionist and neutralist views of genome evolution. Furthermore, this issue also helps to understand the fates of introns from the angle of gene expression. RESULTS In this study, I used expression data covering more tissues and employed new analytical methods to reexamine the correlations between gene expression and gene structure for two flowering plants, Arabidopsis thaliana and Oryza sativa. It is shown that different aspects of expression pattern correlate with different parts of gene sequences in distinct ways. In detail, expression level is significantly negatively correlated with gene size, especially the size of non-coding regions, whereas expression breadth correlates with non-coding structural parameters positively and with coding region parameters negatively. Furthermore, the relationships between expression level and structural parameters seem to be non-linear, with the extremes of structural parameters possibly scaling as power laws or logarithmic functions of expression levels. CONCLUSION In plants, highly expressed genes are compact, especially in the non-coding regions. Broadly expressed genes tend to contain longer non-coding sequences, which may be necessary for complex regulation. In combination with previous studies about other plants and about animals, some common scenarios about the correlation between gene expression and gene structure begin to emerge. Based on the functional relationships between extreme values of structural characteristics and expression level, an effort was made to evaluate the relative effectiveness of the energy-cost hypothesis and the time-cost hypothesis.
Affiliation(s)
- Hangxing Yang
- T-Life Research Center, Department of Physics, Fudan University, Shanghai, PR China.
|
24
|
Farré D, Albà MM. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. Mol Biol Evol 2009; 27:325-35. [PMID: 19822635 DOI: 10.1093/molbev/msp242] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 12/29/2022]
Abstract
Gene duplication is a major mechanism for molecular evolutionary innovation. Young gene duplicates typically exhibit elevated rates of protein evolution and, according to a number of recent studies, increased expression divergence. However, the nature of these changes is still poorly understood. To gain novel insights into the functional consequences of gene duplication, we have undertaken an in-depth analysis of a large data set of gene families containing primate- and/or rodent-specific gene duplicates. We have found a clear tendency toward an increase in protein, promoter, and expression divergence with increasing number of duplication events undergone by each gene since the human-mouse split. In addition, gene duplication is significantly associated with a reduction in expression breadth and intensity. Interestingly, it is possible to identify three main groups regarding the evolution of gene expression following gene duplication. The first group, which comprises around 25% of the families, shows patterns compatible with tissue-expression partitioning. The second and largest group, comprising 33-53% of the families, shows broad expression of one of the gene copies and reduced, overlapping, expression of the other copy or copies. This can be attributed, in most cases, to loss of expression in several tissues of one or more gene copies. Finally, a substantial number of families, 19-35%, maintain a very high level of tissue-expression overlap (>0.8) after tens of millions of years of evolution. These families may have been subject to selection for increased gene dosage.
|
25
|
Chen JJ, Wang Y. [Recent progress in plant genome size evolution]. YI CHUAN = HEREDITAS 2009; 31:464-470. [PMID: 19586839 DOI: 10.3724/sp.j.1005.2009.00464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
It has been known that eukaryotic genomes span a wide range of sizes regardless of organism complexity. The observed differences in genome size are primarily due to polyploidy level and abundance of non-coding DNA, especially the contribution of transposable elements (TEs). Here we reviewed the current progress in genome size variation of plant species and the underlying evolutionary forces that contribute to genome expansion or contraction. Polyploidization and the accumulation of transposable element are the primary contributors to genome expansion. As to the mechanisms of DNA loss, unequal homologous recombination and illegitimate recombination are thought to be the counterbalances to the unlimited expansion of a genome. The evolutionary direction of plant genome size is also discussed, which tends to favor larger genomes with deletion mechanisms acting to only attenuate genome expansion but not reverse.
Affiliation(s)
- Jian-Jun Chen
- Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China.
|
26
|
Vinogradov AE, Anatskaya OV. Loss of protein interactions and regulatory divergence in yeast whole-genome duplicates. Genomics 2009; 93:534-42. [PMID: 19272438 DOI: 10.1016/j.ygeno.2009.02.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/12/2008] [Revised: 02/26/2009] [Accepted: 02/27/2009] [Indexed: 11/19/2022]
Abstract
Whole-genome duplications are important for the growth of genome complexity. We investigated various factors involved in the evolution of yeast whole-genome duplicates (ohnologs) making emphasis on the analysis of protein interactions. We found that ohnologs have a lower number of protein interactions compared with small-scale duplicates and singletons (by about -40%). The loss of interactions was proportional to their initial number and independent of ohnolog position in the protein interaction network. A faster evolving member of an ohnolog pair has a lower number of interactions compared to its counterpart. The Gene Ontology mapping of non-overlapping and overlapping interactants of paired ohnologs reveals a sharp asymmetry in GO terms related to regulation. The fraction of these terms is much higher in non-overlapping interactants (compared to overlapping interactants and total dataset). Network clustering coefficient is lower in ohnologs, yet they show an increased density of protein interactions restricted within the whole ohnologs set. These facts suggest that subfunctionalization (or subneofunctionalization) reflected in the loss of protein interactions was a prevailing process in the divergence of ohnologs, which distinguishes them from small-scale duplicates. The loss of protein interactions was associated with the regulatory divergence between the members of an ohnolog pair. A small-scale modularity (reflected in clustering coefficient) probably was not important for ohnologs retention, yet a larger-scale modularity could be involved in their evolution.
Affiliation(s)
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Ave. 4, St. Petersburg 194064, Russia.
|
27
|
Colinas J, Schmidler SC, Bohrer G, Iordanov B, Benfey PN. Intergenic and genic sequence lengths have opposite relationships with respect to gene expression. PLoS One 2008; 3:e3670. [PMID: 18989364 PMCID: PMC2576458 DOI: 10.1371/journal.pone.0003670] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 04/21/2008] [Accepted: 10/11/2008] [Indexed: 12/20/2022]
Abstract
Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.
Affiliation(s)
- Juliette Colinas
- Department of Biology and IGSP Center for Systems Biology, Duke University, Durham, North Carolina, United States of America
- Scott C. Schmidler
- Department of Statistical Sciences, Duke University, Durham, North Carolina, United States of America
- Gil Bohrer
- Department of Civil & Environmental Engineering & Geodetic Science, Ohio State University, Columbus, Ohio, United States of America
- Philip N. Benfey
- Department of Biology and IGSP Center for Systems Biology, Duke University, Durham, North Carolina, United States of America
|
28
|
Vinogradov AE. Modularity of cellular networks shows general center-periphery polarization. Bioinformatics 2008; 24:2814-7. [PMID: 18953046 DOI: 10.1093/bioinformatics/btn555] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
The modular biology is supposed to be a bridge from the molecular to the systems biology. Using a new approach, it is shown here that the protein interaction networks of yeast Saccharomyces cerevisiae and bacteria Escherichia coli consist of two large-scale modularity layers, central and peripheral, separated by a zone of depressed modularity. This finding based on the analysis of network topology is further supported by the discovery that there are many more Gene Ontology categories (terms) and KEGG biochemical pathways that are overrepresented in the central and peripheral layers than in the intermediate zone. The categories of the central layer are mostly related to nuclear information processing, regulation and cell cycle, whereas the peripheral layer is dealing with various metabolic and energetic processes, transport and cell communication. A similar center-periphery polarization of modularity is found in the protein domain networks ('built-in interactome') and in a powergrid (as a non-biological example). These data suggest a 'polarized modularity' model of cellular networks where the central layer seems to be regulatory and to use information storage of the nucleus, whereas the peripheral layer seems devoted to more specialized tasks and environmental interactions, with a complex 'bus' between the layers.
|
29
|
On the nature of human housekeeping genes. Trends Genet 2008; 24:481-4. [DOI: 10.1016/j.tig.2008.08.004] [Citation(s) in RCA: 204] [Impact Index Per Article: 12.8] [Received: 05/04/2008] [Revised: 07/31/2008] [Accepted: 08/02/2008] [Indexed: 01/27/2023]
|
30
|
Vinogradov AE, Anatskaya OV. Organismal complexity, cell differentiation and gene expression: human over mouse. Nucleic Acids Res 2007; 35:6350-6. [PMID: 17881362 PMCID: PMC2095826 DOI: 10.1093/nar/gkm723] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Received: 07/02/2007] [Revised: 08/12/2007] [Accepted: 09/01/2007] [Indexed: 01/25/2023]
Abstract
We present a molecular and cellular phenomenon underlying the intriguing increase in phenotypic organizational complexity. For the same set of human-mouse orthologous genes (11 534 gene pairs) and homologous tissues (32 tissue pairs), human shows a greater fraction of tissue-specific genes and a greater ratio of the total expression of tissue-specific genes to housekeeping genes in each studied tissue, which suggests a generally higher level of evolutionary cell differentiation (specialization). This phenomenon is spectacularly more pronounced in those human tissues that are more directly involved in the increase of complexity, longevity and body size (i.e. it is reflected on the organismal level as well). Genes with a change in expression breadth show a greater human-mouse divergence of promoter regions and encoded proteins (i.e. the functional genomics data are supported by the structural analysis). Human also shows the higher expression of translation machinery. The upstream untranslated regions (5'UTRs) of human mRNAs are longer than mouse 5'UTRs (even after correction for the difference in genome sizes) and contain more uAUG codons, which suggest a more complex regulation at the translational level in human cells (and agrees well with the augmented cell specialization).
Affiliation(s)
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, St. Petersburg 194064, Russia.
|
31
|
Li SW, Feng L, Niu DK. Selection for the miniaturization of highly expressed genes. Biochem Biophys Res Commun 2007; 360:586-92. [PMID: 17610841 DOI: 10.1016/j.bbrc.2007.06.085] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 05/22/2007] [Accepted: 06/18/2007] [Indexed: 11/29/2022]
Abstract
Most widely expressed genes are also highly expressed. Based on high or wide expression, different models were proposed to explain the small sizes of highly/widely expressed genes. We found that housekeeping genes are not more compact than narrowly expressed genes with similar expression levels, but compactness and expression level are correlated in housekeeping genes (except that highly expressed Arabidopsis HK genes have longer intron length). Meanwhile, we found evidence that genes with high functional/regulatory complexity do not have longer introns and longer proteins. The genome design hypothesis is thus not supported. Furthermore, we found that housekeeping genes are not more compact than the narrowly expressed somatic genes with similar average expression levels. Because housekeeping genes are expected to have much higher germline expression levels than narrowly expressed somatic genes, transcription-associated deletion bias is not supported. Selection of the compactness of highly expressed genes for economy is supported.
Affiliation(s)
- Shu-Wei Li
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
|
32
|
Abstract
BACKGROUND: Promoter-associated CpG islands (PCIs) mediate methylation-dependent gene silencing, yet tend to co-locate to transcriptionally active genes. To address this paradox, we used data mining to assess the behavior of PCI-positive (PCI+) genes in the human genome. RESULTS: PCI+ genes exhibit a bimodal distribution: (1) a 'housekeeping-like' subset characterized by higher GC content and lower intron length/number, and (2) a 'pseudogene paralog' subset characterized by lower GC content and higher intron length/number (p<0.001). These subsets are functionally distinguishable, with the former gene group characterized by higher expression levels and lower evolutionary rate (p<0.001). PCI-negative (PCI-) genes exhibit higher evolutionary rate and narrower expression breadth than PCI+ genes (p<0.001), consistent with more frequent tissue-specific inactivation. CONCLUSIONS: Adaptive evolution of the human genome appears driven in part by declining transcription of a subset of PCI+ genes, predisposing to both CpG→TpA mutation and intron insertion. We propose a model of evolving biological complexity in which environmentally-selected gains or losses of PCI methylation respectively favor positive or negative selection, thus polarizing PCI+ gene structures around a genomic core of ancestral PCI- genes.
Affiliation(s)
- Clara S.M. Tang
- Laboratory of Computational Oncology, Department of Medicine, The University of Hong Kong, Pokfulam, Hong Kong, Hong Kong
- Richard J. Epstein
- Laboratory of Computational Oncology, Department of Medicine, The University of Hong Kong, Pokfulam, Hong Kong, Hong Kong
|