51
Woody JL, Shoemaker RC. Gene expression: sizing it all up. Front Genet 2011; 2:70. [PMID: 22303365 PMCID: PMC3268623 DOI: 10.3389/fgene.2011.00070] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/11/2011] [Accepted: 09/29/2011] [Indexed: 11/13/2022] Open
Abstract
Genomic architecture appears to be a largely unexplored component of gene expression. That architecture can be related to chromatin domains, transposable element neighborhoods, epigenetic modifications of the genome, and more. Although surely not the end of the story, we are learning that when it comes to gene expression, size is also important. We have been surprised to find that certain patterns of expression, tissue specific versus constitutive, or high expression versus low expression, are often associated with physical attributes of the gene and genome. Multiple studies have shown an inverse relationship between gene expression patterns and various physical parameters of the genome such as intron size, exon size, intron number, and size of intergenic regions. An increase in expression level and breadth often correlates with a decrease in the size of physical attributes of the gene. Three models have been proposed to explain these relationships. Contradictory results were found in several organisms when expression level and expression breadth were analyzed independently. However, when both factors were combined in a single study a novel relationship was revealed. At low levels of expression, an increase in expression breadth correlated with an increase in genic, intergenic, and intragenic sizes. Contrastingly, at high levels of expression, an increase in expression breadth inversely correlated with the size of the gene. In this article we explore the several hypotheses regarding genome physical parameters and gene expression.
52
Chang CW, Cheng WC, Chen CR, Shu WY, Tsai ML, Huang CL, Hsu IC. Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis. PLoS One 2011; 6:e22859. [PMID: 21818400 PMCID: PMC3144958 DOI: 10.1371/journal.pone.0022859] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.6] [Received: 04/20/2011] [Accepted: 06/29/2011] [Indexed: 01/26/2023] Open
Abstract
Background: Categorizing protein-encoding transcriptomes of normal tissues into housekeeping genes and tissue-selective genes is a fundamental step toward studies of genetic functions and genetic associations with tissue-specific diseases. Previous studies have been mainly based on a few data sets with limited samples in each tissue, which limited the representativeness of their identified genes and resulted in low consensus among them. Results: This study compiled 1,431 samples in 43 normal human tissues from 104 microarray data sets. We developed a new method to improve gene expression assessment, and showed that more than ten samples are needed to robustly identify the protein-encoding transcriptome of a tissue. We identified 2,064 housekeeping genes and 2,293 tissue-selective genes, and analyzed gene lists by functional enrichment analysis. The housekeeping genes are mainly involved in fundamental cellular functions, and the tissue-selective genes are strikingly related to functions and diseases corresponding to tissue-origin. We also compared agreements and related functions among our housekeeping genes and those of previous studies, and pointed out some reasons for the low consensus. Conclusions: The results indicate that sufficient samples have improved the identification of the protein-encoding transcriptome of a tissue. Comprehensive meta-analysis has proved the high quality of our identified HK and TS genes. These results could offer a useful resource for future research on functional and genomic features of HK and TS genes.
Affiliation(s)
- Cheng-Wei Chang
- Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan
- Wei-Chung Cheng
- Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan
- Chaang-Ray Chen
- Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan
- Wun-Yi Shu
- Institute of Statistics, National Tsing Hua University, Hsinchu, Taiwan
- Min-Lung Tsai
- Institute of Athletics, National Taiwan Sport University, Taichung, Taiwan
- Ching-Lung Huang
- Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan
- Ian C. Hsu
- Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan
53
Kommadath A, Nie H, Groenen MAM, te Pas MFW, Veerkamp RF, Smits MA. Regional regulation of transcription in the bovine genome. PLoS One 2011; 6:e20413. [PMID: 21673989 PMCID: PMC3108615 DOI: 10.1371/journal.pone.0020413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/12/2010] [Accepted: 05/02/2011] [Indexed: 11/18/2022] Open
Abstract
Eukaryotic genes are distributed along chromosomes as clusters of highly expressed genes termed RIDGEs (Regions of IncreaseD Gene Expression) and lowly expressed genes termed anti-RIDGEs, interspersed among genes expressed at intermediate levels or not expressed. Previous studies based on this observation suggested a dual mechanism of gene regulation, where, in addition to transcription factors, the chromosomal domain influences the expression level of its embedded genes. The objectives here were to provide evidence for the existence of chromosomal regional regulation of transcription in the bovine genome, to analyse the genomic features of genes located within RIDGEs versus anti-RIDGEs and of tissue-specific genes versus housekeeping genes, and to examine the genomic distribution of genes subject to positive selection in bovines. Gene expression analysis of four brain tissues and the anterior pituitary of 28 cows identified 70 RIDGEs and 41 anti-RIDGEs (harbouring 3735 and 1793 bovine genes respectively) across the bovine genome, numbers significantly higher than expected by chance. Housekeeping genes (defined here as genes expressed in all five tissues) were over-represented within RIDGEs but tissue-specific genes (genes expressed in only one of the five tissues) were not. Housekeeping genes and genes within RIDGEs had, in general, higher expression levels and GC content but shorter gene lengths and intron lengths than tissue-specific genes and genes within anti-RIDGEs. Our findings suggest the existence of chromosomal regional regulation of transcription in the bovine genome. The genomic features observed for genes within RIDGEs and housekeeping genes in bovines agree with previous studies in several other species, further strengthening the hypothesis of selective pressure to keep the highly and widely expressed genes short and compact for transcriptional efficiency. Further, positively selected genes were found to be non-randomly distributed on the genome, with a preference for RIDGEs and regions of intermediate gene expression compared to anti-RIDGEs.
Affiliation(s)
- Arun Kommadath
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Lelystad, The Netherlands.
54
Dong B, Zhang P, Chen X, Liu L, Wang Y, He S, Chen R. Predicting housekeeping genes based on Fourier analysis. PLoS One 2011; 6:e21012. [PMID: 21687628 PMCID: PMC3110801 DOI: 10.1371/journal.pone.0021012] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 10/22/2010] [Accepted: 05/18/2011] [Indexed: 11/19/2022] Open
Abstract
Housekeeping genes (HKGs) generally have fundamental functions in basic biochemical processes in organisms, and usually have relatively steady expression levels across various tissues. They play an important role in the normalization of microarray technology. Using Fourier analysis, we transformed gene expression time-series from a HeLa cell cycle gene expression dataset into Fourier spectra and designed an effective computational method for discriminating between HKGs and non-HKGs using the support vector machine (SVM) supervised learning algorithm, which can extract significant features of the spectra, providing a basis for identifying specific gene expression patterns. Using our method, we identified 510 human HKGs and then validated them by comparison with two independent sets of tissue expression profiles. Results showed that our predicted HKG set is more reliable than three previously identified sets of HKGs.
Affiliation(s)
- Bo Dong
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Peng Zhang
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Xiaowei Chen
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Li Liu
- Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Yunfei Wang
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Shunmin He
- Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, People's Republic of China
- Runsheng Chen
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, People's Republic of China
55
Jjingo D, Huda A, Gundapuneni M, Mariño-Ramírez L, Jordan IK. Effect of the transposable element environment of human genes on gene length and expression. Genome Biol Evol 2011; 3:259-71. [PMID: 21362639 PMCID: PMC3070429 DOI: 10.1093/gbe/evr015] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022] Open
Abstract
Independent lines of investigation have documented effects of both transposable elements (TEs) and gene length (GL) on gene expression. However, TE gene fractions are highly correlated with GL, suggesting that they cannot be considered independently. We evaluated the TE environment of human genes and GL jointly in an attempt to tease apart their relative effects. TE gene fractions and GL were compared with the overall level of gene expression and the breadth of expression across tissues. GL is strongly correlated with overall expression level but weakly correlated with the breadth of expression, confirming the selection hypothesis that attributes the compactness of highly expressed genes to selection for economy of transcription. However, TE gene fractions overall, and for the L1 family in particular, show stronger anticorrelations with expression level than GL, indicating that GL may not be the most important target of selection for transcriptional economy. These results suggest a specific mechanism, removal of TEs, by which highly expressed genes are selectively tuned for efficiency. MIR elements are the only family of TEs with gene fractions that show a positive correlation with tissue-specific expression, suggesting that they may provide regulatory sequences that help to control human gene expression. Consistent with this notion, MIR fractions are relatively enriched close to transcription start sites and associated with coexpression in specific sets of related tissues. Our results confirm the overall relevance of the TE environment to gene expression and point to distinct mechanisms by which different TE families may contribute to gene regulation.
Affiliation(s)
- Daudi Jjingo
- School of Biology, Georgia Institute of Technology, GA, USA
56
Han F, Zhu B. Evolutionary analysis of three gibberellin oxidase genes in rice, Arabidopsis, and soybean. Gene 2011; 473:23-35. [PMID: 21056641 DOI: 10.1016/j.gene.2010.10.010] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 08/20/2010] [Revised: 10/19/2010] [Accepted: 10/25/2010] [Indexed: 02/06/2023]
Abstract
GAs are plant hormones that play fundamental roles in plant growth and development. GA2ox, GA3ox, and GA20ox are three key enzymes in GA biosynthesis. These enzymes belong to the 2OG-Fe(II) oxygenase superfamily and are independently encoded by different gene families. To date, genome-wide comparative analyses of GA oxidases in plant species have not been thoroughly carried out. In the present work, 61 GA oxidase family genes from rice (Oryza sativa), Arabidopsis, and soybean (Glycine max) were identified, and a full study of these genes, including phylogenetic tree construction, gene structure, gene family expansion, and analysis of functional motifs, was performed. Based on phylogeny, most of the GA oxidases were divided into four subgroups that reflected functional classifications. Analysis of the average intron length of GA oxidase genes in rice revealed that these genes experienced substantial evolutionary divergence. Segmental duplication events were mainly found in the soybean genome. However, in rice and Arabidopsis, no single expansion pattern exhibited dominance, indicating that GA oxidase genes from these species might have been subjected to a more complex evolutionary mechanism. In addition, special functional motifs were discovered in GA20ox, GA3ox, and GA2ox, which suggested that different functional motifs are associated with differences in protein function. Taken together, our results suggest that GA oxidase family genes have undergone divergent evolutionary routes, especially at the monocot-dicot split, with dynamic evolution occurring in Arabidopsis thaliana and soybean.
Affiliation(s)
- Fengming Han
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
57
Woody JL, Severin AJ, Bolon YT, Joseph B, Diers BW, Farmer AD, Weeks N, Muehlbauer GJ, Nelson RT, Grant D, Specht JE, Graham MA, Cannon SB, May GD, Vance CP, Shoemaker RC. Gene expression patterns are correlated with genomic and genic structure in soybean. Genome 2011; 54:10-8. [PMID: 21217801 DOI: 10.1139/g10-090] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
Studies have indicated that exon and intron size and intergenic distance are correlated with gene expression levels and expression breadth. Previous reports on these correlations in plants and animals have been conflicting. In this study, next-generation sequence data, which has been shown to be more sensitive than previous expression profiling technologies, were generated and analyzed from 14 tissues. Our results revealed a novel dichotomy. At the low expression level, an increase in expression breadth correlated with an increase in transcript size because of an increase in the number of exons and introns. No significant changes in intron or exon sizes were noted. Conversely, genes expressed at the intermediate to high expression levels displayed a decrease in transcript size as their expression breadth increased. This was due to smaller exons, with no significant change in the number of exons. Taking advantage of the known gene space of soybean, we evaluated the positioning of genes and found significant clustering of similarly expressed genes. Identifying the correlations between the physical parameters of individual genes could lead to uncovering the role of regulation owing to nucleotide composition, which might have potential impacts in discerning the role of the noncoding regions.
Affiliation(s)
- Jenna L Woody
- Department of Agronomy, Iowa State University, Ames, 50011, USA.
58
Zeng J, Yi SV. DNA methylation and genome evolution in honeybee: gene length, expression, functional enrichment covary with the evolutionary signature of DNA methylation. Genome Biol Evol 2010; 2:770-80. [PMID: 20924039 PMCID: PMC2975444 DOI: 10.1093/gbe/evq060] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 01/06/2023] Open
Abstract
A growing body of evidence suggests that DNA methylation is functionally divergent among different taxa. The recently discovered functional methylation system in the honeybee Apis mellifera presents an attractive invertebrate model system to study evolution and function of DNA methylation. In the honeybee, DNA methylation is mostly targeted toward transcription units (gene bodies) of a subset of genes. Here, we report an intriguing covariation of length and epigenetic status of honeybee genes. Hypermethylated and hypomethylated genes in honeybee are dramatically different in their lengths for both exons and introns. By analyzing orthologs in Drosophila melanogaster, Acyrthosiphon pisum, and Ciona intestinalis, we show that genes that were short and long in the past are now preferentially situated in hyper- and hypomethylated classes, respectively, in the honeybee. Moreover, we demonstrate that a subset of high-CpG genes are conspicuously longer than expected under the evolutionary relationship alone and that they are enriched in specific functional categories. We suggest that gene length evolution in the honeybee is partially driven by evolutionary forces related to regulation of gene expression, which in turn is associated with DNA methylation. However, lineage-specific patterns of gene length evolution suggest that there may exist additional forces underlying the observed interaction between DNA methylation and gene lengths in the honeybee.
Affiliation(s)
- Jia Zeng
- School of Biology, Georgia Institute of Technology, USA
59
Park SG, Choi SS. Expression breadth and expression abundance behave differently in correlations with evolutionary rates. BMC Evol Biol 2010; 10:241. [PMID: 20691101 PMCID: PMC2924872 DOI: 10.1186/1471-2148-10-241] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 04/19/2010] [Accepted: 08/07/2010] [Indexed: 01/12/2023] Open
Abstract
Background: One of the main objectives of the molecular evolution and evolutionary systems biology field is to reveal the underlying principles that dictate protein evolutionary rates. Several studies argue that expression abundance is the most critical component in determining the rate of evolution, especially in unicellular organisms. However, the expression breadth also needs to be considered for multicellular organisms. Results: In the present paper, we analyzed the relationship between the two expression variables and rates using two different genome-scale expression datasets, microarrays and ESTs. A significant positive correlation between the expression abundance (EA) and expression breadth (EB) was revealed by Kendall's rank correlation tests. A novel random shuffling approach was applied for EA and EB to compare the correlation coefficients obtained from real data sets to those estimated based on random chance. A novel method called a Fixed Group Analysis (FGA) was designed and applied to investigate the correlations between expression variables and rates when one of the two expression variables was evenly fixed. Conclusions: All of these analyses and tests consistently showed that the breadth rather than the abundance of gene expression is tightly linked with the evolutionary rate in multicellular organisms.
Affiliation(s)
- Seung Gu Park
- Department of Medical Biotechnology, College of Biomedical Science, and Institute of Bioscience & Biotechnology, Kangwon National University, Chunchon 200-701, Korea
60
Nie H, Crooijmans RPMA, Lammers A, van Schothorst EM, Keijer J, Neerincx PBT, Leunissen JAM, Megens HJ, Groenen MAM. Gene expression in chicken reveals correlation with structural genomic features and conserved patterns of transcription in the terrestrial vertebrates. PLoS One 2010; 5:e11990. [PMID: 20700537 PMCID: PMC2916831 DOI: 10.1371/journal.pone.0011990] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 03/15/2010] [Accepted: 07/13/2010] [Indexed: 11/26/2022] Open
Abstract
Background: The chicken is an important agricultural and avian-model species. A survey of gene expression in a range of different tissues will provide a benchmark for understanding expression levels under normal physiological conditions in birds. With expression data for birds being very scant, this benchmark is of particular interest for comparative expression analysis among various terrestrial vertebrates. Methodology/Principal Findings: We carried out a gene expression survey in eight major chicken tissues using whole genome microarrays. A global picture of gene expression is presented for the eight tissues, and tissue-specific as well as common gene expression were identified. A Gene Ontology (GO) term enrichment analysis showed that tissue-specific genes are enriched with GO terms reflecting the physiological functions of the specific tissue, and housekeeping genes are enriched with GO terms related to essential biological functions. Comparisons of structural genomic features between tissue-specific genes and housekeeping genes show that housekeeping genes are more compact. Specifically, their coding sequences and particularly their introns are shorter than those of genes that display more variation in expression between tissues, and their intergenic space is also shorter. Meanwhile, housekeeping genes are more likely to co-localize with other abundantly or highly expressed genes on the same chromosomal regions. Furthermore, comparisons of gene expression in a panel of five common tissues between birds, mammals and amphibians showed that the expression patterns across tissues are highly similar for orthologous genes compared to random gene pairs within each pair-wise comparison, indicating a high degree of functional conservation in gene expression among terrestrial vertebrates. Conclusions: The housekeeping genes identified in this study have shorter gene length, shorter coding sequence length, shorter introns, and shorter intergenic regions. There seems to be selection pressure on economy in genes with a wide tissue distribution, i.e. these genes are more compact. A comparative analysis showed that the expression patterns of orthologous genes are conserved in the terrestrial vertebrates during evolution.
Affiliation(s)
- Haisheng Nie
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, The Netherlands
61
Shen-Orr SS, Pilpel Y, Hunter CP. Composition and regulation of maternal and zygotic transcriptomes reflects species-specific reproductive mode. Genome Biol 2010; 11:R58. [PMID: 20515465 PMCID: PMC2911106 DOI: 10.1186/gb-2010-11-6-r58] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/24/2009] [Revised: 04/23/2010] [Accepted: 06/01/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Early embryos contain mRNA transcripts expressed from two distinct origins: those expressed from the mother's genome and deposited in the oocyte (maternal) and those expressed from the embryo's genome after fertilization (zygotic). The transition from maternal to zygotic control occurs at different times in different animals according to the extent and form of maternal contributions, which likely reflect evolutionary and ecological forces. Maternally deposited transcripts rely on post-transcriptional regulatory mechanisms for precise spatial and temporal expression in the embryo, whereas zygotic transcripts can use both transcriptional and post-transcriptional regulatory mechanisms. The differences in maternal contributions between animals may be associated with gene regulatory changes detectable by the size and complexity of the associated regulatory regions. RESULTS: We have used genomic data to identify and compare maternal and/or zygotic expressed genes from six different animals and find evidence for selection acting to shape gene regulatory architecture in thousands of genes. We find that mammalian maternal genes are enriched for complex regulatory regions, suggesting an increase in expression specificity, while egg-laying animals are enriched for maternal genes that lack transcriptional specificity. CONCLUSIONS: We propose that this lack of specificity for maternal expression in egg-laying animals indicates that a large fraction of maternal genes are expressed non-functionally, providing only supplemental nutritional content to the developing embryo. These results provide clear predictive criteria for analysis of additional genomes.
Affiliation(s)
- Shai S Shen-Orr
- Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Ave, Cambridge, MA 02138, USA
62
Rao YS, Wang ZF, Chai XW, Wu GZ, Zhou M, Nie QH, Zhang XQ. Selection for the compactness of highly expressed genes in Gallus gallus. Biol Direct 2010; 5:35. [PMID: 20465857 PMCID: PMC2883972 DOI: 10.1186/1745-6150-5-35] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/26/2009] [Accepted: 05/14/2010] [Indexed: 11/10/2022] Open
Abstract
Background: Coding sequence (CDS) length, gene size, and intron length vary within a genome and among genomes. Previous studies in diverse organisms, including human, D. melanogaster, C. elegans, S. cerevisiae, and Arabidopsis thaliana, indicated that there are negative relationships between expression level and gene size, CDS length, and intron length. Different models, such as the selection for economy model, the genomic design model, and mutational bias hypotheses, have been proposed to explain this observation. The debate over which model best explains the observation has not been settled. The chicken (Gallus gallus) is an important model organism that bridges the evolutionary gap between mammals and other vertebrates. Like D. melanogaster, the chicken has a relatively large effective population size, so selection on the chicken genome is expected to be more effective in increasing protein synthesis efficiency. Therefore, in this study the chicken was used as a model organism to elucidate the interaction between gene features and expression pattern under selection pressure. Results: Based on different technologies, we gathered expression data for nuclear protein-coding, single-splicing genes from the Gallus gallus genome and compared them with gene parameters. We found that gene size, CDS length, first intron length, average intron length, and total intron length are significantly negatively correlated with expression level and expression breadth. Tissue specificity is positively correlated with the first intron length but negatively correlated with the average intron length, and not correlated with the CDS length or protein domain numbers. Comparison analyses showed that ubiquitously expressed genes and narrowly expressed genes with similar expression levels do not differ in compactness. Our data provided evidence that the genomic design model cannot, at least in part, explain our observations. We grouped all somatic-tissue-specific genes (n = 1105), and compared the first intron length and the average intron length between highly expressed genes (top 5% expressed genes) and weakly expressed genes (bottom 5% expressed genes). We found that the first intron length and the average intron length in highly expressed genes are not different from those in weakly expressed genes. We also made a comparison between ubiquitously expressed genes and narrowly expressed somatic genes with similar expression levels. Our data demonstrated that ubiquitously expressed genes are less compact than narrowly expressed genes with similar expression levels. These observations cannot be explained by mutational bias hypotheses either. We also found that the significant trend between genes' compactness and expression level could not be affected by local mutational biases. We argue that the selection for economy model is the most likely explanation for the relationship between gene expression and gene characteristics in the chicken genome. Conclusion: Natural selection appears to favor the compactness of highly expressed genes in the chicken genome. This observation can be explained by the selection for economy model. Reviewers: This article was reviewed by Dr. Gavin Huttley, Dr. Liran Carmel (nominated by Dr. Eugene V. Koonin) and Dr. Araxi Urrutia (nominated by Dr. Laurence D. Hurst).
Affiliation(s)
- You S Rao
- Department of Biological Technology, Jiangxi Educational Institute, Nanchang, Jiangxi, China
63
Temperature and length-dependent modulation of the MH class II beta gene expression in brook charr (Salvelinus fontinalis) by a cis-acting minisatellite. Mol Immunol 2010; 47:1817-29. [PMID: 20381151 DOI: 10.1016/j.molimm.2009.12.012] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 10/29/2009] [Accepted: 12/23/2009] [Indexed: 01/19/2023]
Abstract
It is widely recognized that variation in gene regulation is an important source of evolutionary change in diverse aspects of phenotype in all organisms. Distinctive elements with functional roles in gene regulation have been identified within the non-coding part of the genome, including repeated elements. Major histocompatibility complex (MHC) genes have been the subject of an abundant literature, which has made them unique candidates for studies of adaptation in natural populations. Yet the vast majority of studies on MHC genes have dealt with patterns of sequence polymorphism, while very few have paid attention to the possible implication of differential expression in adaptive responses. In this paper, we report the identification of a polymorphic minisatellite formed of a 32-nucleotide motif (38% G+C) involved in regulation of the major histocompatibility class II beta gene (MHII beta) of brook charr (Salvelinus fontinalis). Our main objectives were to analyze the variability of this minisatellite, found in the second intron of the MHII beta gene, and to document its effect on the variation of expression level of this gene under different environmental conditions. A distinct number of minisatellite repeats was associated with each MHII beta allele identified from exon 2 sequences. Relative expression levels of specific alleles in heterozygous individuals were determined from fish lymphocytes in different genotypes. We found that alleles carrying the longest minisatellite showed a significant 1.67- to 2.56-fold reduction in transcript expression relative to the shortest one. Results obtained in three different genotypes also indicated that the repressive activity associated with the longest minisatellite was more effective at 18 °C than at 6 °C. In contrast, no significant difference was observed in transcript levels between alleles with comparable minisatellite length at either temperature. We also detected a significant up-regulation of the total MHII beta transcript at 6 °C relative to 18 °C. These results reveal for the first time that a temperature-sensitive minisatellite could potentially play an important role in the gene regulation of the adaptive immune response in fishes.
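As a reading aid, the fold reductions quoted in this abstract can be converted into the transcript level of the long-minisatellite allele relative to the short one. The following is only a minimal sketch of that arithmetic; the function name `relative_level` is illustrative and does not come from the paper.

```python
def relative_level(fold_reduction: float) -> float:
    """Expression of the long allele as a fraction of the short allele.

    Hypothetical helper: an n-fold reduction means the long allele is
    expressed at 1/n of the level of the short allele.
    """
    if fold_reduction <= 0:
        raise ValueError("fold reduction must be positive")
    return 1.0 / fold_reduction


# The two bounds reported in the abstract.
for fold in (1.67, 2.56):
    print(f"{fold}-fold reduction -> {relative_level(fold):.0%} of the short allele")
```

Under this reading, the reported range corresponds to the long allele being expressed at roughly 60% down to 39% of the short allele's level.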
|
64
|
Vinogradov AE. Human transcriptome nexuses: basic-eukaryotic and metazoan. Genomics 2010; 95:345-54. [PMID: 20298777 DOI: 10.1016/j.ygeno.2010.03.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 09/15/2009] [Revised: 03/01/2010] [Accepted: 03/08/2010] [Indexed: 01/10/2023]
Abstract
Using a new approach, I analysed the human transcriptome coexpression network and revealed two large-scale nexuses. Besides gene coexpression, each nexus is characterized by a combination of gene evolutionary origin, function and among-tissues expression breadth. The first nexus contains mostly genes of pre-metazoan origin, which are widely expressed and have cell-centred functions. The second nexus is enriched in genes of metazoan origin, which are expressed more narrowly and have organism-centred functions. The revealed nexuses are supported by asymmetry in the distribution of transcription factor targets between them. Within the metazoan nexus, there is a subnexus that is more pronounced in the nervous tissues and is enriched in gene regulatory complexity. It mostly contains genes related to the nervous system, cell communication, and multicellular organism processes and development. The revealed nexuses indicate a dichotomy in transcriptional regulation and can provide a framework for further functional genomics studies.
|
65
|
Cenik C, Derti A, Mellor JC, Berriz GF, Roth FP. Genome-wide functional analysis of human 5' untranslated region introns. Genome Biol 2010; 11:R29. [PMID: 20222956 PMCID: PMC2864569 DOI: 10.1186/gb-2010-11-3-r29] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 01/22/2010] [Accepted: 03/11/2010] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Approximately 35% of human genes contain introns within the 5' untranslated region (UTR). Introns in 5'UTRs differ from those in coding regions and 3'UTRs with respect to nucleotide composition, length distribution and density. Despite their presumed impact on gene regulation, the evolution and possible functions of 5'UTR introns remain largely unexplored. RESULTS: We performed a genome-scale computational analysis of 5'UTR introns in humans. We discovered that the most highly expressed genes tended to have short 5'UTR introns rather than having long 5'UTR introns or lacking 5'UTR introns entirely. Although we found no correlation of 5'UTR intron presence or length with variance in expression across tissues, which might have indicated a broad role in expression regulation, we observed an uneven distribution of 5'UTR introns amongst genes in specific functional categories. In particular, genes with regulatory roles were surprisingly enriched in 5'UTR introns. Finally, we analyzed the evolution of 5'UTR introns in non-receptor protein tyrosine kinases (NRTK), and identified a conserved DNA motif enriched within the 5'UTR introns of human NRTKs. CONCLUSIONS: Our results suggest that human 5'UTR introns enhance the expression of some genes in a length-dependent manner. While many 5'UTR introns are likely to be evolving neutrally, their relationship with gene expression and overrepresentation among regulatory genes, taken together, suggest that complex evolutionary forces are acting on this distinct class of introns.
Affiliation(s)
- Can Cenik
- Harvard Medical School, Department of Biological Chemistry and Molecular Pharmacology, 250 Longwood Avenue, SGMB-322, Boston, MA 02115, USA.
|
66
|
Mukhopadhyay P, Ghosh TC. Relationship between gene compactness and base composition in rice and human genome. J Biomol Struct Dyn 2010; 27:477-88. [PMID: 19916569 DOI: 10.1080/07391102.2010.10507332] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/15/2023]
Abstract
In human, highly expressed genes contain shorter and fewer introns, which has been attributed to selection for economy in transcription and translation. In plants, on the other hand, it has been shown that highly expressed genes tend to be longer than lowly expressed genes. In this study, we analyzed the influence of base composition on genome organization in both rice and human. We demonstrated that, among GC-rich rice genes, highly expressed genes are less compact than lowly expressed genes, whereas in the GC-poor class there is no difference in gene compactness between highly and lowly expressed genes. The scenario is different for human, where GC composition has no influence on the relationship between gene compactness and expression level. We also report that highly expressed GC-rich rice pre-mRNAs tend to form less stable secondary structures than those of lowly expressed genes. However, on removing intronic sequences, highly expressed mRNAs form more stable secondary structures than those of lowly expressed GC-rich genes. We suggest that in GC-rich rice genes long introns are under selection for enhancing transcriptional efficiency by modulating pre-mRNA secondary structural stability. Thus, the evolutionary mechanisms behind genome organization differ between these two genomes (human and rice).
Affiliation(s)
- Pamela Mukhopadhyay
- Bioinformatics Centre, Bose Institute P 1/12, C.I.T. Scheme VII M - Kolkata 700054- India.
|
67
|
Wang PPS, Ruvinsky I. Computational prediction of Caenorhabditis box H/ACA snoRNAs using genomic properties of their host genes. RNA (New York, N.Y.) 2010; 16:290-298. [PMID: 20038629 PMCID: PMC2811658 DOI: 10.1261/rna.1876210] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/11/2009] [Accepted: 10/27/2009] [Indexed: 05/28/2023]
Abstract
Identification of small nucleolar RNAs (snoRNAs) in genomic sequences has been challenging due to the relative paucity of sequence features. Many current prediction algorithms rely on detection of snoRNA motifs complementary to target sites in snRNAs and rRNAs. However, recent discovery of snoRNAs without apparent targets requires development of alternative prediction methods. We present an approach that combines rule-based filters and a Bayesian Classifier to identify a class of snoRNAs (H/ACA) without requiring target sequence information. It takes advantage of unique attributes of their genomic organization and improved species-specific motif characterization to predict snoRNAs that may otherwise be difficult to discover. Searches in the genomes of Caenorhabditis elegans and the closely related Caenorhabditis briggsae suggest that our method performs well compared to recent benchmark algorithms. Our results illustrate the benefits of training gene discovery engines on features restricted to particular phylogenetic groups and the utility of incorporating diverse data types in gene prediction.
Affiliation(s)
- Paul Po-Shen Wang
- Department of Ecology and Evolution , University of Chicago, Chicago, Illinois 60637, USA
|
68
|
Abstract
Proteins encoded by highly expressed genes evolve more slowly. This correlation is thought to arise owing to purifying selection against toxicity of misfolded proteins (that should be more crucial for highly expressed genes). It is now widely accepted that this individual (by-gene) effect is a dominant cause in protein evolution. Here, I show that in mammals, the evolutionary rate of a protein is much more strongly related to the evolutionary rate of coexpressed proteins (and proteins of the same biological pathway) than to the expression level of its encoding gene. The complexity of gene regulation (estimated by the numbers of transcription factor targets and regulatory microRNA targets in the encoding gene) is another important cause, which is much stronger than gene expression level. Proteins encoded by complexly regulated genes evolve more slowly. The intronic length and the ratio of intronic to coding sequence lengths also correlate negatively with protein evolutionary rate (which contradicts the expectation from the negative link between expression level and evolutionary rate). One more important factor, which is much stronger than gene expression level, is evolutionary age. More recent proteins evolve faster, and expression level of an encoding gene becomes quite a minor cause in the evolution of mammal proteins of metazoan origin. These data suggest that, in contrast to a widespread opinion, systemic factors dominate mammal protein evolution.
|
69
|
Hu Z. Insight into microRNA regulation by analyzing the characteristics of their targets in humans. BMC Genomics 2009; 10:594. [PMID: 20003303 PMCID: PMC2799441 DOI: 10.1186/1471-2164-10-594] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 09/14/2009] [Accepted: 12/10/2009] [Indexed: 01/24/2023] Open
Abstract
Background: microRNAs (miRNAs) are believed to regulate their targets through posttranscriptional gene regulation and have the potential to silence gene expression via multiple mechanisms. Despite previous advances on miRNA regulation of gene expression, little has been investigated at the genome scale. Results: To gain new insight into miRNA regulation in humans, we used large-scale data and carried out a series of studies to compare various features of miRNA target genes with those of non-miRNA target genes. We observed significant differences between miRNA and non-miRNA target genes for a number of characteristics, including higher and broader mRNA expression, faster mRNA decay rate, longer protein half-life, and longer gene structures. Based on these features and by analyzing their relationships, we found that miRNA target genes, in addition to being subject to miRNA repression, were most likely under more complex regulation than non-miRNA target genes, as evidenced by their higher and broader gene expression but longer gene structures. Our results of higher and broader gene expression but fast mRNA decay rates also provide evidence that miRNA dampening of the output of preexisting transcripts facilitates a more rapid and robust transition to new expression programs. This could be achieved by enhancing mRNA degradation through an additive effect from multiple miRNA targeting. Conclusion: Genome-scale analysis of the nature of miRNA target genes has revealed a general mechanism for miRNA regulation of human gene expression. The results of this study also indicate that miRNA target genes, in addition to being subject to miRNA repression, are under more complex gene regulation than non-miRNA target genes. These findings provide novel insight into miRNA regulation of human gene expression.
Affiliation(s)
- Zihua Hu
- Center for Computational Research, New York State Center of Excellence in Bioinformatics & Life Sciences, Department of Biostatistics, Department of Medicine, State University of New York (SUNY), Buffalo, NY 14260, USA.
|
70
|
Guerra Cardoso H, Doroteia Campos M, Rita Costa A, Catarina Campos M, Nothnagel T, Arnholdt-Schmitt B. Carrot alternative oxidase gene AOX2a demonstrates allelic and genotypic polymorphisms in intron 3. Physiologia Plantarum 2009; 137:592-608. [PMID: 19941625 DOI: 10.1111/j.1399-3054.2009.01299.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 05/21/2023]
Abstract
Single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels) are becoming important genetic markers for major crop species. In this study, we focus on variations at the genomic level of the Daucus carota L. AOX2a gene. The use of gene-specific primers designed in exon regions at the boundaries of introns permitted the recognition of intron length polymorphism (ILP) in intron 3 of AOX2a by simple polymerase chain reaction (PCR) assays. The length of intron 3 can vary in individual carrot plants. Thus, allelic variation can be used as a tool to discriminate between single plant genotypes. Using this approach, individual plants from cv. Rotin and from diverse breeding lines and cultivars were identified that showed genetic variability by AOX2a ILPs. Repetitive patterns of intron length variation were observed, which allows grouping of genotypes. Polymorphic and identical PCR fragments revealed underlying high levels of sequence polymorphism. Variability was due to InDel events and intron single nucleotide polymorphisms (ISNPs), with a repetitive deletion in intron 3 affecting a putative pre-miRNA site. The results suggest that high AOX2a gene diversity in D. carota can be explored for the development of functional markers related to agronomic traits.
Affiliation(s)
- Hélia Guerra Cardoso
- EU Marie Curie Chair, ICAAM, University of Evora, Apartado 94, 7002-554 Evora, Portugal
|
71
|
Costa JH, de Melo DF, Gouveia Z, Cardoso HG, Peixe A, Arnholdt-Schmitt B. The alternative oxidase family of Vitis vinifera reveals an attractive model to study the importance of genomic design. Physiologia Plantarum 2009; 137:553-65. [PMID: 19682279 DOI: 10.1111/j.1399-3054.2009.01267.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/05/2023]
Abstract
'Genomic design' refers to the structural organization of gene sequences. The role of intron sequences in gene regulation has recently become better understood. Further, introns possess high rates of polymorphism that are considered a major source for speciation. In molecular breeding, the length of gene-specific introns is recognized as a tool to discriminate genotypes with diverse traits of agronomic interest. 'Economy selection' and 'time-economy selection' have been proposed as models for explaining why highly expressed genes typically contain small introns. However, in contrast to these theories, plant-specific selection reveals that highly expressed genes contain introns that are large. In the presented research, 'wet' Aox gene identification from grapevine is advanced by a bioinformatics approach to study the species-specific organization of Aox gene structures in relation to available expressed sequence tag (EST) data. Two Aox1 and one Aox2 gene sequences have been identified in Vitis vinifera using grapevine cultivars from Portugal and Germany. Searching the complete genome sequence data of two grapevine cultivars confirmed that V. vinifera alternative oxidase (Aox) is encoded by a small multigene family composed of Aox1a, Aox1b and Aox2. An analysis of EST distribution revealed high expression of the VvAox2 gene. A relationship between the atypically long primary transcript of VvAox2 (in comparison to other plant Aox genes) and its expression level is suggested. V. vinifera Aox genes contain four exons interrupted by three introns, except for Aox1a, which contains an additional intron in the 3'-UTR. The lengths of primary Aox transcripts were estimated for each gene in two V. vinifera varieties: PN40024 and Pinot Noir. In both varieties, Aox1a and Aox1b contained small introns that corresponded to primary transcript lengths ranging from 1501 to 1810 bp. The Aox2 of PN40024 (12 329 bp) was longer than that from Pinot Noir (7279 bp) because of selection against a transposable-element insertion that is 5028 bp in size. An EST database basic local alignment search tool (BLAST) search of GenBank revealed the following EST percentages for each gene: Aox1a (26.2%), Aox1b (11.9%) and Aox2 (61.9%). Aox1a was expressed in fruits and roots, Aox1b expression was confined to flowers and Aox2 was ubiquitously expressed. These data for V. vinifera show that atypically long Aox intron lengths are related to high levels of gene expression. Furthermore, it is shown for the first time that two grapevine cultivars can be distinguished by Aox intron length polymorphism.
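The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given above; the variable names are illustrative, and the abstract itself does not say what accounts for any residual length difference.

```python
# Primary transcript lengths and insertion size quoted in the abstract (bp).
aox2_pn40024 = 12_329
aox2_pinot_noir = 7_279
te_insertion = 5_028

# Difference between the two varieties and the part not covered by the insertion.
length_difference = aox2_pn40024 - aox2_pinot_noir
residual = length_difference - te_insertion

# EST shares quoted in the abstract (percent of Aox ESTs).
est_share = {"Aox1a": 26.2, "Aox1b": 11.9, "Aox2": 61.9}
total_share = round(sum(est_share.values()), 1)

print(length_difference)  # 5050
print(residual)           # 22
print(total_share)        # 100.0
```

The transposable-element insertion therefore accounts for all but 22 bp of the length difference between the two Aox2 transcripts, and the three EST percentages sum to 100%.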
Affiliation(s)
- José Hélio Costa
- Department of Biochemistry and Molecular Biology, Federal University of Ceará, PO Box 6029, 60455-900, Fortaleza, Ceará, Brazil
|
72
|
Yang H. In plants, expression breadth and expression level distinctly and non-linearly correlate with gene structure. Biol Direct 2009; 4:45; discussion 45. [PMID: 19930585 PMCID: PMC2794262 DOI: 10.1186/1745-6150-4-45] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 10/30/2009] [Accepted: 11/21/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Compactness of highly/broadly expressed genes in human has been explained as selection for efficiency, regional mutation biases or genomic design. However, highly expressed genes in flowering plants were shown to be less compact than lowly expressed ones. On the other hand, opposite findings have also been documented: pollen-expressed Arabidopsis genes tend to contain shorter introns, and highly expressed moss genes are compact. This issue is important because it provides a chance to compare the selectionist and neutralist views of genome evolution. Furthermore, it also helps to understand the fates of introns from the angle of gene expression. RESULTS: In this study, I used expression data covering more tissues and employed new analytical methods to reexamine the correlations between gene expression and gene structure for two flowering plants, Arabidopsis thaliana and Oryza sativa. It is shown that different aspects of expression pattern correlate with different parts of gene sequences in distinct ways. In detail, expression level is significantly negatively correlated with gene size, especially the size of non-coding regions, whereas expression breadth correlates with non-coding structural parameters positively and with coding region parameters negatively. Furthermore, the relationships between expression level and structural parameters seem to be non-linear, with the extremes of structural parameters possibly scaling as power-law or logarithmic functions of expression level. CONCLUSION: In plants, highly expressed genes are compact, especially in the non-coding regions. Broadly expressed genes tend to contain longer non-coding sequences, which may be necessary for complex regulation. In combination with previous studies about other plants and about animals, some common scenarios about the correlation between gene expression and gene structure begin to emerge. Based on the functional relationships between extreme values of structural characteristics and expression level, an effort was made to evaluate the relative effectiveness of the energy-cost hypothesis and the time-cost hypothesis.
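To illustrate what distinguishing a power-law from a logarithmic dependence can look like in practice, here is a minimal sketch. It is not the author's method: the data are synthetic, the helper `linear_fit` is hypothetical, and comparing R² across differently transformed responses is only a rough heuristic. A power law L = c·E^k is linear in log-log space, while a logarithmic law is linear in L against log E.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic data, NOT from the paper: a maximum structural length per
# expression level, generated from an exact power law L = 5000 * E**-0.3.
expression = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
max_length = [5000 * e ** -0.3 for e in expression]

log_e = [math.log(e) for e in expression]
log_l = [math.log(length) for length in max_length]

# Power-law model: log L is linear in log E, and the slope is the exponent.
_, exponent, r2_power = linear_fit(log_e, log_l)
# Logarithmic model: L itself is linear in log E.
_, _, r2_log = linear_fit(log_e, max_length)

print(f"power-law exponent = {exponent:.2f}, R^2 = {r2_power:.3f}")
print(f"logarithmic fit R^2 = {r2_log:.3f}")
```

On these synthetic values the log-log fit recovers the generating exponent and fits better than the logarithmic model; with real, noisy extremes the two forms can be much harder to tell apart, which is consistent with the hedged wording of the abstract.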
Affiliation(s)
- Hangxing Yang
- T-Life Research Center, Department of Physics, Fudan University, Shanghai, PR China.
|
73
|
Morgan AA, Dudley JT, Deshpande T, Butte AJ. Dynamism in gene expression across multiple studies. Physiol Genomics 2009; 40:128-40. [PMID: 19920211 DOI: 10.1152/physiolgenomics.90403.2008] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022] Open
Abstract
In this study we develop methods of examining gene expression dynamics, that is, how and when genes change expression, and demonstrate their application in a meta-analysis involving over 29,000 microarrays. By defining measures across many experimental conditions, we have a new way of characterizing dynamics, complementary to measures looking at changes in absolute variation or breadth of tissues showing expression. We show conservation in overall patterns of dynamism across three species (human, mouse, and rat) and show associations with known disease-related genes. We discuss the enriched functional properties of the sets of genes showing different patterns of dynamics and show that the differences in expression dynamics are associated with the variety of different transcription factor regulatory sites. These results can influence thinking about the selection of genes for microarray design and the analysis of measurements of mRNA expression variation in a global context of expression dynamics across many conditions, as genes that are rarely differentially expressed between experimental conditions may be the subject of increased scrutiny when they significantly vary in expression between experimental subsets.
Affiliation(s)
- Alexander A Morgan
- Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA
|
74
|
Evolutionary genetic insights into Plasmodium falciparum functional genes. Parasitol Res 2009; 106:349-55. [PMID: 19902252 DOI: 10.1007/s00436-009-1668-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/16/2009] [Accepted: 10/20/2009] [Indexed: 10/20/2022]
Abstract
The complex and rapidly evolving behavior of the human malaria parasite Plasmodium falciparum has long puzzled evolutionary biologists, as the parasite is the most virulent malaria species and is now becoming the most prevalent across the globe. With the availability of the complete genome sequence of P. falciparum, a better understanding of genome design and evolution has become possible. Here we used the available information on all known functional genes from the whole genome of P. falciparum to investigate differential modes of gene evolution. Comparison of P. falciparum functional genes with Plasmodium vivax revealed that about 82% of genes are conserved in the latter species, while the remaining 18% are unique to P. falciparum. The genetic architecture of functional genes shows an absence of introns in about half of the conserved genes, whereas almost all unique genes have introns. Similarly, the distributions of intron number and length were observed to differ between conserved and unique genes of P. falciparum. Statistically significant positive correlations between total intron length and gene length were detected in 11 chromosomes for unique genes, but in only three chromosomes for conserved genes. A preference for intron presence in some P. falciparum genes was also detected, which suggests functional relevance of introns. The study provides, for the first time, a detailed evolutionary analysis of the functional genes of a devastating malaria parasite. The marked differences in intron organization between the unique and conserved genes of P. falciparum, and the contribution of introns to genome complexity, are among the hallmarks of the study.
|
75
|
Kandul NP, Noor MAF. Large introns in relation to alternative splicing and gene evolution: a case study of Drosophila bruno-3. BMC Genet 2009; 10:67. [PMID: 19840385 PMCID: PMC2767349 DOI: 10.1186/1471-2156-10-67] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 07/20/2009] [Accepted: 10/19/2009] [Indexed: 01/12/2023] Open
Abstract
Background: Alternative splicing (AS) of maturing mRNA can generate structurally and functionally distinct transcripts from the same gene. Recent bioinformatic analyses of available genome databases inferred a positive correlation between intron length and AS. To study the interplay between intron length and AS empirically and in more detail, we analyzed the diversity of alternatively spliced transcripts (ASTs) in the Drosophila RNA-binding Bruno-3 (Bru-3) gene. This gene was known to encode thirteen exons separated by introns of diverse sizes, ranging from 71 to 41,973 nucleotides in D. melanogaster. Although Bru-3's structure is expected to be conducive to AS, only two ASTs of this gene were previously described. Results: Cloning of RT-PCR products of the entire ORF from four species representing three diverged Drosophila lineages provided an evolutionary perspective, high sensitivity, and long-range contiguity of splice choices currently unattainable by high-throughput methods. Consequently, we identified three new exons, a new exon fragment and thirty-three previously unknown ASTs of Bru-3. All exon-skipping events in the gene were mapped to the exons surrounded by introns of at least 800 nucleotides, whereas exons split by introns of less than 250 nucleotides were always spliced contiguously in mRNA. Cases of exon loss and creation during Bru-3 evolution in Drosophila were also localized within large introns. Notably, we identified a true de novo exon gain: exon 8 was created along the lineage of the obscura group from intronic sequence between cryptic splice sites conserved among all Drosophila species surveyed. Exon 8 was included in mature mRNA by the species representing all the major branches of the obscura group. To our knowledge, the origin of exon 8 is the first documented case of exonization of intronic sequence outside vertebrates. Conclusion: We found that large introns can promote AS via exon-skipping and exon turnover during evolution, likely due to frequent errors in their removal from maturing mRNA. Large introns could be a reservoir of genetic diversity, because they have a greater number of mutable sites than short introns. Taken together, gene structure can constrain and/or promote gene evolution.
Affiliation(s)
- Nikolai P Kandul
- Biology Department, Duke University, PO Box 90338, FFSC 4244, Durham, NC 27708, USA.
|
76
|
Carmel L, Koonin EV. A universal nonmonotonic relationship between gene compactness and expression levels in multicellular eukaryotes. Genome Biol Evol 2009; 1:382-90. [PMID: 20333206 PMCID: PMC2817431 DOI: 10.1093/gbe/evp038] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Accepted: 09/19/2009] [Indexed: 01/21/2023] Open
Abstract
Analysis of gene architecture and expression levels of four organisms, Homo sapiens, Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana, reveals a surprising, nonmonotonic, universal relationship between expression level and gene compactness. With increasing expression level, the genes tend at first to become longer but, from a certain level of expression, they become more and more compact, resulting in an approximately bell-shaped dependence. There are two leading hypotheses to explain the compactness of highly expressed genes. The selection hypothesis predicts that gene compactness is predominantly driven by the level of expression, whereas the genomic design hypothesis predicts that expression breadth across tissues is the driving force. We observed the connection between gene expression breadth in humans and gene compactness to be significantly weaker than the connection between expression level and compactness, a result that is compatible with the selection hypothesis but not the genomic design hypothesis. The initial gene elongation with increasing expression level could be explained, at least in part, by accumulation of regulatory elements enhancing expression, in particular in introns. This explanation is compatible with the observed positive correlation between intron density and expression level of a gene. Conversely, the trend toward increasing compactness for highly expressed genes could be caused by selection for minimization of energy and time expenditure during transcription and splicing and for increased fidelity of transcription, splicing, and/or translation that is likely to be particularly critical for highly expressed genes. Regardless of the exact nature of the forces that shape the gene architecture, we present evidence that, at least in animals, coding and noncoding parts of genes show similar architectonic trends.
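One simple way to look for a bell-shaped dependence of this kind is to sort genes by expression level, split them into equal-size bins, and check whether the median gene length peaks in an interior bin. The sketch below uses synthetic values and a hypothetical helper, `median_length_by_bin`; it is an illustration of the idea, not the procedure used in the paper.

```python
from statistics import median


def median_length_by_bin(genes, n_bins):
    """Sort (expression, length) pairs by expression, split them into
    equal-size bins, and return the median length of each bin.
    Genes left over when the count is not divisible by n_bins are ignored."""
    ranked = sorted(genes, key=lambda g: g[0])
    size = len(ranked) // n_bins
    return [
        median(length for _, length in ranked[i * size:(i + 1) * size])
        for i in range(n_bins)
    ]


# Synthetic (expression, gene length) pairs, NOT real data:
# length first rises and then falls as expression increases.
genes = [(e, 10_000 - (e - 40) ** 2) for e in range(1, 101)]

medians = median_length_by_bin(genes, n_bins=5)
peak = medians.index(max(medians))

print(medians)
# A peak away from both ends indicates a nonmonotonic (rise-then-fall) trend.
print("nonmonotonic:", 0 < peak < len(medians) - 1)
```

Binning by rank rather than by raw expression value keeps the bins equally populated, which matters when expression levels are heavily skewed.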
Affiliation(s)
- Liran Carmel
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
|
77
|
Mutational biases and selective forces shaping the structure of Arabidopsis genes. PLoS One 2009; 4:e6356. [PMID: 19633720 PMCID: PMC2712092 DOI: 10.1371/journal.pone.0006356] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 05/12/2009] [Accepted: 06/01/2009] [Indexed: 01/08/2023] Open
Abstract
Recently, features of gene expression profiles have been associated with structural parameters of gene sequences in organisms representing a diverse set of taxa. The emerging picture indicates that natural selection, mediated by gene expression profiles, has a significant role in determining genic structures. However, the situation is less clear in plants, as the available data indicate that the effect of natural selection mediated by gene expression is very weak. Moreover, the direction of the patterns in plants appears to contradict those observed in animal genomes. In the present work we analyzed expression data for >18000 Arabidopsis genes retrieved from public datasets obtained with different technologies (MPSS and high-density chip arrays) and compared them with gene parameters. Our results show that the impact of natural selection mediated by expression on gene sequences is significant and distinguishable from the effects of regional mutational biases. In addition, we provide evidence that the level and the breadth of gene expression are related in opposite ways to many structural parameters of gene sequences. Higher levels of expression abundance are associated with smaller transcripts, consistent with the need to reduce costs of both transcription and translation. Expression breadth, however, shows a contrasting pattern, i.e. longer genes have higher breadth of expression, possibly to ensure those structural features associated with gene plasticity. Based on these results, we propose that the specific balance between these two selective forces plays a significant role in shaping the structure of Arabidopsis genes.
|
78
|
She X, Rohl CA, Castle JC, Kulkarni AV, Johnson JM, Chen R. Definition, conservation and epigenetics of housekeeping and tissue-enriched genes. BMC Genomics 2009; 10:269. [PMID: 19534766 PMCID: PMC2706266 DOI: 10.1186/1471-2164-10-269] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.1] [Received: 11/08/2008] [Accepted: 06/17/2009] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: Housekeeping genes (HKGs) are constitutively expressed in all tissues, while tissue-enriched genes (TEGs) are expressed at a much higher level in a single tissue type than in others. HKGs serve as valuable experimental controls in gene and protein expression experiments, while TEGs tend to represent distinct physiological processes and are frequently candidates for biomarkers or drug targets. The genomic features of these two groups of genes expressed in opposing patterns may shed light on the mechanisms by which cells maintain basic and tissue-specific functions. RESULTS: Here, we generate gene expression profiles of 42 normal human tissues on custom high-density microarrays to systematically identify 1,522 HKGs and 975 TEGs and compile a small subset of 20 housekeeping genes which are highly expressed in all tissues with lower variance than many commonly used HKGs. Cross-species comparison shows that both the functions and expression patterns of HKGs are conserved. TEGs are enriched with respect to both segmental duplication and copy number variation, while no such enrichment is observed for HKGs, suggesting that the high expression of HKGs is not due to high copy numbers. Analysis of genomic and epigenetic features of HKGs and TEGs reveals that the high expression of HKGs across different tissues is associated with decreased nucleosome occupancy at the transcription start site, as indicated by enhanced DNase hypersensitivity. Additionally, we systematically and quantitatively demonstrated the enrichment of CpG islands at HKG transcription start sites (TSS) and their depletion at TEG TSS. Histone methylation patterns differ significantly between HKGs and TEGs, suggesting that methylation contributes to the differential expression patterns as well. CONCLUSION: We have compiled a set of high-quality HKGs that should provide higher and more consistent expression when used as references in laboratory experiments than currently used HKGs. The comparison of genomic features between HKGs and TEGs shows that HKGs are more conserved than TEGs in terms of functions, expression pattern and polymorphisms. In addition, our results identify chromatin structure and epigenetic features of HKGs and TEGs that are likely to play an important role in regulating their strikingly different expression patterns.
Affiliation(s)
- Xinwei She
- Rosetta Inpharmatics LLC, Seattle, WA 98109, USA.
79
Patrushev LI, Minkevich IG. The problem of the eukaryotic genome size. Biochemistry (Moscow) 2009; 73:1519-52. [PMID: 19216716 DOI: 10.1134/s0006297908130117] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
Abstract
The current state of knowledge concerning the unsolved problem of the huge interspecific eukaryotic genome size variations not correlating with the species phenotypic complexity (C-value enigma also known as C-value paradox) is reviewed. Characteristic features of eukaryotic genome structure and molecular mechanisms that are the basis of genome size changes are examined in connection with the C-value enigma. It is emphasized that endogenous mutagens, including reactive oxygen species, create a constant nuclear environment where any genome evolves. An original quantitative model and general conception are proposed to explain the C-value enigma. In accordance with the theory, the noncoding sequences of the eukaryotic genome provide genes with global and differential protection against chemical mutagens and (in addition to the anti-mutagenesis and DNA repair systems) form a new, third system that protects eukaryotic genetic information. The joint action of these systems controls the spontaneous mutation rate in coding sequences of the eukaryotic genome. It is hypothesized that the genome size is inversely proportional to functional efficiency of the anti-mutagenesis and/or DNA repair systems in a particular biological species. In this connection, a model of eukaryotic genome evolution is proposed.
Affiliation(s)
- L I Patrushev
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia.
80
Kristiansson E, Thorsen M, Tamás MJ, Nerman O. Evolutionary forces act on promoter length: identification of enriched cis-regulatory elements. Mol Biol Evol 2009; 26:1299-307. [PMID: 19258451 DOI: 10.1093/molbev/msp040] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
Transcription factors govern gene expression by binding to short DNA sequences called cis-regulatory elements. These sequences are typically located in promoters, which are regions of variable length upstream of the open reading frames of genes. Here, we report that promoter length and gene function are related in yeast, fungi, and plants. In particular, the promoters for stress-responsive genes are in general longer than those of other genes. Essential genes have, on the other hand, relatively short promoters. We utilize these findings in a novel method for identifying relevant cis-regulatory elements in a set of coexpressed genes. The method is shown to generate more accurate results and fewer false positives compared with other common procedures. Our results suggest that genes with complex transcriptional regulation tend to have longer promoters than genes responding to few signals. This phenomenon is present in all investigated species, indicating that evolution adjusts promoter length according to gene function. Identification of cis-regulatory elements in Saccharomyces cerevisiae can be done with the web service located at http://enricher.zool.gu.se.
81
Basu MK, Poliakov E, Rogozin IB. Domain mobility in proteins: functional and evolutionary implications. Brief Bioinform 2009; 10:205-16. [PMID: 19151098 DOI: 10.1093/bib/bbn057] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 11/14/2022]
Abstract
A substantial fraction of eukaryotic proteins contains multiple domains, some of which show a tendency to occur in diverse domain architectures and can be considered mobile (or 'promiscuous'). These promiscuous domains are typically involved in protein-protein interactions and play crucial roles in interaction networks, particularly those contributing to signal transduction. They also play a major role in creating diversity of protein domain architecture in the proteome. It is now apparent that promiscuity is a volatile and relatively fast-changing feature in evolution, and that only a few domains retain their promiscuity status throughout evolution. Many such domains attained their promiscuity status independently in different lineages. Only recently, we have begun to understand the diversity of protein domain architectures and the role the promiscuous domains play in evolution of this diversity. However, many of the biological mechanisms of protein domain mobility remain shrouded in mystery. In this review, we discuss our present understanding of protein domain promiscuity, its evolution and its role in cellular function.
Affiliation(s)
- Malay Kumar Basu
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
82
Colinas J, Schmidler SC, Bohrer G, Iordanov B, Benfey PN. Intergenic and genic sequence lengths have opposite relationships with respect to gene expression. PLoS One 2008; 3:e3670. [PMID: 18989364 PMCID: PMC2576458 DOI: 10.1371/journal.pone.0003670] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 04/21/2008] [Accepted: 10/11/2008] [Indexed: 12/20/2022]
Abstract
Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.
Affiliation(s)
- Juliette Colinas
- Department of Biology and IGSP Center for Systems Biology, Duke University, Durham, North Carolina, United States of America
- Scott C. Schmidler
- Department of Statistical Sciences, Duke University, Durham, North Carolina, United States of America
- Gil Bohrer
- Department of Civil & Environmental Engineering & Geodetic Science, Ohio State University, Columbus, Ohio, United States of America
- Philip N. Benfey
- Department of Biology and IGSP Center for Systems Biology, Duke University, Durham, North Carolina, United States of America
83
On the nature of human housekeeping genes. Trends Genet 2008; 24:481-4. [DOI: 10.1016/j.tig.2008.08.004] [Citation(s) in RCA: 204] [Impact Index Per Article: 12.8] [Received: 05/04/2008] [Revised: 07/31/2008] [Accepted: 08/02/2008] [Indexed: 01/27/2023]
84
Jeffares DC, Penkett CJ, Bähler J. Rapidly regulated genes are intron poor. Trends Genet 2008; 24:375-8. [PMID: 18586348 DOI: 10.1016/j.tig.2008.05.006] [Citation(s) in RCA: 275] [Impact Index Per Article: 17.2] [Received: 04/23/2008] [Revised: 05/27/2008] [Accepted: 05/27/2008] [Indexed: 10/21/2022]
Abstract
We show that genes with rapidly changing expression levels in response to stress contain significantly lower intron densities in yeasts, thale cress and mice. Therefore, we propose that introns can delay regulatory responses and are selected against in genes whose transcripts require rapid adjustment for survival of environmental challenges. These findings could provide an explanation for the apparent extensive intron loss during the evolution of some eukaryotic lineages.
85
Huang YF, Niu DK. Evidence against the energetic cost hypothesis for the short introns in highly expressed genes. BMC Evol Biol 2008; 8:154. [PMID: 18492248 PMCID: PMC2424036 DOI: 10.1186/1471-2148-8-154] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 12/10/2007] [Accepted: 05/20/2008] [Indexed: 12/25/2022]
Abstract
BACKGROUND In animals, the moss Physcomitrella patens and the pollen of Arabidopsis thaliana, highly expressed genes have shorter introns than weakly expressed genes. A popular explanation for this is selection for transcription efficiency, which includes two sub-hypotheses: to minimize the energetic cost or to minimize the time cost. RESULTS In an individual human, different organs may differ up to hundreds of times in cell number (for example, a liver versus a hypothalamus). Considered at the individual level, a gene specifically expressed in a large organ is actually transcribed tens or hundreds of times more than a gene with a similar expression level (a measure of mRNA abundance per cell) specifically expressed in a small organ. According to the energetic cost hypothesis, the former should have shorter introns than the latter. However, in humans and mice we have not found significant differences in intron length between large-tissue/organ-specific genes and small-tissue/organ-specific genes with similar expression levels. Qualitative estimation shows that the deleterious effect (that is, the energetic burden) of long introns in highly expressed genes is too negligible to be efficiently selected against in mammals. CONCLUSION The short introns in highly expressed genes should not be attributed to energy constraint. We evaluated evidence for the time cost hypothesis and other alternatives.
Affiliation(s)
- Yi-Fei Huang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, P R China.
86
Zhu J, He F, Song S, Wang J, Yu J. How many human genes can be defined as housekeeping with current expression data? BMC Genomics 2008; 9:172. [PMID: 18416810 PMCID: PMC2396180 DOI: 10.1186/1471-2164-9-172] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Received: 12/20/2007] [Accepted: 04/16/2008] [Indexed: 12/16/2022]
Abstract
BACKGROUND Housekeeping (HK) genes are ubiquitously expressed in all tissue/cell types and constitute a basal transcriptome for the maintenance of basic cellular functions. Partitioning transcriptomes into HK and tissue-specific (TS) genes is fundamental for studying gene expression and cellular differentiation. Although many studies have aimed at large-scale and thorough categorization of human HK genes, a meaningful consensus has yet to be reached. RESULTS We collected the two latest gene expression datasets (both EST and microarray data) from public databases and analyzed the gene expression profiles in 18 human tissues that have been well-documented by both data types. Benchmarked by a manually-curated HK gene collection (HK408), we demonstrated that present data from EST sampling was far from saturated, and the inadequacy has limited the gene detectability and our understanding of TS expressions. Due to a likely over-stringent threshold, microarray data showed a higher false negative rate compared with EST data, leading to a significant underestimation of HK genes. Based on EST data, we found that 40.0% of the currently annotated human genes were universally expressed in at least 16 of 18 tissues, as compared to only 5.1% specifically expressed in a single tissue. Our current EST-based estimate on human HK genes ranged from 3,140 to 6,909 in number, a ten-fold increase in comparison with previous microarray-based estimates. CONCLUSION We concluded that a significant fraction of human genes, at least in the currently annotated data depositories, was broadly expressed. Our understanding of tissue-specific expression was still preliminary and required much more large-scale and high-quality transcriptomic data in future studies. The new HK gene list categorized in this study will be useful for genome-wide analyses on structural and functional features of HK genes.
Affiliation(s)
- Jiang Zhu
- Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
87
Greco D, Somervuo P, Di Lieto A, Raitila T, Nitsch L, Castrén E, Auvinen P. Physiology, pathology and relatedness of human tissues from gene expression meta-analysis. PLoS One 2008; 3:e1880. [PMID: 18382664 PMCID: PMC2268968 DOI: 10.1371/journal.pone.0001880] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 11/23/2007] [Accepted: 02/25/2008] [Indexed: 11/18/2022]
Abstract
BACKGROUND Development and maintenance of the identity of tissues is of central importance for multicellular organisms. Based on gene expression profiles, it is possible to divide genes into housekeeping genes and those whose expression is preferential in one or a few tissues and which provide specialized functions that have a strong effect on the physiology of the whole organism. RESULTS We have surveyed the gene expression in 78 normal human tissues, integrating publicly available microarray gene expression data. A total of 1601 genes were identified as selectively expressed in one or more tissues. The tissue-selective genes covered a wide range of cellular and molecular functions, and could be linked to 361 human diseases with Mendelian inheritance. Based on the gene expression profiles, we were able to form a network of tissues reflecting their functional relatedness and, to a certain extent, their development. Using a co-citation driven gene network technique and promoter analysis, we predicted a transcriptional module where the co-operation of the transcription factors E2F and NF-kappaB can possibly regulate a number of genes involved in the neurogenesis that takes place in the adult hippocampus. CONCLUSIONS Here we propose that integration of gene expression data from Affymetrix GeneChip experiments is possible through re-annotation and commonly used pre-processing methods. We suggest that some functional aspects of the tissues can be explained by the co-operation of multiple transcription factors that regulate the expression of selected groups of genes.
Affiliation(s)
- Dario Greco
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
88
Dynamic covariation between gene expression and genome characteristics. Gene 2008; 410:53-66. [PMID: 18191345 DOI: 10.1016/j.gene.2007.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2007] [Revised: 11/13/2007] [Accepted: 11/29/2007] [Indexed: 11/21/2022]
Abstract
Gene and protein expression is controlled so that cells can react to changing intra- and extracellular signals by modulating biochemical networks and pathways. We have previously shown that gene expression and the properties of expressed proteins are dynamically correlated. Here we investigated correlations between gene related parameters and gene expression patterns, and found statistically significant correlations in microarray datasets for different cell types, organisms and processes, including human B and T cell stimulation, cell cycle in HeLa cells, infection in intestinal epithelial cells, Drosophila melanogaster life span, and Saccharomyces cerevisiae cell cycle. Our method was applied to time course datasets individually for each time point. We derived from sequence information numerous parameters for nucleotide composition, two-base composition, codon usage, skew parameters, and codon bias. In addition to coding regions, we also investigated correlations for complete genes and introns. Significant dynamic correlations were identified for each of the analyses. Our method also proved useful for detecting dynamic shifts in gene expression profiles, such as in the D. melanogaster dataset. Detection of changes in the properties of expressed genes and proteins might be useful for predicting or following biological processes, responses, growth, differentiation and possibly in related disorders.
89
Urrutia AO, Ocaña LB, Hurst LD. Do Alu repeats drive the evolution of the primate transcriptome? Genome Biol 2008; 9:R25. [PMID: 18241332 PMCID: PMC2374697 DOI: 10.1186/gb-2008-9-2-r25] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 10/03/2007] [Revised: 01/02/2008] [Accepted: 02/01/2008] [Indexed: 12/17/2022]
Abstract
BACKGROUND Of all repetitive elements in the human genome, Alus are unusual in being enriched near to genes that are expressed across a broad range of tissues. This has led to the proposal that Alus might be modifying the expression breadth of neighboring genes, possibly by providing CpG islands, modifying transcription factor binding, or altering chromatin structure. Here we consider whether Alus have increased expression breadth of genes in their vicinity. RESULTS Contrary to the modification hypothesis, we find that those genes that have always had broad expression are richest in Alus, whereas those that are more likely to have become more broadly expressed have lower enrichment. This finding is consistent with a model in which Alus accumulate near broadly expressed genes but do not affect their expression breadth. Furthermore, this model is consistent with the finding that expression breadth of mouse genes predicts Alu density near their human orthologs. However, Alus were found to be related to some alternative measures of transcription profile divergence, although evidence is contradictory as to whether Alus associate with lowly or highly diverged genes. If Alus have any effect it is not by provision of CpG islands, because they are especially rare near to transcriptional start sites. Previously reported Alu enrichment for genes serving certain cellular functions, suggested to be evidence of functional importance of Alus, appears to be partly a byproduct of the association with broadly expressed genes. CONCLUSION The abundance of Alus near broadly expressed genes is better explained by their preferential preservation near to housekeeping genes rather than by a modifying effect on expression of genes.
Affiliation(s)
- Araxi O Urrutia
- Department of Biology and Biochemistry, University of Bath, Bath, BA4 7AY, UK.
90
Fontanillas P, Hartl DL, Reuter M. Genome organization and gene expression shape the transposable element distribution in the Drosophila melanogaster euchromatin. PLoS Genet 2007; 3:e210. [PMID: 18081425 PMCID: PMC2098804 DOI: 10.1371/journal.pgen.0030210] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 06/06/2007] [Accepted: 10/09/2007] [Indexed: 02/07/2023]
Abstract
The distribution of transposable elements (TEs) in a genome reflects a balance between insertion rate and selection against new insertions. Understanding the distribution of TEs therefore provides insights into the forces shaping the organization of genomes. Past research has shown that TEs tend to accumulate in genomic regions with low gene density and low recombination rate. However, little is known about the factors modulating insertion rates across the genome and their evolutionary significance. One candidate factor is gene expression, which has been suggested to increase local insertion rate by rendering DNA more accessible. We test this hypothesis by comparing the TE density around germline- and soma-expressed genes in the euchromatin of Drosophila melanogaster. Because only insertions that occur in the germline are transmitted to the next generation, we predicted a higher density of TEs around germline-expressed genes than soma-expressed genes. We show that the rate of TE insertions is greater near germline- than soma-expressed genes. However, this effect is partly offset by stronger selection for genome compactness (against excess noncoding DNA) on germline-expressed genes. We also demonstrate that the local genome organization in clusters of coexpressed genes plays a fundamental role in the genomic distribution of TEs. Our analysis shows that, in addition to recombination rate, the distribution of TEs is shaped by the interaction of gene expression and genome organization. The important role of selection for compactness sheds a new light on the role of TEs in genome evolution. Instead of making genomes grow passively, TEs are controlled by the forces shaping genome compactness, most likely linked to the efficiency of gene expression or its complexity and possibly their interaction with mechanisms of TE silencing.
Affiliation(s)
- Pierre Fontanillas
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Daniel L Hartl
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Max Reuter
- The Galton Laboratory, Department of Biology, University College London, London, United Kingdom
91
Lawson MJ, Zhang L. Housekeeping and tissue-specific genes differ in simple sequence repeats in the 5'-UTR region. Gene 2007; 407:54-62. [PMID: 17964742 DOI: 10.1016/j.gene.2007.09.017] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 07/29/2007] [Revised: 09/25/2007] [Accepted: 09/26/2007] [Indexed: 12/22/2022]
Abstract
SSRs (simple sequence repeats) have been shown to have a variety of effects on an organism. In this study, we compared SSRs in housekeeping and tissue-specific genes in human and mouse, in terms of SSR types and distributions in different regions including 5'-UTRs, introns, coding exons, 3'-UTRs, and upstream regions. Among all these regions, SSRs in the 5'-UTR show the most distinction between housekeeping genes and tissue-specific genes in both densities and repeat types. Specifically, SSR densities in 5'-UTRs in housekeeping genes are about 1.7 times higher than those in tissue-specific genes, in contrast to the 0.8-1.2 times differences between the two classes of genes in other regions. Tri-SSRs in 5'-UTRs of housekeeping genes are more GC rich than those of tissue-specific genes and CGG, the dominant type of tri-SSR in 5'-UTR, accounts for 74-79% of the tri-SSRs in housekeeping genes, as compared to 42-57% in tissue-specific genes. 75% of the tri-SSRs in the 5'-UTR of housekeeping genes have 4-5 repeat units, versus the 86-90% in tissue-specific genes. Taken together, our results suggest that SSRs may have an effect on gene expression and may play an important role in contributing to the different expression profiles between housekeeping and tissue-specific genes.
Affiliation(s)
- Mark J Lawson
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
92
Li SW, Feng L, Niu DK. Selection for the miniaturization of highly expressed genes. Biochem Biophys Res Commun 2007; 360:586-92. [PMID: 17610841 DOI: 10.1016/j.bbrc.2007.06.085] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 05/22/2007] [Accepted: 06/18/2007] [Indexed: 11/29/2022]
Abstract
Most widely expressed genes are also highly expressed. Based on high or wide expression, different models were proposed to explain the small sizes of highly/widely expressed genes. We found that housekeeping genes are not more compact than narrowly expressed genes with similar expression levels, but compactness and expression level are correlated in housekeeping genes (except that highly expressed Arabidopsis HK genes have longer intron length). Meanwhile, we found evidence that genes with high functional/regulatory complexity do not have longer introns and longer proteins. The genome design hypothesis is thus not supported. Furthermore, we found that housekeeping genes are not more compact than the narrowly expressed somatic genes with similar average expression levels. Because housekeeping genes are expected to have much higher germline expression levels than narrowly expressed somatic genes, transcription-associated deletion bias is not supported. Selection of the compactness of highly expressed genes for economy is supported.
Affiliation(s)
- Shu-Wei Li
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
93
Abstract
Compact genes contain short and few introns, and they are highly expressed in different animal genomes. Recently, it has been shown that in Oryza sativa and Arabidopsis thaliana, highly expressed genes tend to be least compact, containing long and many introns. It has been suggested that selection on genome organization may have acted differently in plants compared with animals. Gene expression can be estimated as the number of hits when comparing a gene sequence with publicly available expressed sequence tags. Here it is shown that in the haploid moss Physcomitrella patens, highly expressed genes contain shorter introns than genes with low expression levels. This study therefore supports the hypothesis that selection may strongly favour transcriptional efficiency at least in the haploid phase of plant life cycles. It is concluded that plants do not necessarily respond to other selection pressures than animals regarding genome structuring.
Affiliation(s)
- H K Stenøien
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway.
94
Petit N, Casillas S, Ruiz A, Barbadilla A. Protein polymorphism is negatively correlated with conservation of intronic sequences and complexity of expression patterns in Drosophila melanogaster. J Mol Evol 2007; 64:511-8. [PMID: 17460807 DOI: 10.1007/s00239-006-0047-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/27/2006] [Accepted: 01/17/2007] [Indexed: 10/23/2022]
Abstract
We report a significant negative correlation between nonsynonymous polymorphism and intron length in Drosophila melanogaster. This correlation is similar to that between protein divergence and intron length previously reported in Drosophila. We show that the relationship can be explained by the content of conserved noncoding sequences (CNS) within introns. In addition, genes with a high regulatory complexity and many genetic interactions also exhibit larger amounts of CNS within their introns and lower values of nonsynonymous polymorphism. The present study provides relevant evidence on the importance of intron content and expression patterns on the levels of coding polymorphism.
Affiliation(s)
- Natalia Petit
- Departament de Genètica i Microbiologia, Facultat de Biociències, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain.
95
Freudenberg J, Fu YH, Ptácek LJ. Bioinformatic analysis of human CNS-expressed ion channels as candidates for episodic nervous system disorders. Neurogenetics 2007; 8:159-68. [PMID: 17333079 DOI: 10.1007/s10048-007-0082-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/24/2006] [Accepted: 01/29/2007] [Indexed: 10/23/2022]
Abstract
As monogenic forms of episodic nervous system disorders are often caused by ion channel mutations, we looked for features of human central nervous system (CNS) expressed ion channels that further our understanding of those phenotypes. To this end, we compared human ion channels with other CNS-expressed genes, which we categorized according to the existence of transmembrane domains. When looking at the phylogenetic distribution of these genes, we observed an increased percentage of ion channels that exist in vertebrate genomes while missing in invertebrate genomes. Because we hypothesized that this pattern may relate to a more specific expression, we searched for characteristics of ion channels that indicate a tighter expression regulation. We found that ion channels have longer intron and protein sequences, features typical of genes with more specific expression. In addition, ion channels have increased human-rodent conservation around their transcription start site, as indicated by a higher fraction of conserved noncoding regions. This points to a high relevance of mutations that regulate ion channel expression. When we finally asked whether vertebrate-specific diversification is also displayed by non-ion channel genes with important roles in the CNS, we found a similar phylogenetic distribution. This concordant phylogenetic pattern suggests that vertebrate-specific adaptations may account for a large part of the shared genetic basis of episodic CNS disorders, including monogenic and genetically complex disease manifestations. Consequently, this phylogenetic pattern may contribute to the prioritization of candidate genes in human genetic studies of episodic CNS disorders.
Affiliation(s)
- Jan Freudenberg
- Laboratories of Neurogenetics, Department of Neurology, Institute of Human Genetics, University of California San Francisco, San Francisco, CA 94158-2922, USA.
|
96
|
Nakaya HI, Amaral PP, Louro R, Lopes A, Fachel AA, Moreira YB, El-Jundi TA, da Silva AM, Reis EM, Verjovski-Almeida S. Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol 2007; 8:R43. [PMID: 17386095 PMCID: PMC1868932 DOI: 10.1186/gb-2007-8-3-r43] [Citation(s) in RCA: 155] [Impact Index Per Article: 9.1] [Received: 10/17/2006] [Revised: 01/17/2007] [Accepted: 03/26/2007] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: RNAs transcribed from intronic regions of genes are involved in a number of processes related to post-transcriptional control of gene expression. However, the complement of human genes in which introns are transcribed, and the number of intronic transcriptional units and their tissue expression patterns are not known. RESULTS: A survey of mRNA and EST public databases revealed more than 55,000 totally intronic noncoding (TIN) RNAs transcribed from the introns of 74% of all unique RefSeq genes. Guided by this information, we designed an oligoarray platform containing sense and antisense probes for each of 7,135 randomly selected TIN transcripts plus the corresponding protein-coding genes. We identified exonic and intronic tissue-specific expression signatures for human liver, prostate and kidney. The most highly expressed antisense TIN RNAs were transcribed from introns of protein-coding genes significantly enriched (p = 0.002 to 0.022) in the 'Regulation of transcription' Gene Ontology category. RNA polymerase II inhibition resulted in increased expression of a fraction of intronic RNAs in cell cultures, suggesting that other RNA polymerases may be involved in their biosynthesis. Members of a subset of intronic and protein-coding signatures transcribed from the same genomic loci have correlated expression patterns, suggesting that intronic RNAs regulate the abundance or the pattern of exon usage in protein-coding messages. CONCLUSION: We have identified diverse intronic RNA expression patterns, pointing to distinct regulatory roles. This gene-oriented approach, using a combined intron-exon oligoarray, should permit further comparative analysis of intronic transcription under various physiological and pathological conditions, thus advancing current knowledge about the biological functions of these noncoding RNAs.
Affiliation(s)
- Helder I Nakaya
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
- Paulo P Amaral
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
- Rodrigo Louro
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
- André Lopes
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
- Angela A Fachel
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
- Yuri B Moreira
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
- Tarik A El-Jundi
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
- Aline M da Silva
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
- Eduardo M Reis
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
- Sergio Verjovski-Almeida
- Departamento de Bioquimica, Instituto de Quimica, Universidade de São Paulo, 05508-900 São Paulo, SP, Brazil
|
97
|
Keebaugh AC, Sullivan RT, Thomas JW. Gene duplication and inactivation in the HPRT gene family. Genomics 2007; 89:134-42. [PMID: 16928426 DOI: 10.1016/j.ygeno.2006.07.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 06/12/2006] [Accepted: 07/07/2006] [Indexed: 01/05/2023]
Abstract
Hypoxanthine phosphoribosyltransferase (HPRT1) is a key enzyme in the purine salvage pathway, and mutations in HPRT1 cause Lesch-Nyhan disease. The studies described here utilized targeted comparative mapping and sequencing, in conjunction with database searches, to assemble a collection of 53 HPRT1 homologs from 28 vertebrates. Phylogenetic analysis of these homologs revealed that the HPRT gene family expanded as the result of ancient vertebrate-specific duplications and is composed of three groups consisting of HPRT1, phosphoribosyl transferase domain containing protein 1 (PRTFDC1), and HPRT1L genes. All members of the vertebrate HPRT gene family share a common intron-exon structure; however, we have found that the three gene groups have distinct rates of evolution and potentially divergent functions. Finally, we report our finding that PRTFDC1 was recently inactivated in the mouse lineage and propose the loss of function of this gene as a candidate genetic basis for the phenotypic disparity between HPRT-deficient humans and mice.
Affiliation(s)
- Alaine C Keebaugh
- Department of Human Genetics, School of Medicine, Atlanta, GA 30322, USA
|
98
|
Walther D, Brunnemann R, Selbig J. The regulatory code for transcriptional response diversity and its relation to genome structural properties in A. thaliana. PLoS Genet 2006; 3:e11. [PMID: 17291162 PMCID: PMC1796623 DOI: 10.1371/journal.pgen.0030011] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.9] [Received: 08/21/2006] [Accepted: 12/06/2006] [Indexed: 11/19/2022] Open
Abstract
Regulation of gene expression via specific cis-regulatory promoter elements has evolved in cellular organisms as a major adaptive mechanism to respond to environmental change. Assuming a simple model of transcriptional regulation, genes that are differentially expressed in response to a large number of different external stimuli should harbor more distinct regulatory elements in their upstream regions than do genes that respond to only a few environmental challenges. We tested this hypothesis in Arabidopsis thaliana using the compendium of gene expression profiling data available in AtGenExpress and known cis-element motifs mapped to upstream gene promoter regions and studied the relation of the observed breadth of differential gene expression response with several fundamental genome architectural properties. We observed highly significant positive correlations between the density of cis-elements in upstream regions and the number of conditions in which a gene was differentially regulated. The correlation was most pronounced in regions immediately upstream of the transcription start sites. Multistimuli response genes were observed to be associated with significantly longer upstream intergenic regions, to retain more paralogs in the Arabidopsis genome, to be shorter, to have fewer introns, and to be more likely to contain TATA-box motifs in their promoters. In abiotic stress time series data, multistimuli response genes were found to be overrepresented among early-responding genes. Genes involved in the regulation of transcription, stress response, and signaling processes were observed to possess the greatest regulatory capacity. Our results suggest that greater gene expression regulatory complexity is encoded by an increased density of cis-regulatory elements and provide further evidence for an evolutionary adaptation of the regulatory code at the genomic layout level.
Larger intergenic spaces preceding multistimuli response genes may have evolved to allow greater regulatory gene expression potential. The induction or repression of specific genes has evolved in living organisms as a mechanism to respond to environmental changes. At the molecular level, this process is mediated via molecular switches, so-called regulatory elements, generally located in the genomic region adjacent to the gene they control, the gene promoter. Upon environmental change, specific proteins bind to such regulatory elements, thereby turning on or off the associated genes. As this molecular response is often specific to the external signal, genes that respond to a large number of different external stimuli should harbor more distinct regulatory elements in their promoter regions than should genes responding to only a few environmental challenges. In analyzing data for the plant Arabidopsis thaliana, we observed that an increased number of regulatory elements is indeed associated with a broader range of responses. Several other genome structural properties, such as gene size, the occurrence of similar genes in the Arabidopsis genome, and the distance between genes, were also observed to be correlated with a greater breadth of response. The results suggest that greater regulatory complexity is encoded by an increased density of regulatory elements and provide further evidence for an evolutionary adaptation of the regulatory code at the genomic architectural level.
Affiliation(s)
- Dirk Walther
- Max Planck Institute for Molecular Plant Physiology, Potsdam, Germany.
|
99
|
Abstract
Research into the origins of introns is at a critical juncture in the resolution of theories on the evolution of early life (which came first, RNA or DNA?), the identity of LUCA (the last universal common ancestor, was it prokaryotic- or eukaryotic-like?), and the significance of noncoding nucleotide variation. One early notion was that introns would have evolved as a component of an efficient mechanism for the origin of genes. But alternative theories emerged as well. From the debate between the "introns-early" and "introns-late" theories came the proposal that introns arose before the origin of genetically encoded proteins and DNA, and the more recent "introns-first" theory, which postulates the presence of introns at that early evolutionary stage from a reconstruction of the "RNA world." Here we review seminal and recent ideas about intron origins. Recent discoveries about the patterns and causes of intron evolution make this one of the most hotly debated and exciting topics in molecular evolutionary biology today.
Affiliation(s)
- Francisco Rodríguez-Trelles
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525, USA.
|
100
|
Pozzoli U, Menozzi G, Comi GP, Cagliani R, Bresolin N, Sironi M. Intron size in mammals: complexity comes to terms with economy. Trends Genet 2006; 23:20-4. [PMID: 17070957 DOI: 10.1016/j.tig.2006.10.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Received: 03/20/2006] [Revised: 09/18/2006] [Accepted: 10/18/2006] [Indexed: 11/23/2022]
Abstract
Different and contrasting models have been proposed to explain intron size evolution in mammals. Here, we demonstrate that intron and intergenic size per se has no adaptive role in gene expression regulation but reflects the need to preserve conserved intronic elements. Although the amount of non-coding functional elements explains the within-genome size variation of intergenic spacers, we show that an additional, additive pressure has been acting on highly expressed introns to reduce the cost of their transcription.
Affiliation(s)
- Uberto Pozzoli
- Bioinformatic Laboratory, Scientific Institute IRCCS E. Medea, Via don L. Monza 20, 23842 Bosisio Parini (LC), Italy
|