1
Identification of Human Global, Tissue and Within-Tissue Cell-Specific Stably Expressed Genes at Single-Cell Resolution. Int J Mol Sci 2022; 23:ijms231810214. [PMID: 36142130 PMCID: PMC9499411 DOI: 10.3390/ijms231810214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/12/2022] [Accepted: 08/30/2022] [Indexed: 11/17/2022]
Abstract
Stably Expressed Genes (SEGs) are a set of genes with invariant expression. Identification of SEGs, especially among both healthy and diseased tissues, is of clinical relevance to enable more accurate data integration, gene expression comparison and biomarker detection. However, it remains unclear how many global SEGs there are, whether there are development-, tissue- or cell-specific SEGs, and whether diseases can influence their expression. In this research, we systematically investigate human SEGs at the single-cell level and observe their development-, tissue- and cell-specificity, and their expression stability under various diseased states. A hierarchical strategy is proposed to identify a list of 408 spatial-temporal SEGs. Development-specific SEGs are also identified, with adult tissue-specific SEGs enriched in immune process functions and fetal tissue-specific SEGs enriched in RNA splicing activities. Cells of the same type within different tissues tend to show similar SEG composition profiles. Diseases and stresses do not appear to influence the expression stability of SEGs in various tissues. Beyond the use of SEGs as markers and internal references for data normalization and integration, we examine another possible application, namely cell decomposition. The deconvolution model could accurately predict the fractions of major immune cells in multiple independent testing datasets of peripheral blood samples. The study provides a reliable list of human SEGs at the single-cell level, improves understanding of the properties of SEGs, and extends their possible applications.
2
Mukherjee D, Saha D, Acharya D, Mukherjee A, Ghosh TC. Interplay between gene expression and gene architecture as a consequence of gene and genome duplications: evidence from metabolic genes of Arabidopsis thaliana. Physiology and Molecular Biology of Plants: An International Journal of Functional Plant Biology 2022; 28:1091-1108. [PMID: 35722515 PMCID: PMC9203644 DOI: 10.1007/s12298-022-01188-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Revised: 05/16/2022] [Accepted: 05/18/2022] [Indexed: 05/03/2023]
Abstract
Gene and genome duplications have been widespread during the evolution of flowering plants, increasing biological complexity and creating genome plasticity that helps species adapt to changing environments. Duplicated genes with higher evolutionary rates can act as a mechanism for generating novel functions in secondary metabolism. In this study, we explored duplication as a potential factor governing the expression heterogeneity and gene architecture of Primary Metabolic Genes (PMGs) and Secondary Metabolic Genes (SMGs) of Arabidopsis thaliana. Remarkably, different types of duplication processes controlled gene expression and tissue specificity differently in PMGs and SMGs. A complex relationship exists between gene architecture and expression patterns of primary and secondary metabolic genes. Our study indicates that expression heterogeneity and gene structure variation of primary and secondary metabolism in Arabidopsis thaliana are partly the result of duplication events of different origins. It also suggests that duplication has a differential effect on PMGs and SMGs regarding expression pattern by controlling gene structure, epigenetic modifications, multifunctionality and subcellular compartmentalization. This study provides an insight into the evolution of metabolism in plants in the light of gene- and genome-scale duplication. Supplementary Information: The online version contains supplementary material available at 10.1007/s12298-022-01188-2.
Affiliation(s)
- Dola Mukherjee: Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700 054 India
- Deeya Saha: Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700 054 India
- Debarun Acharya: Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700 054 India
- Ashutosh Mukherjee: Department of Botany, Vivekananda College, 269, Diamond Harbour Road, Thakurpukur, Kolkata, West Bengal 700063 India
- Tapash Chandra Ghosh: Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700 054 India
3
Epigenomic signatures on paralogous genes reveal underappreciated universality of active histone codes adopted across animals. Comput Struct Biotechnol J 2022; 20:353-367. [PMID: 35035788 PMCID: PMC8741409 DOI: 10.1016/j.csbj.2021.12.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Revised: 12/15/2021] [Accepted: 12/18/2021] [Indexed: 11/21/2022]
4
Telonis AG, Rigoutsos I. The transcriptional trajectories of pluripotency and differentiation comprise genes with antithetical architecture and repetitive-element content. BMC Biol 2021; 19:60. [PMID: 33765992 PMCID: PMC7995781 DOI: 10.1186/s12915-020-00928-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/18/2020] [Accepted: 11/18/2020] [Indexed: 12/12/2022]
Abstract
Background: Extensive molecular differences exist between proliferative and differentiated cells. Here, we conduct a meta-analysis of publicly available transcriptomic datasets from preimplantation and differentiation stages examining the architectural properties and content of genes whose abundance changes significantly across developmental time points.
Results: Analysis of preimplantation embryos from human and mouse showed that short genes whose introns are enriched in Alu (human) and B (mouse) elements, respectively, have higher abundance in the blastocyst compared to the zygote. These highly expressed genes encode ribosomal proteins or metabolic enzymes. On the other hand, long genes whose introns are depleted in repetitive elements have lower abundance in the blastocyst and include genes from signaling pathways. Additionally, the sequences of the genes that are differentially expressed between the blastocyst and the zygote contain distinct collections of pyknon motifs that differ between up- and down-regulated genes. Further examination of the genes that participate in the stem cell-specific protein interaction network shows that their introns are short and enriched in Alu (human) and B (mouse) elements. As organogenesis progresses, in both human and mouse, we find that the primarily short and repeat-rich expressed genes make way for primarily longer, repeat-poor genes. With that in mind, we used a machine learning-based approach to identify gene signatures able to classify human adult tissues: we find that the most discriminatory genes comprising these signatures have long introns that are repeat-poor and include transcription factors and signaling-cascade genes. The introns of widely expressed genes across human tissues, on the other hand, are short and repeat-rich, and coincide with those with the highest expression at the blastocyst stage.
Conclusions: Protein-coding genes that are characteristic of each trajectory, i.e., proliferation/pluripotency or differentiation, exhibit antithetical biases in their intronic and exonic lengths and in their repetitive-element content. While the respective human and mouse gene signatures are functionally and evolutionarily conserved, their introns and exons are enriched or depleted in organism-specific repetitive elements. We posit that these organism-specific repetitive sequences found in exons and introns are used to effect the corresponding genes’ regulation.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-020-00928-8.
Affiliation(s)
- Aristeidis G Telonis: Computational Medicine Center, Sidney Kimmel College of Medicine, Thomas Jefferson University, 1020 Locust Street, Suite M81, Philadelphia, PA, 19107, USA; Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, 33136, USA
- Isidore Rigoutsos: Computational Medicine Center, Sidney Kimmel College of Medicine, Thomas Jefferson University, 1020 Locust Street, Suite M81, Philadelphia, PA, 19107, USA
5
Ma H, Han YC, Palti Y, Gao G, Liu S, Palmquist DE, Wiens GD, Shepherd BS. Structure and regulation of the NK-lysin (1-4) and NK-lysin like (a and b) antimicrobial genes in rainbow trout (Oncorhynchus mykiss). Developmental and Comparative Immunology 2021; 116:103961. [PMID: 33301795 DOI: 10.1016/j.dci.2020.103961] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Revised: 12/03/2020] [Accepted: 12/03/2020] [Indexed: 06/12/2023]
Abstract
Nk-lysin (Nkl), an antimicrobial peptide (AMP) product of natural killer cells and cytotoxic T cells in mammals, has recently been characterized in a number of finfish species. In this study, we identified six genes with sequence homology to Nkl and characterized their patterns of mRNA expression and abundances in rainbow trout (Oncorhynchus mykiss). The cDNA sequences for the six Nkls encoded precursor peptides of 128-133 aa in length, and mature peptides of 109-111 aa in length. Genomic DNA of the nkl1-4 genes consisted of five exons and four introns, whereas the nkl-like a & b genes consisted of four exons and three introns. Chromosomal mapping of these genes shows that nkl1 was located on chromosome arm 25q, whereas the other five nkl genes were clustered on chromosome arm 19q. Phylogenetic analysis revealed a conserved structure of Nkls among the teleosts, and further protein sequence analyses suggest that all six nkl genes fall within the Nkl sub-family of the Saposin family of proteins. Patterns of tissue-specific mRNA expression were asymmetric among the six trout Nkl homologues, with nkl1, nkl3, and nkl-like a & b occurring in immune competent organs such as spleen, gill, intestine and kidney, as well as pineal gland, brain and oocytes. However, nkl2 and nkl4 showed primary abundances in brain, pineal gland and oocyte tissues. Using mRNA sequencing in whole-body pools of juvenile trout fry (1 g bw) exposed to Flavobacterium psychrophilum infection, we observed modest up-regulation (2-3 fold) of five (nkl 2-4 and nkl-like a & b) of the six nkl mRNAs over the five-day post-challenge time-course. However, no upregulation could be recorded in spleen tissue measured by qPCR in juvenile trout (270 g bw). Using mRNA sequencing again, mRNA abundances were determined in gill of juvenile trout (~57.7 g bw) exposed to various aquaculture stressors. The results indicated that all six nkls (nkl1-4 and nkl-like a and nkl-like b) were downregulated when exposed to high temperature, and that nkl1 was significantly downregulated following salinity challenge. Overall, these newly characterized AMPs may contribute to host innate immunity as they are modulated following pathogen challenge and by physiological stressors.
Affiliation(s)
- Hao Ma: USDA-ARS-NADC-Ruminant Diseases and Immunology Research Unit, 1920 Dayton Ave, Ames, IA, 50010, USA; USDA-ARS-National Center for Cool and Cold Water Aquaculture, 11861 Leetown Rd., Leetown, WV, 25430, USA
- Yueh-Chiang Han: USDA-ARS-School of Freshwater Sciences, 600 E. Greenfield Ave., Milwaukee, WI, 53204, USA
- Yniv Palti: USDA-ARS-National Center for Cool and Cold Water Aquaculture, 11861 Leetown Rd., Leetown, WV, 25430, USA
- Guangtu Gao: USDA-ARS-National Center for Cool and Cold Water Aquaculture, 11861 Leetown Rd., Leetown, WV, 25430, USA
- Sixin Liu: USDA-ARS-National Center for Cool and Cold Water Aquaculture, 11861 Leetown Rd., Leetown, WV, 25430, USA
- Debra E Palmquist: USDA/ARS-Midwest Area Statistics Unit, 1815 N. Street, Peoria, IL, 61604, USA
- Gregory D Wiens: USDA-ARS-National Center for Cool and Cold Water Aquaculture, 11861 Leetown Rd., Leetown, WV, 25430, USA
- Brian S Shepherd: USDA-ARS-School of Freshwater Sciences, 600 E. Greenfield Ave., Milwaukee, WI, 53204, USA
6
von der Heyde EL, Hallmann A. Babo1, formerly Vop1 and Cop1/2, is no eyespot photoreceptor but a basal body protein illuminating cell division in Volvox carteri. The Plant Journal: for Cell and Molecular Biology 2020; 102:276-298. [PMID: 31778231 DOI: 10.1111/tpj.14623] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/22/2019] [Revised: 10/29/2019] [Accepted: 11/19/2019] [Indexed: 06/10/2023]
Abstract
In photosynthetic organisms many processes are light dependent and sensing of light requires light-sensitive proteins. The supposed eyespot photoreceptor protein Babo1 (formerly Vop1) has previously been classified as an opsin due to the capacity for binding retinal. Here, we analyze Babo1 and provide evidence that it is no opsin. Due to the localization at the basal bodies, the former Vop1 and Cop1/2 proteins were renamed V.c. Babo1 and C.r. Babo1. We reveal a large family of more than 60 Babo1-related proteins from a wide range of species. The detailed subcellular localization of fluorescence-tagged Babo1 shows that it accumulates at the basal apparatus. More precisely, it is located predominantly at the basal bodies and to a lesser extent at the four strands of rootlet microtubules. We trace Babo1 during basal body separation and cell division. Dynamic structural rearrangements of Babo1 particularly occur right before the first cell division. In four-celled embryos Babo1 was exclusively found at the oldest basal bodies of the embryo and on the corresponding d-roots. The unequal distribution of Babo1 in four-celled embryos could be an integral part of a geometrical system in early embryogenesis, which establishes the anterior-posterior polarity and influences the spatial arrangement of all embryonic structures and characteristics. Due to its retinal-binding capacity, Babo1 could also be responsible for the unequal distribution of retinoids, knowing that such concentration gradients of retinoids can be essential for the correct patterning during embryogenesis of more complex organisms. Thus, our findings push the Babo1 research in another direction.
Affiliation(s)
- Eva L von der Heyde: Department of Cellular and Developmental Biology of Plants, University of Bielefeld, Universitätsstr 25, 33615, Bielefeld, Germany
- Armin Hallmann: Department of Cellular and Developmental Biology of Plants, University of Bielefeld, Universitätsstr 25, 33615, Bielefeld, Germany
7
Das S, Bansal M. Variation of gene expression in plants is influenced by gene architecture and structural properties of promoters. PLoS One 2019; 14:e0212678. [PMID: 30908494 PMCID: PMC6433290 DOI: 10.1371/journal.pone.0212678] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 08/26/2018] [Accepted: 02/07/2019] [Indexed: 12/03/2022]
Abstract
In higher eukaryotes, gene architecture and structural properties of promoters have emerged as significant factors influencing variation in the number of transcripts (expression level) and the specificity of gene expression in a tissue (expression breadth), which eventually shape the phenotype. In this study, transcriptome data of different tissue types at various developmental stages of A. thaliana, O. sativa, S. bicolor and Z. mays have been used to understand the relationship between the properties of gene components and gene expression. Our findings indicate that in plants, among all gene architecture and structural properties of promoters, compactness of genes in terms of intron content is significantly linked to gene expression level and breadth, whereas in humans exactly the opposite is seen. For the first time in plants, we have carried out a quantitative estimation of the effect of a particular trait on expression level and breadth by using multiple regression analysis, and it confirms that intron content of the primary transcript (as %) is a powerful determinant of expression breadth. Similarly, further regression analysis revealed that among structural properties of the promoters, stability is negatively linked to expression breadth, while DNase1 sensitivity strongly governs gene expression breadth in monocots and gene expression level in dicots. In addition, promoter regions of tissue-specific genes are found to be enriched with TATA box and Y-patch motifs. Finally, multi-copy orthologous genes in plants are found to be longer, highly regulated and tissue specific.
Affiliation(s)
- Sanjukta Das: Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Manju Bansal: Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
8
Huang X, Li S, Zhan A. Genome-Wide Identification and Evaluation of New Reference Genes for Gene Expression Analysis Under Temperature and Salinity Stresses in Ciona savignyi. Front Genet 2019; 10:71. [PMID: 30809246 PMCID: PMC6380166 DOI: 10.3389/fgene.2019.00071] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/26/2018] [Accepted: 01/28/2019] [Indexed: 01/14/2023]
Abstract
Rapid adaptation/accommodation to changing environments largely contributes to maximal survival of invaders during biological invasions, usually leading to success in crossing multiple barriers and finally in varied environments in recipient habitats. Changes in gene expression are among the most important and rapid responses to environmental stresses. Selection of proper reference genes is the crucial prerequisite for gene expression analysis using the common approach, real-time quantitative PCR (RT-qPCR). Here we identified eight candidate novel reference genes from the RNA-Seq data in an invasive model ascidian Ciona savignyi under temperature and salinity stresses. Subsequently, the expression stability of these eight novel reference genes, as well as six other traditionally used reference genes, was evaluated using RT-qPCR and the comprehensive tool RefFinder. Under the temperature stress, two traditional reference genes, ribosomal proteins S15 and L17 (RPS15, RPL17), and one novel gene Ras homolog A (RhoA), were recommended as the top three stable genes, which can be used to normalize target genes with a high and moderate expression level, respectively. Under the salinity stress, transmembrane 9 superfamily member (TMN), MOB kinase activator 1A-like gene (MOB) and ubiquitin-conjugating enzyme (UBQ2) were suggested as the top three stable genes. On the other hand, several commonly used reference genes such as α-tubulin (TubA), β-tubulin (TubB) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) showed unstable expression, so these genes should not be used as internal controls for gene expression analysis. We also tested the expression level of an important stress response gene, large proline-rich protein bag6-like gene (BAG), using different reference genes. As expected, we observed different results and conclusions when using different normalization methods, which underlines the importance of selecting proper reference genes and associated normalization methods. Our results provide a valuable reference gene resource for the normalization of gene expression in the study of environmental adaptation/accommodation during biological invasions using C. savignyi as a model.
Affiliation(s)
- Xuena Huang: Key Laboratory of Environmental Biotechnology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, China
- Shiguo Li: Key Laboratory of Environmental Biotechnology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, China
- Aibin Zhan: Key Laboratory of Environmental Biotechnology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, China
9
Raveh A, Margaliot M, Sontag ED, Tuller T. A model for competition for ribosomes in the cell. J R Soc Interface 2016; 13:rsif.2015.1062. [PMID: 26962028 DOI: 10.1098/rsif.2015.1062] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Indexed: 11/12/2022]
Abstract
A single mammalian cell contains on the order of 10^4-10^5 mRNA molecules and as many as 10^5-10^6 ribosomes. Large-scale simultaneous mRNA translation induces correlations between the mRNA molecules, as they all compete for the finite pool of available ribosomes. This has important implications for the cell's functioning and evolution. Developing a better understanding of the intricate correlations between these simultaneous processes, rather than focusing on the translation of a single isolated transcript, should help in gaining a better understanding of mRNA translation regulation and the way elongation rates affect organismal fitness. A model of simultaneous translation is specifically important when dealing with highly expressed genes, as these consume more resources. In addition, such a model can lead to more accurate predictions that are needed in the interconnection of translational modules in synthetic biology. We develop and analyse a general dynamical model for large-scale simultaneous mRNA translation and competition for ribosomes. This is based on combining several ribosome flow models (RFMs) interconnected via a pool of free ribosomes. We use this model to explore the interactions between the various mRNA molecules and ribosomes at steady state. We show that the compound system always converges to a steady state and that it always entrains or phase locks to periodically time-varying transition rates in any of the mRNA molecules. We then study the effect of changing the transition rates in one mRNA molecule on the steady-state translation rates of the other mRNAs that results from the competition for ribosomes. We show that increasing any of the codon translation rates in a specific mRNA molecule yields a local effect, an increase in the translation rate of this mRNA, and also a global effect, the translation rates in the other mRNA molecules all increase or all decrease. These results suggest that the effect of codon decoding rates of endogenous and heterologous mRNAs on protein production is more complicated than previously thought. In addition, we show that increasing the length of an mRNA molecule decreases the production rate of all the mRNAs.
Affiliation(s)
- Alon Raveh: School of Electrical Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel
- Michael Margaliot: School of Electrical Engineering and the Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv 69978, Israel
- Eduardo D Sontag: Department of Mathematics and the Center for Quantitative Biology, Rutgers University, Piscataway, NJ 08854, USA
- Tamir Tuller: Department of Biomedical Engineering and the Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv 69978, Israel
10
Yang L, Wang S, Zhou M, Chen X, Zuo Y, Sun D, Lv Y. Comparative analysis of housekeeping and tissue-selective genes in human based on network topologies and biological properties. Mol Genet Genomics 2016; 291:1227-41. [PMID: 26897376 DOI: 10.1007/s00438-016-1178-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/26/2015] [Accepted: 01/26/2016] [Indexed: 01/14/2023]
Abstract
Housekeeping genes are genes that are turned on most of the time in almost every tissue to maintain cellular functions. Tissue-selective genes are predominantly expressed in one or a few biologically relevant tissue types. Benefitting from the massive gene expression microarray data obtained over the past decades, the properties of housekeeping and tissue-selective genes can now be investigated on a large scale. In this study, we analyzed the topological properties of housekeeping and tissue-selective genes in the protein-protein interaction (PPI) network. Furthermore, we compared the biological properties and amino acid usage between these two gene groups. The results indicated that there were significant differences in topological properties between housekeeping and tissue-selective genes in the PPI network, and that housekeeping genes had higher centrality properties and may play important roles in the complex biological network environment. We also found that there were significant differences in multiple biological properties and many amino acid compositions. Functional gene enrichment and subcellular localization analyses were also performed to characterize housekeeping and tissue-selective genes. The results indicated that the two gene groups showed significantly different enrichment in drug targets, disease genes and toxin targets, and were located in different subcellular localizations. Finally, the discrimination between the properties of the two gene groups was measured by the F-score, and expression stage was the most discriminative of all properties. These findings may elucidate the biological mechanisms underlying housekeeping and tissue-selective genes and may contribute to better annotation of housekeeping and tissue-selective genes in other organisms.
Affiliation(s)
- Lei Yang: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Shiyuan Wang: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Meng Zhou: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Xiaowen Chen: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo: The National Research Center for Animal Transgenic Biotechnology, Inner Mongolia University, Hohhot, 010021, China
- Dianjun Sun: Center for Endemic Disease Control, Chinese Center for Disease Control and Prevention, Harbin Medical University, Harbin, 150081, China
- Yingli Lv: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
11
Medvedeva IV, Demenkov PS, Ivanisenko VA. Computer analysis of protein functional sites projection on exon structure of genes in Metazoa. BMC Genomics 2015; 16 Suppl 13:S2. [PMID: 26693737 PMCID: PMC4686782 DOI: 10.1186/1471-2164-16-s13-s2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications for designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of the functional sites at the protein level and at the level of the exon-intron structure of the coding gene remains a relevant problem.
RESULTS: One approach to this analysis is the projection of amino acid residue positions of the functional sites along with the exon boundaries to the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. We observed a tendency for the exons that code functional sites to cluster, and such clusters could be considered as the unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to the exon shuffling.
CONCLUSIONS: These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity.
12
Pingault L, Choulet F, Alberti A, Glover N, Wincker P, Feuillet C, Paux E. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome. Genome Biol 2015; 16:29. [PMID: 25853487 PMCID: PMC4355351 DOI: 10.1186/s13059-015-0601-9] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 09/24/2014] [Accepted: 01/28/2015] [Indexed: 12/19/2022]
Abstract
Background: Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before.
Results: By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level.
Conclusions: Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0601-9) contains supplementary material, which is available to authorized users.
|
13
|
Chaurasia A, Tarallo A, Bernà L, Yagi M, Agnisola C, D’Onofrio G. Length and GC content variability of introns among teleostean genomes in the light of the metabolic rate hypothesis. PLoS One 2014; 9:e103889. [PMID: 25093416 PMCID: PMC4122358 DOI: 10.1371/journal.pone.0103889] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/21/2013] [Accepted: 07/07/2014] [Indexed: 01/30/2023]
Abstract
A comparative analysis of five teleostean genomes, namely zebrafish, medaka, three-spine stickleback, fugu and pufferfish was performed with the aim to highlight the nature of the forces driving both length and base composition of introns (i.e., bpi and GCi). An inter-genome approach using orthologous intronic sequences was carried out, analyzing independently both variables in pairwise comparisons. An average length shortening of introns was observed at increasing average GCi values. The result was not affected by masking transposable and repetitive elements harbored in the intronic sequences. The routine metabolic rate (mass specific temperature-corrected using the Boltzmann's factor) was measured for each species. A significant correlation held between average differences of metabolic rate, length and GC content, while environmental temperature of fish habitat was not correlated with bpi and GCi. Analyzing the concomitant effect of both variables, i.e., bpi and GCi, at increasing genomic GC content, a decrease of bpi and an increase of GCi was observed for the significant majority of the intronic sequences (from ∼40% to ∼90%, in each pairwise comparison). The opposite event, concomitant increase of bpi and decrease of GCi, was counter selected (from <1% to ∼10%, in each pairwise comparison). The results further support the hypothesis that the metabolic rate plays a key role in shaping genome architecture and evolution of vertebrate genomes.
Affiliation(s)
- Ankita Chaurasia
- Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
- Campus UAB - CRAG Bellaterra - Cerdanyola del Vallès, Barcelona, Spain
- Andrea Tarallo
- Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
- Luisa Bernà
- Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
- Molecular Biology Unit, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Mitsuharu Yagi
- Faculty of Fisheries, Nagasaki University, Bunkyo, Nagasaki, Japan
- Claudio Agnisola
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Giuseppe D’Onofrio
- Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
|
14
|
Warnefors M, Kaessmann H. Evolution of the correlation between expression divergence and protein divergence in mammals. Genome Biol Evol 2013; 5:1324-35. [PMID: 23781097 PMCID: PMC3730345 DOI: 10.1093/gbe/evt093] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 12/15/2022]
Abstract
Divergence of protein sequences and gene expression patterns are two fundamental mechanisms that generate organismal diversity. Here, we have used genome and transcriptome data from eight mammals and one bird to study the positive correlation of these two processes throughout mammalian evolution. We demonstrate that the correlation is stable over time and most pronounced in neural tissues, which indicates that it is the result of strong negative selection. The correlation is not driven by genes with specific functions and may instead best be viewed as an evolutionary default state, which can nevertheless be evaded by certain gene types. In particular, genes with developmental and neural functions are skewed toward changes in gene expression, consistent with selection against pleiotropic effects associated with changes in protein sequences. Surprisingly, we find that the correlation between expression divergence and protein divergence is not explained by between-gene variation in expression level, tissue specificity, protein connectivity, or other investigated gene characteristics, suggesting that it arises independently of these gene traits. The selective constraints on protein sequences and gene expression patterns also fluctuate in a coordinate manner across phylogenetic branches: We find that gene-specific changes in the rate of protein evolution in a specific mammalian lineage tend to be accompanied by similar changes in the rate of expression evolution. Taken together, our findings highlight many new aspects of the correlation between protein divergence and expression divergence, and attest to its role as a fundamental property of mammalian genome evolution.
Affiliation(s)
- Maria Warnefors
- Center for Integrative Genomics, University of Lausanne, Switzerland.
|
15
|
Yang YF, Zhu T, Niu DK. Association of intron loss with high mutation rate in Arabidopsis: implications for genome size evolution. Genome Biol Evol 2013; 5:723-33. [PMID: 23516254 PMCID: PMC4104619 DOI: 10.1093/gbe/evt043] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 12/13/2022]
Abstract
Despite the prevalence of intron losses during eukaryotic evolution, the selective forces acting on them have not been extensively explored. Arabidopsis thaliana lost half of its genome and experienced an elevated rate of intron loss after diverging from A. lyrata. The selective force for genome reduction was suggested to have driven the intron loss. However, the evolutionary mechanism of genome reduction is still a matter of debate. In this study, we found that intron-lost genes have high synonymous substitution rates. Assuming that differences in mutability among different introns are conserved among closely related species, we used the nucleotide substitution rate between orthologous introns in other species as the proxy of the mutation rate of Arabidopsis introns, either lost or extant. The lost introns were found to have higher mutation rates than extant introns. At the genome-wide level, A. thaliana has a higher mutation rate than A. lyrata, which correlates with the higher rate of intron loss and rapid genome reduction of A. thaliana. Our results indicate that selection to minimize mutational hazards might be the selective force for intron loss, and possibly also for genome reduction, in the evolution of A. thaliana. Small genome size and lower genome-wide intron density were widely reported to be correlated with phenotypic features, such as high metabolic rates and rapid growth. We argue that the mutational-hazard hypothesis is compatible with these correlations, by suggesting that selection for rapid growth might indirectly increase mutational hazards.
Affiliation(s)
- Yu-Fei Yang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, China
|
16
|
Rao YS, Wang ZF, Chai XW, Nie QH, Zhang XQ. Relationship between 5′ UTR length and gene expression pattern in chicken. Genetica 2013; 141:311-8. [DOI: 10.1007/s10709-013-9730-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 03/29/2013] [Accepted: 08/11/2013] [Indexed: 11/29/2022]
|
17
|
Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet 2013; 29:569-74. [PMID: 23810203 DOI: 10.1016/j.tig.2013.05.010] [Citation(s) in RCA: 803] [Impact Index Per Article: 73.0] [Received: 04/15/2013] [Revised: 05/06/2013] [Accepted: 05/30/2013] [Indexed: 10/26/2022]
Abstract
Housekeeping genes are involved in basic cell maintenance and, therefore, are expected to maintain constant expression levels in all cells and conditions. Identification of these genes facilitates exposure of the underlying cellular infrastructure and increases understanding of various structural genomic features. In addition, housekeeping genes are instrumental for calibration in many biotechnological applications and genomic studies. Advances in our ability to measure RNA expression have resulted in a gradual increase in the number of identified housekeeping genes. Here, we describe housekeeping gene detection in the era of massive parallel sequencing and RNA-seq. We emphasize the importance of expression at a constant level and provide a list of 3804 human genes that are expressed uniformly across a panel of tissues. Several exceptionally uniform genes are singled out for future experimental use, such as RT-PCR control genes. Finally, we discuss both ways in which current technology can meet some of past obstacles encountered, and several as yet unmet challenges.
Affiliation(s)
- Eli Eisenberg
- Raymond and Beverly Sackler School of Physics and Astronomy, Tel-Aviv University, Tel Aviv 69978, Israel.
|
18
|
Catania F, Lynch M. A simple model to explain evolutionary trends of eukaryotic gene architecture and expression: how competition between splicing and cleavage/polyadenylation factors may affect gene expression and splice-site recognition in eukaryotes. Bioessays 2013; 35:561-70. [PMID: 23568225 DOI: 10.1002/bies.201200127] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 12/18/2022]
Abstract
Enormous phylogenetic variation exists in the number and sizes of introns in protein-coding genes. Although some consideration has been given to the underlying role of the population-genetic environment in defining such patterns, the influence of the intracellular environment remains virtually unexplored. Drawing from observations on interactions between co-transcriptional processes involved in splicing and mRNA 3'-end formation, a mechanistic model is proposed for splice-site recognition that challenges the commonly accepted intron- and exon-definition models. Under the suggested model, splicing factors that outcompete 3'-end processing factors for access to intronic binding sites concurrently favor the recruitment of 3'-end processing factors at the pre-mRNA tail. This hypothesis sheds new light on observations such as the intron-mediated enhancement of gene expression and the negative correlation between intron length and levels of gene expression.
Affiliation(s)
- Francesco Catania
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany.
|
19
|
Frequency of intron loss correlates with processed pseudogene abundance: a novel strategy to test the reverse transcriptase model of intron loss. BMC Biol 2013; 11:23. [PMID: 23497167 PMCID: PMC3652778 DOI: 10.1186/1741-7007-11-23] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 02/21/2013] [Accepted: 03/05/2013] [Indexed: 11/23/2022]
Abstract
Background: Although intron loss in evolution has been described, the mechanism involved is still unclear. Three models have been proposed, the reverse transcriptase (RT) model, genomic deletion model and double-strand-break repair model. The RT model, also termed mRNA-mediated intron loss, suggests that cDNA molecules reverse transcribed from spliced mRNA recombine with genomic DNA causing intron loss. Many studies have attempted to test this model based on its predictions, such as simultaneous loss of adjacent introns, 3'-side bias of intron loss, and germline expression of intron-lost genes. Evidence either supporting or opposing the model has been reported. The mechanism of intron loss proposed in the RT model shares the process of reverse transcription with the formation of processed pseudogenes. If the RT model is correct, genes that have produced more processed pseudogenes are more likely to undergo intron loss. Results: In the present study, we observed that the frequency of intron loss is correlated with processed pseudogene abundance by analyzing a new dataset of intron loss obtained in mice and rats. Furthermore, we found that mRNA molecules of intron-lost genes are mostly translated on free cytoplasmic ribosomes, a feature shared by mRNA molecules of the parental genes of processed pseudogenes and long interspersed elements. This feature is likely convenient for intron-lost gene mRNA molecules to be reverse transcribed. Analyses of adjacent intron loss, 3'-side bias of intron loss, and germline expression of intron-lost genes also support the RT model. Conclusions: Compared with previous evidence, the correlation between the abundance of processed pseudogenes and intron loss frequency more directly supports the RT model of intron loss. Exploring such a correlation is a new strategy to test the RT model in organisms with abundant processed pseudogenes.
|
20
|
Choi SS, Hannenhalli S. Three independent determinants of protein evolutionary rate. J Mol Evol 2013; 76:98-111. [PMID: 23400388 DOI: 10.1007/s00239-013-9543-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 11/09/2012] [Accepted: 01/16/2013] [Indexed: 12/15/2022]
Abstract
One of the most widely accepted ideas related to the evolutionary rates of proteins is that functionally important residues or regions evolve slower than other regions, a reasonable outcome of which should be a slower evolutionary rate of the proteins with a higher density of functionally important sites. Oddly, the role of functional importance, mainly measured by essentiality, in determining evolutionary rate has been challenged in recent studies. Several variables other than protein essentiality, such as expression level, gene compactness, protein-protein interactions, etc., have been suggested to affect protein evolutionary rate. In the present review, we try to refine the concept of functional importance of a gene, and consider three factors-functional importance, expression level, and gene compactness, as independent determinants of evolutionary rate of a protein, based not only on their known correlation with evolutionary rate but also on a reasonable mechanistic model. We suggest a framework based on these mechanistic models to correctly interpret the correlations between evolutionary rates and the various variables as well as the interrelationships among the variables.
Affiliation(s)
- Sun Shim Choi
- Department of Medical Biotechnology, College of Biomedical Science, and Institute of Bioscience & Biotechnology, Kangwon National University, Chuncheon, South Korea.
|
21
|
Menheniott TR, Kurklu B, Giraud AS. Gastrokines: stomach-specific proteins with putative homeostatic and tumor suppressor roles. Am J Physiol Gastrointest Liver Physiol 2013; 304:G109-21. [PMID: 23154977 DOI: 10.1152/ajpgi.00374.2012] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 01/31/2023]
Abstract
During the past decade, a new family of stomach-specific proteins has been recognized. Known as "gastrokines" (GKNs), these secreted proteins are products of gastric mucus-producing cell lineages. GKNs are highly conserved in physical structure, and emerging data point to convergent functions in the modulation of gastric mucosal homeostasis and inflammation. While GKNs are highly prevalent in the normal stomach, frequent loss of GKN expression in gastric cancers, coupled with established antiproliferative activity, suggests putative tumor suppressor roles. Conversely, ectopic expression of GKNs in reparative lesions of Crohn's disease alludes to additional activity in epithelial wound healing and/or repair. Modes of action remain unsolved, but the recent demonstration of a GKN2-trefoil factor 1 heterodimer implicates functional interplay with trefoil factors. This review aims to provide a historical account of GKN biology and encapsulate the rapidly accumulating evidence supporting roles in gastric epithelial homeostasis and tumor suppression.
Affiliation(s)
- Trevelyan R Menheniott
- Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Rd., Parkville, Melbourne, VIC 3052, Australia.
|
22
|
Williford A, Demuth JP. Gene expression levels are correlated with synonymous codon usage, amino acid composition, and gene architecture in the red flour beetle, Tribolium castaneum. Mol Biol Evol 2012; 29:3755-66. [PMID: 22826459 DOI: 10.1093/molbev/mss184] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Indexed: 12/31/2022]
Abstract
Gene expression levels correlate with multiple aspects of gene sequence and gene structure in phylogenetically diverse taxa, suggesting an important role of gene expression levels in the evolution of protein-coding genes. Here we present results of a genome-wide study of the influence of gene expression on synonymous codon usage, amino acid composition, and gene structure in the red flour beetle, Tribolium castaneum. Consistent with the action of translational selection, we find that synonymous codon usage bias increases with gene expression. However, the correspondence between tRNA gene copy number and optimal codons is weak. At the amino acid level, translational selection is suggested by the positive correlation between tRNA gene numbers and amino acid usage, which is stronger for highly expressed genes. In addition, there is a clear trend for increased use of metabolically cheaper, less complex amino acids as gene expression increases. tRNA gene numbers also correlate negatively with amino acid size/complexity (S/C) score indicating the coupling between translational selection and selection to minimize the use of large/complex amino acids. Interestingly, the analysis of 10 additional genomes suggests that the correlation between tRNA gene numbers and amino acid S/C score is widespread and might be explained by selection against negative consequences of protein misfolding. At the level of gene structure, three major trends are detected: 1) complete coding region length increases across low and intermediate expression levels but decreases in highly expressed genes; 2) the average intron size shows the opposite trend, first decreasing with expression, followed by a slight increase in highly expressed genes; and 3) intron density remains nearly constant across all expression levels. These changes in gene architecture are only in partial agreement with selection favoring reduced cost of biosynthesis.
Affiliation(s)
- Anna Williford
- Biology Department, University of Texas at Arlington, USA
|
23
|
Wu GCT, Chen FC. Determinants of exon-level evolutionary rates in Arabidopsis species. Evol Bioinform Online 2012; 8:389-415. [PMID: 22844194 PMCID: PMC3399485 DOI: 10.4137/ebo.s9743] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
What causes the variations in evolutionary rates is fundamental to molecular evolution. However, in plants, the causes of within-gene evolutionary rate variations remain underexplored. Here we use the principal component regression to examine the contributions of eleven exon features to the within-gene variations in nonsynonymous substitution rate (d(N)), synonymous substitution rate (d(S)), and the d(N)/d(S) ratio in Arabidopsis species. We demonstrate that exon features related to protein structural-functional constraints and mRNA splicing account for the largest proportions of within-gene variations in d(N)/d(S) and d(N). Meanwhile, for d(S), a combination of expression level, exon length, and structural-functional features explains the largest proportion of within-gene variances. Our results suggest that the determinants of within-gene variations differ from those of between-gene variations in evolutionary rates. Furthermore, the relative importance of different exon features also differs between plants and animals. Our study thus may shed a new light on the evolution of plant genes.
Affiliation(s)
- Gideon C-T Wu
- Graduate Institute of Life Sciences, National Defense Medical Center, 114 Taiwan
|
24
|
Weinberger AD, Sun CL, Pluciński MM, Denef VJ, Thomas BC, Horvath P, Barrangou R, Gilmore MS, Getz WM, Banfield JF. Persisting viral sequences shape microbial CRISPR-based immunity. PLoS Comput Biol 2012; 8:e1002475. [PMID: 22532794 PMCID: PMC3330103 DOI: 10.1371/journal.pcbi.1002475] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.0] [Received: 01/10/2012] [Accepted: 02/29/2012] [Indexed: 12/26/2022]
Abstract
Well-studied innate immune systems exist throughout bacteria and archaea, but a more recently discovered genomic locus may offer prokaryotes surprising immunological adaptability. Mediated by a cassette-like genomic locus termed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), the microbial adaptive immune system differs from its eukaryotic immune analogues by incorporating new immunities unidirectionally. CRISPR thus stores genomically recoverable timelines of virus-host coevolution in natural organisms refractory to laboratory cultivation. Here we combined a population genetic mathematical model of CRISPR-virus coevolution with six years of metagenomic sequencing to link the recoverable genomic dynamics of CRISPR loci to the unknown population dynamics of virus and host in natural communities. Metagenomic reconstructions in an acid-mine drainage system document CRISPR loci conserving ancestral immune elements to the base-pair across thousands of microbial generations. This 'trailer-end conservation' occurs despite rapid viral mutation and despite rapid prokaryotic genomic deletion. The trailer-ends of many reconstructed CRISPR loci are also largely identical across a population. 'Trailer-end clonality' occurs despite predictions of host immunological diversity due to negative frequency dependent selection (kill the winner dynamics). Statistical clustering and model simulations explain this lack of diversity by capturing rapid selective sweeps by highly immune CRISPR lineages. Potentially explaining 'trailer-end conservation,' we record the first example of a viral bloom overwhelming a CRISPR system. The polyclonal viruses bloom even though they share sequences previously targeted by host CRISPR loci. Simulations show how increasing random genomic deletions in CRISPR loci purges immunological controls on long-lived viral sequences, allowing polyclonal viruses to bloom and depressing host fitness. 
Our results thus link documented patterns of genomic conservation in CRISPR loci to an evolutionary advantage against persistent viruses. By maintaining old immunities, selection may be tuning CRISPR-mediated immunity against viruses reemerging from lysogeny or migration.
Affiliation(s)
- Ariel D. Weinberger
- Biophysics Graduate Group, University of California, Berkeley, California, United States of America
- Departments of Ophthalmology and Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Christine L. Sun
- Department of Plant and Microbial Biology, University of California, Berkeley, California, United States of America
- Mateusz M. Pluciński
- Department of Environmental Science, Policy and Management, University of California, Berkeley, California, United States of America
- Division of Epidemiology, School of Public Health, University of California, Berkeley, California, United States of America
- Vincent J. Denef
- Department of Environmental Science, Policy and Management, University of California, Berkeley, California, United States of America
- Brian C. Thomas
- Department of Environmental Science, Policy and Management, University of California, Berkeley, California, United States of America
- Rodolphe Barrangou
- DuPont Nutrition and Health, Madison, Wisconsin, United States of America
- Michael S. Gilmore
- Departments of Ophthalmology and Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Microbial Sciences Initiative, Harvard University, Cambridge, Massachusetts, United States of America
- Wayne M. Getz
- Department of Environmental Science, Policy and Management, University of California, Berkeley, California, United States of America
- Jillian F. Banfield
- Department of Environmental Science, Policy and Management, University of California, Berkeley, California, United States of America
- Department of Earth and Planetary Sciences, University of California, Berkeley, California, United States of America
|
25
|
Rogozin IB, Carmel L, Csuros M, Koonin EV. Origin and evolution of spliceosomal introns. Biol Direct 2012; 7:11. [PMID: 22507701 PMCID: PMC3488318 DOI: 10.1186/1745-6150-7-11] [Citation(s) in RCA: 217] [Impact Index Per Article: 18.1] [Received: 12/08/2011] [Accepted: 03/15/2012] [Indexed: 12/31/2022]
Abstract
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’ held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. 
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
Affiliation(s)
- Igor B Rogozin
- National Center for Biotechnology Information NLM/NIH, 8600 Rockville Pike, Bldg, 38A, Bethesda, MD 20894, USA
|
26
|
Abstract
The recent explosion of genome sequences from all major phylogenetic groups has unveiled an unexpected wealth of cases of recurrent evolution of strikingly similar genomic features in different lineages. Here, we review the diverse known types of recurrent evolution in eukaryotic genomes, with a special focus on metazoans, ranging from reductive genome evolution to origins of splice-leader trans-splicing, from tandem exon duplications to gene family expansions. We first propose a general classification scheme for evolutionary recurrence at the genomic level, based on the type of driving force-mutation or selection-and the environmental and genomic circumstances underlying these forces. We then discuss various cases of recurrent genomic evolution under this scheme. Finally, we provide a broader context for repeated genomic evolution, including the unique relationship of genomic recurrence with the genotype-phenotype map, and the ways in which the study of recurrent genomic evolution can be used to understand fundamental evolutionary processes.
Affiliation(s)
- Ignacio Maeso
- Department of Zoology, University of Oxford, United Kingdom
- Scott William Roy
- Department of Biology, Stanford University
- Department of Biology, San Francisco State University
- Manuel Irimia
- Department of Biology, Stanford University
- Banting and Best Department of Medical Research, Donnelly Centre, University of Toronto, Canada
|
27
|
Evolutionary systems biology: historical and philosophical perspectives on an emerging synthesis. Adv Exp Med Biol 2012; 751:1-28. [PMID: 22821451 DOI: 10.1007/978-1-4614-3567-9_1] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 12/12/2022]
Abstract
Systems biology (SB) is at least a decade old now and maturing rapidly. A more recent field, evolutionary systems biology (ESB), is in the process of further developing system-level approaches through the expansion of their explanatory and potentially predictive scope. This chapter will outline the varieties of ESB existing today by tracing the diverse roots and fusions that make up this integrative project. My approach is philosophical and historical. As well as examining the recent origins of ESB, I will reflect on its central features and the different clusters of research it comprises. In its broadest interpretation, ESB consists of five overlapping approaches: comparative and correlational ESB; network architecture ESB; network property ESB; population genetics ESB; and finally, standard evolutionary questions answered with SB methods. After outlining each approach with examples, I will examine some strong general claims about ESB, particularly that it can be viewed as the next step toward a fuller modern synthesis of evolutionary biology (EB), and that it is also the way forward for evolutionary and systems medicine. I will conclude with a discussion of whether the emerging field of ESB has the capacity to combine an even broader scope of research aims and efforts than it presently does.
|
28
|
Nemes S, Parris TZ, Danielsson A, Kannius-Janson M, Jonasson JM, Steineck G, Helou K. Segmented regression, a versatile tool to analyze mRNA levels in relation to DNA copy number aberrations. Genes Chromosomes Cancer 2011; 51:77-82. [PMID: 22034095 DOI: 10.1002/gcc.20934] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 03/25/2011] [Accepted: 08/31/2011] [Indexed: 12/11/2022] Open
Abstract
DNA copy number aberrations (CNA) and subsequent altered gene expression profiles (mRNA levels) are characteristic features of cancerous cells. Integrative genomic analysis aims to identify recurrent CNA that may have a potential role in cancer development, assuming that gene amplification is accompanied by overexpression, while deletions give rise to downregulation of gene expression. We propose a segmented regression-based approach to identify CNA-driven alteration of gene expression profiles. Segmented regression allows piecewise linear models to be fitted in different domains of CNA, joined by a change-point at which the mRNA-CNA relationship undergoes structural changes. Here, we illustrate the implementation and applicability of the proposed model using 1,161 chromosome fragments detected as DNA CNA in primary tumors from 97 breast cancer patients. We identified significant CNA-driven changes in gene expression levels for 341 chromosome fragments, of which 72 showed a nonlinear relationship to CNA. For 59 of 72 chromosome fragments (82%), we observed an initial increase in mRNA levels due to changes in CNA. After the change-point was passed, the mRNA levels reached a plateau, and a further increase in DNA copy numbers did not induce further elevation in mRNA levels. In contrast, for 13 chromosome fragments, the change-point marked the point where mRNA production accelerated. We conclude that segmented regression modeling may provide valuable insights into the impact CNA have on gene expression in cancer cells.
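The piecewise model described in this abstract can be illustrated with a minimal sketch. It assumes a single change-point and a continuous fit, picks the change-point by grid search over candidate values, and estimates the two slopes by ordinary least squares with NumPy. The function name `fit_segmented`, its parameters, and the grid-search strategy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def fit_segmented(x, y, candidates=None):
    """Fit y = b0 + b1*x + b2*max(x - c, 0) with one change-point c.

    Hypothetical helper for illustration: c is chosen by grid search,
    and (b0, b1, b2) by least squares for each candidate c.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if candidates is None:
        # Assumption: search only interior quantiles of x so that both
        # segments contain data points.
        candidates = np.quantile(x, np.linspace(0.1, 0.9, 81))

    best = None
    for c in candidates:
        # Design matrix: intercept, slope before c, change in slope after c.
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best["rss"]:
            best = {
                "change_point": float(c),
                "intercept": float(coef[0]),
                "slope_before": float(coef[1]),
                "slope_after": float(coef[1] + coef[2]),
                "rss": rss,
            }
    return best
```

In this parameterization, a plateau after the change-point corresponds to `slope_after` near zero, and an acceleration corresponds to `slope_after` larger than `slope_before`. Testing whether the change in slope is statistically significant is outside the scope of the sketch.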
Affiliation(s)
- Szilárd Nemes
- Division of Clinical Cancer Epidemiology, Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
|
29
|
Woody JL, Shoemaker RC. Gene expression: sizing it all up. Front Genet 2011; 2:70. [PMID: 22303365 PMCID: PMC3268623 DOI: 10.3389/fgene.2011.00070] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/11/2011] [Accepted: 09/29/2011] [Indexed: 11/13/2022] Open
Abstract
Genomic architecture appears to be a largely unexplored component of gene expression. That architecture can be related to chromatin domains, transposable element neighborhoods, epigenetic modifications of the genome, and more. Although surely not the end of the story, we are learning that when it comes to gene expression, size is also important. We have been surprised to find that certain patterns of expression, tissue-specific versus constitutive, or high expression versus low expression, are often associated with physical attributes of the gene and genome. Multiple studies have shown an inverse relationship between gene expression patterns and various physical parameters of the genome such as intron size, exon size, intron number, and size of intergenic regions. An increase in expression level and breadth often correlates with a decrease in the size of physical attributes of the gene. Three models have been proposed to explain these relationships. Contradictory results were found in several organisms when expression level and expression breadth were analyzed independently. However, when both factors were combined in a single study, a novel relationship was revealed. At low levels of expression, an increase in expression breadth correlated with an increase in genic, intergenic, and intragenic sizes. In contrast, at high levels of expression, an increase in expression breadth inversely correlated with the size of the gene. In this article we explore the several hypotheses regarding genome physical parameters and gene expression.
|
30
|
Park J, Xu K, Park T, Yi SV. What are the determinants of gene expression levels and breadths in the human genome? Hum Mol Genet 2011; 21:46-56. [PMID: 21945885 PMCID: PMC3235009 DOI: 10.1093/hmg/ddr436] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 12/25/2022] Open
Abstract
In complex organisms, different tissues express different genes, which ultimately shape the function and phenotype of each tissue. An important goal of modern biology is to understand how some genes are turned on and off in specific tissues and how the numbers of different gene expression products are determined. These aspects are named ‘expression breadth’ (or ‘tissue specificity’) and ‘expression level’, respectively. Here, we show that we can predict a substantial amount of variation in levels and breadths of gene expression using genomic information of each gene. Interestingly, many genomic traits are correlated with both aspects of gene expression in similar directions, suggesting shared molecular pathways. However, to elucidate distinctive molecular mechanisms governing gene expression levels and breadths, we need to identify the relative significance of each genomic trait on these two aspects of gene expression. To this end, we developed a novel multivariate multiple regression method. Using this new method, we show that gene compactness (in particular, the mean size of exons), codon usage bias and non-synonymous rates have a stronger influence on expression levels compared with their effects on expression breadths. In contrast, the propensity of promoter DNA methylation is a stronger indicator of expression breadths than of expression levels. Interestingly, intron DNA methylation exhibits an opposite pattern to the promoter DNA methylation in the human genome, suggesting that DNA methylation may play multiple roles depending upon its genomic targets. Furthermore, synonymous rates have stronger associations with expression breadths than with expression levels in the human genome. These findings provide clues toward distinctive molecular mechanisms regulating different aspects of gene expression.
Affiliation(s)
- Jungsun Park
- Bioinformatics and Biostatistics Laboratory, Department of Statistics, Seoul National University, Seoul 151-742, Korea
|
31
|
Niu DK, Yang YF. Why eukaryotic cells use introns to enhance gene expression: splicing reduces transcription-associated mutagenesis by inhibiting topoisomerase I cutting activity. Biol Direct 2011; 6:24. [PMID: 21592350 PMCID: PMC3118952 DOI: 10.1186/1745-6150-6-24] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 01/24/2011] [Accepted: 05/18/2011] [Indexed: 11/10/2022] Open
Abstract
Background: The costs and benefits of spliceosomal introns in eukaryotes have not been established. One recognized effect of intron splicing is its known enhancement of gene expression. However, the mechanism regulating such splicing-mediated expression enhancement has not been defined. Previous studies have shown that intron splicing is a time-consuming process, indicating that splicing may not reduce the time required for transcription and processing of spliced pre-mRNA molecules; rather, it might facilitate the later rounds of transcription. Because the densities of active RNA polymerase II on most genes are less than one molecule per gene, direct interactions between the splicing apparatus and transcriptional complexes (from the later rounds of transcription) are infrequent, and thus unlikely to account for splicing-mediated gene expression enhancement.
Presentation of the hypothesis: The serine/arginine-rich protein SF2/ASF can inhibit the DNA topoisomerase I activity that removes negative supercoiling of DNA generated by transcription. Consequently, splicing could make genes more receptive to RNA polymerase II during the later rounds of transcription, and thus affect the frequency of gene transcription. Compared with the transcriptional enhancement mediated by strong promoters, intron-containing genes experience a lower frequency of cut-and-paste processes. The cleavage and religation activity of DNA strands by DNA topoisomerase I was recently shown to account for transcription-associated mutagenesis. Therefore, intron-mediated enhancement of gene expression could reduce transcription-associated genome instability.
Testing the hypothesis: Experimentally test whether transcription-associated mutagenesis is lower in intron-containing genes than in intronless genes. Use bioinformatic analysis to check whether exons flanking lost introns have higher frequencies of short deletions.
Implications of the hypothesis: The mechanism of intron-mediated enhancement proposed here may also explain the positive correlation observed between intron size and gene expression levels in unicellular organisms, and the greater number of intron containing genes in higher organisms.
Reviewers: This article was reviewed by Dr Arcady Mushegian, Dr Igor B Rogozin (nominated by Dr I King Jordan) and Dr Alexey S Kondrashov. For the full reviews, please go to the Reviewer's Reports section.
Affiliation(s)
- Deng-Ke Niu
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China.
|
32
|
Jjingo D, Huda A, Gundapuneni M, Mariño-Ramírez L, Jordan IK. Effect of the transposable element environment of human genes on gene length and expression. Genome Biol Evol 2011; 3:259-71. [PMID: 21362639 PMCID: PMC3070429 DOI: 10.1093/gbe/evr015] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022] Open
Abstract
Independent lines of investigation have documented effects of both transposable elements (TEs) and gene length (GL) on gene expression. However, TE gene fractions are highly correlated with GL, suggesting that they cannot be considered independently. We evaluated the TE environment of human genes and GL jointly in an attempt to tease apart their relative effects. TE gene fractions and GL were compared with the overall level of gene expression and the breadth of expression across tissues. GL is strongly correlated with overall expression level but weakly correlated with the breadth of expression, confirming the selection hypothesis that attributes the compactness of highly expressed genes to selection for economy of transcription. However, TE gene fractions overall, and for the L1 family in particular, show stronger anticorrelations with expression level than GL, indicating that GL may not be the most important target of selection for transcriptional economy. These results suggest a specific mechanism, removal of TEs, by which highly expressed genes are selectively tuned for efficiency. MIR elements are the only family of TEs with gene fractions that show a positive correlation with tissue-specific expression, suggesting that they may provide regulatory sequences that help to control human gene expression. Consistent with this notion, MIR fractions are relatively enriched close to transcription start sites and associated with coexpression in specific sets of related tissues. Our results confirm the overall relevance of the TE environment to gene expression and point to distinct mechanisms by which different TE families may contribute to gene regulation.
Affiliation(s)
- Daudi Jjingo
- School of Biology, Georgia Institute of Technology, GA, USA
|
33
|
Hao L, Ge X, Wan H, Hu S, Lercher MJ, Yu J, Chen WH. Human functional genetic studies are biased against the medically most relevant primate-specific genes. BMC Evol Biol 2010; 10:316. [PMID: 20961448 PMCID: PMC2970608 DOI: 10.1186/1471-2148-10-316] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 07/02/2010] [Accepted: 10/20/2010] [Indexed: 12/02/2022] Open
Abstract
Background: Many functional, structural and evolutionary features of human genes have been observed to correlate with expression breadth and/or gene age. Here, we systematically explore these correlations.
Results: Gene age and expression breadth are strongly correlated, but contribute independently to the variation of functional, structural and evolutionary features, even when we take account of variation in mRNA expression level. Human genes without orthologs in distant species ('young' genes) tend to be tissue-specific in their expression. As computational inference of gene function often relies on the existence of homologs in other species, and experimental characterization is facilitated by broad and high expression, young, tissue-specific human genes are often the least characterized. At the same time, young genes are most likely to be medically relevant.
Conclusions: Our results indicate that functional characterization of human genes is biased against young, tissue-specific genes that are mostly medically relevant. The biases should not be taken lightly because they may pose serious obstacles to our understanding of the molecular basis of human diseases. Future studies should thus be designed to specifically explore the properties of primate-specific genes.
Affiliation(s)
- Lili Hao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
|
34
|
Zeng J, Yi SV. DNA methylation and genome evolution in honeybee: gene length, expression, functional enrichment covary with the evolutionary signature of DNA methylation. Genome Biol Evol 2010; 2:770-80. [PMID: 20924039 PMCID: PMC2975444 DOI: 10.1093/gbe/evq060] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 01/06/2023] Open
Abstract
A growing body of evidence suggests that DNA methylation is functionally divergent among different taxa. The recently discovered functional methylation system in the honeybee Apis mellifera presents an attractive invertebrate model system to study evolution and function of DNA methylation. In the honeybee, DNA methylation is mostly targeted toward transcription units (gene bodies) of a subset of genes. Here, we report an intriguing covariation of length and epigenetic status of honeybee genes. Hypermethylated and hypomethylated genes in honeybee are dramatically different in their lengths for both exons and introns. By analyzing orthologs in Drosophila melanogaster, Acyrthosiphon pisum, and Ciona intestinalis, we show genes that were short and long in the past are now preferentially situated in hyper- and hypomethylated classes respectively, in the honeybee. Moreover, we demonstrate that a subset of high-CpG genes are conspicuously longer than expected under the evolutionary relationship alone and that they are enriched in specific functional categories. We suggest that gene length evolution in the honeybee is partially driven by evolutionary forces related to regulation of gene expression, which in turn is associated with DNA methylation. However, lineage-specific patterns of gene length evolution suggest that there may exist additional forces underlying the observed interaction between DNA methylation and gene lengths in the honeybee.
Affiliation(s)
- Jia Zeng
- School of Biology, Georgia Institute of Technology, USA
|
35
|
Rao YS, Wang ZF, Chai XW, Wu GZ, Zhou M, Nie QH, Zhang XQ. Selection for the compactness of highly expressed genes in Gallus gallus. Biol Direct 2010; 5:35. [PMID: 20465857 PMCID: PMC2883972 DOI: 10.1186/1745-6150-5-35] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/26/2009] [Accepted: 05/14/2010] [Indexed: 11/10/2022] Open
Abstract
Background: Coding sequence (CDS) length, gene size, and intron length vary within a genome and among genomes. Previous studies in diverse organisms, including human, D. melanogaster, C. elegans, S. cerevisiae, and Arabidopsis thaliana, indicated that there are negative relationships between expression level and gene size, CDS length, and intron length. Different models, such as the selection for economy model, the genomic design model, and mutational bias hypotheses, have been proposed to explain this observation. The debate over which model best explains the observation has not been settled. The chicken (Gallus gallus) is an important model organism that bridges the evolutionary gap between mammals and other vertebrates. Like D. melanogaster, the chicken has a larger effective population size, so selection on the chicken genome is expected to be more effective in increasing protein synthesis efficiency. Therefore, in this study the chicken was used as a model organism to elucidate the interaction between gene features and expression pattern under selection pressure.
Results: Based on different technologies, we gathered expression data for nuclear protein-coding, single-splicing genes from the Gallus gallus genome and compared them with gene parameters. We found that gene size, CDS length, first intron length, average intron length, and total intron length are significantly negatively correlated with expression level and expression breadth. The tissue specificity is positively correlated with the first intron length but negatively correlated with the average intron length, and not correlated with the CDS length and protein domain numbers. Comparison analyses showed that ubiquitously expressed genes and narrowly expressed genes with similar expression levels do not differ in compactness. Our data provided evidence that the genomic design model cannot, at least in part, explain our observations. We grouped all somatic-tissue-specific genes (n = 1105), and compared the first intron length and the average intron length between highly expressed genes (top 5% expressed genes) and weakly expressed genes (bottom 5% expressed genes). We found that the first intron length and the average intron length in highly expressed genes are not different from those in weakly expressed genes. We also made a comparison between ubiquitously expressed genes and narrowly expressed somatic genes with similar expression levels. Our data demonstrated that ubiquitously expressed genes are less compact than narrowly expressed genes with similar expression levels. These observations cannot be explained by mutational bias hypotheses either. We also found that the significant trend between genes' compactness and expression level could not be affected by local mutational biases. We argue that the selection of economy model is the most likely one to explain the relationship between gene expression and gene characteristics in the chicken genome.
Conclusion: Natural selection appears to favor the compactness of highly expressed genes in the chicken genome. This observation can be explained by the selection of economy model.
Reviewers: This article was reviewed by Dr. Gavin Huttley, Dr. Liran Carmel (nominated by Dr. Eugene V. Koonin) and Dr. Araxi Urrutia (nominated by Dr. Laurence D. Hurst).
Affiliation(s)
- You S Rao
- Department of Biological Technology, Jiangxi Educational Institute, Nanchang, Jiangxi, China
|