1. Cornejo-Páramo P, Petrova V, Zhang X, Young RS, Wong ES. Emergence of enhancers at late DNA replicating regions. Nat Commun 2024; 15:3451. [PMID: 38658544 PMCID: PMC11043393 DOI: 10.1038/s41467-024-47391-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 03/26/2024] [Indexed: 04/26/2024] Open
Abstract
Enhancers are fast-evolving genomic sequences that control spatiotemporal gene expression patterns. By examining enhancer turnover across mammalian species and in multiple tissue types, we uncover a relationship between the emergence of enhancers and genome organization as a function of germline DNA replication time. While enhancers are most abundant in euchromatic regions, enhancers emerge almost twice as often in late compared to early germline replicating regions, independent of transposable elements. Using a deep learning sequence model, we demonstrate that new enhancers are enriched for mutations that alter transcription factor (TF) binding. Recently evolved enhancers appear to be mostly neutrally evolving and enriched in eQTLs. They also show more tissue specificity than conserved enhancers, and the TFs that bind to these elements, as inferred by binding sequences, also show increased tissue-specific gene expression. We find a similar relationship with DNA replication time in cancer, suggesting that these observations may be time-invariant principles of genome evolution. Our work underscores that genome organization has a profound impact in shaping mammalian gene regulation.
Affiliation(s)
- Paola Cornejo-Páramo
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, Sydney, NSW, Australia
- Veronika Petrova
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, Sydney, NSW, Australia
- Xuan Zhang
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
- Robert S Young
  - Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, United Kingdom
  - Zhejiang University - University of Edinburgh Institute, Zhejiang University, 718 East Haizhou Road, 314400, Haining, PR China
- Emily S Wong
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, Sydney, NSW, Australia
2. Lee Y, Cho CH, Noh C, Yang JH, Park SI, Lee YM, West JA, Bhattacharya D, Jo K, Yoon HS. Origin of minicircular mitochondrial genomes in red algae. Nat Commun 2023; 14:3363. [PMID: 37291154 PMCID: PMC10250338 DOI: 10.1038/s41467-023-39084-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2021] [Accepted: 05/30/2023] [Indexed: 06/10/2023] Open
Abstract
Eukaryotic organelle genomes are generally of conserved size and gene content within phylogenetic groups. However, significant variation in genome structure may occur. Here, we report that the Stylonematophyceae red algae contain multipartite circular mitochondrial genomes (i.e., minicircles) which encode one or two genes bounded by a specific cassette and a conserved constant region. These minicircles are visualized using fluorescence microscopy and scanning electron microscopy, demonstrating their circularity. Mitochondrial gene sets are reduced in these highly divergent mitogenomes. A newly generated chromosome-level nuclear genome assembly of Rhodosorus marinus reveals that most mitochondrial ribosomal subunit genes have been transferred to the nuclear genome. Hetero-concatemers that resulted from recombination between minicircles, and a unique gene inventory that is responsible for mitochondrial genome stability, may explain how the transition from a typical mitochondrial genome to minicircles occurs. Our results offer insight into minicircular organelle genome formation and highlight an extreme case of mitochondrial gene inventory reduction.
Affiliation(s)
- Yongsung Lee
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, 16419, Korea
- Chung Hyun Cho
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, 16419, Korea
- Chanyoung Noh
  - Department of Chemistry, Sogang University, Seoul, 04107, Korea
- Ji Hyun Yang
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, 16419, Korea
- Seung In Park
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, 16419, Korea
- Yu Min Lee
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, 16419, Korea
- John A West
  - School of Biosciences 2, University of Melbourne, Parkville, Victoria, 3010, Australia
- Debashish Bhattacharya
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, 08901, USA
- Kyubong Jo
  - Department of Chemistry, Sogang University, Seoul, 04107, Korea
- Hwan Su Yoon
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, 16419, Korea
3. Zheng H, Liang Y, Hong B, Xu Y, Ren M, Wang Y, Huang L, Yang L, Tao J. Genome-Scale Analysis of the Grapevine KCS Genes Reveals Its Potential Role in Male Sterility. Int J Mol Sci 2023; 24:ijms24076510. [PMID: 37047480 PMCID: PMC10095565 DOI: 10.3390/ijms24076510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 03/20/2023] [Accepted: 03/22/2023] [Indexed: 04/03/2023] Open
Abstract
Very long-chain fatty acid (VLCFA) synthesis in plants is primarily rate-limited by the enzyme 3-ketoacyl CoA synthase (KCS), which also controls the rate and carbon chain length of VLCFA synthesis. Disruption of VLCFA synthesis during pollen development may affect pollen wall formation and ultimately lead to male sterility. Our study identified 24 grapevine KCS (VvKCS) genes and provided new names based on their relative chromosome distribution. Based on sequence alignment and phylogenetic investigation, these genes were grouped into seven subgroups, with members of the same subgroup having similar motif structures. Synteny analysis of VvKCS genes showed that segmental duplication events played an important role in expanding this gene family. Expression profiles obtained from the transcriptome data showed different expression patterns of VvKCS genes in different tissues. Comparison of transcriptome and RT-qPCR data of the male sterile grape ‘Y−14’ and its fertile parent ‘Shine Muscat’ revealed that 10 VvKCS genes were significantly differentially expressed at the meiosis stage, which is a critical period of pollen wall formation. Further, joint analysis by weighted gene co-expression network analysis (WGCNA) and Kyoto Encyclopedia of Genes and Genomes (KEGG) revealed that five of these VvKCS genes (VvKCS6/15/19/20/24) were involved in the fatty acid elongation pathway, which may ultimately affect the structural integrity of the pollen wall in ‘Y−14’. This systematic analysis provided a foundation for further functional characterization of VvKCS genes, with the aim of improving grapevine precision breeding.
4. Wan T, Gong Y, Liu Z, Zhou Y, Dai C, Wang Q. Evolution of complex genome architecture in gymnosperms. Gigascience 2022; 11:6659718. [PMID: 35946987 PMCID: PMC9364684 DOI: 10.1093/gigascience/giac078] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/13/2022] [Revised: 06/09/2022] [Accepted: 07/15/2022] [Indexed: 11/25/2022] Open
Abstract
Gymnosperms represent an ancient lineage that diverged from early spermatophytes during the Devonian. Their long fossil record and the low diversity of living species attest to a complex evolutionary history, which included ancient radiations and massive extinctions. Due to their ultra-large genome size, whole-genome assemblies of gymnosperms have only been generated in the past 10 years and are now being expanded to cover more taxa. Here, we provide an overview of the publicly available gymnosperm genome resources and discuss their assembly quality and recent findings in large genome architectures. In particular, we describe the genomic features most related to changes affecting the whole genome. We also highlight new realizations relative to repetitive sequence dynamics, paleopolyploidy, and long introns. Based on the results of relevant genomic studies of gymnosperms, we suggest additional efforts should be made toward exploring the genomes of medium-sized (5–15 gigabases) species. Lastly, more comparative analyses among high-quality assemblies are needed to understand the genomic shifts and the early species diversification of seed plants.
Affiliation(s)
- Tao Wan
  - Core Botanical Gardens/Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
  - Sino-Africa Joint Research Centre, Chinese Academy of Sciences, Wuhan 430074, China
  - Key Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen & Chinese Academy of Science, Shenzhen 518004, China
- Yanbing Gong
  - Department of Ecology, Tibetan Centre for Ecology and Conservation at WHU-TU, State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan 430072, China
  - Research Center for Ecology, College of Science, Tibet University, Lhasa 850000, China
- Zhiming Liu
  - Key Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen & Chinese Academy of Science, Shenzhen 518004, China
- YaDong Zhou
  - School of Life Science, Nanchang University, Nanchang 330031, China
- Can Dai
  - School of Resources and Environmental Science, Hubei University, Wuhan, China
- Qingfeng Wang
  - Core Botanical Gardens/Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
  - Sino-Africa Joint Research Centre, Chinese Academy of Sciences, Wuhan 430074, China
5. Vinogradov AE, Anatskaya OV. Systemic evolutionary changes in mammalian gene expression. Biosystems 2020; 198:104256. [PMID: 32976926 DOI: 10.1016/j.biosystems.2020.104256] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/08/2020] [Revised: 09/18/2020] [Accepted: 09/18/2020] [Indexed: 12/16/2022]
Abstract
Changes in gene expression play an important role in evolution and can be relevant to evolutionary medicine. In this work, a strong relationship was found between the statistical significance of evolutionary changes in the expression of orthologous genes in the five or six homologous mammalian tissues and the across-tissues unidirectionality of changes (i.e., they occur in the same direction in different tissues -- all upward or all downward). In the area of highly significant changes, the fraction of unidirectionally changed genes (UCG) was above 0.9 (random expectation is 0.03). This observation indicates that the most pronounced evolutionary changes in mammalian gene expression are systemic (i.e., they operate at the whole-organism level). The UCG are strongly enriched in the housekeeping genes. More specifically, in the human-chimpanzee comparison, the UCG are enriched in the pathways belonging to gene expression (translation is prominent), cell cycle control, ubiquitin-dependent protein degradation (mostly related to cell cycle control), apoptosis, and Parkinson's disease. In the human-macaque comparison, the two other neurodegenerative diseases (Alzheimer's and Huntington's) are added to the enriched pathways. The consolidation of gene expression changes at the level of pathways indicates that they are not neutral but functional. The systemic expression changes probably maintain the across-tissues balance of basic physiological processes in the course of evolution (e.g., during the movement along the fast-slow life axis). These results can be useful for understanding the variation in longevity and susceptibility to cancer and widespread neurodegenerative diseases. This approach can also guide the choice of prospective genes for studies aiming to decipher cis-regulatory code (the gene list is provided).
Affiliation(s)
- Olga V Anatskaya
  - Institute of Cytology, Russian Academy of Sciences, St. Petersburg, 194064, Russia
6. Liechty WB, Scheuerle RL, Vela Ramirez JE, Peppas NA. Uptake and function of membrane-destabilizing cationic nanogels for intracellular drug delivery. Bioeng Transl Med 2019; 4:17-29. [PMID: 30680315 PMCID: PMC6336667 DOI: 10.1002/btm2.10120] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 08/03/2018] [Revised: 10/12/2018] [Accepted: 10/18/2018] [Indexed: 02/06/2023] Open
Abstract
The design of intracellular drug delivery vehicles demands an in-depth understanding of their internalization and function upon entering the cell to tailor the physicochemical characteristics of these platforms and achieve efficacious treatments. Polymeric cationic systems have been broadly accepted to be membrane disruptive, and thus beneficial for drug delivery inside the cell. However, if excessive destabilization takes place, it can lead to adverse effects. One of the strategies used to modulate the cationic charge is the incorporation of hydrophobic moieties, thus increasing the hydrophobic content. We have demonstrated the successful synthesis of nanogels based on diethylaminoethyl methacrylate and poly(ethylene glycol) methyl ether methacrylate. Addition of the hydrophobic monomers tert-butyl methacrylate or 2-(tert-butylamino)ethyl methacrylate shows improved polymer hydrophobicity and modulation of the critical swelling pH. Here, we evaluate the cytocompatibility, uptake, and function of these membrane-destabilizing cationic methacrylated nanogels using in vitro models. The obtained results suggest that the incorporation of hydrophobic monomers decreases the cytotoxicity of the nanogels to epithelial colorectal adenocarcinoma cells. Furthermore, analysis of the internalization pathways of these vehicles using inhibitors and imaging flow cytometry showed a significant decrease in uptake when macropinocytosis/phagocytosis inhibitors were present. The membrane-disruptive abilities of the cationic polymeric nanogels were confirmed using three different models. The nanogels were shown to cause hemolysis in sheep erythrocytes, induce lactate dehydrogenase leakage from a model cell line, and disrupt giant unilamellar vesicles. These findings provide new insights into the potential of polymeric nanoformulations for intracellular delivery.
Affiliation(s)
- William B. Liechty
  - McKetta Dept. of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712
- Rebekah L. Scheuerle
  - McKetta Dept. of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712
- Julia E. Vela Ramirez
  - McKetta Dept. of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712
  - Dept. of Biomedical Engineering, The University of Texas at Austin, Austin, TX 78712
  - Institute for Biomaterials, Drug Delivery, and Regenerative Medicine, The University of Texas at Austin, Austin, TX 78712
- Nicholas A. Peppas
  - McKetta Dept. of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712
  - Dept. of Biomedical Engineering, The University of Texas at Austin, Austin, TX 78712
  - Institute for Biomaterials, Drug Delivery, and Regenerative Medicine, The University of Texas at Austin, Austin, TX 78712
  - Depts. of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, TX 78712
  - Division of Molecular Pharmaceutics and Drug Delivery, College of Pharmacy, The University of Texas at Austin, Austin, TX 78712
7. DNA helix: the importance of being AT-rich. Mamm Genome 2017; 28:455-464. [PMID: 28836096 DOI: 10.1007/s00335-017-9713-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/25/2017] [Accepted: 08/12/2017] [Indexed: 01/02/2023]
Abstract
AT-rich DNA is mostly associated with condensed chromatin, whereas GC-rich sequence is preferentially located in dispersed chromatin. AT-rich genes tend to be tissue-specific (silenced in most tissues), while GC-rich genes tend to be housekeeping (expressed in many tissues). This paper reports another important property of DNA base composition, which can affect the repertoire of genes with high AT content. GC-rich sequence is more liable to mutation. We found that the Spearman correlation between human gene GC content and mutation probability is above 0.9. A change of base composition even in synonymous sites affects the mutation probability of nonsynonymous sites and thus of encoded proteins. There is a unique type of housekeeping genes, which are especially unsafe when prone to mutation. Natural selection, which usually removes deleterious mutations, in the case of these genes only increases the hazard because it can descend to the suborganismal (cellular) level. These are cell cycle-related genes. In accordance with the proposed concept, they have low GC content of synonymous sites (despite being housekeeping). The gene-centred protein interaction enrichment analysis (PIEA) showed the core clusters of genes whose interactants are modularly enriched in genes with AT-rich synonymous codons. This interconnected network is involved in double-strand break repair, DNA integrity checkpoints and chromosome pairing at mitosis. Damage to these genes results in genome and chromosome instability leading to cancer and other 'error catastrophes'. By reducing nonsynonymous mutations, the usage of AT-rich synonymous codons can decrease the probability of cancer by more than 20-fold.
8. Symonová R, Majtánová Z, Arias-Rodriguez L, Mořkovský L, Kořínková T, Cavin L, Pokorná MJ, Doležálková M, Flajšhans M, Normandeau E, Ráb P, Meyer A, Bernatchez L. Genome Compositional Organization in Gars Shows More Similarities to Mammals than to Other Ray-Finned Fish. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 2016; 328:607-619. [DOI: 10.1002/jez.b.22719] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/17/2016] [Revised: 11/13/2016] [Accepted: 11/22/2016] [Indexed: 12/12/2022]
Affiliation(s)
- Radka Symonová
  - Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
  - Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
  - Research Institute for Limnology; University of Innsbruck; Mondsee Austria
- Zuzana Majtánová
  - Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
  - Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
- Lenin Arias-Rodriguez
  - División Académica de Ciencias Biológicas; Universidad Juárez Autónoma de Tabasco (UJAT); Villahermosa Tabasco México
- Libor Mořkovský
  - Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
- Tereza Kořínková
  - Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
- Lionel Cavin
  - Muséum d'Histoire Naturelle; Geneva 6 Switzerland
- Martina Johnson Pokorná
  - Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
  - Department of Ecology; Faculty of Science; Charles University; Prague 2 Czech Republic
- Marie Doležálková
  - Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
  - Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
- Martin Flajšhans
  - Faculty of Fisheries and Protection of Waters; South Bohemian Research Centre of Aquaculture and Biodiversity of Hydrocenoses; University of South Bohemia in České Budějovice; Vodňany Czech Republic
- Eric Normandeau
  - IBIS, Department of Biology, University Laval, Pavillon Charles-Eugène-Marchand; Avenue de la Médecine Quebec City; Canada
- Petr Ráb
  - Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
- Axel Meyer
  - Chair in Zoology and Evolutionary Biology; Department of Biology; University of Konstanz; Konstanz Germany
- Louis Bernatchez
  - IBIS, Department of Biology, University Laval, Pavillon Charles-Eugène-Marchand; Avenue de la Médecine Quebec City; Canada
9. Tarallo A, Gambi MC, D'Onofrio G. Lifestyle and DNA base composition in polychaetes. Physiol Genomics 2016; 48:883-888. [PMID: 27764763 DOI: 10.1152/physiolgenomics.00018.2016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/17/2016] [Accepted: 09/27/2016] [Indexed: 11/22/2022] Open
Abstract
A comparative analysis of polychaete species, classified as motile and low-motile forms, highlighted that the former were characterized not only by a higher metabolic rate (MR), but also by a higher genomic GC content. The fluctuation of both variables was not affected by the phylogenetic relationship of the species. Thus, the present results further support that a very active lifestyle affects MR and GC at the same time, showing an unexpected similarity between invertebrates and vertebrates. In teleosts, indeed, a similar pattern has also been observed in comparisons of migratory and nonmigratory species. A cause-effect link between MR and GC has not yet been proved, but the fact that the two variables are significantly linked in all the organisms so far analyzed is, most probably, of relevant biological and evolutionary meaning. The present results fit very well within the frame of the metabolic rate hypothesis proposed to explain the DNA base composition variability among organisms. In contrast, the thermostability hypothesis was not supported. At present, no data about the recombination rate in polychaetes are available to test the biased gene conversion (BGC) hypothesis.
Affiliation(s)
- Andrea Tarallo
  - Stazione Zoologica Anton Dohrn, Department of Biology and Evolution of Marine Organisms, Naples, Italy
- Maria Cristina Gambi
  - Stazione Zoologica Anton Dohrn, Department of Integrative Marine Ecology (Villa Dohrn-Benthic Ecology Center), Ischia, Naples, Italy
- Giuseppe D'Onofrio
  - Stazione Zoologica Anton Dohrn, Department of Biology and Evolution of Marine Organisms, Naples, Italy
10. Tarallo A, Angelini C, Sanges R, Yagi M, Agnisola C, D'Onofrio G. On the genome base composition of teleosts: the effect of environment and lifestyle. BMC Genomics 2016; 17:173. [PMID: 26935583 PMCID: PMC4776435 DOI: 10.1186/s12864-016-2537-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/22/2015] [Accepted: 02/25/2016] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND: The DNA base composition is well known to be highly variable among organisms. Biophysical studies on the effect of GC increments on DNA structure have shown that GC-richer DNA sequences are more bendable. This result was the keystone of the hypothesis proposing the metabolic rate as the major force driving GC content variability, since an increased resistance to torsional stress is mainly required during the transcription process to avoid DNA breakage. Hence, the aim of the present work is to test whether salinity and migration, both suggested to affect the metabolic rate of teleostean fishes, affect the average genomic GC content as well. Moreover, since the gill surface has been reported to be a major morphological expression of metabolic rate, this parameter was also analyzed in the light of the above hypothesis.
RESULTS: Teleosts living in different environments (freshwater and seawater) and with different lifestyles (migratory and non-migratory) were analyzed by studying three variables: routine metabolic rate, gill area and genomic GC content, none of which showed a phylogenetic signal among fish species. Routine metabolic rate, specific gill area and average genomic GC were higher in seawater than in freshwater species. The same trend was observed when comparing migratory versus non-migratory species. Crossing salinity and lifestyle, the active migratory species living in seawater show coincidentally the highest routine metabolic rate, the highest specific gill area and the highest average genomic GC content.
CONCLUSIONS: The results clearly highlight that environmental factors (salinity) and lifestyle (migration) affect not only the physiology (i.e. the routine metabolic rate) and the morphology (i.e. gill area) of teleosts, but also a basic genome feature (i.e. the GC content), thus suggesting an interesting link among the three variables in the light of the metabolic rate hypothesis.
Affiliation(s)
- Andrea Tarallo
  - Genome Evolution and Organization - Department BEOM, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
- Claudia Angelini
  - Istituto per le Applicazioni del Calcolo "Mauro Picone" - CNR, Via Pietro Castellino, 111, 80131, Naples, Italy
- Remo Sanges
  - Genome Evolution and Organization - Department BEOM, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
- Mitsuharu Yagi
  - Faculty of Fisheries, Nagasaki University, 1-14 Bunkyo, Nagasaki, 852-8521, Japan
- Claudio Agnisola
  - Department of Biology, Complesso Universitario di Monte Sant'Angelo, University of Naples Federico II, Edificio 7, Via Cinthia, 80126, Naples, Italy
- Giuseppe D'Onofrio
  - Genome Evolution and Organization - Department BEOM, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
11. Panda A, Podder S, Chakraborty S, Ghosh TC. GC-made protein disorder sheds new light on vertebrate evolution. Genomics 2014; 104:530-7. [PMID: 25240915 DOI: 10.1016/j.ygeno.2014.09.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/05/2014] [Revised: 08/05/2014] [Accepted: 09/10/2014] [Indexed: 10/24/2022]
Abstract
At the emergence of endothermic vertebrates, GC-rich regions of the ectothermic ancestral genomes underwent a significant GC increase. Such an increase was previously postulated to increase the thermodynamic and structural stability of proteins through a selective increase of protein hydrophobicity. Here, we found that an increase in GC content promotes a higher content of disorder-promoting amino acids in the proteins of endothermic vertebrates and that the increase in hydrophobicity is mainly due to a higher content of the small disorder-promoting amino acid alanine. In endothermic vertebrates, the prevalence of disordered residues was found to promote functional diversity of proteins encoded by GC-rich genes. A higher fraction of disordered residues in this group of proteins was also found to minimize their aggregation tendency. Thus, we propose that the GC transition has favored disordered residues to promote functional diversity in GC-rich genes, and to protect them against functional loss by protein misfolding.
Affiliation(s)
- Arup Panda
  - Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Soumita Podder
  - Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Sandip Chakraborty
  - Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
- Tapash Chandra Ghosh
  - Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
12. Chaurasia A, Tarallo A, Bernà L, Yagi M, Agnisola C, D’Onofrio G. Length and GC content variability of introns among teleostean genomes in the light of the metabolic rate hypothesis. PLoS One 2014; 9:e103889. [PMID: 25093416 PMCID: PMC4122358 DOI: 10.1371/journal.pone.0103889] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/21/2013] [Accepted: 07/07/2014] [Indexed: 01/30/2023] Open
Abstract
A comparative analysis of five teleostean genomes, namely zebrafish, medaka, three-spine stickleback, fugu and pufferfish was performed with the aim to highlight the nature of the forces driving both length and base composition of introns (i.e., bpi and GCi). An inter-genome approach using orthologous intronic sequences was carried out, analyzing independently both variables in pairwise comparisons. An average length shortening of introns was observed at increasing average GCi values. The result was not affected by masking transposable and repetitive elements harbored in the intronic sequences. The routine metabolic rate (mass specific temperature-corrected using the Boltzmann's factor) was measured for each species. A significant correlation held between average differences of metabolic rate, length and GC content, while environmental temperature of fish habitat was not correlated with bpi and GCi. Analyzing the concomitant effect of both variables, i.e., bpi and GCi, at increasing genomic GC content, a decrease of bpi and an increase of GCi was observed for the significant majority of the intronic sequences (from ∼40% to ∼90%, in each pairwise comparison). The opposite event, concomitant increase of bpi and decrease of GCi, was counter selected (from <1% to ∼10%, in each pairwise comparison). The results further support the hypothesis that the metabolic rate plays a key role in shaping genome architecture and evolution of vertebrate genomes.
Affiliation(s)
- Ankita Chaurasia
  - Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
  - Campus UAB - CRAG Bellaterra - Cerdanyola del Vallès, Barcelona, Spain
- Andrea Tarallo
  - Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
- Luisa Bernà
  - Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
  - Molecular Biology Unit, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Mitsuharu Yagi
  - Faculty of Fisheries, Nagasaki University, Bunkyo, Nagasaki, Japan
- Claudio Agnisola
  - Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Giuseppe D’Onofrio
  - Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
13
|
Implications of human genome structural heterogeneity: functionally related genes tend to reside in organizationally similar genomic regions. BMC Genomics 2014; 15:252. [PMID: 24684786 PMCID: PMC4234528 DOI: 10.1186/1471-2164-15-252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/15/2012] [Accepted: 03/21/2014] [Indexed: 01/30/2023] Open
Abstract
Background In an earlier study, we hypothesized that genomic segments with different sequence organization patterns (OPs) might display functional specificity despite their similar GC content. Here we tested this hypothesis by dividing the human genome into 100 kb segments, classifying these segments into five compositional groups according to GC content, and then characterizing each segment within the five groups by oligonucleotide counting (k-mer analysis; also referred to as compositional spectrum analysis, or CSA), to examine the distribution of sequence OPs in the segments. We performed the CSA on the entire DNA, i.e., its coding and non-coding parts, the latter being much more abundant in the genome than the former. Results We identified 38 OP-type clusters of segments that differ in their compositional spectrum (CS) organization. Many of the segments that shared the same OP type were enriched with genes related to the same biological processes (developmental, signaling, etc.), components of biochemical complexes, or organelles. Thirteen OP-type clusters showed significant enrichment in genes connected to specific gene-ontology terms. Some of these clusters seemed to reflect certain events during periods of horizontal gene transfer and genome expansion, and subsequent evolution of genomic regions requiring coordinated regulation. Conclusions There may be a tendency for genes that are involved in the same biological process, complex or organelle to use the same OP, even at a distance of ~100 kb from the genes. Although the intergenic DNA is non-coding, the general pattern of sequence organization (e.g., reflected in over-represented oligonucleotide “words”) may be important and may have been protected, to some extent, in the course of evolution.
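The segmentation and counting steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the choice of k, and the four GC boundaries (which split segments into five groups) are assumptions made only for illustration, since the abstract does not state them.

```python
from collections import Counter


def segment(seq, size):
    """Split a sequence into consecutive windows of `size` bases (last one may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_group(gc, boundaries=(0.37, 0.41, 0.46, 0.53)):
    """Index (0-4) of the compositional group; the boundaries are hypothetical."""
    for index, upper in enumerate(boundaries):
        if gc < upper:
            return index
    return len(boundaries)


def kmer_spectrum(seq, k):
    """Relative frequency of each k-mer, skipping k-mers with ambiguous bases."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {kmer: n for kmer, n in counts.items() if set(kmer) <= set("ACGT")}
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}
```

In a real analysis, `segment(chromosome, 100_000)` would give the 100 kb windows, and the per-window spectra within each GC group would then be clustered; the clustering step is not shown because the abstract does not describe it.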
|
14
|
Zhang Q, Edwards SV. The evolution of intron size in amniotes: a role for powered flight? Genome Biol Evol 2013; 4:1033-43. [PMID: 22930760 PMCID: PMC3490418 DOI: 10.1093/gbe/evs070] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Indexed: 12/28/2022] Open
Abstract
Intronic DNA is a major component of eukaryotic genes and genomes and can be subject to selective constraint and have functions in gene regulation. Intron size is of particular interest given that it is thought to be the target of a variety of evolutionary forces and has been suggested to be linked ultimately to various phenotypic traits, such as powered flight. Using whole-genome analyses and comparative approaches that account for phylogenetic nonindependence, we examined interspecific variation in intron size in three data sets encompassing from 12 to 30 amniote genomes and allowing for different levels of genome coverage. In addition to confirming that intron size is negatively associated with intron position and correlates with genome size, we found that on average mammals have longer introns than birds and nonavian reptiles, a trend that is correlated with the proliferation of repetitive elements in mammals. Two independent comparisons between flying and nonflying sister groups both showed a reduction of intron size in volant species, supporting an association between powered flight, or possibly the high metabolic rates associated with flight, and reduced intron/genome size. Small intron size in volant lineages is less easily explained as a neutral consequence of large effective population size. In conclusion, we found that the evolution of intron size in amniotes appears to be non-neutral, is correlated with genome size, and is likely influenced by powered flight and associated high metabolic rates.
Affiliation(s)
- Qu Zhang
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
|
15
|
Berná L, Chaurasia A, Angelini C, Federico C, Saccone S, D'Onofrio G. The footprint of metabolism in the organization of mammalian genomes. BMC Genomics 2012; 13:174. [PMID: 22568857 PMCID: PMC3384468 DOI: 10.1186/1471-2164-13-174] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 07/26/2011] [Accepted: 05/08/2012] [Indexed: 01/02/2023] Open
Abstract
Background At present, five evolutionary hypotheses have been proposed to explain the great variability of the genomic GC content among and within genomes: the mutational bias, the biased gene conversion, the DNA breakpoints distribution, the thermal stability and the metabolic rate. Several studies carried out on bacteria and teleostean fish pointed towards the critical role played by the environment on the metabolic rate in shaping the base composition of genomes. In mammals the debate is still open, and evidence has been produced in favor of each evolutionary hypothesis. Human genes were assigned to three large functional categories (as well as to the corresponding functional classes) according to the KOG database: (i) information storage and processing, (ii) cellular processes and signaling, and (iii) metabolism. The classification was extended to the organisms so far analyzed by performing a reciprocal Blastp and selecting the best reciprocal hit. The base composition was calculated for each sequence of the whole CDS dataset. Results The GC3 level of the above functional categories increased from (i) to (iii). This specific compositional pattern was found, as a footprint, in all mammalian genomes, but not in frog and lizard ones. Comparative analysis of human versus both frog and lizard functional categories showed that genes involved in the metabolic processes underwent the highest GC3 increment. Analyzing the KOG functional classes of genes, again a well defined intra-genomic pattern was found in all mammals. Not only genes of metabolic pathways, but also genes involved in chromatin structure and dynamics, transcription, signal transduction mechanisms and cytoskeleton, showed an average GC3 level higher than that of the whole genome. In the case of the human genome, the genes of the aforementioned functional categories showed a high probability to be associated with the chromosomal bands. Conclusions In the light of the different evolutionary hypotheses proposed so far, which contribute with different potential to the compositional heterogeneity of mammalian genomes, the one based on the metabolic rate appears to play no minor role. Keeping in mind similar results reported in bacteria and in teleosts, the specific compositional patterns observed in mammals highlight metabolic rate as a unifying factor that fits over a wide range of living organisms.
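GC3, the quantity compared across functional categories in this abstract, is the G+C fraction at third codon positions of a coding sequence. A minimal Python sketch follows; the function name and the handling of incomplete trailing codons and ambiguous bases are assumptions, not details taken from the paper.

```python
def gc3(cds):
    """G+C fraction at third codon positions of an in-frame coding sequence.

    Assumes the sequence starts at codon position 1; a trailing partial
    codon is ignored, as are third positions holding ambiguous bases.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop any incomplete final codon
    third = cds[2:usable:3]               # every third base, starting at index 2
    informative = [base for base in third if base in "ACGT"]
    if not informative:
        return 0.0
    return sum(base in "GC" for base in informative) / len(informative)
```

Averaging `gc3` over the genes assigned to each functional category would give the kind of per-category GC3 level the abstract refers to.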
Affiliation(s)
- Luisa Berná
- Genome Evolution and Organization - Department Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
|
16
|
Komissarov AS, Gavrilova EV, Demin SJ, Ishov AM, Podgornaya OI. Tandemly repeated DNA families in the mouse genome. BMC Genomics 2011; 12:531. [PMID: 22035034 PMCID: PMC3218096 DOI: 10.1186/1471-2164-12-531] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 02/05/2011] [Accepted: 10/28/2011] [Indexed: 12/23/2022] Open
Abstract
Background Functional and morphological studies of tandem DNA repeats, which make up a large portion of most genomes, are limited mainly by the incomplete characterization of these genome elements. We report here a genome-wide analysis of the large tandem repeats (TR) found in the mouse genome assemblies. Results Using a bioinformatics approach, we identified large TR with array size more than 3 kb in two mouse whole genome shotgun (WGS) assemblies. Large TR were classified based on sequence similarity, chromosome position, monomer length, array variability, and GC content; we identified four superfamilies, eight families, and 62 subfamilies, including 60 not previously described. 1) The superfamily of centromeric minor satellite is only found in the unassembled part of the reference genome. 2) The pericentromeric major satellite is the most abundant superfamily and reveals high order repeat structure. 3) The transposable-element-related superfamily contains two families. 4) The superfamily of heterogeneous tandem repeats includes four families. One family is found only in the WGS, while two families represent tandem repeats with either single or multi locus location. Despite multi locus location, TRPC-21A-MM is placed into a separate family due to its abundance, strictly pericentromeric location, and resemblance to big human satellites. To confirm our data, we next performed in situ hybridization with three repeats from distinct families. The TRPC-21A-MM probe hybridized to chromosomes 3 and 17, the multi locus TR-22A-MM probe hybridized to ten chromosomes, and the single locus TR-54B-MM probe hybridized with the long loops that emerge from chromosome ends. In addition to those predicted in silico, several extra chromosomes were positive for TR by in situ analysis, potentially indicating inaccurate genome assembly of the heterochromatic genome regions. Conclusions Chromosome-specific TR had been predicted for mouse but no reliable cytogenetic probes were available before. We report a new analysis that identified in silico and confirmed in situ the 3/17 chromosome-specific probe TRPC-21-MM. Thus, the new classification has proven to be a useful tool for the continuation of genome study, while annotated TR can be a valuable source of cytogenetic probes for chromosome recognition.
|
17
|
Thanakiatkrai P, Welch L. Evaluation of nucleosome forming potentials (NFPs) of forensically important STRs. Forensic Sci Int Genet 2011; 5:285-90. [DOI: 10.1016/j.fsigen.2010.05.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/25/2010] [Revised: 04/10/2010] [Accepted: 05/07/2010] [Indexed: 01/25/2023]
|
18
|
Frenkel ZM, Bettecken T, Trifonov EN. Nucleosome DNA sequence structure of isochores. BMC Genomics 2011; 12:203. [PMID: 21510861 PMCID: PMC3097165 DOI: 10.1186/1471-2164-12-203] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 12/13/2010] [Accepted: 04/21/2011] [Indexed: 12/03/2022] Open
Abstract
Background Significant differences in G+C content between different isochore types suggest that the nucleosome positioning patterns in DNA of the isochores should be different as well. Results Extraction of the patterns from the isochore DNA sequences by Shannon N-gram extension reveals that while the general motif YRRRRRYYYYYR is characteristic for all isochore types, the dominant positioning patterns of the isochores vary between TAAAAATTTTTA and CGGGGGCCCCCG due to the large differences in G+C composition. This is observed in human, mouse and chicken isochores, demonstrating that the variations of the positioning patterns are largely G+C dependent rather than species-specific. The species-specificity of nucleosome positioning patterns is revealed by dinucleotide periodicity analyses in isochore sequences. While human sequences are showing CG periodicity, chicken isochores display AG (CT) periodicity. Mouse isochores show very weak CG periodicity only. Conclusions Nucleosome positioning pattern as revealed by Shannon N-gram extension is strongly dependent on G+C content and different in different isochores. Species-specificity of the pattern is subtle. It is reflected in the choice of preferentially periodical dinucleotides.
Affiliation(s)
- Zakharia M Frenkel
- Genome Diversity Center, Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
|
19
|
Vavouri T, Lehner B. Chromatin organization in sperm may be the major functional consequence of base composition variation in the human genome. PLoS Genet 2011; 7:e1002036. [PMID: 21490963 PMCID: PMC3072381 DOI: 10.1371/journal.pgen.1002036] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.8] [Received: 10/21/2010] [Accepted: 02/11/2011] [Indexed: 11/17/2022] Open
Abstract
Chromatin in sperm is different from that in other cells, with most of the genome packaged by protamines not nucleosomes. Nucleosomes are, however, retained at some genomic sites, where they have the potential to transmit paternal epigenetic information. It is not understood how this retention is specified. Here we show that base composition is the major determinant of nucleosome retention in human sperm, predicting retention very well in both genic and non-genic regions of the genome. The retention of nucleosomes at GC-rich sequences with high intrinsic nucleosome affinity accounts for the previously reported retention at transcription start sites and at genes that regulate development. It also means that nucleosomes are retained at the start sites of most housekeeping genes. We also report a striking link between the retention of nucleosomes in sperm and the establishment of DNA methylation-free regions in the early embryo. Taken together, this suggests that paternal nucleosome transmission may facilitate robust gene regulation in the early embryo. We propose that chromatin organization in the male germline, rather than in somatic cells, is the major functional consequence of fine-scale base composition variation in the human genome. The selective pressure driving base composition evolution in mammals could, therefore, be the need to transmit paternal epigenetic information to the zygote.
Affiliation(s)
- Tanya Vavouri
- EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation, Universitat Pompeu Fabra, Barcelona, Spain
|
20
|
Misawa K, Kikuno RF. Relationship between amino acid composition and gene expression in the mouse genome. BMC Res Notes 2011; 4:20. [PMID: 21272306 PMCID: PMC3038927 DOI: 10.1186/1756-0500-4-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 06/25/2010] [Accepted: 01/27/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Codon bias is a phenomenon that refers to the differences in the frequencies of synonymous codons among different genes. In many organisms, natural selection is considered to be a cause of codon bias because codon usage in highly expressed genes is biased toward optimal codons. Methods have previously been developed to predict the expression level of genes from their nucleotide sequences, which is based on the observation that synonymous codon usage shows an overall bias toward a few codons called major codons. However, the relationship between codon bias and gene expression level, as proposed by the translation-selection model, is less evident in mammals. FINDINGS We investigated the correlations between the expression levels of 1,182 mouse genes and amino acid composition, as well as between gene expression and codon preference. We found that a weak but significant correlation exists between gene expression levels and amino acid composition in mouse. In total, less than 10% of variation of expression levels is explained by amino acid components. We found the effect of codon preference on gene expression was weaker than the effect of amino acid composition, because no significant correlations were observed with respect to codon preference. CONCLUSION These results suggest that it is difficult to predict expression level from amino acid components or from codon bias in mouse.
Affiliation(s)
- Kazuharu Misawa
- Research Program for Computational Science, Research and Development Group for Next-Generation Integrated Living Matter Simulation, Fusion of Data and Analysis Research and Development Team, RIKEN, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan.
|
21
|
Uliano E, Chaurasia A, Bernà L, Agnisola C, D'Onofrio G. Metabolic rate and genomic GC: what we can learn from teleost fish. Mar Genomics 2010; 3:29-34. [PMID: 21798194 DOI: 10.1016/j.margen.2010.02.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/30/2009] [Revised: 02/05/2010] [Accepted: 02/11/2010] [Indexed: 11/29/2022]
Abstract
Teleosts are a highly diverse group of animals occupying all kinds of aquatic environments. Data on routine mass-specific metabolic rate were re-examined after correcting them using the Boltzmann factor. Teleostean fish were grouped in five broad groups, corresponding to major environmental classifications: polar, temperate, sub-tropical, tropical and deep-water. The specific routine metabolic rate, temperature-corrected using the Boltzmann factor (MR), and the average base composition of genomes (GC%) were calculated in each group. Fish of the polar habitat showed the highest MR. Temperate fish displayed a significantly higher MR than tropical fish, which had the lowest average value. These results were apparently in agreement with the cold adaptation hypothesis. In contrast with this hypothesis, however, the MR of fish living in deep-water environments turned out to be not significantly different from that of fish living in tropical habitats. Most probably, the amount of oxygen dissolved in the water directly affects MR adaptation. Across the different habitats, the genomic GC levels showed a decreasing trend similar to that of MR. Indeed, both polar and temperate fish showed a GC level significantly higher than that of both sub-tropical and tropical fish. Plotting the genomic GC levels versus the MR, a significant positive correlation was found, supporting the hypothesis that metabolic rate can explain not only the compositional transition mode (e.g. amphibian/mammals), but also the compositional shifting mode (e.g. fish/fish) of evolution observed for vertebrate genomes.
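A Boltzmann-factor temperature correction of the kind mentioned in this abstract is commonly written as MR = R · exp(E / (k·T)), where R is the measured mass-specific rate, T the absolute temperature, k the Boltzmann constant, and E an activation energy. The Python sketch below assumes that form and a default E of 0.65 eV, a value often used in the metabolic-scaling literature; the abstract itself gives neither, so both are assumptions.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def temperature_corrected_rate(rate, temp_celsius, activation_energy_ev=0.65):
    """Scale a measured metabolic rate by exp(E / (k * T)).

    `activation_energy_ev` is an assumed default, not a value from the abstract.
    """
    temp_kelvin = temp_celsius + 273.15
    return rate * math.exp(activation_energy_ev / (BOLTZMANN_EV_PER_K * temp_kelvin))
```

Under this correction, two species whose raw rates differ only through the exp(-E / (k·T)) temperature dependence end up with the same corrected value, which is what makes rates measured at different habitat temperatures comparable.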
Affiliation(s)
- Erminia Uliano
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
|
22
|
Mukhopadhyay P, Ghosh TC. Relationship between gene compactness and base composition in rice and human genome. J Biomol Struct Dyn 2010; 27:477-88. [PMID: 19916569 DOI: 10.1080/07391102.2010.10507332] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/15/2023]
Abstract
In humans, highly expressed genes contain shorter and fewer introns, which has been attributed to selection for economy in transcription and translation. On the other hand, in plants, it has been shown that highly expressed genes tend to be longer than lowly expressed genes. In this study, we analyzed compositional influence on genome organization in both rice and human. We demonstrated that, in GC-rich rice genes, highly expressed genes are less compact than lowly expressed genes. In the GC-poor class, there is no difference in gene compactness between highly and lowly expressed genes. However, the scenario is different for human, as there is no influence of GC composition on gene compactness due to their expression levels. We also report that highly expressed rice GC-rich pre-mRNAs tend to form less stable secondary structures than those of lowly expressed genes. However, on removing intronic sequences, highly expressed mRNAs form a stable secondary structure as compared to lowly expressed GC-rich genes. We suggest that in GC-rich rice genes long introns are under selection for enhancing transcriptional efficiency by modulating pre-mRNA secondary structural stability. Thus the evolutionary mechanisms behind genome organization differ between these two genomes (human and rice).
Affiliation(s)
- Pamela Mukhopadhyay
- Bioinformatics Centre, Bose Institute P 1/12, C.I.T. Scheme VII M - Kolkata 700054- India.
|
23
|
Costa JH, de Melo DF, Gouveia Z, Cardoso HG, Peixe A, Arnholdt-Schmitt B. The alternative oxidase family of Vitis vinifera reveals an attractive model to study the importance of genomic design. Physiologia Plantarum 2009; 137:553-65. [PMID: 19682279 DOI: 10.1111/j.1399-3054.2009.01267.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/05/2023]
Abstract
'Genomic design' refers to the structural organization of gene sequences. Recently, the role of intron sequences in gene regulation has become better understood. Further, introns possess high rates of polymorphism that are considered the major source for speciation. In molecular breeding, the length of gene-specific introns is recognized as a tool to discriminate genotypes with diverse traits of agronomic interest. 'Economy selection' and 'time-economy selection' have been proposed as models for explaining why highly expressed genes typically contain small introns. However, in contrast to these theories, plant-specific selection reveals that highly expressed genes contain introns that are large. In the presented research, 'wet' Aox gene identification from grapevine is advanced by a bioinformatics approach to study the species-specific organization of Aox gene structures in relation to available expressed sequence tag (EST) data. Two Aox1 and one Aox2 gene sequences have been identified in Vitis vinifera using grapevine cultivars from Portugal and Germany. Searching the complete genome sequence data of two grapevine cultivars confirmed that V. vinifera alternative oxidase (Aox) is encoded by a small multigene family composed of Aox1a, Aox1b and Aox2. An analysis of EST distribution revealed high expression of the VvAox2 gene. A relationship between the atypical long primary transcript of VvAox2 (in comparison to other plant Aox genes) and its expression level is suggested. V. vinifera Aox genes contain four exons interrupted by three introns except for Aox1a which contains an additional intron in the 3'-UTR. The lengths of primary Aox transcripts were estimated for each gene in two V. vinifera varieties: PN40024 and Pinot Noir. In both varieties, Aox1a and Aox1b contained small introns that corresponded to primary transcript lengths ranging from 1501 to 1810 bp. 
The Aox2 of PN40024 (12 329 bp) was longer than that from Pinot Noir (7279 bp) because of selection against a transposable-element insertion that is 5028 bp in size. An EST database basic local alignment search tool (BLAST) search of GenBank revealed the following ESTs percentages for each gene: Aox1a (26.2%), Aox1b (11.9%) and Aox2 (61.9%). Aox1a was expressed in fruits and roots, Aox1b expression was confined to flowers and Aox2 was ubiquitously expressed. These data for V. vinifera show that atypically long Aox intron lengths are related to high levels of gene expression. Furthermore, it is shown for the first time that two grapevine cultivars can be distinguished by Aox intron length polymorphism.
Affiliation(s)
- José Hélio Costa
- Department of Biochemistry and Molecular Biology, Federal University of Ceará, PO Box 6029, 60455-900, Fortaleza, Ceará, Brazil
|
24
|
Yang H. In plants, expression breadth and expression level distinctly and non-linearly correlate with gene structure. Biol Direct 2009; 4:45; discussion 45. [PMID: 19930585 PMCID: PMC2794262 DOI: 10.1186/1745-6150-4-45] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 10/30/2009] [Accepted: 11/21/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Compactness of highly/broadly expressed genes in human has been explained as selection for efficiency, regional mutation biases or genomic design. However, highly expressed genes in flowering plants were shown to be less compact than lowly expressed ones. On the other hand, opposite findings have also been documented: pollen-expressed Arabidopsis genes tend to contain shorter introns, and highly expressed moss genes are compact. This issue is important because it provides a chance to compare the selectionist and neutralist views of genome evolution. Furthermore, it also helps to understand the fates of introns from the angle of gene expression. RESULTS In this study, I used expression data covering more tissues and employed new analytical methods to reexamine the correlations between gene expression and gene structure for two flowering plants, Arabidopsis thaliana and Oryza sativa. It is shown that different aspects of expression pattern correlate with different parts of gene sequences in distinct ways. In detail, expression level is significantly negatively correlated with gene size, especially the size of non-coding regions, whereas expression breadth correlates with non-coding structural parameters positively and with coding region parameters negatively. Furthermore, the relationships between expression level and structural parameters seem to be non-linear, with the extremes of structural parameters possibly scaling as power laws or logarithmic functions of expression levels. CONCLUSION In plants, highly expressed genes are compact, especially in the non-coding regions. Broadly expressed genes tend to contain longer non-coding sequences, which may be necessary for complex regulation. In combination with previous studies about other plants and about animals, some common scenarios about the correlation between gene expression and gene structure begin to emerge. Based on the functional relationships between extreme values of structural characteristics and expression level, an effort was made to evaluate the relative effectiveness of the energy-cost hypothesis and the time-cost hypothesis.
Affiliation(s)
- Hangxing Yang
- T-Life Research Center, Department of Physics, Fudan University, Shanghai, PR China.
|
25
|
Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet 2009; 10:285-311. [PMID: 19630562 DOI: 10.1146/annurev-genom-082908-150001] [Citation(s) in RCA: 468] [Impact Index Per Article: 31.2] [Indexed: 11/09/2022]
Abstract
Recombination is typically thought of as a symmetrical process resulting in large-scale reciprocal genetic exchanges between homologous chromosomes. Recombination events, however, are also accompanied by short-scale, unidirectional exchanges known as gene conversion in the neighborhood of the initiating double-strand break. A large body of evidence suggests that gene conversion is GC-biased in many eukaryotes, including mammals and human. AT/GC heterozygotes produce more GC- than AT-gametes, thus conferring a population advantage to GC-alleles in high-recombining regions. This apparently unimportant feature of our molecular machinery has major evolutionary consequences. Structurally, GC-biased gene conversion explains the spatial distribution of GC-content in mammalian genomes, the so-called isochore structure. Functionally, GC-biased gene conversion promotes the segregation and fixation of deleterious AT → GC mutations, thus increasing our genomic mutation load. Here we review the recent evidence for a GC-biased gene conversion process in mammals, and its consequences for genomic landscapes, molecular evolution, and human functional genomics.
Affiliation(s)
- Laurent Duret
- Université de Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, F-69622, Villeurbanne, France.
|
26
|
ContDist: a tool for the analysis of quantitative gene and promoter properties. BMC Bioinformatics 2009; 10:7. [PMID: 19128472 PMCID: PMC2631519 DOI: 10.1186/1471-2105-10-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/05/2008] [Accepted: 01/07/2009] [Indexed: 12/03/2022] Open
Abstract
Background How promoter regions regulate gene expression is complicated and far from fully understood. It is known that histones' regulation of DNA compactness, DNA methylation, transcription factor binding sites and CpG islands play a role in the transcriptional regulation of a gene. Many high-throughput techniques exist nowadays which permit the detection of epigenetic marks and regulatory elements in the promoter regions of thousands of genes. However, so far the subsequent analysis of such experiments (e.g. the resulting gene lists) has been hampered by the fact that no tool currently exists for a detailed analysis of the promoter regions. Results We present ContDist, a tool to statistically analyze quantitative gene and promoter properties. The software includes approximately 200 quantitative features of gene and promoter regions for 7 commonly studied species. In contrast to "traditional" ontological analysis, which only works on qualitative data, all the features in the underlying annotation database are quantitative gene and promoter properties. Utilizing the strong focus on the promoter region of this tool, we show its usefulness in two case studies: the first on differentially methylated promoters and the second on the fundamental differences between housekeeping and tissue-specific genes. The two case studies both confirm recent findings and reveal previously unreported biological relations. Conclusion ContDist is a new tool with two important properties: 1) it has a strong focus on the promoter region, which is usually disregarded by virtually all ontology tools, and 2) it uses quantitative (continuously distributed) features of genes and their promoter regions which are not available in any other tool. ContDist is available from
|
27
|
Comparative analysis of distinct non-coding characteristics potentially contributing to the divergence of human tissue-specific genes. Genetica 2008; 136:127-34. [DOI: 10.1007/s10709-008-9323-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/04/2007] [Accepted: 08/25/2008] [Indexed: 10/21/2022]
|
28
|
Schmidt T, Frishman D. Assignment of isochores for all completely sequenced vertebrate genomes using a consensus. Genome Biol 2008; 9:R104. [PMID: 18590563 PMCID: PMC2481423 DOI: 10.1186/gb-2008-9-6-r104] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/13/2008] [Revised: 05/22/2008] [Accepted: 06/30/2008] [Indexed: 11/16/2022] Open
Abstract
A new consensus isochore assignment method and a database of isochore maps for all completely sequenced vertebrate genomes are presented. We show that although the currently available isochore mapping methods agree on the isochore classification of about two-thirds of the human DNA, they produce significantly different results with regard to the location of isochore boundaries and isochore length distribution. We present a new consensus isochore assignment method based on majority voting and provide IsoBase, a comprehensive on-line database of isochore maps for all completely sequenced vertebrate genomes.
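The abstract names majority voting as the basis of the consensus but gives no further detail. The following is only a minimal sketch of majority voting over per-window class labels. The function name `consensus_labels`, the window-by-window input layout, the example class labels and the rule of leaving ties unassigned are all assumptions made for illustration, not the published IsoBase procedure.

```python
from collections import Counter


def consensus_labels(labels_per_method):
    """Majority vote across methods, window by window.

    labels_per_method: list of equal-length label lists, one list per
    mapping method (hypothetical input layout).
    Returns one label per window, or None where the top vote is tied.
    """
    consensus = []
    for votes in zip(*labels_per_method):
        ranked = Counter(votes).most_common()
        top_label, top_count = ranked[0]
        # Assumption: a tie between the two most frequent labels stays unassigned.
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        consensus.append(None if tied else top_label)
    return consensus


# Three hypothetical methods labelling the same three genomic windows.
method_a = ["H1", "L1", "H2"]
method_b = ["H1", "L2", "H2"]
method_c = ["H1", "L1", "H3"]
result = consensus_labels([method_a, method_b, method_c])
```

In this toy input every window has a clear majority, so `result` carries one label per window and no `None` entries.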
Affiliation(s)
- Thorsten Schmidt
- Department of Genome-Oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, D-85350 Freising, Germany
|
29
|
Chojnowski JL, Braun EL. Turtle isochore structure is intermediate between amphibians and other amniotes. Integr Comp Biol 2008; 48:454-62. [PMID: 21669806 DOI: 10.1093/icb/icn062] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
Vertebrate genomes are composed of isochores, which are relatively long (>100 kb) regions with a relatively homogeneous (either GC-rich or AT-rich) base composition and with rather sharp boundaries with neighboring isochores. Mammals and living archosaurs (birds and crocodilians) have heterogeneous genomes that include very GC-rich isochores. In sharp contrast, the genomes of amphibians and fishes are more homogeneous and they have a lower overall GC content. Because DNA with higher GC content is more thermostable, the elevated GC content of mammalian and archosaurian DNA has been hypothesized to be an adaptation to higher body temperatures. This hypothesis can be tested by examining the structure of isochores across the reptilian clade, which includes the archosaurs, testudines (turtles), and lepidosaurs (lizards and snakes), because reptiles exhibit diverse body sizes, metabolic rates, and patterns of thermoregulation. This study focuses on a comparative analysis of a new set of expressed genes of the red-eared slider turtle and orthologs of the turtle genes in mammalian (human, mouse, dog, and opossum), archosaurian (chicken and alligator), and amphibian (western clawed frog) genomes. EST (expressed sequence tag) data from a turtle cDNA library enriched for genes that have specialized functions (developmental genes) revealed that using the GC content of the third codon position to examine isochore structure requires careful consideration of the types of genes examined. The more highly expressed genes (e.g., housekeeping genes) are more likely to be GC-rich than are genes with specialized functions. However, the set of highly expressed turtle genes demonstrated that the turtle genome has a GC content that is intermediate between the GC-poor amphibians and the GC-rich mammals and archosaurs. 
There was a strong correlation between the GC content of all turtle genes and the GC content of other vertebrate genes, with the slope of the line describing this relationship also indicating that the isochore structure of turtles is intermediate between that of amphibians and other amniotes. These data are consistent with some thermal hypotheses of isochore evolution, but we believe that the credible set of models for isochore evolution still includes a variety of models. These data expand the amount of genomic data available from reptiles upon which future studies of reptilian genomics can build.
Affiliation(s)
- Jena L Chojnowski
- Department of Zoology, University of Florida, 223 Bartram Hall, PO Box 118525, Gainesville, FL 32611, USA
|
30
|
Duret L, Arndt PF. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet 2008; 4:e1000071. [PMID: 18464896 PMCID: PMC2346554 DOI: 10.1371/journal.pgen.1000071] [Citation(s) in RCA: 254] [Impact Index Per Article: 15.9] [Received: 11/05/2007] [Accepted: 04/11/2008] [Indexed: 01/19/2023] Open
Abstract
Unraveling the evolutionary forces responsible for variations of neutral substitution patterns among taxa or along genomes is a major issue for detecting selection within sequences. Mammalian genomes show large-scale regional variations of GC-content (the isochores), but the substitution processes at the origin of this structure are poorly understood. We analyzed the pattern of neutral substitutions in 1 Gb of primate non-coding regions. We show that the GC-content toward which sequences are evolving is strongly negatively correlated to the distance to telomeres and positively correlated to the rate of crossovers (R² = 47%). This demonstrates that recombination has a major impact on substitution patterns in human, driving the evolution of GC-content. The evolution of GC-content correlates much more strongly with male than with female crossover rate, which rules out selectionist models for the evolution of isochores. This effect of recombination is most probably a consequence of the neutral process of biased gene conversion (BGC) occurring within recombination hotspots. We show that the predictions of this model fit very well with the observed substitution patterns in the human genome. This model notably explains the positive correlation between substitution rate and recombination rate. Theoretical calculations indicate that variations in population size or density in recombination hotspots can have a very strong impact on the evolution of base composition. Furthermore, recombination hotspots can create strong substitution hotspots. This molecular drive affects both coding and non-coding regions. We therefore conclude that along with mutation, selection and drift, BGC is one of the major factors driving genome evolution. Our results also shed light on variations in the rate of crossover relative to non-crossover events, along chromosomes and according to sex, and also on the conservation of hotspot density between human and chimp. 
Mammalian genomes show a very strong heterogeneity of base composition along chromosomes (the so-called isochores). The functional significance of these peculiar genomic landscapes is highly debated: do isochores confer some selective advantage, or are they simply the by-product of neutral evolutionary processes? To resolve this issue, we analyzed the pattern of substitution in the human genome by comparison with chimpanzee and macaque. We show that the evolution of base composition (GC-content) is essentially determined by the rate of recombination. This effect appears to be much stronger in male than in female germline, which rules out selective explanations for the evolution of isochores. We show that this impact of recombination is most probably a consequence of the process of biased gene conversion (BGC). This neutral process mimics the action of selection and can induce strong substitution hotspots within recombination hotspots, sometimes leading to the fixation of deleterious mutations. BGC appears to be one of the major factors driving genome evolution. It is therefore essential to take this process into account if we want to be able to interpret genome sequences.
Affiliation(s)
- Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Villeurbanne, France
| | - Peter F. Arndt
- Department for Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
31
|
Pozzoli U, Menozzi G, Fumagalli M, Cereda M, Comi GP, Cagliani R, Bresolin N, Sironi M. Both selective and neutral processes drive GC content evolution in the human genome. BMC Evol Biol 2008; 8:99. [PMID: 18371205 PMCID: PMC2292697 DOI: 10.1186/1471-2148-8-99] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 12/18/2007] [Accepted: 03/27/2008] [Indexed: 11/10/2022] Open
Abstract
Background Mammalian genomes consist of regions differing in GC content, referred to as isochores or GC-content domains. The scientific debate is still open as to whether such compositional heterogeneity is a selected or neutral trait. Results Here we analyze SNP allele frequencies, retrotransposon insertion polymorphisms (RIPs), as well as fixed substitutions accumulated in the human lineage since its divergence from chimpanzee to indicate that biased gene conversion (BGC) has been playing a role in within-genome GC content variation. Yet, a distinct contribution to GC content evolution is accounted for by a selective process. Accordingly, we searched for independent evidences that GC content distribution does not conform to neutral expectations. Indeed, after correcting for possible biases, we show that intron GC content and size display isochore-specific correlations. Conclusion We consider that the more parsimonious explanation for our results is that GC content is subjected to the action of both weak selection and BGC in the human genome with features such as nucleosome positioning or chromatin conformation possibly representing the final target of selective processes. This view might reconcile previous contrasting findings and add some theoretical background to recent evidences suggesting that GC content domains display different behaviors with respect to highly regulated biological processes such as developmentally-stage related gene expression and programmed replication timing during neural stem cell differentiation.
Affiliation(s)
- Uberto Pozzoli
- Scientific Institute IRCCS E. Medea, Bioinformatic Lab, Via don L. Monza 20, 23842 Bosisio Parini (LC), Italy.
|
32
|
Levitsky VG, Ignatieva EV, Ananko EA, Turnaev II, Merkulova TI, Kolchanov NA, Hodgman TC. Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions. BMC Bioinformatics 2007; 8:481. [PMID: 18093302 PMCID: PMC2265442 DOI: 10.1186/1471-2105-8-481] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Received: 07/22/2007] [Accepted: 12/19/2007] [Indexed: 12/22/2022] Open
Abstract
Background Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current methods to predict TFBSs are hampered by the high false-positive rates that occur when only sequence conservation at the core binding sites is considered. Results To improve this situation, we have quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation as well as in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1), obesity and lipid metabolism (PPAR, SREBP, HNF4), and the regulation of steroidogenic (SF-1) and cell cycle (E2F) gene expression. We have also gained extra specificity using a method, entitled SiteGA, which takes into account structural interactions within TFBS core and flanking regions, using a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies. To ensure higher confidence in our approach, we applied resampling-jackknife and bootstrap tests for the comparison; optimized PWM and SiteGA showed similar recognition performance. Then we applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for binding sites using the web tool, SiteGA. Analysis of dependencies between close and distant LPDs revealed by SiteGA models has shown that the most significant correlations are between close LPDs, and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, which spanned both core and flanking regions. 
When SiteGA and optimized PWM models were applied together, this substantially reduced false positives at least at higher stringencies. Conclusion Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and EPD analysis has led to a list of genes which appear to be regulated by the above TFs.
Affiliation(s)
- Victor G Levitsky
- Institute of Cytology and Genetics SB RAS, Novosibirsk, 630090, Russia.
|
33
|
Different functional classes of genes are characterized by different compositional properties. FEBS Lett 2007; 581:5819-24. [DOI: 10.1016/j.febslet.2007.11.052] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 08/13/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 11/19/2022]
|
34
|
Ren L, Gao G, Zhao D, Ding M, Luo J, Deng H. Developmental stage related patterns of codon usage and genomic GC content: searching for evolutionary fingerprints with models of stem cell differentiation. Genome Biol 2007; 8:R35. [PMID: 17349061 PMCID: PMC1868930 DOI: 10.1186/gb-2007-8-3-r35] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Received: 09/12/2006] [Revised: 01/08/2007] [Accepted: 03/12/2007] [Indexed: 11/26/2022] Open
Abstract
Developmental-stage-related patterns of gene expression correlate with codon usage and genomic GC content in stem cell hierarchies. Background The usage of synonymous codons shows considerable variation among mammalian genes. How and why this usage is non-random are fundamental biological questions and remain controversial. It is also important to explore whether mammalian genes that are selectively expressed at different developmental stages bear different molecular features. Results In two models of mouse stem cell differentiation, we established correlations between codon usage and the patterns of gene expression. We found that the optimal codons exhibited variation (AT- or GC-ending codons) in different cell types within the developmental hierarchy. We also found that genes that were enriched (developmental-pivotal genes) or specifically expressed (developmental-specific genes) at different developmental stages had different patterns of codon usage and local genomic GC (GCg) content. Moreover, at the same developmental stage, developmental-specific genes generally used more GC-ending codons and had higher GCg content compared with developmental-pivotal genes. Further analyses suggest that the model of translational selection might be consistent with the developmental stage-related patterns of codon usage, especially for the AT-ending optimal codons. In addition, our data show that after human-mouse divergence, the influence of selective constraints is still detectable. Conclusion Our findings suggest that developmental stage-related patterns of gene expression are correlated with codon usage (GC3) and GCg content in stem cell hierarchies. Moreover, this paper provides evidence for the influence of natural selection at synonymous sites in the mouse genome and novel clues for linking the molecular features of genes to their patterns of expression during mammalian ontogenesis.
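GC3, the codon-usage measure named in this abstract, is the GC fraction at third codon positions. The sketch below shows one common way to compute it from an in-frame coding sequence. The function name `gc3_content` and the choice to ignore a trailing incomplete codon are assumptions made for illustration, and the abstract does not describe how the authors computed the value.

```python
def gc3_content(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence.

    Assumption: the sequence starts in frame; any trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop an incomplete final codon
    third_positions = cds[2:usable:3]     # bases at codon positions 3, 6, 9, ...
    if not third_positions:
        return 0.0
    gc = sum(base in "GC" for base in third_positions)
    return gc / len(third_positions)


# Toy sequence with codons ATG, GCC, TAA: third positions are G, C, A.
example = gc3_content("ATGGCCTAA")
```

For the toy sequence, two of the three third-position bases are G or C, so `example` is two thirds.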
Affiliation(s)
- Lichen Ren
- College of Life Sciences, Shanghai Jiao Tong University, Shanghai, 200240, PR China
| | - Ge Gao
- Center for Bioinformatics, College of Life Sciences, National Laboratory of Protein Engineering and Plant Genetics Engineering, Peking University, Beijing, 100871, PR China
| | - Dongxin Zhao
- Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, 100871, PR China
| | - Mingxiao Ding
- Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, 100871, PR China
| | - Jingchu Luo
- Center for Bioinformatics, College of Life Sciences, National Laboratory of Protein Engineering and Plant Genetics Engineering, Peking University, Beijing, 100871, PR China
| | - Hongkui Deng
- Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, 100871, PR China
|
35
|
Sharma VK, Kumar N, Brahmachari SK, Ramachandran S. Abundance of dinucleotide repeats and gene expression are inversely correlated: a role for gene function in addition to intron length. Physiol Genomics 2007; 31:96-103. [PMID: 17550993 DOI: 10.1152/physiolgenomics.00183.2006] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022] Open
Abstract
High and broad transcription of eukaryotic genes is facilitated by cost minimization, clustered localization in the genome, elevated G+C content, and low nucleosome formation potential. In this scenario, illumination of correlation between abundance of (TG/CA)(n≥12) repeats, which are negative cis modulators of transcription, and transcriptional levels and other commonly occurring dinucleotide repeats, is required. Three independent microarray datasets were used to examine the correlation of (TG/CA)(n≥12) and other dinucleotide repeats with gene expression. Compared with the expected equi-distribution pattern under neutral model, highly transcribed genes were poor in repeats, and conversely, weakly transcribed genes were rich in repeats. Furthermore, the inverse correlation between repeat abundance and transcriptional levels appears to be a global phenomenon encompassing all genes regardless of their breadth of transcription. This selective pattern of exclusion of (TG/CA)(n≥12) and (AT)(n≥12) repeats in highly transcribed genes is an additional factor along with cost minimization and elevated GC, and therefore, multiple factors govern high transcription of genes. We observed that even after controlling for the effects of GC and average intron lengths, the effect of repeats albeit somewhat weaker was persistent and definite. In the ribosomal protein coding genes, sequence analysis of orthologs suggests that negative selection for repeats perhaps occurred early in evolution. These observations suggest that negative selection of (TG/CA)(n≥12) microsatellites in the evolution of the highly expressed genes was also controlled by gene function in addition to intron length.
Affiliation(s)
- Vineet K Sharma
- G. N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Delhi, India
|
36
|
Prendergast JGD, Campbell H, Gilbert N, Dunlop MG, Bickmore WA, Semple CAM. Chromatin structure and evolution in the human genome. BMC Evol Biol 2007; 7:72. [PMID: 17490477 PMCID: PMC1876461 DOI: 10.1186/1471-2148-7-72] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Received: 11/16/2006] [Accepted: 05/09/2007] [Indexed: 11/16/2022] Open
Abstract
Background Evolutionary rates are not constant across the human genome but genes in close proximity have been shown to experience similar levels of divergence and selection. The higher-order organisation of chromosomes has often been invoked to explain such phenomena but previously there has been insufficient data on chromosome structure to investigate this rigorously. Using the results of a recent genome-wide analysis of open and closed human chromatin structures we have investigated the global association between divergence, selection and chromatin structure for the first time. Results In this study we have shown that, paradoxically, synonymous site divergence (dS) at non-CpG sites is highest in regions of open chromatin, primarily as a result of an increased number of transitions, while the rates of other traditional measures of mutation (intergenic, intronic and ancient repeat divergence as well as SNP density) are highest in closed regions of the genome. Analysis of human-chimpanzee divergence across intron-exon boundaries indicates that although genes in relatively open chromatin generally display little selection at their synonymous sites, those in closed regions show markedly lower divergence at their fourfold degenerate sites than in neighbouring introns and intergenic regions. Exclusion of known Exonic Splice Enhancer hexamers has little effect on the divergence observed at fourfold degenerate sites across chromatin categories; however, we show that closed chromatin is enriched with certain classes of ncRNA genes whose RNA secondary structure may be particularly important. Conclusion We conclude that, overall, non-CpG mutation rates are lowest in open regions of the genome and that regions of the genome with a closed chromatin structure have the highest background mutation rate. This might reflect lower rates of DNA damage or enhanced DNA repair processes in regions of open chromatin. 
Our results also indicate that dS is a poor measure of mutation rates, particularly when used in closed regions of the genome, as genes in closed regions generally display relatively strong levels of selection at their synonymous sites.
Affiliation(s)
- James GD Prendergast
- Colon Cancer Genetics Group, Division of Oncology, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
| | - Harry Campbell
- Public Health Sciences, Department of Community Health Sciences, University of Edinburgh, Edinburgh, UK
| | - Nick Gilbert
- MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
| | - Malcolm G Dunlop
- Colon Cancer Genetics Group, Division of Oncology, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
| | - Wendy A Bickmore
- MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
| | - Colin AM Semple
- MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
|
37
|
Abstract
Compact genes contain short and few introns, and they are highly expressed in different animal genomes. Recently, it has been shown that in Oryza sativa and Arabidopsis thaliana, highly expressed genes tend to be least compact, containing long and many introns. It has been suggested that selection on genome organization may have acted differently in plants compared with animals. Gene expression can be estimated as the number of hits when comparing a gene sequence with publicly available expressed sequence tags. Here it is shown that in the haploid moss Physcomitrella patens, highly expressed genes contain shorter introns than genes with low expression levels. This study therefore supports the hypothesis that selection may strongly favour transcriptional efficiency at least in the haploid phase of plant life cycles. It is concluded that plants do not necessarily respond to other selection pressures than animals regarding genome structuring.
Affiliation(s)
- H K Stenøien
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway.
|
38
|
Freudenberg J, Fu YH, Ptácek LJ. Human recombination rates are increased around accelerated conserved regions—evidence for continued selection? Bioinformatics 2007; 23:1441-3. [PMID: 17463031 DOI: 10.1093/bioinformatics/btm137] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION We hypothesized that recombination rates might be increased at genetic loci that are subject to more intense selection. Here, we test this hypothesis by using a recently published set of accelerated conserved regions and fine-scale recombination rate estimates provided by the HapMap project. RESULTS We observed that fine-scale recombination rates are increased around conserved noncoding regions that show accelerated evolution in human or chimp, as compared to noncoding regions showing accelerated evolution in mouse and those being conserved between human and fugu. Recombination rates around hominid accelerated conserved regions (ACRs) are furthermore increased as compared to exonic regions. On the other hand, GC-content is reduced around ACRs, excluding a major confounding influence of GC-content on the observed variation in recombination rate. CONCLUSION Our observations indicate that selection intensity could be an important determinant of local recombination rate variation and that continued positive selection might act at many ACR loci. Alternatively, a confounding factor needs to be found that causes a congruent signal in recombination rate estimates based on human polymorphism data and in the comparative genomic data. Researchers who consider the explanation involving selection as more likely may expect more common functional sequence variants at ACRs in genetic association studies.
Affiliation(s)
- Jan Freudenberg
- University of California San Francisco, Department of Neurology, Institute of Human Genetics, San Francisco, CA 94158-2922, USA.
|
39
|
Du Z, Kong P, Gao Y, Li N. Enrichment of G4 DNA motif in transcriptional regulatory region of chicken genome. Biochem Biophys Res Commun 2007; 354:1067-70. [PMID: 17275786 DOI: 10.1016/j.bbrc.2007.01.093] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 01/10/2007] [Accepted: 01/18/2007] [Indexed: 11/21/2022]
Abstract
G-quadruplex or G4 DNA is a stable, four-stranded DNA structure formed from guanine-rich regions. Based on the hypothesis that G4 DNA participated in the regulation of transcription, we analyzed G4 DNA in 5kb 5' flanking regions of 2892 chicken RefSeq genes with annotated transcription start sites (TSS). In total, 4769 distinct putative G4 DNA motifs (G4M) were identified in 1880 (65%) genes. The pattern of distribution of the G4M showed a gradient along the 5' flanking regions; from -5 to -4kb, to -1kb to the TSS, the frequency (number of G4M per kilobase) increased significantly from 0.192 to 0.768, and 62.56% of the G4M in the 1kb upstream regions were located in the region -400 to the TSS, where a core promoter is always present. Thus, 38.24% of the analyzed genes contained at least one G4M in the 400bp upstream region. Our findings support the hypothesis that G4M are involved in gene transcription.
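The frequency measure used in this abstract (putative G4 motifs per kilobase) can be illustrated with a short sketch. The motif definition below, four runs of at least three guanines separated by loops of 1 to 7 nucleotides, is a commonly used folding rule and is an assumption here, since the abstract does not state the pattern the authors used. The function name `g4_per_kb` is hypothetical, and only the given strand is scanned.

```python
import re

# Assumed motif definition (not given in the abstract): G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def g4_per_kb(sequence: str) -> float:
    """Non-overlapping putative G4 motifs per kilobase on the given strand."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    hits = G4_PATTERN.findall(sequence)   # non-overlapping, left to right
    return len(hits) * 1000 / len(sequence)


# Toy 1 kb sequence containing a single putative motif followed by filler.
toy = "GGGAGGGAGGGAGGG" + "A" * 985
density = g4_per_kb(toy)
```

A fuller analysis would also scan the reverse strand (C-rich runs) and bin positions relative to the TSS; both are left out to keep the sketch small.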
Affiliation(s)
- Zhuo Du
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
|
40
|
Dekker J. GC- and AT-rich chromatin domains differ in conformation and histone modification status and are differentially modulated by Rpd3p. Genome Biol 2007; 8:R116. [PMID: 17577398 PMCID: PMC2394764 DOI: 10.1186/gb-2007-8-6-r116] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Received: 02/13/2007] [Accepted: 06/18/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Base-composition varies throughout the genome and is related to organization of chromosomes in distinct domains (isochores). Isochore domains differ in gene expression levels, replication timing, levels of meiotic recombination and chromatin structure. The molecular basis for these differences is poorly understood. RESULTS We have compared GC- and AT-rich isochores of yeast with respect to chromatin conformation, histone modification status and transcription. Using 3C analysis we show that, along chromosome III, GC-rich isochores have a chromatin structure that is characterized by lower chromatin interaction frequencies compared to AT-rich isochores, which may point to a more extended chromatin conformation. In addition, we find that throughout the genome, GC-rich and AT-rich genes display distinct levels of histone modifications. Interestingly, elimination of the histone deacetylase Rpd3p differentially affects conformation of GC- and AT-rich domains. Further, deletion of RPD3 activates expression of GC-rich genes more strongly than AT-rich genes. Analyses of effects of the histone deacetylase inhibitor trichostatin A, global patterns of Rpd3p binding and effects of deletion of RPD3 on histone H4 acetylation confirmed that conformation and activity of GC-rich chromatin are more sensitive to Rpd3p-mediated deacetylation than AT-rich chromatin. CONCLUSION We find that GC-rich and AT-rich chromatin domains display distinct chromatin conformations and are marked by distinct patterns of histone modifications. We identified the histone deacetylase Rpd3p as an attenuator of these base composition-dependent differences in chromatin status. We propose that GC-rich chromatin domains tend to occur in a more active conformation and that Rpd3p activity represses this propensity throughout the genome.
Affiliation(s)
- Job Dekker
- Program in Gene Function and Expression and Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Plantation Street, Worcester, MA 01605-4321, USA.
|
41
|
Abstract
Research into the origins of introns is at a critical juncture in the resolution of theories on the evolution of early life (which came first, RNA or DNA?), the identity of LUCA (the last universal common ancestor, was it prokaryotic- or eukaryotic-like?), and the significance of noncoding nucleotide variation. One early notion was that introns would have evolved as a component of an efficient mechanism for the origin of genes. But alternative theories emerged as well. From the debate between the "introns-early" and "introns-late" theories came the proposal that introns arose before the origin of genetically encoded proteins and DNA, and the more recent "introns-first" theory, which postulates the presence of introns at that early evolutionary stage from a reconstruction of the "RNA world." Here we review seminal and recent ideas about intron origins. Recent discoveries about the patterns and causes of intron evolution make this one of the most hotly debated and exciting topics in molecular evolutionary biology today.
Affiliation(s)
- Francisco Rodríguez-Trelles
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525, USA.
|
42
|
Pozzoli U, Menozzi G, Comi GP, Cagliani R, Bresolin N, Sironi M. Intron size in mammals: complexity comes to terms with economy. Trends Genet 2006; 23:20-4. [PMID: 17070957 DOI: 10.1016/j.tig.2006.10.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Received: 03/20/2006] [Revised: 09/18/2006] [Accepted: 10/18/2006] [Indexed: 11/23/2022]
Abstract
Different and contrasting models have been proposed to explain intron size evolution in mammals. Here, we demonstrate that intron and intergenic size per se has no adaptive role in gene expression regulation but reflects the need to preserve conserved intronic elements. Although the amount of non-coding functional elements explains the within-genome size variation of intergenic spacers, we show that an additional, additive pressure has been acting on highly expressed introns to reduce the cost of their transcription.
Affiliation(s)
- Uberto Pozzoli
- Bioinformatic Laboratory, Scientific Institute IRCCS E. Medea, Via don L. Monza 20, 23842 Bosisio Parini (LC), Italy
|
43
|
Abstract
Human tissue-specific genes were reported to be longer than housekeeping genes (both in coding and intronic parts). Competing neutralist and adaptationist models were proposed to explain this observation. Here I show that in the human genome the longest genes are those with an intermediate expression pattern. From the standpoint of information theory, the regulation of such genes should be most complex. In the genomewide context, they are found here to have a higher informational load on all available levels: from participation in protein interaction networks, pathways and modules reflected in Gene Ontology categories through transcription factor regulatory sets and protein functional domains to amino acid tuples (words) in encoded proteins and nucleotide tuples in introns and promoter regions. Thus, the intermediately expressed genes have a higher functional and regulatory complexity that is reflected in their greater length (which is consistent with the 'genome design' model). The dichotomy of housekeeping versus tissue-specific entities is more pronounced on the modular level than on the molecular level. There are far fewer intermediate-specific modules (modules overrepresented in the intermediately expressed genes) than housekeeping or tissue-specific modules (normalized to gene number). The dichotomy of housekeeping versus tissue-specific genes and modules in multicellular organisms is probably caused by the burden of regulatory complexity acting on the intermediately expressed genes.
|
44
|
Luykx P, Bajić IV, Khuri S. NXSensor web tool for evaluating DNA for nucleosome exclusion sequences and accessibility to binding factors. Nucleic Acids Res 2006; 34:W560-5. [PMID: 16845070 PMCID: PMC1538820 DOI: 10.1093/nar/gkl158] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open
Abstract
Nucleosomes, a basic structural unit of eukaryotic chromatin, play a significant role in regulating gene expression. We have developed a web tool based on DNA sequences known from empirical and theoretical studies to influence DNA bending and flexibility, and to exclude nucleosomes. NXSensor (available at ) finds nucleosome exclusion sequences, evaluates their length and spacing, and computes an ‘accessibility score’ giving the proportion of base pairs likely to be nucleosome-free. Application of NXSensor to the promoter regions of housekeeping (HK) genes and those of tissue-specific (TS) genes revealed a significant difference between the two classes of gene, the former being significantly more open, on average, particularly near transcription start sites (TSSs). NXSensor should be a useful tool in assessing the likelihood of nucleosome formation in regions involved in gene regulation and other aspects of chromatin function.
Affiliation(s)
| | - Ivan V. Bajić
- School of Engineering Science, Simon Fraser University, Burnaby, B.C. V5A 1S6, Canada
| | - Sawsan Khuri
- The Dr. John T. Macdonald Foundation Center for Medical Genetics, University of Miami Miller School of Medicine, Miami, FL 33101, USA
- To whom correspondence should be addressed. Tel: +1 305 243 6069; Fax: +1 305 243 3919;
|
45
|
Li W, Miramontes P. Large-scale oscillation of structure-related DNA sequence features in human chromosome 21. Phys Rev E Stat Nonlin Soft Matter Phys 2006; 74:021912. [PMID: 17025477 DOI: 10.1103/physreve.74.021912] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/24/2006] [Indexed: 05/12/2023]
Abstract
Human chromosome 21 is the only chromosome in the human genome that exhibits oscillation of the (G+C) content with a cycle length of hundreds of kilobases (kb) (~500 kb near the right telomere). We aim to establish the existence of a similar periodicity in structure-related sequence features in order to relate this (G+C)% oscillation to other biological phenomena. The following quantities are shown to oscillate with the same 500 kb periodicity in human chromosome 21: binding energy calculated by two sets of dinucleotide-based thermodynamic parameters, AA/TT and AAA/TTT bi- and tri-nucleotide density, 5'-TA-3' dinucleotide density, and signal for 10- or 11-base periodicity of AA/TT or AAA/TTT. These intrinsic quantities are related to structural features of the double helix of DNA molecules, such as base-pair binding, untwisting or unwinding, stiffness, and a putative tendency for nucleosome formation.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, New York 11030, USA.
|
46
|
Orlov YL, Levitskii VG, Smirnova OG, Podkolodnaya OA, Khlebodarova TM, Kolchanov NA. Statistical analysis of DNA sequences containing nucleosome positioning sites. Biophysics (Nagoya-shi) 2006. [DOI: 10.1134/s0006350906040051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
|
47
|
Gaillard C, Strauss F. DNA topology and genome organization in higher eukaryotes: a model. J Theor Biol 2006; 243:604-7. [PMID: 16930627 DOI: 10.1016/j.jtbi.2006.07.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2006] [Revised: 07/04/2006] [Accepted: 07/06/2006] [Indexed: 11/28/2022]
|
48
|
Bajic VB, Tan SL, Christoffels A, Schönbach C, Lipovich L, Yang L, Hofmann O, Kruger A, Hide W, Kai C, Kawai J, Hume DA, Carninci P, Hayashizaki Y. Mice and men: their promoter properties. PLoS Genet 2006; 2:e54. [PMID: 16683032 PMCID: PMC1449896 DOI: 10.1371/journal.pgen.0020054] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2005] [Accepted: 02/27/2006] [Indexed: 12/28/2022] Open
Abstract
Using the two largest collections of Mus musculus and Homo sapiens transcription start sites (TSSs) determined based on CAGE tags, ditags, full-length cDNAs, and other transcript data, we describe the compositional landscape surrounding TSSs with the aim of gaining better insight into the properties of mammalian promoters. We classified TSSs into four types based on compositional properties of regions immediately surrounding them. These properties highlighted distinctive features in the extended core promoters that helped us delineate boundaries of the transcription initiation domain space for both species. The TSS types were analyzed for associations with initiating dinucleotides, CpG islands, TATA boxes, and an extensive collection of statistically significant cis-elements in mouse and human. We found that different TSS types show preferences for different sets of initiating dinucleotides and cis-elements. Through Gene Ontology and eVOC categories and tissue expression libraries we linked TSS characteristics to expression. Moreover, we show a link of TSS characteristics to very specific genomic organization in an example of immune-response-related genes (GO:0006955). Our results shed light on the global properties of the two transcriptomes not revealed before and therefore provide the framework for better understanding of the transcriptional mechanisms in the two species, as well as a framework for development of new and more efficient promoter- and gene-finding tools.
Affiliation(s)
- Vladimir B Bajic
- Knowledge Extraction Laboratory, Institute for Infocomm Research, Singapore.
|
49
|
Vinogradov AE, Anatskaya OV. Genome size and metabolic intensity in tetrapods: a tale of two lines. Proc Biol Sci 2006; 273:27-32. [PMID: 16519230 PMCID: PMC1560010 DOI: 10.1098/rspb.2005.3266] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
We show a negative link between genome size and metabolic intensity in tetrapods, using the heart index (relative heart mass) as a unified indicator of metabolic intensity in poikilothermal and homeothermal animals. We found two separate regression lines of heart index on genome size for reptiles-birds and amphibians-mammals (the slope of regression is steeper in reptiles-birds). We also show a negative correlation between GC content and nucleosome formation potential in vertebrate DNA, and, consistent with this relationship, a positive correlation between genome GC content and nuclear size (independent of genome size). It is known that there are two separate regression lines of genome GC content on genome size for reptiles-birds and amphibians-mammals: reptiles-birds have a relatively higher GC content (for their genome sizes) compared to amphibians-mammals. Our results suggest uniting all these data into one concept. The slope of negative regression between GC content and nucleosome formation potential is steeper in exons than in non-coding DNA (where nucleosome formation potential is generally higher), which indicates a special role of non-coding DNA for orderly chromatin organization. Chromatin condensation and nuclear size are supposed to be key parameters that accommodate the effects of both genome size and GC content and connect them with metabolic intensity. Our data suggest that the reptile-bird clade evolved special relationships among these parameters, whereas mammals preserved the amphibian-like relationships. Surprisingly, mammals, although acquiring a more complex general organization, seem to retain certain genome-related properties that are similar to amphibians. At the same time, the slope of regression between nucleosome formation potential and GC content is steeper in poikilothermal than in homeothermal genomes, which suggests that mammals and birds acquired certain common features of genomic organization.
Affiliation(s)
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, St Petersburg 194064, Russia.
|
50
|
Webster MT, Axelsson E, Ellegren H. Strong Regional Biases in Nucleotide Substitution in the Chicken Genome. Mol Biol Evol 2006; 23:1203-16. [PMID: 16551647 DOI: 10.1093/molbev/msk008] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Interspersed repeats have emerged as a valuable tool for studying neutral patterns of molecular evolution. Here we analyze variation in the rate and pattern of nucleotide substitution across all autosomes in the chicken genome by comparing the present-day CR1 repeat sequences with their ancestral copies and reconstructing nucleotide substitutions with a maximum likelihood model. The results shed light on the origin and evolution of large-scale heterogeneity in GC content found in the genomes of birds and mammals: the isochore structure. In contrast to mammals, where GC content is becoming homogenized, heterogeneity in GC content is being reinforced in the chicken genome. This is also supported by patterns of substitution inferred from alignments of introns in chicken, turkey, and quail. Analysis of individual substitution frequencies is consistent with the biased gene conversion (BGC) model of isochore evolution, and it is likely that patterns of evolution in the chicken genome closely resemble those in the ancestral amniote genome, when it is inferred that isochores originated. Microchromosomes and distal regions of macrochromosomes are found to have elevated substitution rates and a more GC-biased pattern of nucleotide substitution. This can largely be accounted for by a strong correlation between GC content and the rate and pattern of substitution. The results suggest that an interaction between increased mutability at CpG motifs and fixation biases due to BGC could explain increased levels of divergence in GC-rich regions.
Affiliation(s)
- Matthew T Webster
- Department of Evolution, Genomics and Systematics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
|