1
Bai MZ, Guo YY. Bioinformatics Analysis of MSH1 Genes of Green Plants: Multiple Parallel Length Expansions, Intron Gains and Losses, Partial Gene Duplications, and Alternative Splicing. Int J Mol Sci 2023; 24:13620. [PMID: 37686425 PMCID: PMC10487979 DOI: 10.3390/ijms241713620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 08/28/2023] [Accepted: 08/29/2023] [Indexed: 09/10/2023] Open
Abstract
MutS homolog 1 (MSH1) is involved in the recombining and repairing of organelle genomes and is essential for maintaining their stability. Previous studies indicated that the length of the gene varied greatly among species and detected species-specific partial gene duplications in Physcomitrella patens. However, there are critical gaps in the understanding of the gene size expansion, and the extent of the partial gene duplication of MSH1 remains unclear. Here, we screened MSH1 genes in 85 selected species with genome sequences representing the main clades of green plants (Viridiplantae). We identified the MSH1 gene in all lineages of green plants, except for nine incomplete species, for bioinformatics analysis. The gene is a singleton gene in most of the selected species with conserved amino acids and protein domains. Gene length varies greatly among the species, ranging from 3234 bp in Ostreococcus tauri to 805,861 bp in Cycas panzhihuaensis. The expansion of MSH1 repeatedly occurred in multiple clades, especially in Gymnosperms, Orchidaceae, and Chloranthus spicatus. MSH1 has exceptionally long introns in certain species due to the gene length expansion, and the longest intron even reaches 101,025 bp. And the gene length is positively correlated with the proportion of the transposable elements (TEs) in the introns. In addition, gene structure analysis indicated that the MSH1 of green plants had undergone parallel intron gains and losses in all major lineages. However, the intron number of seed plants (gymnosperm and angiosperm) is relatively stable. All the selected gymnosperms contain 22 introns except for Gnetum montanum and Welwitschia mirabilis, while all the selected angiosperm species preserve 21 introns except for the ANA grade. Notably, the coding region of MSH1 in algae presents an exceptionally high GC content (47.7% to 75.5%). 
Moreover, over one-third of the selected species contain species-specific partial gene duplications of MSH1, except for the conserved mosses-specific partial gene duplication. Additionally, we found conserved alternatively spliced MSH1 transcripts in five species. The study of MSH1 sheds light on the evolution of the long genes of green plants.
Affiliation(s)
- Yan-Yan Guo
  - College of Plant Protection, Henan Agricultural University, Zhengzhou 450046, China
2
Genome Evolution and the Future of Phylogenomics of Non-Avian Reptiles. Animals (Basel) 2023; 13:ani13030471. [PMID: 36766360 PMCID: PMC9913427 DOI: 10.3390/ani13030471] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/16/2022] [Revised: 01/13/2023] [Accepted: 01/15/2023] [Indexed: 02/01/2023] Open
Abstract
Non-avian reptiles comprise a large proportion of amniote vertebrate diversity, with squamate reptiles-lizards and snakes-recently overtaking birds as the most species-rich tetrapod radiation. Despite displaying an extraordinary diversity of phenotypic and genomic traits, genomic resources in non-avian reptiles have accumulated more slowly than they have in mammals and birds, the remaining amniotes. Here we review the remarkable natural history of non-avian reptiles, with a focus on the physical traits, genomic characteristics, and sequence compositional patterns that comprise key axes of variation across amniotes. We argue that the high evolutionary diversity of non-avian reptiles can fuel a new generation of whole-genome phylogenomic analyses. A survey of phylogenetic investigations in non-avian reptiles shows that sequence capture-based approaches are the most commonly used, with studies of markers known as ultraconserved elements (UCEs) especially well represented. However, many other types of markers exist and are increasingly being mined from genome assemblies in silico, including some with greater information potential than UCEs for certain investigations. We discuss the importance of high-quality genomic resources and methods for bioinformatically extracting a range of marker sets from genome assemblies. Finally, we encourage herpetologists working in genomics, genetics, evolutionary biology, and other fields to work collectively towards building genomic resources for non-avian reptiles, especially squamates, that rival those already in place for mammals and birds. Overall, the development of this cross-amniote phylogenomic tree of life will contribute to illuminate interesting dimensions of biodiversity across non-avian reptiles and broader amniotes.
3
Srikulnath K, Ahmad SF, Singchat W, Panthum T. Why Do Some Vertebrates Have Microchromosomes? Cells 2021; 10:2182. [PMID: 34571831 PMCID: PMC8466491 DOI: 10.3390/cells10092182] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/22/2021] [Revised: 08/17/2021] [Accepted: 08/17/2021] [Indexed: 12/27/2022] Open
Abstract
With more than 70,000 living species, vertebrates have a huge impact on the field of biology and research, including karyotype evolution. One prominent aspect of many vertebrate karyotypes is the enigmatic occurrence of tiny and often cytogenetically indistinguishable microchromosomes, which possess distinctive features compared to macrochromosomes. Why certain vertebrate species carry these microchromosomes in some lineages while others do not, and how they evolve remain open questions. New studies have shown that microchromosomes exhibit certain unique characteristics of genome structure and organization, such as high gene densities, low heterochromatin levels, and high rates of recombination. Our review focuses on recent concepts to expand current knowledge on the dynamic nature of karyotype evolution in vertebrates, raising important questions regarding the evolutionary origins and ramifications of microchromosomes. We introduce the basic karyotypic features to clarify the size, shape, and morphology of macro- and microchromosomes and report their distribution across different lineages. Finally, we characterize the mechanisms of different evolutionary forces underlying the origin and evolution of microchromosomes.
Affiliation(s)
- Kornsorn Srikulnath
  - Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - The International Undergraduate Program in Bioscience and Technology, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - Amphibian Research Center, Hiroshima University, 1-3-1, Kagamiyama, Higashihiroshima 739-8526, Japan
- Syed Farhan Ahmad
  - Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - The International Undergraduate Program in Bioscience and Technology, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Worapong Singchat
  - Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Thitipong Panthum
  - Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
4
Top O, Milferstaedt SWL, van Gessel N, Hoernstein SNW, Özdemir B, Decker EL, Reski R. Expression of a human cDNA in moss results in spliced mRNAs and fragmentary protein isoforms. Commun Biol 2021; 4:964. [PMID: 34385580 PMCID: PMC8361020 DOI: 10.1038/s42003-021-02486-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/02/2020] [Accepted: 07/26/2021] [Indexed: 12/18/2022] Open
Abstract
Production of biopharmaceuticals relies on the expression of mammalian cDNAs in host organisms. Here we show that the expression of a human cDNA in the moss Physcomitrium patens generates the expected full-length and four additional transcripts due to unexpected splicing. This mRNA splicing results in non-functional protein isoforms, cellular misallocation of the proteins and low product yields. We integrated these results together with the results of our analysis of all 32,926 protein-encoding Physcomitrella genes and their 87,533 annotated transcripts in a web application, physCO, for automatized optimization. A thus optimized cDNA results in about twelve times more protein, which correctly localizes to the ER. An analysis of codon preferences of different production hosts suggests that similar effects occur also in non-plant hosts. We anticipate that the use of our methodology will prevent so far undetected mRNA heterosplicing resulting in maximized functional protein amounts for basic biology and biotechnology.
Affiliation(s)
- Oguz Top
  - Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
  - Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg, Germany
  - Plant Molecular Cell Biology, Department Biology I, LMU Biocenter, Ludwig-Maximilians-University Munich, Planegg-Martinsried, Germany
- Stella W L Milferstaedt
  - Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
  - Cluster of Excellence livMatS @ FIT - Freiburg Center for Interactive Materials and Bioinspired Technologies, University of Freiburg, Freiburg, Germany
- Nico van Gessel
  - Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Bugra Özdemir
  - Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Eva L Decker
  - Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Ralf Reski
  - Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
  - Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg, Germany
  - Cluster of Excellence livMatS @ FIT - Freiburg Center for Interactive Materials and Bioinspired Technologies, University of Freiburg, Freiburg, Germany
  - CIBSS - Centre for Integrative Biological Signalling Studies, Freiburg, Germany
5
Kumar U, Khandia R, Singhal S, Puranik N, Tripathi M, Pateriya AK, Khan R, Emran TB, Dhama K, Munjal A, Alqahtani T, Alqahtani AM. Insight into Codon Utilization Pattern of Tumor Suppressor Gene EPB41L3 from Different Mammalian Species Indicates Dominant Role of Selection Force. Cancers (Basel) 2021; 13:cancers13112739. [PMID: 34205890 PMCID: PMC8198080 DOI: 10.3390/cancers13112739] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/29/2021] [Revised: 05/27/2021] [Accepted: 05/27/2021] [Indexed: 12/13/2022] Open
Abstract
Simple Summary: The present study envisaged the codon usage pattern analysis of tumor suppressor gene EPB41L3 for the human, brown rat, domesticated cattle, and Sumatran orangutan. Most amino acids are coded by more than one synonymous codon, but they are used in a biased manner. The codon usage bias results from multiple factors like compositional properties, dinucleotide abundance, neutrality, parity, tRNA pool, etc. Understanding codon bias is central to fields as diverse as molecular evolution, gene expressivity, protein translation, and protein folding. This kind of studies is important to see the effects of various evolutionary forces on codon usage. The present study indicated that the selection force is dominant over other forces shaping codon usage in the envisaged organisms.
Abstract: Uneven codon usage within genes as well as among genomes is a usual phenomenon across organisms. It plays a significant role in the translational efficiency and evolution of a particular gene. EPB41L3 is a tumor suppressor protein-coding gene, and in the present study, the pattern of codon usage was envisaged. The full-length sequences of the EPB41L3 gene for the human, brown rat, domesticated cattle, and Sumatran orangutan available at the NCBI were retrieved and utilized to analyze CUB patterns across the selected mammalian species. Compositional properties, dinucleotide abundance, and parity analysis showed the dominance of A and G whilst RSCU analysis indicated the dominance of G/C-ending codons. The neutrality plot plotted between GC12 and GC3 to determine the variation between the mutation pressure and natural selection indicated the dominance of selection pressure (R = 0.926; p < 0.00001) over the three codon positions across the gene. The result is in concordance with the codon adaptation index analysis and the ENc-GC3 plot analysis, as well as the translational selection index (P2). Overall selection pressure is the dominant pressure acting during the evolution of the EPB41L3 gene.
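For readers unfamiliar with the neutrality plot mentioned in this abstract, the following is a minimal sketch of how GC12 and GC3 can be computed and regressed against each other. It is not the authors' pipeline: the function names `gc_by_codon_position` and `neutrality_slope` are hypothetical, the input is assumed to be an in-frame coding sequence, and no codons (stop, Met, Trp) are excluded, although published analyses sometimes exclude them.

```python
def gc_by_codon_position(cds):
    """Return (GC12, GC3) as fractions for an in-frame coding sequence.

    GC12 is the mean GC fraction of codon positions 1 and 2;
    GC3 is the GC fraction of codon position 3.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    gc1, gc2, gc3 = (c / n_codons for c in counts)
    return (gc1 + gc2) / 2, gc3


def neutrality_slope(points):
    """Least-squares slope of GC12 (y) regressed on GC3 (x).

    `points` is a list of (GC12, GC3) pairs, one per gene or sequence.
    """
    xs = [gc3 for _, gc3 in points]
    ys = [gc12 for gc12, _ in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("GC3 values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Under the usual reading of such plots, a slope close to 1 is taken to suggest that mutation pressure dominates, and a slope close to 0 that selection dominates; how a given study draws that line is a matter of its own stated criteria.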
Affiliation(s)
- Utsang Kumar
  - Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Rekha Khandia
  - Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
  - Correspondence: (R.K.); (K.D.)
- Shailja Singhal
  - Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Nidhi Puranik
  - Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Meghna Tripathi
  - ICAR-National Institute of High Security Animal Diseases, Bhopal 462043, India
- Atul Kumar Pateriya
  - ICAR-National Institute of High Security Animal Diseases, Bhopal 462043, India
- Raju Khan
  - Microfluidics & MEMS Center, (MRS & CFC), CSIR-Advanced Materials and Processes Research Institute (AMPRI), Hoshangabad Road, Bhopal 462026, India
- Talha Bin Emran
  - Department of Pharmacy, BGC Trust University Bangladesh, Chittagong 4381, Bangladesh
- Kuldeep Dhama
  - Division of Pathology, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, India
  - Correspondence: (R.K.); (K.D.)
- Ashok Munjal
  - Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Taha Alqahtani
  - Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia
- Ali M. Alqahtani
  - Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia
6
Characterization of microsatellites in the endangered snow leopard based on the chromosome-level genome. MAMMAL RES 2021. [DOI: 10.1007/s13364-021-00563-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
7
Poverennaya IV, Roytberg MA. Spliceosomal Introns: Features, Functions, and Evolution. BIOCHEMISTRY (MOSCOW) 2021; 85:725-734. [PMID: 33040717 DOI: 10.1134/s0006297920070019] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/23/2022]
Abstract
Spliceosomal introns, which have been found in most eukaryotic genes, are non-coding sequences excised from pre-mRNAs by a special complex called spliceosome during mRNA splicing. Introns occur in both protein- and RNA-coding genes and can be found in coding and untranslated gene regions. Because intron sequences vary greatly due to a high rate of polymorphism, the functions of intron had been for a long time associated only with alternative splicing, while intron evolution had been viewed not as an evolution of an individual genomic element, but rather considered within a framework of the evolution of the gene intron-exon structure. Here, we review the theories of intron origin, evolutionary events in the exon-intron structure, such as intron gain, loss, and sliding, intron functions known to date, and mechanisms by which changes in the intron features (length and phase) can affect the regulation of gene-mediated processes.
Affiliation(s)
- I V Poverennaya
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
  - Institute of Mathematical Problems in Biology, Keldysh Branch of Institute of Applied Mathematics, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
- M A Roytberg
  - Institute of Mathematical Problems in Biology, Keldysh Branch of Institute of Applied Mathematics, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
  - Higher School of Economics, Moscow, 101000, Russia
8
Symonová R, Suh A. Nucleotide composition of transposable elements likely contributes to AT/GC compositional homogeneity of teleost fish genomes. Mob DNA 2019; 10:49. [PMID: 31857829 PMCID: PMC6909575 DOI: 10.1186/s13100-019-0195-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/27/2019] [Accepted: 12/05/2019] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Teleost fish genome size has been repeatedly demonstrated to positively correlate with the proportion of transposable elements (TEs). This finding might have far-reaching implications for our understanding of the evolution of nucleotide composition across vertebrates. Genomes of fish and amphibians are GC homogenous, with non-teleost gars being the single exception identified to date, whereas birds and mammals are AT/GC heterogeneous. The exact reason for this phenomenon remains controversial. Since TEs make up significant proportions of genomes and can quickly accumulate across genomes, they can potentially influence the host genome with their own GC content (GC%). However, the GC% of fish TEs has so far been neglected. RESULTS The genomic proportion of TEs indeed correlates with genome size, although not as linearly as previously shown with fewer genomes, and GC% negatively correlates with genome size in the 33 fish genome assemblies analysed here (excluding salmonids). GC% of fish TE consensus sequences positively correlates with the corresponding genomic GC% in 29 species tested. Likewise, the GC contents of the entire repetitive vs. non-repetitive genomic fractions correlate positively in 54 fish species in Ensembl. However, among these fish species, there is also a wide variation in GC% between the main groups of TEs. Class II DNA transposons, predominant TEs in fish genomes, are significantly GC-poorer than Class I retrotransposons. The AT/GC heterogeneous gar genome contains fewer Class II TEs, a situation similar to fugu with its extremely compact and also GC-enriched but AT/GC homogenous genome. CONCLUSION Our results reveal a previously overlooked correlation between GC% of fish genomes and their TEs. This applies to both TE consensus sequences as well as the entire repetitive genomic fraction. On the other hand, there is a wide variation in GC% across fish TE groups. 
These results raise the question whether GC% of TEs evolves independently of GC% of the host genome or whether it is driven by TE localization in the host genome. Answering these questions will help to understand how genomic GC% is shaped over time. Long-term accumulation of GC-poor(er) Class II DNA transposons might indeed have influenced AT/GC homogenization of fish genomes and requires further investigation.
Affiliation(s)
- Radka Symonová
  - Department of Biology, Faculty of Science, University of Hradec Králové, Hradec Králové, Czech Republic
- Alexander Suh
  - Department of Ecology and Genetics - Evolutionary Biology, Evolutionary Biology Centre (EBC), Science for Life Laboratory, Uppsala University, Uppsala, Sweden
  - Present address: Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre (EBC), Science for Life Laboratory, Uppsala University, Uppsala, Sweden
9
Huttener R, Thorrez L, In't Veld T, Granvik M, Snoeck L, Van Lommel L, Schuit F. GC content of vertebrate exome landscapes reveal areas of accelerated protein evolution. BMC Evol Biol 2019; 19:144. [PMID: 31311498 PMCID: PMC6636035 DOI: 10.1186/s12862-019-1469-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/18/2019] [Accepted: 06/26/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Rapid accumulation of vertebrate genome sequences render comparative genomics a powerful approach to study macro-evolutionary events. The assessment of phylogenic relationships between species routinely depends on the analysis of sequence homology at the nucleotide or protein level. RESULTS We analyzed mRNA GC content, codon usage and divergence of orthologous proteins in 55 vertebrate genomes. Data were visualized in genome-wide landscapes using a sliding window approach. Landscapes of GC content reveal both evolutionary conservation of clustered genes, and lineage-specific changes, so that it was possible to construct a phylogenetic tree that closely matched the classic "tree of life". Landscapes of GC content also strongly correlated to landscapes of amino acid usage: positive correlation with glycine, alanine, arginine and proline and negative correlation with phenylalanine, tyrosine, methionine, isoleucine, asparagine and lysine. Peaks of GC content correlated strongly with increased protein divergence. CONCLUSIONS Landscapes of base- and amino acid composition of the coding genome opens a new approach in comparative genomics, allowing identification of discrete regions in which protein evolution accelerated over deep evolutionary time. Insight in the evolution of genome structure may spur novel studies assessing the evolutionary benefit of genes in particular genomic regions.
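The sliding-window "landscape" idea in this abstract can be illustrated with a short sketch. It assumes genes are already ordered by genomic position and that each gene is summarized by the GC fraction of its mRNA; the window size and step are arbitrary choices here, since the abstract does not state the ones used, and the function names `gc_fraction` and `sliding_window_means` are hypothetical.

```python
def gc_fraction(seq):
    """GC fraction of a nucleotide sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / acgt


def sliding_window_means(values, window, step=1):
    """Mean of `values` in consecutive windows of `window` items.

    `values` would be per-gene GC fractions in genomic order;
    windows that would run past the end of the list are dropped.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive integers")
    return [
        sum(values[start:start + window]) / window
        for start in range(0, len(values) - window + 1, step)
    ]
```

For example, `sliding_window_means([gc_fraction(s) for s in mrnas], window=50)` would smooth per-gene values over 50 neighbouring genes, where `mrnas` is a hypothetical list of sequences in genomic order.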
Affiliation(s)
- R Huttener
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Thorrez
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
  - Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- T In't Veld
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- M Granvik
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Snoeck
  - Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- L Van Lommel
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- F Schuit
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
10
Wu X, Kabalane H, Kahli M, Petryk N, Laperrousaz B, Jaszczyszyn Y, Drillon G, Nicolini FE, Perot G, Robert A, Fund C, Chibon F, Xia R, Wiels J, Argoul F, Maguer-Satta V, Arneodo A, Audit B, Hyrien O. Developmental and cancer-associated plasticity of DNA replication preferentially targets GC-poor, lowly expressed and late-replicating regions. Nucleic Acids Res 2019; 46:10157-10172. [PMID: 30189101 PMCID: PMC6212843 DOI: 10.1093/nar/gky797] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/06/2018] [Accepted: 08/24/2018] [Indexed: 01/08/2023] Open
Abstract
The spatiotemporal program of metazoan DNA replication is regulated during development and altered in cancers. We have generated novel OK-seq, Repli-seq and RNA-seq data to compare the DNA replication and gene expression programs of twelve cancer and non-cancer human cell types. Changes in replication fork directionality (RFD) determined by OK-seq are widespread but more frequent within GC-poor isochores and largely disconnected from transcription changes. Cancer cell RFD profiles cluster with non-cancer cells of similar developmental origin but not with different cancer types. Importantly, recurrent RFD changes are detected in specific tumour progression pathways. Using a model for establishment and early progression of chronic myeloid leukemia (CML), we identify 1027 replication initiation zones (IZs) that progressively change efficiency during long-term expression of the BCR-ABL1 oncogene, being twice more often downregulated than upregulated. Prolonged expression of BCR-ABL1 results in targeting of new IZs and accentuation of previous efficiency changes. Targeted IZs are predominantly located in GC-poor, late replicating gene deserts and frequently silenced in late CML. Prolonged expression of BCR-ABL1 results in massive deletion of GC-poor, late replicating DNA sequences enriched in origin silencing events. We conclude that BCR-ABL1 expression progressively affects replication and stability of GC-poor, late-replicating regions during CML progression.
Affiliation(s)
- Xia Wu
  - Institut de Biologie de l'École Normale Supérieure (IBENS), Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
  - Physics Department, East China Normal University, Shanghai, China
- Hadi Kabalane
  - Univ Lyon, ENS de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France
- Malik Kahli
  - Institut de Biologie de l'École Normale Supérieure (IBENS), Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
- Nataliya Petryk
  - Institut de Biologie de l'École Normale Supérieure (IBENS), Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
- Bastien Laperrousaz
  - Univ Lyon, ENS de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France
  - CNRS UMR5286, INSERM U1052, Centre de Recherche en Cancérologie de Lyon, F- 69008 Lyon, France
- Yan Jaszczyszyn
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France
- Guenola Drillon
  - Univ Lyon, ENS de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France
- Frank-Emmanuel Nicolini
  - CNRS UMR5286, INSERM U1052, Centre de Recherche en Cancérologie de Lyon, F- 69008 Lyon, France
  - Centre Léon Bérard, F-69008 Lyon, France
- Gaëlle Perot
  - INSERM U1218, Institut Bergonié, F-33000 Bordeaux, France
- Aude Robert
  - UMR 8126, Université Paris-Sud Paris-Saclay, CNRS, Institut Gustave Roussy, Villejuif, France
- Cédric Fund
  - École Normale Supérieure, PSL Research University, CNRS, Inserm, IBENS, Plateforme Génomique, 75005 Paris, France
- Ruohong Xia
  - Physics Department, East China Normal University, Shanghai, China
- Joëlle Wiels
  - UMR 8126, Université Paris-Sud Paris-Saclay, CNRS, Institut Gustave Roussy, Villejuif, France
- Françoise Argoul
  - LOMA, Université de Bordeaux, CNRS, UMR 5798, F-33405 Talence, France
- Véronique Maguer-Satta
  - CNRS UMR5286, INSERM U1052, Centre de Recherche en Cancérologie de Lyon, F- 69008 Lyon, France
- Alain Arneodo
  - LOMA, Université de Bordeaux, CNRS, UMR 5798, F-33405 Talence, France
- Benjamin Audit
  - Univ Lyon, ENS de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France
- Olivier Hyrien
  - Institut de Biologie de l'École Normale Supérieure (IBENS), Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
11
Ming Z, Chen Q, Chen N, Lin M, Liu N, Hu J, Xiao X. Eliminating the secondary structure of targeting strands for enhancement of DNA probe based low-abundance point mutation detection. Anal Chim Acta 2019; 1075:137-143. [PMID: 31196419 DOI: 10.1016/j.aca.2019.05.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/24/2019] [Revised: 04/25/2019] [Accepted: 05/05/2019] [Indexed: 10/26/2022]
Abstract
Nucleic acid probes are very useful tools in biological and medical science. However, the essential sensing mechanism of nucleic acid probes was prone to the interference of surrounding sequences. Especially when the target sequences formed secondary structures such as hairpin or quadruplex, the nucleic acid probes were hindered from hybridizing with target strands, greatly disabled the function of probes. Herein, we have established an Open strand based strategy for eliminating the influence of secondary structures on the performance of nucleic acid probes. The strategy was general toward different lengths, secondary structures and sequences of the targeting strand, and we found that the improvement was higher when the secondary structure of the targeting strand was more complicated. Experiments on synthetic single stranded DNA and real clinical genomic DNA samples were conducted for low abundance mutation detection, and the limit of detection for TERT-C228T and BRCA2 rs80359065 mutations could be 0.02% and 0.05% respectively, demonstrating the clinical practicability of our proposed strategy in low abundance mutation detection.
Affiliation(s)
- Zhihao Ming
- Family Planning Research Institute/Center of Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, PR China
- Qianzhi Chen
- Cancer Research Institute, Tongji Hospital, Huazhong University of Science and Technology, Wuhan, 430030, PR China
- Na Chen
- Family Planning Research Institute/Center of Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, PR China
- Meng Lin
- Family Planning Research Institute/Center of Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, PR China
- Na Liu
- Family Planning Research Institute/Center of Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, PR China
- Junbo Hu
- Cancer Research Institute, Tongji Hospital, Huazhong University of Science and Technology, Wuhan, 430030, PR China
- Xianjin Xiao
- Family Planning Research Institute/Center of Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, PR China.
|
12
|
Uddin A, Paul N, Chakraborty S. The codon usage pattern of genes involved in ovarian cancer. Ann N Y Acad Sci 2019; 1440:67-78. [PMID: 30843242 DOI: 10.1111/nyas.14019] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/10/2018] [Revised: 01/04/2019] [Accepted: 01/14/2019] [Indexed: 12/20/2022]
Abstract
In this study, we analyzed the compositional dynamics and codon usage pattern of genes involved in ovarian cancer (OC) using a computational method. Mutations in specific genes are associated with OC, and some genes are risk factors for progression of OC, but no work has been reported yet on the codon usage pattern of genes involved in OC. Nucleotide composition analysis of OC-related genes suggested that the overall GC content was higher than AT content; that is, the genes were GC rich. The improved effective number of codons indicated that the overall extent of codon usage bias of genes involved in OC was low. The codons AGC, CTG, ATC, ACC, GTG, and GCC were overrepresented, while the codons TCG, TTA, CTA, CCG, CAA, CGT, ATA, ACG, GTA, GTT, GCG, and GGT were underrepresented in the genes. Correspondence analysis suggested that the codon usage pattern was different in different genes. A highly significant correlation was observed between GC12 and GC3 (r = 0.587, P < 0.01) of genes, suggesting that directional mutation affected the three codon positions. Our report on the codon usage pattern of genes involved in OC includes a new perspective for elucidating the mechanisms of biased usage of synonymous codons, as well as providing useful clues for molecular genetic engineering.
Affiliation(s)
- Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Assam, India
- Nirmal Paul
- Department of Biotechnology, Assam University, Assam, India
|
13
|
MapToGenome: A Comparative Genomic Tool that Aligns Transcript Maps to Sequenced Genomes. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022] Open
Abstract
Efforts to generate whole genome assemblies and dense genetic maps have provided a wealth of gene positional information for several vertebrate species. Comparing the relative location of orthologous genes among these genomes provides perspective on genome evolution and can aid in translating genetic information between distantly related organisms. However, large-scale comparisons between genetic maps and genome assemblies can prove challenging because genetic markers are commonly derived from transcribed sequences that are incompletely and variably annotated. We developed the program MapToGenome as a tool for comparing transcript maps and genome assemblies. MapToGenome processes sequence alignments between mapped transcripts and whole genome sequence while accounting for the presence of intronic sequences, and assigns orthology based on user-defined parameters. To illustrate the utility of this program, we used MapToGenome to process alignments between vertebrate genetic maps and genome assemblies: 1) self/self alignments for maps and assemblies of the rat and zebrafish genomes; 2) alignments between vertebrate transcript maps (rat, salamander, zebrafish, and medaka) and the chicken genome; and 3) alignments of the medaka and zebrafish maps to the pufferfish (Tetraodon nigroviridis) genome. Our results show that map-genome alignments can be improved by combining alignments across presumptive intron breaks and ignoring alignments for simple sequence length polymorphism (SSLP) marker sequences. Comparisons between vertebrate maps and genomes reveal broad patterns of conservation among vertebrate genomes and the differential effects of genome rearrangement over time and across lineages.
|
14
|
Kabir M, Barradas A, Tzotzos GT, Hentges KE, Doig AJ. Properties of genes essential for mouse development. PLoS One 2017; 12:e0178273. [PMID: 28562614 PMCID: PMC5451031 DOI: 10.1371/journal.pone.0178273] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/21/2017] [Accepted: 05/10/2017] [Indexed: 12/20/2022] Open
Abstract
Essential genes are those that are critical for life. In the specific case of the mouse, they are the set of genes whose deletion means that a mouse is unable to survive after birth. As such, they are the key minimal set of genes needed for all the steps of development to produce an organism capable of life ex utero. We explored a wide range of sequence and functional features to characterise essential (lethal) and non-essential (viable) genes in mice. Manually curated experimental data identified 1301 essential genes and 3451 viable genes. Many sequence features show highly significant differences between essential and viable mouse genes. Essential genes generally encode complex proteins, with multiple domains and many introns. These genes tend to be long, highly expressed, old and evolutionarily conserved. They tend to encode ligases, transferases, phosphorylated proteins, intracellular proteins, nuclear proteins, and hubs in protein-protein interaction networks. They are involved with regulating protein-protein interactions, gene expression and metabolic processes, cell morphogenesis, cell division, cell proliferation, DNA replication, cell differentiation, DNA repair and transcription, and embryonic development. Viable genes tend to encode membrane proteins or secreted proteins, and are associated with functions such as cellular communication, apoptosis, behaviour and immune response, as well as housekeeping and tissue-specific functions. Viable genes are linked to transport, ion channels, signal transduction, calcium binding and lipid binding, consistent with their location in membranes and involvement with cell-cell communication. From the analysis of the composite features of essential and viable genes, we conclude that essential genes tend to be required for intracellular functions, and viable genes tend to be involved with extracellular functions and cell-cell communication.
Knowledge of the features that are over-represented in essential genes allows for a deeper understanding of the functions and processes implemented during mammalian development.
Affiliation(s)
- Mitra Kabir
- Faculty of Biology, Medicine, and Health, University of Manchester, Manchester, United Kingdom
- Manchester Institute of Biotechnology and Department of Chemistry, Faculty of Science and Engineering, The University of Manchester, Manchester, United Kingdom
- Ana Barradas
- Faculty of Biology, Medicine, and Health, University of Manchester, Manchester, United Kingdom
- George T. Tzotzos
- Department of Agriculture, Food and Environmental Sciences, Marche Polytechnic University, Ancona, Italy
- Kathryn E. Hentges
- Faculty of Biology, Medicine, and Health, University of Manchester, Manchester, United Kingdom
- Andrew J. Doig
- Manchester Institute of Biotechnology and Department of Chemistry, Faculty of Science and Engineering, The University of Manchester, Manchester, United Kingdom
|
15
|
Sievers A, Bosiek K, Bisch M, Dreessen C, Riedel J, Froß P, Hausmann M, Hildenbrand G. K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features. Genes (Basel) 2017; 8:E122. [PMID: 28422050 PMCID: PMC5406869 DOI: 10.3390/genes8040122] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/10/2017] [Revised: 03/24/2017] [Accepted: 04/04/2017] [Indexed: 12/26/2022] Open
Abstract
In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we successfully checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distribution of k-mers) was developed and used. The results were analyzed by including positional information based on visualizations as genomic maps and by applying basic vector correlation methods. This analysis was concentrated on small word lengths (1 ≤ k ≤ 4) on relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B) of a chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae formerly identified by independent, mostly alignment-based methods, were confirmed. Correlations between the k-mer content and several genes in these genomes have been found, showing similarities between classified and unclassified viruses, which may be potentially useful for further taxonomic research. Furthermore, unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which are probably of major biological function, are found and described. Using the chromosomes of a chimpanzee and human that are currently known, identities between the species on every analyzed chromosome were reproduced. This demonstrates the feasibility of our approach for large data sets of complex genomes. 
Based on these results, we suggest k-mer analysis with positional resolution as a method for closing a gap between the effectiveness of alignment-based methods (like NCBI BLAST) and the high pace of standard k-mer analysis.
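As an illustration of the idea summarized in this abstract, the following is a minimal sketch of local k-mer spectra with positional resolution: the sequence is cut into windows, a k-mer count vector is computed per window, and two windows are compared with a plain Pearson correlation. The function names, the window and step parameters, and the choice of Pearson correlation are assumptions made for this example; the abstract does not specify the authors' implementation.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_spectrum(seq, k):
    """Count vector over all 4**k DNA words, in lexicographic ACGT order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get(w, 0) for w in words]


def local_spectra(seq, k, window, step):
    """Return (start, spectrum) pairs for windows along seq.

    window and step are hypothetical parameters chosen by the caller.
    """
    return [
        (start, kmer_spectrum(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)
```

Correlating the spectra of neighbouring windows, or of windows from two different genomes, gives a position-resolved similarity profile without computing an alignment.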
Affiliation(s)
- Aaron Sievers
- Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Katharina Bosiek
- Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Marc Bisch
- Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Chris Dreessen
- Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Jascha Riedel
- Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Patrick Froß
- Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Michael Hausmann
- Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Georg Hildenbrand
- Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Department of Radiation Oncology, Universitätsmedizin Mannheim, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany.
|
16
|
Yao L, Tan KWM, Tan TW, Lee YK. Exploring the transcriptome of non-model oleaginous microalga Dunaliella tertiolecta through high-throughput sequencing and high performance computing. BMC Bioinformatics 2017; 18:122. [PMID: 28228091 PMCID: PMC5322580 DOI: 10.1186/s12859-017-1551-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/30/2016] [Accepted: 02/16/2017] [Indexed: 12/31/2022] Open
Abstract
Background: RNA-Seq technology has received a lot of attention in recent years for microalgal global transcriptomic profiling. It is widely used in transcriptome-wide analysis of gene expression, particularly for microalgal strains with potential as biofuel sources. However, insufficient genomic or transcriptomic information on non-model microalgae has limited the understanding of their regulatory mechanisms and hampered genetic manipulation to enhance biofuel production. As such, optimal construction of a microalgal transcriptomic database is a subject of urgent investigation. Results: Dunaliella tertiolecta, a non-model oleaginous microalgal species, was sequenced via Illumina MISEQ and HISEQ 4000 in RNA-Seq studies. The high-quality, high-throughput sequencing data were explored using high performance computing (HPC) in a petascale data center and subjected to de novo assembly and parallelized mpiBLASTX search with multiple species. As a result, a transcriptome database of 17,845 was constructed (~95% completeness). The enlarged database fueled the RNA-Seq data analysis, which was validated by a nitrogen deprivation (ND) study that induces triacylglycerol (TAG) production. Conclusions: The new parallelized assembly and annotation method under HPC presented here allows large-scale data processing problems to be solved in acceptable computation time. There is a significant increase in the amount of transcriptomic data obtained and observable heterogeneity in the performance in identifying differentially expressed genes in the ND treatment paradigm. The results provide new insights into how the response to ND treatment in microalgae is regulated. ND analyses highlight the advantages of the database generated in this study, which could also serve as a useful resource for future gene manipulation and transcriptome-wide analysis.
We thus demonstrate the usefulness of exploring the transcriptome as an informative platform for functional studies and genetic manipulations in similar species.
Affiliation(s)
- Lina Yao
- Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117545, Singapore
- Kenneth Wei Min Tan
- Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117545, Singapore
- Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117596, Singapore; National Supercomputing Centre (NSCC), Singapore, 138632, Singapore
- Yuan Kun Lee
- Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117545, Singapore.
|
17
|
Liu S, Hou W, Sun T, Xu Y, Li P, Yue B, Fan Z, Li J. Genome-wide mining and comparative analysis of microsatellites in three macaque species. Mol Genet Genomics 2017; 292:537-550. [PMID: 28160080 DOI: 10.1007/s00438-017-1289-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/31/2016] [Accepted: 01/09/2017] [Indexed: 12/13/2022]
Abstract
Microsatellites are found in taxonomically different organisms, and such repeats are related to genomic structure, function and certain diseases. To characterize microsatellites in macaques, we searched for and compared SSRs with 1-6 bp nucleotide motifs in rhesus, cynomolgus and pigtailed macaques. A total of 1395671, 1284929 and 1266348 perfect SSRs were mined, respectively. The most frequent perfect SSRs were mononucleotide SSRs. GC content was highest in dinucleotide SSRs and lowest in mononucleotide SSRs. Chromosome size was positively correlated with SSR number and negatively correlated with the relative frequency and density of SSRs. The GC content of chromosome SSRs was negatively correlated with the relative frequency of SSRs and the GC content of chromosome sequences. The features of microsatellite distribution in the assembled genomes of the three species were highly similar, which suggests that the distributional pattern of microsatellites is probably conserved in the genus Macaca. The degenerated number of repeat motifs was found to be different in pentanucleotide and hexanucleotide repeats. Species-specific motifs for each macaque were significantly underrepresented. Overall, SSR frequencies of each chromosome in rhesus macaque were higher than in cynomolgus macaque. The maximum repeat number of mono- to pentanucleotide repeats was greater in cynomolgus macaque than in the other two macaques. These results emphasize the genetic diversity and phylogenetic relationship of genus Macaca species. Our data will be beneficial for comparative genome mapping and for understanding the distribution of SSRs and genome structure between these animal models, and provide a foundation for further development and identification of more macaque-specific SSRs.
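To make the notion of mining perfect SSRs concrete, here is a minimal sketch that scans a DNA string for uninterrupted tandem repeats of 1-6 bp motifs using regular expressions. The function names and the minimum repeat thresholds are hypothetical and chosen only for illustration; the abstract does not state which software or thresholds the authors used.

```python
import re


def is_primitive(motif):
    """True if motif is not itself a tandem repeat of a shorter unit."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return True


def find_perfect_ssrs(seq, min_repeats):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq.

    min_repeats maps motif length to the minimum number of tandem copies.
    These thresholds are assumptions supplied by the caller.
    Limitation of this sketch: matches are scanned left to right without
    overlap for each motif length, so a repeat that starts inside a rejected
    non-primitive match (for example "AA" copies) can be missed or truncated.
    """
    hits = []
    for motif_len, min_copies in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "AA" repeats, which are already counted as "A" repeats.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

A call such as `find_perfect_ssrs(sequence, {1: 10, 2: 6, 3: 5, 4: 4, 5: 4, 6: 4})` would then report every perfect repeat meeting those illustrative thresholds, from which per-chromosome counts, frequencies, and GC content can be tallied.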
Affiliation(s)
- Sanxu Liu
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China
- Wei Hou
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China
- Tianlin Sun
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China
- Yongtao Xu
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China
- Peng Li
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China
- Bisong Yue
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China; Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China
- Zhenxin Fan
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China
- Jing Li
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China; Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, 610064, People's Republic of China.
|
18
|
Liu H, Jia Y, Sun X, Tian D, Hurst LD, Yang S. Direct Determination of the Mutation Rate in the Bumblebee Reveals Evidence for Weak Recombination-Associated Mutation and an Approximate Rate Constancy in Insects. Mol Biol Evol 2017; 34:119-130. [PMID: 28007973 PMCID: PMC5854123 DOI: 10.1093/molbev/msw226] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Indexed: 01/06/2023] Open
Abstract
Accurate knowledge of the mutation rate provides a baseline for inferring expected rates of evolution, for testing evolutionary hypotheses and for estimation of key parameters. Advances in sequencing technology now permit direct estimates of the mutation rate from sequencing of close relatives. Within insects there have been three prior such estimates, two in nonsocial insects (Drosophila: 2.8 × 10⁻⁹ per bp per haploid genome per generation; Heliconius: 2.9 × 10⁻⁹) and one in a social species, the honeybee (3.4 × 10⁻⁹). Might the honeybee's rate be ∼20% higher because it has an exceptionally high recombination rate and recombination may be directly or indirectly mutagenic? To address this possibility, we provide a direct estimate of the mutation rate in the bumblebee (Bombus terrestris), this being a close relative of the honeybee but with a much lower recombination rate. We confirm that the crossover rate of the bumblebee is indeed much lower than that of honeybees (8.7 cM/Mb vs. 37 cM/Mb). Importantly, we find no significant difference in the mutation rates: we estimate for bumblebees a rate of 3.6 × 10⁻⁹ per haploid genome per generation (95% confidence interval 2.38 × 10⁻⁹ to 5.37 × 10⁻⁹), which is just 5% higher than the estimate for honeybees. Both genomes have approximately one new mutation per haploid genome per generation. While we find evidence for a direct coupling between recombination and mutation (also seen in honeybees), the effect is so weak as to leave almost no footprint on any between-species differences. The similarity in mutation rates suggests an approximate constancy of the mutation rate in insects.
Affiliation(s)
- Haoxuan Liu
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Yanxiao Jia
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Xiaoguang Sun
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Dacheng Tian
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Laurence D Hurst
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Sihai Yang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
|
19
|
Symonová R, Majtánová Z, Arias-Rodriguez L, Mořkovský L, Kořínková T, Cavin L, Pokorná MJ, Doležálková M, Flajšhans M, Normandeau E, Ráb P, Meyer A, Bernatchez L. Genome Compositional Organization in Gars Shows More Similarities to Mammals than to Other Ray-Finned Fish. JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION 2016; 328:607-619. [DOI: 10.1002/jez.b.22719] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/17/2016] [Revised: 11/13/2016] [Accepted: 11/22/2016] [Indexed: 12/12/2022]
Affiliation(s)
- Radka Symonová
- Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
- Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
- Research Institute for Limnology; University of Innsbruck; Mondsee Austria
- Zuzana Majtánová
- Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
- Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
- Lenin Arias-Rodriguez
- División Académica de Ciencias Biológicas; Universidad Juárez Autónoma de Tabasco (UJAT); Villahermosa Tabasco México
- Libor Mořkovský
- Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
- Tereza Kořínková
- Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
- Lionel Cavin
- Muséum d'Histoire Naturelle; Geneva 6 Switzerland
- Martina Johnson Pokorná
- Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
- Department of Ecology; Faculty of Science; Charles University; Prague 2 Czech Republic
- Marie Doležálková
- Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
- Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
- Martin Flajšhans
- Faculty of Fisheries and Protection of Waters; South Bohemian Research Centre of Aquaculture and Biodiversity of Hydrocenoses; University of South Bohemia in České Budějovice; Vodňany Czech Republic
- Eric Normandeau
- IBIS, Department of Biology, University Laval, Pavillon Charles-Eugène-Marchand; Avenue de la Médecine Quebec City; Canada
- Petr Ráb
- Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
- Axel Meyer
- Chair in Zoology and Evolutionary Biology; Department of Biology; University of Konstanz; Konstanz Germany
- Louis Bernatchez
- IBIS, Department of Biology, University Laval, Pavillon Charles-Eugène-Marchand; Avenue de la Médecine Quebec City; Canada
|
20
|
Sizova TV, Karpova OI. The length of chromatin loops in meiotic prophase I of warm-blooded vertebrates depends on the DNA compositional organization. RUSS J GENET+ 2016. [DOI: 10.1134/s1022795416110144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
21
|
Abstract
As genes originate at different evolutionary times, they harbor distinctive genomic signatures of evolutionary ages. Although previous studies have investigated different gene age-related signatures, what signatures dominantly associate with gene age remains unresolved. Here we address this question via a combined approach of comprehensive assignment of gene ages, gene family identification, and multivariate analyses. We first provide a comprehensive and improved gene age assignment by combining homolog clustering with phylogeny inference and categorize human genes into 26 age classes spanning the whole tree of life. We then explore the dominant age-related signatures based on a collection of 10 potential signatures (including gene composition, gene length, selection pressure, expression level, connectivity in protein–protein interaction network and DNA methylation). Our results show that GC content and connectivity in protein–protein interaction network (PPIN) associate dominantly with gene age. Furthermore, we investigate the heterogeneity of dominant signatures in duplicates and singletons. We find that GC content is a consistent primary factor of gene age in duplicates and singletons, whereas PPIN is more strongly associated with gene age in singletons than in duplicates. Taken together, GC content and PPIN are two dominant signatures in close association with gene age, exhibiting heterogeneity in duplicates and singletons and presumably reflecting complex differential interplays between natural selection and mutation.
Affiliation(s)
- Hongyan Yin
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Guangyu Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Lina Ma
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China
- Soojin V Yi
- School of Biology, Georgia Institute of Technology, Atlanta
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
|
22
|
Abstract
Exonic splice enhancers (ESEs) are short nucleotide motifs, enriched near exon ends, that enhance the recognition of the splice site and thus promote splicing. Are intronless genes under selection to avoid these motifs so as not to attract the splicing machinery to an mRNA that should not be spliced, thereby preventing the production of an aberrant transcript? Consistent with this possibility, we find that ESEs in putative recent retrocopies are at a higher density and evolving faster than those in other intronless genes, suggesting that they are being lost. Moreover, intronless genes are less dense in putative ESEs than intron-containing ones. However, this latter difference is likely due to the skewed base composition of intronless sequences, a skew that is in line with the general GC richness of few exon genes. Indeed, after controlling for such biases, we find that both intronless and intron-containing genes are denser in ESEs than expected by chance. Importantly, nucleotide-controlled analysis of evolutionary rates at synonymous sites in ESEs indicates that the ESEs in intronless genes are under purifying selection in both human and mouse. We conclude that on the loss of introns, some but not all, ESE motifs are lost, the remainder having functions beyond a role in splice promotion. These results have implications for the design of intronless transgenes and for understanding the causes of selection on synonymous sites.
Affiliation(s)
- Rosina Savisaar
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
|
23
|
Bernardi G. Genome Organization and Chromosome Architecture. COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY 2016; 80:83-91. [PMID: 26801160 DOI: 10.1101/sqb.2015.80.027318] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/24/2022]
Abstract
How the same DNA sequences can function in the three-dimensional architecture of interphase nucleus, fold in the very compact structure of metaphase chromosomes, and go precisely back to the original interphase architecture in the following cell cycle remains an unresolved question to this day. The solution to this question presented here rests on the correlations that were found to hold between the isochore organization of the genome and the architecture of chromosomes from interphase to metaphase. The key points are the following: (1) The transition from the looped domains and subdomains of interphase chromatin to the 30-nm fiber loops of early prophase chromosomes goes through their unfolding into an extended chromatin structure (probably a 10-nm "beads-on-a-string" structure); (2) the architectural proteins of interphase chromatin, such as CTCF and cohesin subunits, are retained in mitosis and are part of the discontinuous protein scaffold of mitotic chromosomes; and (3) the conservation of the link between architectural proteins and their binding sites on DNA through the cell cycle explains the reversibility of the interphase to mitosis process and the "mitotic memory" of interphase architecture.
Affiliation(s)
- Giorgio Bernardi
- Science Department, Roma Tre University, 00146 Rome, Italy
- Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
|
24
|
Jabbari K, Nürnberg P. A genomic view on epilepsy and autism candidate genes. Genomics 2016; 108:31-6. [PMID: 26772991 DOI: 10.1016/j.ygeno.2016.01.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 08/27/2015] [Revised: 12/15/2015] [Accepted: 01/01/2016] [Indexed: 01/25/2023]
Abstract
Epilepsy is a common complex disorder most frequently associated with psychiatric and neurological diseases. Massive parallel sequencing of individual or cohort genomes and exomes led to the identification of several disease-associated genes. We review here the candidate genes in epilepsy genetics with a focus on exome and gene panel data. Together with the examination of brain expressed genes and post synaptic proteome the results show that: (1) Non-metabolic epilepsies and autism candidate genes tend to be AT-rich and (2) large transcript size and local AT-richness are characteristic features of genes involved in developmental brain disorders and synaptic functions. These results point to the preferential location of core epilepsy and autism candidate genes in late replicating, GC-poor chromosomal regions (isochores). These results indicate that the genomic alterations leading to some brain disorders are confined to responsive chromatin areas harboring brain critical genes.
Affiliation(s)
- Kamel Jabbari
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Peter Nürnberg
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
|
25
|
Sundararajan A, Dukowic-Schulze S, Kwicklis M, Engstrom K, Garcia N, Oviedo OJ, Ramaraj T, Gonzales MD, He Y, Wang M, Sun Q, Pillardy J, Kianian SF, Pawlowski WP, Chen C, Mudge J. Gene Evolutionary Trajectories and GC Patterns Driven by Recombination in Zea mays. FRONTIERS IN PLANT SCIENCE 2016; 7:1433. [PMID: 27713757 PMCID: PMC5031598 DOI: 10.3389/fpls.2016.01433] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 07/20/2016] [Accepted: 09/08/2016] [Indexed: 05/20/2023]
Abstract
Recombination occurring during meiosis is critical for creating genetic variation and plays an essential role in plant evolution. In addition to creating novel gene combinations, recombination can affect genome structure through altering GC patterns. In maize (Zea mays) and other grasses, another intriguing GC pattern exists. Maize genes show a bimodal GC content distribution that has been attributed to nucleotide bias in the third, or wobble, position of the codon. Recombination may be an underlying driving force given that recombination sites are often associated with high GC content. Here we explore the relationship between recombination and genomic GC patterns by comparing GC gene content at each of the three codon positions (GC1, GC2, and GC3, collectively termed GCx) to instances of a variable GC-rich motif that underlies double strand break (DSB) hotspots and to meiocyte-specific gene expression. Surprisingly, GCx bimodality in maize cannot be fully explained by the codon wobble hypothesis. High GCx genes show a strong overlap with the DSB hotspot motif, possibly providing a mechanism for the high evolutionary rates seen in these genes. On the other hand, genes that are turned on in meiosis (early prophase I) are biased against both high GCx genes and genes with the DSB hotspot motif, possibly allowing important meiotic genes to avoid DSBs. Our data suggests a strong link between the GC-rich motif underlying DSB hotspots and high GCx genes.
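The codon-position measures named in this abstract (GC1, GC2, GC3) can be illustrated with a minimal Python sketch. It assumes an in-frame coding sequence supplied as a plain string; the function name `gc_by_codon_position` is hypothetical and is not taken from the cited study.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): the G+C fraction at each codon position.

    Assumes `cds` is an in-frame coding sequence; a trailing partial
    codon, if any, is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1  # i % 3 is the 0-based codon position
    return tuple(c / n_codons for c in counts)


if __name__ == "__main__":
    # Two codons, ATG and GCC: position 3 is G or C in both.
    print(gc_by_codon_position("ATGGCC"))
```

Applied gene by gene, the third value gives the GC3 distribution whose bimodality the abstract discusses.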
Affiliation(s)
- Nathan Garcia
- National Center for Genome Resources, Santa Fe, NM, USA
- Yan He
- Section of Plant Biology, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Minghui Wang
- Section of Plant Biology, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Biotechnology Resource Center Bioinformatics Facility, Cornell University, Ithaca, NY, USA
- Qi Sun
- Biotechnology Resource Center Bioinformatics Facility, Cornell University, Ithaca, NY, USA
- Jaroslaw Pillardy
- Biotechnology Resource Center Bioinformatics Facility, Cornell University, Ithaca, NY, USA
- Shahryar F. Kianian
- Cereal Disease Laboratory, United States Department of Agriculture – Agricultural Research Service, St. Paul, MN, USA
- Wojciech P. Pawlowski
- Section of Plant Biology, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Changbin Chen
- Department of Horticultural Science, University of Minnesota, St. Paul, MN, USA
- Joann Mudge
- National Center for Genome Resources, Santa Fe, NM, USA
- *Correspondence: Joann Mudge,
|
26
|
Chen F, Zhu Z, Zhou X, Yan Y, Dong Z, Cui D. High-Throughput Sequencing Reveals Single Nucleotide Variants in Longer-Kernel Bread Wheat. FRONTIERS IN PLANT SCIENCE 2016; 7:1193. [PMID: 27551288 PMCID: PMC4976665 DOI: 10.3389/fpls.2016.01193] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/24/2016] [Accepted: 07/25/2016] [Indexed: 05/09/2023]
Abstract
The transcriptomes of bread wheat Yunong 201 and its ethyl methanesulfonate derivative Yunong 3114 were obtained by next-generation sequencing technology. Single nucleotide variants (SNVs) in the wheat strains were explored and compared. A total of 5907 and 6287 non-synonymous SNVs were acquired for Yunong 201 and 3114, respectively. A total of 4021 genes with SNVs were obtained. The genes that underwent non-synonymous SNVs were significantly involved in ATP binding, protein phosphorylation, and cellular protein metabolic process. The heat map analysis also indicated that most of these mutant genes were significantly differentially expressed at different developmental stages. The SNVs in these genes possibly contribute to the longer kernel length of Yunong 3114. Our data provide useful information on wheat transcriptome for future studies on wheat functional genomics. This study could also help in illustrating the gene functions of the non-synonymous SNVs of Yunong 201 and 3114.
|
27
|
Abstract
How the same DNA sequences can function in the three-dimensional architecture of interphase nucleus, fold in the very compact structure of metaphase chromosomes and go precisely back to the original interphase architecture in the following cell cycle remains an unresolved question to this day. The strategy used to address this issue was to analyze the correlations between chromosome architecture and the compositional patterns of DNA sequences spanning a size range from a few hundred to a few thousand kilobases. This is a critical range that encompasses isochores, interphase chromatin domains and boundaries, and chromosomal bands. The solution rests on the following key points: 1) the transition from the looped domains and sub-domains of interphase chromatin to the 30-nm fiber loops of early prophase chromosomes goes through the unfolding into an extended chromatin structure (probably a 10-nm "beads-on-a-string" structure); 2) the architectural proteins of interphase chromatin, such as CTCF and cohesin sub-units, are retained in mitosis and are part of the discontinuous protein scaffold of mitotic chromosomes; 3) the conservation of the link between architectural proteins and their binding sites on DNA through the cell cycle explains the "mitotic memory" of interphase architecture and the reversibility of the interphase to mitosis process. The results presented here also lead to a general conclusion which concerns the existence of correlations between the isochore organization of the genome and the architecture of chromosomes from interphase to metaphase.
Affiliation(s)
- Giorgio Bernardi
- Science Department, Roma Tre University, Marconi, Rome, Italy
- Stazione Zoologica Anton Dohrn, Villa Comunale, Naples, Italy
|
28
|
Mugal CF, Weber CC, Ellegren H. GC-biased gene conversion links the recombination landscape and demography to genomic base composition. Bioessays 2015; 37:1317-26. [DOI: 10.1002/bies.201500058] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 12/22/2022]
Affiliation(s)
- Carina F. Mugal
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Claudia C. Weber
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Department of Biology; Center for Computational Genetics and Genomics; Temple University; Philadelphia PA USA
- Hans Ellegren
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
|
29
|
Kryuchkova-Mostacci N, Robinson-Rechavi M. Tissue-Specific Evolution of Protein Coding Genes in Human and Mouse. PLoS One 2015; 10:e0131673. [PMID: 26121354 PMCID: PMC4488272 DOI: 10.1371/journal.pone.0131673] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 03/23/2015] [Accepted: 06/04/2015] [Indexed: 12/23/2022] Open
Abstract
Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, especially taking into account gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Over all tissues, evolutionary rate only correlates weakly with levels and breadth of expression. The strongest explanatory factors of purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which are explained by weak purifying selection rather than by positive selection.
Affiliation(s)
- Nadezda Kryuchkova-Mostacci
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
30
|
Wu X, Hurst LD. Why Selection Might Be Stronger When Populations Are Small: Intron Size and Density Predict within and between-Species Usage of Exonic Splice Associated cis-Motifs. Mol Biol Evol 2015; 32:1847-61. [PMID: 25771198 PMCID: PMC4476162 DOI: 10.1093/molbev/msv069] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 01/26/2023] Open
Abstract
The nearly neutral theory predicts that small effective population size provides the conditions for weakened selection. This is postulated to explain why our genome is more “bloated” than that of, for example, yeast, ours having large introns and large intergene spacer. If a bloated genome is also an error prone genome might it, however, be the case that selection for error-mitigating properties is stronger in our genome? We examine this notion using splicing as an exemplar, not least because large introns can predispose to noisy splicing. We thus ask whether, owing to genomic decay, selection for splice error-control mechanisms is stronger, not weaker, in species with large introns and small populations. In humans much information defining splice sites is in cis-exonic motifs, most notably exonic splice enhancers (ESEs). These act as splice-error control elements. Here then we ask whether within and between-species intron size is a predictor of the commonality of exonic cis-splicing motifs. We show that, as predicted, the proportion of synonymous sites that are ESE-associated and under selection in humans is weakly positively correlated with the size of the flanking intron. In a phylogenetically controlled framework, we observe, also as expected, that mean intron size is both predicted by Ne.μ and is a good predictor of cis-motif usage across species, this usage coevolving with splice site definition. Unexpectedly, however, across taxa intron density is a better predictor of cis-motif usage than intron size. We propose that selection for splice-related motifs is driven by a need to avoid decoy splice sites that will be more common in genes with many and large introns. That intron number and density predict ESE usage within human genes is consistent with this, as is the finding of intragenic heterogeneity in ESE density. 
As intronic content and splice site usage across species is also well predicted by Ne.μ, the result also suggests an unusual circumstance in which selection (for cis-modifiers of splicing) might be stronger when population sizes are smaller, as here splicing is noisier, resulting in a greater need to control error-prone splicing.
Affiliation(s)
- XianMing Wu
- Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom
|
31
|
Evolutionary consequences of DNA methylation on the GC content in vertebrate genomes. G3-GENES GENOMES GENETICS 2015; 5:441-7. [PMID: 25591920 PMCID: PMC4349097 DOI: 10.1534/g3.114.015545] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 12/29/2022]
Abstract
The genomes of many vertebrates show a characteristic variation in GC content. To explain its origin and evolution, three main mechanisms have been proposed: selection for GC content, mutation bias, and GC-biased gene conversion. At present, the mechanism of GC-biased gene conversion, i.e., short-scale, unidirectional exchanges between homologous chromosomes in the neighborhood of recombination-initiating double-strand breaks in favor of GC nucleotides, is the most widely accepted hypothesis. We here suggest that DNA methylation also plays an important role in the evolution of GC content in vertebrate genomes. To test this hypothesis, we investigated one mammalian (human) and one avian (chicken) genome. We used bisulfite sequencing to generate a whole-genome methylation map of chicken sperm and made use of a publicly available whole-genome methylation map of human sperm. Inclusion of these methylation maps into a model of GC content evolution provided significant support for the impact of DNA methylation on the local equilibrium GC content. Moreover, two different estimates of equilibrium GC content, one that neglects and one that incorporates the impact of DNA methylation and the concomitant CpG hypermutability, give estimates that differ by approximately 15% in both genomes, arguing for a strong impact of DNA methylation on the evolution of GC content. Thus, our results put forward that previous estimates of equilibrium GC content, which neglect the hypermutability of CpG dinucleotides, need to be reevaluated.
|
32
|
Clément Y, Fustier MA, Nabholz B, Glémin S. The bimodal distribution of genic GC content is ancestral to monocot species. Genome Biol Evol 2014; 7:336-48. [PMID: 25527839 PMCID: PMC4316631 DOI: 10.1093/gbe/evu278] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/27/2022] Open
Abstract
In grasses such as rice or maize, the distribution of genic GC content is well known to be bimodal. It is mainly driven by GC content at third codon positions (GC3 for short). This feature is thought to be specific to grasses as closely related species like banana have a unimodal GC3 distribution. GC3 is associated with numerous genomics features and uncovering the origin of this peculiar distribution will help understanding the potential roles and consequences of GC3 variations within and between genomes. Until recently, the origin of the peculiar GC3 distribution in grasses has remained unknown. Thanks to the recent publication of several complete genomes and transcriptomes of nongrass monocots, we studied more than 1,000 groups of one-to-one orthologous genes in seven grasses and three outgroup species (banana, palm tree, and yam). Using a maximum likelihood-based method, we reconstructed GC3 at several ancestral nodes. We found that the bimodal GC3 distribution observed in extant grasses is ancestral to both grasses and most monocot species, and that other species studied here have lost this peculiar structure. We also found that GC3 in grass lineages is globally evolving very slowly and that the decreasing GC3 gradient observed from 5′ to 3′ along coding sequences is also conserved and ancestral to monocots. This result strongly challenges the previous views on the specificity of grass genomes and we discuss its implications for the possible causes of the evolution of GC content in monocots.
Affiliation(s)
- Yves Clément
- Montpellier SupAgro, Unité Mixte de Recherche 1334, Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, Montpellier, France
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
- Benoit Nabholz
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
- Sylvain Glémin
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, Université Montpellier, France
|
33
|
Chaurasia A, Tarallo A, Bernà L, Yagi M, Agnisola C, D’Onofrio G. Length and GC content variability of introns among teleostean genomes in the light of the metabolic rate hypothesis. PLoS One 2014; 9:e103889. [PMID: 25093416 PMCID: PMC4122358 DOI: 10.1371/journal.pone.0103889] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/21/2013] [Accepted: 07/07/2014] [Indexed: 01/30/2023] Open
Abstract
A comparative analysis of five teleostean genomes, namely zebrafish, medaka, three-spine stickleback, fugu and pufferfish was performed with the aim to highlight the nature of the forces driving both length and base composition of introns (i.e., bpi and GCi). An inter-genome approach using orthologous intronic sequences was carried out, analyzing independently both variables in pairwise comparisons. An average length shortening of introns was observed at increasing average GCi values. The result was not affected by masking transposable and repetitive elements harbored in the intronic sequences. The routine metabolic rate (mass specific temperature-corrected using the Boltzmann's factor) was measured for each species. A significant correlation held between average differences of metabolic rate, length and GC content, while environmental temperature of fish habitat was not correlated with bpi and GCi. Analyzing the concomitant effect of both variables, i.e., bpi and GCi, at increasing genomic GC content, a decrease of bpi and an increase of GCi was observed for the significant majority of the intronic sequences (from ∼40% to ∼90%, in each pairwise comparison). The opposite event, concomitant increase of bpi and decrease of GCi, was counter selected (from <1% to ∼10%, in each pairwise comparison). The results further support the hypothesis that the metabolic rate plays a key role in shaping genome architecture and evolution of vertebrate genomes.
Affiliation(s)
- Ankita Chaurasia
- Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
- Campus UAB - CRAG Bellaterra - Cerdanyola del Vallès, Barcelona, Spain
- Andrea Tarallo
- Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
- Luisa Bernà
- Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
- Molecular Biology Unit, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Mitsuharu Yagi
- Faculty of Fisheries, Nagasaki University, Bunkyo, Nagasaki, Japan
- Claudio Agnisola
- Department of Biological Sciences, University of Naples Federico II, Napoli, Italy
- Giuseppe D’Onofrio
- Genome Evolution and Organization – Dept. Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, Italy
- * E-mail:
|
34
|
Nabeel-Shah S, Ashraf K, Pearlman RE, Fillingham J. Molecular evolution of NASP and conserved histone H3/H4 transport pathway. BMC Evol Biol 2014; 14:139. [PMID: 24951090 PMCID: PMC4082323 DOI: 10.1186/1471-2148-14-139] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 02/27/2014] [Accepted: 06/12/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: NASP is an essential protein in mammals that functions in histone transport pathways and maintenance of a soluble reservoir of histones H3/H4. NASP has been studied exclusively in Opisthokonta lineages where some functional diversity has been reported. In humans, growing evidence implicates NASP misregulation in the development of a variety of cancers. Although a comprehensive phylogenetic analysis is lacking, NASP-family proteins that possess four TPR motifs are thought to be widely distributed across eukaryotes. RESULTS: We characterize the molecular evolution of NASP by systematically identifying putative NASP orthologs across diverse eukaryotic lineages ranging from excavata to those of the crown group. We detect extensive silent divergence at the nucleotide level suggesting the presence of strong purifying selection acting at the protein level. We also observe a selection bias for high frequencies of acidic residues which we hypothesize is a consequence of their critical function(s), further indicating the role of functional constraints operating on NASP evolution. Our data indicate that TPR1 and TPR4 constitute the most rapidly evolving functional units of NASP and may account for the functional diversity observed among well characterized family members. We also show that NASP paralogs in ray-finned fish have different genomic environments with clear differences in their GC content and have undergone significant changes at the protein level suggesting functional diversification. CONCLUSION: We draw four main conclusions from this study. First, wide distribution of NASP throughout eukaryotes suggests that it was likely present in the last eukaryotic common ancestor (LECA) possibly as an important innovation in the transport of H3/H4. Second, strong purifying selection operating at the protein level has influenced the nucleotide composition of NASP genes. Further, we show that selection has acted to maintain a high frequency of functionally relevant acidic amino acids in the region that interrupts TPR2. Third, functional diversity reported among several well characterized NASP family members can be explained in terms of quickly evolving TPR1 and TPR4 motifs. Fourth, NASP fish specific paralogs have significantly diverged at the protein level with NASP2 acquiring a NNR domain.
Affiliation(s)
- Jeffrey Fillingham
- Department of Chemistry and Biology, Ryerson University, 350 Victoria St., Toronto M5B 2K3, Canada
|
35
|
Li XQ. Comparative analysis of the base compositions of the pre-mRNA 3' cleaved-off region and the mRNA 3' untranslated region relative to the genomic base composition in animals and plants. PLoS One 2014; 9:e99928. [PMID: 24941005 PMCID: PMC4062462 DOI: 10.1371/journal.pone.0099928] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/20/2013] [Accepted: 05/20/2014] [Indexed: 12/26/2022] Open
Abstract
The precursor messenger RNA (pre-mRNA) three-prime cleaved-off region (3′COR) and the mRNA three-prime untranslated region (3′UTR) play critical roles in regulating gene expression. The differences in base composition between these regions and the corresponding genomes are still largely uncharacterized in animals and plants. In this study, the base compositions of non-redundant 3′CORs and 3′UTRs were compared with the corresponding whole genomes of eleven animals, four dicotyledonous plants, and three monocotyledonous (cereal) plants. Among the four bases (A, C, G, and U for adenine, cytosine, guanine, and uracil, respectively), U (which corresponds to T, for thymine, in DNA) was the most frequent, A the second most frequent, G the third most frequent, and C the least frequent in most of the species in both the 3′COR and 3′UTR regions. In comparison with the whole genomes, in both regions the U content was usually the most overrepresented (particularly in the monocotyledonous plants), and the C content was the most underrepresented. The order obtained for the species groups, when ranked from high to low according to the U contents in the 3′COR and 3′UTR was as follows: dicotyledonous plants, monocotyledonous plants, non-mammal animals, and mammals. In contrast, the genomic T content was highest in dicotyledonous plants, lowest in monocotyledonous plants, and intermediate in animals. These results suggest the following: 1) there is a mechanism operating in both animals and plants which is biased toward U and against C in the 3′COR and 3′UTR; 2) the 3′UTR and 3′COR, as functional units, minimized the difference between dicotyledonous and monocotyledonous plants, while the dicotyledonous and monocotyledonous genomes evolved into two extreme groups in terms of base composition.
Affiliation(s)
- Xiu-Qing Li
- Potato Research Centre, Agriculture and Agri-Food Canada, Fredericton, New Brunswick, Canada
- * E-mail:
|
36
|
Zhang R, Ou HY, Gao F, Luo H. Identification of Horizontally-transferred Genomic Islands and Genome Segmentation Points by Using the GC Profile Method. Curr Genomics 2014; 15:113-21. [PMID: 24822029 PMCID: PMC4009839 DOI: 10.2174/1389202915999140328163125] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 09/22/2013] [Revised: 11/28/2013] [Accepted: 11/29/2013] [Indexed: 11/29/2022] Open
Abstract
The nucleotide composition of genomes undergoes dramatic variations among all three kingdoms of life. GC content, an important characteristic for a genome, is related to many important functions, and therefore GC content and its distribution are routinely reported for sequenced genomes. Traditionally, GC content distribution is assessed by computing GC contents in windows that slide along the genome. Disadvantages of this routinely used window-based method include low resolution and low sensitivity. Additionally, different window sizes result in different GC content distribution patterns within the same genome. We proposed a windowless method, the GC profile, for displaying GC content variations across the genome. Compared to the window-based method, the GC profile has the following advantages: 1) higher sensitivity, because of variation-amplifying procedures; 2) higher resolution, because boundaries between domains can be determined at one single base pair; 3) uniqueness, because the GC profile is unique for a given genome and 4) the capacity to show both global and regional GC content distributions. These characteristics are useful in identifying horizontally-transferred genomic islands and homogenous GC-content domains. Here, we review the applications of the GC profile in identifying genomic islands and genome segmentation points, and in serving as a platform to integrate with other algorithms for genome analysis. A web server generating GC profiles and implementing relevant genome segmentation algorithms is available at: www.zcurve.net.
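For orientation, the window-based baseline that this abstract contrasts with the GC profile can be sketched in Python. The sketch covers only the traditional sliding-window calculation, not the windowless GC profile method itself; `windowed_gc`, `window`, and `step` are hypothetical names chosen for illustration.

```python
def windowed_gc(seq, window, step):
    """Return a list of (start, gc_fraction) for windows sliding along seq.

    Only full-length windows are reported; a shorter remainder at the
    end of the sequence is skipped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        result.append((start, gc / window))
    return result


if __name__ == "__main__":
    toy = "GGCCAATT"
    # A small window resolves the GC-rich and AT-rich halves ...
    print(windowed_gc(toy, window=4, step=4))
    # ... while a window spanning both averages them away.
    print(windowed_gc(toy, window=8, step=8))
```

The two calls on the toy sequence show the window-size dependence that the abstract lists as a drawback: the same sequence yields a different GC pattern depending on the window chosen.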
Affiliation(s)
- Ren Zhang
- Center for Molecular Medicine and Genetics, School of Medicine, Wayne State University, Detroit, MI, USA
- Hong-Yu Ou
- State Key Laboratory of Microbial Metabolism and School of Life Sciences & Biotechnology, Shanghai Jiaotong University, Shanghai 200030, China
- Feng Gao
- Department of Physics, Tianjin University, Tianjin, 300072, China
- Hao Luo
- Department of Physics, Tianjin University, Tianjin, 300072, China
|
37
|
Identifying regulatory mechanisms underlying tumorigenesis using locus expression signature analysis. Proc Natl Acad Sci U S A 2014; 111:5747-52. [PMID: 24706889 DOI: 10.1073/pnas.1309293111] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/24/2022] Open
Abstract
Retroviral insertional mutagenesis is a powerful tool for identifying putative cancer genes in mice. To uncover the regulatory mechanisms by which common insertion loci affect downstream processes, we supplemented genotyping data with genome-wide mRNA expression profiling data for 97 tumors induced by retroviral insertional mutagenesis. We developed locus expression signature analysis, an algorithm to construct and interpret the differential gene expression signature associated with each common insertion locus. Comparing locus expression signatures to promoter affinity profiles allowed us to build a detailed map of transcription factors whose protein-level regulatory activity is modulated by a particular locus. We also predicted a large set of drugs that might mitigate the effect of the insertion on tumorigenesis. Taken together, our results demonstrate the potential of a locus-specific signature approach for identifying mammalian regulatory mechanisms in a cancer context.
|
38
|
Kvikstad EM, Duret L. Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome. Mol Biol Evol 2013; 31:23-36. [PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023] Open
Abstract
Elucidating the mechanisms of mutation accumulation and fixation is critical to understand the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free.
Affiliation(s)
- Erika M Kvikstad
- Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, CNRS, Université Lyon 1, Villeurbanne, France
39
Romiguier J, Ranwez V, Delsuc F, Galtier N, Douzery EJP. Less is more in mammalian phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental mammals. Mol Biol Evol 2013; 30:2134-44. [PMID: 23813978 DOI: 10.1093/molbev/mst116] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Indexed: 12/24/2022] Open
Abstract
Despite the rapid increase of size in phylogenomic data sets, a number of important nodes on animal phylogeny are still unresolved. Among these, the rooting of the placental mammal tree is still a controversial issue. One difficulty lies in the pervasive phylogenetic conflicts among genes, with each one telling its own story, which may be reliable or not. Here, we identified a simple criterion, that is, the GC content, which substantially helps in determining which gene trees best reflect the species tree. We assessed the ability of 13,111 coding sequence alignments to correctly reconstruct the placental phylogeny. We found that GC-rich genes induced a higher amount of conflict among gene trees and performed worse than AT-rich genes in retrieving well-supported, consensual nodes on the placental tree. We interpret this GC effect mainly as a consequence of genome-wide variations in recombination rate. Indeed, recombination is known to drive GC-content evolution through GC-biased gene conversion and might be problematic for phylogenetic reconstruction, for instance, in an incomplete lineage sorting context. When we focused on the AT-richest fraction of the data set, the resolution level of the placental phylogeny was greatly increased, and a strong support was obtained in favor of an Afrotheria rooting, that is, Afrotheria as the sister group of all other placentals. We show that in mammals most conflicts among gene trees, which have so far hampered the resolution of the placental tree, are concentrated in the GC-rich regions of the genome. We argue that the GC content, because it is a reliable indicator of the long-term recombination rate, is an informative criterion that could help in identifying the most reliable molecular markers for species tree inference.
Affiliation(s)
- Jonathan Romiguier
- CNRS, Université Montpellier, Institut des Sciences de l'Evolution, Montpellier, France.
40
Carels N, Frías D. A Statistical Method without Training Step for the Classification of Coding Frame in Transcriptome Sequences. Bioinform Biol Insights 2013; 7:35-54. [PMID: 23400232 PMCID: PMC3561939 DOI: 10.4137/bbi.s10053] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023] Open
Abstract
In this study, we investigated the modalities of coding open reading frame (cORF) classification of expressed sequence tags (EST) by using the universal feature method (UFM). The UFM algorithm is based on the scoring of purine bias (Rrr) and stop codon frequencies. UFM classifies ORFs as coding or non-coding through a score based on 5 factors: (i) stop codon frequency; (ii) the product of the probabilities of purines occurring in the three positions of nucleotide triplets; (iii) the product of the probabilities of Cytosine (C), Guanine (G), and Adenine (A) occurring in the 1st, 2nd, and 3rd positions of triplets, respectively; (iv) the probabilities of a G occurring in the 1st and 2nd positions of triplets; and (v) the probabilities of a T occurring in the 1st and an A in the 2nd position of triplets. Because UFM is based on primary determinants of coding sequences that are conserved throughout the biosphere, it is suitable for cORF classification of any sequence in eukaryote transcriptomes without prior knowledge. Considering the protein sequences of the Protein Data Bank (RCSB PDB or more simply PDB) as a reference, we found that UFM classifies cORFs of ≥200 bp (if the coding strand is known) and cORFs of ≥300 bp (if the coding strand is unknown), and releases them in their coding strand and coding frame, which allows their automatic translation into protein sequences with a success rate equal to or higher than 95%. We first established the statistical parameters of UFM using ESTs from Plasmodium falciparum, Arabidopsis thaliana, Oryza sativa, Zea mays, Drosophila melanogaster, Homo sapiens and Chlamydomonas reinhardtii in reference to the protein sequences of PDB. Second, we showed that the success rate of cORF classification using UFM is expected to apply to approximately 95% of higher eukaryote genes that encode for proteins. 
Third, we used UFM in combination with CAP3 to assemble large EST samples into cORFs that we used to analyze transcriptome phenotypes in rice, maize, and humans. We discuss the error rate and the interference of noisy sequences such as pseudogenes, transposons, and retrotransposons. This method is suitable for rapid cORF extraction from transcriptome data and allows correct description of the genome phenotypes of plant genomes without prior knowledge. Additional care is necessary when addressing the human transcriptome due to the interference caused by large amounts of noisy sequences. UFM can be regarded as a low complexity tool for prior knowledge extraction concerning the coding fraction of the transcriptome of any eukaryote. Due to its low level of complexity, UFM is also very robust to variations of codon usage.
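As a rough illustration of two of the five factors listed in this abstract, the sketch below computes the stop codon frequency (factor i) and the product of per-position purine probabilities (factor ii) for one reading frame. It is not the UFM score itself: the abstract does not say how the factors are combined, and the function name `frame_features` and the returned keys are hypothetical.

```python
# Hypothetical helper: two per-frame features named in the abstract.
# This is NOT the published UFM score, whose weighting is not given here.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def frame_features(seq: str, frame: int = 0) -> dict:
    """Return stop codon frequency and purine-probability product for a frame."""
    s = seq.upper()
    # Split into complete codons starting at the requested frame offset (0, 1 or 2).
    codons = [s[i:i + 3] for i in range(frame, len(s) - 2, 3)]
    if not codons:
        raise ValueError("sequence too short for the requested frame")
    # Factor (i): fraction of codons that are stop codons.
    stop_freq = sum(c in STOP_CODONS for c in codons) / len(codons)
    # Factor (ii): product over the three codon positions of P(purine).
    purine_product = 1.0
    for pos in range(3):
        purine_product *= sum(c[pos] in "AG" for c in codons) / len(codons)
    return {"stop_freq": stop_freq, "purine_product": purine_product}
```

Comparing these values across the three frames of each strand would be one plausible way to use them, since a true coding frame is expected to contain few in-frame stop codons.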
Affiliation(s)
- Nicolas Carels
- Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil
41
A pronounced evolutionary shift of the pseudoautosomal region boundary in house mice. Mamm Genome 2012; 23:454-66. [PMID: 22763584 DOI: 10.1007/s00335-012-9403-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 04/20/2012] [Accepted: 06/07/2012] [Indexed: 10/28/2022]
Abstract
The pseudoautosomal region (PAR) is essential for the accurate pairing and segregation of the X and Y chromosomes during meiosis. Despite its functional significance, the PAR shows substantial evolutionary divergence in structure and sequence between mammalian species. An instructive example of PAR evolution is the house mouse Mus musculus domesticus (represented by the C57BL/6J strain), which has the smallest PAR among those that have been mapped. In C57BL/6J, the PAR boundary is located just ~700 kb from the distal end of the X chromosome, whereas the boundary is found at a more proximal position in Mus spretus, a species that diverged from house mice 2-4 million years ago. In this study we used a combination of genetic and physical mapping to document a pronounced shift in the PAR boundary in a second house mouse subspecies, Mus musculus castaneus (represented by the CAST/EiJ strain), ~430 kb proximal of the M. m. domesticus boundary. We demonstrate molecular evolutionary consequences of this shift, including a marked lineage-specific increase in sequence divergence within Mid1, a gene that resides entirely within the M. m. castaneus PAR but straddles the boundary in other subspecies. Our results extend observations of structural divergence in the PAR to closely related subspecies, pointing to major evolutionary changes in this functionally important genomic region over a short time period.
42
Nam K, Ellegren H. Recombination drives vertebrate genome contraction. PLoS Genet 2012; 8:e1002680. [PMID: 22570634 PMCID: PMC3342960 DOI: 10.1371/journal.pgen.1002680] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Received: 10/21/2011] [Accepted: 03/15/2012] [Indexed: 11/19/2022] Open
Abstract
Selective and/or neutral processes may govern variation in DNA content and, ultimately, genome size. The observation in several organisms of a negative correlation between recombination rate and intron size could be compatible with a neutral model in which recombination is mutagenic for length changes. We used whole-genome data on small insertions and deletions within transposable elements from chicken and zebra finch to demonstrate clear links between recombination rate and a number of attributes of reduced DNA content. Recombination rate was negatively correlated with the length of introns, transposable elements, and intergenic spacer and with the rate of short insertions. Importantly, it was positively correlated with gene density, the rate of short deletions, the deletion bias, and the net change in sequence length. All these observations point at a pattern of more condensed genome structure in regions of high recombination. Based on the observed rates of small insertions and deletions and assuming that these rates are representative for the whole genome, we estimate that the genome of the most recent common ancestor of birds and lizards has lost nearly 20% of its DNA content up until the present. Expansion of transposable elements can counteract the effect of deletions in an equilibrium mutation model; however, since the activity of transposable elements has been low in the avian lineage, the deletion bias is likely to have had a significant effect on genome size evolution in dinosaurs and birds, contributing to the maintenance of a small genome. We also demonstrate that most of the observed correlations between recombination rate and genome contraction parameters are seen in the human genome, including for segregating indel polymorphisms. Our data are compatible with a neutral model in which recombination drives vertebrate genome size evolution and gives no direct support for a role of natural selection in this process. 
One major implication from genetic work done several decades ago is that the genome contains a lot of sequences that do not constitute genes or other functional elements. The total amount of DNA—the genome size—is thus not necessarily an indicator of DNA complexity or organismal complexity, an observation often referred to as the C-value paradox (C-value being a measure of DNA content). What then is it that determines genome size? One model posits that the evolution of genome size is not a consequence of natural selection but is instead governed by the incidence and character of naturally occurring mutations that affect the length of DNA, a process that is not affected by selection. Here we present the results of an analysis of how recombination affects the size of avian and human genomes. We find strong evidence that the rate of recombination is a driving force of genome size evolution. In regions of the genome where recombination occurs frequently, the loss of DNA caused by small deletions is particularly pronounced. Our simulations show that the effect of such recombination-driven genome contraction can be profound over evolutionary time scales. These observations lead to a model in which recombination is mutagenic for length changes and that the incidence of deletions increases with increasing recombination rate. Although we cannot formally exclude that natural selection contributes to the observed relationship between recombination and genome contraction, we find no evidence to support such a scenario.
Affiliation(s)
- Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
43
Matoulkova E, Michalova E, Vojtesek B, Hrstka R. The role of the 3' untranslated region in post-transcriptional regulation of protein expression in mammalian cells. RNA Biol 2012; 9:563-76. [PMID: 22614827 DOI: 10.4161/rna.20231] [Citation(s) in RCA: 253] [Impact Index Per Article: 21.1] [Indexed: 02/07/2023] Open
Abstract
The untranslated regions (UTRs) at the 3' end of mRNA transcripts contain important sequences that influence the fate of mRNA and thus proteosynthesis. In this review, we summarize the information known to date about 3' end processing, sequence characteristics including related binding proteins and the role of 3'UTRs in several selected signaling pathways to delineate their importance in the regulatory processes in mammalian cells. In addition to reviewing recent advances in the more well known aspects, such as cleavage and polyadenylation processes that influence mRNA stability and location, we concentrate on some newly emerging concepts of the role of the 3'UTR, including alternative polyadenylation sites in relation to proliferation and differentiation and the recognition of the multi-functional properties of non-coding RNAs, including miRNAs that commonly target the 3'UTR. The emerging picture is of a highly complex set of regulatory systems that include autoregulation, cooperativity and competition to fine tune proteosynthesis in context-dependent manners.
44
Fujita MK, Edwards SV, Ponting CP. The Anolis lizard genome: an amniote genome without isochores. Genome Biol Evol 2011; 3:974-84. [PMID: 21795750 PMCID: PMC3184785 DOI: 10.1093/gbe/evr072] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 12/31/2022] Open
Abstract
Isochores are large regions of relatively homogeneous nucleotide composition and are present in the genomes of all mammals and birds that have been sequenced to date. The newly sequenced genome of Anolis carolinensis provides the first opportunity to quantify isochore structure in a nonavian reptile. We find Anolis to have the most compositionally homogeneous genome of all amniotes sequenced thus far, a homogeneity exceeding that for the frog Xenopus. Based on a Bayesian algorithm, Anolis has smaller and less GC-rich isochores compared with human and chicken. Correlates generally associated with GC-rich isochores, including shorter introns and higher gene density, have all but disappeared from the Anolis genome. Using genic GC as a proxy for isochore structure so as to compare with other vertebrates, we found that GC content has substantially decreased in the lineage leading to Anolis since diverging from the common ancestor of Reptilia ∼275 Ma, perhaps reflecting weakened or reversed GC-biased gene conversion, a nonadaptive substitution process that is thought to be important in the maintenance and trajectory of isochore evolution. Our results demonstrate that GC composition in Anolis is not associated with important features of genome structure, including gene density and intron size, in contrast to patterns seen in mammal and bird genomes.
Affiliation(s)
- Matthew K Fujita
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA.
45
Clément Y, Arndt PF. Substitution patterns are under different influences in primates and rodents. Genome Biol Evol 2011; 3:236-45. [PMID: 21339508 PMCID: PMC3068003 DOI: 10.1093/gbe/evr011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open
Abstract
There are large-scale variations of the GC-content along mammalian chromosomes that have been called isochore structures. Primates and rodents have different isochore structures, which suggests that these lineages exhibit different modes of GC-content evolution. It has been shown that, in the human lineage, GC-biased gene conversion (gBGC), a neutral process associated with meiotic recombination, acts on GC-content evolution by influencing A or T to G or C substitution rates. We computed genome-wide substitution patterns in the mouse lineage from multiple alignments and compared them with substitution patterns in the human lineage. We found that in the mouse lineage, gBGC is active but weaker than in the human lineage and that male-specific recombination better predicts GC-content evolution than female-specific recombination. Furthermore, we were able to show that G or C to A or T substitution rates are predicted by a combination of different factors in both lineages. A or T to G or C substitution rates are most strongly predicted by meiotic recombination in the human lineage but by CpG odds ratio (the observed CpG frequency normalized by the expected CpG frequency) in the mouse lineage, suggesting that substitution patterns are under different influences in primates and rodents.
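The CpG odds ratio defined in this abstract (observed CpG frequency normalized by the expected CpG frequency) can be sketched as follows. The exact normalization the authors used is not stated here; this version assumes the expected frequency is the product of the C and G base frequencies, and the function name `cpg_odds_ratio` is illustrative.

```python
def cpg_odds_ratio(seq: str) -> float:
    """Observed CpG dinucleotide frequency over the frequency expected
    from base composition (assumed to be freq(C) * freq(G))."""
    s = seq.upper()
    n = len(s)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    f_c = s.count("C") / n
    f_g = s.count("G") / n
    if f_c == 0 or f_g == 0:
        # No CpG is possible, so report a ratio of zero.
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    observed = s.count("CG") / (n - 1)
    return observed / (f_c * f_g)
```

Values well below 1 indicate CpG depletion relative to base composition, while values near 1 indicate no depletion.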
Affiliation(s)
- Yves Clément
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
46
Janes DE, Organ CL, Fujita MK, Shedlock AM, Edwards SV. Genome evolution in Reptilia, the sister group of mammals. Annu Rev Genomics Hum Genet 2010; 11:239-64. [PMID: 20590429 DOI: 10.1146/annurev-genom-082509-141646] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Abstract
The genomes of birds and nonavian reptiles (Reptilia) are critical for understanding genome evolution in mammals and amniotes generally. Despite decades of study at the chromosomal and single-gene levels, and the evidence for great diversity in genome size, karyotype, and sex chromosome diversity, reptile genomes are virtually unknown in the comparative genomics era. The recent sequencing of the chicken and zebra finch genomes, in conjunction with genome scans and the online publication of the Anolis lizard genome, has begun to clarify the events leading from an ancestral amniote genome--predicted to be large and to possess a diverse repeat landscape on par with mammals and a birdlike sex chromosome system--to the small and highly streamlined genomes of birds. Reptilia exhibit a wide range of evolutionary rates of different subgenomes and, from isochores to mitochondrial DNA, provide a critical contrast to the genomic paradigms established in mammals.
Affiliation(s)
- Daniel E Janes
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
47
Oota S, Kawamura K, Kawai Y, Saitou N. A new framework for studying the isochore evolution: estimation of the equilibrium GC content based on the temporal mutation rate model. Genome Biol Evol 2010; 2:558-71. [PMID: 20675617 PMCID: PMC2997559 DOI: 10.1093/gbe/evq041] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022] Open
Abstract
Isochores are the genome-wide mosaic structure in guanine-cytosine (GC) content. Isochores are thought to have emerged in the ancestral amniote genome, and the GC-rich isochores are eroded in the mammalian lineages. However, there are many enigmas in isochore evolution: 1) although all mammals, birds, and even reptiles, which are clearly polyphyletic, have isochores, opossum and platypus lack GC-rich and GC-poor isochore classes; 2) although isochores are predicted to vanish according to a fairly robust theory, the opposite conclusion was reached in some mammalian lineages; and 3) the three major hypotheses on isochore evolution cannot explain the observed evidence without flaws. So far compositional evolution has been studied under the assumption that the per base pair rates of GC→AT (u) and AT→GC (v) mutations are temporally constant (the constant model). With this model alone, however, it is difficult to explain isochore evolution. We propose a simple model for compositional evolution based on the temporal per base pair rate of mutations (the variable model). In this model, rates u and v vary depending on temporal GC contents. Mathematically, the variable model is an expansion of the constant model. By using high-density human single nucleotide polymorphism data, we compared the variable model with the constant model. Although the variable model gave results consistent with the constant model, it can potentially describe the complicated isochore evolution, which the constant model cannot explain. The versatile characteristics of the variable model may shed new light on the mysterious isochore evolution.
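For orientation, the constant model mentioned in this abstract has a standard equilibrium, sketched below. The symbol $f$ for the GC fraction is an assumption introduced here; $u$ and $v$ are the GC→AT and AT→GC rates as defined in the abstract. The final line only restates the abstract's description that $u$ and $v$ depend on GC content; the functional forms used by the authors are not given here.

```latex
% Constant model: u and v fixed, f(t) = GC fraction at time t
\frac{df}{dt} = v\,(1 - f) - u\,f
\quad\Longrightarrow\quad
f^{*} = \frac{v}{u + v}

% Variable model (sketch): rates depend on the current GC fraction
v(f^{*})\,\bigl(1 - f^{*}\bigr) = u(f^{*})\,f^{*}
```

Setting $u(f)$ and $v(f)$ to constants in the second condition recovers $f^{*} = v/(u+v)$, which is consistent with the abstract's statement that the variable model is an expansion of the constant model.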
Affiliation(s)
- Satoshi Oota
- Department of Biological Systems, RIKEN BioResource Center, Tsukuba, 305-0074 Japan.
48
Fahey ME, Mills W, Higgins DG, Moore T. Maternally and paternally silenced imprinted genes differ in their intron content. Comp Funct Genomics 2010; 5:572-83. [PMID: 18629181 PMCID: PMC2447473 DOI: 10.1002/cfg.437] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/02/2004] [Revised: 11/01/2004] [Accepted: 11/12/2004] [Indexed: 12/31/2022] Open
Abstract
Imprinted genes exhibit silencing of one of the parental alleles during embryonic development. In a previous study imprinted genes were found to have reduced intron content relative to a non-imprinted control set (Hurst et al., 1996). However, due to the small sample size, it was not possible to analyse the source of this effect. Here, we re-investigate this observation using larger datasets of imprinted and control (non-imprinted) genes that allow us to consider mouse and human, and maternally and paternally silenced, imprinted genes separately. We find that, in the human and mouse, there is reduced intron content in the maternally silenced imprinted genes relative to a non-imprinted control set. Among imprinted genes, a strong bias is also observed in the distribution of intronless genes, which are found exclusively in the maternally silenced dataset. The paternally silenced dataset in the human is not different to the control set; however, the mouse paternally silenced dataset has more introns than the control group. A direct comparison of mouse maternally and paternally silenced imprinted gene datasets shows that they differ significantly with respect to a variety of intron-related parameters. We discuss a variety of possible explanations for our observations.
Affiliation(s)
- Marie E Fahey
- Department of Biochemistry, Biosciences Institute, University College Cork, College Road, Cork, Ireland
49
Dunham I, Beare DM, Collins JE. The characteristics of human genes: analysis of human chromosome 22. Comp Funct Genomics 2010; 4:635-46. [PMID: 18629020 PMCID: PMC2447302 DOI: 10.1002/cfg.335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/15/2003] [Revised: 09/04/2003] [Accepted: 09/08/2003] [Indexed: 11/11/2022] Open
Affiliation(s)
- Ian Dunham
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
50
Tatarinova TV, Alexandrov NN, Bouck JB, Feldmann KA. GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics 2010; 11:308. [PMID: 20470436 PMCID: PMC2895627 DOI: 10.1186/1471-2164-11-308] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Received: 09/16/2009] [Accepted: 05/16/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases between organisms at the wobble position have been documented and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and warm-blooded vertebrates but not in all plants, or in invertebrates or cold-blooded vertebrates.
RESULTS: Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses.
CONCLUSIONS: Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage and gene methylation.
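GC3, the GC content at the third (wobble) codon position, is simple to compute from an in-frame coding sequence. The sketch below is a minimal illustration, assuming the input starts at the first base of a codon; the function name `gc3` is hypothetical, and any filtering the authors applied (for example, excluding start or stop codons) is not reproduced.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon.

    Assumes `cds` is in frame, i.e. its first base is codon position 1.
    """
    s = cds.upper()
    if len(s) < 3:
        raise ValueError("coding sequence must contain at least one codon")
    # Ignore a trailing partial codon, then take every third base.
    usable = len(s) - len(s) % 3
    third_positions = s[2:usable:3]
    return sum(base in "GC" for base in third_positions) / len(third_positions)
```

Applied to every transcript in a genome, the resulting values could be plotted as a histogram to look for the bimodality described in the abstract.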
Affiliation(s)
- Tatiana V Tatarinova
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.