Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Duret L, Mouchiroud D, Gautier C. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J Mol Evol 1995;40:308-17. [PMID: 7723057 DOI: 10.1007/bf00163235] [Citation(s) in RCA: 186] [Impact Index Per Article: 6.4] [Indexed: 01/26/2023]

Cited by Other Article(s)

Bai MZ, Guo YY. Bioinformatics Analysis of MSH1 Genes of Green Plants: Multiple Parallel Length Expansions, Intron Gains and Losses, Partial Gene Duplications, and Alternative Splicing. Int J Mol Sci 2023;24:13620. [PMID: 37686425 PMCID: PMC10487979 DOI: 10.3390/ijms241713620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 08/28/2023] [Accepted: 08/29/2023] [Indexed: 09/10/2023] Open

Abstract

MutS homolog 1 (MSH1) is involved in the recombining and repairing of organelle genomes and is essential for maintaining their stability. Previous studies indicated that the length of the gene varied greatly among species and detected species-specific partial gene duplications in Physcomitrella patens. However, there are critical gaps in the understanding of the gene size expansion, and the extent of the partial gene duplication of MSH1 remains unclear. Here, we screened MSH1 genes in 85 selected species with genome sequences representing the main clades of green plants (Viridiplantae). We identified the MSH1 gene in all lineages of green plants, except for nine incomplete species, for bioinformatics analysis. The gene is a singleton gene in most of the selected species with conserved amino acids and protein domains. Gene length varies greatly among the species, ranging from 3234 bp in Ostreococcus tauri to 805,861 bp in Cycas panzhihuaensis. The expansion of MSH1 repeatedly occurred in multiple clades, especially in Gymnosperms, Orchidaceae, and Chloranthus spicatus. MSH1 has exceptionally long introns in certain species due to the gene length expansion, and the longest intron even reaches 101,025 bp. And the gene length is positively correlated with the proportion of the transposable elements (TEs) in the introns. In addition, gene structure analysis indicated that the MSH1 of green plants had undergone parallel intron gains and losses in all major lineages. However, the intron number of seed plants (gymnosperm and angiosperm) is relatively stable. All the selected gymnosperms contain 22 introns except for Gnetum montanum and Welwitschia mirabilis, while all the selected angiosperm species preserve 21 introns except for the ANA grade. Notably, the coding region of MSH1 in algae presents an exceptionally high GC content (47.7% to 75.5%). 
Moreover, over one-third of the selected species contain species-specific partial gene duplications of MSH1, except for the conserved mosses-specific partial gene duplication. Additionally, we found conserved alternatively spliced MSH1 transcripts in five species. The study of MSH1 sheds light on the evolution of the long genes of green plants.

Genome Evolution and the Future of Phylogenomics of Non-Avian Reptiles. Animals (Basel) 2023;13:ani13030471. [PMID: 36766360 PMCID: PMC9913427 DOI: 10.3390/ani13030471] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/16/2022] [Revised: 01/13/2023] [Accepted: 01/15/2023] [Indexed: 02/01/2023] Open

Srikulnath K, Ahmad SF, Singchat W, Panthum T. Why Do Some Vertebrates Have Microchromosomes? Cells 2021;10:2182. [PMID: 34571831 PMCID: PMC8466491 DOI: 10.3390/cells10092182] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/22/2021] [Revised: 08/17/2021] [Accepted: 08/17/2021] [Indexed: 12/27/2022] Open

Affiliation(s)

Kornsorn Srikulnath Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; The International Undergraduate Program in Bioscience and Technology, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; Amphibian Research Center, Hiroshima University, 1-3-1, Kagamiyama, Higashihiroshima 739-8526, Japan
Syed Farhan Ahmad Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; The International Undergraduate Program in Bioscience and Technology, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
Worapong Singchat Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
Thitipong Panthum Animal Genomics and Bioresource Research Center (AGB Research Center), Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand

Top O, Milferstaedt SWL, van Gessel N, Hoernstein SNW, Özdemir B, Decker EL, Reski R. Expression of a human cDNA in moss results in spliced mRNAs and fragmentary protein isoforms. Commun Biol 2021;4:964. [PMID: 34385580 PMCID: PMC8361020 DOI: 10.1038/s42003-021-02486-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/02/2020] [Accepted: 07/26/2021] [Indexed: 12/18/2022] Open

Kumar U, Khandia R, Singhal S, Puranik N, Tripathi M, Pateriya AK, Khan R, Emran TB, Dhama K, Munjal A, Alqahtani T, Alqahtani AM. Insight into Codon Utilization Pattern of Tumor Suppressor Gene EPB41L3 from Different Mammalian Species Indicates Dominant Role of Selection Force. Cancers (Basel) 2021;13:cancers13112739. [PMID: 34205890 PMCID: PMC8198080 DOI: 10.3390/cancers13112739] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/29/2021] [Revised: 05/27/2021] [Accepted: 05/27/2021] [Indexed: 12/13/2022] Open

Abstract

Simple Summary

The present study envisaged the codon usage pattern analysis of tumor suppressor gene EPB41L3 for the human, brown rat, domesticated cattle, and Sumatran orangutan. Most amino acids are coded by more than one synonymous codon, but they are used in a biased manner. The codon usage bias results from multiple factors like compositional properties, dinucleotide abundance, neutrality, parity, tRNA pool, etc. Understanding codon bias is central to fields as diverse as molecular evolution, gene expressivity, protein translation, and protein folding. Studies of this kind are important for seeing the effects of various evolutionary forces on codon usage. The present study indicated that the selection force is dominant over other forces shaping codon usage in the envisaged organisms.

Abstract

Uneven codon usage within genes as well as among genomes is a usual phenomenon across organisms. It plays a significant role in the translational efficiency and evolution of a particular gene. EPB41L3 is a tumor suppressor protein-coding gene, and in the present study, the pattern of codon usage was envisaged. The full-length sequences of the EPB41L3 gene for the human, brown rat, domesticated cattle, and Sumatran orangutan available at the NCBI were retrieved and utilized to analyze CUB patterns across the selected mammalian species. Compositional properties, dinucleotide abundance, and parity analysis showed the dominance of A and G whilst RSCU analysis indicated the dominance of G/C-ending codons. The neutrality plot plotted between GC12 and GC3 to determine the variation between the mutation pressure and natural selection indicated the dominance of selection pressure (R = 0.926; p < 0.00001) over the three codon positions across the gene. The result is in concordance with the codon adaptation index analysis and the ENc-GC3 plot analysis, as well as the translational selection index (P2). Overall selection pressure is the dominant pressure acting during the evolution of the EPB41L3 gene.
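The neutrality plot mentioned in the abstract above relates GC content at the first two codon positions (GC12) to GC content at the third position (GC3). The following is only a minimal sketch of how those quantities could be computed, not the authors' pipeline: the function names `gc_by_codon_position` and `regression_slope` are hypothetical, and the input is assumed to be an in-frame coding sequence. In the commonly used reading of such plots, a regression slope of GC12 on GC3 near 0 is taken to suggest a dominant role for selection, and a slope near 1 a dominant role for mutation pressure.

```python
def gc_by_codon_position(cds):
    """Return GC fractions at codon positions 1, 2 and 3 of an in-frame CDS.

    Hypothetical helper: trailing bases that do not fill a codon are ignored,
    and ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    gc_counts = [0, 0, 0]
    totals = [0, 0, 0]
    for i in range(usable):
        base = cds[i]
        if base in "ACGT":
            totals[i % 3] += 1
            if base in "GC":
                gc_counts[i % 3] += 1
    gc1, gc2, gc3 = (
        g / t if t else float("nan") for g, t in zip(gc_counts, totals)
    )
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2}


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (e.g. GC12 on GC3)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Toy example with a made-up nine-base sequence (codons ATG, GCC, TAA).
example = gc_by_codon_position("ATGGCCTAA")
print(example["GC12"], example["GC3"])
```

For a real neutrality plot, one GC12/GC3 pair would be computed per sequence and the pairs passed to `regression_slope`.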

Characterization of microsatellites in the endangered snow leopard based on the chromosome-level genome. MAMMAL RES 2021. [DOI: 10.1007/s13364-021-00563-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

Poverennaya IV, Roytberg MA. Spliceosomal Introns: Features, Functions, and Evolution. BIOCHEMISTRY (MOSCOW) 2021;85:725-734. [PMID: 33040717 DOI: 10.1134/s0006297920070019] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/23/2022]

Symonová R, Suh A. Nucleotide composition of transposable elements likely contributes to AT/GC compositional homogeneity of teleost fish genomes. Mob DNA 2019;10:49. [PMID: 31857829 PMCID: PMC6909575 DOI: 10.1186/s13100-019-0195-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/27/2019] [Accepted: 12/05/2019] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

Teleost fish genome size has been repeatedly demonstrated to positively correlate with the proportion of transposable elements (TEs). This finding might have far-reaching implications for our understanding of the evolution of nucleotide composition across vertebrates. Genomes of fish and amphibians are GC homogenous, with non-teleost gars being the single exception identified to date, whereas birds and mammals are AT/GC heterogeneous. The exact reason for this phenomenon remains controversial. Since TEs make up significant proportions of genomes and can quickly accumulate across genomes, they can potentially influence the host genome with their own GC content (GC%). However, the GC% of fish TEs has so far been neglected.

RESULTS

The genomic proportion of TEs indeed correlates with genome size, although not as linearly as previously shown with fewer genomes, and GC% negatively correlates with genome size in the 33 fish genome assemblies analysed here (excluding salmonids). GC% of fish TE consensus sequences positively correlates with the corresponding genomic GC% in 29 species tested. Likewise, the GC contents of the entire repetitive vs. non-repetitive genomic fractions correlate positively in 54 fish species in Ensembl. However, among these fish species, there is also a wide variation in GC% between the main groups of TEs. Class II DNA transposons, predominant TEs in fish genomes, are significantly GC-poorer than Class I retrotransposons. The AT/GC heterogeneous gar genome contains fewer Class II TEs, a situation similar to fugu with its extremely compact and also GC-enriched but AT/GC homogenous genome.

CONCLUSION

Our results reveal a previously overlooked correlation between GC% of fish genomes and their TEs. This applies to both TE consensus sequences as well as the entire repetitive genomic fraction. On the other hand, there is a wide variation in GC% across fish TE groups. These results raise the question whether GC% of TEs evolves independently of GC% of the host genome or whether it is driven by TE localization in the host genome. Answering these questions will help to understand how genomic GC% is shaped over time. Long-term accumulation of GC-poor(er) Class II DNA transposons might indeed have influenced AT/GC homogenization of fish genomes and requires further investigation.

Huttener R, Thorrez L, In't Veld T, Granvik M, Snoeck L, Van Lommel L, Schuit F. GC content of vertebrate exome landscapes reveal areas of accelerated protein evolution. BMC Evol Biol 2019;19:144. [PMID: 31311498 PMCID: PMC6636035 DOI: 10.1186/s12862-019-1469-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/18/2019] [Accepted: 06/26/2019] [Indexed: 11/10/2022] Open

Wu X, Kabalane H, Kahli M, Petryk N, Laperrousaz B, Jaszczyszyn Y, Drillon G, Nicolini FE, Perot G, Robert A, Fund C, Chibon F, Xia R, Wiels J, Argoul F, Maguer-Satta V, Arneodo A, Audit B, Hyrien O. Developmental and cancer-associated plasticity of DNA replication preferentially targets GC-poor, lowly expressed and late-replicating regions. Nucleic Acids Res 2019;46:10157-10172. [PMID: 30189101 PMCID: PMC6212843 DOI: 10.1093/nar/gky797] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/06/2018] [Accepted: 08/24/2018] [Indexed: 01/08/2023] Open

Affiliation(s)

Xia Wu Institut de Biologie de l'École Normale Supérieure (IBENS), Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France; Physics Department, East China Normal University, Shanghai, China
Hadi Kabalane Univ Lyon, ENS de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France
Malik Kahli Institut de Biologie de l'École Normale Supérieure (IBENS), Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
Nataliya Petryk Institut de Biologie de l'École Normale Supérieure (IBENS), Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
Bastien Laperrousaz Univ Lyon, ENS de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France; CNRS UMR5286, INSERM U1052, Centre de Recherche en Cancérologie de Lyon, F-69008 Lyon, France
Yan Jaszczyszyn Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France
Guenola Drillon Univ Lyon, ENS de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France
Frank-Emmanuel Nicolini CNRS UMR5286, INSERM U1052, Centre de Recherche en Cancérologie de Lyon, F-69008 Lyon, France; Centre Léon Bérard, F-69008 Lyon, France
Gaëlle Perot INSERM U1218, Institut Bergonié, F-33000 Bordeaux, France
Aude Robert UMR 8126, Université Paris-Sud Paris-Saclay, CNRS, Institut Gustave Roussy, Villejuif, France
Cédric Fund École Normale Supérieure, PSL Research University, CNRS, Inserm, IBENS, Plateforme Génomique, 75005 Paris, France
Frédéric Chibon INSERM U1218, Institut Bergonié, F-33000 Bordeaux, France
Ruohong Xia Physics Department, East China Normal University, Shanghai, China
Joëlle Wiels UMR 8126, Université Paris-Sud Paris-Saclay, CNRS, Institut Gustave Roussy, Villejuif, France
Françoise Argoul LOMA, Université de Bordeaux, CNRS, UMR 5798, F-33405 Talence, France
Véronique Maguer-Satta CNRS UMR5286, INSERM U1052, Centre de Recherche en Cancérologie de Lyon, F- 69008 Lyon, France
Alain Arneodo LOMA, Université de Bordeaux, CNRS, UMR 5798, F-33405 Talence, France
Benjamin Audit Univ Lyon, ENS de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France
Olivier Hyrien Institut de Biologie de l'École Normale Supérieure (IBENS), Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France

Ming Z, Chen Q, Chen N, Lin M, Liu N, Hu J, Xiao X. Eliminating the secondary structure of targeting strands for enhancement of DNA probe based low-abundance point mutation detection. Anal Chim Acta 2019;1075:137-143. [PMID: 31196419 DOI: 10.1016/j.aca.2019.05.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/24/2019] [Revised: 04/25/2019] [Accepted: 05/05/2019] [Indexed: 10/26/2022]

Uddin A, Paul N, Chakraborty S. The codon usage pattern of genes involved in ovarian cancer. Ann N Y Acad Sci 2019;1440:67-78. [PMID: 30843242 DOI: 10.1111/nyas.14019] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/10/2018] [Revised: 01/04/2019] [Accepted: 01/14/2019] [Indexed: 12/20/2022]

MapToGenome: A Comparative Genomic Tool that Aligns Transcript Maps to Sequenced Genomes. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022] Open

Kabir M, Barradas A, Tzotzos GT, Hentges KE, Doig AJ. Properties of genes essential for mouse development. PLoS One 2017;12:e0178273. [PMID: 28562614 PMCID: PMC5451031 DOI: 10.1371/journal.pone.0178273] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/21/2017] [Accepted: 05/10/2017] [Indexed: 12/20/2022] Open

Abstract

Essential genes are those that are critical for life. In the specific case of the mouse, they are the set of genes whose deletion means that a mouse is unable to survive after birth. As such, they are the key minimal set of genes needed for all the steps of development to produce an organism capable of life ex utero. We explored a wide range of sequence and functional features to characterise essential (lethal) and non-essential (viable) genes in mice. Experimental data curated manually identified 1301 essential genes and 3451 viable genes. Very many sequence features show highly significant differences between essential and viable mouse genes. Essential genes generally encode complex proteins, with multiple domains and many introns. These genes tend to be: long, highly expressed, old and evolutionarily conserved. These genes tend to encode ligases, transferases, phosphorylated proteins, intracellular proteins, nuclear proteins, and hubs in protein-protein interaction networks. They are involved with regulating protein-protein interactions, gene expression and metabolic processes, cell morphogenesis, cell division, cell proliferation, DNA replication, cell differentiation, DNA repair and transcription, cell differentiation and embryonic development. Viable genes tend to encode: membrane proteins or secreted proteins, and are associated with functions such as cellular communication, apoptosis, behaviour and immune response, as well as housekeeping and tissue specific functions. Viable genes are linked to transport, ion channels, signal transduction, calcium binding and lipid binding, consistent with their location in membranes and involvement with cell-cell communication. From the analysis of the composite features of essential and viable genes, we conclude that essential genes tend to be required for intracellular functions, and viable genes tend to be involved with extracellular functions and cell-cell communication. 
Knowledge of the features that are over-represented in essential genes allows for a deeper understanding of the functions and processes implemented during mammalian development.

Sievers A, Bosiek K, Bisch M, Dreessen C, Riedel J, Froß P, Hausmann M, Hildenbrand G. K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features. Genes (Basel) 2017;8:E122. [PMID: 28422050 PMCID: PMC5406869 DOI: 10.3390/genes8040122] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/10/2017] [Revised: 03/24/2017] [Accepted: 04/04/2017] [Indexed: 12/26/2022] Open

Abstract

In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we successfully checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distribution of k-mers) was developed and used. The results were analyzed by including positional information based on visualizations as genomic maps and by applying basic vector correlation methods. This analysis was concentrated on small word lengths (1 ≤ k ≤ 4) on relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B) of a chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae formerly identified by independent, mostly alignment-based methods, were confirmed. Correlations between the k-mer content and several genes in these genomes have been found, showing similarities between classified and unclassified viruses, which may be potentially useful for further taxonomic research. Furthermore, unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which are probably of major biological function, are found and described. Using the chromosomes of a chimpanzee and human that are currently known, identities between the species on every analyzed chromosome were reproduced. This demonstrates the feasibility of our approach for large data sets of complex genomes. 
Based on these results, we suggest k-mer analysis with positional resolution as a method for closing a gap between the effectiveness of alignment-based methods (like NCBI BLAST) and the high pace of standard k-mer analysis.
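The abstract above describes extracting local k-mer spectra (k-mer frequency distributions per genomic window) and comparing them with basic vector correlation. The sketch below only illustrates that general idea and is not the authors' algorithm: the function names `local_kmer_spectra` and `pearson`, the fixed-size sliding window, and the choice of Pearson correlation are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def local_kmer_spectra(seq, k, window, step):
    """Return (window_start, Counter of k-mers) for each sliding window.

    Hypothetical helper: windows that would run past the end of the
    sequence are dropped rather than padded.
    """
    seq = seq.upper()
    spectra = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        counts = Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))
        spectra.append((start, counts))
    return spectra


def pearson(spec_a, spec_b):
    """Pearson correlation of two spectra over the union of their k-mers."""
    keys = sorted(set(spec_a) | set(spec_b))
    x = [spec_a.get(key, 0) for key in keys]
    y = [spec_b.get(key, 0) for key in keys]
    n = len(keys)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a flat spectrum
    return cov / sqrt(var_x * var_y)


# Toy example on a made-up sequence: two windows of 8 bases, k = 2.
spectra = local_kmer_spectra("ACGTACGTAAAA", k=2, window=8, step=4)
for start, counts in spectra:
    print(start, dict(counts))
print(pearson(spectra[0][1], spectra[1][1]))
```

Keeping the window start alongside each spectrum is what gives the positional resolution: correlations between windows can then be drawn as a map along the sequence.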

Yao L, Tan KWM, Tan TW, Lee YK. Exploring the transcriptome of non-model oleaginous microalga Dunaliella tertiolecta through high-throughput sequencing and high performance computing. BMC Bioinformatics 2017;18:122. [PMID: 28228091 PMCID: PMC5322580 DOI: 10.1186/s12859-017-1551-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/30/2016] [Accepted: 02/16/2017] [Indexed: 12/31/2022] Open

Abstract

Background

RNA-Seq technology has received a lot of attention in recent years for microalgal global transcriptomic profiling. It is widely used in transcriptome-wide analysis of gene expression, particularly for microalgal strains with potential as biofuel sources. However, insufficient genomic or transcriptomic information of non-model microalgae has limited the understanding of their regulatory mechanisms and hampered genetic manipulation to enhance biofuel production. As such, an optimal microalgal transcriptomic database construction is a subject of urgent investigation.

Results

Dunaliella tertiolecta, a non-model oleaginous microalgal species, was sequenced via Illumina MISEQ and HISEQ 4000 in RNA-Seq studies. The high-quality, high-throughput sequencing data were explored using high performance computing (HPC) in a petascale data center and subjected to de novo assembly and parallelized mpiBLASTX search with multiple species. As a result, a transcriptome database of 17,845 was constructed (~95% completeness). This enlarged database fueled the RNA-Seq data analysis, which was validated by a nitrogen deprivation (ND) study that induces triacylglycerol (TAG) production.

Conclusions

The new paralleled assembly and annotation method under HPC presented here allows the solution of large-scale data processing problems in acceptable computation time. There is significant increase in the number of transcriptomic data achieved and observable heterogeneity in the performance to identify differentially expressed genes in the ND treatment paradigm. The results provide new insights as to how response to ND treatment in microalgae is regulated. ND analyses highlight the advantages of this database generated in this study that could also serve as a useful resource for future gene manipulation and transcriptome-wide analysis. We thus demonstrate the usefulness of exploring the transcriptome as an informative platform for functional studies and genetic manipulations in similar species.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1551-x) contains supplementary material, which is available to authorized users.

Liu S, Hou W, Sun T, Xu Y, Li P, Yue B, Fan Z, Li J. Genome-wide mining and comparative analysis of microsatellites in three macaque species. Mol Genet Genomics 2017;292:537-550. [PMID: 28160080 DOI: 10.1007/s00438-017-1289-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/31/2016] [Accepted: 01/09/2017] [Indexed: 12/13/2022]

Liu H, Jia Y, Sun X, Tian D, Hurst LD, Yang S. Direct Determination of the Mutation Rate in the Bumblebee Reveals Evidence for Weak Recombination-Associated Mutation and an Approximate Rate Constancy in Insects. Mol Biol Evol 2017;34:119-130. [PMID: 28007973 PMCID: PMC5854123 DOI: 10.1093/molbev/msw226] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Indexed: 01/06/2023] Open

Symonová R, Majtánová Z, Arias-Rodriguez L, Mořkovský L, Kořínková T, Cavin L, Pokorná MJ, Doležálková M, Flajšhans M, Normandeau E, Ráb P, Meyer A, Bernatchez L. Genome Compositional Organization in Gars Shows More Similarities to Mammals than to Other Ray-Finned Fish. JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION 2016;328:607-619. [DOI: 10.1002/jez.b.22719] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/17/2016] [Revised: 11/13/2016] [Accepted: 11/22/2016] [Indexed: 12/12/2022]

Affiliation(s)

Radka Symonová Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic. Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic. Research Institute for Limnology; University of Innsbruck; Mondsee Austria
Zuzana Majtánová Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic. Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
Lenin Arias-Rodriguez División Académica de Ciencias Biológicas; Universidad Juárez Autónoma de Tabasco (UJAT); Villahermosa Tabasco México
Libor Mořkovský Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
Tereza Kořínková Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
Lionel Cavin Muséum d'Histoire Naturelle; Geneva 6 Switzerland
Martina Johnson Pokorná Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic. Department of Ecology; Faculty of Science; Charles University; Prague 2 Czech Republic
Marie Doležálková Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic. Department of Zoology; Faculty of Science; Charles University; Prague 2 Czech Republic
Martin Flajšhans Faculty of Fisheries and Protection of Waters; South Bohemian Research Centre of Aquaculture and Biodiversity of Hydrocenoses; University of South Bohemia in České Budějovice; Vodňany Czech Republic
Eric Normandeau IBIS, Department of Biology, University Laval, Pavillon Charles-Eugène-Marchand; Avenue de la Médecine Quebec City; Canada
Petr Ráb Laboratory of Fish Genetics; Institute of Animal Physiology and Genetics; The Czech Academy of Sciences; Liběchov Czech Republic
Axel Meyer Chair in Zoology and Evolutionary Biology; Department of Biology; University of Konstanz; Konstanz Germany
Louis Bernatchez IBIS, Department of Biology, University Laval, Pavillon Charles-Eugène-Marchand; Avenue de la Médecine Quebec City; Canada

Sizova TV, Karpova OI. The length of chromatin loops in meiotic prophase I of warm-blooded vertebrates depends on the DNA compositional organization. RUSS J GENET+ 2016. [DOI: 10.1134/s1022795416110144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

Yin H, Wang G, Ma L, Yi SV, Zhang Z. What Signatures Dominantly Associate with Gene Age? Genome Biol Evol 2016;8:3083-3089. [PMID: 27609935 PMCID: PMC5174733 DOI: 10.1093/gbe/evw216] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022] Open

Savisaar R, Hurst LD. Purifying Selection on Exonic Splice Enhancers in Intronless Genes. Mol Biol Evol 2016;33:1396-418. [PMID: 26802218 PMCID: PMC4868121 DOI: 10.1093/molbev/msw018] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Bernardi G. Genome Organization and Chromosome Architecture. COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY 2016;80:83-91. [PMID: 26801160 DOI: 10.1101/sqb.2015.80.027318] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Jabbari K, Nürnberg P. A genomic view on epilepsy and autism candidate genes. Genomics 2016;108:31-6. [PMID: 26772991 DOI: 10.1016/j.ygeno.2016.01.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2015] [Revised: 12/15/2015] [Accepted: 01/01/2016] [Indexed: 01/25/2023]

Sundararajan A, Dukowic-Schulze S, Kwicklis M, Engstrom K, Garcia N, Oviedo OJ, Ramaraj T, Gonzales MD, He Y, Wang M, Sun Q, Pillardy J, Kianian SF, Pawlowski WP, Chen C, Mudge J. Gene Evolutionary Trajectories and GC Patterns Driven by Recombination in Zea mays. FRONTIERS IN PLANT SCIENCE 2016;7:1433. [PMID: 27713757 PMCID: PMC5031598 DOI: 10.3389/fpls.2016.01433] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/20/2016] [Accepted: 09/08/2016] [Indexed: 05/20/2023]

Chen F, Zhu Z, Zhou X, Yan Y, Dong Z, Cui D. High-Throughput Sequencing Reveals Single Nucleotide Variants in Longer-Kernel Bread Wheat. FRONTIERS IN PLANT SCIENCE 2016;7:1193. [PMID: 27551288 PMCID: PMC4976665 DOI: 10.3389/fpls.2016.01193] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/24/2016] [Accepted: 07/25/2016] [Indexed: 05/09/2023]

Bernardi G. Chromosome Architecture and Genome Organization. PLoS One 2015;10:e0143739. [PMID: 26619076 PMCID: PMC4664426 DOI: 10.1371/journal.pone.0143739] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2015] [Accepted: 11/09/2015] [Indexed: 02/08/2023] Open

Mugal CF, Weber CC, Ellegren H. GC-biased gene conversion links the recombination landscape and demography to genomic base composition. Bioessays 2015;37:1317-26. [DOI: 10.1002/bies.201500058] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]

Kryuchkova-Mostacci N, Robinson-Rechavi M. Tissue-Specific Evolution of Protein Coding Genes in Human and Mouse. PLoS One 2015;10:e0131673. [PMID: 26121354 PMCID: PMC4488272 DOI: 10.1371/journal.pone.0131673] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2015] [Accepted: 06/04/2015] [Indexed: 12/23/2022] Open

Wu X, Hurst LD. Why Selection Might Be Stronger When Populations Are Small: Intron Size and Density Predict within and between-Species Usage of Exonic Splice Associated cis-Motifs. Mol Biol Evol 2015;32:1847-61. [PMID: 25771198 PMCID: PMC4476162 DOI: 10.1093/molbev/msv069] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023] Open

Abstract

The nearly neutral theory predicts that small effective population size provides the conditions for weakened selection. This is postulated to explain why our genome is more “bloated” than that of, for example, yeast, ours having large introns and large intergene spacer. If a bloated genome is also an error prone genome might it, however, be the case that selection for error-mitigating properties is stronger in our genome? We examine this notion using splicing as an exemplar, not least because large introns can predispose to noisy splicing. We thus ask whether, owing to genomic decay, selection for splice error-control mechanisms is stronger, not weaker, in species with large introns and small populations. In humans much information defining splice sites is in cis-exonic motifs, most notably exonic splice enhancers (ESEs). These act as splice-error control elements. Here then we ask whether within and between-species intron size is a predictor of the commonality of exonic cis-splicing motifs. We show that, as predicted, the proportion of synonymous sites that are ESE-associated and under selection in humans is weakly positively correlated with the size of the flanking intron. In a phylogenetically controlled framework, we observe, also as expected, that mean intron size is both predicted by N_e.μ and is a good predictor of cis-motif usage across species, this usage coevolving with splice site definition. Unexpectedly, however, across taxa intron density is a better predictor of cis-motif usage than intron size. We propose that selection for splice-related motifs is driven by a need to avoid decoy splice sites that will be more common in genes with many and large introns. That intron number and density predict ESE usage within human genes is consistent with this, as is the finding of intragenic heterogeneity in ESE density. As intronic content and splice site usage across species is also well predicted by N_e.μ, the result also suggests an unusual circumstance in which selection (for cis-modifiers of splicing) might be stronger when population sizes are smaller, as here splicing is noisier, resulting in a greater need to control error-prone splicing.

Evolutionary consequences of DNA methylation on the GC content in vertebrate genomes. G3-GENES GENOMES GENETICS 2015;5:441-7. [PMID: 25591920 PMCID: PMC4349097 DOI: 10.1534/g3.114.015545] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Clément Y, Fustier MA, Nabholz B, Glémin S. The bimodal distribution of genic GC content is ancestral to monocot species. Genome Biol Evol 2014;7:336-48. [PMID: 25527839 PMCID: PMC4316631 DOI: 10.1093/gbe/evu278] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open

Chaurasia A, Tarallo A, Bernà L, Yagi M, Agnisola C, D’Onofrio G. Length and GC content variability of introns among teleostean genomes in the light of the metabolic rate hypothesis. PLoS One 2014;9:e103889. [PMID: 25093416 PMCID: PMC4122358 DOI: 10.1371/journal.pone.0103889] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2013] [Accepted: 07/07/2014] [Indexed: 01/30/2023] Open

Nabeel-Shah S, Ashraf K, Pearlman RE, Fillingham J. Molecular evolution of NASP and conserved histone H3/H4 transport pathway. BMC Evol Biol 2014;14:139. [PMID: 24951090 PMCID: PMC4082323 DOI: 10.1186/1471-2148-14-139] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2014] [Accepted: 06/12/2014] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

NASP is an essential protein in mammals that functions in histone transport pathways and maintenance of a soluble reservoir of histones H3/H4. NASP has been studied exclusively in Opisthokonta lineages where some functional diversity has been reported. In humans, growing evidence implicates NASP misregulation in the development of a variety of cancers. Although a comprehensive phylogenetic analysis is lacking, NASP-family proteins that possess four TPR motifs are thought to be widely distributed across eukaryotes.

RESULTS

We characterize the molecular evolution of NASP by systematically identifying putative NASP orthologs across diverse eukaryotic lineages ranging from excavata to those of the crown group. We detect extensive silent divergence at the nucleotide level suggesting the presence of strong purifying selection acting at the protein level. We also observe a selection bias for high frequencies of acidic residues which we hypothesize is a consequence of their critical function(s), further indicating the role of functional constraints operating on NASP evolution. Our data indicate that TPR1 and TPR4 constitute the most rapidly evolving functional units of NASP and may account for the functional diversity observed among well characterized family members. We also show that NASP paralogs in ray-finned fish have different genomic environments with clear differences in their GC content and have undergone significant changes at the protein level suggesting functional diversification.

CONCLUSION

We draw four main conclusions from this study. First, wide distribution of NASP throughout eukaryotes suggests that it was likely present in the last eukaryotic common ancestor (LECA) possibly as an important innovation in the transport of H3/H4. Second, strong purifying selection operating at the protein level has influenced the nucleotide composition of NASP genes. Further, we show that selection has acted to maintain a high frequency of functionally relevant acidic amino acids in the region that interrupts TPR2. Third, functional diversity reported among several well characterized NASP family members can be explained in terms of quickly evolving TPR1 and TPR4 motifs. Fourth, NASP fish specific paralogs have significantly diverged at the protein level with NASP2 acquiring a NNR domain.

Li XQ. Comparative analysis of the base compositions of the pre-mRNA 3' cleaved-off region and the mRNA 3' untranslated region relative to the genomic base composition in animals and plants. PLoS One 2014;9:e99928. [PMID: 24941005 PMCID: PMC4062462 DOI: 10.1371/journal.pone.0099928] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2013] [Accepted: 05/20/2014] [Indexed: 12/26/2022] Open

Abstract

The precursor messenger RNA (pre-mRNA) three-prime cleaved-off region (3′COR) and the mRNA three-prime untranslated region (3′UTR) play critical roles in regulating gene expression. The differences in base composition between these regions and the corresponding genomes are still largely uncharacterized in animals and plants. In this study, the base compositions of non-redundant 3′CORs and 3′UTRs were compared with the corresponding whole genomes of eleven animals, four dicotyledonous plants, and three monocotyledonous (cereal) plants. Among the four bases (A, C, G, and U for adenine, cytosine, guanine, and uracil, respectively), U (which corresponds to T, for thymine, in DNA) was the most frequent, A the second most frequent, G the third most frequent, and C the least frequent in most of the species in both the 3′COR and 3′UTR regions. In comparison with the whole genomes, in both regions the U content was usually the most overrepresented (particularly in the monocotyledonous plants), and the C content was the most underrepresented. The order obtained for the species groups, when ranked from high to low according to the U contents in the 3′COR and 3′UTR was as follows: dicotyledonous plants, monocotyledonous plants, non-mammal animals, and mammals. In contrast, the genomic T content was highest in dicotyledonous plants, lowest in monocotyledonous plants, and intermediate in animals. These results suggest the following: 1) there is a mechanism operating in both animals and plants which is biased toward U and against C in the 3′COR and 3′UTR; 2) the 3′UTR and 3′COR, as functional units, minimized the difference between dicotyledonous and monocotyledonous plants, while the dicotyledonous and monocotyledonous genomes evolved into two extreme groups in terms of base composition.

Zhang R, Ou HY, Gao F, Luo H. Identification of Horizontally-transferred Genomic Islands and Genome Segmentation Points by Using the GC Profile Method. Curr Genomics 2014;15:113-21. [PMID: 24822029 PMCID: PMC4009839 DOI: 10.2174/1389202915999140328163125] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2013] [Revised: 11/28/2013] [Accepted: 11/29/2013] [Indexed: 11/29/2022] Open

Identifying regulatory mechanisms underlying tumorigenesis using locus expression signature analysis. Proc Natl Acad Sci U S A 2014;111:5747-52. [PMID: 24706889 DOI: 10.1073/pnas.1309293111] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Kvikstad EM, Duret L. Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome. Mol Biol Evol 2013;31:23-36. [PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open

Romiguier J, Ranwez V, Delsuc F, Galtier N, Douzery EJP. Less is more in mammalian phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental mammals. Mol Biol Evol 2013;30:2134-44. [PMID: 23813978 DOI: 10.1093/molbev/mst116] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Carels N, Frías D. A Statistical Method without Training Step for the Classification of Coding Frame in Transcriptome Sequences. Bioinform Biol Insights 2013;7:35-54. [PMID: 23400232 PMCID: PMC3561939 DOI: 10.4137/bbi.s10053] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open

Abstract

In this study, we investigated the modalities of coding open reading frame (cORF) classification of expressed sequence tags (EST) by using the universal feature method (UFM). The UFM algorithm is based on the scoring of purine bias (Rrr) and stop codon frequencies. UFM classifies ORFs as coding or non-coding through a score based on 5 factors: (i) stop codon frequency; (ii) the product of the probabilities of purines occurring in the three positions of nucleotide triplets; (iii) the product of the probabilities of Cytosine (C), Guanine (G), and Adenine (A) occurring in the 1st, 2nd, and 3rd positions of triplets, respectively; (iv) the probabilities of a G occurring in the 1st and 2nd positions of triplets; and (v) the probabilities of a T occurring in the 1st and an A in the 2nd position of triplets. Because UFM is based on primary determinants of coding sequences that are conserved throughout the biosphere, it is suitable for cORF classification of any sequence in eukaryote transcriptomes without prior knowledge. Considering the protein sequences of the Protein Data Bank (RCSB PDB or more simply PDB) as a reference, we found that UFM classifies cORFs of ≥200 bp (if the coding strand is known) and cORFs of ≥300 bp (if the coding strand is unknown), and releases them in their coding strand and coding frame, which allows their automatic translation into protein sequences with a success rate equal to or higher than 95%. We first established the statistical parameters of UFM using ESTs from Plasmodium falciparum, Arabidopsis thaliana, Oryza sativa, Zea mays, Drosophila melanogaster, Homo sapiens and Chlamydomonas reinhardtii in reference to the protein sequences of PDB. Second, we showed that the success rate of cORF classification using UFM is expected to apply to approximately 95% of higher eukaryote genes that encode for proteins. Third, we used UFM in combination with CAP3 to assemble large EST samples into cORFs that we used to analyze transcriptome phenotypes in rice, maize, and humans. We discuss the error rate and the interference of noisy sequences such as pseudogenes, transposons, and retrotransposons. This method is suitable for rapid cORF extraction from transcriptome data and allows correct description of the genome phenotypes of plant genomes without prior knowledge. Additional care is necessary when addressing the human transcriptome due to the interference caused by large amounts of noisy sequences. UFM can be regarded as a low complexity tool for prior knowledge extraction concerning the coding fraction of the transcriptome of any eukaryote. Due to its low level of complexity, UFM is also very robust to variations of codon usage.

A pronounced evolutionary shift of the pseudoautosomal region boundary in house mice. Mamm Genome 2012;23:454-66. [PMID: 22763584 DOI: 10.1007/s00335-012-9403-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2012] [Accepted: 06/07/2012] [Indexed: 10/28/2022]

Nam K, Ellegren H. Recombination drives vertebrate genome contraction. PLoS Genet 2012;8:e1002680. [PMID: 22570634 PMCID: PMC3342960 DOI: 10.1371/journal.pgen.1002680] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2011] [Accepted: 03/15/2012] [Indexed: 11/19/2022] Open

Abstract

Selective and/or neutral processes may govern variation in DNA content and, ultimately, genome size. The observation in several organisms of a negative correlation between recombination rate and intron size could be compatible with a neutral model in which recombination is mutagenic for length changes. We used whole-genome data on small insertions and deletions within transposable elements from chicken and zebra finch to demonstrate clear links between recombination rate and a number of attributes of reduced DNA content. Recombination rate was negatively correlated with the length of introns, transposable elements, and intergenic spacer and with the rate of short insertions. Importantly, it was positively correlated with gene density, the rate of short deletions, the deletion bias, and the net change in sequence length. All these observations point at a pattern of more condensed genome structure in regions of high recombination. Based on the observed rates of small insertions and deletions and assuming that these rates are representative for the whole genome, we estimate that the genome of the most recent common ancestor of birds and lizards has lost nearly 20% of its DNA content up until the present. Expansion of transposable elements can counteract the effect of deletions in an equilibrium mutation model; however, since the activity of transposable elements has been low in the avian lineage, the deletion bias is likely to have had a significant effect on genome size evolution in dinosaurs and birds, contributing to the maintenance of a small genome. We also demonstrate that most of the observed correlations between recombination rate and genome contraction parameters are seen in the human genome, including for segregating indel polymorphisms. Our data are compatible with a neutral model in which recombination drives vertebrate genome size evolution and gives no direct support for a role of natural selection in this process.

One major implication from genetic work done several decades ago is that the genome contains a lot of sequences that do not constitute genes or other functional elements. The total amount of DNA—the genome size—is thus not necessarily an indicator of DNA complexity or organismal complexity, an observation often referred to as the C-value paradox (C-value being a measure of DNA content). What then is it that determines genome size? One model posits that the evolution of genome size is not a consequence of natural selection but is instead governed by the incidence and character of naturally occurring mutations that affect the length of DNA, a process that is not affected by selection. Here we present the results of an analysis of how recombination affects the size of avian and human genomes. We find strong evidence that the rate of recombination is a driving force of genome size evolution. In regions of the genome where recombination occurs frequently, the loss of DNA caused by small deletions is particularly pronounced. Our simulations show that the effect of such recombination-driven genome contraction can be profound over evolutionary time scales. These observations lead to a model in which recombination is mutagenic for length changes and that the incidence of deletions increases with increasing recombination rate. Although we cannot formally exclude that natural selection contributes to the observed relationship between recombination and genome contraction, we find no evidence to support such a scenario.

Matoulkova E, Michalova E, Vojtesek B, Hrstka R. The role of the 3' untranslated region in post-transcriptional regulation of protein expression in mammalian cells. RNA Biol 2012;9:563-76. [PMID: 22614827 DOI: 10.4161/rna.20231] [Citation(s) in RCA: 253] [Impact Index Per Article: 21.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open

Fujita MK, Edwards SV, Ponting CP. The Anolis lizard genome: an amniote genome without isochores. Genome Biol Evol 2011;3:974-84. [PMID: 21795750 PMCID: PMC3184785 DOI: 10.1093/gbe/evr072] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Clément Y, Arndt PF. Substitution patterns are under different influences in primates and rodents. Genome Biol Evol 2011;3:236-45. [PMID: 21339508 PMCID: PMC3068003 DOI: 10.1093/gbe/evr011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Janes DE, Organ CL, Fujita MK, Shedlock AM, Edwards SV. Genome evolution in Reptilia, the sister group of mammals. Annu Rev Genomics Hum Genet 2010;11:239-64. [PMID: 20590429 DOI: 10.1146/annurev-genom-082509-141646] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Oota S, Kawamura K, Kawai Y, Saitou N. A new framework for studying the isochore evolution: estimation of the equilibrium GC content based on the temporal mutation rate model. Genome Biol Evol 2010;2:558-71. [PMID: 20675617 PMCID: PMC2997559 DOI: 10.1093/gbe/evq041] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Fahey ME, Mills W, Higgins DG, Moore T. Maternally and paternally silenced imprinted genes differ in their intron content. Comp Funct Genomics 2010;5:572-83. [PMID: 18629181 PMCID: PMC2447473 DOI: 10.1002/cfg.437] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2004] [Revised: 11/01/2004] [Accepted: 11/12/2004] [Indexed: 12/31/2022] Open

Dunham I, Beare DM, Collins JE. The characteristics of human genes: analysis of human chromosome 22. Comp Funct Genomics 2010;4:635-46. [PMID: 18629020 PMCID: PMC2447302 DOI: 10.1002/cfg.335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2003] [Revised: 09/04/2003] [Accepted: 09/08/2003] [Indexed: 11/11/2022] Open

Tatarinova TV, Alexandrov NN, Bouck JB, Feldmann KA. GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics 2010;11:308. [PMID: 20470436 PMCID: PMC2895627 DOI: 10.1186/1471-2164-11-308] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2009] [Accepted: 05/16/2010] [Indexed: 11/10/2022] Open