1
Baeza M, Sepulveda D, Cifuentes V, Alcaíno J. Codon usage bias in yeasts and its correlation with gene expression, growth temperature, and protein structure. Front Microbiol 2024; 15:1414422. [PMID: 39040903 PMCID: PMC11260810 DOI: 10.3389/fmicb.2024.1414422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 06/25/2024] [Indexed: 07/24/2024]
Abstract
Codon usage bias (CUB) has been described in viruses, prokaryotes, and eukaryotes and has been linked to several cellular and environmental factors, such as the organism's growth temperature, gene expression levels, and regulation of protein synthesis and folding. Most of the studies in this area have been conducted in bacteria and higher eukaryotes, in some cases with different results. In this study, a comparative analysis of CUB in yeasts isolated from cold and temperate environments was performed in order to evaluate the correlation of CUB with yeast optimal temperature of growth (OTG), gene expression levels, cellular function, and structure of encoded proteins. Among the main findings, highly expressed ORFs tend to have a more similar CUB within and between yeasts, and a direct correlation between codons ending in C and expression level was generally found. A low correspondence between CUB and OTG was observed, with an inverse correlation for some codons ending in C. The clustering of yeasts based on their CUB partially aligns with their OTG, being more consistent for yeasts with lower OTG. In most yeasts, the abundance of preferred codons was generally lower at the 5' end of ORFs, higher in segments encoding beta strand, lower in segments encoding extracellular and transmembrane regions, and higher in "translation" and "energy metabolism" pathways, especially in highly expressed ORFs. Based on our findings, it is suggested that the abundance and distribution of preferred and non-preferred codons along mRNAs contribute to proper protein folding and functionality by regulating protein synthesis rates, becoming a more important factor under conditions that require faster protein synthesis in yeasts.
Affiliation(s)
- Marcelo Baeza
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Víctor Cifuentes
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Jennifer Alcaíno
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
2
Felipe Benites L, Stephens TG, Van Etten J, James T, Christian WC, Barry K, Grigoriev IV, McDermott TR, Bhattacharya D. Hot springs viruses at Yellowstone National Park have ancient origins and are adapted to thermophilic hosts. Commun Biol 2024; 7:312. [PMID: 38594478 PMCID: PMC11003980 DOI: 10.1038/s42003-024-05931-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 02/16/2024] [Indexed: 04/11/2024]
Abstract
Geothermal springs house unicellular red algae in the class Cyanidiophyceae that dominate the microbial biomass at these sites. Little is known about host-virus interactions in these environments. We analyzed the virus community associated with red algal mats in three neighboring habitats (creek, endolithic, soil) at Lemonade Creek, Yellowstone National Park (YNP), USA. We find that despite proximity, each habitat houses a unique collection of viruses, with the giant viruses, Megaviricetes, dominant in all three. The early branching phylogenetic position of genes encoded on metagenome assembled virus genomes (vMAGs) suggests that the YNP lineages are of ancient origin and not due to multiple invasions from mesophilic habitats. The existence of genomic footprints of adaptation to thermophily in the vMAGs is consistent with this idea. The Cyanidiophyceae at geothermal sites originated ca. 1.5 Bya and are therefore relevant to understanding biotic interactions on the early Earth.
Affiliation(s)
- L Felipe Benites
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Timothy G Stephens
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Julia Van Etten
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Graduate Program in Ecology and Evolution, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Timeeka James
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- William C Christian
- Department of Land Resources and Environmental Sciences, Montana State University, Bozeman, Montana, USA
- Kerrie Barry
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Igor V Grigoriev
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
- Timothy R McDermott
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, Montana, USA
- Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
3
Arias PM, Butler J, Randhawa GS, Soltysiak MPM, Hill KA, Kari L. Environment and taxonomy shape the genomic signature of prokaryotic extremophiles. Sci Rep 2023; 13:16105. [PMID: 37752120 PMCID: PMC10522608 DOI: 10.1038/s41598-023-42518-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 09/11/2023] [Indexed: 09/28/2023]
Abstract
This study provides comprehensive quantitative evidence suggesting that adaptations to extreme temperatures and pH imprint a discernible environmental component in the genomic signature of microbial extremophiles. Both supervised and unsupervised machine learning algorithms were used to analyze genomic signatures, each computed as the k-mer frequency vector of a 500 kbp DNA fragment arbitrarily selected to represent a genome. Computational experiments classified/clustered genomic signatures extracted from a curated dataset of [Formula: see text] extremophile (temperature, pH) bacteria and archaea genomes, at multiple scales of analysis, [Formula: see text]. The supervised learning resulted in high accuracies for taxonomic classifications at [Formula: see text], and medium to medium-high accuracies for environment category classifications of the same datasets at [Formula: see text]. For [Formula: see text], our findings were largely consistent with amino acid compositional biases and codon usage patterns in coding regions, previously attributed to extreme environment adaptations. The unsupervised learning of unlabelled sequences identified several exemplars of hyperthermophilic organisms with large similarities in their genomic signatures, in spite of belonging to different domains in the Tree of Life.
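The genomic signature described in this abstract, a k-mer frequency vector computed from a DNA fragment, can be illustrated with a minimal Python sketch. The function name `kmer_frequency_vector`, the lexicographic ordering of k-mers, and the choice to skip windows containing non-ACGT characters are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_frequency_vector(seq, k):
    """Return normalized k-mer frequencies, ordered lexicographically over ACGT^k.

    Illustrative helper (hypothetical name). Windows that contain characters
    other than A, C, G, T are ignored, which is an assumption of this sketch.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalize only over windows made of unambiguous bases.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Toy fragment; a real signature would be computed from a much longer sequence.
signature = kmer_frequency_vector("ACGTACGTNACG", 2)
```

Vectors of this kind have fixed length 4^k regardless of fragment length, which is what makes them usable as input features for classification or clustering.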
Affiliation(s)
- Pablo Millán Arias
- School of Computer Science, University of Waterloo, Waterloo, ON, Canada
- Joseph Butler
- Department of Biology, University of Western Ontario, London, ON, Canada
- Gurjit S Randhawa
- School of Mathematical and Computational Sciences, University of Prince Edward Island, Charlottetown, PE, Canada
- Kathleen A Hill
- Department of Biology, University of Western Ontario, London, ON, Canada
- Lila Kari
- School of Computer Science, University of Waterloo, Waterloo, ON, Canada
4
Masłowska-Górnicz A, van den Bosch MRM, Saccenti E, Suarez-Diez M. A large-scale analysis of codon usage bias in 4868 bacterial genomes shows association of codon adaptation index with GC content, protein functional domains and bacterial phenotypes. Biochim Biophys Acta Gene Regul Mech 2022; 1865:194826. [PMID: 35605953 DOI: 10.1016/j.bbagrm.2022.194826] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/10/2022] [Revised: 05/05/2022] [Accepted: 05/12/2022] [Indexed: 06/15/2023]
Abstract
Multiple synonymous codons code for the same amino acid, resulting in the degeneracy of the genetic code and in the preferred use of some codons, called codon bias usage (CBU). We performed a large-scale analysis of codon usage bias, analysing the distribution of the codon adaptation index (CAI) and the codon relative adaptiveness index (RA) in 4868 bacterial genomes. We found that CAI values differ significantly between protein functional domains and parts of the protein outside domains and show how CAI, GC content and preferred usage of polymerase III alpha subunits are related. Additionally, we give evidence of the association between CAI and bacterial phenotypes.
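As a rough illustration of the two quantities named in this abstract, the sketch below computes relative adaptiveness weights from a reference set of coding sequences and then the CAI of a gene as the geometric mean of those weights, following the commonly used Sharp and Li formulation. The function names, the pseudocount of 0.5 for unobserved codons, and the use of the standard genetic code are assumptions of this sketch; the paper's exact procedure may differ.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into in-frame codons, dropping a trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight each codon by its count relative to the most used synonymous codon.

    The pseudocount (an assumption of this sketch) avoids zero weights.
    Stop codons and single-codon amino acids (Met, Trp) are excluded.
    """
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    weights = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue
        best = max(counts[c] for c in family) + pseudocount
        for c in family:
            weights[c] = (counts[c] + pseudocount) / best
    return weights


def cai(seq, weights):
    """Geometric mean of the relative adaptiveness of the scorable codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Toy reference set in which GCT is the dominant alanine codon.
weights = relative_adaptiveness(["GCTGCTGCTGCC"])
score = cai("GCTGCC", weights)
```

A gene built only from the most used codon of each family scores 1.0, and the score falls as less used synonymous codons appear.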
Affiliation(s)
- Anna Masłowska-Górnicz
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, the Netherlands
- Melanie R M van den Bosch
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, the Netherlands
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, the Netherlands
- Maria Suarez-Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, the Netherlands
5
Salwan R, Sharma V. Genomics of Prokaryotic Extremophiles to Unfold the Mystery of Survival in Extreme Environments. Microbiol Res 2022; 264:127156. [DOI: 10.1016/j.micres.2022.127156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Revised: 07/30/2022] [Accepted: 07/31/2022] [Indexed: 11/26/2022]
6
Neutralism versus selectionism: Chargaff's second parity rule, revisited. Genetica 2021; 149:81-88. [PMID: 33880685 PMCID: PMC8057000 DOI: 10.1007/s10709-021-00119-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/04/2021] [Accepted: 04/09/2021] [Indexed: 11/03/2022]
Abstract
Of Chargaff's four "rules" on DNA base frequencies, the functional interpretation of his second parity rule (PR2) is the most contentious. Thermophile base compositions (GC%) were taken by Galtier and Lobry (1997) as favoring Sueoka's neutral PR2 hypothesis over Forsdyke's selective PR2 hypothesis, namely that mutations improving local within-species recombination efficiency had generated a genome-wide potential for the strands of duplex DNA to separate and initiate recombination through the "kissing" of the tips of stem-loops. However, following Chargaff's GC rule, base composition mainly reflects a species-specific, genome-wide, evolutionary pressure. GC% could not have consistently followed the dictates of temperature, since it plays fundamental roles in both sustaining species integrity and, through primarily neutral genome-wide mutation, fostering speciation. Evidence for a local within-species recombination-initiating role of base order was obtained with a novel technology that masked the contribution of base composition to nucleic acid folding energy. Forsdyke's results were consistent with his PR2 hypothesis, appeared to resolve some root problems in biology and provided a theoretical underpinning for alignment-free taxonomic analyses using relative oligonucleotide frequencies (k-mer analysis). Moreover, consistent with Chargaff's cluster rule, discovery of the thermoadaptive role of the "purine-loading" of open reading frames made less tenable the Galtier-Lobry anti-selectionist arguments.
7
Khan MF, Patra S. Deciphering the rationale behind specific codon usage pattern in extremophiles. Sci Rep 2018; 8:15548. [PMID: 30341344 PMCID: PMC6195531 DOI: 10.1038/s41598-018-33476-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/09/2018] [Accepted: 09/21/2018] [Indexed: 12/03/2022]
Abstract
Protein stability is affected at different hierarchies: gene, RNA, amino acid sequence and structure. The gene is the first level, contributing via varying codon composition. The codon selectivity of an organism differs between normal and extremophilic milieus. The present work details the codon usage patterns of six extremophilic classes and their harmony. Homologous gene datasets of thermophile-mesophile, psychrophile-mesophile, thermophile-psychrophile, acidophile-alkaliphile, halophile-nonhalophile and barophile-nonbarophile pairs were analysed to filter statistically significant attributes. Relative abundance analysis, 1–9 scale ranking, nucleotide compositions, attribute weighting and machine learning algorithms were employed to arrive at the findings. AGG in thermophiles and barophiles, CAA in mesophiles and psychrophiles, TGG in acidophiles, GAG in alkaliphiles and GAC in halophiles had the highest preference. A preference for GC-rich and G/C-ending codons was observed in halophiles and barophiles, whereas a decreasing trend was seen in psychrophiles and alkaliphiles. In thermophiles, GC-rich codons decreased and G/C-ending codons increased, whereas acidophiles showed equal contents of GC-rich and G/C-ending codons. Codon usage patterns exhibited harmony among different extremophiles, which is detailed. However, the codon attribute preferences and selectivity of extremophiles varied in comparison to non-extremophiles. The findings can be instrumental in codon optimization for heterologous expression of extremophilic proteins.
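The distinction this abstract draws between GC-rich codons and G/C-ending codons can be made concrete with a small Python sketch. The function name `codon_gc_summary` is hypothetical, and the threshold used here for "GC-rich" (at least two G or C bases in the codon) is an assumption for illustration, since the abstract does not state the paper's definition.

```python
def codon_gc_summary(cds, min_gc_for_rich=2):
    """Return the fractions of G/C-ending and GC-rich codons in a coding sequence.

    "GC-rich" is defined here as a codon with at least `min_gc_for_rich`
    G or C bases; this threshold is an assumption of the sketch.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    # Third (wobble) position is G or C.
    gc_ending = sum(c[2] in "GC" for c in codons)
    # Codon as a whole carries enough G/C bases.
    gc_rich = sum(sum(b in "GC" for b in c) >= min_gc_for_rich for c in codons)
    return {
        "gc_ending": gc_ending / len(codons),
        "gc_rich": gc_rich / len(codons),
    }


# Codons: ATG, GCC, AAA, GAG, TAA
summary = codon_gc_summary("ATGGCCAAAGAGTAA")
```

The two fractions can move independently: ATG ends in G without being GC-rich under this threshold, which is the kind of divergence the abstract reports for thermophiles.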
Affiliation(s)
- Mohd Faheem Khan
- Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, 781039, Assam, India
- Sanjukta Patra
- Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, 781039, Assam, India
8
Arribas M, Aguirre J, Manrubia S, Lázaro E. Differences in adaptive dynamics determine the success of virus variants that propagate together. Virus Evol 2018; 4:vex043. [PMID: 29340211 PMCID: PMC5761584 DOI: 10.1093/ve/vex043] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/04/2022]
Abstract
Virus fitness is a complex parameter that results from the interaction of virus-specific characters (e.g. intracellular growth rate, adsorption rate, virion extracellular stability, and tolerance to mutations) with others that depend on the underlying fitness landscape and the internal structure of the whole population. Individual mutants usually have lower fitness values than the complex population from which they come. When they are propagated and allowed to attain large population sizes for a sufficiently long time, they approach mutation-selection equilibrium with the concomitant fitness gains. The optimization process follows dynamics that vary among viruses, likely due to differences in any of the parameters that determine fitness values. As a consequence, when different mutants spread together, the number of generations experienced by each of them prior to co-propagation may determine its particular fate. In this work we attempt to clarify the effect of different levels of population diversity on the outcome of competition dynamics. To this end, we analyze the behavior of two mutants of the RNA bacteriophage Qβ that co-propagate with the wild-type virus. When both competitor viruses are clonal, the mutants rapidly outcompete the wild type. However, the outcome in competitions performed with partially optimized virus populations depends on the distance of the competitors to their clonal origin. We also implement a theoretical population dynamics model that describes the evolution of a heterogeneous population of individuals, each characterized by a fitness value, subjected to subsequent cycles of replication and mutation.
The experimental results are explained in the framework of our theoretical model under two non-excluding, likely complementary assumptions: (1) The relative advantage of both competitors changes as populations approach mutation-selection equilibrium, as a consequence of differences in their growth rates and (2) one of the competitors is more robust to mutations than the other. The main conclusion is that the nearness of an RNA virus population to mutation-selection equilibrium is a key factor determining the fate of particular mutants arising during replication.
Affiliation(s)
- María Arribas
- Centro de Astrobiología (CSIC-INTA), Ctra. de Ajalvir km. 4, Torrejón de Ardoz, Madrid 28850, Spain
- Jacobo Aguirre
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro Nacional de Biotecnología (CSIC), c/Darwin 3, Madrid 28049, Spain
- Susanna Manrubia
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro Nacional de Biotecnología (CSIC), c/Darwin 3, Madrid 28049, Spain
- Ester Lázaro
- Centro de Astrobiología (CSIC-INTA), Ctra. de Ajalvir km. 4, Torrejón de Ardoz, Madrid 28850, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
9
Thermostable proteins bioprocesses: The activity of restriction endonuclease-methyltransferase from Thermus thermophilus (RM.TthHB27I) cloned in Escherichia coli is critically affected by the codon composition of the synthetic gene. PLoS One 2017; 12:e0186633. [PMID: 29040308 PMCID: PMC5645126 DOI: 10.1371/journal.pone.0186633] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/05/2017] [Accepted: 10/04/2017] [Indexed: 11/25/2022]
Abstract
Obtaining thermostable enzymes (thermozymes) is an important aspect of biotechnology. As thermophiles have adapted their genomes to high temperatures, their cloned genes’ expression in mesophiles is problematic. This is mainly due to their high GC content, which leads to the formation of unfavorable secondary mRNA structures and codon usage in Escherichia coli (E. coli). RM.TthHB27I is a member of a family of bifunctional thermozymes, containing a restriction endonuclease (REase) and a methyltransferase (MTase) in a single polypeptide. Thermus thermophilus HB27 (T. thermophilus) produces low amounts of RM.TthHB27I with a unique DNA cleavage specificity. We have previously cloned the wild type (wt) gene into E. coli, which increased the production of RM.TthHB27I over 100-fold. However, its enzymatic activities were extremely low for an ORF expressed under a T7 promoter. We have designed and cloned a fully synthetic tthHB27IRM gene, using a modified ‘codon randomization’ strategy. Codons with a high GC content and of low occurrence in E. coli were eliminated. We incorporated a stem-loop circuit, devised to negatively control the expression of this highly toxic gene by partially hiding the ribosome-binding site (RBS) and START codon in mRNA secondary structures. Despite having optimized 59% of codons, the amount of produced RM.TthHB27I protein was similar for both recombinant tthHB27IRM gene variants. Moreover, the recombinant wt RM.TthHB27I is very unstable, while the RM.TthHB27I resulting from the expression of the synthetic gene exhibited enzymatic activities and stability equal to the native thermozyme isolated from T. thermophilus. Thus, we have developed an efficient purification protocol using the synthetic tthHB27IRM gene variant only. This suggests the effect of co-translational folding kinetics, possibly affected by the frequency of translational errors. 
The availability of active RM.TthHB27I is of practical importance in molecular biotechnology, extending the palette of available REase specificities.
10
Jegousse C, Yang Y, Zhan J, Wang J, Zhou Y. Structural signatures of thermal adaptation of bacterial ribosomal RNA, transfer RNA, and messenger RNA. PLoS One 2017; 12:e0184722. [PMID: 28910383 PMCID: PMC5598986 DOI: 10.1371/journal.pone.0184722] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 06/23/2017] [Accepted: 08/29/2017] [Indexed: 12/02/2022]
Abstract
Temperature adaptation of bacterial RNAs is a subject of both fundamental and practical interest because it will allow a better understanding of molecular mechanism of RNA folding with potential industrial application of functional thermophilic or psychrophilic RNAs. Here, we performed a comprehensive study of rRNA, tRNA, and mRNA of more than 200 bacterial species with optimal growth temperatures (OGT) ranging from 4°C to 95°C. We investigated temperature adaptation at primary, secondary and tertiary structure levels. We showed that unlike mRNA, tRNA and rRNA were optimized for their structures at compositional levels with significant tertiary structural features even for their corresponding randomly permutated sequences. tRNA and rRNA are more exposed to solvent but remain structured for hyperthermophiles with nearly OGT-independent fluctuation of solvent accessible surface area within a single RNA chain. mRNA in hyperthermophiles is essentially the same as random sequences without tertiary structures although many mRNA in mesophiles and psychrophiles have well-defined tertiary structures based on their low overall solvent exposure with clear separation of deeply buried from partly exposed bases as in tRNA and rRNA. These results provide new insight into temperature adaptation of different RNAs.
MESH Headings
- Bacteria/genetics
- Databases, Genetic
- Models, Molecular
- Nucleic Acid Conformation
- RNA Folding/drug effects
- RNA, Bacterial/chemistry
- RNA, Bacterial/drug effects
- RNA, Messenger/chemistry
- RNA, Messenger/drug effects
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/drug effects
- RNA, Transfer/chemistry
- RNA, Transfer/drug effects
- Solvents/pharmacology
- Temperature
Affiliation(s)
- Clara Jegousse
- UFR Sciences et Techniques, Université de Nantes, 2 rue de la Houssinière, Nantes, France
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Jian Zhan
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
11
Abstract
In prokaryotes, translation initiation typically depends on complementary binding between a G-rich Shine–Dalgarno (SD) motif in the 5′ untranslated region of mRNAs, and the 3′ tail of the 16S ribosomal RNA (the anti-SD sequence). In some cases, internal SD-like motifs in the coding region generate “programmed” ribosomal pauses that are beneficial for protein folding or accurate targeting. On the other hand, such pauses can also reduce protein production, generating purifying selection against internal SD-like motifs. This selection should be stronger in GC-rich genomes that are more likely to harbor the G-rich SD motif. However, the nature and consequences of selection acting on internal SD-like motifs within genomes and across species remains unclear. We analyzed the frequency of SD-like hexamers in the coding regions of 284 prokaryotes (277 with known anti-SD sequences and 7 without a typical SD mechanism). After accounting for GC content, we found that internal SD-like hexamers are avoided in 230 species, including three without a typical SD mechanism. The degree of avoidance was higher in GC-rich genomes, mesophiles, and N-terminal regions of genes. In contrast, 54 species either showed no signature of avoidance or were enriched in internal SD-like motifs. C-terminal gene regions were relatively enriched in SD-like hexamers, particularly for genes in operons or those followed closely by downstream genes. Together, our results suggest that the frequency of internal SD-like hexamers is governed by multiple factors including GC content and genome organization, and further empirical work is necessary to understand the evolution and functional roles of these motifs.
Affiliation(s)
- Gaurav D Diwan
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India; SASTRA University, Thanjavur, India
- Deepa Agashe
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India
12
Wang Q, Cen Z, Zhao J. The survival mechanisms of thermophiles at high temperatures: an angle of omics. Physiology (Bethesda) 2015; 30:97-106. [PMID: 25729055 DOI: 10.1152/physiol.00066.2013] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Indexed: 11/22/2022]
Abstract
Thermophiles are referred to as microorganisms with optimal growth temperatures of >60 °C. Over the past few years, a number of studies have been conducted regarding thermophiles, especially using the omics strategies. This review provides a systematic view of the survival physiology of thermophiles from an "omics" perspective, which suggests that the adaptive ability of thermophiles is based on a cooperative mode with multi-dimensional regulations integrating genomics, transcriptomics, and proteomics.
Affiliation(s)
- Quanhui Wang
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BGI-Shenzhen, Shenzhen, China
- Zhen Cen
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Jingjing Zhao
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
13
Zylicz-Stachula A, Zolnierkiewicz O, Sliwinska K, Jezewska-Frackowiak J, Skowron PM. Modified 'one amino acid-one codon' engineering of high GC content TaqII-coding gene from thermophilic Thermus aquaticus results in radical expression increase. Microb Cell Fact 2014; 13:7. [PMID: 24410856 PMCID: PMC3893498 DOI: 10.1186/1475-2859-13-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 08/14/2013] [Accepted: 01/03/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND An industrial approach to protein production demands maximization of cloned gene expression, balanced with the recombinant host's viability. Expression of toxic genes from thermophiles poses particular difficulties due to high GC content, mRNA secondary structures, rare codon usage and impairing the host's coding plasmid replication. TaqII belongs to a family of bifunctional enzymes, which are a fusion of the restriction endonuclease (REase) and methyltransferase (MTase) activities in a single polypeptide. The family contains thermostable REases with distinct specificities: TspGWI, TaqII, Tth111II/TthHB27I, TspDTI and TsoI and a few enzymes found in mesophiles. While not being isoschizomers, the enzymes exhibit amino acid (aa) sequence homologies, have molecular sizes of ~120 kDa, share a common modular architecture, resemble Type-I enzymes, cleave DNA 11/9 nt from the recognition sites, and have activity that is affected by S-adenosylmethionine (SAM). RESULTS We describe the taqIIRM gene design, cloning and expression of the prototype TaqII. The enzyme amount in natural hosts is extremely low. To improve expression of the taqIIRM gene in Escherichia coli (E. coli), we designed and cloned a fully synthetic, low GC content, low mRNA secondary structure taqIIRM, codon-optimized gene under a bacteriophage lambda (λ) PR promoter. Codon usage based on a modified 'one amino acid-one codon' strategy, weighted towards low GC content codons, resulted in approximately 10-fold higher expression of the synthetic gene. Of 1105 codons in total, 718 were changed, comprising 65% of the taqIIRM gene. We chose this less effective strategy, rather than a 'codon randomization' strategy that results in high expression yields, to obtain intentionally sub-optimal TaqII in vivo production and thereby decrease the high 'toxicity' of the REase-MTase protein. CONCLUSIONS Recombinant wt and synthetic taqIIRM genes were cloned and expressed in E. coli.
The modified 'one amino acid-one codon' method tuned for thermophile-coded genes was applied to obtain overexpression of the 'toxic' taqIIRM gene. The method appears suited for industrial production of thermostable 'toxic' enzymes in E. coli. This novel variant of the method biased toward increasing a gene's AT content may provide economic benefits for industrial applications.
Affiliation(s)
- Piotr M Skowron
- Department of Molecular Biotechnology, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
14
Mallatt J, Chittenden KD. The GC content of LSU rRNA evolves across topological and functional regions of the ribosome in all three domains of life. Mol Phylogenet Evol 2014; 72:17-30. [PMID: 24394731 DOI: 10.1016/j.ympev.2013.12.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/01/2013] [Revised: 11/28/2013] [Accepted: 12/24/2013] [Indexed: 12/21/2022]
Abstract
Large-subunit rRNA is the ribozyme that catalyzes protein synthesis by translation, and many of its features vary along a deep-to-superficial gradient. By measuring the G+C proportions in this rRNA in all three domains of life (60 bacteria, 379 eukaryote, and 23 archaean sequences), we tested whether the proportion of GC nucleotides varies along this in-out gradient. The rRNA regions used were several zones identified by Bokov and Steinberg (2009) as being arranged from deep to superficial within the LSU. To the Bokov-Steinberg zones, we added the most superficial zone of all, the divergent domains (expansion segments), which are greatly enlarged in eukaryotes. Regression lines constructed from the hundreds of species of organisms revealed the expected in-out gradient, showing that species with high %GC (or high %AT) in their rRNA distribute more of these abundant nucleotides into the peripheral zones. This could be explained by the evolutionary rates of replacement of all nucleotides (A, C, G, T), because these latter rates are fastest at the periphery and slowest near the conserved core. As an overall explanation, we propose that when extrinsic factors (whole-genome nucleotide composition, or environmental temperature) demand the percentage of GC in the rRNA of a species be high or low, then the deep-lying zones are buffered against GC variation because they are the slowest to evolve. The deep, conserved zones are also the most involved in translation, hinting that stabilizing selection there prevents a high GC variability that would diminish LSU rRNA's core functions. We found only a few domain-specific trends in rRNA-GC distribution, which relate to many Archaea living at high temperatures or to the highly complex genes and adaptations of Eukaryota. Use of rRNA sequences in molecular phylogenetic studies, for reconstructing the relationships of organisms across the tree of life, requires accurate models of how rRNA evolves. 
The demonstration that GC distributes in regular patterns across rRNA regions can improve these tree-reconstruction models in the future and should yield phylogenies of greater accuracy.
Affiliation(s)
- Jon Mallatt, School of Biological Sciences, Washington State University, Pullman, WA 99164-4236, United States
- Kevin D Chittenden, School of Biological Sciences, Washington State University, Pullman, WA 99164-4236, United States
|
15
|
Reply to "codon usage frequency of RNA virus genomes from high-temperature acidic-environment metagenomes". J Virol 2013; 87:1920-1. [PMID: 23308028 DOI: 10.1128/jvi.02883-12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022] Open
|
16
|
Windisch HS, Lucassen M, Frickenhaus S. Evolutionary force in confamiliar marine vertebrates of different temperature realms: adaptive trends in zoarcid fish transcriptomes. BMC Genomics 2012; 13:549. [PMID: 23051706 PMCID: PMC3557217 DOI: 10.1186/1471-2164-13-549] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 03/08/2012] [Accepted: 10/08/2012] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND Studies of temperature-induced adaptation on the basis of genomic sequence data were mainly done in extremophiles. Although the general hypothesis of an increased molecular flexibility in the cold is widely accepted, the results of thermal adaptation are still difficult to detect at proteomic down to the genomic sequence level. Approaches towards a more detailed picture emerge with the advent of new sequencing technologies. Only small changes in primary protein structure have been shown to modify kinetic and thermal properties of enzymes, but likewise for interspecies comparisons a high genetic identity is still essential to specify common principles. The present study uses comprehensive transcriptomic sequence information to uncover general patterns of thermal adaptation on the RNA as well as protein primary structure. RESULTS By comparing orthologous sequences of two closely related zoarcid fish inhabiting different latitudinal zones (Antarctica: Pachycara brachycephalum, temperate zone: Zoarces viviparus) we were able to detect significant differences in the codon usage. In the cold-adapted species a lower GC content in the wobble position prevailed for preserved amino acids. We were able to estimate 40-60% coverage of the functions represented within the two compared zoarcid cDNA-libraries on the basis of a reference genome of the phylogenetically closely related fish Gasterosteus aculeatus. A distinct pattern of amino acid substitutions could be identified for the non-synonymous codon exchanges, with a remarkable surplus of serine and reduction of glutamic acid and asparagine for the Antarctic species. CONCLUSION Based on the differences between orthologous sequences from confamiliar species, distinguished mainly by the temperature regimes of their habitats, we hypothesize that temperature leaves a signature on the composition of biological macromolecules (RNA, proteins) with implications for the transcription and translation level. 
As the observed pattern of amino acid substitutions only partly supports the flexibility hypothesis, further evolutionary forces may be effective at the global transcriptome level.
Affiliation(s)
- Heidrun Sigrid Windisch, Alfred Wegener Institute for Polar and Marine Research, Am Handelshafen 12, Bremerhaven, Germany
- Magnus Lucassen, Alfred Wegener Institute for Polar and Marine Research, Am Handelshafen 12, Bremerhaven, Germany
- Stephan Frickenhaus, Alfred Wegener Institute for Polar and Marine Research, Am Handelshafen 12, Bremerhaven, Germany
|
17
|
Dutta C, Paul S. Microbial lifestyle and genome signatures. Curr Genomics 2012; 13:153-62. [PMID: 23024607 PMCID: PMC3308326 DOI: 10.2174/138920212799860698] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 08/18/2011] [Revised: 09/13/2011] [Accepted: 09/28/2011] [Indexed: 12/29/2022] Open
Abstract
Microbes are known for their unique ability to adapt to varying lifestyles and environments, even extreme or adverse ones. The genomic architecture of a microbe may bear the signatures not only of its phylogenetic position, but also of the kind of lifestyle to which it is adapted. The present review aims to provide an account of the specific genome signatures observed in microbes acclimatized to distinct lifestyles or ecological niches. Niche-specific signatures identified at different levels of microbial genome organization, such as base composition, GC-skew, purine-pyrimidine ratio, dinucleotide abundance, codon bias, and oligonucleotide composition, are discussed. Among the specific cases highlighted in the review are the phenomena of genome shrinkage in obligatory host-restricted microbes, genome expansion in strictly intra-amoebal pathogens, strand-specific codon usage in intracellular species, acquisition of genome islands in pathogenic or symbiotic organisms, discriminatory genomic traits of marine microbes with distinct trophic strategies, and conspicuous sequence features of certain extremophiles like those adapted to high temperature or high salinity.
Affiliation(s)
- Chitra Dutta, Structural Biology & Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India
|
18
|
Variation in the correlation of G + C composition with synonymous codon usage bias among bacteria. EURASIP J Bioinform Syst Biol 2010:61374. [PMID: 18350114 DOI: 10.1155/2007/61374] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/31/2007] [Accepted: 06/04/2007] [Indexed: 11/17/2022]
Abstract
G + C composition at the third codon position (GC3) is widely reported to be correlated with synonymous codon usage bias. However, no quantitative attempt has been made to compare the extent of this correlation among different genomes. Here, we applied Shannon entropy from information theory to measure the degree of GC3 bias and that of synonymous codon usage bias of each gene. The strength of the correlation of GC3 with synonymous codon usage bias, quantified by a correlation coefficient, varied widely among bacterial genomes, ranging from -0.07 to 0.95. Previous analyses suggesting that the relationship between GC3 and synonymous codon usage bias is independent of species are thus inconsistent with the more detailed analyses obtained here for individual species.
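The entropy-based measures mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following is only an assumption of what a per-gene Shannon entropy of GC3 and of synonymous codon usage might look like; the function names `gc3_entropy` and `codon_entropy` are hypothetical and not taken from the cited paper.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a discrete distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)


def gc3_entropy(cds):
    """Entropy of the G/C versus A/T split at third codon positions.

    Assumption: the coding sequence is in frame and starts at position 0.
    """
    cds = cds.upper()
    third = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    gc = sum(base in "GC" for base in third)
    return shannon_entropy([gc, len(third) - gc])


def codon_entropy(cds, synonymous_codons):
    """Entropy of usage among one amino acid's synonymous codons in a gene."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in synonymous_codons)
    return shannon_entropy(counts.values())


if __name__ == "__main__":
    alanine = {"GCT", "GCC", "GCA", "GCG"}
    print(gc3_entropy("ATGGCCGCA"))
    print(codon_entropy("GCTGCCGCAGCG", alanine))
```

In this sketch, low entropy corresponds to strong bias (one state dominates) and high entropy to even usage; correlating the two quantities across the genes of a genome would then give one correlation coefficient per genome, in the spirit of the comparison the abstract describes.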
|
19
|
Zhu W, Lomsadze A, Borodovsky M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 2010; 38:e132. [PMID: 20403810 PMCID: PMC2896542 DOI: 10.1093/nar/gkq275] [Citation(s) in RCA: 1019] [Impact Index Per Article: 72.8] [Indexed: 11/13/2022] Open
Abstract
We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has been used since for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene-finding algorithms. With the advent of new prokaryotic genomes en masse, it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that, as a result of applying the new method, several thousand new genes could be added to existing annotations of several human and mouse gut metagenomes.
Affiliation(s)
- Wenhan Zhu, School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
20
|
A unique combination of genetic systems for the synthesis of trehalose in Rubrobacter xylanophilus: properties of a rare actinobacterial TreT. J Bacteriol 2008; 190:7939-46. [PMID: 18835983 DOI: 10.1128/jb.01055-08] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/20/2022] Open
Abstract
Trehalose is the primary organic solute in Rubrobacter xylanophilus under all conditions tested, including those for optimal growth. We detected genes of four different pathways for trehalose synthesis in the genome of this organism, namely, the trehalose-6-phosphate synthase (Tps)/trehalose-6-phosphate phosphatase (Tpp), TreS, TreY/TreZ, and TreT pathways. Moreover, R. xylanophilus is the only known member of the phylum Actinobacteria to harbor TreT. The Tps sequence is typically bacterial, but the Tpp sequence is closely related to eukaryotic counterparts. Both the Tps/Tpp and the TreT pathways were active in vivo, while the TreS and the TreY/TreZ pathways were not active under the growth conditions tested and appear not to contribute to the levels of trehalose observed. The genes from the active pathways were functionally expressed in Escherichia coli, and Tps was found to be highly specific for GDP-glucose, a rare feature among these enzymes. The trehalose-6-phosphate formed was specifically dephosphorylated to trehalose by Tpp. The recombinant TreT synthesized trehalose from different nucleoside diphosphate-glucose donors and glucose, but the activity in R. xylanophilus cell extracts was specific for ADP-glucose. The TreT could also catalyze trehalose hydrolysis in the presence of ADP, but with a very high K(m). Here, we functionally characterize two systems for the synthesis of trehalose in R. xylanophilus, a representative of an ancient lineage of the actinobacteria, and discuss a possible scenario for the exceptional occurrence of treT in this extremophilic bacterium.
|
21
|
Montanucci L, Fariselli P, Martelli PL, Casadio R. Predicting protein thermostability changes from sequence upon multiple mutations. Bioinformatics 2008; 24:i190-5. [PMID: 18586713 PMCID: PMC2718644 DOI: 10.1093/bioinformatics/btn166] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022] Open
Abstract
Motivation: A basic question in protein science is to what extent mutations affect protein thermostability. This knowledge would be particularly relevant for engineering thermostable enzymes. In several experimental approaches, this issue has been serendipitously addressed. It would therefore be convenient to provide a computational method that predicts when a given protein mutant is more thermostable than its corresponding wild type. Results: We present a new method based on support vector machines that is able to predict whether a set of mutations (including insertions and deletions) can enhance the thermostability of a given protein sequence. When trained and tested on a redundancy-reduced dataset, our predictor achieves 88% accuracy and a correlation coefficient equal to 0.75. Our predictor also correctly classifies 12 out of 14 experimentally characterized protein mutants with enhanced thermostability. Finally, it correctly detects all 11 mutated proteins whose increase in stability temperature is >10 °C. Availability: The dataset and the list of protein clusters adopted for the SVM cross-validation are available at the web site http://lipid.biocomp.unibo.it/~ludovica/thermo-meso-MUT. Contact: casadio@alma.unibo.it
Affiliation(s)
- Ludovica Montanucci, Department of Biology, University of Bologna, via Irnerio 42, 40126 Bologna, Italy
|
22
|
Wang J, Ma BG, Zhang HY, Chen LL, Zhang SC. How does gene expression level contribute to thermophilic adaptation of prokaryotes? An exploration based on predictors. Gene 2008; 421:32-6. [PMID: 18621118 DOI: 10.1016/j.gene.2008.06.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/02/2008] [Revised: 06/16/2008] [Accepted: 06/17/2008] [Indexed: 11/17/2022]
Abstract
By analyzing the predicted gene expression levels of 33 prokaryotes with living temperatures spanning from <10 °C to >100 °C, a universal positive correlation was found between the percentage of predicted highly expressed genes and the organisms' optimal growth temperature. A physical interpretation of the correlation revealed that highly expressed genes are statistically more thermostable than lowly expressed genes. These findings show the possibility of a significant contribution of gene expression level to prokaryotic thermal adaptation and provide evidence for translational selection pressure on the thermostability of natural proteins during evolution.
Affiliation(s)
- Ji Wang, Department of Marine Biology, Ocean University of China, Qingdao 266003, P. R. China
|
23
|
Riley M, Staley JT, Danchin A, Wang TZ, Brettin TS, Hauser LJ, Land ML, Thompson LS. Genomics of an extreme psychrophile, Psychromonas ingrahamii. BMC Genomics 2008; 9:210. [PMID: 18460197 PMCID: PMC2405808 DOI: 10.1186/1471-2164-9-210] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.6] [Received: 09/03/2007] [Accepted: 05/06/2008] [Indexed: 11/10/2022] Open
Abstract
Background The genome sequence of the sea-ice bacterium Psychromonas ingrahamii 37, which grows exponentially at -12 °C, may reveal features that help to explain how this extreme psychrophile is able to grow at such low temperatures. Determination of the whole genome sequence allows comparison with genes of other psychrophiles and mesophiles. Results Correspondence analysis of the composition of all P. ingrahamii proteins showed that (1) there are 6 classes of proteins, at least one more than other bacteria, (2) integral inner membrane proteins are not sharply separated from bulk proteins, suggesting that, overall, they may have a lower hydrophobic character, (3) there is strong opposition between asparagine and the oxygen-sensitive amino acids methionine, arginine, cysteine and histidine, and (4) one of the previously unseen clusters of proteins has a high proportion of "orphan" hypothetical proteins, raising the possibility that these are cold-specific proteins. Based on annotation of proteins by sequence similarity, (1) P. ingrahamii has a large number (61) of regulators of cyclic GDP, suggesting that this bacterium produces an extracellular polysaccharide that may help sequester water or lower the freezing point in the vicinity of the cell. (2) P. ingrahamii has genes for production of the osmolyte, betaine choline, which may balance the osmotic pressure as sea ice freezes. (3) P. ingrahamii has a large number (11) of three-subunit TRAP systems that may play an important role in the transport of nutrients into the cell at low temperatures. (4) Chaperones and stress proteins may play a critical role in transforming nascent polypeptides into 3-dimensional configurations that permit low temperature growth. (5) Metabolic properties of P. ingrahamii were deduced. Finally, a few small sets of proteins of unknown function which may play a role in psychrophily have been singled out as worthy of future study. 
Conclusion The results of this genomic analysis provide a springboard for further investigations into mechanisms of psychrophily. Focus on the role of asparagine excess in proteins, targeted phenotypic characterizations and gene expression investigations are needed to ascertain if and how the organism regulates various proteins in response to growth at lower temperatures.
Affiliation(s)
- Monica Riley, Bay Paul Center, Marine Biological Laboratory, Woods Hole, MA 02543, USA
|
24
|
Bohlin J, Skjerve E, Ussery DW. Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes. BMC Genomics 2008; 9:104. [PMID: 18307761 PMCID: PMC2289816 DOI: 10.1186/1471-2164-9-104] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 09/28/2007] [Accepted: 02/28/2008] [Indexed: 11/22/2022] Open
Abstract
Background The increasing number of sequenced prokaryotic genomes contains a wealth of genomic data that needs to be effectively analysed. A set of statistical tools exists for such analysis, but their strengths and weaknesses have not been fully explored. The statistical methods we are concerned with here are mainly used to examine similarities between archaeal and bacterial DNA from different genomes. These methods compare observed genomic frequencies of fixed-sized oligonucleotides with expected values, which can be determined by genomic nucleotide content, smaller oligonucleotide frequencies, or be based on specific statistical distributions. Advantages of these statistical methods include measurements of phylogenetic relationship with relatively small pieces of DNA sampled from almost anywhere within genomes, detection of foreign/conserved DNA, and homology searches. Our aim was to explore the reliability and best-suited applications for some popular methods, which include relative oligonucleotide frequencies (ROF), di- to hexanucleotide zeroth-order Markov methods (ZOM), and the second-order Markov chain method (MCM). Tests were performed on distant homology searches with large DNA sequences, detection of foreign/conserved DNA, and plasmid-host similarity comparisons. Additionally, the reliability of the methods was tested by comparing both real and random genomic DNA. Results Our findings show that the optimal method is context dependent. ROFs were best suited for distant homology searches, whilst the hexanucleotide ZOM and MCM measures were more reliable measures in terms of phylogeny. The dinucleotide ZOM method produced high correlation values when used to compare real genomes to an artificially constructed random genome with similar %GC, and should therefore be used with care. 
The tetranucleotide ZOM measure was a good measure to detect horizontally transferred regions, and when used to compare the phylogenetic relationships between plasmids and hosts, significant correlation (R² = 0.4) was found with genomic GC content and intra-chromosomal homogeneity. Conclusion The statistical methods examined are fast, easy to implement, and powerful for a number of different applications involving genomic sequence comparisons. However, none of the measures examined were superior in all tests, and therefore the choice of the statistical method should depend on the task at hand.
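The idea of comparing observed oligonucleotide frequencies with expectations derived from nucleotide content can be sketched for the dinucleotide case. This is a minimal illustration under the assumption that the expected frequency of a dinucleotide is the product of its two mononucleotide frequencies (a zeroth-order Markov expectation); the function name `dinucleotide_odds_ratios` is hypothetical, and the cited study's exact normalization and distance measures are not reproduced here.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Observed/expected ratio for each dinucleotide present in a sequence.

    Expected frequencies assume independent nucleotides (zeroth-order
    Markov model): f(XY) is approximated by f(X) * f(Y). Ambiguity codes
    are not filtered in this sketch.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = len(seq) - 1
    ratios = {}
    for pair, count in di.items():
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        ratios[pair] = (count / n_di) / expected
    return ratios


if __name__ == "__main__":
    for pair, ratio in sorted(dinucleotide_odds_ratios("ATGCGCGATATCG").items()):
        print(pair, round(ratio, 3))
```

Ratios near 1 indicate a dinucleotide occurring about as often as base composition alone predicts. Two genomes could then be compared by correlating their ratio vectors, which is consistent with the abstract's caution that a random genome with similar %GC can still correlate highly at the dinucleotide level.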
Affiliation(s)
- Jon Bohlin, Norwegian School of Veterinary Science, P.O. Box 8146 Dep., N-0033 Oslo, Norway
|
25
|
Blanquart S, Lartillot N. A site- and time-heterogeneous model of amino acid replacement. Mol Biol Evol 2008; 25:842-58. [PMID: 18234708 DOI: 10.1093/molbev/msn018] [Citation(s) in RCA: 166] [Impact Index Per Article: 10.4] [Indexed: 11/14/2022] Open
Abstract
We combined the category (CAT) mixture model (Lartillot N, Philippe H. 2004) and the nonstationary break point (BP) model (Blanquart S, Lartillot N. 2006) into a new model, CAT-BP, accounting for variations of the evolutionary process both along the sequence and across lineages. As in CAT, the model implements a mixture of distinct Markovian processes of substitution distributed among sites, thus accommodating site-specific selective constraints induced by protein structure and function. Furthermore, as in BP, these processes are nonstationary, and their equilibrium frequencies are allowed to change along lineages in a correlated way, through discrete shifts in global amino acid composition distributed along the phylogenetic tree. We implemented the CAT-BP model in a Bayesian Markov Chain Monte Carlo framework and compared its predictions with those of 3 simpler models, BP, CAT, and the site- and time-homogeneous general time-reversible (GTR) model, on a concatenation of 4 mitochondrial proteins of 20 arthropod species. In contrast to GTR, BP, and CAT, which all display a phylogenetic reconstruction artifact positioning the bees Apis mellifera and Melipona bicolor among chelicerates, the CAT-BP model is able to recover the monophyly of insects. Using posterior predictive tests, we further show that the CAT-BP combination yields better anticipations of site- and taxon-specific amino acid frequencies and that it better accounts for the homoplasies that are responsible for the artifact. Altogether, our results show that the joint modeling of heterogeneities across sites and along time results in a synergistic improvement of the phylogenetic inference, indicating that it is essential to disentangle the combined effects of both sources of heterogeneity, in order to overcome systematic errors in protein phylogenetic analyses.
Affiliation(s)
- Samuel Blanquart, Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, Montpellier, France
|
26
|
Affiliation(s)
- Claire Torchet, Institut Jacques-Monod, Biochimie de l'Evolution et Adaptabilité Moléculaire, Université Paris VI, Tour 43, 2 place Jussieu, 75251 Paris Cedex 05, France
|
27
|
Montanucci L, Martelli PL, Fariselli P, Casadio R. Robust determinants of thermostability highlighted by a codon frequency index capable of discriminating thermophilic from mesophilic genomes. J Proteome Res 2007; 6:2502-8. [PMID: 17530792 DOI: 10.1021/pr060670p] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
Can genome analysis tell us about the lifestyle of an organism? We ask this question considering a thorough cross comparison of thermophilic and mesophilic genomes, since presently the number of available genomes is enough to ensure statistical significance of the results. We analyze, by means of principal component analysis (PCA), the codon composition of a database comprising 116 genomes, selected so as to include one species for each genus and show that a cross genomic approach can allow the extraction of common determinants of thermostability at the genome level. The results of our analysis indicate that all the known features of thermostability can be found in the 64 component loadings of the second principal axis of PCA. By this, we develop an index of thermostability whose discriminative power between mesophiles and thermophiles scores with 98% accuracy at the genome level and with 95% accuracy at the protein sequence level. We also prove that these results are not due to phylogenetic differences between archaea and bacteria.
Affiliation(s)
- Ludovica Montanucci, Biocomputing Group, CIRB/Dept of Biology, University of Bologna, Bologna, Italy
|