1
Zielezinski A, Barylski J, Karlowski WM. Taxonomy-aware, sequence similarity ranking reliably predicts phage-host relationships. BMC Biol 2021; 19:223. [PMID: 34625070 PMCID: PMC8501573 DOI: 10.1186/s12915-021-01146-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/30/2021] [Accepted: 09/06/2021] [Indexed: 12/02/2022]
Abstract
Background: Characterizing phage–host interactions is critical to understanding the ecological role of both partners and to the effective isolation of phage therapeutics. Unfortunately, experimental methods for studying these interactions are markedly slow, low-throughput, and unsuitable for phages or hosts that are difficult to maintain in laboratory conditions. Therefore, a number of in silico methods have emerged to predict prokaryotic hosts based on viral sequences. One of the leading approaches is the application of the BLAST tool, which searches for local similarities between viral and microbial genomes. However, this prediction method has three major limitations: (i) top-scoring sequences do not always point to the actual host; (ii) mosaic virus genomes may match many, typically related, bacteria; and (iii) viral and host sequences may diverge beyond the point where their relationship can be detected by a BLAST alignment. Results: We created an extension to BLAST, named Phirbo, that improves host prediction quality beyond what is obtainable from standard BLAST searches. The tool harnesses information concerning sequence similarity and bacterial relatedness to predict phage–host interactions. Phirbo was evaluated on three benchmark sets of known virus–host pairs, and it improved precision and recall by 11–40 percentage points over currently available, state-of-the-art, alignment-based, alignment-free, and machine-learning host prediction tools. Moreover, the discriminatory power of Phirbo for the recognition of virus–host relationships surpassed the results of other tools by at least 10 percentage points (area under the curve = 0.95), yielding a mean host prediction accuracy of 57% and 68% at the genus and family levels, respectively; this accuracy drops by 12 percentage points when only a fraction of the viral genome sequence (3 kb) is used. Finally, we provide insights into a repertoire of protein and ncRNA genes that are shared between phages and hosts and may be prone to horizontal transfer during infection. Conclusions: Our results suggest that Phirbo is a simple and effective tool for predicting phage–host relationships. Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-021-01146-6.
Affiliation(s)
- Andrzej Zielezinski
  - Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Jakub Barylski
  - Molecular Virology Research Unit, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Wojciech M Karlowski
  - Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
2
Hoang HD, Neault S, Pelin A, Alain T. Emerging translation strategies during virus-host interaction. WILEY INTERDISCIPLINARY REVIEWS-RNA 2020; 12:e1619. [PMID: 32757266 PMCID: PMC7435527 DOI: 10.1002/wrna.1619] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/08/2020] [Revised: 06/18/2020] [Accepted: 06/19/2020] [Indexed: 01/02/2023]
Abstract
Translation control is crucial during virus-host interaction. On one hand, viruses rely completely on the protein synthesis machinery of host cells to propagate and have evolved various mechanisms to redirect the host's ribosomes toward their viral mRNAs. On the other hand, the host rewires its translation program in an attempt to contain and suppress the virus early during infection; the antiviral program includes specific control of protein synthesis to translate several antiviral mRNAs involved in quenching the infection. As the infection progresses, host translation is in turn inhibited in order to limit viral propagation. We have learnt of very diverse strategies that both parties utilize to gain or retain control over the protein synthesis machinery. Yet novel strategies continue to be discovered, attesting to the importance of mRNA translation in virus-host interaction. This review focuses on recently described translation strategies employed by both hosts and viruses. These discoveries provide additional pieces in the understanding of the complex virus-host translation landscape. This article is categorized under: Translation > Translation Mechanisms; Translation > Translation Regulation.
Affiliation(s)
- Huy-Dung Hoang
  - Children's Hospital of Eastern Ontario Research Institute, Apoptosis Research Centre, Ottawa, Ontario, K1H8L1, Canada
  - Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Serge Neault
  - Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
  - Centre for Innovative Cancer Research, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
- Adrian Pelin
  - Centre for Innovative Cancer Research, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
- Tommy Alain
  - Children's Hospital of Eastern Ontario Research Institute, Apoptosis Research Centre, Ottawa, Ontario, K1H8L1, Canada
  - Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
3
Ren J, Song K, Deng C, Ahlgren NA, Fuhrman JA, Li Y, Xie X, Poplin R, Sun F. Identifying viruses from metagenomic data using deep learning. QUANTITATIVE BIOLOGY 2020; 8:64-77. [PMID: 34084563 PMCID: PMC8172088 DOI: 10.1007/s40484-019-0187-4] [Citation(s) in RCA: 238] [Impact Index Per Article: 59.5] [Received: 08/19/2019] [Revised: 10/08/2019] [Accepted: 10/14/2019] [Indexed: 01/08/2023]
Abstract
BACKGROUND The recent development of metagenomic sequencing makes it possible to massively sequence microbial genomes including viral genomes without the need for laboratory culture. Existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences from metagenomic data. METHODS Here we developed a reference-free and alignment-free machine learning method, DeepVirFinder, for identifying viral sequences in metagenomic data using deep learning. RESULTS Trained based on sequences from viral RefSeq discovered before May 2015, and evaluated on those discovered after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths, achieving AUROC 0.93, 0.95, 0.97, and 0.98 for 300, 500, 1000, and 3000 bp sequences respectively. Enlarging the training data with additional millions of purified viral sequences from metavirome samples further improved the accuracy for identifying virus groups that are under-represented. Applying DeepVirFinder to real human gut metagenomic samples, we identified 51,138 viral sequences belonging to 175 bins in patients with colorectal carcinoma (CRC). Ten bins were found associated with the cancer status, suggesting viruses may play important roles in CRC. CONCLUSIONS Powered by deep learning and high throughput sequencing metagenomic data, DeepVirFinder significantly improved the accuracy of viral identification and will assist the study of viruses in the era of metagenomics.
Affiliation(s)
- Jie Ren
  - Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA 90089, USA
- Kai Song
  - School of Mathematics and Statistics, Qingdao University, Qingdao 266071, China
- Chao Deng
  - Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA 90089, USA
- Jed A. Fuhrman
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Yi Li
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
- Xiaohui Xie
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
- Fengzhu Sun
  - Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA 90089, USA
4
Salem M, Skurnik M. Genomic Characterization of Sixteen Yersinia enterocolitica-Infecting Podoviruses of Pig Origin. Viruses 2018; 10:v10040174. [PMID: 29614052 PMCID: PMC5923468 DOI: 10.3390/v10040174] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 12/15/2017] [Revised: 03/23/2018] [Accepted: 03/29/2018] [Indexed: 11/16/2022]
Abstract
Yersinia enterocolitica causes enteric infections in humans and animals. Human infections are often caused by contaminated pork meat. Y. enterocolitica colonizes pig tonsils, and pigs secrete both the human pathogen and its specific bacteriophages into the stools. In this work, sixteen Y. enterocolitica-infecting lytic bacteriophages isolated from pig stools originating from several pig farms were characterized. All phages belong to the Podoviridae family and their genomes range from 38,391 to 40,451 bp in size. The overall genome organization of all the phages resembled that of T7-like phages, having 3–6 host RNA polymerase (RNAP)-specific promoters at the beginning of the genomes and 11–13 phage RNAP-specific promoters as well as 3–5 rho-independent terminators scattered throughout the genomes. Using a ligation-based approach, the physical termini of the genomes, containing direct terminal repeats of 190–224 bp, were established. No genes associated with lysogeny, nor any toxin, virulence factor or antibiotic resistance genes, were present in the genomes. Even though the phages had been isolated from different pig farms, the nucleotide sequences of their genomes were 90–97% identical, suggesting that the phages were undergoing microevolution within and between the farms. Lipopolysaccharide was found to be the surface receptor of all but one of the phages. The phages are classified as new species within the T7virus genus of the Autographivirinae subfamily.
Affiliation(s)
- Mabruka Salem
  - Department of Bacteriology and Immunology, Medicum, Research Programs Unit, Immunobiology, University of Helsinki, 00014 Helsinki, Finland
  - Department of Microbiology, Faculty of Medicine, University of Benghazi, Benghazi 16063, Libya
- Mikael Skurnik
  - Department of Bacteriology and Immunology, Medicum, Research Programs Unit, Immunobiology, University of Helsinki, 00014 Helsinki, Finland
  - Division of Clinical Microbiology, Helsinki University Hospital, HUSLAB, 00029 Helsinki, Finland
5
Goz E, Mioduser O, Diament A, Tuller T. Evidence of translation efficiency adaptation of the coding regions of the bacteriophage lambda. DNA Res 2017; 24:333-342. [PMID: 28338832 PMCID: PMC5737525 DOI: 10.1093/dnares/dsx005] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 10/06/2016] [Accepted: 02/01/2017] [Indexed: 11/15/2022]
Abstract
Deciphering the way gene expression regulatory aspects are encoded in viral genomes is a challenging mission with ramifications related to all biomedical disciplines. Here, we aimed to understand how evolution shapes the bacteriophage lambda genes by performing a high resolution analysis of ribosomal profiling data and gene expression related synonymous/silent information encoded in bacteriophage coding regions. We demonstrated evidence of selection for distinct compositions of synonymous codons in early and late viral genes related to the adaptation of translation efficiency to different bacteriophage developmental stages. Specifically, we showed that evolution of viral coding regions is driven, among other factors, by selection for codons with higher decoding rates; during the initial/progressive stages of infection the decoding rates in early/late genes were found to be superior to those in late/early genes, respectively. Moreover, we argued that selection for translation efficiency could be partially explained by adaptation to the Escherichia coli tRNA pool and the fact that it can change during the bacteriophage life cycle. An analysis of additional aspects related to the expression of viral genes, such as mRNA folding and more complex/longer regulatory signals in the coding regions, is also reported. The reported conclusions are likely to be relevant to additional viruses as well.
Affiliation(s)
- Eli Goz
  - Department of Biomedical Engineering, Tel-Aviv University, Ramat Aviv 69978, Israel
  - SynVaccine Ltd Ramat Hachayal, Tel Aviv 6971039, Israel
- Oriah Mioduser
  - Department of Biomedical Engineering, Tel-Aviv University, Ramat Aviv 69978, Israel
- Alon Diament
  - Department of Biomedical Engineering, Tel-Aviv University, Ramat Aviv 69978, Israel
- Tamir Tuller
  - Department of Biomedical Engineering, Tel-Aviv University, Ramat Aviv 69978, Israel
  - SynVaccine Ltd Ramat Hachayal, Tel Aviv 6971039, Israel
  - Sagol School of Neuroscience, Tel-Aviv University, Ramat Aviv 69978, Israel
7
Ren J, Ahlgren NA, Lu YY, Fuhrman JA, Sun F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. MICROBIOME 2017; 5:69. [PMID: 28683828 PMCID: PMC5501583 DOI: 10.1186/s40168-017-0283-5] [Citation(s) in RCA: 329] [Impact Index Per Article: 47.0] [Received: 01/31/2017] [Accepted: 06/05/2017] [Indexed: 05/19/2023]
Abstract
BACKGROUND Identifying viral sequences in mixed metagenomes containing both viral and host contigs is a critical first step in analyzing the viral component of samples. Current tools for distinguishing prokaryotic virus and host contigs primarily use gene-based similarity approaches. Such approaches can significantly limit results, especially for short contigs that have few predicted proteins or lack proteins with similarity to previously known viruses. METHODS We have developed VirFinder, the first k-mer frequency based, machine learning method for virus contig identification that entirely avoids gene-based similarity searches. VirFinder instead identifies viral sequences based on our empirical observation that viruses and hosts have discernibly different k-mer signatures. VirFinder's performance in correctly identifying viral sequences was tested by training its machine learning model on sequences from host and viral genomes sequenced before 1 January 2014 and evaluating on sequences obtained after 1 January 2014. RESULTS VirFinder had significantly better rates of identifying true viral contigs (true positive rates (TPRs)) than VirSorter, the current state-of-the-art gene-based virus classification tool, when evaluated with either contigs subsampled from complete genomes or assembled from a simulated human gut metagenome. For example, for contigs subsampled from complete genomes, VirFinder had 78-, 2.4-, and 1.8-fold higher TPRs than VirSorter for 1, 3, and 5 kb contigs, respectively, at the same false positive rates as VirSorter (0, 0.003, and 0.006, respectively); thus, VirFinder works considerably better for small contigs than VirSorter. VirFinder furthermore identified several recently sequenced virus genomes (after 1 January 2014) that VirSorter did not and that have no nucleotide similarity to previously sequenced viruses, demonstrating VirFinder's potential advantage in identifying novel viral sequences. Application of VirFinder to a set of human gut metagenomes from healthy and liver cirrhosis patients revealed higher viral diversity in healthy individuals than in cirrhosis patients. We also identified contig bins containing crAssphage-like contigs with higher abundance in healthy patients and a putative Veillonella genus prophage associated with cirrhosis patients. CONCLUSIONS This innovative k-mer based tool complements gene-based approaches and will significantly improve prokaryotic viral sequence identification, especially for metagenomic-based studies of viral ecology.
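The k-mer signature that the abstract refers to can be illustrated with a minimal Python sketch. This is not VirFinder's implementation: the function name `kmer_frequencies`, the default `k=4`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration. It only shows how a contig might be turned into a fixed-length frequency vector that a classifier could then consume.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return normalized k-mer frequencies of one sequence as a dict.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T are skipped (an assumption, not VirFinder's rule).
    """
    alphabet = "ACGT"
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    total = sum(counts.values())
    # Emit every possible k-mer so all sequences share the same feature space.
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


if __name__ == "__main__":
    profile = kmer_frequencies("ACGTACGT", k=2)
    print(profile["AC"], profile["TA"])
```

Because every sequence maps to the same 4^k keys, profiles from contigs of different lengths can be compared directly or stacked into a feature matrix.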
Affiliation(s)
- Jie Ren
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
- Nathan A Ahlgren
  - Department of Biological Sciences, University of Southern California, 3616 Trousdale Pkwy, Los Angeles, CA, 90089, USA
  - Present address: Biology Department, Clark University, 950 Main St, Worcester, MA, 01610, USA
- Yang Young Lu
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
- Jed A Fuhrman
  - Department of Biological Sciences, University of Southern California, 3616 Trousdale Pkwy, Los Angeles, CA, 90089, USA
- Fengzhu Sun
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
  - Center for Computational Systems Biology, Fudan University, 200433, Shanghai, China
8
Ahlgren NA, Ren J, Lu YY, Fuhrman JA, Sun F. Alignment-free $d_2^*$ oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic Acids Res 2016; 45:39-53. [PMID: 27899557 PMCID: PMC5224470 DOI: 10.1093/nar/gkw1002] [Citation(s) in RCA: 167] [Impact Index Per Article: 20.9] [Received: 07/12/2016] [Accepted: 10/31/2016] [Indexed: 01/17/2023]
Abstract
Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonomy from among ∼32 000 prokaryotic genomes for 1427 virus isolate genomes whose true hosts are known. The background-subtracting measure $d_2^*$ at k = 6 gave the highest host prediction accuracy (33%, genus level) with reasonable computational times. Requiring a maximum dissimilarity score for making predictions (thresholding) and taking the consensus of the 30 most similar hosts further improved accuracy. Using a previous dataset of 820 bacteriophage and 2699 bacterial genomes, $d_2^*$ host prediction accuracies with thresholding and consensus methods (genus-level: 64%) exceeded previous Euclidean distance ONF (32%) or homology-based (22-62%) methods. When applied to metagenomically-assembled marine SUP05 viruses and the human gut virus crAssphage, $d_2^*$-based predictions overlapped (i.e. some same, some different) with the previously inferred hosts of these viruses. The extent of overlap improved when only using host genomes or metagenomic contigs from the same habitat or samples as the query viruses. The $d_2^*$ ONF method will greatly improve the characterization of novel, metagenomic viruses.
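The general idea of ranking candidate hosts by ONF dissimilarity, with an optional maximum-dissimilarity threshold, can be sketched in Python as follows. This sketch deliberately uses plain Euclidean distance between k-mer frequency vectors, which the abstract names as an earlier and less accurate approach; it does not implement the background-subtracting measure evaluated in the paper. The function names `onf_profile` and `rank_hosts` and the parameter `max_dissimilarity` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def onf_profile(seq, k=6):
    """Normalized k-mer frequency vector over the A/C/G/T alphabet."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # ignores windows with other symbols
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(p, q):
    """Euclidean distance between two equal-length frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_hosts(virus_seq, host_seqs, k=6, max_dissimilarity=None):
    """Rank hosts from most to least similar ONF profile.

    host_seqs maps a host name to its sequence. If max_dissimilarity is
    given, hosts above that distance are dropped (thresholding).
    """
    virus = onf_profile(virus_seq, k)
    scored = sorted(
        (euclidean(virus, onf_profile(seq, k)), name)
        for name, seq in host_seqs.items()
    )
    if max_dissimilarity is not None:
        scored = [(d, name) for d, name in scored if d <= max_dissimilarity]
    return [(name, d) for d, name in scored]


if __name__ == "__main__":
    hosts = {"at_rich": "ATATATATATATAT", "gc_rich": "GCGCGCGCGCGC"}
    print(rank_hosts("ATATATATAT", hosts, k=2))
```

A consensus step like the one described in the abstract could then be applied to the taxonomy of the top-ranked hosts rather than trusting the single best hit.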
Affiliation(s)
- Nathan A Ahlgren
  - Department of Biological Sciences, University of Southern California, 3616 Trousdale Pkwy, Los Angeles, CA 90089, USA
- Jie Ren
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA
- Yang Young Lu
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA
- Jed A Fuhrman
  - Department of Biological Sciences, University of Southern California, 3616 Trousdale Pkwy, Los Angeles, CA 90089, USA
- Fengzhu Sun
  - Department of Biological Sciences, University of Southern California, 3616 Trousdale Pkwy, Los Angeles, CA 90089, USA
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA
  - Center for Computational Systems Biology, Fudan University, Shanghai 200433, China
9
Phylogenomic networks reveal limited phylogenetic range of lateral gene transfer by transduction. ISME JOURNAL 2016; 11:543-554. [PMID: 27648812 PMCID: PMC5183456 DOI: 10.1038/ismej.2016.116] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 12/10/2015] [Revised: 06/24/2016] [Accepted: 07/08/2016] [Indexed: 01/01/2023]
Abstract
Bacteriophages are recognized DNA vectors and transduction is considered a common mechanism of lateral gene transfer (LGT) during microbial evolution. Anecdotal events of phage-mediated gene transfer have been studied extensively; however, a coherent evolutionary viewpoint of LGT by transduction, its extent and characteristics, is still lacking. Here we report a large-scale evolutionary reconstruction of transduction events in 3982 genomes. We inferred 17 158 recent transduction events linking donors, phages and recipients into a phylogenomic transduction network view. We find that LGT by transduction is mostly restricted to closely related donors and recipients. Furthermore, a substantial number of the transduction events (9%) are best described as gene duplications that are mediated by mobile DNA vectors. We propose to distinguish this type of paralogy by the term autology. A comparison of donor and recipient genomes revealed that genome similarity is a superior predictor of species connectivity in the network in comparison to common habitat. This indicates that genetic similarity, rather than ecological opportunity, is a driver of successful transduction during microbial evolution. A striking difference in the connectivity pattern of donors and recipients shows that while lysogenic interactions are highly species-specific, the host range for lytic phage infections can be much wider, serving to connect dense clusters of closely related species. Our results thus demonstrate that DNA transfer via transduction occurs within the context of phage–host specificity, but that this tight constraint can be breached, on rare occasions, to produce long-range LGTs of profound evolutionary consequences.
10
Wei Y, Wang J, Xia X. Coevolution between Stop Codon Usage and Release Factors in Bacterial Species. Mol Biol Evol 2016; 33:2357-67. [PMID: 27297468 PMCID: PMC4989110 DOI: 10.1093/molbev/msw107] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 12/11/2022]
Abstract
Three stop codons in bacteria represent different translation termination signals, and their usage is expected to depend on their differences in translation termination efficiency, mutation bias, and relative abundance of release factors (RF1 decoding UAA and UAG, and RF2 decoding UAA and UGA). In 14 bacterial species (covering Proteobacteria, Firmicutes, Cyanobacteria, Actinobacteria and Spirochetes) with cellular RF1 and RF2 quantified, UAA is consistently over-represented in highly expressed genes (HEGs) relative to lowly expressed genes (LEGs), whereas UGA usage is the opposite even in species where RF2 is far more abundant than RF1. UGA usage relative to UAG increases significantly with $P_{RF2}$ [= RF2/(RF1 + RF2)] as expected from adaptation between stop codons and their decoders. $P_{RF2}$ is > 0.5 over a wide range of AT content (measured by $P_{AT3}$ as the proportion of AT at third codon sites), but decreases rapidly toward zero at the high range of $P_{AT3}$. This explains why bacterial lineages with high $P_{AT3}$ often have UGA reassigned because of low RF2. There is no indication that UAG is a minor stop codon in bacteria as claimed in a recent publication. The claim is invalid because of the failure to apply the two key criteria in identifying a minor codon: (1) it is least preferred by HEGs (or most preferred by LEGs) and (2) it corresponds to the least abundant decoder. Our results suggest a more plausible explanation for why UAA usage increases, and UGA usage decreases, with $P_{AT3}$, but UAG usage remains low over the entire $P_{AT3}$ range.
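The two quantities defined in the abstract, the RF2 proportion RF2/(RF1 + RF2) and the proportion of A or T at third codon sites, are simple ratios and can be computed as in the Python sketch below. The function names `prf2` and `pat3` are hypothetical, and the sketch assumes the input is an in-frame coding sequence starting at codon position 1, with U treated the same as T so that RNA input also works.

```python
def prf2(rf1, rf2):
    """Proportion of RF2 among release factors: RF2 / (RF1 + RF2)."""
    total = rf1 + rf2
    if total <= 0:
        raise ValueError("RF1 + RF2 must be positive")
    return rf2 / total


def pat3(cds):
    """Proportion of A or T (U) at third codon positions of an in-frame CDS.

    Assumes the sequence starts at the first base of a codon; a trailing
    partial codon contributes no third position.
    """
    third_sites = cds.upper()[2::3]  # indices 2, 5, 8, ... are third positions
    if not third_sites:
        raise ValueError("sequence has no complete codon")
    at_count = sum(base in "ATU" for base in third_sites)
    return at_count / len(third_sites)


if __name__ == "__main__":
    # Codons ATG, AAA, TAA have third bases G, A, A.
    print(pat3("ATGAAATAA"))
    print(prf2(rf1=1.0, rf2=3.0))
```

In practice the third-site proportion would be pooled over all coding sequences of a genome rather than computed for a single gene.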
Affiliation(s)
- Yulong Wei
  - Department of Biology, University of Ottawa, Ottawa, ON, Canada
- Juan Wang
  - Department of Biology, University of Ottawa, Ottawa, ON, Canada
- Xuhua Xia
  - Department of Biology, University of Ottawa, Ottawa, ON, Canada
  - Ottawa Institute of Systems Biology, Ottawa, ON, Canada
11
Gonzalez DL, Giannerini S, Rosa R. The non-power model of the genetic code: a paradigm for interpreting genomic information. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2016; 374:rsta.2015.0062. [PMID: 26857679 DOI: 10.1098/rsta.2015.0062] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Accepted: 10/27/2015] [Indexed: 06/05/2023]
Abstract
In this article, we present a mathematical framework based on redundant (non-power) representations of integer numbers as a paradigm for the interpretation of genomic information. The core of the approach relies on modelling the degeneracy of the genetic code. The model allows one to explain many features and symmetries of the genetic code and to uncover hidden symmetries. It also provides us with new tools for the analysis of genomic sequences. We briefly review three main areas: (i) the Euplotid nuclear code, (ii) the vertebrate mitochondrial code, and (iii) the main coding/decoding strategies used in the three domains of life. In every case, we show how the non-power model is a natural unified framework for describing degeneracy and deriving sound biological hypotheses on protein coding. The approach is rooted in number theory and group theory; nevertheless, we have kept the technical level to a minimum by focusing on key concepts and on the biological implications.
Affiliation(s)
- Diego Luis Gonzalez
  - Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy
  - CNR-IMM, Sezione di Bologna, Via Gobetti 101, 40129 Bologna, Italy
- Simone Giannerini
  - Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy
- Rodolfo Rosa
  - CNR-IMM, Sezione di Bologna, Via Gobetti 101, 40129 Bologna, Italy
12
Chin CL, Chin HK, Chin CSH, Lai ET, Ng SK. Engineering selection stringency on expression vector for the production of recombinant human alpha1-antitrypsin using Chinese Hamster ovary cells. BMC Biotechnol 2015; 15:44. [PMID: 26033090 PMCID: PMC4450478 DOI: 10.1186/s12896-015-0145-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 11/12/2014] [Accepted: 04/17/2015] [Indexed: 11/28/2022]
Abstract
Background: Expression vector engineering technology is one of the most convenient and timely methods for cell line development to meet the rising demand for novel production cell lines with high productivity. Destabilization of the dihydrofolate reductase (dhfr) selection marker by addition of AU-rich elements and the murine ornithine decarboxylase PEST region was previously shown to improve the specific productivities of recombinant human interferon gamma in CHO-DG44 cells. In this study, we evaluated novel combinations of engineered motifs for further selection marker attenuation to improve recombinant human alpha-1-antitrypsin (rhA1AT) production. Motifs tested include tandem PEST elements to promote protein degradation, internal ribosome entry site (IRES) mutations to impede translation initiation, and a codon-deoptimized dhfr selection marker to reduce translation efficiency. Results: After a 2-step methotrexate (MTX) amplification to 50 nM that took less than 3 months, the expression vector with the IRES point mutation and dhfr-PEST gave a maximum titer of 1.05 g/l with the top producer cell pool. Further MTX amplification to 300 nM MTX gave a maximum titer of 1.15 g/l. Relative transcript copy numbers and dhfr protein expression in the cell pools were also analysed to demonstrate that the transcription of the rhA1AT and dhfr genes was correlated due to the IRES linkage, and that the strategies of further attenuating dhfr protein expression with the use of a mutated IRES and tandem PEST, but not codon deoptimization, were effective in reducing dhfr protein levels in suspension serum free culture. Conclusions: Novel combinations of engineered motifs for further selection marker attenuation were studied and resulted in, to our knowledge, the highest reported recombinant protein titer in shake flask batch culture of stable mammalian cell pools, at 1.15 g/l, highlighting the applicability of expression vector optimization in generating high producing stable cells essential for recombinant protein therapeutics production. Our results also suggest that codon usage of the selection marker should be considered for applications that may involve gene amplification and serum free suspension culture, since the overall codon usage and thus the general expression and regulation of host cell proteins may be affected in the surviving cells. Electronic supplementary material: The online version of this article (doi:10.1186/s12896-015-0145-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Christine Lin Chin
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
- Hing Kah Chin
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
- Cara Sze Hui Chin
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
- Ethan Tingfeng Lai
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
- Say Kong Ng
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore; Department of Pharmacy, Faculty of Science, National University of Singapore, Singapore, Singapore.
13
Chithambaram S, Prabhakaran R, Xia X. Differential codon adaptation between dsDNA and ssDNA phages in Escherichia coli. Mol Biol Evol 2014; 31:1606-17. [PMID: 24586046 PMCID: PMC4032129 DOI: 10.1093/molbev/msu087] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 12/21/2022] Open
Abstract
Because phages use their host translation machinery, their codon usage should evolve toward that of highly expressed host genes. We used two indices to measure codon adaptation of phages to their host: rRSCU (the correlation in relative synonymous codon usage [RSCU] between phages and their host) and the Codon Adaptation Index (CAI), computed with highly expressed host genes as the reference set (because phage translation depends on host translation machinery). These indices are appropriate for this purpose only when hosts exhibit little mutation bias, so only phages parasitizing Escherichia coli were included in the analysis. For double-stranded DNA (dsDNA) phages, both rRSCU and CAI decrease with increasing number of transfer RNA genes encoded by the phage genome. rRSCU is greater for dsDNA phages than for single-stranded DNA (ssDNA) phages, and the low rRSCU values are mainly due to poor concordance in RSCU values for Y-ending codons between ssDNA phages and the E. coli host, consistent with the predicted effect of C→T mutation bias in the ssDNA phages. Strong C→T mutation bias would improve codon adaptation in codon families (e.g., Gly) where U-ending codons are favored over C-ending codons (“U-friendly” codon families) by highly expressed host genes but decrease codon adaptation in other codon families where highly expressed host genes favor C-ending codons over U-ending codons (“U-hostile” codon families). It is remarkable that ssDNA phages with increasing C→T mutation bias also increased the usage of codons in the “U-friendly” codon families, thereby achieving CAI values almost as large as those of dsDNA phages. This represents a new type of codon adaptation.
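As a rough illustration of the rRSCU index described in this abstract, the sketch below computes RSCU values for two sets of coding sequences and correlates them. It is not the authors' code: the function names (`rscu`, `pearson`, `r_rscu`) are hypothetical, and the use of a Pearson correlation over codons observed in both sets is an assumption, since the abstract says only "correlation".

```python
from collections import Counter
from math import sqrt

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(cds_list):
    """RSCU per codon: observed count divided by the mean count of its synonymous family."""
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:  # stops, Met and Trp carry no synonymous choice
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:  # amino acid absent: RSCU undefined, so omit the family
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def pearson(xs, ys):
    """Plain Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def r_rscu(phage_cds, host_cds):
    """Assumed form of rRSCU: Pearson r between phage and host RSCU over shared codons."""
    p, h = rscu(phage_cds), rscu(host_cds)
    shared = sorted(set(p) & set(h))
    return pearson([p[c] for c in shared], [h[c] for c in shared])
```

For example, `rscu(["GGTGGTGGCGGA"])` assigns GGT a value of 2.0, because it accounts for two of the four glycine codons in a four-codon family. CAI is not covered here, since it additionally requires a reference set of highly expressed host genes.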
Affiliation(s)
- Shivapriya Chithambaram
- Department of Biology and Center for Advanced Research in Environmental Genomics, University of Ottawa, Ottawa, Ontario, Canada
- Ramanandan Prabhakaran
- Department of Biology and Center for Advanced Research in Environmental Genomics, University of Ottawa, Ottawa, Ontario, Canada
- Xuhua Xia
- Department of Biology and Center for Advanced Research in Environmental Genomics, University of Ottawa, Ottawa, Ontario, Canada
14
Coupling mutagenesis and parallel deep sequencing to probe essential residues in a genome or gene. Proc Natl Acad Sci U S A 2013; 110:E848-57. [PMID: 23401533 DOI: 10.1073/pnas.1222538110] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 12/31/2022] Open
Abstract
The sequence of a protein determines its function by influencing its folding, structure, and activity. Similarly, the most conserved residues of orthologous and paralogous proteins likely define those most important. The detection of important or essential residues is not always apparent via sequence alignments because these are limited by the depth of any given gene's phylogeny, as well as specificities that relate to each protein's unique biological origin. Thus, there is a need for robust and comprehensive ways of evaluating the importance of specific amino acid residues of proteins of known or unknown function. Here we describe an approach called Mut-seq, which allows the identification of virtually all of the essential residues present in a whole genome through the application of limited chemical mutagenesis, selection for function, and deep parallel genomic sequencing. Here we have applied this method to T7 bacteriophage and T7-like virus JSF7 of Vibrio cholerae.
15
Determinants of translation efficiency and accuracy. Mol Syst Biol 2011; 7:481. [PMID: 21487400 PMCID: PMC3101949 DOI: 10.1038/msb.2011.14] [Citation(s) in RCA: 325] [Impact Index Per Article: 25.0] [Received: 10/29/2010] [Accepted: 02/15/2011] [Indexed: 12/17/2022] Open
Abstract
A given protein sequence can be encoded by an astronomical number of alternative nucleotide sequences. Recent research has revealed that this flexibility provides evolution with multiple ways to tune the efficiency and fidelity of protein translation and folding. Proper functioning of biological cells requires that the process of protein expression be carried out with high efficiency and fidelity. Given an amino-acid sequence of a protein, multiple degrees of freedom still remain that may allow evolution to tune efficiency and fidelity for each gene under various conditions and cell types. Particularly, the redundancy of the genetic code allows the choice between alternative codons for the same amino acid, which, although ‘synonymous,’ may exert dramatic effects on the process of translation. Here we review modern developments in genomics and systems biology that have revolutionized our understanding of the multiple means by which translation is regulated. We suggest new means to model the process of translation in a richer framework that will incorporate information about gene sequences, the tRNA pool of the organism and the thermodynamic stability of the mRNA transcripts. A practical demonstration of a better understanding of the process would be a more accurate prediction of the proteome, given the transcriptome at a diversity of biological conditions.
16
Cardinale DJ, Duffy S. Single-stranded genomic architecture constrains optimal codon usage. Bacteriophage 2011; 1:219-224. [PMID: 22334868 PMCID: PMC3278643 DOI: 10.4161/bact.1.4.18496] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 09/21/2011] [Revised: 10/21/2011] [Accepted: 10/23/2011] [Indexed: 12/11/2022]
Abstract
Viral codon usage is shaped by the conflicting forces of mutational pressure and selection to match host patterns for optimal expression. We examined whether genomic architecture (single- or double-stranded DNA) influences the degree to which bacteriophage codon usage differs from that of their primary bacterial hosts and of each other. While both correlated equally with their hosts’ genomic nucleotide content, the coat genes of ssDNA phages were less well adapted than those of dsDNA phages to their hosts’ codon usage profiles due to their preference for codons ending in thymine. No specific biases were detected in dsDNA phage genomes. In nine of ten cases of codon redundancy in which a specific codon was overrepresented, ssDNA phages favored the NNT codon. A cytosine-to-thymine biased mutational pressure working in conjunction with strong selection against non-synonymous mutations appears to be shaping codon usage bias in ssDNA viral genomes.
Affiliation(s)
- Daniel J Cardinale
- Department of Ecology, Evolution and Natural Resources; School of Environmental and Biological Sciences; Rutgers; The State University of New Jersey; New Brunswick, NJ USA
17
Synonymous codon usage analysis of thirty two mycobacteriophage genomes. Adv Bioinformatics 2010:316936. [PMID: 20150956 PMCID: PMC2817497 DOI: 10.1155/2009/316936] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 07/30/2009] [Accepted: 10/27/2009] [Indexed: 11/17/2022] Open
Abstract
Synonymous codon usage of the protein-coding genes of thirty two completely sequenced mycobacteriophage genomes was studied using multivariate statistical analysis. One of the major factors influencing codon usage is identified to be compositional bias. Codons ending with either C or G are preferred in highly expressed genes, among which C-ending codons are highly preferred over G-ending codons. A strong negative correlation between the effective number of codons (Nc) and GC3s content was also observed, showing that codon usage was affected by gene nucleotide composition. Translational selection is also identified to play a role in shaping the codon usage, operating at the level of translational accuracy. A high level of heterogeneity is seen within and between the genomes. Gene length is also identified to influence the codon usage in 11 out of 32 phage genomes. Mycobacteriophage Cooper is identified to be the highly biased genome with better translation efficiency, comparing well with the host-specific tRNA genes.
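The GC3s statistic mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the common definition of GC3s as the fraction of G or C at the third position of codons that have synonymous alternatives (so Met, Trp and stop codons are excluded), and the function name `gc3s` is hypothetical. The effective number of codons (Nc) is not computed here.

```python
def gc3s(cds):
    """Fraction of G/C at third codon positions, over codons with synonymous alternatives.

    Assumes the standard genetic code, so ATG (Met), TGG (Trp) and the three
    stop codons are left out of both numerator and denominator.
    """
    cds = cds.upper().replace("U", "T")
    excluded = {"ATG", "TGG", "TAA", "TAG", "TGA"}
    used = 0
    gc = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        # Skip non-synonymous codons and codons containing ambiguous bases.
        if codon in excluded or set(codon) - set("ACGT"):
            continue
        used += 1
        gc += codon[2] in "GC"
    return gc / used if used else float("nan")
```

Applied gene by gene, values like these could be plotted against Nc to look for the negative relationship the abstract reports.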
18
Abstract
Across all kingdoms of biological life, protein-coding genes exhibit unequal usage of synonymous codons. Although alternative theories abound, translational selection has been accepted as an important mechanism that shapes the patterns of codon usage in prokaryotes and simple eukaryotes. Here we analyze patterns of codon usage across 74 diverse bacteriophages that infect E. coli, P. aeruginosa, and L. lactis as their primary host. We use the concept of a “genome landscape,” which helps reveal non-trivial, long-range patterns in codon usage across a genome. We develop a series of randomization tests that allow us to interrogate the significance of one aspect of codon usage, such as GC content, while controlling for another aspect, such as adaptation to host-preferred codons. We find that 33 phage genomes exhibit highly non-random patterns in their GC3-content, use of host-preferred codons, or both. We show that the head and tail proteins of these phages exhibit significant bias towards host-preferred codons, relative to the non-structural phage proteins. Our results support the hypothesis of translational selection on viral genes for host-preferred codons, over a broad range of bacteriophages. Any protein can be encoded by multiple, synonymous spellings. But organisms typically prefer one spelling over another—a phenomenon known as codon bias. Codon bias is generally understood to result from selection for synonymous spellings that increase the rate and accuracy of protein translation. In this work, we have examined the complete genomes of all sequenced viruses that infect the bacteria E. coli, P. aeruginosa, and L. lactis, and have found that many of these viral genomes also exhibit codon bias. 
Moreover, the degree of codon bias varies across the viral genome, as visualized using a technique called a “genome landscape.” By comparing the observed genomes to randomly drawn genomes, we demonstrate that the regions of high codon bias in these viral genomes often coincide with regions encoding structural proteins. Thus, the proteins that a virus needs to produce in high copy number utilize the same encoding as its host organism does for highly expressed proteins. Our results extend the translational theory of codon bias to the viral kingdom: parts of the viral genome are selected to obey the preferences of its host.
19
Carbone A. Codon bias is a major factor explaining phage evolution in translationally biased hosts. J Mol Evol 2008; 66:210-23. [PMID: 18286220 DOI: 10.1007/s00239-008-9068-6] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Received: 07/27/2007] [Revised: 11/20/2007] [Accepted: 12/07/2007] [Indexed: 11/28/2022]
Abstract
The size and diversity of bacteriophage populations require methodologies to quantitatively study the landscape of phage differences. Statistical approaches are confronted with small genome sizes, which preclude significant single-phage analysis; comparative methods analyzing full phage genomes are an alternative, but they are difficult to interpret because of lateral gene transfer, which creates a mosaic spectrum of related phage species. Based on a large-scale codon bias analysis of 116 DNA phages hosted by 11 translationally biased bacteria belonging to different phylogenetic families, we observe that phage genomes are almost always under codon selective pressure imposed by translationally biased hosts, and we propose a classification of phages with translationally biased hosts that is based on adaptation patterns. We introduce a computational method for comparing phages sharing homologous proteins, possibly accepted by different hosts. We observe that across phages, independently of the host, capsid genes appear to be the most affected by host translational bias. For coliphages, genes involved in virion morphogenesis, host interaction and ssDNA binding are also affected by adaptive pressure. Adaptation affects long and small phages in a significant way. We analyze the Microviridae phage space in more detail to illustrate the potential of the approach. The small number of directions in adaptation observed in phages grouped around phi X174 is discussed in the light of functional bias. The adaptation analysis of the set of Microviridae phages defined around phi MH2K shows that phage classification based on adaptation does not reflect bacterial phylogeny.
Affiliation(s)
- Alessandra Carbone
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, 91 Bd de l'Hôpital, 75013, Paris, France.
20
Abstract
Phages have highly compact genomes with sizes reflecting their capacity to exploit the host resources. Here, we investigate the reasons for tRNAs being the only translation-associated genes frequently found in phages. We were able to unravel the selective processes shaping the tRNA distribution in phages by analyzing their genomes and those of their hosts. We found ample evidence against tRNAs being selected to facilitate phage integration in the prokaryotic chromosomes. Conversely, there is a significant association between tRNA distribution and codon usage. We support this observation by introducing a master equation model, where tRNAs are randomly gained from their hosts and then lost either neutrally or according to a set of different selection mechanisms. Those tRNAs present in phages tend to correspond to codons that are simultaneously highly used by the phage genes, while rare in the host genome. Accordingly, we propose that a selective recruitment of tRNAs compensates for the compositional differences between the phage and the host genomes. To further understand the importance of these results in phage biology, we analyzed the differences between temperate and virulent phages. Virulent phages contain more tRNAs than temperate ones, higher codon usage biases, and more important compositional differences with respect to the host genome. These differences are thus in perfect agreement with the results of our master equation model and further suggest that tRNA acquisition may contribute to higher virulence. Thus, even though phages use most of the cell's translation machinery, they can complement it with their own genetic information to attain higher fitness. These results suggest that similar selection pressures may act upon other cellular essential genes that are being found in the recently uncovered large viruses.
Affiliation(s)
- Marc Bailly-Bechet
- CNRS URA 2171, Institut Pasteur, Unité Génétique in silico, F-75724 Paris Cedex 15, France.
21
Pyrc K, Dijkman R, Deng L, Jebbink MF, Ross HA, Berkhout B, van der Hoek L. Mosaic structure of human coronavirus NL63, one thousand years of evolution. J Mol Biol 2006; 364:964-73. [PMID: 17054987 PMCID: PMC7094706 DOI: 10.1016/j.jmb.2006.09.074] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.4] [Received: 08/25/2006] [Revised: 09/24/2006] [Accepted: 09/25/2006] [Indexed: 11/23/2022]
Abstract
Before the SARS outbreak only two human coronaviruses (HCoV) were known: HCoV-OC43 and HCoV-229E. With the discovery of SARS-CoV in 2003, a third family member was identified. Soon thereafter, we described the fourth human coronavirus (HCoV-NL63), a virus that has spread worldwide and is associated with croup in children. We report here the complete genome sequence of two HCoV-NL63 clinical isolates, designated Amsterdam 57 and Amsterdam 496. The genomes are 27,538 and 27,550 nucleotides long, respectively, and share the same genome organization. We identified two variable regions, one within the 1a and one within the S gene, whereas the 1b and N genes were most conserved. Phylogenetic analysis revealed that HCoV-NL63 genomes have a mosaic structure with multiple recombination sites. Additionally, employing three different algorithms, we assessed the evolutionary rate for the S gene of group Ib coronaviruses to be approximately 3 × 10⁻⁴ substitutions per site per year. Using this evolutionary rate we determined that HCoV-NL63 diverged in the 11th century from its closest relative HCoV-229E.
Affiliation(s)
- Krzysztof Pyrc
- Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands.
22
Sau K, Gupta SK, Sau S, Mandal SC, Ghosh TC. Factors influencing synonymous codon and amino acid usage biases in Mimivirus. Biosystems 2006; 85:107-13. [PMID: 16442213 DOI: 10.1016/j.biosystems.2005.12.004] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 08/18/2005] [Revised: 12/05/2005] [Accepted: 12/17/2005] [Indexed: 10/25/2022]
Abstract
Synonymous codon and amino acid usage biases have been investigated in 903 Mimivirus protein-coding genes in order to understand the architecture and evolution of the Mimivirus genome. As expected for an AT-rich genome, third codon positions of the synonymous codons of Mimivirus carry mostly A or T bases. It was found that codon usage bias in Mimivirus genes is dictated both by mutational pressure and translational selection. The evidence shows that four factors (mean molecular weight (MMW), hydropathy, aromaticity and cysteine content) are mostly responsible for the variation of amino acid usage in Mimivirus proteins. Based on our observations, we suggest that genes involved in translation, DNA repair, protein folding, etc., were laterally transferred to Mimivirus long ago from living organisms and that, with time, these genes acquired the codon usage pattern of other Mimivirus genes under selection pressure.
Affiliation(s)
- K Sau
- Department of Biotechnology, Haldia Institute of Technology, Haldia, India
23
Sau K, Gupta SK, Sau S, Ghosh TC. Synonymous codon usage bias in 16 Staphylococcus aureus phages: implication in phage therapy. Virus Res 2005; 113:123-31. [PMID: 15970346 DOI: 10.1016/j.virusres.2005.05.001] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Received: 01/26/2005] [Revised: 05/06/2005] [Accepted: 05/10/2005] [Indexed: 11/22/2022]
Abstract
To reveal the factors influencing the architecture of protein-coding genes in staphylococcal phages, relative synonymous codon usage variation was investigated in 920 protein-coding genes of 16 staphylococcal phages. As expected for AT-rich genomes, A- and T-ending codons predominate in all 16 phages. Both the Nc plot and correspondence analysis on relative synonymous codon usage indicate that mutation bias influences codon usage variation in the 16 phages. Correspondence analysis also suggests that translational selection and gene length influence the codon usage variation in the phages to some extent, and that codon usage in staphylococcal phages is phage-specific but not S. aureus-specific. Further analysis indicates that among the 16 staphylococcal phages, 44AHJD, P68 and K may be extremely virulent in nature, as most of their genes have high translation efficiency. If this is true, then these three phages may be useful for curing staphylococcal infections.
Affiliation(s)
- K Sau
- Bioinformatics Centre, Bose Institute, P1/12, CIT Scheme VII M, Calcutta 700 054, India.
24
Sau K, Sau S, Mandal SC, Ghosh TC. Factors influencing the synonymous codon and amino acid usage bias in AT-rich Pseudomonas aeruginosa phage PhiKZ. Acta Biochim Biophys Sin (Shanghai) 2005; 37:625-33. [PMID: 16143818 PMCID: PMC7109957 DOI: 10.1111/j.1745-7270.2005.00089.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022] Open
Abstract
To reveal how the AT-rich genome of bacteriophage PhiKZ has been shaped in order to carry out its growth in the GC-rich host Pseudomonas aeruginosa, synonymous codon and amino acid usage bias of PhiKZ was investigated and the data were compared with those of P. aeruginosa. It was found that synonymous codon and amino acid usage of PhiKZ was distinct from that of P. aeruginosa. In contrast to P. aeruginosa, the third codon position of the synonymous codons of PhiKZ carries mostly an A or T base; codon usage bias in PhiKZ is dictated mainly by mutational bias and, to a lesser extent, by translational selection. A cluster analysis of the relative synonymous codon usage values of 16 myoviruses including PhiKZ shows that PhiKZ is evolutionarily much closer to Escherichia coli phage T4. Further analysis reveals that the three factors of mean molecular weight, aromaticity and cysteine content are mostly responsible for the variation of amino acid usage in PhiKZ proteins, whereas amino acid usage of P. aeruginosa proteins is mainly governed by grand average of hydropathicity, aromaticity and cysteine content. Based on these observations, we suggest that codons of the phage-like PhiKZ have evolved to preferentially incorporate the smaller amino acid residues into their proteins during translation, thereby economizing the cost of its development in GC-rich P. aeruginosa.
Affiliation(s)
- K. Sau
- Department of Mathematics, Jadavpur University, Calcutta 700 032, India
- S. Sau
- Department of Biochemistry, Bose Institute, P1/12-CIT Scheme VII M, Calcutta 700 054, India
- S. C. Mandal
- Department of Mathematics, Jadavpur University, Calcutta 700 032, India
- Corresponding authors: S. C. MANDAL: E-mail,
- T. C. Ghosh
- Bioinformatics Centre, Bose Institute, P1/12-CIT Scheme VII M, Calcutta 700 054, India
- T. C. GHOSH: Tel, +91-33-2334 6626; Fax, +91-33-2334 3886; E-mail,
25
Sahu K, Gupta SK, Sau S, Ghosh TC. Comparative Analysis of the Base Composition and Codon Usages in Fourteen Mycobacteriophage Genomes. J Biomol Struct Dyn 2005; 23:63-71. [PMID: 15918677 DOI: 10.1080/07391102.2005.10507047] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 10/28/2022]
Abstract
To study possible codon usage and base composition variation in bacteriophages, fourteen mycobacteriophages were used as a model system, and both parameters were determined and compared in all these phages and their plating bacterium, M. smegmatis. As all the organisms are GC-rich, the GC contents at third codon positions were found to be higher than those at the second codon positions, as well as those at the first + second codon positions, in all the organisms, indicating that directional mutational pressure is strongly operative at the synonymous third codon positions. The Nc plot indicates that codon usage variation in all these organisms is governed by forces other than compositional constraints. Correspondence analysis suggests that: (i) there is codon usage variation among the genes and genomes of the fourteen mycobacteriophages and M. smegmatis, i.e., codon usage patterns in the mycobacteriophages are phage-specific but not M. smegmatis-specific; (ii) synonymous codon usage patterns of Barnyard, Che8, Che9d, and Omega are more similar than those of the rest of the mycobacteriophages and M. smegmatis; (iii) codon usage bias in the mycobacteriophages is mainly determined by mutational pressure; and (iv) the genes of comparatively GC-rich genomes are more biased than those of GC-poor genomes. Translational selection in determining the codon usage variation in highly expressed genes can be invoked from the predominant occurrence of C-ending codons in the highly expressed genes. Cluster analysis based on codon usage data also shows that there are two distinct branches for the fourteen mycobacteriophages and that there is codon usage variation even among the phages of each branch.
Affiliation(s)
- K Sahu
- Bioinformatics Centre, Bose Institute, P1/12 - CIT Scheme VII M, Calcutta 700 054, India
26
Calin-Jageman I, Nicholson AW. Mutational analysis of an RNA internal loop as a reactivity epitope for Escherichia coli ribonuclease III substrates. Biochemistry 2003; 42:5025-34. [PMID: 12718545 DOI: 10.1021/bi030004r] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
Abstract
The enzymatic cleavage of double-stranded (ds) RNA is an obligatory step in the maturation and decay of many cellular and viral RNAs. The primary agents of dsRNA processing are members of the ribonuclease III (RNase III) superfamily, which are highly conserved in eukaryotic and bacterial cells. Escherichia coli RNase III participates in the maturation of the ribosomal RNAs and in the maturation and decay of cellular and phage mRNAs. E. coli RNase III-dependent cleavage events can regulate gene expression by controlling mRNA stability and translational activity. RNase III recognizes its substrates and selects the scissile phosphodiester(s) by recognizing specific RNA sequence and structural elements, termed reactivity epitopes. Some E. coli RNase III substrates contain an internal loop, in which is located the single scissile phosphodiester. The specific features of the internal loop that establish the pattern of single-strand cleavage are not known. A mutational analysis of the asymmetric [4 nt/5 nt] internal loop of the phage T7 R1.1 substrate reveals that cleavage reactivity is largely independent of internal loop sequence. Instead, the [4/5] asymmetry per se is the primary determinant of cleavage of a single bond within the 5 nt strand of the internal loop. The T7 R1.1 internal loop lacks elements of local tertiary structure, as revealed by sensitivity to cleavage by terbium ion and by the ability of the internal loop to destabilize a small model duplex. The internal loop functions as a discrete structural element in that the pattern of cleavage can be controlled by the specific type of asymmetry. The implications of these findings are discussed in light of RNase III substrate function as a gene regulatory element.
Affiliation(s)
- Irina Calin-Jageman
- Department of Biological Sciences, Wayne State University, Detroit, Michigan 48202, USA
27
Abstract
An interplay among experimental studies of protein synthesis, evolutionary theory, and comparisons of DNA sequence data has shed light on the roles of natural selection and genetic drift in 'silent' DNA evolution.
Affiliation(s)
- H Akashi
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas 66045-2106, USA.
28
Morton BR. Chloroplast DNA codon use: evidence for selection at the psb A locus based on tRNA availability. J Mol Evol 1993; 37:273-80. [PMID: 8230251 DOI: 10.1007/bf00175504] [Citation(s) in RCA: 114] [Impact Index Per Article: 3.7] [Indexed: 01/29/2023]
Abstract
Codon use in the three sequenced chloroplast genomes (Marchantia, Oryza, and Nicotiana) is examined. The chloroplast has a bias in that codons NNA and NNT are favored over synonymous NNC and NNG codons. This appears to be a consequence of an overall high A + T content of the genome. This pattern of codon use is not followed by the psb A gene of all three genomes and other psb A sequences examined. In this gene, the codon use favors NNC over NNT for twofold degenerate amino acids. In each case the only tRNA coded by the genome is complementary to the NNC codon. This codon use is similar to the codon use by chloroplast genes examined from Chlamydomonas reinhardtii. Since psb A is the major translation product of the chloroplast, this suggests that selection is acting on the codon use of this gene to adapt codons to tRNA availability, as previously suggested for unicellular organisms.
Affiliation(s)
- B R Morton
- Department of Botany and Plant Sciences, University of California, Riverside 92521
29
Chopin A. Organization and regulation of genes for amino acid biosynthesis in lactic acid bacteria. FEMS Microbiol Rev 1993; 12:21-37. [PMID: 8398216 DOI: 10.1111/j.1574-6976.1993.tb00011.x] [Citation(s) in RCA: 147] [Impact Index Per Article: 4.7] [Indexed: 01/30/2023] Open
Abstract
The recent description of large clusters of biosynthetic genes in the chromosome of Lactococcus lactis and, to a lesser extent, of Lactobacillus, has provided some information on gene organization and control of gene expression in these organisms. The genes involved in a given amino acid biosynthetic pathway are clustered at a single chromosomal location and form an operon. Additional genes that are not required for the biosynthesis are present within some operons. Genetic signals are, in general, similar to those found in other prokaryotes. Several systems controlling gene expression have been identified, and transcription attenuation seems frequent. Among the attenuation mechanisms identified, one resembles that controlling amino acid biosynthesis in many bacteria by ribosome stalling at codons corresponding to the limiting amino acid. The others are different and might be related to a new class of attenuation mechanism. Preliminary evidence for a new type of regulatory mechanism, involving a metabolic shunt, is also reviewed.
Affiliation(s)
- A Chopin
- Laboratoire de Génétique Microbienne, INRA, Jouy-en-Josas, France
|
30
|
|
31
|
Molecular evolution of bacteriophages: Discrete patterns of codon usage in T4 genes are related to the time of gene expression. J Mol Evol 1991. [DOI: 10.1007/bf02100191] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
|
32
|
Brown CM, Stockwell PA, Trotman CN, Tate WP. The signal for the termination of protein synthesis in procaryotes. Nucleic Acids Res 1990; 18:2079-86. [PMID: 2186375 PMCID: PMC330686 DOI: 10.1093/nar/18.8.2079] [Citation(s) in RCA: 98] [Impact Index Per Article: 2.9] [Indexed: 12/30/2022]
Abstract
The sequences around the stop codons of 862 Escherichia coli genes have been analysed to identify any additional features which contribute to the signal for the termination of protein synthesis. Highly significant deviations from the expected nucleotide distribution were observed, both before and after the stop codon. Immediately prior to UAA stop codons in E. coli there is a preference for codons of the form NAR (any base, adenine, purine), and in particular those that code for glutamine or the basic amino acids. In contrast, codons for threonine or branched nonpolar amino acids were under-represented. Uridine was over-represented in the nucleotide position immediately following all three stop codons, whereas adenine and cytosine were under-represented. This pattern is accentuated in highly expressed genes, but is not as marked in either lowly expressed genes or those that terminate in UAG, the codon specifically recognised by polypeptide chain release factor-1. These observations suggest that for the efficient termination of protein synthesis in E. coli, the 'stop signal' may be a tetranucleotide, rather than simply a tri-nucleotide codon, and that polypeptide chain release factor-2 recognises this extended signal. The sequence following stop codons was analysed in genes from several other procaryotes and bacteriophages. Salmonella typhimurium, Bacillus subtilis, bacteriophages and the methanogenic archaebacteria showed a similar bias to E. coli.
Affiliation(s)
- C M Brown
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
|
33
|
|
34
|
Rogerson AC. The sequence asymmetry of the Escherichia coli chromosome appears to be independent of strand or function and may be evolutionarily conserved. Nucleic Acids Res 1989; 17:5547-63. [PMID: 2474802 PMCID: PMC318178 DOI: 10.1093/nar/17.14.5547] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
I have examined potential determinants of the asymmetric distribution of nucleotide sequences in the genome of Escherichia coli as cataloged in GenBank release 44. I have used the frequency of occurrence of all possible tetranucleotides in a given sequence catalog or derivative as a comparative measure of asymmetry. The GenBank-cataloged strand and its complement show statistically similar (not complementary) distributions. The distribution is statistically similar in comparisons between the protein coding subset and the total genome, the coding subset and selected non-coding genes, the coding subset and the remainder of the DNA, and the coding subset and stable RNA sequences. I have compared the distribution in the genome of E. coli with the distributions found in the cataloged genomes of Salmonella typhimurium, Bacillus subtilis, and of coliphages lambda and T7. The distribution summed in both strands of the cataloged DNA differs statistically only in comparisons with lytic bacteriophage T7 because only the two strands of T7 show statistically dissimilar distributions. Despite similarities in tetranucleotide distribution, the pattern of codon complementarity in B. subtilis is different than that documented for E. coli. Thus, sequence asymmetry does not seem related to specific DNA function or to documented similarities or differences in codon bias. The sequence asymmetry of the E. coli genome may thus reflect a hitherto unsuspected pattern impressed on both strands of DNA which is or can be packaged into bacterial genomes.
Affiliation(s)
- A C Rogerson
- Biology Department, St Lawrence University, Canton, NY 13617
|
35
|
Abstract
The frequency of use of the three alternative translation termination codons has been examined in 165 Escherichia coli, 52 Bacillus subtilis and 106 Saccharomyces cerevisiae genes. Genes were first categorised according to their degree of bias in sense codon usage. In each species there is a very strong bias in favour of UAA (over UAG and UGA) in genes where sense codon usage is highly biased. This bias declines, principally with an increase in the use of UGA, in genes with lower sense codon bias. It appears that selection operating during translation may maintain the bias in stop codon usage. Such selection could result from the greater availability of UAA-cognate release factor(s), or from a lower frequency of translational readthrough at UAA.
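The tally behind this kind of comparison can be sketched in Python. This is an illustrative sketch only, not the authors' procedure: the function name `stop_codon_usage` and the toy sequences are hypothetical, and each input is assumed to be an in-frame coding sequence that ends with its stop codon.

```python
from collections import Counter

STOPS = ("TAA", "TAG", "TGA")  # DNA spellings of UAA, UAG, UGA

def stop_codon_usage(coding_seqs):
    """Return the fraction of genes that end in each of the three stop codons."""
    counts = Counter()
    for seq in coding_seqs:
        last = seq.upper().replace("U", "T")[-3:]
        if last in STOPS:  # ignore sequences without a terminal stop codon
            counts[last] += 1
    total = sum(counts.values())
    return {s: (counts[s] / total if total else 0.0) for s in STOPS}

# Hypothetical toy input: four short coding sequences.
toy_genes = ["ATGGCTTAA", "ATGAAATAA", "ATGCCCTGA", "ATGGGGTAG"]
usage = stop_codon_usage(toy_genes)
```

Running the function separately on groups of genes binned by sense codon bias would give the per-group stop codon frequencies that the abstract compares.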
Affiliation(s)
- P M Sharp
- Department of Genetics, Trinity College, Dublin, Ireland
|
36
|
Benner S, Ellington AD. Interpreting the behavior of enzymes: purpose or pedigree? CRC Critical Reviews in Biochemistry 1988; 23:369-426. [PMID: 3067974 DOI: 10.3109/10409238809082549] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.2] [Indexed: 01/04/2023]
Abstract
To interpret the growing body of data describing the structural, physical, and chemical behaviors of biological macromolecules, some understanding must be developed to relate these behaviors to the evolutionary processes that created them. Behaviors that are the products of natural selection reflect biological function and offer clues to the underlying chemical principles. Nonselected behaviors reflect historical accident and random drift. This review considers experimental data relevant to distinguishing between nonfunctional and functional behaviors in biological macromolecules. In the first segment, tools are developed for building functional and historical models to explain macromolecular behavior. These tools are then used with recent experimental data to develop a general outline of the relationship between structure, behavior, and natural selection in proteins and nucleic acids. In segments published elsewhere, specific functional and historical models for three properties of enzymes--kinetics, stereospecificity, and specificity for cofactor structures--are examined. Functional models appear most suitable for explaining the kinetic behavior of proteins. A mixture of functional and historical models appears necessary to understand the stereospecificity of enzyme reactions. Specificity for cofactor structures appears best understood in light of purely historical models based on a hypothesis of an early form of life exclusively using RNA catalysis.
Affiliation(s)
- S Benner
- Organische Chemie, Eidgenössische Technische Hochschule, Zürich, Switzerland
|
37
|
Shields DC, Sharp PM. Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res 1987; 15:8023-40. [PMID: 3118331 PMCID: PMC306324 DOI: 10.1093/nar/15.19.8023] [Citation(s) in RCA: 201] [Impact Index Per Article: 5.4] [Indexed: 01/04/2023]
Abstract
Codon usage data for 56 Bacillus subtilis genes show that synonymous codon usage in B. subtilis is less biased than in Escherichia coli, or in Saccharomyces cerevisiae. Nevertheless, certain genes with a high codon bias can be identified by correspondence analysis, and also by various indices of codon bias. These genes are very highly expressed, and a general trend (a decrease) in codon bias across genes seems to correspond to decreasing expression level. This, then, may be a general phenomenon in unicellular organisms. The unusually small effect of translational selection on the pattern of codon usage in lowly expressed genes in B. subtilis yields similar dinucleotide frequencies among different codon positions, and on complementary strands. These patterns could arise through selection on DNA structure, but more probably are largely determined by mutation. This prevalence of mutational bias could lead to difficulties in assessing whether open reading frames encode proteins.
Affiliation(s)
- D C Shields
- Department of Genetics, Trinity College, Dublin, Ireland
|
38
|
Eveleth DD, Marsh JL. Overlapping transcription units in Drosophila: sequence and structure of the Cs gene. Molecular & General Genetics: MGG 1987; 209:290-8. [PMID: 3478553 DOI: 10.1007/bf00329656] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Indexed: 01/06/2023]
Abstract
The Cs gene lies between the functionally and evolutionarily related dopa decarboxylase (Ddc) and l(2)amd loci of Drosophila. The Cs and Ddc genes overlap at their 3' ends, implying that the transcription termination signals of these genes are polar, since each gene's primary transcript contains the complement of the other gene's transcription termination signals. The mature transcripts of the Cs and Ddc genes are complementary for a short distance and the primary transcripts may be complementary over thousands of base pairs. Despite intensive mutagenesis in this region, no mutations affecting the Cs transcript have been recovered although over 90 alleles of the two flanking genes (Ddc and l(2)amd) have been identified. Unlike the flanking Ddc and l(2)amd genes, the structure of the Cs gene and the temporal and tissue specificity of Cs expression are inconsistent with any structural or functional relatedness to the Ddc gene family. The internal structure of the Cs transcript is unlike that of most protein coding genes; it contains several open reading frames which are not situated favorably for efficient translation of the Cs message. This unusual internal structure may be the basis of the observed mutational silence of the Cs locus.
Affiliation(s)
- D D Eveleth
- Developmental Biology Center, University of California, Irvine 92717
|
39
|
Li WH. Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J Mol Evol 1987; 24:337-45. [PMID: 3110426 DOI: 10.1007/bf02134132] [Citation(s) in RCA: 174] [Impact Index Per Article: 4.7] [Indexed: 01/04/2023]
Abstract
The population dynamics of nearly neutral mutations are studied using a single-site and a multisite model. In the latter model, the nucleotides in a sequence are completely linked and the selection schemes employed are additive, multiplicative, and additive with a threshold. Although the third selection scheme is very different from the first two, the three schemes produce identical results for a wide range of parameter values. Thus the present study provides a general theory for the population dynamics of nearly neutral mutations because the results can also be used to draw inferences about other selection schemes such as stabilizing selection and synergistic selection. It is shown that the number of slightly deleterious mutations accumulated in a sequence can be considerably larger under the multisite model than under the single-site model, particularly if the sequence is long or if the mutation rate per site is high. The results show that even a very slight selective difference between synonymous codons can produce a strong bias in codon usage. Three alternative explanations for the strong bias in codon usage in bacterial and yeast genes are considered. The implications of the present results for molecular evolution are discussed.
|
40
|
Phillips GJ, Arnold J, Ivarie R. The effect of codon usage on the oligonucleotide composition of the E. coli genome and identification of over- and underrepresented sequences by Markov chain analysis. Nucleic Acids Res 1987; 15:2627-38. [PMID: 3550700 PMCID: PMC340673 DOI: 10.1093/nar/15.6.2627] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.2] [Indexed: 01/06/2023]
Abstract
As shown in the accompanying paper (5), the oligonucleotide composition of the E. coli genome is highly asymmetric for sequences up to 6 bp in length when ranked from highest to lowest abundance. We show here that this largely reflects codon usage because heavily used codons were found in the highly abundant oligomers whereas rarely used codons, with some exceptions, occurred in sequences in low abundance. Furthermore, linear regression analysis revealed a strong correlation between the frequencies of each trinucleotide and its usage as a codon. Dinucleotides are also not randomly distributed across each codon position and the dinucleotide composition of genes that are transcribed but not translated (rRNA and tRNA genes) was highly related to that seen in genes encoding polypeptides. However, 45 tetra-, 8 penta-, and 6 hexanucleotides were significantly over- or underabundant by Markov chain analysis and could not be accounted for by codon usage. Of these underrepresented sequences, many were palindromes, including the Dam methylation site.
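One common way to flag over- or underrepresented oligonucleotides with a Markov chain model is to compare the observed count of a word with the count expected from its two overlapping (k-1)-mers and their shared (k-2)-mer core. The Python sketch below assumes this maximal-order estimator; the abstract does not state which Markov order or test statistic was used, so the estimator and all function names here are assumptions. It scans a single strand only.

```python
def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlapping matches."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)

def markov_expected(seq, word):
    """Expected count of word: N(prefix) * N(suffix) / N(core).

    prefix = word without its last base, suffix = word without its first
    base, core = word without both (assumed maximal-order Markov estimate).
    """
    if len(word) < 3:
        raise ValueError("word must be at least 3 nt long")
    core = count_overlapping(seq, word[1:-1])
    if core == 0:
        return 0.0
    prefix = count_overlapping(seq, word[:-1])
    suffix = count_overlapping(seq, word[1:])
    return prefix * suffix / core

def observed_over_expected(seq, word):
    """Ratio above 1 suggests overrepresentation, below 1 underrepresentation."""
    expected = markov_expected(seq, word)
    return count_overlapping(seq, word) / expected if expected else float("nan")

# Hypothetical toy sequence, far too short for any real inference.
ratio = observed_over_expected("ACGACGACG", "ACG")
```

A real analysis would also need a significance test on the ratio, which this sketch leaves out.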
|
41
|
Abstract
I briefly discuss some aspects of theoretical molecular biology. Specifically, I include the issues of searches for homologies via string matchings, for patterns of specific nucleotide groupings and of sequence-structure relationship. The various approaches developed in order to achieve this end are described, attempting to convey some of the excitement in this quickly growing field.
Affiliation(s)
- R Nussinov
- Sackler Institute of Molecular Medicine, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv, Israel
|
42
|
Sharp PM, Li WH. The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 1987; 15:1281-95. [PMID: 3547335 PMCID: PMC340524 DOI: 10.1093/nar/15.3.1281] [Citation(s) in RCA: 2579] [Impact Index Per Article: 69.7] [Indexed: 01/06/2023]
Abstract
A simple, effective measure of synonymous codon usage bias, the Codon Adaptation Index, is detailed. The index uses a reference set of highly expressed genes from a species to assess the relative merits of each codon, and a score for a gene is calculated from the frequency of use of all codons in that gene. The index assesses the extent to which selection has been effective in moulding the pattern of codon usage. In that respect it is useful for predicting the level of expression of a gene, for assessing the adaptation of viral genes to their hosts, and for making comparisons of codon usage in different organisms. The index may also give an approximate indication of the likely success of heterologous gene expression.
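The index is usually computed in two steps: each codon gets a relative adaptiveness w, its count in the reference set divided by the count of the most used synonymous codon for the same amino acid, and a gene's score is the geometric mean of w over its codons. The Python sketch below follows that definition under stated assumptions: it uses the standard genetic code, skips Met, Trp and stop codons, and adds a pseudocount of 0.5 so that unused codons do not produce log(0). The pseudocount handling and all names are illustrative choices, not taken from the abstract.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def codons(seq):
    """Split an in-frame sequence into complete codons (DNA alphabet)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w[codon] = count / count of the most used synonym in the reference set."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODE)
    w = {}
    for aa in set(AMINO) - {"*", "M", "W"}:  # single-codon families are skipped
        family = [c for c, a in CODE.items() if a == aa]
        best = max(counts[c] + pseudocount for c in family)
        for c in family:
            w[c] = (counts[c] + pseudocount) / best
    return w

def cai(gene, w):
    """Geometric mean of w over the scorable codons of one gene."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    if not logs:
        raise ValueError("gene has no scorable codons")
    return math.exp(sum(logs) / len(logs))

# Hypothetical toy reference set: alanine is encoded by GCT three times, GCC once.
w = relative_adaptiveness(["GCTGCTGCTGCC"])
score = cai("GCTGCT", w)
```

In practice the reference set would be a curated list of highly expressed genes from the species of interest, as the abstract describes.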
|
43
|
Abstract
Observed patterns of synonymous codon usage are explained in terms of the joint effects of mutation, selection, and random drift. Examination of the codon usage in 165 Escherichia coli genes reveals a consistent trend of increasing bias with increasing gene expression level. Selection on codon usage appears to be unidirectional, so that the pattern seen in lowly expressed genes is best explained in terms of an absence of strong selection. A measure of directional synonymous-codon usage bias, the Codon Adaptation Index, has been developed. In enterobacteria, rates of synonymous substitution are seen to vary greatly among genes, and genes with a high codon bias evolve more slowly. A theoretical study shows that the patterns of extreme codon bias observed for some E. coli (and yeast) genes can be generated by rather small selective differences. The relative plausibilities of various theoretical models for explaining nonrandom codon usage are discussed.
|
44
|
Wong JT, Cedergren R. Natural selection versus primitive gene structure as determinant of codon usage. European Journal of Biochemistry 1986; 159:175-80. [PMID: 3091367 DOI: 10.1111/j.1432-1033.1986.tb09849.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023]
Abstract
Different codons are not utilized equally in known gene sequences. One of the important biases of codon usage is observed in the form of an enrichment of RNY codons, especially within RNN codon families. Such biases could represent the residue of a primitive repeating-RNY gene structure, or the outcome of natural selection, or both. Analyses based on the rates of silent substitutions, the frequencies of base doublets, and synonymous codon ratios for Escherichia coli, yeast, Drosophila and Xenopus proteins have been performed. The results rule out any significant support for a primitive repeating-RNY or repeating-RRY gene structure, and establish the important role of natural selection in determining the choice of codons. With strong intervention by natural selection, the relationship between primitive gene structure and codon usage necessarily becomes minimal.
|
45
|
Sharp PM, Tuohy TM, Mosurski KR. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 1986; 14:5125-43. [PMID: 3526280 PMCID: PMC311530 DOI: 10.1093/nar/14.13.5125] [Citation(s) in RCA: 842] [Impact Index Per Article: 22.2] [Indexed: 01/06/2023]
Abstract
Codon usage data has been compiled for 110 yeast genes. Cluster analysis on relative synonymous codon usage revealed two distinct groups of genes. One group corresponds to highly expressed genes, and has much more extreme synonymous codon preference. The pattern of codon usage observed is consistent with that expected if a need to match abundant tRNAs, and intermediacy of tRNA-mRNA interaction energies are important selective constraints. Thus codon usage in the highly expressed group shows a higher correlation with tRNA abundance, a greater degree of third base pyrimidine bias, and a lesser tendency to the A+T richness which is characteristic of the yeast genome. The cluster analysis can be used to predict the likely level of gene expression of any gene, and identifies the pattern of codon usage likely to yield optimal gene expression in yeast.
|
46
|
McConnell DJ, Cantwell BA, Devine KM, Forage AJ, Laoide BM, O'Kane C, Ollington JF, Sharp PM. Genetic engineering of extracellular enzyme systems of Bacilli. Ann N Y Acad Sci 1986; 469:1-17. [PMID: 3524394 DOI: 10.1111/j.1749-6632.1986.tb26480.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.4] [Indexed: 01/06/2023]
|
47
|
Schroeder C, Jurkschat H, Meisel A, Reich JG, Krüger D. Unusual occurrence of EcoP1 and EcoP15 recognition sites and counterselection of type II methylation and restriction sequences in bacteriophage T7 DNA. Gene 1986; 45:77-86. [PMID: 3023202 DOI: 10.1016/0378-1119(86)90134-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023]
Abstract
Selected and counterselected oligodeoxynucleotide sequences were identified in the total sequence of bacteriophage T7 DNA using a statistical criterion derived for a probability model of the Markov chain type. All extremely rare tetra- and pentadeoxynucleotides are (or contain) recognition sequences for the Escherichia coli DNA methylases dam or dcm. Most of the 37 hexadeoxynucleotides absent from T7 DNA are recognition sequences for type II modification/restriction enzymes of E. coli or related species. In contrast to most restriction sites counterselected during evolution, the EcoP1 site GGTCT occurs 126 times in the T7 genome, and phage T7 replication is severely repressed in P1-lysogenic host cells. We demonstrate that the frequency of the EcoP1 site is determined by that of the overlapping recognition sites for T7 primase, an essential phage enzyme. The recognition site of a type III enzyme, EcoP15, is also not counterselected. In T7 DNA all 36 EcoP15 sites are arranged in such a manner that the sequence CAGCAG is confined to the H strand, the complementary sequence CTGCTG to the L strand. This "strand bias" is highly significant and, therefore, very probably selected. A functional relation between this strand bias and the refractive behaviour of phage T7 to EcoP15 restriction is suspected.
|