1
Paremskaia AI, Kogan AA, Murashkina A, Naumova DA, Satish A, Abramov IS, Feoktistova SG, Mityaeva ON, Deviatkin AA, Volchkov PY. Codon-optimization in gene therapy: promises, prospects and challenges. Front Bioeng Biotechnol 2024; 12:1371596. [PMID: 38605988 PMCID: PMC11007035 DOI: 10.3389/fbioe.2024.1371596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 03/19/2024] [Indexed: 04/13/2024] [Open Access]
Abstract
Codon optimization has evolved to enhance protein expression efficiency by exploiting the genetic code's redundancy, allowing for multiple codon options for a single amino acid. Initially observed in E. coli, optimal codon usage correlates with high gene expression, which has propelled applications expanding from basic research to biopharmaceuticals and vaccine development. The method is especially valuable for adjusting immune responses in gene therapies and has the potential to create tissue-specific therapies. However, challenges persist, such as the risk of unintended effects on protein function and the complexity of evaluating optimization effectiveness. Despite these issues, codon optimization is crucial in advancing gene therapeutics. This study provides a comprehensive review of the current metrics for codon optimization and its practical usage in research and clinical applications, in the context of gene therapy.
Affiliation(s)
- Anastasiia Iu Paremskaia
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Anna A. Kogan
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Anastasiia Murashkina
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Daria A. Naumova
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Anakha Satish
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Ivan S. Abramov
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- The MCSC named after A. S. Loginov, Moscow, Russia
- Sofya G. Feoktistova
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Olga N. Mityaeva
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Andrei A. Deviatkin
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Pavel Yu Volchkov
- Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- The MCSC named after A. S. Loginov, Moscow, Russia
2
Khandia R, Pandey MK, Khan AA, Baklanov I, Alanazi AM, Nepali P, Gurjar P, Choudhary OP. Synthetic biology approach revealed enhancement in haeme oxygenase-1 gene expression by codon pair optimization while reduction by codon deoptimization. Ann Med Surg (Lond) 2024; 86:1359-1369. [PMID: 38463112 PMCID: PMC10923308 DOI: 10.1097/ms9.0000000000001465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 10/23/2023] [Indexed: 03/12/2024] [Open Access]
Abstract
Haem oxygenase-1 (HO-1) is a ubiquitously expressed gene involved in cellular homoeostasis, and its imbalance in expression results in various disorders. To alleviate such disorders, HO-1 gene expression needs to be modulated. Codon usage bias results from evolutionary forces acting on any nucleotide sequence and determines the gene expression. Like codon usage bias, codon pair bias also exists, playing a role in gene expression. In the present study, the HO-1 gene was recoded by manipulating codon and codon pair bias, and four such constructs were made through codon/codon pair deoptimization and codon/codon pair optimization to reduce and enhance HO-1 gene expression. Codon usage analysis was done for these constructs for four tissues: brain, heart, pancreas, and liver. Based on codon usage in different tissues, gene expression in these tissues was determined in terms of the codon adaptation index. Based on the codon adaptation index, minimum free energy, and translation efficiency, constructs were evaluated for enhanced or decreased HO-1 expression. The analysis revealed that codon pair optimization is efficacious for enhancing gene expression, while codon deoptimization is efficacious for reducing it. The recoded constructs developed in the study could be used in gene therapy regimens to cure disorders associated with HO-1 over- or underexpression.
Affiliation(s)
- Rekha Khandia
- Department of Biochemistry and Genetics, Barkatullah University, Bhopal, MP, India
- Megha Katare Pandey
- Translational Medicine Center, All India Institute of Medical Sciences, Bhopal, MP, India
- Azmat Ali Khan
- Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Igor Baklanov
- Department of Philosophy, North Caucasus Federal University, Pushkina, Stavropol, Russia
- Amer M. Alanazi
- Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Prakash Nepali
- Bhimad Primary Health Care Center, Government of Nepal, Tanahun, Nepal
- Pankaj Gurjar
- Centre for Global Health Research, Saveetha Medical College and Hospital, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- Department of Science and Engineering, Novel Global Community Educational Foundation, Hebersham, NSW, Australia
- Om Prakash Choudhary
- Department of Veterinary Anatomy, College of Veterinary Science, Guru Angad Dev Veterinary and Animal Sciences University (GADVASU), Rampura Phul, Bathinda, Punjab, India
3
Wang Y, Li Z, Wang X, Jiang W, Jiang W. SARS-CoV-2 continuously optimizes its codon usage to adapt to human lung environment. J Appl Genet 2023; 64:831-837. [PMID: 37740828 DOI: 10.1007/s13353-023-00790-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 09/14/2023] [Accepted: 09/16/2023] [Indexed: 09/25/2023]
Abstract
Viruses need to utilize the resources of host cells to reproduce themselves. RNA translation rate, which is largely determined by codon usage, is the rate-limiting step across the life cycle of viruses. Adapting to the codon usage of hosts would help a virus proliferate better. We retrieved the time-course mutation profile of millions of worldwide SARS-CoV-2 sequences. For synonymous mutations, we defined whether a mutation elevates or reduces the relative synonymous codon usage (RSCU). We found that if a synonymous mutation in SARS-CoV-2 increases the RSCU (calculated from human lungs), denoted as delta RSCU > 0, then this mutation is positively selected because the allele frequency (AF) of this mutation increases with time, and vice versa. The results suggest that in SARS-CoV-2, the synonymous mutations that increase codon optimality are beneficial to the virus and are favored by natural selection. For the first time, we used the dynamics of allele frequency to demonstrate that SARS-CoV-2 is continuously optimizing its codon usage to adapt to human lungs. Nevertheless, adaptation to other human tissues cannot be excluded. These results warn us that during this global pandemic, synonymous mutations in SARS-CoV-2 should not be automatically ignored since they do change the fitness of the virus.
Affiliation(s)
- Yinglian Wang
- Institute of Integrated Medicine, Qingdao Medical College, Qingdao University, Qingdao, 266071, Shandong, China
- Changyi People's Hospital, Weifang, 261300, Shandong, China
- Zhenhua Li
- Pulmonary and Critical Care Medicine Department 2, Qingdao Municipal Hospital of Traditional Chinese Medicine (Qingdao Hiser Medical Group), Qingdao, 266033, China
- Department of Respiratory Diseases, The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao Haici Hospital, Qingdao, 266033, Shandong, China
- Xiuxiu Wang
- Department of Respiratory Medicine, Qilu Hospital (Qingdao), Cheeloo College of Medicine, Shandong University, Qingdao, 266035, Shandong, China
- Wen Jiang
- Pulmonary and Critical Care Medicine Department 2, Qingdao Municipal Hospital of Traditional Chinese Medicine (Qingdao Hiser Medical Group), Qingdao, 266033, China
- Department of Respiratory Diseases, The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao Haici Hospital, Qingdao, 266033, Shandong, China
- Wenqing Jiang
- Pulmonary and Critical Care Medicine Department 2, Qingdao Municipal Hospital of Traditional Chinese Medicine (Qingdao Hiser Medical Group), Qingdao, 266033, China
- Department of Respiratory Diseases, The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao Haici Hospital, Qingdao, 266033, Shandong, China
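Note: the delta RSCU criterion in the abstract of Wang et al. above can be illustrated with a minimal Python sketch. It uses the standard RSCU definition (observed codon count divided by the count expected if all synonymous codons for that amino acid were used equally). The function names, the toy reference sequence, and the choice of reference gene set are assumptions for illustration; the abstract does not say how the lung reference was assembled or weighted.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(reference_cds):
    """Return RSCU per sense codon from a list of in-frame coding sequences."""
    counts = Counter()
    for seq in reference_cds:
        seq = seq.upper().replace("U", "T")
        for pos in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[pos:pos + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families keyed by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    table = {}
    for aa, codons in families.items():
        if aa == "*":  # stop codons are excluded
            continue
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count divided by the count expected under equal usage.
            table[c] = counts[c] * len(codons) / total if total else float("nan")
    return table


def delta_rscu(ref_codon, alt_codon, table):
    """RSCU change caused by a synonymous mutation ref_codon -> alt_codon."""
    if CODON_TABLE[ref_codon] != CODON_TABLE[alt_codon]:
        raise ValueError("mutation is not synonymous")
    return table[alt_codon] - table[ref_codon]


# Hypothetical toy reference: three CTG and one CTC (both leucine).
table = rscu(["CTGCTGCTGCTC"])
print(delta_rscu("CTC", "CTG", table))  # 3.0
```

In this toy case leucine has six synonymous codons, so CTG scores 4.5 and CTC scores 1.5. A CTC-to-CTG mutation therefore has delta RSCU > 0, the class of mutation the abstract reports as increasing in allele frequency over time.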
4
Johnson MM, Hockenberry AJ, McGuffie MJ, Vieira LC, Wilke CO. Growth-dependent Gene Expression Variation Influences the Strength of Codon Usage Biases. Mol Biol Evol 2023; 40:msad189. [PMID: 37619989 PMCID: PMC10482319 DOI: 10.1093/molbev/msad189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 08/11/2023] [Indexed: 08/26/2023] [Open Access]
Abstract
The most highly expressed genes in microbial genomes tend to use a limited set of synonymous codons, often referred to as "preferred codons." The existence of preferred codons is commonly attributed to selection pressures on various aspects of protein translation including accuracy and/or speed. However, gene expression is condition-dependent and even within single-celled organisms transcript and protein abundances can vary depending on a variety of environmental and other factors. Here, we show that growth rate-dependent expression variation is an important constraint that significantly influences the evolution of gene sequences. Using large-scale transcriptomic and proteomic data sets in Escherichia coli and Saccharomyces cerevisiae, we confirm that codon usage biases are strongly associated with gene expression but highlight that this relationship is most pronounced when gene expression measurements are taken during rapid growth conditions. Specifically, genes whose relative expression increases during periods of rapid growth have stronger codon usage biases than comparably expressed genes whose expression decreases during rapid growth conditions. These findings highlight that gene expression measured in any particular condition tells only part of the story regarding the forces shaping the evolution of microbial gene sequences. More generally, our results imply that microbial physiology during rapid growth is critical for explaining long-term translational constraints.
Affiliation(s)
- Mackenzie M Johnson
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Adam J Hockenberry
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Matthew J McGuffie
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, USA
- Luiz Carlos Vieira
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
5
Johnson MM, Hockenberry AJ, McGuffie MJ, Vieira LC, Wilke CO. Growth-dependent gene expression variation influences the strength of codon usage biases. bioRxiv 2023:2023.03.14.532645. [PMID: 36993177 PMCID: PMC10055066 DOI: 10.1101/2023.03.14.532645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
The most highly expressed genes in microbial genomes tend to use a limited set of synonymous codons, often referred to as "preferred codons." The existence of preferred codons is commonly attributed to selection pressures on various aspects of protein translation including accuracy and/or speed. However, gene expression is condition-dependent and even within single-celled organisms transcript and protein abundances can vary depending on a variety of environmental and other factors. Here, we show that growth rate-dependent expression variation is an important constraint that significantly influences the evolution of gene sequences. Using large-scale transcriptomic and proteomic data sets in Escherichia coli and Saccharomyces cerevisiae, we confirm that codon usage biases are strongly associated with gene expression but highlight that this relationship is most pronounced when gene expression measurements are taken during rapid growth conditions. Specifically, genes whose relative expression increases during periods of rapid growth have stronger codon usage biases than comparably expressed genes whose expression decreases during rapid growth conditions. These findings highlight that gene expression measured in any particular condition tells only part of the story regarding the forces shaping the evolution of microbial gene sequences. More generally, our results imply that microbial physiology during rapid growth is critical for explaining long-term translational constraints.
Affiliation(s)
- Mackenzie M Johnson
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, United States of America
- Adam J Hockenberry
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, United States of America
- Matthew J McGuffie
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, United States of America
- Luiz Carlos Vieira
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, United States of America
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, United States of America
6
Lin BC, Katneni U, Jankowska KI, Meyer D, Kimchi-Sarfaty C. In silico methods for predicting functional synonymous variants. Genome Biol 2023; 24:126. [PMID: 37217943 PMCID: PMC10204308 DOI: 10.1186/s13059-023-02966-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/03/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023] [Open Access]
Abstract
Single nucleotide variants (SNVs) contribute to human genomic diversity. Synonymous SNVs were previously considered to be "silent," but mounting evidence has revealed that these variants can cause RNA and protein changes and are implicated in over 85 human diseases and cancers. Recent improvements in computational platforms have led to the development of numerous machine-learning tools, which can be used to advance synonymous SNV research. In this review, we discuss tools that should be used to investigate synonymous variants. We provide supportive examples from seminal studies that demonstrate how these tools have driven new discoveries of functional synonymous SNVs.
Affiliation(s)
- Brian C Lin
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Upendra Katneni
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Katarzyna I Jankowska
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Douglas Meyer
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Chava Kimchi-Sarfaty
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
7
Barbosa Pereira PJ, Manso JA, Macedo-Ribeiro S. The structural plasticity of polyglutamine repeats. Curr Opin Struct Biol 2023; 80:102607. [PMID: 37178477 DOI: 10.1016/j.sbi.2023.102607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 04/11/2023] [Accepted: 04/12/2023] [Indexed: 05/15/2023]
Abstract
From yeast to humans, polyglutamine (polyQ) repeat tracts are found frequently in the proteome and are particularly prominent in the activation domains of transcription factors. PolyQ is a polymorphic motif that modulates functional protein-protein interactions and aberrant self-assembly. Expansion of the polyQ repeated sequences beyond critical physiological repeat length thresholds triggers self-assembly and is linked to severe pathological implications. This review provides an overview of the current knowledge on the structures of polyQ tracts in the soluble and aggregated states and discusses the influence of neighboring regions on polyQ secondary structure, aggregation, and fibril morphologies. The influence of the genetic context of the polyQ-encoding trinucleotides is briefly discussed as a challenge for future endeavors in this field.
Affiliation(s)
- Pedro José Barbosa Pereira
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
- José A Manso
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
- Sandra Macedo-Ribeiro
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
8
Anisimova AS, Kolyupanova NM, Makarova NE, Egorov AA, Kulakovskiy IV, Dmitriev SE. Human Tissues Exhibit Diverse Composition of Translation Machinery. Int J Mol Sci 2023; 24:8361. [PMID: 37176068 PMCID: PMC10179197 DOI: 10.3390/ijms24098361] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/05/2023] [Revised: 04/26/2023] [Accepted: 05/03/2023] [Indexed: 05/15/2023] [Open Access]
Abstract
While protein synthesis is vital for the majority of cell types of the human body, diversely differentiated cells require specific translation regulation. This suggests the specialization of translation machinery across tissues and organs. Using transcriptomic data from GTEx, FANTOM, and Gene Atlas, we systematically explored the abundance of transcripts encoding translation factors and aminoacyl-tRNA synthetases (ARSases) in human tissues. We revised a few known and identified several novel translation-related genes exhibiting strict tissue-specific expression. The proteins they encode include eEF1A1, eEF1A2, PABPC1L, PABPC3, eIF1B, eIF4E1B, eIF4ENIF1, and eIF5AL1. Furthermore, our analysis revealed a pervasive tissue-specific relative abundance of translation machinery components (e.g., PABP and eRF3 paralogs, eIF2B and eIF3 subunits, eIF5MPs, and some ARSases), suggesting presumptive variance in the composition of translation initiation, elongation, and termination complexes. These conclusions were largely confirmed by the analysis of proteomic data. Finally, we paid attention to sexual dimorphism in the repertoire of translation factors encoded in sex chromosomes (eIF1A, eIF2γ, and DDX3), and identified the testis and brain as organs with the most diverged expression of translation-associated genes.
Affiliation(s)
- Aleksandra S Anisimova
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119234 Moscow, Russia
- Natalia M Kolyupanova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119234 Moscow, Russia
- Nadezhda E Makarova
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119234 Moscow, Russia
- Artyom A Egorov
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
- Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 117971 Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, 420008 Kazan, Russia
- Sergey E Dmitriev
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119234 Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119234 Moscow, Russia
9
Hernandez-Alias X, Benisty H, Radusky LG, Serrano L, Schaefer MH. Using protein-per-mRNA differences among human tissues in codon optimization. Genome Biol 2023; 24:34. [PMID: 36829202 PMCID: PMC9951436 DOI: 10.1186/s13059-023-02868-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/26/2022] [Accepted: 02/07/2023] [Indexed: 02/26/2023] [Open Access]
Abstract
BACKGROUND Codon usage and nucleotide composition of coding sequences have profound effects on protein expression. However, while it is recognized that different tissues have distinct tRNA profiles and codon usages in their transcriptomes, the effect of tissue-specific codon optimality on protein synthesis remains elusive. RESULTS We leverage existing state-of-the-art transcriptomics and proteomics datasets from the GTEx project and the Human Protein Atlas to compute the protein-to-mRNA ratios of 36 human tissues. Using this as a proxy of translational efficiency, we build a machine learning model that identifies codons enriched or depleted in specific tissues. We detect two clusters of tissues with an opposite pattern of codon preferences. We then use these identified patterns for the development of CUSTOM, a codon optimizer algorithm which suggests a synonymous codon design in order to optimize protein production in a tissue-specific manner. In human cell-line models, we provide evidence that codon optimization should take into account particularities of the translational machinery of the tissues in which the target proteins are expressed and that our approach can design genes with tissue-optimized expression profiles. CONCLUSIONS We provide proof-of-concept evidence that codon preferences exist in tissue-specific protein synthesis and demonstrate its application to synthetic gene design. We show that CUSTOM can be of benefit in biological and biotechnological applications, such as in the design of tissue-targeted therapies and vaccines.
Affiliation(s)
- Xavier Hernandez-Alias
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain
- Hannah Benisty
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain
- Leandro G Radusky
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain
- Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08002, Barcelona, Spain; ICREA, Pg. Lluís Companys 23, 08010, Barcelona, Spain
- Martin H Schaefer
- IEO European Institute of Oncology IRCCS, Department of Experimental Oncology, Via Adamello 16, 20139, Milan, Italy
10
Fumagalli SE, Padhiar NH, Meyer D, Katneni U, Bar H, DiCuccio M, Komar AA, Kimchi-Sarfaty C. Analysis of 3.5 million SARS-CoV-2 sequences reveals unique mutational trends with consistent nucleotide and codon frequencies. Virol J 2023; 20:31. [PMID: 36812119 PMCID: PMC9936480 DOI: 10.1186/s12985-023-01982-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/13/2022] [Accepted: 02/02/2023] [Indexed: 02/19/2023] [Open Access]
Abstract
BACKGROUND Since the onset of the SARS-CoV-2 pandemic, bioinformatic analyses have been performed to understand the nucleotide and synonymous codon usage features and mutational patterns of the virus. However, comparatively few have attempted to perform such analyses on a considerably large cohort of viral genomes while organizing the plethora of available sequence data for a month-by-month analysis to observe changes over time. Here, we aimed to perform sequence composition and mutation analysis of SARS-CoV-2, separating sequences by gene, clade, and timepoints, and contrast the mutational profile of SARS-CoV-2 to other comparable RNA viruses. METHODS Using a cleaned, filtered, and pre-aligned dataset of over 3.5 million sequences downloaded from the GISAID database, we computed nucleotide and codon usage statistics, including calculation of relative synonymous codon usage values. We then calculated codon adaptation index (CAI) changes and a nonsynonymous/synonymous mutation ratio (dN/dS) over time for our dataset. Finally, we compiled information on the types of mutations occurring for SARS-CoV-2 and other comparable RNA viruses, and generated heatmaps showing codon and nucleotide composition at high entropy positions along the Spike sequence. RESULTS We show that nucleotide and codon usage metrics remain relatively consistent over the 32-month span, though there are significant differences between clades within each gene at various timepoints. CAI and dN/dS values vary substantially between different timepoints and different genes, with Spike gene on average showing both the highest CAI and dN/dS values. Mutational analysis showed that SARS-CoV-2 Spike has a higher proportion of nonsynonymous mutations than analogous genes in other RNA viruses, with nonsynonymous mutations outnumbering synonymous ones by up to 20:1. However, at several specific positions, synonymous mutations were overwhelmingly predominant. 
CONCLUSIONS Our multifaceted analysis covering both the composition and mutation signature of SARS-CoV-2 gives valuable insight into the nucleotide frequency and codon usage heterogeneity of SARS-CoV-2 over time, and its unique mutational profile compared to other RNA viruses.
Affiliation(s)
- Sarah E Fumagalli
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Nigam H Padhiar
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Douglas Meyer
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Upendra Katneni
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Haim Bar
- Department of Statistics, University of Connecticut, Storrs, CT, USA
- Anton A Komar
- Department of Biological, Geological and Environmental Sciences, Center for Gene Regulation in Health and Disease, Cleveland State University, Cleveland, OH, USA
- Chava Kimchi-Sarfaty
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
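Note: the codon adaptation index (CAI) tracked in the Fumagalli et al. abstract above can be illustrated with a minimal Python sketch. It follows the common definition of CAI as the geometric mean of relative adaptiveness values, where each codon's weight is its reference count divided by the count of the most frequent synonymous codon. The function names, the toy reference, and the 0.5 pseudocount for unseen codons are assumptions for illustration, not the pipeline the authors used.

```python
import math
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Yield in-frame codons, ignoring any trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        yield seq[pos:pos + 3]


def relative_adaptiveness(reference_cds):
    """Weight w = count / max count within each synonymous codon family."""
    counts = Counter(
        c for seq in reference_cds for c in split_codons(seq) if c in CODON_TABLE
    )
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    weights = {}
    for aa, codons in families.items():
        # Stop codons and single-codon amino acids (Met, Trp) carry no signal.
        if aa == "*" or len(codons) == 1:
            continue
        top = max(counts[c] for c in codons)
        if top == 0:  # amino acid absent from the reference set
            continue
        for c in codons:
            # Assumed pseudocount of 0.5 for unseen codons, avoiding log(0).
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the scorable codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical toy reference: three CTG and one CTC (both leucine).
weights = relative_adaptiveness(["CTGCTGCTGCTC"])
print(round(cai("CTGCTG", weights), 3))  # 1.0
print(round(cai("CTGCTC", weights), 3))  # 0.577
```

A sequence built only from the most frequent reference codons scores 1.0. Swapping one CTG for the rarer CTC lowers the score to the geometric mean of 1 and 1/3. Real analyses depend heavily on which reference gene set supplies the weights.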
11
Khandia R, Khan AA, Karuvantevida N, Gurjar P, Rzhepakovsky IV, Legaz I. Insights into Synonymous Codon Usage Bias in Hepatitis C Virus and Its Adaptation to Hosts. Pathogens 2023; 12:325. [PMID: 36839597 PMCID: PMC9961758 DOI: 10.3390/pathogens12020325] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/01/2022] [Revised: 01/25/2023] [Accepted: 02/02/2023] [Indexed: 02/17/2023] [Open Access]
Abstract
Hepatitis C virus (HCV) is an enveloped RNA virus encoding a polyprotein that is processed by cellular proteases. The virus is responsible for liver cirrhosis, allograft rejection, and human hepatocellular carcinoma. Based on studies including compositional analysis, odds ratio analysis, parity analysis, skew analysis, relative synonymous codon usage, codon bias, and protein properties, it was evident that codon usage bias in HCV is dependent upon the nucleotide composition. Codon context analysis revealed CTC-CTG as a preferred codon pair. While CGA and CGT codons were rare, none of the codons were rare in HCV-like viruses envisaged in the present study. Many of the preferred codon pairs were valine amino acid-initiated, which possibly infers viral infectivity; hence selection forces appear to act on the HCV genome, which was further validated by neutrality analysis where selection accounted for 87.28%, while mutation accounted for 12.72%, of the force shaping codon usage. Furthermore, codon usage was correlated with the length of the genome. HCV viruses prefer valine-initiated codon pairs, while HCV-like viruses prefer alanine-initiated codon pairs. The HCV host range is very narrow and is confined to only humans and chimpanzees. Based on indices including codon usage correlation analysis, similarity index, and relative codon deoptimization index, it is evident in the study that the chimpanzee is the primary host of the virus. The present study helped elucidate the preferred host for HCV. The information presented in the study paved the way for generating an attenuated vaccine candidate through viral recoding, with finely tuned nucleotide composition and a perfect balance of preferred and rare codons.
Affiliation(s)
- Rekha Khandia
- Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Correspondence: (R.K.); (I.L.)
- Azmat Ali Khan
- Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia
- Noushad Karuvantevida
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai P.O. Box 505055, United Arab Emirates
- Pankaj Gurjar
- Department of Science and Engineering, Novel Global Community Educational Foundation, Hebersham, NSW 2770, Australia
- Isabel Legaz
- Department of Legal and Forensic Medicine, Biomedical Research Institute (IMIB), Regional Campus of International Excellence “Campus Mare Nostrum”, Faculty of Medicine, University of Murcia, 30120 Murcia, Spain
- Correspondence: (R.K.); (I.L.)
|
12
|
Implementing computational methods in tandem with synonymous gene recoding for therapeutic development. Trends Pharmacol Sci 2023; 44:73-84. [PMID: 36307252 DOI: 10.1016/j.tips.2022.09.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/21/2022] [Revised: 09/26/2022] [Accepted: 09/27/2022] [Indexed: 12/24/2022]
Abstract
Synonymous gene recoding, the substitution of synonymous variants into the genetic sequence, has been used to overcome many production limitations in therapeutic development. However, the safety and efficacy of recoded therapeutics can be difficult to evaluate because synonymous codon substitutions can result in subtle, yet impactful changes in protein features and require sensitive methods for detection. Given that computational approaches have made significant leaps in recent years, we propose that machine-learning (ML) tools may be leveraged to assess gene-recoded therapeutics and foresee an opportunity to adapt codon contexts to enhance some powerful existing tools. Here, we examine how synonymous gene recoding has been used to address challenges in therapeutic development, explain the biological mechanisms underlying its effects, and explore the application of computational platforms to improve the surveillance of functional variants in therapeutic design.
|
13
|
Ran X, Xiao J, Cheng F, Wang T, Teng H, Sun Z. Pan-cancer analyses of synonymous mutations based on tissue-specific codon optimality. Comput Struct Biotechnol J 2022; 20:3567-3580. [PMID: 35860410 PMCID: PMC9287186 DOI: 10.1016/j.csbj.2022.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 06/22/2022] [Accepted: 07/03/2022] [Indexed: 11/24/2022] Open
Abstract
Developed tissue-specific codon optimality in 29 human tissues. Applied these to analyze synonymous mutations in ∼10,000 tumor and normal samples. Synonymous mutations frequently increase optimal codons in most cancer types. Synonymous mutations frequently increase optimal codons in cell cycle-related genes. Frequency of optimal codon gain relates to proliferation, DDR deficiency, and survival.
Codon optimality has been demonstrated to be an important determinant of mRNA stability and expression levels in multiple model organisms and human cell lines. However, tissue-specific codon optimality has not been developed to investigate how codon optimality is usually perturbed by somatic synonymous mutations in human cancers. Here, we determined tissue-specific codon optimality in 29 human tissues based on mRNA expression data from the Genotype-Tissue Expression project. We found that optimal codons were associated with differentiation, whereas non-optimal codons were correlated with proliferation. Furthermore, codons biased toward differentiation displayed greater tissue specificity in codon optimality, and the tissue specificity of codon optimality was primarily present in amino acids with high degeneracy of the genetic code. By applying tissue-specific codon optimality to somatic synonymous mutations in 8532 tumor samples across 24 cancer types and to those in 416 normal cells across six human tissues, we found that synonymous mutations frequently increased optimal codons in tumor cells and cancer-related genes (e.g., genes involved in cell cycle). Furthermore, an elevated frequency of optimal codon gain was found to promote tumor cell proliferation in three cancer types characterized by DNA damage repair deficiency and could act as a prognostic biomarker for patients with triple-negative breast cancer. In summary, this study profiled tissue-specific codon optimality in human tissues, revealed alterations in codon optimality caused by synonymous mutations in human cancers, and highlighted the non-negligible role of optimal codon gain in tumorigenesis and therapeutics.
Affiliation(s)
- Xia Ran
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China; CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing 100049, China
- Jinyuan Xiao
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou 325000, China
- Fang Cheng
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou 325000, China
- Tao Wang
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Kaifu District, Changsha, Hunan 410078, China
- Huajing Teng
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China
- Zhongsheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China; CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing 100049, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou 325000, China
|
14
|
Miller JB, Meurs TE, Hodgman MW, Song B, Miller KN, Ebbert MTW, Kauwe JSK, Ridge PG. The Ramp Atlas: facilitating tissue and cell-specific ramp sequence analyses through an intuitive web interface. NAR Genom Bioinform 2022; 4:lqac039. [PMID: 35664804 PMCID: PMC9155233 DOI: 10.1093/nargab/lqac039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 03/01/2022] [Accepted: 05/24/2022] [Indexed: 11/14/2022] Open
Abstract
Ramp sequences occur when the average translational efficiency of codons near the 5′ end of highly expressed genes is significantly lower than the rest of the gene sequence, which counterintuitively increases translational efficiency by decreasing downstream ribosomal collisions. Here, we show that the relative codon adaptiveness within different tissues changes the existence of a ramp sequence without altering the underlying genetic code. We present the first comprehensive analysis of tissue and cell type-specific ramp sequences and report 3108 genes with ramp sequences that change between tissues and cell types, which corresponds with increased gene expression within those tissues and cells. The Ramp Atlas (https://ramps.byu.edu/) allows researchers to query precomputed ramp sequences in 18 388 genes across 62 tissues and 66 cell types and calculate tissue-specific ramp sequences from user-uploaded FASTA files through an intuitive web interface. We used The Ramp Atlas to identify seven SARS-CoV-2 genes and seven human SARS-CoV-2 entry factor genes with tissue-specific ramp sequences that may help explain viral proliferation within those tissues. We anticipate that The Ramp Atlas will facilitate personalized and creative tissue-specific ramp sequence analyses for both human and viral genes that will increase our ability to utilize this often-overlooked regulatory region.
Affiliation(s)
- Justin B Miller
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY 40504, USA
- Taylor E Meurs
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Matthew W Hodgman
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY 40504, USA
- Benjamin Song
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Kyle N Miller
- Department of Computer Science, Utah Valley University, Orem, UT 84058, USA
- Mark T W Ebbert
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY 40504, USA
- John S K Kauwe
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Perry G Ridge
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
|
15
|
Watts A, Sankaranarayanan S, Watts A, Raipuria RK. Optimizing protein expression in heterologous system: Strategies and tools. Meta Gene 2021. [DOI: 10.1016/j.mgene.2021.100899] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022] Open
|
16
|
Abstract
Infectious diseases pose two main compelling issues. First, the identification of the molecular factors that allow chronic infections, that is, the often completely asymptomatic coexistence of infectious agents with the human host. Second, the definition of the mechanisms that allow the switch from pathogen dormancy to pathologic (re)activation. Furthering previous studies, the present study (1) analyzed the frequency of occurrence of synonymous codons in coding DNA, that is, codon usage, as a genetic tool that rules protein expression; (2) described how human codon usage can inhibit protein expression of infectious agents during latency, so that pathogen genes the codon usage of which does not conform to the human codon usage cannot be translated; and (3) framed human codon usage among the front-line instruments of the innate immunity against infections. In parallel, it was shown that, while genetics can account for the molecular basis of pathogen latency, the changes of the quantitative relationship between codon frequencies and isoaccepting tRNAs during cell proliferation offer a biochemical mechanism that explains the pathogen switching to (re)activation. Immunologically, this study warns that using codon optimization methodologies can (re)activate, potentiate, and immortalize otherwise quiescent, asymptomatic pathogens, thus leading to uncontrollable pandemics.
Affiliation(s)
- Darja Kanduc
- Department of Biosciences, Biotechnologies and Biopharmaceutics, University of Bari, Bari, Italy
|
17
|
Iriarte A, Lamolle G, Musto H. Codon Usage Bias: An Endless Tale. J Mol Evol 2021; 89:589-593. [PMID: 34383106 DOI: 10.1007/s00239-021-10027-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 06/17/2021] [Accepted: 08/06/2021] [Indexed: 11/28/2022]
Abstract
Since the genetic code is degenerate, several codons are translated to the same amino acid. Although these triplets were historically considered to be "synonymous" and therefore expected to be used at rather equal frequencies in all genomes, we now know that this is not the case. Indeed, since several coding sequences were obtained in the late '70s and early '80s of the last century, coming from either the same or different species, it was evident that (a) each genome, taken globally, displayed different codon usage patterns, which means that different genomes display a particular global codon usage table when all genes are considered together, and (b) there is a strong intragenomic diversity: in other words, within a given species the codon usage pattern can (and usually does) differ greatly among genes in the same genome. These different patterns were attributed to two main factors: first, the mutational bias characteristic of each genome, which determines that GC-poor species display a general bias towards A/T codons while the reverse is true for GC-rich species. Second, the differences in codon usage among genes from the same species are due to natural selection acting at the level of translation, in such a way that highly expressed genes tend to use codons that match the most abundant isoacceptor tRNAs. Thus, these genes are translated at the highest rate, which in turn helps to avoid the limiting factor in translation, which is the number of available ribosomes per cell. Although these explanations are still valid, new factors are almost constantly postulated to affect codon usage. In this mini review, we shall try to summarize them.
Affiliation(s)
- Andrés Iriarte
- Laboratorio de Genómica Evolutiva, Depto. de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, 11400, Montevideo, Uruguay; Laboratorio de Biología Computacional, Depto. de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, 11600, Montevideo, Uruguay
- Guillermo Lamolle
- Laboratorio de Genómica Evolutiva, Depto. de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, 11400, Montevideo, Uruguay
- Héctor Musto
- Laboratorio de Genómica Evolutiva, Depto. de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, 11400, Montevideo, Uruguay
|
18
|
Meyer D, Kames J, Bar H, Komar AA, Alexaki A, Ibla J, Hunt RC, Santana-Quintero LV, Golikov A, DiCuccio M, Kimchi-Sarfaty C. Distinct signatures of codon and codon pair usage in 32 primary tumor types in the novel database CancerCoCoPUTs for cancer-specific codon usage. Genome Med 2021; 13:122. [PMID: 34321100 PMCID: PMC8317675 DOI: 10.1186/s13073-021-00935-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/14/2021] [Accepted: 07/09/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Gene expression is highly variable across tissues of multi-cellular organisms, influencing the codon usage of the tissue-specific transcriptome. Cancer disrupts the gene expression pattern of healthy tissue resulting in altered codon usage preferences. The topic of codon usage changes as they relate to codon demand, and tRNA supply in cancer is of growing interest. METHODS We analyzed transcriptome-weighted codon and codon pair usage based on The Cancer Genome Atlas (TCGA) RNA-seq data from 6427 solid tumor samples and 632 normal tissue samples. This dataset represents 32 cancer types affecting 11 distinct tissues. Our analysis focused on tissues that give rise to multiple solid tumor types and cancer types that are present in multiple tissues. RESULTS We identified distinct patterns of synonymous codon usage changes for different cancer types affecting the same tissue. For example, a substantial increase in GGT-glycine was observed in invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), and mixed invasive ductal and lobular carcinoma (IDLC) of the breast. Change in synonymous codon preference favoring GGT correlated with change in synonymous codon preference against GGC in IDC and IDLC, but not in ILC. Furthermore, we examined the codon usage changes between paired healthy/tumor tissue from the same patient. Using clinical data from TCGA, we conducted a survival analysis of patients based on the degree of change between healthy and tumor-specific codon usage, revealing an association between larger changes and increased mortality. We have also created a database that contains cancer-specific codon and codon pair usage data for cancer types derived from TCGA, which represents a comprehensive tool for codon-usage-oriented cancer research. CONCLUSIONS Based on data from TCGA, we have highlighted tumor type-specific signatures of codon and codon pair usage. Paired data revealed variable changes to codon usage patterns, which must be considered when designing personalized cancer treatments. The associated database, CancerCoCoPUTs, represents a comprehensive resource for codon and codon pair usage in cancer and is available at https://dnahive.fda.gov/review/cancercocoputs/. These findings are important to understand the relationship between tRNA supply and codon demand in cancer states and could help guide the development of new cancer therapeutics.
Affiliation(s)
- Douglas Meyer
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
- Jacob Kames
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
- Haim Bar
- Department of Statistics, University of Connecticut, Storrs, CT, USA
- Anton A Komar
- Center for Gene Regulation in Health and Disease, Department of Biological, Geological and Environmental Sciences, Cleveland State University, Cleveland, OH, USA
- Aikaterini Alexaki
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
- Juan Ibla
- Department of Anesthesiology, Critical Care and Pain Medicine, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Ryan C Hunt
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
- Luis V Santana-Quintero
- High-performance Integrated Virtual Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD, 20993, USA
- Anton Golikov
- High-performance Integrated Virtual Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD, 20993, USA
- Michael DiCuccio
- National Center of Biotechnology Information, National Institutes of Health, Bethesda, MD, USA
- Chava Kimchi-Sarfaty
- Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US Food and Drug Administration, Silver Spring, MD, USA
|
19
|
Simón D, Cristina J, Musto H. Nucleotide Composition and Codon Usage Across Viruses and Their Respective Hosts. Front Microbiol 2021; 12:646300. [PMID: 34262534 PMCID: PMC8274242 DOI: 10.3389/fmicb.2021.646300] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 12/26/2020] [Accepted: 06/04/2021] [Indexed: 11/13/2022] Open
Abstract
The genetic material of the three domains of life (Bacteria, Archaea, and Eukaryota) is always double-stranded DNA, and their GC content (molar content of guanine plus cytosine) varies between ≈ 13% and ≈ 75%. Nucleotide composition is the simplest way of characterizing genomes. Despite this simplicity, it has several implications. Indeed, it is the main factor that determines, among other features, dinucleotide frequencies, repeated short DNA sequences, and codon and amino acid usage. Which forces drive this strong variation is still a matter of controversy. For rather obvious reasons, most of the studies concerning this huge variation and its consequences have been done in free-living organisms. However, no recent comprehensive study of all known viruses has been done (that is, concerning all available sequences). Viruses, by far the most abundant biological entities on Earth, are the causative agents of many diseases. An overview of these entities is important also because their genetic material is not always double-stranded DNA: indeed, certain viruses have as genetic material single-stranded DNA, double-stranded RNA, single-stranded RNA, and/or retro-transcribing. Therefore, one may wonder if what we have learned about the evolution of GC content and its implications in prokaryotes and eukaryotes also applies to viruses. In this contribution, we attempt to describe compositional properties of ∼ 10,000 viral species: base composition (globally and according to Baltimore classification), correlations among non-coding regions and the three codon positions, and the relationship of the nucleotide frequencies and codon usage of viruses with the same feature of their hosts. This allowed us to determine how the base composition of phages strongly correlates with the value of their respective hosts, while that of eukaryotic viruses does not (with fungi and protists as exceptions). Finally, we discuss some of these results concerning codon usage: reinforcing previous results, we found that phages and hosts exhibit moderate to high correlations, while for eukaryotes and their viruses the correlations are weak or do not exist.
Affiliation(s)
- Diego Simón
- Laboratorio de Genómica Evolutiva, Departamento de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay; Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la Republica, Montevideo, Uruguay; Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Juan Cristina
- Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la Republica, Montevideo, Uruguay
- Héctor Musto
- Laboratorio de Genómica Evolutiva, Departamento de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
|
20
|
Bahiri-Elitzur S, Tuller T. Codon-based indices for modeling gene expression and transcript evolution. Comput Struct Biotechnol J 2021; 19:2646-2663. [PMID: 34025951 PMCID: PMC8122159 DOI: 10.1016/j.csbj.2021.04.042] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 02/14/2021] [Revised: 04/17/2021] [Accepted: 04/18/2021] [Indexed: 11/21/2022] Open
Abstract
Codon usage bias (CUB) refers to the phenomenon that synonymous codons are used at different frequencies in most genes and organisms. The general assumption is that codon biases reflect a balance between mutational biases and natural selection. Today we understand that the codon content is related to and can affect all gene expression steps. Starting from the 1980s, codon-based indices have been used for answering different questions in all biomedical fields, including systems biology, agriculture, medicine, and biotechnology. In general, codon usage bias indices weigh each codon or a small set of codons to estimate the fitting of a certain coding sequence to a certain phenomenon (e.g., bias in codons, adaptation to the tRNA pool, frequencies of certain codons, transcription elongation speed, etc.) and are usually easy to implement. Today there are dozens of such indices; thus, this paper aims to review and compare the different codon usage bias indices, their applications, and advantages. In addition, we perform analysis that demonstrates that most indices tend to correlate even though they aim to capture different aspects. Due to the centrality of codon usage bias on different gene expression steps, it is important to keep developing new indices that can capture additional aspects that are not modeled with the current indices.
Affiliation(s)
- Tamir Tuller
- Department of Biomedical Engineering, Tel-Aviv University, Tel Aviv, Israel
- The Sagol School of Neuroscience, Tel-Aviv University, Tel Aviv, Israel
|
21
|
Ou Z, Ouzounis C, Wang D, Sun W, Li J, Chen W, Marlière P, Danchin A. A Path toward SARS-CoV-2 Attenuation: Metabolic Pressure on CTP Synthesis Rules the Virus Evolution. Genome Biol Evol 2020; 12:2467-2485. [PMID: 33125064 PMCID: PMC7665462 DOI: 10.1093/gbe/evaa229] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Accepted: 10/23/2020] [Indexed: 02/06/2023] Open
Abstract
In the context of the COVID-19 pandemic, we describe here the singular metabolic background that constrains enveloped RNA viruses to evolve toward likely attenuation in the long term, possibly after a step of increased pathogenicity. Cytidine triphosphate (CTP) is at the crossroad of the processes allowing SARS-CoV-2 to multiply, because CTP is in demand for four essential metabolic steps. It is a building block of the virus genome, it is required for synthesis of the cytosine-based liponucleotide precursors of the viral envelope, it is a critical building block of the host transfer RNAs synthesis and it is required for synthesis of dolichol-phosphate, a precursor of viral protein glycosylation. The CCA 3'-end of all the transfer RNAs required to translate the RNA genome and further transcripts into the proteins used to build active virus copies is not coded in the human genome. It must be synthesized de novo from CTP and ATP. Furthermore, intermediary metabolism is built on compulsory steps of synthesis and salvage of cytosine-based metabolites via uridine triphosphate that keep limiting CTP availability. As a consequence, accidental replication errors tend to replace cytosine by uracil in the genome, unless recombination events allow the sequence to return to its ancestral sequences. We document some of the consequences of this situation in the function of viral proteins. This unique metabolic setup allowed us to highlight and provide a raison d'être to viperin, an enzyme of innate antiviral immunity, which synthesizes 3'-deoxy-3',4'-didehydro-CTP as an extremely efficient antiviral nucleotide.
Affiliation(s)
- Zhihua Ou
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China
- Christos Ouzounis
- Biological Computation and Process Laboratory, Centre for Research and Technology Hellas, Chemical Process and Energy Resources Institute, Thessalonica, Greece
- Daxi Wang
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China
- Wanying Sun
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China; BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
- Junhua Li
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China
- Weijun Chen
- Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen, China; BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen, China
- Philippe Marlière
- TESSSI, The European Syndicate of Synthetic Scientists and Industrialists, Paris, France
- Antoine Danchin
- Kodikos Labs, Institut Cochin, Paris, France; School of Biomedical Sciences, Li KaShing Faculty of Medicine, Hong Kong University, Pokfulam, Hong Kong
|
22
|
Sequence analysis of SARS-CoV-2 genome reveals features important for vaccine design. Sci Rep 2020; 10:15643. [PMID: 32973171 PMCID: PMC7519053 DOI: 10.1038/s41598-020-72533-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 03/31/2020] [Accepted: 08/19/2020] [Indexed: 12/28/2022] Open
Abstract
As the SARS-CoV-2 pandemic is rapidly progressing, the need for the development of an effective vaccine is critical. A promising approach for vaccine development is to generate, through codon pair deoptimization, an attenuated virus. This approach carries the advantage that it only requires limited knowledge specific to the virus in question, other than its genome sequence. Therefore, it is well suited for emerging viruses, for which we may not have extensive data. We performed comprehensive in silico analyses of several features of SARS-CoV-2 genomic sequence (e.g., codon usage, codon pair usage, dinucleotide/junction dinucleotide usage, RNA structure around the frameshift region) in comparison with other members of the coronaviridae family of viruses, the overall human genome, and the transcriptome of specific human tissues such as lung, which are primarily targeted by the virus. Our analysis identified the spike (S) and nucleocapsid (N) proteins as promising targets for deoptimization and suggests a roadmap for SARS-CoV-2 vaccine development, which can be generalizable to other viruses.
|
23
|
Computational Resources for Molecular Biology: Special Issue 2020. J Mol Biol 2020; 432:3361-3363. [PMID: 32298696 DOI: 10.1016/j.jmb.2020.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
24
|
Kames J, Holcomb DD, Kimchi O, DiCuccio M, Hamasaki-Katagiri N, Wang T, Komar AA, Alexaki A, Kimchi-Sarfaty C. Sequence analysis of SARS-CoV-2 genome reveals features important for vaccine design. bioRxiv 2020:2020.03.30.016832. [PMID: 32511300 PMCID: PMC7217226 DOI: 10.1101/2020.03.30.016832] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/24/2022]
Abstract
As the SARS-CoV-2 pandemic is rapidly progressing, the need for the development of an effective vaccine is critical. A promising approach for vaccine development is to generate, through codon pair deoptimization, an attenuated virus. This approach carries the advantage that it only requires limited knowledge specific to the virus in question, other than its genome sequence. Therefore, it is well suited for emerging viruses for which we may not have extensive data. We performed comprehensive in silico analyses of several features of SARS-CoV-2 genomic sequence (e.g., codon usage, codon pair usage, dinucleotide/junction dinucleotide usage, RNA structure around the frameshift region) in comparison with other members of the coronaviridae family of viruses, the overall human genome, and the transcriptome of specific human tissues such as lung, which are primarily targeted by the virus. Our analysis identified the spike (S) and nucleocapsid (N) proteins as promising targets for deoptimization and suggests a roadmap for SARS-CoV-2 vaccine development, which can be generalizable to other viruses.
Affiliation(s)
- Jacob Kames
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
- David D. Holcomb
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
- Ofer Kimchi
- Harvard University School of Engineering and Applied Sciences
- Michael DiCuccio
- National Center of Biotechnology Information, National Institutes of Health, Bethesda, MD, USA
- Nobuko Hamasaki-Katagiri
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
- Tony Wang
- Center for Biologics Evaluation and Research, Office of Vaccines Research and Review, Division of Viral Products, Food and Drug Administration, Silver Spring, MD, USA
- Anton A. Komar
- Center for Gene Regulation in Health and Disease, Cleveland State University, Cleveland, OH, USA
- Aikaterini Alexaki
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
- Chava Kimchi-Sarfaty
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
|