1
Wang N, Zheng X, Leptihn S, Li Y, Cai H, Zhang P, Wu W, Yu Y, Hua X. Characteristics and phylogenetic distribution of megaplasmids and prediction of a putative chromid in Pseudomonas aeruginosa. Comput Struct Biotechnol J 2024; 23:1418-1428. [PMID: 38616963 PMCID: PMC11015739 DOI: 10.1016/j.csbj.2024.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 04/01/2024] [Accepted: 04/01/2024] [Indexed: 04/16/2024]
Abstract
Research on megaplasmids that contribute to the spread of antimicrobial resistance (AMR) in Pseudomonas aeruginosa strains has grown in recent years due to the now widely used technologies allowing long-read sequencing. Here, we systematically analyzed distinct and consistent genetic characteristics of megaplasmids found in P. aeruginosa. Our data provide information on their phylogenetic distribution and hypotheses tracing the potential evolutionary paths of megaplasmids. Most of the megaplasmids we found belong to the IncP-2-type, with conserved and syntenic genetic backbones carrying modules of genes associated with chemotaxis apparatus, tellurite resistance and plasmid replication, segregation, and transmission. Extensively variable regions harbor abundant AMR genes, especially those encoding β-lactamases such as VIM-2, IMP-45, and KPC variants, which are high-risk elements in nosocomial infection. IncP-2 megaplasmids act as effective vehicles transmitting AMR genes to diverse regions. One evolutionary model of the origin of megaplasmids claims that chromids can develop from megaplasmids. These chromids have been characterized as an intermediate between a megaplasmid and a chromosome, also containing core genes that can be found on the chromosome but not on the megaplasmid. Using in silico prediction, we identified the "PABCH45 unnamed replicon" as a putative chromid in P. aeruginosa, which shows a much higher similarity and closer phylogenetic relationship to chromosomes than to megaplasmids while also encoding plasmid-like partition genes. We propose that such a chromid could facilitate genome expansion, allowing for more rapid adaptations to novel ecological niches or selective conditions, in comparison to megaplasmids.
Affiliation(s)
- Nanfei Wang
- Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xuan Zheng
- Department of Nephrology, Sir Run Run Shaw Hospital, College of Medicine, Zhejiang University, Hangzhou, China
- Sebastian Leptihn
- HMU Health and Medical University, Am Anger 64/73 – 99084, Erfurt, Germany
- Deutsches Zentrum für Infektionsforschung (DZIF) Translational Phage-Network, Inhoffenstraße 7 – 38124, Braunschweig, Germany
- University of Southern Denmark, Department of Biochemistry and Molecular Biology, Campusvej 55 – 5230, Odense, Denmark
- Yue Li
- Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Heng Cai
- Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Piaopiao Zhang
- Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Wenhao Wu
- Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yunsong Yu
- Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xiaoting Hua
- Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China
- Regional Medical Center for National Institute of Respiratory Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
2
Li T, Ma Z, Ding T, Yang Y, Wang F, Wan X, Liang F, Chen X, Yao H. Codon usage bias and phylogenetic analysis of chloroplast genome in 36 gracilariaceae species. Funct Integr Genomics 2024; 24:45. [PMID: 38429550 DOI: 10.1007/s10142-024-01316-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/11/2024] [Accepted: 02/13/2024] [Indexed: 03/03/2024]
Abstract
Gracilariaceae is a family of large marine red algae and the main source of agar, with important economic and ecological value. The codon usage patterns of the chloroplast genomes of 36 Gracilariaceae species show that GC content ranges from 0.284 to 0.335, average GC3 ranges from 0.135 to 0.243, and ENC ranges from 35.098 to 42.327, which indicates that these genomes are AT-rich and prefer codons ending in A or T. Nc plot, PR2 plot, neutrality plot, and correlation analyses indicate that these biases may be caused by multiple factors, such as natural selection and mutation pressure, but that prolonged natural selection is the main driving force influencing codon usage preference. The cluster analysis and the phylogenetic analysis recover different relationships among the species, which indicates that codons with weak or no usage bias may also play an irreplaceable role in the evolution of these species. In addition, we identified 26 common high-frequency codons and 8-18 optimal codons, all ending in A/U, in these 36 species. Our results will not only support future transgenic work in Gracilariaceae species aimed at maximizing protein yield, but also lay a theoretical foundation for further exploring their systematic classification.
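As a rough illustration of the composition statistics quoted in this abstract, the following Python sketch computes overall GC content and GC3 (GC content at third codon positions) for a single coding sequence. The function names and the toy sequence are assumptions made for illustration, not the authors' pipeline; published analyses often also exclude Met, Trp, and stop codons from GC3, which this sketch does not do.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of G and C bases at third codon positions.

    Assumes the sequence is in frame; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3       # drop any incomplete final codon
    third_positions = cds[2:usable:3]      # bases at codon position 3
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Hypothetical toy coding sequence: ATG GCA TTA TAA
toy_cds = "ATGGCATTATAA"
print(gc_content(toy_cds), gc3_content(toy_cds))
```

A low GC3 relative to overall GC, as reported above, corresponds to a preference for A/T-ending codons.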
Affiliation(s)
- Tingting Li
- College of Life Science, Sichuan Agriculture University, Ya'an, 625014, Sichuan, People's Republic of China
- Zheng Ma
- College of Life Science, Sichuan Agriculture University, Ya'an, 625014, Sichuan, People's Republic of China
- Tiemei Ding
- College of Life Science, Sichuan Agriculture University, Ya'an, 625014, Sichuan, People's Republic of China
- Yanxin Yang
- College of Life Science, Sichuan Agriculture University, Ya'an, 625014, Sichuan, People's Republic of China
- Fei Wang
- College of Life Science, Sichuan Agriculture University, Ya'an, 625014, Sichuan, People's Republic of China
- Xinjing Wan
- College of Life Science, Sichuan Agriculture University, Ya'an, 625014, Sichuan, People's Republic of China
- Fangyun Liang
- College of Life Science, Sichuan Agriculture University, Ya'an, 625014, Sichuan, People's Republic of China
- Xi Chen
- College of Life Science, Sichuan Agriculture University, Ya'an, 625014, Sichuan, People's Republic of China
- Huipeng Yao
- College of Life Science, Sichuan Agriculture University, Ya'an, 625014, Sichuan, People's Republic of China.
3
Analysis of the Compositional Features and Codon Usage Pattern of Genes Involved in Human Autophagy. Cells 2022; 11:3203. [PMID: 36291071 PMCID: PMC9601114 DOI: 10.3390/cells11203203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 09/29/2022] [Accepted: 10/04/2022] [Indexed: 11/16/2022]
Abstract
Autophagy plays an intricate role in paradigmatic human pathologies such as cancer and neurodegenerative, cardiovascular, and autoimmune disorders. Autophagy is regulated by a set of autophagy-related (ATG) genes, first recognized in the yeast genome and subsequently identified in other species, including humans. Several other genes have been found to be involved in autophagy, either directly or indirectly. Studying the codon usage bias (CUB) of genes is crucial for understanding their genome biology and molecular evolution. Here, we examined the usage patterns of nucleotides and synonymous codons and the influence of evolutionary forces in genes involved in human autophagy. The coding sequences (CDS) of the protein-coding human autophagy genes were retrieved from the NCBI nucleotide database and analyzed using various web tools and software to understand their nucleotide composition and codon usage pattern. The effective number of codons (ENC) across all genes involved in human autophagy ranges between 33.26 and 54.6, with a mean value of 45.05, indicating an overall low CUB. The nucleotide composition analysis revealed that the autophagy genes were marginally GC-rich and that GC content significantly influenced the codon usage pattern. The relative synonymous codon usage (RSCU) analysis revealed 3 over-represented and 10 under-represented codons. Both natural selection and mutational pressure were the key forces influencing the codon usage pattern of the genes involved in human autophagy.
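To make the RSCU measure mentioned in this abstract concrete, here is a minimal Python sketch. RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The standard genetic code is assumed, and the function name, the handling of stop and single-codon families, and the toy sequence are illustrative assumptions rather than the tools the authors used.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for an in-frame coding sequence.

    Stop codons, single-codon families (Met, Trp), and amino acids that
    do not occur in the sequence are omitted from the result.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode.
    families: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values: dict[str, float] = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or len(family) == 1 or total == 0:
            continue
        for codon in family:
            # observed count / (family total / number of synonymous codons)
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical toy sequence: four alanine codons, three GCT and one GCC.
print(rscu("GCTGCTGCTGCC"))
```

Values above 1 mark codons used more often than expected under equal synonymous usage, and values below 1 mark codons used less often.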
4
Saha J, Dey S, Pal A. Whole genome sequencing and comparative genomic analyses of Pseudomonas aeruginosa strain isolated from arable soil reveal novel insights into heavy metal resistance and codon biology. Curr Genet 2022; 68:481-503. [PMID: 35763098 DOI: 10.1007/s00294-022-01245-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/14/2022] [Accepted: 06/06/2022] [Indexed: 11/03/2022]
Abstract
Elevated concentrations of non-essential, persistent heavy metals and metalloids in the soil are detrimental to essential soil microbes and plants, resulting in diminished diversity and biomass. Thus, the isolation, screening, and whole-genome analysis of potent bacterial strains from arable lands with inherent capabilities for heavy metal resistance and plant growth promotion hold the key to bioremediation applications. This study is an attempt to do the same. A potent strain of Pseudomonas aeruginosa was isolated from paddy fields, followed by metabolic profiling using FTIR, metal uptake analysis employing ICP-MS, whole-genome sequencing, and comparative codon usage analysis. The ICP-MS study revealed a high degree of Cd uptake during the exponential phase of growth under cumulative metal stress from Cd, Zn, and Co, which was further corroborated by the detection of the cadA gene along with the czcCBA operon in the genome upon whole-genome sequencing. This strain of Pseudomonas aeruginosa also harboured genes such as copA, chrA, znuA, mgtE, corA, and others conferring resistance against different heavy metals, such as Cd, Zn, Co, Cu, and Cr. A comparative codon usage bias analysis at the genomic and genic levels, in which several heavy metal resistance genes were compared against two housekeeping genes across 40 Pseudomonas spp., indicated the presence of a relatively strong codon usage bias in the studied strain. With this work, an effort was made to explore heavy metal-resistant bacteria isolated from arable soil and to use whole-genome sequence analysis to gain insight into metal resistance for future bioremediation applications.
Affiliation(s)
- Jayanti Saha
- Microbiology and Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, West Bengal, 733134, India
- Sourav Dey
- Microbiology and Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, West Bengal, 733134, India
- Ayon Pal
- Microbiology and Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, West Bengal, 733134, India.
5
Carpentier F, Rodríguez de la Vega RC, Jay P, Duhamel M, Shykoff JA, Perlin MH, Wallen RM, Hood ME, Giraud T. Tempo of degeneration across independently evolved non-recombining regions. Mol Biol Evol 2022; 39:6553583. [PMID: 35325190 PMCID: PMC9004411 DOI: 10.1093/molbev/msac060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Recombination is beneficial over the long term, allowing more effective selection. Despite long-term advantages of recombination, local recombination suppression can evolve and lead to genomic degeneration, in particular on sex chromosomes. Here, we investigated the tempo of degeneration in nonrecombining regions, that is, the function curve for the accumulation of deleterious mutations over time, leveraging on 22 independent events of recombination suppression identified on mating-type chromosomes of anther-smut fungi, including newly identified ones. Using previously available and newly generated high-quality genome assemblies of alternative mating types of 13 Microbotryum species, we estimated degeneration levels in terms of accumulation of nonoptimal codons and nonsynonymous substitutions in nonrecombining regions. We found a reduced frequency of optimal codons in the nonrecombining regions compared with autosomes, that was not due to less frequent GC-biased gene conversion or lower ancestral expression levels compared with recombining regions. The frequency of optimal codons rapidly decreased following recombination suppression and reached an asymptote after ca. 3 Ma. The strength of purifying selection remained virtually constant at dN/dS = 0.55, that is, at an intermediate level between purifying selection and neutral evolution. Accordingly, nonsynonymous differences between mating-type chromosomes increased linearly with stratum age, at a rate of 0.015 per My. We thus develop a method for disentangling effects of reduced selection efficacy from GC-biased gene conversion in the evolution of codon usage and we quantify the tempo of degeneration in nonrecombining regions, which is important for our knowledge on genomic evolution and on the maintenance of regions without recombination.
Affiliation(s)
- Fantin Carpentier
- Laboratoire Ecologie Systématique et Evolution, Bâtiment 360, CNRS, AgroParisTech, Université Paris-Saclay, 91400 Orsay, France
- Université de Lille, CNRS, UMR 8198-Evo-Eco-Paleo F-59000, Lille, France
- Ricardo C. Rodríguez de la Vega
- Laboratoire Ecologie Systématique et Evolution, Bâtiment 360, CNRS, AgroParisTech, Université Paris-Saclay, 91400 Orsay, France
- Corresponding author
- Paul Jay
- Laboratoire Ecologie Systématique et Evolution, Bâtiment 360, CNRS, AgroParisTech, Université Paris-Saclay, 91400 Orsay, France
- Marine Duhamel
- Laboratoire Ecologie Systématique et Evolution, Bâtiment 360, CNRS, AgroParisTech, Université Paris-Saclay, 91400 Orsay, France
- Evolution der Pflanzen und Pilze, Ruhr-Universität Bochum, Universitätsstraße 150, 44780, Bochum, Germany
- Jacqui A. Shykoff
- Laboratoire Ecologie Systématique et Evolution, Bâtiment 360, CNRS, AgroParisTech, Université Paris-Saclay, 91400 Orsay, France
- Michael H. Perlin
- Department of Biology, Program on Disease Evolution, University of Louisville, Louisville, KY 40292, USA
- R. Margaret Wallen
- Department of Biology, Program on Disease Evolution, University of Louisville, Louisville, KY 40292, USA
- Tatiana Giraud
- Laboratoire Ecologie Systématique et Evolution, Bâtiment 360, CNRS, AgroParisTech, Université Paris-Saclay, 91400 Orsay, France
- Corresponding author
6
Lamolle G, Iriarte A, Musto H. Codon usage in the flatworm Schistosoma mansoni is shaped by the mutational bias towards A+T and translational selection, which increases GC-ending codons in highly expressed genes. Mol Biochem Parasitol 2021; 247:111445. [PMID: 34942292 DOI: 10.1016/j.molbiopara.2021.111445] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/22/2021] [Revised: 12/14/2021] [Accepted: 12/17/2021] [Indexed: 11/30/2022]
Abstract
Schistosoma mansoni is a trematode flatworm that parasitizes humans and causes a disease called bilharzia. At the genomic level, it is characterized by a low genomic GC content and an "isochore-like" structure, in which the GC-richest regions, located mainly at the ends of the chromosomes, are interspersed with low-GC regions. Furthermore, the GC-richest regions are also the gene-richest and contain the most heavily expressed genes. Taking these features into account, we decided to reanalyze the codon usage of this flatworm. Our results show that a) when all genes are considered together, the strong mutational bias towards A + T leads to a predominance of A/T-ending codons, b) a multivariate analysis discriminates between highly and lowly expressed genes, c) the sequences expressed at the highest levels display a significant increase in G/C-ending codons, and d) when molecular distances are compared with a closely related species, the synonymous distance in highly expressed genes is significantly lower than in lowly expressed sequences. Therefore, we conclude that, in contrast to previous results obtained with a small sample of genes, codon usage in S. mansoni is the result of two forces that operate in opposite directions: while mutational bias leads to a predominance of A/T-ending codons, translational selection, working at the level of speed, increases G/C-ending triplets.
Affiliation(s)
- Guillermo Lamolle
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay
- Andrés Iriarte
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Avenida A. Navarro 3051, 11600 Montevideo, Uruguay.
- Héctor Musto
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay.
7
Zhang Z, Guo F, Roy A, Yang J, Luo W, Shen X, Irwin DM, Chen RA, Shen Y. Evolutionary perspectives and adaptation dynamics of human seasonal influenza viruses from 2009 to 2019: An insight from codon usage. Infect Genet Evol 2021; 96:105067. [PMID: 34487866 DOI: 10.1016/j.meegid.2021.105067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/21/2021] [Revised: 08/28/2021] [Accepted: 09/01/2021] [Indexed: 06/13/2023]
Abstract
The annually recurrent seasonal influenza viruses, namely influenza A viruses (H1N1/pdm2009 and H3N2) and influenza B viruses, contribute substantially to the human disease burden. Elucidating the host adaptation, population dynamics, and evolutionary patterns of these viruses contributes to better control of the current epidemic situation and bolsters efforts towards pandemic preparedness. The present study aimed to unravel the signatures of codon usage and dinucleotide distribution in these seasonal influenza viruses that are associated with their fitness and ongoing adaptive evolution in the human population. Thorough analysis of codon usage adaptation revealed that H3N2 is the best adapted to the human cellular system, which correlates with its having the highest epidemic intensity among the seasonal influenza viruses. The CpG dinucleotide was found to be strongly avoided among the seasonal influenza viruses, with more restraint among influenza B viruses than among influenza A viruses, which might be attributed to a strategy of the viral pathogens for evading human immune signals. The viruses show ongoing evolution in codon usage and ongoing elimination of the CpG motif, which correlate with their distinct host adaptation states and signify the marked impact of the selective forces operating on the viral genomes towards proficient circulation, enhanced fitness, and successful infection in humans.
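CpG avoidance of the kind described in this abstract is commonly quantified as an observed/expected dinucleotide ratio. The Python sketch below shows one standard form of that ratio, the dinucleotide frequency divided by the product of the two mononucleotide frequencies. It is an illustrative assumption about how such a statistic can be computed, not the authors' exact method, and the function name and toy sequences are hypothetical.

```python
def dinucleotide_odds_ratio(seq: str, dinucleotide: str = "CG") -> float:
    """Observed/expected ratio for a dinucleotide in a single sequence.

    observed = frequency of the dinucleotide among all overlapping pairs
    expected = product of the frequencies of its two bases
    """
    seq = seq.upper()
    dinucleotide = dinucleotide.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    # Count overlapping occurrences of the dinucleotide.
    pair_count = sum(seq[i:i + 2] == dinucleotide for i in range(n - 1))
    observed = pair_count / (n - 1)
    expected = (seq.count(dinucleotide[0]) / n) * (seq.count(dinucleotide[1]) / n)
    if expected == 0:
        raise ValueError("sequence lacks one of the bases in the dinucleotide")
    return observed / expected


# Hypothetical toy sequences, not real influenza segments.
print(dinucleotide_odds_ratio("ACGT"))    # one CG among three pairs
print(dinucleotide_odds_ratio("CAGCAG"))  # C and G present but never adjacent as CG
```

A ratio well below 1 indicates that the dinucleotide occurs less often than base composition alone would predict.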
Affiliation(s)
- Zhipeng Zhang
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China; Center for Emerging and Zoonotic Diseases, College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
- Fucheng Guo
- Center for Emerging and Zoonotic Diseases, College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China; Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou 510642, China
- Ayan Roy
- Department of Biotechnology, Lovely Professional University, Punjab, India
- Jinjin Yang
- Center for Emerging and Zoonotic Diseases, College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
- Wen Luo
- Center for Emerging and Zoonotic Diseases, College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
- Xuejuan Shen
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China; Center for Emerging and Zoonotic Diseases, College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China
- David M Irwin
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto M5S 1A8, Canada; Banting and Best Diabetes Centre, University of Toronto, Toronto M5S 1A8, Canada
- Rui-Ai Chen
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China; Center for Emerging and Zoonotic Diseases, College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China; Zhaoqing Institute of Biotechnology, Zhaoqing 526238, China.
- Yongyi Shen
- Zhaoqing Branch Center of Guangdong Laboratory for Lingnan Modern Agricultural Science and Technology, Zhaoqing 526238, China; Center for Emerging and Zoonotic Diseases, College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China; Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou 510642, China; Zhaoqing Institute of Biotechnology, Zhaoqing 526238, China; Key Laboratory of Zoonosis Prevention and Control of Guangdong Province, Guangzhou 510642, China.
8
Kitao K, Nakagawa S, Miyazawa T. An ancient retroviral RNA element hidden in mammalian genomes and its involvement in co-opted retroviral gene regulation. Retrovirology 2021; 18:36. [PMID: 34753509 PMCID: PMC8579622 DOI: 10.1186/s12977-021-00580-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/20/2021] [Accepted: 10/22/2021] [Indexed: 01/19/2023]
Abstract
Background: Retroviruses utilize multiple unique RNA elements to control RNA processing and translation. However, it is unclear what functional RNA elements are present in endogenous retroviruses (ERVs). Gene co-option from ERVs sometimes entails the conservation of viral cis-elements required for gene expression, which might reveal the RNA regulation in ERVs.
Results: Here, we characterized an RNA element found in ERVs consisting of three specific sequence motifs, called SPRE. The SPRE-like elements were found in different ERV families but not in any exogenous viral sequences examined. We observed more than a thousand copies of the SPRE-like elements in several mammalian genomes; in human and marmoset genomes, they overlapped with lineage-specific ERVs. SPRE was originally found in human syncytin-1 and syncytin-2. Indeed, several mammalian syncytin genes (mac-syncytin-3 of macaque, syncytin-Ten1 of tenrec, and syncytin-Car1 of Carnivora) contained the SPRE-like elements. A reporter assay revealed that the enhancement of gene expression by SPRE depended on the reporter genes. Mutation of SPRE impaired the wild-type syncytin-2 expression while the same mutation did not affect codon-optimized syncytin-2, suggesting that SPRE activity depends on the coding sequence.
Conclusions: These results indicate multiple independent invasions of various mammalian genomes by retroviruses harboring SPRE-like elements. Functional SPRE-like elements are found in several syncytin genes derived from these retroviruses. This element may facilitate the expression of viral genes that were suppressed due to inefficient codon frequency or repressive elements within the coding sequences. These findings provide new insights into the long-term evolution of RNA elements and molecular mechanisms of gene expression in retroviruses.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12977-021-00580-2.
Affiliation(s)
- Koichi Kitao
- Laboratory of Virus-Host Coevolution, Institute for Frontier Life and Medical Sciences, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku, Kyoto, 606-8507, Japan
- So Nakagawa
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa, 259-1193, Japan
- Takayuki Miyazawa
- Laboratory of Virus-Host Coevolution, Institute for Frontier Life and Medical Sciences, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku, Kyoto, 606-8507, Japan.
9
Shimada S, Nakai R, Aoki K, Kudoh S, Imura S, Shimoeda N, Ohno G, Watanabe K, Miyazaki Y, Ishii Y, Tateda K. Characterization of the First Cultured Psychrotolerant Representative of Legionella from Antarctica Reveals Its Unique Genome Structure. Microbiol Spectr 2021; 9:e0042421. [PMID: 34668737 PMCID: PMC8528123 DOI: 10.1128/spectrum.00424-21] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/28/2021] [Accepted: 09/14/2021] [Indexed: 11/20/2022]
Abstract
Culture-independent analysis shows that Legionella spp. inhabit a wide range of low-temperature environments, but to date, no psychrotolerant or psychrophilic strains have been reported. Here, we characterized the first cultivated psychrotolerant representative, designated strain TUM19329T, isolated from an Antarctic lake, using a polyphasic approach and comparative genomic analysis. A genome-wide phylogenetic tree indicated that this strain was phylogenetically separate at the species level. Strain TUM19329T shared common physiological traits (e.g., Gram-negative, limited growth on buffered charcoal-yeast extract α-ketoglutarate [BCYEα] agar with l-cysteine requirements) with its relatives, but it also showed psychrotolerant growth properties (e.g., growth at 4°C to 25°C). Moreover, this strain altered its own cellular fatty acid composition to accumulate unsaturated fatty acids at lower temperatures, which may help maintain cell membrane fluidity. Through comparative genomic analysis, we found that this strain possessed a large number of mobile genetic elements compared with other species, amounting to up to 17% of the total genes. The majority of the elements resulted from only a few insertion sequences (ISs), which were spread throughout the genome by a "copy-and-paste" mechanism. Furthermore, we found metabolic genes, such as fatty acid synthesis-related genes, acquired by horizontal gene transfer (HGT). The expansion of ISs and HGT events may play a major role in shaping the phenotype and physiology of this strain. On the basis of the features presented here, we propose a new species, Legionella antarctica sp. nov., represented by strain TUM19329T (= GTC 22699T = NCTC 14581T).
IMPORTANCE: This study characterized a unique cultivated representative of the genus Legionella isolated from an Antarctic lake. This psychrotolerant strain had some common properties of known Legionella species but also displayed other characteristics, such as plasticity in fatty acid composition and an enrichment of mobile genes in the genome. These remarkable properties, as well as other factors, may contribute to cold hardiness, and this first cultivated cold-tolerant strain of the genus Legionella may serve as a model bacterium for further studies. It is worth noting that environmentally derived 16S rRNA gene phylotypes closely related to the strain characterized here have been detected in diverse environments outside Antarctica, suggesting a wide distribution of psychrotolerant Legionella bacteria. Our culture- and genome-based findings may accelerate the ongoing studies of the behavior and pathogenicity of Legionella spp., which have been monitored for many years in the context of public health.
Affiliation(s)
- Sho Shimada
- Department of Microbiology and Infectious Diseases, Toho University School of Medicine, Tokyo, Japan
- Department of Respiratory Medicine, Tokyo Medical and Dental University (TMDU), Tokyo, Japan
- Ryosuke Nakai
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Hokkaido, Japan
- Kotaro Aoki
- Department of Microbiology and Infectious Diseases, Toho University School of Medicine, Tokyo, Japan
- Sakae Kudoh
- National Institute of Polar Research, Research Organization of Information and Systems, Tokyo, Japan
- Department of Polar Science, The Graduate University for Advanced Studies, SOKENDAI, Tokyo, Japan
- Satoshi Imura
- National Institute of Polar Research, Research Organization of Information and Systems, Tokyo, Japan
- Department of Polar Science, The Graduate University for Advanced Studies, SOKENDAI, Tokyo, Japan
- Kentaro Watanabe
- National Institute of Polar Research, Research Organization of Information and Systems, Tokyo, Japan
- Yasunari Miyazaki
- Department of Respiratory Medicine, Tokyo Medical and Dental University (TMDU), Tokyo, Japan
- Yoshikazu Ishii
- Department of Microbiology and Infectious Diseases, Toho University School of Medicine, Tokyo, Japan
| | - Kazuhiro Tateda
- Department of Microbiology and Infectious Diseases, Toho University School of Medicine, Tokyo, Japan
| |
Collapse
|
10
|
Yang J, Ding H, Kan X. Codon usage patterns and evolution of HSP60 in birds. Int J Biol Macromol 2021; 183:1002-1012. [PMID: 33971236 DOI: 10.1016/j.ijbiomac.2021.05.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/18/2021] [Revised: 04/30/2021] [Accepted: 05/03/2021] [Indexed: 11/27/2022]
Abstract
Heat shock protein 60 (HSP60) is highly conserved from prokaryotic to eukaryotic organisms, acting as molecular chaperone and other vital biological functions. In spite of increasing knowledge of HSP60, its evolutionary mechanism on functional adaption is still far from completely understood. Moreover, analysis of codon usage bias (CUB) is a powerful tool to understand evolutionary association studies. However, so far, as we know, no scientific work on CUB of HSP60 in birds has been reported. In this study, we provide a comprehensive analysis on the codon usage and molecular evolution of HSP60 across birds. The results indicated that HSP60 had a weak codon usage bias with high ENC values (range from 52.66 to 61), low RSCU, and A/T-ending codons were mostly preferred. Meanwhile, it was considered that mutation, natural selection, and genetic drift combined to shape codon usage patterns with different strength proportions among various birds for HSP60. Then, the LRT tests suggested that different lineages of birds might be under similar selective pressures. Besides, the two positive selection sites (151 and 131) were detected and might undergo radical changes. These findings would contribute to understand function diversity and molecular evolution of HSP60 in birds.
Affiliation(s)
- Jianke Yang
  - The Institute of Bioinformatics, College of Life Sciences, Anhui Normal University, Wuhu, Anhui, China; School of Preclinical Medicine, Wannan Medical College, Wuhu, Anhui, China
- Hengwu Ding
  - The Institute of Bioinformatics, College of Life Sciences, Anhui Normal University, Wuhu, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, Wuhu, Anhui, China
- Xianzhao Kan
  - The Institute of Bioinformatics, College of Life Sciences, Anhui Normal University, Wuhu, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, Wuhu, Anhui, China
|
11
|
A Crosstalk on Codon Usage in Genes Associated with Leukemia. Biochem Genet 2020; 59:235-255. [PMID: 32989646 DOI: 10.1007/s10528-020-10000-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/27/2020] [Accepted: 09/18/2020] [Indexed: 02/07/2023]
Abstract
Leukemia is the outcome of aggregation of damaged white blood cells. Several genes were reported to be associated with the pathogenesis of leukemia. These genes were computationally analyzed to decipher their codon usage bias (CUB) and to identify the prime factors influencing the codon usage profile as no work was reported yet. The mean values of synonymous codon usage order (SCUO) parameter indicated low CUB of the genes. Significant positive association of SCUO with overall GC and positional GCs might signal the presence of mutational pressure. However, neutrality plot suggested the dominant role of natural selection across the genes. Along with natural selection, the role of mutation pressure was also prominent and that might be responsible for lower CUB (SCUO = 0.19) of genes. Low translational speed might permit accuracy in the process. A strong inverse relationship of translational rate was observed with CUB of genes and folding energy.
|
12
|
Uddin A, Mazumder TH, Barbhuiya PA, Chakraborty S. Similarities and dissimilarities of codon usage in mitochondrial ATP genes among fishes, aves, and mammals. IUBMB Life 2020; 72:899-914. [DOI: 10.1002/iub.2231] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/29/2019] [Accepted: 01/05/2020] [Indexed: 11/06/2022]
Affiliation(s)
- Arif Uddin
  - Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Hailakandi, Assam, India
|
13
|
Sheikh A, Al-Taher A, Al-Nazawi M, Al-Mubarak AI, Kandeel M. Analysis of preferred codon usage in the coronavirus N genes and their implications for genome evolution and vaccine design. J Virol Methods 2020; 277:113806. [PMID: 31911390 PMCID: PMC7119019 DOI: 10.1016/j.jviromet.2019.113806] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 11/27/2018] [Revised: 11/24/2019] [Accepted: 12/20/2019] [Indexed: 02/08/2023]
Abstract
The nucleotide variations among the N genes of 13 different coronaviruses (CoVs) were interpreted. Overall, 18 amino acids observed with varying preferred codons. The effective number of codon values ranged from 40.43 to 53.85, revealing a slight codon bias. A highly significant correlation between GC3s and ENc values was observed in porcine epidemic diarrhea CoV, followed by Middle East respiratory syndrome CoV.
The nucleocapsid (N) protein of a coronavirus plays a crucial role in virus assembly and in its RNA transcription. It is important to characterize a virus at the nucleotide level to discover the virus’s genomic sequence variations and similarities relative to other viruses that could have an impact on the functions of its genes and proteins. This entails a comprehensive and comparative analysis of the viral genomes of interest for preferred nucleotides, codon bias, nucleotide changes at the 3rd position (NT3s), synonymous codon usage and relative synonymous codon usage. In this study, the variations in the N proteins among 13 different coronaviruses (CoVs) were analysed at the nucleotide and amino acid levels in an attempt to reveal how these viruses adapt to their hosts relative to their preferred codon usage in the N genes. The results revealed that, overall, eighteen amino acids had different preferred codons and eight of these were over-biased. The N genes had a higher AT% over GC% and the values of their effective number of codons ranged from 40.43 to 53.85, indicating a slight codon bias. Neutrality plots and correlation analyses showed a very high level of GC3s/GC correlation in porcine epidemic diarrhea CoV (pedCoV), followed by Middle East respiratory syndrome-CoV (MERS CoV), porcine delta CoV (dCoV), bat CoV (bCoV) and feline CoV (fCoV) with r values 0.81, 0.68, -0.47, 0.98 and 0.58, respectively. These data implied a high rate of evolution of the CoV genomes and a strong influence of mutation on evolutionary selection in the CoV N genes. This type of genetic analysis would be useful for evaluating a virus’s host adaptation, evolution and is thus of value to vaccine design strategies.
Affiliation(s)
- Abdullah Sheikh
  - The Camel Research Center, King Faisal University, Alhofuf, Alahsa 31982, Saudi Arabia
- Abdulla Al-Taher
  - Department of Biomedical Sciences, College of Veterinary Medicine, King Faisal University, Alhofuf, Alahsa 31982, Saudi Arabia
- Mohammed Al-Nazawi
  - Department of Biomedical Sciences, College of Veterinary Medicine, King Faisal University, Alhofuf, Alahsa 31982, Saudi Arabia
- Abdullah I Al-Mubarak
  - Department of Microbiology, College of Veterinary Medicine, King Faisal University, Alhofuf, Alahsa 31982, Saudi Arabia
- Mahmoud Kandeel
  - Department of Biomedical Sciences, College of Veterinary Medicine, King Faisal University, Alhofuf, Alahsa 31982, Saudi Arabia; Department of Pharmacology, Faculty of Veterinary Medicine, Kafrelsheikh University, Kafrelsheikh 33516, Egypt
|
14
|
Gu H, Chu DKW, Peiris M, Poon LLM. Multivariate analyses of codon usage of SARS-CoV-2 and other betacoronaviruses. Virus Evol 2020; 6:veaa032. [PMID: 32431949 DOI: 10.1101/2020.02.15.950568] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 05/21/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19) is a global health concern as it continues to spread within China and beyond. The causative agent of this disease, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), belongs to the genus Betacoronavirus, which also includes severe acute respiratory syndrome-related coronavirus (SARSr-CoV) and Middle East respiratory syndrome-related coronavirus (MERSr-CoV). Codon usage of viral genes are believed to be subjected to different selection pressures in different host environments. Previous studies on codon usage of influenza A viruses helped identify viral host origins and evolution trends, however, similar studies on coronaviruses are lacking. In this study, we compared the codon usage bias using global correspondence analysis (CA), within-group CA and between-group CA. We found that the bat RaTG13 virus best matched the overall codon usage pattern of SARS-CoV-2 in orf1ab, spike and nucleocapsid genes, while the pangolin P1E virus had a more similar codon usage in membrane gene. The amino acid usage pattern of SARS-CoV-2 was generally found similar to bat and human SARSr-CoVs. However, we found greater synonymous codon usage differences between SARS-CoV-2 and its phylogenetic relatives on spike and membrane genes, suggesting these two genes of SARS-CoV-2 are subjected to different evolutionary pressures.
Affiliation(s)
- Haogao Gu
  - School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR
- Daniel K W Chu
  - School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR
- Malik Peiris
  - School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR
- Leo L M Poon
  - School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR
|
15
|
Gu H, Chu DKW, Peiris M, Poon LLM. Multivariate analyses of codon usage of SARS-CoV-2 and other betacoronaviruses. Virus Evol 2020; 6:veaa032. [PMID: 32431949 PMCID: PMC7223271 DOI: 10.1093/ve/veaa032] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 01/04/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19) is a global health concern as it continues to spread within China and beyond. The causative agent of this disease, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), belongs to the genus Betacoronavirus, which also includes severe acute respiratory syndrome-related coronavirus (SARSr-CoV) and Middle East respiratory syndrome-related coronavirus (MERSr-CoV). Codon usage of viral genes are believed to be subjected to different selection pressures in different host environments. Previous studies on codon usage of influenza A viruses helped identify viral host origins and evolution trends, however, similar studies on coronaviruses are lacking. In this study, we compared the codon usage bias using global correspondence analysis (CA), within-group CA and between-group CA. We found that the bat RaTG13 virus best matched the overall codon usage pattern of SARS-CoV-2 in orf1ab, spike and nucleocapsid genes, while the pangolin P1E virus had a more similar codon usage in membrane gene. The amino acid usage pattern of SARS-CoV-2 was generally found similar to bat and human SARSr-CoVs. However, we found greater synonymous codon usage differences between SARS-CoV-2 and its phylogenetic relatives on spike and membrane genes, suggesting these two genes of SARS-CoV-2 are subjected to different evolutionary pressures.
Affiliation(s)
- Haogao Gu
  - School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR
- Daniel K W Chu
  - School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR
- Malik Peiris
  - School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR
- Leo L M Poon
  - School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR
|
16
|
Shi SL, Xia RX. Codon Usage in the Iflaviridae Family Is Not Diverse Though the Family Members Are Isolated from Diverse Host Taxa. Viruses 2019; 11:E1087. [PMID: 31766648 PMCID: PMC6950266 DOI: 10.3390/v11121087] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/28/2019] [Revised: 11/17/2019] [Accepted: 11/20/2019] [Indexed: 12/12/2022] Open
Abstract
All iflavirus members belong to the unique genus, Iflavirus, of the family, Iflaviridae. The host taxa and sequence identities of these viruses are diverse. A codon usage bias, maintained by a balance between selection, mutation, and genetic drift, exists in a wide variety of organisms. We characterized the codon usage patterns of 44 iflavirus genomes that were isolated from the classes, Insecta, Arachnida, Mammalia, and Malacostraca. Iflaviruses lack a strong codon usage bias when they are evaluated using an effective number of codons. The odds ratios of the majority of dinucleotides are within the normal range. However, the dinucleotides at the 1st-2nd codon positions are more biased than those at the 2nd-3rd codon positions. Plots of effective numbers of codons, relative neutrality analysis, and PR2 bias analysis all indicate that selection pressure dominates mutations in shaping codon usage patterns in the family, Iflaviridae. When these viruses were grouped into their host taxa, we found that the indices, including the nucleotide composition, effective number of codons, relative synonymous codon usage, and the influencing factors behind the codon usage patterns, all show that there are non-significant differences between the six host-taxa-groups. Our results disagree with our assumption that diverse viruses should possess diverse codon usage patterns, suggesting that the nucleotide composition and codon usage in the family, Iflaviridae, are not host taxa-specific signatures.
Affiliation(s)
- Run-Xi Xia
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
|
17
|
LaBella AL, Opulente DA, Steenwyk JL, Hittinger CT, Rokas A. Variation and selection on codon usage bias across an entire subphylum. PLoS Genet 2019; 15:e1008304. [PMID: 31365533 PMCID: PMC6701816 DOI: 10.1371/journal.pgen.1008304] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 07/05/2019] [Revised: 08/20/2019] [Accepted: 07/11/2019] [Indexed: 01/04/2023] Open
Abstract
Variation in synonymous codon usage is abundant across multiple levels of organization: between codons of an amino acid, between genes in a genome, and between genomes of different species. It is now well understood that variation in synonymous codon usage is influenced by mutational bias coupled with both natural selection for translational efficiency and genetic drift, but how these processes shape patterns of codon usage bias across entire lineages remains unexplored. To address this question, we used a rich genomic data set of 327 species that covers nearly one third of the known biodiversity of the budding yeast subphylum Saccharomycotina. We found that, while genome-wide relative synonymous codon usage (RSCU) for all codons was highly correlated with the GC content of the third codon position (GC3), the usage of codons for the amino acids proline, arginine, and glycine was inconsistent with the neutral expectation where mutational bias coupled with genetic drift drive codon usage. Examination between genes’ effective numbers of codons and their GC3 contents in individual genomes revealed that nearly a quarter of genes (381,174/1,683,203; 23%), as well as most genomes (308/327; 94%), significantly deviate from the neutral expectation. Finally, by evaluating the imprint of translational selection on codon usage, measured as the degree to which genes’ adaptiveness to the tRNA pool were correlated with selective pressure, we show that translational selection is widespread in budding yeast genomes (264/327; 81%). These results suggest that the contribution of translational selection and drift to patterns of synonymous codon usage across budding yeasts varies across codons, genes, and genomes; whereas drift is the primary driver of global codon usage across the subphylum, the codon bias of large numbers of genes in the majority of genomes is influenced by translational selection. 
Synonymous mutations in genes have no effect on the encoded proteins and were once thought to be evolutionarily neutral. By examining codon usage bias across codons, genes, and genomes of 327 species in the budding yeast subphylum, we show that synonymous codon usage is shaped by both neutral processes and selection for translational efficiency. Specifically, whereas codon usage bias for most codons appears to be strongly associated with mutational bias and largely driven by genetic drift across the entire subphylum, patterns of codon usage bias in a few codons, as well as in many genes in nearly all genomes of budding yeasts, deviate from neutral expectations. Rather, the synonymous codons used within genes in most budding yeast genomes are adapted to the tRNAs present within each genome, a result most likely due to translational selection that optimizes codons to match the tRNAs. Our results suggest that patterns of codon usage bias in budding yeasts, and perhaps more broadly in fungi and other microbial eukaryotes, are shaped by both neutral and selective processes.
Affiliation(s)
- Abigail L. LaBella
  - Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Dana A. Opulente
  - Laboratory of Genetics, Genome Center of Wisconsin, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin–Madison, Wisconsin, United States of America
- Jacob L. Steenwyk
  - Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Chris Todd Hittinger
  - Laboratory of Genetics, Genome Center of Wisconsin, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin–Madison, Wisconsin, United States of America
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
|
18
|
Pérez-Cataluña A, Salas-Massó N, Diéguez AL, Balboa S, Lema A, Romalde JL, Figueras MJ. Revisiting the Taxonomy of the Genus Arcobacter: Getting Order From the Chaos. Front Microbiol 2018; 9:2077. [PMID: 30233547 PMCID: PMC6131481 DOI: 10.3389/fmicb.2018.02077] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 02/15/2018] [Accepted: 08/14/2018] [Indexed: 11/16/2022] Open
Abstract
Since the description of the genus Arcobacter in 1991, a total of 27 species have been described, although some species have shown 16S rRNA similarities below 95%, which is the cut-off that usually separates species that belong to different genera. The objective of the present study was to reassess the taxonomy of the genus Arcobacter using information derived from the core genome (286 genes), a Multilocus Sequence Analysis (MLSA) with 13 housekeeping genes, as well as different genomic indexes like Average Nucleotide Identity (ANI), in silico DNA–DNA hybridization (isDDH), Average Amino-acid Identity (AAI), Percentage of Conserved Proteins (POCPs), and Relative Synonymous Codon Usage (RSCU). The study included a total of 39 strains that represent all the 27 species included in the genus Arcobacter together with 13 strains that are potentially new species, and the analysis of 57 genomes. The different phylogenetic analyses showed that the Arcobacter species grouped into four clusters. In addition, A. lekithochrous and the candidatus species ‘A. aquaticus’ appeared, as did A. nitrofigilis, the type species of the genus, in separate branches. Furthermore, the genomic indices ANI and isDDH not only confirmed that all the species were well-defined, but also the coherence of the clusters. The AAI and POCP values showed intra-cluster ranges above the respective cut-off values of 60% and 50% described for species belonging to the same genus. Phenotypic analysis showed that certain test combinations could allow the differentiation of the four clusters and the three orphan species established by the phylogenetic and genomic analyses. The origin of the strains showed that each of the clusters embraced species recovered from a common or related environment. The results obtained enable the division of the current genus Arcobacter in at least seven different genera, for which the names Arcobacter, Aliiarcobacter gen. nov., Pseudoarcobacter gen. nov., Haloarcobacter gen. nov., Malacobacter gen. nov., Poseidonibacter gen. nov., and Candidate ‘Arcomarinus’ gen. nov. are proposed.
Affiliation(s)
- Alba Pérez-Cataluña
  - Departament de Ciències Mèdiques Bàsiques, Facultat de Medicina, Institut d'Investigació Sanitària Pere Virgili, Universitat Rovira i Virgili, Reus, Spain
- Nuria Salas-Massó
  - Departament de Ciències Mèdiques Bàsiques, Facultat de Medicina, Institut d'Investigació Sanitària Pere Virgili, Universitat Rovira i Virgili, Reus, Spain
- Ana L Diéguez
  - Departamento de Microbiología y Parasitología, CIBUS-Facultad de Biología, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Sabela Balboa
  - Departamento de Microbiología y Parasitología, CIBUS-Facultad de Biología, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Alberto Lema
  - Departamento de Microbiología y Parasitología, CIBUS-Facultad de Biología, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Jesús L Romalde
  - Departamento de Microbiología y Parasitología, CIBUS-Facultad de Biología, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Maria J Figueras
  - Departament de Ciències Mèdiques Bàsiques, Facultat de Medicina, Institut d'Investigació Sanitària Pere Virgili, Universitat Rovira i Virgili, Reus, Spain
|
19
|
Vasanthi S, Dass JFP. Comparative genome-wide analysis of codon usage of different bacterial species infecting Oryza sativa. J Cell Biochem 2018; 119:9346-9356. [PMID: 30105828 DOI: 10.1002/jcb.27214] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/17/2018] [Accepted: 06/13/2018] [Indexed: 11/05/2022]
Abstract
Oryza sativa is vastly affected by microbial pathogen, causing blight-related diseases, which in turn deplete the growth and productivity of rice. In this study, we analyzed four bacterial rice pathogen genomes and reported on their codon usage that might have greater implication in mutation-related research. Differential codon usage indices, such as codon adaptation index (CAI), codon bias index (CBI), effective number of codons (ENc), relative synonymous codon usage (RSCU), correspondence analysis (COA), and parity plots, were applied on coding sequences of Pseudomonas fuscovaginae, Pseudomonas syringae, Xanthomonas oryzae, and Pseudomonas avenae species. The RSCU results proposed a high-frequency usage of CUG and CGC that codes for leucine and arginine in all of the species. The CBI and CAI values between the genomes range from 0.17 to 0.3 and from 0.26 to 0.35, respectively, indicating a direct proportionality between these indexes. The mean ENc value of P. avenae coding sequence showed high codon bias compared with other genomes. The axis I variation from COA analysis shows a mean value of 42.28% codon variations in these bacterial species. Correlation studies between axis I and ENc-GC3, along with CAI and CBI, suggested the presence of nucleotide bias and mutational pressure as major forces for codon bias within these species. Hence, certain genes with high CAI-CBI have been correlated for better gene expression. Our study highlights the importance of nucleotide biasness, mutation pressure, and natural selection in shaping protein-coding genes in these four rice-affecting bacteria. This would further help in investigating the evolution of pathogenic gene families, which may direct research toward synthetic genes that could be suppressed or overrepresented based on their codon usage pattern toward pathogenicity.
Affiliation(s)
- S Vasanthi
  - Department of Integrative Biology, School of Biosciences and Technology, VIT, Vellore, Tamil Nadu, India
- J Febin Prabhu Dass
  - Department of Integrative Biology, School of Biosciences and Technology, VIT, Vellore, Tamil Nadu, India
|
20
|
Gene expression, nucleotide composition and codon usage bias of genes associated with human Y chromosome. Genetica 2017; 145:295-305. [PMID: 28421323 DOI: 10.1007/s10709-017-9965-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/16/2016] [Accepted: 04/08/2017] [Indexed: 10/19/2022]
Abstract
Analysis of codon usage pattern is important to understand the genetic and evolutionary characteristics of genomes. We have used bioinformatic approaches to analyze the codon usage bias (CUB) of the genes located in human Y chromosome. Codon bias index (CBI) indicated that the overall extent of codon usage bias was low. The relative synonymous codon usage (RSCU) analysis suggested that approximately half of the codons out of 59 synonymous codons were most frequently used, and possessed a T or G at the third codon position. The codon usage pattern was different in different genes as revealed from correspondence analysis (COA). A significant correlation between effective number of codons (ENC) and various GC contents suggests that both mutation pressure and natural selection affect the codon usage pattern of genes located in human Y chromosome. In addition, Y-linked genes have significant difference in GC contents at the second and third codon positions, expression level, and codon usage pattern of some codons like the SPANX genes in X chromosome.
|
21
|
Shi SL, Jiang YR, Yang RS, Wang Y, Qin L. Codon usage in Alphabaculovirus and Betabaculovirus hosted by the same insect species is weak, selection dominated and exhibits no more similar patterns than expected. Infect Genet Evol 2016; 44:412-417. [PMID: 27484795 PMCID: PMC7106102 DOI: 10.1016/j.meegid.2016.07.042] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/16/2016] [Revised: 07/16/2016] [Accepted: 07/29/2016] [Indexed: 11/26/2022]
Abstract
Mutations shape synonymous codon usage bias in certain organism genomes, while selection shapes it in others. Lepidopteran-specific Alphabaculovirus and Betabaculovirus are two large genera in the family of Baculoviridae. In this study, we analyzed the codon usage patterns in 17 baculoviruses, including 10 alphabaculoviruses and 7 betabaculoviruses, which were isolated from seven insect species, and we characterized the codon usage patterns between Alphabaculovirus and Betabaculovirus. Our results show that all the baculoviruses possessed a general weak trend of codon bias. The differences of ENc (effective number of codons) values, nucleotide contents and the impacts of nucleotide content on ENc value within alpha-/betabaculovirus pairs were independent of whether the host species are the same or different. Furthermore, the majority of amino acid sequences adopted codons unequally in all viruses, but the numbers of common preferred codons between alpha- and betabaculoviruses hosted by the same insect species were not significantly different from the differences observed between alpha- and betabaculoviruses hosted by different insect species. In addition, the amino acids that adopt the same synonymous codon composition between alpha- and betabaculoviruses hosted by the same insect species were statistically as few as those between alpha- and betabaculoviruses hosted by different insect species. Correspondence analysis revealed that no major factors resulted in the codon bias in these baculoviruses, implying multiple minor influential factors exist. Neutrality plot analysis indicated that selection pressure dominated mutations in shaping the codon usage. However, the levels of selection pressure were not significantly different among viruses hosted by the same insect species. We expect that evolution would cause the alpha- and betabaculoviruses hosted by the same insect species to share more patterns, but this effect was not observed.
Affiliation(s)
- Sheng-Lin Shi
  - Insect Resource Engineering Research Center of Liaoning Province, College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Yi-Ren Jiang
  - Insect Resource Engineering Research Center of Liaoning Province, College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Rui-Sheng Yang
  - Insect Resource Engineering Research Center of Liaoning Province, College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Yong Wang
  - Insect Resource Engineering Research Center of Liaoning Province, College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Li Qin
  - Insect Resource Engineering Research Center of Liaoning Province, College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
|
22
|
Genome-Wide Analysis of Codon Usage Bias in Epichloë festucae. Int J Mol Sci 2016; 17:ijms17071138. [PMID: 27428961 PMCID: PMC4964511 DOI: 10.3390/ijms17071138] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 04/04/2016] [Revised: 06/24/2016] [Accepted: 07/08/2016] [Indexed: 11/17/2022] Open
Abstract
Analysis of codon usage data has both practical and theoretical applications in understanding the basics of molecular biology. Differences in codon usage patterns among genes reflect variations in local base compositional biases and the intensity of natural selection. Recently, there have been several reports related to codon usage in fungi, but little is known about codon usage bias in Epichloë endophytes. The present study aimed to assess codon usage patterns and biases in 4870 sequences from Epichloë festucae, which may be helpful in revealing the constraint factors such as mutation or selection pressure and improving the bioreactor on the cloning, expression, and characterization of some special genes. The GC content with 56.41% is higher than the AT content (43.59%) in E. festucae. The results of neutrality and effective number of codons plot analyses showed that both mutational bias and natural selection play roles in shaping codon usage in this species. We found that gene length is strongly correlated with codon usage and may contribute to the codon usage patterns observed in genes. Nucleotide composition and gene expression levels also shape codon usage bias in E. festucae. E. festucae exhibits codon usage bias based on the relative synonymous codon usage (RSCU) values of 61 sense codons, with 25 codons showing an RSCU larger than 1. In addition, we identified 27 optimal codons that end in a G or C.
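As a rough illustration of what the RSCU values reported in the abstract above measure, the sketch below computes relative synonymous codon usage for a single coding sequence: each codon's observed count divided by the mean count of all synonymous codons for the same amino acid, so that a value above 1 marks a codon used more often than expected under equal usage. It assumes the standard genetic code; the function name `rscu` and the toy sequence are illustrative only and are not taken from the cited study.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1) in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for the sense codons of amino acids present in `cds`.

    `cds` is assumed to be an in-frame coding sequence (DNA or RNA letters).
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    # Count only recognised sense codons (stop codons and ambiguous triplets are skipped).
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    # Group the 61 sense codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:  # amino acid absent from the sequence: RSCU is undefined
            continue
        for c in family:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(family) / total
    return values


if __name__ == "__main__":
    example = rscu("CTGCTGCTGCTA")  # toy sequence: three CTG and one CTA, all leucine
    print(example["CTG"], example["CTA"])  # 4.5 1.5
```

In this toy case leucine has six synonymous codons and four occurrences, so equal usage would give each codon 4/6 of an occurrence; CTG, seen three times, therefore scores 4.5. Single-codon amino acids (Met, Trp) always score 1 under this definition, which is why some studies report 59 rather than 61 informative codons.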
|
23
|
Tekaia F. Genome Data Exploration Using Correspondence Analysis. Bioinform Biol Insights 2016; 10:59-72. [PMID: 27279736 PMCID: PMC4898644 DOI: 10.4137/bbi.s39614] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/03/2016] [Revised: 04/12/2016] [Accepted: 04/14/2016] [Indexed: 01/14/2023] Open
Abstract
Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discovering biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. 
Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results, particularly by the ability of relating individual patterns with their corresponding characteristic variables.
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
|
24
|
Abstract
Codon adaptation is codon usage bias that results from selective pressure to increase the translation efficiency of a gene. Codon adaptation has been studied across a wide range of genomes, and some early analyses of plastids have shown evidence for codon adaptation in a limited set of highly expressed plastid genes. Here we study codon usage bias across all fully sequenced plastid genomes, which include representatives of the Rhodophyta, Alveolata, Cryptophyta, Euglenozoa, Glaucocystophyceae, Rhizaria, Stramenopiles and numerous lineages within the Viridiplantae, including Chlorophyta and Embryophyta. We show evidence that codon adaptation occurs in all genomes except for two, Theileria parva and Helicosporidium sp., both of which have highly reduced gene contents and no photosynthesis genes. We also show evidence that selection for codon adaptation increases the representation of the same set of codons, which we refer to as the adaptive codons, across this wide range of taxa, which is probably due to common features descended from the initial endosymbiont. We use various measures to estimate the relative strength of selection in the different lineages and show that it appears to be fairly strong in certain Stramenopiles and Chlorophyta lineages but relatively weak in many members of the Rhodophyta, Euglenozoa and Embryophyta. Given these results, we propose that codon adaptation in plastids is widespread and displays the same general features as adaptation in eubacterial genomes.
Affiliation(s)
- Haruo Suzuki
- Graduate School of Science and Engineering, Yamaguchi University, Yamaguchi, Japan
- Brian R. Morton
- Department of Biology, Barnard College, Columbia University, New York, New York, United States of America
|
25
|
Li N, Li Y, Zheng C, Huang J, Zhang S. Genome-wide comparative analysis of the codon usage patterns in plants. Genes Genomics 2016. [DOI: 10.1007/s13258-016-0417-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
|
26
|
Nucleotide composition bias and codon usage trends of gene populations in Mycoplasma capricolum subsp. capricolum and M. agalactiae. J Genet 2016; 94:251-60. [PMID: 26174672 DOI: 10.1007/s12041-015-0512-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
Abstract
Because of the low GC content of the gene population, amino acids of the two mycoplasmas tend to be encoded by synonymous codons with an A or T end. Compared with the codon usage of ovine, Mycoplasma capricolum and M. agalactiae tend to select optimal codons, which are rare codons in ovine. Due to codon usage pattern caused by genes with key biological functions, the overall codon usage trends represent a certain evolutionary direction in the life cycle of the two mycoplasmas. The overall codon usage trends of a gene population of M. capricolum subsp. capricolum can be obviously separated from other mycoplasmas, and the overall codon usage trends of M. agalactiae are highly similar to those of M. bovis. These results partly indicate the independent evolution of the two mycoplasmas without the limits of the host cell's environment. The GC and AT skews estimate nucleotide composition bias at different positions of nucleotide triplets and the protein consideration caused by the nucleotide composition bias at codon positions 1 and 2 largely take part in synonymous codon usage patterns of the two mycoplasmas. The correlation between the codon adaptation index and codon usage variation indicates that the effect of codon usage on gene expression in M. capricolum subsp. capricolum is opposite to that of M. agalactiae, further suggesting independence of the evolutionary process influencing the overall codon usage trends of gene populations of mycoplasmas.
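The GC and AT skews at individual codon positions referred to above can be sketched as follows. The sketch assumes the common definitions, GC skew = (G - C)/(G + C) and AT skew = (A - T)/(A + T); the function name, the example sequence, and the choice to return 0.0 when a denominator is zero are illustrative assumptions rather than details from the study.

```python
def skews_by_codon_position(cds: str) -> list[dict[str, float]]:
    """Return GC and AT skew for codon positions 1, 2 and 3.

    Hypothetical helper: skew is reported as 0.0 when the relevant
    bases are absent at a position (a convention chosen here).
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    results = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        g, c, a, t = (bases.count(b) for b in "GCAT")
        results.append({
            "gc_skew": (g - c) / (g + c) if g + c else 0.0,
            "at_skew": (a - t) / (a + t) if a + t else 0.0,
        })
    return results


for position, skew in enumerate(skews_by_codon_position("GATGCTGAA"), start=1):
    print(position, skew)
```

Positive GC skew indicates an excess of G over C at that codon position, and positive AT skew an excess of A over T.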
|
27
|
Dohra H, Fujishima M, Suzuki H. Analysis of amino acid and codon usage in Paramecium bursaria. FEBS Lett 2015; 589:3113-8. [PMID: 26341535 DOI: 10.1016/j.febslet.2015.08.033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/29/2015] [Revised: 08/20/2015] [Accepted: 08/21/2015] [Indexed: 01/28/2023]
Abstract
The ciliate Paramecium bursaria harbors the green-alga Chlorella symbionts. We reassembled the P. bursaria transcriptome to minimize falsely fused transcripts, and investigated amino acid and codon usage using the transcriptome data. Surface proteins preferentially use smaller amino acid residues like cysteine. Unusual synonymous codon and amino acid usage in highly expressed genes can reflect a balance between translational selection and other factors. A correlation of gene expression level with synonymous codon or amino acid usage is emphasized in genes down-regulated in symbiont-bearing cells compared to symbiont-free cells. Our results imply that the selection is associated with P. bursaria-Chlorella symbiosis.
Affiliation(s)
- Hideo Dohra
- Instrumental Research Support Office, Research Institute of Green Science and Technology, Shizuoka University, 836 Ohya, Suruga-ku, Shizuoka 422-8529, Japan; Department of Biological Science, Graduate School of Science, Shizuoka University, 836 Ohya, Suruga-ku, Shizuoka 422-8529, Japan
- Masahiro Fujishima
- Department of Environmental Science and Engineering, Graduate School of Science and Engineering, Yamaguchi University, 1677-1 Yoshida, Yamaguchi 753-8512, Japan; National Bio-Resource Project of Japan Agency for Medical Research and Development, Japan
- Haruo Suzuki
- Department of Environmental Science and Engineering, Graduate School of Science and Engineering, Yamaguchi University, 1677-1 Yoshida, Yamaguchi 753-8512, Japan.
|
28
|
Dissecting signal and noise in diatom chloroplast protein encoding genes with phylogenetic information profiling. Mol Phylogenet Evol 2015; 89:28-36. [DOI: 10.1016/j.ympev.2015.03.012] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 09/10/2014] [Revised: 03/10/2015] [Accepted: 03/13/2015] [Indexed: 11/20/2022]
|
29
|
Félez-Sánchez M, Trösemeier JH, Bedhomme S, González-Bravo MI, Kamp C, Bravo IG. Cancer, Warts, or Asymptomatic Infections: Clinical Presentation Matches Codon Usage Preferences in Human Papillomaviruses. Genome Biol Evol 2015; 7:2117-35. [PMID: 26139833 PMCID: PMC4558848 DOI: 10.1093/gbe/evv129] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/04/2023] Open
Abstract
Viruses rely completely on the hosts’ machinery for translation of viral transcripts. However, for most viruses infecting humans, codon usage preferences (CUPrefs) do not match those of the host. Human papillomaviruses (HPVs) are a showcase to tackle this paradox: they present a large genotypic diversity and a broad range of phenotypic presentations, from asymptomatic infections to productive lesions and cancer. By applying phylogenetic inference and dimensionality reduction methods, we demonstrate first that genes in HPVs are poorly adapted to the average human CUPrefs, the only exception being capsid genes in viruses causing productive lesions. Phylogenetic relationships between HPVs explained only a small proportion of CUPrefs variation. Instead, the most important explanatory factor for viral CUPrefs was infection phenotype, as orthologous genes in viruses with similar clinical presentation displayed similar CUPrefs. Moreover, viral genes with similar spatiotemporal expression patterns also showed similar CUPrefs. Our results suggest that CUPrefs in HPVs reflect either variations in the mutation bias or differential selection pressures depending on the clinical presentation and expression timing. We propose that poor viral CUPrefs may be central to a trade-off between strong viral gene expression and the potential for eliciting protective immune response.
Affiliation(s)
- Marta Félez-Sánchez
- Infections and Cancer Laboratory, Catalan Institute of Oncology, L'Hospitalet de Llobregat, Barcelona, Spain; Virus and Cancer Laboratory, Bellvitge Institute of Biomedical Research (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain
- Jan-Hendrik Trösemeier
- Molecular Bioinformatics, Institute of Computer Science, Johann Wolfgang Goethe University, Frankfurt am Main, Germany; Paul-Ehrlich-Institut, Federal Institute for Vaccines and Biomedicines, Langen, Germany
- Stéphanie Bedhomme
- Infections and Cancer Laboratory, Catalan Institute of Oncology, L'Hospitalet de Llobregat, Barcelona, Spain; Virus and Cancer Laboratory, Bellvitge Institute of Biomedical Research (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain; Département d'Ecologie Evolutive, Centre d'Ecologie Fonctionnelle et Evolutive, CNRS - UMR 5175, Montpellier, France
- Christel Kamp
- Paul-Ehrlich-Institut, Federal Institute for Vaccines and Biomedicines, Langen, Germany
- Ignacio G Bravo
- Infections and Cancer Laboratory, Catalan Institute of Oncology, L'Hospitalet de Llobregat, Barcelona, Spain; Virus and Cancer Laboratory, Bellvitge Institute of Biomedical Research (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain
|
30
|
Foroughmand-Araabi MH, Goliaei B, Alishahi K, Sadeghi M, Goliaei S. Codon usage and protein sequence pattern dependency in different organisms: A Bioinformatics approach. J Bioinform Comput Biol 2014; 13:1550002. [PMID: 25409941 DOI: 10.1142/s021972001550002x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Although it is known that synonymous codons are not chosen randomly, the role of codon usage in gene regulation is not yet clearly understood. Researchers have investigated the relation between codon usage and various properties, such as gene regulation, translation rate, translation efficiency, mRNA stability, splicing, and protein domains. Recently, a universal codon-usage-based mechanism for gene regulation was proposed. We studied the effect of protein sequence patterns on the codon usage of related genes. Considering a subsequence of a protein that matches a pattern or motif, we showed that the parts of the genes that are translated into this subsequence use specific ratios of synonymous codons. We also built a multinomial logistic regression model for codon usage that considers the effect of patterns on codon usage. This model explains the observed codon usage preference better than the classic organism-dependent codon usage. Our results showed that codon usage plays a role in controlling protein levels for genes that participate in a specific biological function. This is the first time that this phenomenon has been reported.
|
31
|
Large-scale genomic analysis of codon usage in dengue virus and evaluation of its phylogenetic dependence. BioMed Research International 2014; 2014:851425. [PMID: 25136631 PMCID: PMC4124757 DOI: 10.1155/2014/851425] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 02/25/2014] [Revised: 06/05/2014] [Accepted: 06/11/2014] [Indexed: 12/04/2022]
Abstract
The increasing number of dengue virus (DENV) genome sequences available allows the factors contributing to DENV evolution to be identified. In the present study, codon usage in serotypes 1–4 (DENV1–4) was explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3), as well as the effective number of codons (ENC, ENCp) versus GC3 plots, revealed mutational bias and purifying selection pressures as the major forces influencing codon usage, but with distinct pressure on specific nucleotide positions in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed clustering patterns similar to those of the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the geographic origin of the virus. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution.
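The position-specific GC contents (GC1, GC2, GC3) used in this kind of correlation analysis can be computed with a few lines. This is a minimal sketch assuming a single in-frame coding sequence with unambiguous bases; the function name and example sequence are illustrative and do not come from the study.

```python
def gc_by_codon_position(cds: str) -> tuple[float, float, float]:
    """Return (GC1, GC2, GC3): the G+C fraction at each codon position.

    Hypothetical helper for illustration only.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # bases found at this codon position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return fractions[0], fractions[1], fractions[2]


gc1, gc2, gc3 = gc_by_codon_position("GATGCTGAA")
print(gc1, gc2, gc3)
```

Plotting these per-gene or per-genome values against total GC content, or GC3 against the effective number of codons, gives the kind of plots the abstract describes.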
|
32
|
Dalmasso MC, Carmona SJ, Angel SO, Agüero F. Characterization of Toxoplasma gondii subtelomeric-like regions: identification of a long-range compositional bias that is also associated with gene-poor regions. BMC Genomics 2014; 15:21. [PMID: 24417889 PMCID: PMC4008256 DOI: 10.1186/1471-2164-15-21] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/14/2013] [Accepted: 01/02/2014] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Chromosome ends are composed of telomeric repeats and subtelomeric regions, which are patchworks of genes interspersed with repeated elements. Although chromosome ends display similar arrangements in different species, their sequences are highly divergent. In addition, these regions display a particular nucleosomal composition and bind specific factors, therefore producing a special kind of heterochromatin. Using data from currently available draft genomes we have characterized these putative Telomeric Associated Sequences in Toxoplasma gondii. RESULTS An all-vs-all pairwise comparison of T. gondii assembled chromosomes revealed the presence of conserved regions of ∼ 30 Kb located near the ends of 9 of the 14 chromosomes of the genome of the ME49 strain. Sequence similarity among these regions is ∼ 70%, and they are also highly conserved in the GT1 and VEG strains. However, they are unique to Toxoplasma with no detectable similarity in other Apicomplexan parasites. The internal structure of these sequences consists of 3 repetitive regions separated by high-complexity sequences without annotated genes, except for a gene from the Toxoplasma Specific Family. ChIP-qPCR experiments showed that nucleosomes associated to these sequences are enriched in histone H4 monomethylated at K20 (H4K20me1), and the histone variant H2A.X, suggesting that they are silenced sequences (heterochromatin). A detailed characterization of the base composition of these sequences, led us to identify a strong long-range compositional bias, which was similar to that observed in other genomic silenced fragments such as those containing centromeric sequences, and was negatively correlated to gene density. CONCLUSIONS We identified and characterized a region present in most Toxoplasma assembled chromosomes. Based on their location, sequence features, and nucleosomal markers we propose that these might be part of subtelomeric regions of T. gondii. 
The identified regions display a unique trinucleotide compositional bias, which is shared (despite the lack of any detectable sequence similarity) with other silenced sequences, such as those making up the chromosome centromeres. We also identified other genomic regions with this compositional bias (but no detectable sequence similarity) that might be functionally similar.
Affiliation(s)
- Sergio O Angel
- Instituto de Investigaciones Biotecnológicas - Instituto Tecnológico de Chascomús, UNSAM - CONICET, Sede Chascomús, Av, Intendente Marino Km 8, 2 CC 164, B 7130 IWA, Chascomús, Argentina.
|
33
|
Foroughmand-Araabi MH, Goliaei B, Alishahi K, Sadeghi M. Dependency of codon usage on protein sequence patterns: a statistical study. Theor Biol Med Model 2014; 11:2. [PMID: 24410898 PMCID: PMC3896713 DOI: 10.1186/1742-4682-11-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/24/2013] [Accepted: 01/03/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Codon degeneracy and codon usage by organisms is an interesting and challenging problem. Researchers demonstrated the relation between codon usage and various functions or properties of genes and proteins, such as gene regulation, translation rate, translation efficiency, mRNA stability, splicing, and protein domains. Researchers usually represent segments of proteins responsible for specific functions or structures in a family of proteins as sequence patterns or motifs. We asked the question if organisms use the same codons in pattern segments as compared to the rest of the sequence. METHODS We used the likelihood ratio test, Pearson's chi-squared test, and mutual information to compare these two codon usages. RESULTS We showed that codon usage, in segments of genes that code for a given pattern or motif in a group of proteins, varied from the rest of the gene. The codon usage in these segments was not random. Amino acids with larger number of codons used more specific codon ratios in these segments. We studied the number of amino acids in the pattern (pattern length). As patterns got longer, there was a slight decrease in the fraction of patterns with significant different codon usage in the pattern region as compared to codon usage in the gene region. We defined a measure of specificity of protein patterns, and studied its relation to the codon usage. The difference in the codon usage between pattern region and gene region, was less for the patterns with higher specificity. CONCLUSIONS We provided a hypothesis that there are segments on genes that affect the codon usage and thus influence protein translation speed, and these regions are the regions that code protein pattern regions.
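One of the tests named in the methods, Pearson's chi-squared test, can be illustrated on a 2 x k table of synonymous codon counts (pattern region versus the rest of the gene). The sketch below computes only the test statistic and degrees of freedom; the function name and the example counts are hypothetical, and it is not claimed to reproduce the authors' procedure.

```python
def chi_squared_statistic(pattern_counts: dict[str, int],
                          rest_counts: dict[str, int]) -> tuple[float, int]:
    """Pearson chi-squared statistic for a 2 x k table of codon counts.

    Rows are the two regions (pattern, rest); columns are synonymous codons.
    Returns (statistic, degrees of freedom). Illustrative helper only.
    """
    codons = sorted(set(pattern_counts) | set(rest_counts))
    rows = [
        [pattern_counts.get(c, 0) for c in codons],
        [rest_counts.get(c, 0) for c in codons],
    ]
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(row_totals)
    if grand == 0:
        raise ValueError("no codon counts supplied")

    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count if codon choice were independent of region.
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(codons) - 1)
    return stat, dof


# Made-up glycine codon counts inside and outside a pattern region.
stat, dof = chi_squared_statistic({"GGT": 30, "GGC": 10},
                                  {"GGT": 20, "GGC": 40})
print(round(stat, 3), dof)
```

A large statistic relative to the chi-squared distribution with `dof` degrees of freedom suggests that codon usage differs between the two regions; converting it to a p-value requires a chi-squared distribution function, which is left out here.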
Affiliation(s)
- Bahram Goliaei
- Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
|
34
|
Selection on GGU and CGU codons in the high expression genes in bacteria. J Mol Evol 2013; 78:13-23. [PMID: 24271854 DOI: 10.1007/s00239-013-9596-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/07/2013] [Accepted: 11/11/2013] [Indexed: 12/22/2022]
Abstract
The fourfold degenerate site (FDS) in coding sequences is important for studying the effect of any selection pressure on codon usage bias (CUB) because nucleotide substitution per se is not under any such pressure at the site, since the amino acid sequence of the protein is unaltered. We estimated the frequency variation of nucleotides at the FDS across the eight family boxes (FBs), defined as Um(g), the unevenness measure of a gene g. The study covered 545 species of bacteria. In many bacteria, Um(g) correlated strongly with Nc', a measure of the CUB. Analysis of the strongly correlated bacteria revealed that the U-ending codons (GGU, CGU) were preferred to the G-ending codons (GGG, CGG) in the Gly and Arg FBs, even in genomes with G+C % higher than 65.0. Further evidence suggested that these codons can be used as a good indicator of selection pressure on CUB in genomes with higher G+C %.
|
35
|
Iriarte A, Baraibar JD, Romero H, Castro-Sowinski S, Musto H. Evolution of optimal codon choices in the family Enterobacteriaceae. Microbiology-SGM 2013; 159:555-564. [PMID: 23288542 DOI: 10.1099/mic.0.061952-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
The Enterobacteriaceae are a large family of Proteobacteria that include many well-known prokaryotic genera, such as Escherichia, Yersinia and Salmonella. The main ideas of synonymous codon usage (CU) evolution and translational selection have been deeply influenced by studies with these bacterial groups. In this work we report the analysis of the CU pattern of completely sequenced bacterial genomes that belong to the Enterobacteriaceae. The effect of selection in translation acting at the levels of speed and accuracy, and phylogenetic trends within this group are described. Preferred (optimal) codons were identified. The evolutionary dynamics of these codons were studied and following a Bayesian approach these preferences were traced back to the common ancestor of the family. We found that there is some level of variation in selection among the analysed micro-organisms that is probably associated with lineage-specific trends. The codon bias was largely conserved across the evolutionary time of the family in highly expressed genes and protein conserved regions, suggesting a major role of negative selection. In this sense, the results support the idea that the extant CU bias is finely tuned over the ancestral well-conserved pool of tRNAs.
Affiliation(s)
- Andrés Iriarte
- Área Genética, Depto. de Genética y Mejora Animal, Facultad de Veterinaria (UDELAR), Av. A. Lasplaces 1550, CP 11600, Montevideo, Uruguay; Laboratorio de Evolución, Facultad de Ciencias (UDELAR), Iguá 4225, 11400 Montevideo, Uruguay; Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias (UDELAR), Iguá 4225, 11400 Montevideo, Uruguay
- Juan Diego Baraibar
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias (UDELAR), Iguá 4225, 11400 Montevideo, Uruguay
- Héctor Romero
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias (UDELAR), Iguá 4225, 11400 Montevideo, Uruguay
- Susana Castro-Sowinski
- Sección Bioquímica y Biología Molecular, Facultad de Ciencias (UDELAR), Iguá 4225, 11400 Montevideo, Uruguay
- Héctor Musto
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias (UDELAR), Iguá 4225, 11400 Montevideo, Uruguay
|
36
|
Liu X. A more accurate relationship between 'effective number of codons' and GC3s under assumptions of no selection. Comput Biol Chem 2012; 42:35-9. [PMID: 23257412 DOI: 10.1016/j.compbiolchem.2012.11.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/21/2012] [Revised: 11/07/2012] [Accepted: 11/14/2012] [Indexed: 10/27/2022]
Abstract
The 'effective number of codons' (Nc), introduced by Frank Wright in 1990, is one of the best measures of codon usage bias in genes and genomes. Although methods for estimating Nc have since been improved by several investigators, no one noticed that the relationship between Nc and GC3s under the assumption of no selection, as given by Wright, has a small but significant deviation. Because the curve showing this relationship in the Nc-plot is a useful reference line for displaying the main features of the codon usage pattern of a set of genes, its accuracy is important. Under the ideal and ultimate conditions listed in this text, a computational sample of Nc versus GC3s was derived and calculated. By nonlinear regression analysis, the relationship between Nc and GC3s without synonymous codon selection can be approximated by $N_c = 2.5 - s + \frac{29.5}{s^2 + (1-s)^2}$, instead of Wright's $N_c = 2 + s + \frac{29}{s^2 + (1-s)^2}$, where $s$ denotes GC3s. Goodness-of-fit analysis of both confirmed that the new formula presented in this text is more accurate than the original one. In addition, when the same method of estimating Nc is used, overestimation is reduced to a certain extent by using the new reference line in place of Wright's.
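The two reference curves quoted in this abstract are easy to compare numerically. The sketch below simply evaluates both formulas as stated, with s standing for GC3s; the function names are illustrative assumptions.

```python
def nc_expected_wright(s: float) -> float:
    """Wright's expected Nc under no selection, with s = GC3s in [0, 1]."""
    return 2 + s + 29 / (s**2 + (1 - s) ** 2)


def nc_expected_liu(s: float) -> float:
    """The revised reference curve quoted in the abstract, s = GC3s in [0, 1]."""
    return 2.5 - s + 29.5 / (s**2 + (1 - s) ** 2)


# Tabulate both curves across the GC3s range.
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"GC3s={s:.2f}  Wright={nc_expected_wright(s):.2f}  "
          f"revised={nc_expected_liu(s):.2f}")
```

Unlike Wright's curve, the revised curve peaks at exactly 61 when GC3s is 0.5, which matches the number of sense codons in the standard code. Genes that fall well below either curve in an Nc-plot are usually read as candidates for selection on codon usage.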
Affiliation(s)
- Xiong'en Liu
- School of Computer and Information, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
|
37
|
Sanjukta R, Farooqi MS, Sharma N, Rai A, Mishra DC, Singh DP. Trends in the codon usage patterns of Chromohalobacter salexigens genes. Bioinformation 2012; 8:1087-95. [PMID: 23251043 PMCID: PMC3523223 DOI: 10.6026/97320630081087] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/22/2012] [Accepted: 10/06/2012] [Indexed: 11/23/2022] Open
Abstract
Chromohalobacter salexigens, a Gammaproteobacterium belonging to the family Halomonadaceae, shows a broad salinity range for growth. To reveal the factors influencing the architecture of protein-coding genes in C. salexigens, the pattern of synonymous codon usage bias was investigated. Overall codon usage analysis of the microorganism revealed that C- and G-ending codons are predominantly used in all the genes, which is indicative of mutational bias. Multivariate statistical analysis showed that the genes are separated along the first major explanatory axis according to their expression levels and their GC content at the synonymous third positions of the codons. Both the Nc plot and correspondence analysis on Relative Synonymous Codon Usage (RSCU) indicate that the variation in codon usage among the genes may be due to mutational bias at the DNA level and natural selection acting at the level of mRNA translation. Gene length and the hydrophobicity of the encoded protein also influence the codon usage variation of genes to some extent. A comparison of the relative synonymous codon usage between the 10% most highly and 10% most lowly expressed genes identifies 23 optimal codons, which are statistically overrepresented in the former group of genes and may provide useful information for salt-stressed gene prediction and gene transformation. Furthermore, genes for regulatory functions, mobile and extrachromosomal element functions, and the cell envelope are observed to be highly expressed. The study could provide insight into the gene expression response of halophilic bacteria and facilitate the establishment of effective strategies to develop salt-tolerant crops of agronomic value.
Affiliation(s)
- Rajkumari Sanjukta
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Mohammad Samir Farooqi
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Naveen Sharma
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Anil Rai
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Dwijesh Chandra Mishra
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, Pusa, New Delhi – 110 012
- Dhananjaya P Singh
- National Bureau of Agriculturally Important Microorganisms, Mau Nath Bhanjan, UP – 275 101
|
38
|
Shi SL, Jiang YR, Liu YQ, Xia RX, Qin L. Selective pressure dominates the synonymous codon usage in parvoviridae. Virus Genes 2012; 46:10-9. [PMID: 22996735 DOI: 10.1007/s11262-012-0818-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 05/12/2012] [Accepted: 09/05/2012] [Indexed: 12/16/2022]
Abstract
Parvoviridae is a family of small non-enveloped viruses that is divided into two subfamilies. The family members infect a wide range of organisms from insects to humans, and some of the members (e.g., nonpathogenic adeno-associated viruses) are effective gene therapy delivery vectors. We detailed the synonymous codon usage pattern of the Parvoviridae family from the 58 available sequenced genomes through multivariate statistical methods. Our results revealed that nine viruses showed some degree of strong codon bias, whereas the others showed a generally weak codon bias. ENc-plot and neutrality plot results showed that selective pressure dominated over mutation in shaping coding sequence composition. The overall GC content and the GC content at the third synonymous codon position were the principal determinants of the variation in codon usage patterns, as both correlated significantly with the first axis of the correspondence analysis. In addition, gene length had no direct influence on the codon usage pattern. The Densovirinae and Parvovirinae subfamilies shared nine identical preferred codons, although most codon usage frequencies of the two subfamilies were significantly different. The result of cluster analysis based on synonymous codon usage was discordant with that of taxonomic classification. Adeno-associated viruses formed a separate clade far from other Parvoviridae members in the dendrogram. Thus, we concluded that natural selection rather than mutation pressure is the main factor affecting codon bias in the Parvoviridae family.
Affiliation(s)
- Sheng-Lin Shi
- Postdoctoral Station of Plant Protection, Shenyang Agricultural University, No.120 Dongling Road, Shenyang, P.R.China.
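The two composition measures named in the abstract above (overall GC content and GC content at the third synonymous codon position) and the reference curve used in an ENc-plot can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the standard genetic code, an in-frame coding sequence, and the usual convention of excluding Met, Trp, and stop codons from the third-position count. The function names are hypothetical, and the expected-ENc curve is Wright's formula for codon usage shaped by composition alone.

```python
# Codons with no synonymous alternative (Met, Trp) plus stop codons,
# assuming the standard genetic code; excluded from the GC3s count.
NON_SYNONYMOUS_OR_STOP = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_and_gc3s(cds: str) -> tuple[float, float]:
    """Return (overall GC fraction, GC fraction at third synonymous positions)."""
    seq = cds.upper().replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty, in-frame coding sequence")
    gc = sum(base in "GC" for base in seq) / len(seq)
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    third = [c[2] for c in codons if c not in NON_SYNONYMOUS_OR_STOP]
    if not third:
        raise ValueError("no synonymous codons found")
    gc3s = sum(base in "GC" for base in third) / len(third)
    return gc, gc3s


def expected_enc(gc3s: float) -> float:
    """Expected ENc when codon usage is determined by GC3s alone (Wright's curve)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


if __name__ == "__main__":
    gc, gc3s = gc_and_gc3s("ATGGCCGCGTTATAA")
    print(f"GC={gc:.3f} GC3s={gc3s:.3f} expected ENc={expected_enc(gc3s):.2f}")
```

In an ENc-plot, genes whose observed ENc falls well below this curve are usually read as being shaped by something other than base composition, which is the kind of argument the abstract summarizes.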
|
39
|
Iriarte A, Sanguinetti M, Fernández-Calero T, Naya H, Ramón A, Musto H. Translational selection on codon usage in the genus Aspergillus. Gene 2012; 506:98-105. [DOI: 10.1016/j.gene.2012.06.027] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 12/27/2011] [Revised: 05/09/2012] [Accepted: 06/15/2012] [Indexed: 10/28/2022]
|
40
|
Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J. Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 2012; 13:43. [PMID: 22435713 PMCID: PMC3368730 DOI: 10.1186/1471-2105-13-43] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 11/29/2011] [Accepted: 03/22/2012] [Indexed: 02/07/2023] Open
Abstract
Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis. Results: Here we propose a novel measure, Codon Deviation Coefficient (CDC), that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.
Affiliation(s)
- Zhang Zhang
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
|
41
|
Spencer PS, Barral JM. Genetic code redundancy and its influence on the encoded polypeptides. Comput Struct Biotechnol J 2012; 1:e201204006. [PMID: 24688635 PMCID: PMC3962081 DOI: 10.5936/csbj.201204006] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 12/21/2011] [Revised: 02/29/2012] [Accepted: 03/10/2012] [Indexed: 11/22/2022] Open
Abstract
The genetic code is said to be redundant in that the same amino acid residue can be encoded by multiple, so-called synonymous, codons. If all properties of synonymous codons were entirely equivalent, one would expect that they would be equally distributed along protein coding sequences. However, many studies over the last three decades have demonstrated that their distribution is not entirely random. It has been postulated that certain codons may be translated by the ribosome faster than others and thus their non-random distribution dictates how fast the ribosome moves along particular segments of the mRNA. The reasons behind such segmental variability in the rates of protein synthesis, and thus polypeptide emergence from the ribosome, have been explored by theoretical and experimental approaches. Predictions of the relative rates at which particular codons are translated and their impact on the nascent chain have not arrived at unequivocal conclusions. This is probably due, at least in part, to variation in the basis for classification of codons as “fast” or “slow”, as well as variability in the number and types of genes and proteins analyzed. Recent methodological advances have allowed nucleotide-resolution studies of ribosome residency times in entire transcriptomes, which confirm the non-uniform movement of ribosomes along mRNAs and shed light on the actual determinants of rate control. Moreover, experiments have begun to emerge that systematically examine the influence of variations in ribosomal movement and the fate of the emerging polypeptide chain.
Affiliation(s)
- Paige S Spencer
- Department of Biochemistry & Molecular Biology, The University of Texas Medical Branch, 301 University Blvd., Galveston, TX 77555-0620
- José M Barral
- Department of Biochemistry & Molecular Biology, The University of Texas Medical Branch, 301 University Blvd., Galveston, TX 77555-0620 ; Department of Neuroscience & Cell Biology, The University of Texas Medical Branch, 301 University Blvd., Galveston, TX 77555-0620 ; Sealy Center for Structural Biology and Molecular Biophysics, The University of Texas Medical Branch, 301 University Blvd., Galveston, TX 77555-0620
|
42
|
Mehmood T, Martens H, Sæbø S, Warringer J, Snipen L. A Partial Least Squares based algorithm for parsimonious variable selection. Algorithms Mol Biol 2011; 6:27. [PMID: 22142365 PMCID: PMC3287970 DOI: 10.1186/1748-7188-6-27] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Received: 09/28/2011] [Accepted: 12/05/2011] [Indexed: 11/15/2022] Open
Abstract
Background: In genomics, a commonly encountered problem is to extract a subset of variables out of a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is to identify associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Maximum understandability with the smallest number of selected variables, consistency of the selected variables, and variation of model performance on test data are issues to be addressed for such problems. Results: We present an algorithm balancing the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables. This significantly improves the understandability of the model. Within the approach we have tested and compared three different criteria commonly used in the Partial Least Squares modeling paradigm for variable selection: loading weights, regression coefficients, and variable importance on projections. The algorithm is applied to a problem of identifying codon variations discriminating different bacterial taxa, which is of particular interest in classifying metagenomics samples. The results are compared with a classical forward selection algorithm, the much used Lasso algorithm, and Soft-threshold Partial Least Squares variable selection. Conclusions: A regularized elimination algorithm based on Partial Least Squares produces results that increase understandability and consistency and reduces the classification error on test data compared to standard approaches.
|
43
|
Qiu H, Hildebrand F, Kuraku S, Meyer A. Unresolved orthology and peculiar coding sequence properties of lamprey genes: the KCNA gene family as test case. BMC Genomics 2011; 12:325. [PMID: 21699680 PMCID: PMC3141671 DOI: 10.1186/1471-2164-12-325] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 10/28/2010] [Accepted: 06/23/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In understanding the evolutionary process of vertebrates, cyclostomes (hagfishes and lampreys) occupy crucial positions. Resolving molecular phylogenetic relationships of cyclostome genes with gnathostome (jawed vertebrate) genes is indispensable in deciphering both the species tree and gene trees. However, molecular phylogenetic analyses, especially those including lamprey genes, have produced highly discordant results between gene families. To efficiently scrutinize this problem using partial genome assemblies of early vertebrates, we focused on the potassium voltage-gated channel, shaker-related (KCNA) family, whose members are mostly single-exon. RESULTS: Seven sea lamprey KCNA genes as well as six elephant shark genes were identified, and their orthologies to bony vertebrate subgroups were assessed. In contrast to the robustly supported orthology of the elephant shark genes to gnathostome subgroups, clear orthology of any sea lamprey gene could not be established. Notably, sea lamprey KCNA sequences displayed a unique codon usage pattern and amino acid composition, probably associated with exceptionally high GC-content in their coding regions. This lamprey-specific property of coding sequences was also observed generally for genes outside this gene family. CONCLUSIONS: Our results suggest that secondary modifications of sequence properties unique to the lamprey lineage may be one of the factors preventing robust orthology assessments of lamprey genes, which deserves further genome-wide validation. The lamprey lineage-specific alteration of protein-coding sequence properties needs to be taken into consideration in tackling the key questions about early vertebrate evolution.
Affiliation(s)
- Huan Qiu
- Department of Biology, University of Konstanz, Konstanz, Germany
|
44
|
Suzuki H, Lefébure T, Hubisz MJ, Pavinski Bitar P, Lang P, Siepel A, Stanhope MJ. Comparative genomic analysis of the Streptococcus dysgalactiae species group: gene content, molecular adaptation, and promoter evolution. Genome Biol Evol 2011; 3:168-85. [PMID: 21282711 PMCID: PMC3056289 DOI: 10.1093/gbe/evr006] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022] Open
Abstract
Comparative genomics of closely related bacterial species with different pathogenesis and host preference can provide a means of identifying the specifics of adaptive differences. Streptococcus dysgalactiae (SD) is comprised of two subspecies: S. dysgalactiae subsp. equisimilis is both a human commensal organism and a human pathogen, and S. dysgalactiae subsp. dysgalactiae is strictly an animal pathogen. Here, we present complete genome sequences for both taxa, with analyses involving other species of Streptococcus but focusing on adaptation in the SD species group. We found little evidence for enrichment in biochemical categories of genes carried by each SD strain; however, differences in the virulence gene repertoire were apparent. Some of the differences could be ascribed to prophage and integrative conjugative elements. We identified approximately 9% of the nonrecombinant core genome to be under positive selection, some of which involved known virulence factors in other bacteria. Analyses of proteomes by pooling data across genes, by biochemical category, clade, or branch, provided evidence for increased rates of evolution in several gene categories, as well as external branches of the tree. Promoters were primarily evolving under purifying selection but with certain categories of genes evolving faster. Many of these fast-evolving categories were the same as those associated with rapid evolution in proteins. Overall, these results suggest that adaptation to changing environments and new hosts in the SD species group has involved the acquisition of key virulence genes along with selection of orthologous protein-coding loci and operon promoters.
Affiliation(s)
- Haruo Suzuki
- Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University, Ithaca, New York
|
45
|
|
46
|
Welch M, Villalobos A, Gustafsson C, Minshull J. Designing genes for successful protein expression. Methods Enzymol 2011; 498:43-66. [PMID: 21601673 DOI: 10.1016/b978-0-12-385120-8.00003-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 12/24/2022]
Abstract
DNA sequences are now far more readily available in silico than as physical DNA. De novo gene synthesis is an increasingly cost-effective method for building genetic constructs, and effectively removes the constraint of basing constructs on extant sequences. This allows scientists and engineers to experimentally test their hypotheses relating sequence to function. Molecular biologists, and now synthetic biologists, are characterizing and cataloging genetic elements with specific functions, aiming to combine them to perform complex functions. However, the most common purpose of synthetic genes is for the expression of an encoded protein. The huge number of different proteins makes it impossible to characterize and catalog each functional gene. Instead, it is necessary to abstract design principles from experimental data: data that can be generated by making predictions followed by synthesizing sequences to test those predictions. Because of the degeneracy of the genetic code, design of gene sequences to encode proteins is a high-dimensional problem, so there is no single simple formula to guarantee success. Nevertheless, there are several straightforward steps that can be taken to greatly increase the probability that a designed sequence will result in expression of the encoded protein. In this chapter, we discuss gene sequence parameters that are important for protein expression. We also describe algorithms for optimizing these parameters, and troubleshooting procedures that can be helpful when initial attempts fail. Finally, we show how many of these methods can be accomplished using the synthetic biology software tool Gene Designer.
Affiliation(s)
- Mark Welch
- DNA2.0, Inc., Suite A, Menlo Park, California, USA
|
47
|
Selected codon usage bias in members of the class Mollicutes. Gene 2010; 473:110-8. [PMID: 21147204 DOI: 10.1016/j.gene.2010.11.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/16/2010] [Revised: 11/20/2010] [Accepted: 11/22/2010] [Indexed: 11/24/2022]
Abstract
Mollicutes are parasitic microorganisms mainly characterized by small cell sizes, reduced genomes, and a strong A and T mutational bias. We analyzed the codon usage patterns of the completely sequenced genomes of bacteria that belong to this class. We found that for many organisms not only mutational bias but also selection has a major effect on codon usage. Through a comparative perspective and based on three widely used criteria, we were able to classify Mollicutes according to the effect of selection on codon usage. We found conserved optimal codons in many species and studied the tRNA gene pool in each genome. Previous results are reinforced by the fact that, when selection is operative, the putative optimal codons found match the respective cognate tRNA. Finally, we trace the effect of selection back to the common ancestor of the class and estimate the phylogenetic inertia associated with this character. We discuss the possible scenarios that explain the observed evolutionary patterns.
|
48
|
Zuo XH, Guo XG, Zhan YZ, Wu D, Yang ZH, Dong WG, Huang LQ, Ren TG, Jing YG, Wang QH, Sun XM, Lin SJ. Host selection and niche differentiation in sucking lice (Insecta: Anoplura) among small mammals in southwestern China. Parasitol Res 2010; 108:1243-51. [PMID: 21140167 DOI: 10.1007/s00436-010-2173-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 10/28/2010] [Accepted: 11/12/2010] [Indexed: 11/30/2022]
Abstract
Understanding factors that shape host selection has been a classic issue in ecology, evolutionary biology, and epidemiological investigation. During a survey from 2000 to 2009, a total of 11,216 small mammals were captured in Yunnan Province in southwestern China. The captured small mammalian hosts belonged to five orders, ten families, 35 genera, and 65 species, and 38,885 ectoparasitic sucking lice, representing five families, seven genera, and 31 species, were collected from their body surfaces. Based on the niche overlap of dominant sucking lice on their primary hosts, we used hierarchical cluster analysis to sort sucking louse species with similar resource utilization into categories. At λ<5, only two groups were clustered; at λ=15, however, the resource utilization of sucking louse species was sorted into eight categories. The results revealed that most species of sucking lice had high host specificity, and a given species of sucking louse was usually restricted to one or a few small mammalian species as its dominant hosts. Correspondence analysis was used to visualize associations between parasitic sucking lice and their small mammalian hosts, which suggested three different patterns of host resource utilization: species specialists, genera generalists, and multiple selections. For example, Sathrax durus (Johnson) parasitized only Tupaia belangeri (Wagner), Hoplopleura edentula (Fahrenholz) was found predominantly on the genus Eothenomys, and Polyplax reclinata (Nitzsch) on the family Soricidae. Our results demonstrate that sucking lice have high host specificity, which might be due to coevolution between sucking lice and their hosts.
Affiliation(s)
- Xiao-Hua Zuo
- Institute of Pathogens and Vectors, Dali University, Dali, Yunnan, 671000, China
|
49
|
Li ZP, Ying DQ, Li P, Li F, Bo XC, Wang SQ. Analysis of synonymous codon usage bias in 09H1N1. Virol Sin 2010; 25:329-40. [PMID: 20960179 DOI: 10.1007/s12250-010-3123-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 01/05/2010] [Accepted: 04/30/2010] [Indexed: 11/29/2022] Open
Abstract
A novel subtype of influenza A virus, 09H1N1, has rapidly spread across the world. Evolutionary analyses of this virus have revealed that 09H1N1 is a triple reassortant of segments from swine, avian, and human influenza viruses. In this study, we investigated factors shaping the codon usage bias of 09H1N1 and carried out cluster analysis of 60 strains of influenza A virus from different subtypes based on their codon usage bias. We discovered that the more preferentially used codons of 09H1N1 are A-ended or U-ended, and that the intra-genomic codon usage bias of 09H1N1 is quite low. Base composition constraint, dinucleotide biases, and translational selection are the main factors influencing the codon usage bias of 09H1N1. At the genome level, we find that the codon usage bias of 09H1N1 is similar to that of H1N1 (A/swine/Kansas/77778/2007H1N1), H9N2 from Asia, H1N2 from Asia and North America, and H3N2 from North America. Our results provide insight into the processes governing evolution and the regulation of gene expression, and help reveal the evolution of 09H1N1.
Affiliation(s)
- Zhen-Peng Li
- Beijing Institute of Radiation Medicine, Beijing, 100850, China
|
50
|
Ma L, Zhang T, Huang Z, Jiang X, Tao S. Patterns of nucleotides that flank substitutions in human orthologous genes. BMC Genomics 2010; 11:416. [PMID: 20602772 PMCID: PMC2996944 DOI: 10.1186/1471-2164-11-416] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/20/2010] [Accepted: 07/05/2010] [Indexed: 11/23/2022] Open
Abstract
Background: Sequence context is an important aspect of base mutagenesis, and three-base periodicity is an intrinsic property of coding sequences. However, how three-base periodicity is influenced in the vicinity of substitutions is still unclear. The effect of context on mutagenesis should be revealed in the usage of nucleotides that flank substitutions. Relative entropy (also known as Kullback-Leibler divergence) is useful for finding unusual patterns in biological sequences. Results: Using relative entropy, we visualized the periodic patterns in the context of substitutions in human orthologous genes. Neighbouring patterns differed both among substitution categories and within a category that occurred at three codon positions. Transition tended to occur in periodic sequences relative to transversion. Periodic signals were stronger in a set of flanking sequences of substitutions that occurred at the third-codon positions than in those that occurred at the first- or second-codon positions. To determine how the three-base periodicity was affected near the substitution sites, we fitted a sine model to the values of the relative entropy. A sine of period equal to 3 is a good approximation for the three-base periodicity at sites not in close vicinity to some substitutions. These periods were interrupted near the substitution site and then reappeared away from substitutions. A comparative analysis between the native and codon-shuffled datasets suggested that the codon usage frequency was not the sole origin of the three-base periodicity, implying that the native order of codons also played an important role in this periodicity. Synonymous codon shuffling revealed that synonymous codon usage bias was one of the factors responsible for the observed three-base periodicity. Conclusions: Our results offer an efficient way to illustrate unusual periodic patterns in the context of substitutions and provide further insight into the origin of three-base periodicity. This periodicity is a result of the native codon order in the reading frame. The length of the period equal to 3 is caused by the usage bias of nucleotides in synonymous codons. The periodic features in nucleotides surrounding substitutions aid in further understanding genetic variation and nucleotide mutagenesis.
Affiliation(s)
- Lei Ma
- Bioinformatics Centre, Northwest A&F University, Yangling, Shaanxi, 712100, China
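The relative entropy (Kullback-Leibler divergence) mentioned in the abstract above can be illustrated with a short sketch that scores each flanking position against a background nucleotide distribution. This is only an assumed reading of the general technique, not the authors' code: the function names, the equal-length aligned flanking sequences, and the background frequencies are all hypothetical inputs, and the background must assign a nonzero frequency to every base that is observed.

```python
import math
from collections import Counter


def relative_entropy(observed: dict[str, float], background: dict[str, float]) -> float:
    """Kullback-Leibler divergence D(observed || background), in bits."""
    return sum(
        p * math.log2(p / background[base])
        for base, p in observed.items()
        if p > 0
    )


def positional_relative_entropy(
    flanks: list[str], background: dict[str, float]
) -> list[float]:
    """Relative entropy at each column of equal-length flanking sequences."""
    if not flanks or any(len(s) != len(flanks[0]) for s in flanks):
        raise ValueError("expected a non-empty list of equal-length sequences")
    values = []
    for i in range(len(flanks[0])):
        counts = Counter(seq[i] for seq in flanks)
        total = sum(counts.values())
        observed = {base: n / total for base, n in counts.items()}
        values.append(relative_entropy(observed, background))
    return values


if __name__ == "__main__":
    uniform = {base: 0.25 for base in "ACGT"}
    # Column 0 is always A (maximally unusual); column 1 matches the background.
    print(positional_relative_entropy(["AC", "AG", "AT", "AA"], uniform))
```

Plotting such per-position values along the flanking region is one way a three-base periodic signal, if present, would become visible.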
|