1
Amangeldina A, Tan ZW, Berezovsky IN. Living in trinity of extremes: Genomic and proteomic signatures of halophilic, thermophilic, and pH adaptation. Curr Res Struct Biol 2024; 7:100129. [PMID: 38327713 PMCID: PMC10847869 DOI: 10.1016/j.crstbi.2024.100129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/16/2024] [Accepted: 01/16/2024] [Indexed: 02/09/2024] Open
Abstract
Since nucleic acids and proteins of unicellular prokaryotes are directly exposed to extreme environmental conditions, it is possible to explore the genomic-proteomic compositional determinants of molecular mechanisms of adaptation developed by them in response to harsh environmental conditions. Using a wealth of currently available complete genomes/proteomes, we were able to explore signatures of adaptation to three environmental factors, pH, salinity, and temperature, observing major trends in compositions of their nucleic acids and proteins. We derived predictors of thermostability, halophilic, and pH adaptations and complemented them with principal component analysis. We observed a clear difference between thermophilic and salinity/pH adaptations, whereas the latter invoke seemingly overlapping mechanisms. The genome-proteome compositional trade-off reveals an intricate balance between the work of base pairing and base stacking in stabilization of coding DNA and r/tRNAs, and, at the same time, universal requirements for the stability and foldability of proteins regardless of the nucleotide biases. Nevertheless, we still found hidden fingerprints of ancient evolutionary connections between the nucleotide and amino acid compositions indicating their emergence, mutual evolution, and adjustment. The evolutionary perspective on the adaptation mechanisms is further studied here by means of the comparative analysis of genomic/proteomic traits of archaeal and bacterial species. The overall picture of genomic/proteomic signals of adaptation obtained here provides a foundation for future engineering and design of functional biomolecules resistant to harsh environments.
Affiliation(s)
- Aidana Amangeldina
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
- Zhen Wah Tan
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Igor N. Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
2
Ji C, Zhang Y, Feng Y, Zhang X, Gong F, Yao H, Sun X, Pan Z. Circular replication-associated protein-encoding single-stranded DNA virus with risk of spillover is widely prevalent in domestic animals in China. Virus Res 2024; 339:199204. [PMID: 37607596 PMCID: PMC10654594 DOI: 10.1016/j.virusres.2023.199204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 08/18/2023] [Accepted: 08/19/2023] [Indexed: 08/24/2023]
Abstract
Circular replication-associated protein (Rep)-encoding single-stranded (CRESS) DNA viruses are highly diverse and have a broad range of hosts. In this study, we report the detection of Bo-Circo-like virus AH20-1 in the feces of diarrheal cattle. The virus has a circular genome of 3,912 nucleotides, three major putative open reading frames, and encodes a Rep protein of 310 amino acids. We found that the virus is closely related to the Bo-Circo-like virus CH strain, which belongs to the novel Kirkoviridae family. Furthermore, we conducted a nationwide surveillance program and found that the virus is prevalent in China (23.6%, 205/868), with the BCLa subtype being the predominant strain. Our findings suggest that the virus can infect sheep, highlighting the potential for cross-species transmission. Our selection pressure analysis indicates that the CRESS-DNA Kirkoviridae family has broad host adaptation, and that selection pressure played an important role in the evolution of its Rep genes. Our study underscores the need for continued epidemiological surveillance of this virus due to its widespread prevalence in our ruminant population and potential for cross-species transmission.
Affiliation(s)
- Chengyuan Ji
- MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Yao Zhang
- MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Yiqiu Feng
- MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Xinqin Zhang
- MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Fengju Gong
- MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Huochun Yao
- MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Xueqiang Sun
- China Animal Health and Epidemiology Center, Key Laboratory of Animal Biosafety Risk Prevention and Control (South), Qingdao 266000, China.
- Zihao Pan
- MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China.
3
Aliperti L, Aptekmann AA, Farfañuk G, Couso LL, Soler-Bistué A, Sánchez IE. r/K selection of GC content in prokaryotes. Environ Microbiol 2023; 25:3255-3268. [PMID: 37813828 DOI: 10.1111/1462-2920.16511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 09/16/2023] [Indexed: 10/11/2023]
Abstract
The guanine/cytosine (GC) content of prokaryotic genomes is species-specific, taking values from 16% to 77%. This diversity of selection for GC content remains contentious. We analyse the correlations between GC content and a range of phenotypic and genotypic data in thousands of prokaryotes. GC content integrates well with these traits into r/K selection theory when phenotypic plasticity is considered. High GC-content prokaryotes are r-strategists with cheaper descendants thanks to a lower average amino acid metabolic cost, colonize unstable environments thanks to flagella and a bacillus form and are generalists in terms of resource opportunism and their defence mechanisms. Low GC content prokaryotes are K-strategists specialized for stable environments that maintain homeostasis via a high-cost outer cell membrane and endospore formation as a response to nutrient deprivation, and attain a higher nutrient-to-biomass yield. The lower proteome cost of high GC content prokaryotes is driven by the association between GC-rich codons and cheaper amino acids in the genetic code, while the correlation between GC content and genome size may be partly due to functional diversity driven by r/K selection. In all, molecular diversity in the GC content of prokaryotes may be a consequence of ecological r/K selection.
Affiliation(s)
- Lucio Aliperti
- Facultad de Ciencias Exactas y Naturales. Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires, Argentina
- Ariel A Aptekmann
- Marine and Coastal Sciences Department, Rutgers University, New Brunswick, New Jersey, USA
- Gonzalo Farfañuk
- Facultad de Ciencias Exactas y Naturales. Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires, Argentina
- Luciana L Couso
- Facultad de Agronomía, Cátedra de Genética, Universidad de Buenos Aires, Buenos Aires, Argentina
- Alfonso Soler-Bistué
- Instituto de Investigaciones Biotecnológicas Dr. Rodolfo A. Ugalde, CONICET, Universidad Nacional de San Martín, San Martin, Argentina
- Ignacio E Sánchez
- Facultad de Ciencias Exactas y Naturales. Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires, Argentina
4
Sane M, Diwan GD, Bhat BA, Wahl LM, Agashe D. Shifts in mutation spectra enhance access to beneficial mutations. Proc Natl Acad Sci U S A 2023; 120:e2207355120. [PMID: 37216547 PMCID: PMC10235995 DOI: 10.1073/pnas.2207355120] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/05/2022] [Accepted: 03/27/2023] [Indexed: 05/24/2023] Open
Abstract
Biased mutation spectra are pervasive, with wide variation in the magnitude of mutational biases that influence genome evolution and adaptation. How do such diverse biases evolve? Our experiments show that changing the mutation spectrum allows populations to sample previously undersampled mutational space, including beneficial mutations. The resulting shift in the distribution of fitness effects is advantageous: Beneficial mutation supply and beneficial pleiotropy both increase, while deleterious load reduces. More broadly, simulations indicate that reducing or reversing the direction of a long-term bias is always selectively favored. Such changes in mutation bias can occur easily via altered function of DNA repair genes. A phylogenetic analysis shows that these genes are repeatedly gained and lost in bacterial lineages, leading to frequent bias shifts in opposite directions. Thus, shifts in mutation spectra may evolve under selection and can directly alter the outcome of adaptive evolution by facilitating access to beneficial mutations.
Affiliation(s)
- Mrudula Sane
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bengaluru 560065, India
- Gaurav D. Diwan
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bengaluru 560065, India
- Bioquant, University of Heidelberg, 69120 Heidelberg, Germany
- Heidelberg University Biochemistry Center (BZH), 69120 Heidelberg, Germany
- Bhoomika A. Bhat
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bengaluru 560065, India
- Undergraduate Programme, Indian Institute of Science, Bengaluru 560012, India
- Lindi M. Wahl
- Mathematics, Western University, London, ON, N6A 5B7, Canada
- Deepa Agashe
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bengaluru 560065, India
5
Genetic Diversity and Characterization of Circular Replication (Rep)-Encoding Single-Stranded (CRESS) DNA Viruses. Microbiol Spectr 2022; 10:e0105722. [PMID: 36346238 PMCID: PMC9769708 DOI: 10.1128/spectrum.01057-22] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022] Open
Abstract
CRESS-DNA viruses are ubiquitous, have been detected across almost all branches of the eukaryotic tree of life, and play an essential role in maintaining global ecosystems. Still, their genetic diversity is not fully understood. Here, we bring to light the genetic diversity of the replication (Rep) and capsid (Cap) proteins of CRESS-DNA viruses. We divided the Rep proteins of CRESS-DNA viruses into 10 clusters using CLANS and phylogenetic analyses. Most of the Rep proteins in Rep cluster 1 (R1) and R2 (Circoviridae, Smacoviridae, Nanoviridae, and CRESSV1-5) contain the Viral_Rep superfamily and P-loop_NTPase superfamily domains, while the Rep proteins of viruses in other clusters have no such characterized functional domain. The Circoviridae, Nanoviridae, and CRESSV1-3 viruses contain both domains, Viral_Rep and P-loop_NTPase; the CRESSV4 and CRESSV5 viruses have only the Viral_Rep domain; most of the sequences in the pCRESS-related group have only P-loop_NTPase; and Smacoviridae do not have these two domains. Further, we divided the Cap proteins of CRESS-DNA viruses into 20 clusters using CLANS and phylogenetic analyses. The Rep and Cap proteins of Circoviridae and Smacoviridae are grouped into a specific cluster. Cap protein of CRESS-DNA viruses grouped with one cluster and Rep protein with another cluster. Further, our study reveals that selection pressure, rather than mutational pressure, plays a significant role in the evolution of the Rep and Cap genes of CRESS-DNA viruses. We hope this study will help determine the genetic diversity of CRESS-DNA viruses as more sequences are discovered in the future.
IMPORTANCE The genetic diversity of CRESS-DNA viruses is not fully understood. CRESS-DNA viruses are classified as CRESSV1 to CRESSV6 using only Rep protein. This study revealed that the Rep protein of the CRESS-DNA viruses is classified as CRESSV1 to CRESSV6 groups and the new Smacoviridae-related, CRESSV2-related, pCRESS-related, Circoviridae-related, and 1 to 4 outgroups, according to the Viral_Rep and P-loop_NTPase domain organization, CLANS, and phylogenetic analysis. Furthermore, for the first time in this study, the Cap protein of CRESS-DNA viruses was classified into 20 distinct clusters by CLANS and phylogenetic analysis. Through this classification, the genetic diversity of CRESS-DNA viruses clarifies the possibility of recombinations in Cap and Rep proteins. Finally, it has been shown that selection pressure plays a significant role in the evolution and genetic diversity of Cap and Rep proteins. This study explains the genetic diversity of CRESS-DNA viruses, and we hope it will help classify viruses detected in the future.
6
Hu EZ, Lan XR, Liu ZL, Gao J, Niu DK. A positive correlation between GC content and growth temperature in prokaryotes. BMC Genomics 2022; 23:110. [PMID: 35139824 PMCID: PMC8827189 DOI: 10.1186/s12864-022-08353-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 08/03/2021] [Accepted: 01/31/2022] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND GC pairs are generally more stable than AT pairs; GC-rich genomes were proposed to be more adapted to high temperatures than AT-rich genomes. Previous studies consistently showed positive correlations between growth temperature and the GC contents of structural RNA genes. However, for the whole genome sequences and the silent sites of the codons in protein-coding genes, the relationship between GC content and growth temperature is in a long-lasting debate. RESULTS With a dataset much larger than previous studies (681 bacteria and 155 archaea with completely assembled genomes), our phylogenetic comparative analyses showed positive correlations between optimal growth temperature (Topt) and GC content both in bacterial and archaeal structural RNA genes and in bacterial whole genome sequences, chromosomal sequences, plasmid sequences, core genes, and accessory genes. However, in the 155 archaea, we did not observe a significant positive correlation of Topt with whole-genome GC content (GCw) or GC content at four-fold degenerate sites. We randomly drew 155 samples from the 681 bacteria for 1000 rounds. In most cases (> 95%), the positive correlations between Topt and genomic GC contents became statistically nonsignificant (P > 0.05). This result suggested that the small sample sizes might account for the lack of positive correlations between growth temperature and genomic GC content in the 155 archaea and the bacterial samples of previous studies. Comparing the GC content among four categories (psychrophiles/psychrotrophiles, mesophiles, thermophiles, and hyperthermophiles) also revealed a positive correlation between GCw and growth temperature in bacteria. By including the GCw of incompletely assembled genomes, we expanded the sample size of archaea to 303. Positive correlations between GCw and Topt appear especially after excluding the halophilic archaea whose GC contents might be strongly shaped by intense UV radiation. 
CONCLUSIONS This study explains the previous contradictory observations and ends a long debate. Prokaryotes growing at high temperatures have higher GC contents. Thermal adaptation is one possible explanation for the positive association. Meanwhile, we propose that the elevated efficiency of DNA repair in response to heat mutagenesis might have the by-product of increasing GC content, as happens in intracellular symbionts and marine bacterioplankton.
Affiliation(s)
- En-Ze Hu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Xin-Ran Lan
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Zhi-Ling Liu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Jie Gao
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Deng-Ke Niu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
7
Abstract
DPANN is known as a highly diverse, globally widespread, and mostly ectosymbiotic archaeal superphylum. However, this group of archaea was overlooked for a long time, and few in-depth studies have been reported. In this investigation, 41 metagenome-assembled genomes (MAGs) belonging to the DPANN superphylum were recovered (18 MAGs had average nucleotide identity [ANI] values of <95% and a percentage of conserved proteins [POCP] of >50%, while 14 MAGs showed a POCP of <50%) and analyzed comparatively with 515 other published DPANN genomes. Mismatches to known 16S rRNA gene primers were identified among 16S rRNA genes of DPANN archaea. The number of gene families lost (mostly related to energy and amino acid metabolism) was over three times greater than the number gained in the evolution of DPANN archaea. Lateral gene transfer (LGT; ∼45.5% was cross-domain) facilitated niche adaptation of the DPANN archaea, ensuring a delicate equilibrium of streamlined genomes with efficient niche-adaptive strategies. For instance, LGT-derived cytochrome bd ubiquinol oxidase and arginine deiminase in the genomes of “Candidatus Micrarchaeota” could help them better adapt to aerobic acidic mine drainage habitats. In addition, most DPANN archaea acquired enzymes for biosynthesis of extracellular polymeric substances (EPS) and transketolase/transaldolase for the pentose phosphate pathway from Bacteria.
IMPORTANCE The domain Archaea is a key research model for gaining insights into the origin and evolution of life, as well as the relevant biogeochemical processes. The discovery of nanosized DPANN archaea has overthrown many aspects of microbiology. However, the DPANN superphylum still contains vast genetic novelty and diversity that need to be explored. A comprehensive comparative genomic analysis of the DPANN superphylum was performed in this study, with an attempt to illuminate its metabolic potential, ecological distribution, and evolutionary history. Many interphylum differences within the DPANN superphylum were found. For example, Altiarchaeota had the biggest genome among DPANN phyla, possessing many pathways missing in other phyla, such as formaldehyde assimilation and the Wood-Ljungdahl pathway. In addition, LGT acted as an important force providing DPANN archaea with genetic flexibility that permitted the occupation of diverse niches. This study has advanced our understanding of the diversity and genome evolution of archaea.
8
Gao NL, He Z, Zhu Q, Jiang P, Hu S, Chen WH. Selection for Cheaper Amino Acids Drives Nucleotide Usage at the Start of Translation in Eukaryotic Genes. Genomics Proteomics Bioinformatics 2021; 19:949-957. [PMID: 33741525 PMCID: PMC9403032 DOI: 10.1016/j.gpb.2021.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2018] [Revised: 05/30/2019] [Accepted: 08/18/2019] [Indexed: 12/04/2022]
Abstract
Coding regions have complex interactions among multiple selective forces, which are manifested as biases in nucleotide composition. Previous studies have revealed a decreasing GC gradient from the 5′-end to 3′-end of coding regions in various organisms. We confirmed that this gradient is universal in eukaryotic genes, but the decrease only starts from the ∼ 25th codon. This trend is mostly found in nonsynonymous (ns) sites at which the GC gradient is universal across the eukaryotic genome. Increased GC contents at ns sites result in cheaper amino acids, indicating a universal selection for energy efficiency toward the N-termini of encoded proteins. Within a genome, the decreasing GC gradient is intensified from lowly to highly expressed genes (more and more protein products), further supporting this hypothesis. This reveals a conserved selective constraint for cheaper amino acids at the translation start that drives the increased GC contents at ns sites. Elevated GC contents can facilitate transcription but result in a more stable local secondary structure around the start codon and subsequently impede translation initiation. Conversely, the GC gradients at four-fold and two-fold synonymous sites vary across species. They could decrease or increase, suggesting different constraints acting at the GC contents of different codon sites in different species. This study reveals that the overall GC contents at the translation start are consequences of complex interactions among several major biological processes that shape the nucleotide sequences, especially efficient energy usage.
Affiliation(s)
- Na L Gao
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Institute for Computer Science and Cluster of Excellence on Plant Sciences, Heinrich Heine University, Duesseldorf 40225, Germany
- Zilong He
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China; State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Interdisciplinary Innovation Institute of Medicine and Engineering, Beihang University, Beijing 100191, China
- Qianhui Zhu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China; State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Puzi Jiang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Songnian Hu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China; State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
- Wei-Hua Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
9
Liu XY, Li Y, Ji KK, Zhu J, Ling P, Zhou T, Fan LY, Xie SQ. Genome-wide codon usage pattern analysis reveals the correlation between codon usage bias and gene expression in Cuscuta australis. Genomics 2020; 112:2695-2702. [PMID: 32145379 DOI: 10.1016/j.ygeno.2020.03.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 11/21/2019] [Revised: 02/05/2020] [Accepted: 03/03/2020] [Indexed: 11/28/2022]
Abstract
The protein-coding genes and pseudogenes of Cuscuta australis have contributed in diverse ways to the formation and evolution of parasitism. Codon usage pattern analysis of these two types of genes can be used to understand gene transcription and translation. In this study, we systematically analyzed the codon usage patterns of protein-coding sequences and pseudogene sequences in C. australis. The results showed that the high-frequency codons of protein-coding sequences and pseudogenes had the same A/U bias in the third position. However, these two types of sequences had opposite biases at the third base in optimal codons: the protein-coding sequences preferred G/C-ending codons while pseudogene sequences preferred A/U-ending codons. Neutrality plots and effective number of codons plots revealed that natural selection played a more important role than mutation pressure in the codon usage bias of both types of sequences. Furthermore, the gene expression level had a significant positive correlation with codon usage bias in C. australis. Highly expressed protein-coding genes exhibited a higher codon bias than lowly expressed genes. Meanwhile, the highly expressed genes tended to use G/C-ending synonymous codons. This result further verified the optimal codon usage bias and its correlation with gene expression in C. australis.
Affiliation(s)
- Xu-Yuan Liu
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou 570228, China
- Yu Li
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou 570228, China
- Kai-Kai Ji
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou 570228, China
- Jie Zhu
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou 570228, China
- Peng Ling
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou 570228, China
- Tao Zhou
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou 570228, China
- Lan-Ying Fan
- Shanxi Academy of Forestry Sciences, Taiyuan 030012, China.
- Shang-Qian Xie
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou 570228, China.
10
Nariyampet SA, Hajamohideen AJA. A study on codon usage bias in cytochrome c oxidase I (COI) gene of solitary ascidian Herdmania momus Savigny, 1816. Gene Reports 2019. [DOI: 10.1016/j.genrep.2019.100523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
11
Aslam S, Lan XR, Zhang BW, Chen ZL, Wang L, Niu DK. Aerobic prokaryotes do not have higher GC contents than anaerobic prokaryotes, but obligate aerobic prokaryotes have. BMC Evol Biol 2019; 19:35. [PMID: 30691392 PMCID: PMC6350292 DOI: 10.1186/s12862-019-1365-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/21/2018] [Accepted: 01/17/2019] [Indexed: 12/17/2022] Open
Abstract
Background Among the four bases, guanine is the most susceptible to damage from oxidative stress. Replication of DNA containing damaged guanines results in G to T mutations. Therefore, the mutations resulting from oxidative DNA damage are generally expected to predominantly consist of G to T (and C to A when the damaged guanine is not in the reference strand) and result in decreased GC content. However, the opposite pattern was reported 16 years ago in a study of prokaryotic genomes. Although that result has been widely cited and confirmed by nine later studies with similar methods, the omission of the effect of shared ancestry requires a re-examination of the reliability of the results. Results When aerobic and obligate aerobic prokaryotes were mixed together and anaerobic and obligate anaerobic prokaryotes were mixed together, phylogenetically controlled analyses did not detect a significant difference in GC content between aerobic and anaerobic prokaryotes. This result is consistent with two generally neglected studies that had accounted for the phylogenetic relationship. However, when obligate aerobic prokaryotes were compared with aerobic prokaryotes, anaerobic prokaryotes, and obligate anaerobic prokaryotes separately using phylogenetic regression analysis, a significant positive association was observed between aerobiosis and GC content, regardless of whether it was calculated from whole genome sequences or the 4-fold degenerate sites of protein-coding genes. Obligate aerobes have significantly higher GC content than aerobes, anaerobes, and obligate anaerobes. Conclusions The positive association between aerobiosis and GC content could be attributed to a mutational force resulting from incorporation of damaged deoxyguanosine during DNA replication rather than oxidation of the guanine nucleotides within DNA sequences. 
Our results indicate a grade in the aerobiosis-associated mutational force, strong in obligate aerobes, moderate in aerobes, weak in anaerobes and obligate anaerobes. Electronic supplementary material The online version of this article (10.1186/s12862-019-1365-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sidra Aslam
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Xin-Ran Lan
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Bo-Wen Zhang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Zheng-Lin Chen
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Li Wang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Deng-Ke Niu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
12
Du MZ, Zhang C, Wang H, Liu S, Wei W, Guo FB. The GC Content as a Main Factor Shaping the Amino Acid Usage During Bacterial Evolution Process. Front Microbiol 2018; 9:2948. [PMID: 30581420 PMCID: PMC6292993 DOI: 10.3389/fmicb.2018.02948] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 06/08/2018] [Accepted: 11/16/2018] [Indexed: 11/13/2022]
Abstract
Understanding how proteins evolve is important, and the order in which amino acids were recruited into the genetic code was found to be an important factor shaping the amino acid composition of proteins. The latest work on the last universal common ancestor (LUCA) makes it possible to determine the potential factors shaping amino acid compositions during evolution. The LUCA genes/proteins from Methanococcus maripaludis S2, which is one of the possible LUCA, were investigated. The evolutionary rates of these genes positively correlate with GC content, with P-values below 0.05 for 94% of homologous genes. Linear regression results showed that the compositions of amino acids coded by GC-rich codons contribute positively to the evolutionary rates, while these amino acids tend to be gained in GC-rich organisms according to our results. The first principal component correlates very well with GC content. The ratios of amino acids of the LUCA proteins coded by GC-rich codons positively correlate with the GC content of different bacterial genomes, while the ratios of amino acids coded by AT-rich codons negatively correlate with genomic GC content. Next, we found that the recruitment order does correlate with the amino acid compositions, but the gain and loss of codons showed that newly recruited amino acids do not increase significantly over the course of evolution. Thus, we conclude that GC content is a primary factor shaping amino acid compositions. GC content shapes amino acid composition to trade off the cost of amino acids against that of bases, which could be driven by energy efficiency.
Affiliation(s)
- Meng-Ze Du
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Huan Wang
- School of Life Sciences, Chongqing University, Chongqing, China
- Shuo Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Wen Wei
- School of Life Sciences, Chongqing University, Chongqing, China
- Feng-Biao Guo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Centre for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
13
Mu W, Liu J, Zhang H. Complete mitochondrial genome of Benthodytes marianensis (Holothuroidea: Elasipodida: Psychropotidae): Insight into deep sea adaptation in the sea cucumber. PLoS One 2018; 13:e0208051. [PMID: 30500836 PMCID: PMC6267960 DOI: 10.1371/journal.pone.0208051] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 08/13/2018] [Accepted: 11/09/2018] [Indexed: 01/01/2023]
Abstract
Complete mitochondrial genomes play important roles in studying genome evolution, phylogenetic relationships, and species identification. Sea cucumbers (Holothuroidea) are ecologically important and diverse members, living from the shallow waters to the hadal trench. In this study, we present the mitochondrial genome sequence of the sea cucumber Benthodytes marianensis collected from the Mariana Trench. To our knowledge, this is the first reported mitochondrial genome from the genus Benthodytes. This complete mitochondrial genome is 17567 bp in length and consists of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes (duplication of two tRNAs: trnL and trnS). Most of these genes are coded on the positive strand except for one protein-coding gene (nad6) and five tRNA genes which are coded on the negative strand. Two putative control regions (CRs) have been found in the B. marianensis mitogenome. We compared the order of genes from the 10 available holothurian mitogenomes and found a novel gene arrangement in B. marianensis. Phylogenetic analysis revealed that B. marianensis clustered with Peniagone sp. YYH-2013, forming the deep-sea Elasipodida clade. Positive selection analysis showed that eleven residues (24 S, 45 S, 185 S, 201 G, 211 F and 313 N in nad2; 108 S, 114 S, 322 C, 400 T and 427 S in nad4) were positively selected sites with high posterior probabilities. We predict that nad2 and nad4 may be the important candidate genes for the further investigation of the adaptation of B. marianensis to the deep-sea environment.
Affiliation(s)
- Wendan Mu
- Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences, Sanya, China
- University of Chinese Academy of Sciences, Beijing, China
- Jun Liu
- Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences, Sanya, China
- Haibin Zhang
- Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences, Sanya, China
14
Du MZ, Liu S, Zeng Z, Alemayehu LA, Wei W, Guo FB. Amino acid compositions contribute to the proteins' evolution under the influence of their abundances and genomic GC content. Sci Rep 2018; 8:7382. [PMID: 29743515 PMCID: PMC5943316 DOI: 10.1038/s41598-018-25364-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/30/2017] [Accepted: 04/16/2018] [Indexed: 12/23/2022]
Abstract
Inconsistent results on the association between evolutionary rates and amino acid composition of proteins have been reported in eukaryotes. However, there are few studies of how amino acid composition can influence evolutionary rates in bacteria. Thus, we constructed linear regression models between composition frequencies of amino acids and evolutionary rates for bacteria. Compositions of all amino acids can on average explain 21.5% of the variation in evolutionary rates among 273 investigated bacterial organisms. In five model organisms, amino acid composition contributes more to variation in evolutionary rates than protein abundance and frequency of optimal codons. The contribution of individual amino acid composition to evolutionary rate varies among organisms. The closer the GC content of a genome is to its maximum or minimum, the better the correlation between amino acid content and the evolutionary rate of proteins in that genome. The types of amino acids that significantly contribute to evolutionary rates can be grouped into GC-rich and AT-rich amino acids. In addition, amino acids that are abundant in the proteome contribute more to evolutionary rates than amino acids that are rare. In summary, amino acid composition significantly contributes to the rate of evolution in bacterial organisms, and this in turn is affected by GC content.
Affiliation(s)
- Meng-Ze Du
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Shuo Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Zhi Zeng
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Labena Abraham Alemayehu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Wen Wei
- School of Life Sciences, Chongqing University, Chongqing, China
- Feng-Biao Guo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Centre for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Key Laboratory for Neuroinformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
15
Liu SS, Hockenberry AJ, Lancichinetti A, Jewett MC, Amaral LAN. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents. PLoS Comput Biol 2016; 12:e1005184. [PMID: 27835644 PMCID: PMC5106001 DOI: 10.1371/journal.pcbi.1005184] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 05/27/2016] [Accepted: 10/05/2016] [Indexed: 01/08/2023]
Abstract
The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a Python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs, which will lead to a better understanding of biological processes as well as more effective engineering of biological systems. The generation of random sequences is instrumental to the accurate identification of non-random motifs within genomes, yet there are currently no tools available that allow users to simultaneously specify amino acid and GC composition to create random coding sequences. Here, we develop an algorithm based on maximum entropy that consistently generates fully random nucleotide sequences with the desired amino acid composition and GC content.
Affiliation(s)
- Sophia S. Liu
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Adam J. Hockenberry
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Interdisciplinary Program in Biological Sciences, Northwestern University, Evanston, Illinois, United States of America
- Andrea Lancichinetti
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Michael C. Jewett
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Interdisciplinary Program in Biological Sciences, Northwestern University, Evanston, Illinois, United States of America
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, Illinois, United States of America
- Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois, United States of America
- Luís A. N. Amaral
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, Illinois, United States of America
- Department of Physics and Astronomy, Northwestern University, Evanston, Illinois, United States of America
16
Genome-Wide Analysis of the Synonymous Codon Usage Patterns in Riemerella anatipestifer. Int J Mol Sci 2016; 17:ijms17081304. [PMID: 27517915 PMCID: PMC5000701 DOI: 10.3390/ijms17081304] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 06/25/2016] [Revised: 07/31/2016] [Accepted: 08/02/2016] [Indexed: 11/17/2022]
Abstract
Riemerella anatipestifer (RA) belongs to the Flavobacteriaceae family and can cause septicemic disease in poultry. The synonymous codon usage patterns of bacteria reflect a series of evolutionary changes that enable bacteria to improve their tolerance of various environments. We detailed the codon usage patterns of RA isolates from the 12 available sequenced genomes by multiple codon and statistical analyses. Nucleotide composition and relative synonymous codon usage (RSCU) analysis revealed that A- or U-ending codons are predominant in RA. Neutrality analysis found no significant correlation between GC12 and GC3 (p > 0.05). Correspondence analysis and ENc-plot results showed that natural selection dominated over mutation in the codon usage bias. The tree from cluster analysis based on RSCU was concordant with the dendrogram based on genomic BLAST by the neighbor-joining method. By comparative analysis, about 50 highly expressed genes that were orthologs across all 12 strains were found in the top 5% of CAI values. Based on these CAI values, we infer that RA contains a number of predicted highly expressed coding sequences, involved in transcriptional regulation and metabolism, reflecting their requirement for dealing with diverse environmental conditions. These results provide some useful information on the mechanisms that contribute to codon usage bias and evolution of RA.
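The RSCU statistic named in this abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code and the usual textbook definition (observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally); the function name `rscu` and the toy input are hypothetical and are not taken from the cited study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# NCBI translation table 1; stop codons are marked "*").
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in the in-frame CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for fam in families.values():
        total = sum(counts[c] for c in fam)
        if total == 0:
            continue                          # amino acid absent from the sequence
        for c in fam:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(fam) / total
    return values


example = rscu("GCTGCTGCTGCC")  # four alanine codons: 3x GCT, 1x GCC
print(example["GCT"], example["GCC"], example["GCA"])
```

A value above 1 marks a codon used more often than expected under equal synonymous usage, and a value below 1 marks one used less often.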
17
Sharma A, Gilbert JA, Lal R. (Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans. Sci Rep 2016; 6:25527. [PMID: 27151933 PMCID: PMC4858710 DOI: 10.1038/srep25527] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 12/07/2015] [Accepted: 04/14/2016] [Indexed: 11/17/2022]
Abstract
Despite having serious clinical manifestations, Cellulosimicrobium cellulans remains under-reported, with only three genome sequences available at the time of writing. Genome sequences of C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM were used to determine the distribution of pathogenicity islands (PAIs) across C. cellulans, which revealed 49 potential marker genes with known association to human infections, e.g. the Fic and VbhA toxin-antitoxin system. Oligonucleotide composition-based analysis of orthologous proteins (n = 791) across the three genomes revealed a significant negative correlation (P < 0.05) between frequency of optimal codons (Fopt) and gene G+C content, highlighting the G+C-biased gene conversion (gBGC) effect across Cellulosimicrobium strains. Bayesian molecular-clock analysis performed on three virulent PAI proteins (Fic; D-alanyl-D-alanine-carboxypeptidase; transposase) dated the divergence event at 300 million years ago from the most recent common ancestor. Synteny-based annotation of hypothetical proteins highlighted gene transfers from non-pathogenic bacteria as a key factor in the evolution of PAIs. Additionally, deciphering the metagenomic islands using strain MM's genome with environmental data from the site of isolation (hot-spring biofilm) revealed (an)aerobic respiration as a population segregation factor across the in situ cohorts. Using reference genomes and metagenomic data, our results highlight the emergence and evolution of PAIs in the genus Cellulosimicrobium.
Affiliation(s)
- Jack A Gilbert
- Biosciences Division (BIO), Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL, USA
- Department of Surgery, University of Chicago, 5841 S Maryland Ave, Chicago, IL, USA
- Marine Biological Laboratory, Woods Hole, MA, USA
- Rup Lal
- Department of Zoology, University of Delhi, Delhi, India
18
Chen WH, Lu G, Bork P, Hu S, Lercher MJ. Energy efficiency trade-offs drive nucleotide usage in transcribed regions. Nat Commun 2016; 7:11334. [PMID: 27098217 PMCID: PMC4844684 DOI: 10.1038/ncomms11334] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Received: 10/09/2015] [Accepted: 03/16/2016] [Indexed: 01/29/2023]
Abstract
Efficient nutrient usage is a trait under universal selection. A substantial part of cellular resources is spent on making nucleotides. We thus expect preferential use of cheaper nucleotides especially in transcribed sequences, which are often amplified thousand-fold compared with genomic sequences. To test this hypothesis, we derive a mutation-selection-drift equilibrium model for nucleotide skews (strand-specific usage of 'A' versus 'T' and 'G' versus 'C'), which explains nucleotide skews across 1,550 prokaryotic genomes as a consequence of selection on efficient resource usage. Transcription-related selection generally favours the cheaper nucleotides 'U' and 'C' at synonymous sites. However, the information encoded in mRNA is further amplified through translation. Due to unexpected trade-offs in the codon table, cheaper nucleotides encode on average energetically more expensive amino acids. These trade-offs apply to both strand-specific nucleotide usage and GC content, causing a universal bias towards the more expensive nucleotides 'A' and 'G' at non-synonymous coding sites.
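The nucleotide skews mentioned in this abstract (strand-specific usage of 'A' versus 'T' and 'G' versus 'C') can be computed with a short Python sketch. It assumes the commonly used normalized definitions, (G - C)/(G + C) and (A - T)/(A + T); the function name `nucleotide_skews` is hypothetical, and the sketch covers only the descriptive statistic, not the mutation-selection-drift equilibrium model derived in the cited paper.

```python
from collections import Counter


def nucleotide_skews(seq):
    """Return (GC skew, AT skew) for a single strand of DNA.

    Assumed definitions: GC skew = (G - C) / (G + C) and
    AT skew = (A - T) / (A + T). A skew is reported as 0.0 when
    its denominator is zero.
    """
    counts = Counter(seq.upper())
    g, c, a, t = (counts[base] for base in "GCAT")
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    at_skew = (a - t) / (a + t) if a + t else 0.0
    return gc_skew, at_skew


# Toy strand with three G, one C, two A and two T.
print(nucleotide_skews("GGGCAATT"))
```

Because the skews are strand-specific, applying the function to the reverse complement of the same sequence flips the sign of both values.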
Affiliation(s)
- Wei-Hua Chen
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Structural and Computational Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Guanting Lu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Peer Bork
- Structural and Computational Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Bioinformatics department, Max Delbrück Centre for Molecular Medicine, Berlin 13125, Germany
- Songnian Hu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Martin J Lercher
- Institute for Computer Science and Cluster of Excellence on Plant Sciences, Heinrich Heine University, Düsseldorf 40225, Germany
19
Abstract
Amino acids are typically encoded by multiple synonymous codons that are not used with the same frequency. Codon usage bias has drawn considerable attention, and several explanations have been offered, including variation in GC content between species. Focusing on a simple parameter, the combined GC proportion of all the synonymous codons for a particular amino acid (termed GCsyn), we try to deepen our understanding of the relationship between GC content and amino acid/codon usage in more detail. We analyzed 65 widely distributed representative species and found a close association between GCsyn, GC content, and amino acid usage. The overall usages of the four amino acids with the greatest GCsyn and the five amino acids with the lowest GCsyn both vary with the regional GC content, whereas the usage of the remaining 11 amino acids with intermediate GCsyn is less variable. More interestingly, we discovered that codon usage frequencies are nearly constant in regions with similar GC content. We further quantified the effects of regional GC-content variation (low to high) on amino acid usage and found that GC content determines the usage variation of amino acids, especially those with extremely high GCsyn, which accounts for 76.7% of the changed GC content for those regions. Our results suggest that GCsyn correlates with GC content and has an impact on codon/amino acid usage. These findings suggest a novel approach to understanding the role of codon and amino acid usage in shaping the genomic architecture and evolutionary patterns of organisms.
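The GCsyn parameter described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the standard genetic code and weights every synonymous codon equally (the abstract does not say whether codons are weighted by usage), and the function name `gcsyn_by_amino_acid` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# NCBI translation table 1; stop codons are marked "*").
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gcsyn_by_amino_acid():
    """Return {amino acid: GC fraction pooled over its synonymous codons}."""
    totals = {}  # amino acid -> (G+C bases seen, total bases seen)
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons encode no amino acid
        gc, n = totals.get(aa, (0, 0))
        totals[aa] = (gc + sum(base in "GC" for base in codon), n + 3)
    return {aa: gc / n for aa, (gc, n) in totals.items()}


gcsyn = gcsyn_by_amino_acid()
for aa in sorted(gcsyn, key=gcsyn.get, reverse=True):
    print(aa, round(gcsyn[aa], 3))
```

Under this unweighted definition, alanine, glycine, proline and arginine have the four highest values, and isoleucine, phenylalanine, lysine, asparagine and tyrosine the five lowest; whether these are exactly the groups used in the study is not stated in the abstract.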
20
Brbić M, Warnecke T, Kriško A, Supek F. Global Shifts in Genome and Proteome Composition Are Very Tightly Coupled. Genome Biol Evol 2015; 7:1519-32. [PMID: 25971281 PMCID: PMC4494046 DOI: 10.1093/gbe/evv088] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Accepted: 05/09/2015] [Indexed: 02/05/2023]
Abstract
The amino acid composition (AAC) of proteomes differs greatly between microorganisms and is associated with the environmental niche they inhabit, suggesting that these changes may be adaptive. Similarly, the oligonucleotide composition of genomes varies and may confer advantages at the DNA/RNA level. These influences overlap in protein-coding sequences, making it difficult to gauge their relative contributions. We disentangle these effects by systematically evaluating the correspondence between intergenic nucleotide composition, where protein-level selection is absent, the AAC, and ecological parameters of 909 prokaryotes. We find that G + C content, the most frequently used measure of genomic composition, cannot capture diversity in AAC and across ecological contexts. However, di-/trinucleotide composition in intergenic DNA predicts amino acid frequencies of proteomes to the point where very little cross-species variability remains unexplained (91% of variance accounted for). Qualitatively similar results were obtained for 49 fungal genomes, where 80% of the variability in AAC could be explained by the composition of introns and intergenic regions. Upon factoring out oligonucleotide composition and phylogenetic inertia, the residual AAC is poorly predictive of the microbes' ecological preferences, in stark contrast with the original AAC. Moreover, highly expressed genes do not exhibit more prominent environment-related AAC signatures than lowly expressed genes, despite contributing more to the effective proteome. Thus, evolutionary shifts in overall AAC appear to occur almost exclusively through factors shaping the global oligonucleotide content of the genome. We discuss these results in light of contravening evidence from biophysical data and further reading frame-specific analyses that suggest that adaptation takes place at the protein level.
Affiliation(s)
- Maria Brbić
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
- Molecular Basis of Ageing, Mediterranean Institute for Life Sciences (MedILS), Split, Croatia
- Tobias Warnecke
- MRC Clinical Sciences Centre, Imperial College, Hammersmith Campus, London, United Kingdom
- Anita Kriško
- Molecular Basis of Ageing, Mediterranean Institute for Life Sciences (MedILS), Split, Croatia
- Fran Supek
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
- EMBL/CRG Systems Biology Unit, Centre for Genomic Regulation, Barcelona, Spain
21
Bohlin J, Brynildsrud OB, Sekse C, Snipen L. An evolutionary analysis of genome expansion and pathogenicity in Escherichia coli. BMC Genomics 2014; 15:882. [PMID: 25297974 PMCID: PMC4200225 DOI: 10.1186/1471-2164-15-882] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 03/24/2014] [Accepted: 09/29/2014] [Indexed: 12/27/2022]
Abstract
BACKGROUND: There are several studies describing loss of genes through reductive evolution in microbes, but how selective forces are associated with genome expansion due to horizontal gene transfer (HGT) has not received similar attention. The aim of this study was therefore to examine how selective pressures influence genome expansion in 53 fully sequenced and assembled Escherichia coli strains. We also explored potential connections between genome expansion and the attainment of virulence factors. This was performed using estimations of several genomic parameters such as AT content, genomic drift (measured using relative entropy), genome size and estimated HGT size, which were subsequently compared to analogous parameters computed from the core genome consisting of 1729 genes common to the 53 E. coli strains. Moreover, we analyzed how selective pressures (quantified using relative entropy and dN/dS), acting on the E. coli core genome, influenced lineage and phylogroup formation. RESULTS: Hierarchical clustering of dS and dN estimations from the E. coli core genome resulted in phylogenetic trees with topologies in agreement with known E. coli taxonomy and phylogroups. High values of dS, compared to dN, indicate that the E. coli core genome has been subjected to substantial purifying selection over time; significantly more than the non-core part of the genome (p < 0.001). This is further supported by a linear association between strain-wise dS and dN values (β = 26.94 ± 0.44, R² ≈ 0.98, p < 0.001). The non-core part of the genome was also significantly more AT-rich (p < 0.001) than the core genome, and E. coli genome size correlated with estimated HGT size (p < 0.001). In addition, genome size (p < 0.001), AT content (p < 0.001) as well as estimated HGT size (p < 0.005) were all associated with the presence of virulence factors, suggesting that pathogenicity traits in E. coli are largely attained through HGT. No associations were found between selective pressures operating on the E. coli core genome, as estimated using relative entropy, and genome size (p ≈ 0.98). CONCLUSIONS: On a larger time frame, genome expansion in E. coli, which is significantly associated with the acquisition of virulence factors, appears to be independent of selective forces operating on the core genome.
Affiliation(s)
- Jon Bohlin
- Division of Epidemiology, Norwegian Institute of Public Health, Marcus Thranes gate 6, P.O. Box 4404, Oslo 0403, Norway
22
Das Roy R, Bhardwaj M, Bhatnagar V, Chakraborty K, Dash D. How do eubacterial organisms manage aggregation-prone proteome? F1000Res 2014; 3:137. [PMID: 25339987 PMCID: PMC4193397 DOI: 10.12688/f1000research.4307.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/24/2014] [Indexed: 11/20/2022]
Abstract
Eubacterial genomes vary considerably in their nucleotide composition. The percentage of genetic material constituted by guanosine and cytosine (GC) nucleotides ranges from 20% to 70%. It has been posited that GC-poor organisms are more dependent on protein folding machinery. Previous studies have ascribed this to the accumulation of mildly deleterious mutations in these organisms due to population bottlenecks. This phenomenon has been supported by protein folding simulations, which showed that proteins encoded by GC-poor organisms are more prone to aggregation than proteins encoded by GC-rich organisms. To test this proposition using a genome-wide approach, we classified different eubacterial proteomes in terms of their aggregation propensity and chaperone-dependence using multiple machine learning models. In contrast to the expected decrease in protein aggregation with an increase in GC richness, we found that the aggregation propensity of proteomes increases with GC content. A similar and even more significant correlation was obtained with the GroEL-dependence of proteomes: GC-poor proteomes have evolved to be less dependent on GroEL than GC-rich proteomes. We thus propose that a decrease in eubacterial GC content may have been selected in organisms facing proteostasis problems.
Affiliation(s)
- Rishi Das Roy
- GNR Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Council of Scientific and Industrial Research, Delhi, 110007, India
- Department of Biotechnology, University of Pune, Pune, 411007, India
- Manju Bhardwaj
- Department of Computer Science, Maitreyi College, Chanakyapuri, Delhi, 110021, India
- Vasudha Bhatnagar
- Department of Computer Science, Faculty of Mathematical Sciences, University of Delhi, Delhi, 110007, India
- Kausik Chakraborty
- GNR Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Council of Scientific and Industrial Research, Delhi, 110007, India
- Debasis Dash
- GNR Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Council of Scientific and Industrial Research, Delhi, 110007, India
- Department of Biotechnology, University of Pune, Pune, 411007, India
23
Agashe D, Shankar N. The evolution of bacterial DNA base composition. J Exp Zool B Mol Dev Evol 2014; 322:517-28. [DOI: 10.1002/jez.b.22565] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 10/03/2013] [Accepted: 01/22/2014] [Indexed: 11/08/2022]
Affiliation(s)
- Deepa Agashe
- National Center for Biological Sciences; Tata Institute of Fundamental Research; Bangalore India
- Nachiket Shankar
- National Center for Biological Sciences; Tata Institute of Fundamental Research; Bangalore India