1
Radrizzani S, Kudla G, Izsvák Z, Hurst LD. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet 2024; 25:431-448. [PMID: 38297070 DOI: 10.1038/s41576-023-00686-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/04/2023] [Indexed: 02/02/2024]
Abstract
Although translational selection to favour codons that match the most abundant tRNAs is not readily observed in humans, there is nonetheless selection in humans on synonymous mutations. We hypothesize that much of this synonymous site selection can be explained in terms of protection against unwanted RNAs - spurious transcripts, mis-spliced forms or RNAs derived from transposable elements or viruses. We propose not only that selection on synonymous sites functions to reduce the rate of creation of unwanted transcripts (for example, through selection on exonic splice enhancers and cryptic splice sites) but also that high-GC content (but low-CpG content), together with intron presence and position, is both particular to functional native mRNAs and used to recognize transcripts as native. In support of this hypothesis, transcription, nuclear export, liquid phase condensation and RNA degradation have all recently been shown to promote GC-rich transcripts and suppress AU/CpG-rich ones. With such 'traps' being set against AU/CpG-rich transcripts, the codon usage of native genes has, in turn, evolved to avoid such suppression. That parallel filters against AU/CpG-rich transcripts also affect the endosomal import of RNAs further supports the unwanted transcript hypothesis of synonymous site selection and explains the similar design rules that have enabled the successful use of transgenes and RNA vaccines.
Affiliation(s)
- Sofia Radrizzani
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
  - Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Grzegorz Kudla
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
- Zsuzsanna Izsvák
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Society, Berlin, Germany
- Laurence D Hurst
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
2
Li Y, Yi H, Zhu Y. Novel insights into adaptive evolution based on the unusual AT-skew in Acheilognathus gracilis mitogenome and phylogenetic relationships of bitterling. Gene 2024; 902:148154. [PMID: 38218382 DOI: 10.1016/j.gene.2024.148154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/20/2023] [Accepted: 01/08/2024] [Indexed: 01/15/2024]
Abstract
Acheilognathus gracilis, a bitterling species, is distributed in the lower reaches of the Yangtze River. It has been identified as the top-priority bitterling species for conservation because it has high evolutionary distinctiveness and is at risk of extinction. In the present study, we sequenced the complete mitogenome of A. gracilis for the first time and analyzed its phylogenetic position using 13 PCGs. The A. gracilis mitogenome is 16,774 bp in length and includes 13 protein-coding genes, 2 ribosomal RNAs, 22 transfer RNAs, a control region and the origin of light-strand replication. The overall base composition of A. gracilis in descending order is T 27.9 %, A 27.7 %, C 26.1 % and G 18.3 %, giving an unusual, slightly negative AT-skew. Further investigation revealed that A. gracilis uses an excess of T over A in NADH dehydrogenase 5 (nd5), whereas most other bitterlings are biased toward A rather than T, implying that there is likely to be a unique strategy of adaptive evolution in A. gracilis. We also compared the 13 PCGs of 30 bitterling mitogenomes and found them to be highly conserved. Phylogenetic trees constructed from the 13 PCGs strongly support the monophyly of Acheilognathus and the paraphyly of Rhodeus and Tanakia. The current results will provide valuable information for follow-up research on the conservation of species facing serious population decline and can provide novel insights into phylogenetic analysis and evolutionary biology research.
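The skew quoted in this abstract can be checked from the reported base composition. The sketch below assumes the conventional definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C); the abstract itself does not state the formula used, and the helper name `skew` is hypothetical.

```python
def skew(x: float, y: float) -> float:
    """Return (x - y) / (x + y); negative when y exceeds x."""
    return (x - y) / (x + y)

# Base composition (percent) as reported in the abstract.
composition = {"T": 27.9, "A": 27.7, "C": 26.1, "G": 18.3}

at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])

print(f"AT-skew = {at_skew:.4f}")  # AT-skew = -0.0036
print(f"GC-skew = {gc_skew:.4f}")  # GC-skew = -0.1757
```

Under these assumed definitions the AT-skew is indeed negative but very close to zero, which matches the "slightly negative" description.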
Affiliation(s)
- Yuxuan Li
  - College of Fisheries, Engineering Research Center of Green development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Hongbo Yi
  - College of Fisheries, Engineering Research Center of Green development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Yurong Zhu
  - College of Fisheries, Engineering Research Center of Green development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Provincial Engineering Laboratory for Pond Aquaculture, Hubei, China
3
Comparative Mitogenome Analyses Uncover Mitogenome Features and Phylogenetic Implications of the Parrotfishes (Perciformes: Scaridae). Biology 2023; 12:biology12030410. [PMID: 36979102 PMCID: PMC10044791 DOI: 10.3390/biology12030410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 02/28/2023] [Accepted: 03/02/2023] [Indexed: 03/09/2023]
Abstract
In order to investigate the molecular evolution of mitogenomes among the family Scaridae, the complete mitogenome sequences of twelve parrotfish species were determined and compared with those of seven other parrotfish species. The comparative analysis revealed that the general features and organization of the mitogenome were similar among the 19 parrotfish species. The base composition was similar among the parrotfishes, with the exception of the genus Calotomus, which exhibited an unusual negative AT skew in the whole mitogenome. The PCGs showed similar codon usage, and all of them underwent a strong purifying selection. The gene rearrangement typical of the parrotfishes was detected, with the tRNAMet inserted between the tRNAIle and tRNAGln, and the tRNAGln was followed by a putative tRNAMet pseudogene. The parrotfish mitogenomes displayed conserved gene overlaps and secondary structure in most tRNA genes, while the non-coding intergenic spacers varied among species. Phylogenetic analysis based on the thirteen PCGs and two rRNAs strongly supported the hypothesis that the parrotfishes could be subdivided into two clades with distinct ecological adaptations. The early divergence of the sea grass and coral reef clades occurred in the late Oligocene, probably related to the expansion of sea grass habitat. Later diversification within the coral reef clade could be dated back to the Miocene, likely associated with the geomorphology alternation since the closing of the Tethys Ocean. This work provided fundamental molecular data that will be useful for species identification, conservation, and further studies on the evolution of parrotfishes.
4
Characterization of Two New Apodemus Mitogenomes (Rodentia: Muridae) and Mitochondrial Phylogeny of Muridae. Diversity 2022. [DOI: 10.3390/d14121089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
Apodemus is the most common small rodent species in the Palearctic realm and an ideal species for biogeographical research and understanding environmental changes. Elucidating phylogenetic relationships will help us better understand species adaptation and genetic evolution. Due to its stable structure, maternal inheritance, and rapid evolution, the mitogenome has become a hot spot for taxonomic and evolutionary studies. In this research, we determined the mitochondrial genome of Apodemus agrarius ningpoensis and Apodemus draco draco and studied the phylogeny of Muridae using ML and BI trees based on all known complete mitogenomes. The mitochondrial genome of Apodemus agrarius ningpoensis was 16,262 bp, whereas that of Apodemus draco draco was 16,222 bp, and both encoded 13 protein-coding genes, 2 ribosomal RNA genes, and 22 transfer RNA genes. Analysis of base composition showed a clear A-T preference. All tRNAs except tRNASer and tRNALys formed a typical trilobal structure. All protein-coding genes contained T- and TAA as stop codons. Phylogeny analysis revealed two main branches in the Muridae family. Apodemus agrarius ningpoensis formed sister species with Apodemus chevrieri, whereas Apodemus draco draco with Apodemus latronum. Our findings provide theoretical basis for future studies focusing on the mitogenome evolution of Apodemus.
5
Li Y, Khandia R, Papadakis M, Alexiou A, Simonov AN, Khan AA. An investigation of codon usage pattern analysis in pancreatitis associated genes. BMC Genom Data 2022; 23:81. [PMID: 36434531 PMCID: PMC9700901 DOI: 10.1186/s12863-022-01089-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/13/2022] [Accepted: 10/10/2022] [Indexed: 11/27/2022]
Abstract
BACKGROUND Pancreatitis is an inflammatory disorder resulting from the autoactivation of trypsinogen in the pancreas. The genetic basis of the disease is an old phenomenon, and evidence is accumulating for the involvement of synonymous/non-synonymous codon variants in disease initiation and progression. RESULTS The present study envisaged a panel of 26 genes involved in pancreatitis for their codon choices, compositional analysis, relative dinucleotide frequency, nucleotide disproportion, protein physical properties, gene expression, codon bias, and interrelated of all these factors. In this set of genes, gene length was positively correlated with nucleotide skews and codon usage bias. Codon usage of any gene is dependent upon its AT and GC component; however, AGG, CGT, and CGA encoding for Arg, TCG for Ser, GTC for Val, and CCA for Pro were independent of nucleotide compositions. In addition, Codon GTC showed a correlation with protein properties, isoelectric point, instability index, and frequency of basic amino acids. We also investigated the effect of various evolutionary forces in shaping the codon usage choices of genes. CONCLUSIONS This study will enable us to gain insight into the molecular signatures associated with the disease that might help identify more potential genes contributing to enhanced risk for pancreatitis. All the genes associated with pancreatitis are generally associated with physiological function, and mutations causing loss of function, over or under expression leads to an ailment. Therefore, the present study attempts to envisage the molecular signature in a group of genes that lead to pancreatitis in case of malfunction.
Affiliation(s)
- Yuanyang Li
  - Third-Grade Pharmacological Laboratory On Chinese Medicine Approved By State Administration of Traditional Chinese Medicine, Medical College of China Three Gorges, Yichang, China
  - College of Medical Science, China Three Gorges University, Yichang, China
- Rekha Khandia
  - Department of Biochemistry and Genetics, Barkatullah University, Bhopal, MP 462026 India
- Marios Papadakis
  - Department of Surgery II, University Hospital Witten-Herdecke, University of Witten-Herdecke, Heusnerstrasse 40, 42283 Wuppertal, Germany
- Athanasios Alexiou
  - Department of Science and Engineering, Novel Global Community Educational Foundation, Hebersham, Australia
  - AFNP Med Austria, Vienna, Austria
- Azmat Ali Khan
  - Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, 11451 Saudi Arabia
6
Khandia R, Saeed M, Alharbi AM, Ashraf GM, Greig NH, Kamal MA. Codon Usage Bias Correlates With Gene Length in Neurodegeneration Associated Genes. Front Neurosci 2022; 16:895607. [PMID: 35860292 PMCID: PMC9289476 DOI: 10.3389/fnins.2022.895607] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/14/2022] [Accepted: 06/08/2022] [Indexed: 11/13/2022]
Abstract
Codon usage analysis is a crucial part of molecular characterization and is used to determine the factors affecting the evolution of a gene. The length of a gene is an important parameter that affects the characteristics of the gene, such as codon usage, compositional parameters, and sometimes, its functions. In the present study, we investigated the association of various parameters related to codon usage with the length of genes. Gene expression is affected by nucleotide disproportion. In sixty genes related to neurodegenerative disorders, the G nucleotide was the most abundant and the T nucleotide was the least. The nucleotide T exhibited a significant association with the length of the gene at both the overall compositional level and the first and second codon positions. Codon usage bias (CUB) of these genes was affected by pyrimidine and keto skews. Gene length was found to be significantly correlated with codon bias in neurodegeneration associated genes. In gene segments with lengths below 1,200 bp and above 2,400 bp, CUB was positively associated with length. Relative synonymous CUB, which is another measure of CUB, showed that codons TTA, GTT, GTC, TCA, GGT, and GGA exhibited a positive association with length, whereas codons GTA, AGC, CGT, CGA, and GGG showed a negative association. GC-ending codons were preferred over AT-ending codons. Overall analysis indicated that the association between CUB and length varies depending on the segment size; however, CUB of 1,200–2,000 bp gene segments appeared not to be affected by gene length. In synopsis, the analysis suggests that the length of the genes correlates with various imperative molecular signatures, including A/T nucleotide disproportion and codon choices. In the present study we additionally evaluated various molecular features and their correlation with different indices of codon usage, like the Codon Adaptation Index (CAI) and Relative Synonymous Codon Usage (RSCU) of codons.
We also considered the impact of gene fragment size on different molecular features in genes related to neurodegeneration. This analysis will aid our understanding of and in potentially modulating gene expression in cases of defective gene functioning in clinical settings.
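The RSCU index mentioned in this abstract is usually defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes that standard definition and the standard genetic code; the function name `rscu` and the toy sequence are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds: str) -> dict[str, float]:
    """RSCU = observed codon count / mean count over its synonymous family."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        # Skip stop codons, single-codon amino acids (Met, Trp), unused families.
        if aa == "*" or len(codons) == 1 or total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values

# Toy coding sequence: four valine codons (GTC x3, GTT x1).
values = rscu("GTCGTCGTCGTT")
print(values["GTC"], values["GTT"], values["GTA"])  # 3.0 1.0 0.0
```

An RSCU above 1 marks a codon used more often than expected under equal synonymous usage, and a value below 1 marks an under-used codon.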
Affiliation(s)
- Rekha Khandia
  - Department of Biochemistry and Genetics, Barkatullah University, Bhopal, India
- Mohd. Saeed
  - Department of Biology, College of Sciences, University of Hail, Hail, Saudi Arabia
- Ahmed M. Alharbi
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
- Ghulam Md. Ashraf
  - Pre-clinical Research Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Nigel H. Greig
  - Drug Design and Development Section, Translational Gerontology Branch, Intramural Research Program National Institute on Aging, NIH, Baltimore, MD, United States
- Mohammad Amjad Kamal
  - Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
  - King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Pharmacy, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, Bangladesh
  - Enzymoics, Novel Global Community Educational Foundation, Hebersham, NSW, Australia
7
Merrikh H, Merrikh C. Reply to: Testing the adaptive hypothesis of lagging-strand encoding in bacterial genomes. Nat Commun 2022; 13:2627. [PMID: 35551437 PMCID: PMC9098457 DOI: 10.1038/s41467-022-30014-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2020] [Accepted: 03/08/2022] [Indexed: 11/09/2022]
Affiliation(s)
- Houra Merrikh
  - Department of Biochemistry, Vanderbilt University, Nashville, TN, USA
8
Ho AT, Hurst LD. Unusual mammalian usage of TGA stop codons reveals that sequence conservation need not imply purifying selection. PLoS Biol 2022; 20:e3001588. [PMID: 35550630 PMCID: PMC9129041 DOI: 10.1371/journal.pbio.3001588] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2022] [Revised: 05/24/2022] [Accepted: 04/20/2022] [Indexed: 11/18/2022]
Abstract
The assumption that conservation of sequence implies the action of purifying selection is central to diverse methodologies to infer functional importance. GC-biased gene conversion (gBGC), a meiotic mismatch repair bias strongly favouring GC over AT, can in principle mimic the action of selection, this being thought to be especially important in mammals. As mutation is GC→AT biased, to demonstrate that gBGC does indeed cause false signals requires evidence that an AT-rich residue is selectively optimal compared to its more GC-rich allele, while showing also that the GC-rich alternative is conserved. We propose that mammalian stop codon evolution provides a robust test case. Although in most taxa TAA is the optimal stop codon, TGA is both abundant and conserved in mammalian genomes. We show that this mammalian exceptionalism is well explained by gBGC mimicking purifying selection and that TAA is the selectively optimal codon. Supportive of gBGC, we observe (i) TGA usage trends are consistent at the focal stop codon and elsewhere (in UTR sequences); (ii) that higher TGA usage and higher TAA→TGA substitution rates are predicted by a high recombination rate; and (iii) across species the difference in TAA <-> TGA substitution rates between GC-rich and GC-poor genes is largest in genomes that possess higher between-gene GC variation. TAA optimality is supported both by enrichment in highly expressed genes and trends associated with effective population size. High TGA usage and high TAA→TGA rates in mammals are thus consistent with gBGC’s predicted ability to “drive” deleterious mutations and supports the hypothesis that sequence conservation need not be indicative of purifying selection. A general trend for GC-rich trinucleotides to reside at frequencies far above their mutational equilibrium in high recombining domains supports the generality of these results.
Affiliation(s)
- Alexander Thomas Ho
  - Milner Centre for Evolution, University of Bath, Bath, United Kingdom
9
Şeker PS, Selçuk AY, Selvi E, Baran M, Teber S, Keleş GA, Kefelioğlu H, Tez C, İbiş O. Complete mitochondrial genomes of Chionomys roberti and Chionomys nivalis (Mammalia: Rodentia) from Turkey: Insight into their phylogenetic position within Arvicolinae. Org Divers Evol 2022. [DOI: 10.1007/s13127-022-00559-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
10
Hubert B. SkewDB, a comprehensive database of GC and 10 other skews for over 30,000 chromosomes and plasmids. Sci Data 2022; 9:92. [PMID: 35318332 PMCID: PMC8941118 DOI: 10.1038/s41597-022-01179-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Accepted: 01/25/2022] [Indexed: 11/12/2022]
Abstract
GC skew denotes the relative excess of G nucleotides over C nucleotides on the leading versus the lagging replication strand of eubacteria. While the effect is small, typically around 2.5%, it is robust and pervasive. GC skew and the analogous TA skew are a localized deviation from Chargaff’s second parity rule, which states that G and C, and T and A occur with (mostly) equal frequency even within a strand. Different bacterial phyla show different kinds of skew, and differing relations between TA and GC skew. This article introduces an open access database (https://skewdb.org) of GC and 10 other skews for over 30,000 chromosomes and plasmids. Further details like codon bias, strand bias, strand lengths and taxonomic data are also included. The SkewDB can be used to generate or verify hypotheses. Since the origins of both the second parity rule and GC skew itself are not yet satisfactorily explained, such a database may enhance our understanding of prokaryotic DNA.
- Measurement(s): Imbalances in the use of DNA nucleotides
- Technology Type(s): Next Generation Sequencing
- Factor Type(s): Position within DNA sequence • Organism type
- Sample Characteristic - Organism: bacterium • archaea
- Sample Characteristic - Environment: Varying
- Sample Characteristic - Location: World
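GC skew as described in this abstract is commonly quantified per window as (G − C)/(G + C), with the running sum of window values changing direction where the strand bias flips. The sketch below assumes that common definition; the window size, function names and toy sequence are hypothetical, and this is not a description of how SkewDB itself computes its values.

```python
from itertools import accumulate

def gc_skew(seq: str) -> float:
    """(G - C) / (G + C) for one stretch of sequence; 0.0 if no G or C."""
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0

def windowed_gc_skew(seq: str, window: int) -> list[float]:
    """GC skew over consecutive non-overlapping windows."""
    seq = seq.upper()
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]

# Toy sequence: a G-rich half followed by a C-rich half.
toy = "GGGCGGAT" * 4 + "CCCGCCAT" * 4
skews = windowed_gc_skew(toy, window=8)
cumulative = list(accumulate(skews))

# The cumulative skew rises over the G-rich half and falls over the C-rich
# half; its maximum marks the window where the bias switches.
turning_point = cumulative.index(max(cumulative))
print(turning_point)  # 3
```

In the toy data the cumulative skew peaks at the last G-rich window, which is the kind of turning point used to locate strand-bias switches along a chromosome.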
Affiliation(s)
- Bert Hubert
  - AHU Holding Research, Nootdorp, Netherlands
11
Ding L, Luo G, Zhou Q, Sun Y, Liao J. Comparative Mitogenome Analysis of Gerbils and the Mitogenome Phylogeny of Gerbillinae (Rodentia: Muridae). Biochem Genet 2022; 60:2226-2249. [PMID: 35314913 DOI: 10.1007/s10528-022-10213-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2019] [Accepted: 02/24/2022] [Indexed: 11/02/2022]
Abstract
To enrich the mitogenomic database of Gerbillinae (Rodentia: Muridae), mitogenomes of three gerbils from different genera, Meriones tamariscinus (16,393 bp), Brachiones przewalskii (16,357 bp), and Rhombomys opimus (16,352 bp), were elaborated and compared with those of other gerbils in the present study. The three gerbil mitogenomes consisted of 2 ribosomal RNA genes, 13 protein-coding genes (PCGs), 22 transfer RNA genes, and one control region. Here, gerbil mitogenomes have shown unique characteristics in terms of base composition, codon usage, non-coding region, and the replication origin of the light strand. There was no significant correlation between the nucleotide percentage of G + C and the phylogenetic status in gerbils, and between the GC content of PCGs and the leucine count. Phylogenetic relationships of the subfamily Gerbillinae were reconstructed by 7 gerbils that represented four genera based on concatenated mitochondrial DNA data using both Bayesian Inference and Maximum Likelihood. The phylogenetic analysis indicated that M. tamariscinus was phylogenetically distant from the genus Meriones, but has a close relationship with R. opimus. B. przewalskii was closely related to the genus Meriones rather than that of R. opimus.
Affiliation(s)
- Li Ding
  - School of Life Science and Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
  - School of Life Sciences, Lanzhou University, Lanzhou, 730000, China
- Guangjie Luo
  - School of Life Sciences, Lanzhou University, Lanzhou, 730000, China
- Quan Zhou
  - School of Life Sciences, Lanzhou University, Lanzhou, 730000, China
- Yuanhai Sun
  - School of Life Sciences, Lanzhou University, Lanzhou, 730000, China
- Jicheng Liao
  - School of Life Sciences, Lanzhou University, Lanzhou, 730000, China
12
Fan Y, Wang W. Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features. BMC Bioinformatics 2021; 22:516. [PMID: 34688247 PMCID: PMC8542328 DOI: 10.1186/s12859-021-04431-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2021] [Accepted: 10/04/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND The origin is the starting site of DNA replication, an extremely vital part of the informational inheritance between parents and children. More importantly, accurately identifying the origin of replication has great application value in the diagnosis and treatment of diseases related to genetic information errors, while the traditional biological experimental methods are time-consuming and laborious. RESULTS We carried out research on the origin of replication in a variety of eukaryotes and proposed a unique prediction method for each species. Throughout the experiment, we collected data from 7 species, including Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Kluyveromyces lactis, Pichia pastoris and Schizosaccharomyces pombe. In addition to the commonly used sequence feature extraction methods PseKNC-II and Base-content, we designed a feature extraction method based on TF-IDF. Then the two-step method was utilized for feature selection. After comparing a variety of traditional machine learning classification models, the multi-layer perceptron was employed as the classification algorithm. The data and codes involved in the experiment are available at https://github.com/Sarahyouzi/EukOriginPredict. CONCLUSIONS The prediction accuracy on the training sets of the above-mentioned seven species after 100 rounds of fivefold cross validation reached 92.60%, 90.80%, 91.22%, 96.15%, 96.72%, 99.86% and 96.72%, respectively. This indicates that, compared with other methods, the methods we designed achieve superior performance. In addition, our experiments reveal that the models of multiple species could predict each other with high accuracy, and the results of STREME show that they share a certain common motif.
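The abstract does not spell out how TF-IDF was applied to DNA. One plausible reading, offered only as a hedged sketch, treats each sequence as a document and its overlapping k-mers as terms. The value of k, the unsmoothed IDF variant log(N/df), and the names `kmers` and `tfidf_features` are all assumptions for illustration and are not taken from the authors' repository.

```python
import math
from collections import Counter

def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def tfidf_features(seqs: list[str], k: int = 3) -> list[dict[str, float]]:
    """Per-sequence TF-IDF weights, with k-mers as terms.

    TF  = k-mer count / total k-mers in the sequence
    IDF = log(N / number of sequences containing the k-mer)
    """
    docs = [Counter(kmers(s.upper(), k)) for s in seqs]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)

    features = []
    for doc in docs:
        total = sum(doc.values())
        features.append({term: (count / total) * math.log(n_docs / doc_freq[term])
                         for term, count in doc.items()})
    return features

# Toy sequences; "AT" and "GC" occur in all three, so their weight is zero.
toy = ["ATATATGC", "GCGCGCAT", "ATGCATGC"]
features = tfidf_features(toy, k=2)
print(features[0]["AT"])  # 0.0
```

K-mers shared by every sequence receive zero weight, while k-mers concentrated in a few sequences are up-weighted, which is the property that makes such features potentially informative for a downstream classifier.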
Affiliation(s)
- Yongxian Fan
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
- Wanru Wang
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
13
Morales AC, Rice AM, Ho AT, Mordstein C, Mühlhausen S, Watson S, Cano L, Young B, Kudla G, Hurst LD. Causes and Consequences of Purifying Selection on SARS-CoV-2. Genome Biol Evol 2021; 13:evab196. [PMID: 34427640 PMCID: PMC8504154 DOI: 10.1093/gbe/evab196] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Accepted: 08/19/2021] [Indexed: 02/06/2023]
Abstract
Owing to a lag between a deleterious mutation's appearance and its selective removal, gold-standard methods for mutation rate estimation assume no meaningful loss of mutations between parents and offspring. Indeed, from analysis of closely related lineages, in SARS-CoV-2, the Ka/Ks ratio was previously estimated as 1.008, suggesting no within-host selection. By contrast, we find a higher number of observed SNPs at 4-fold degenerate sites than elsewhere and, allowing for the virus's complex mutational and compositional biases, estimate that the mutation rate is at least 49-67% higher than would be estimated based on the rate of appearance of variants in sampled genomes. Given the high Ka/Ks one might assume that the majority of such intrahost selection is the purging of nonsense mutations. However, we estimate that selection against nonsense mutations accounts for only ∼10% of all the "missing" mutations. Instead, classical protein-level selective filters (against chemically disparate amino acids and those predicted to disrupt protein functionality) account for many missing mutations. It is less obvious why for an intracellular parasite, amino acid cost parameters, notably amino acid decay rate, is also significant. Perhaps most surprisingly, we also find evidence for real-time selection against synonymous mutations that move codon usage away from that of humans. We conclude that there is common intrahost selection on SARS-CoV-2 that acts on nonsense, missense, and possibly synonymous mutations. This has implications for methods of mutation rate estimation, for determining times to common ancestry and the potential for intrahost evolution including vaccine escape.
Affiliation(s)
- Atahualpa Castillo Morales
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- Alan M Rice
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- Alexander T Ho
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- Christine Mordstein
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
  - MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
  - Department of Molecular Biology and Genetics, Aarhus University, Denmark
- Stefanie Mühlhausen
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- Samir Watson
  - Department of Molecular Biology and Genetics, Aarhus University, Denmark
- Laura Cano
  - MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
- Bethan Young
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
  - MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
- Grzegorz Kudla
  - MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
14
Zhang XL, Liu P, Xu SL, Rizo EZ, Zhang Q, Dumont HJ, Han BP. Geographic Variation of Phyllodiaptomus tunguidus Mitogenomes: Genetic Differentiation and Phylogeny. Front Genet 2021; 12:711992. [PMID: 34531896 PMCID: PMC8439380 DOI: 10.3389/fgene.2021.711992] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/19/2021] [Accepted: 08/03/2021] [Indexed: 12/04/2022]
Abstract
Phyllodiaptomus tunguidus (Copepoda: Calanoida) is largely endemic to and widespread in freshwater in southern China, where it inhabits a complex landscape from lowland to highland across an elevation gradient of 2000m. A deep genetic differentiation can be expected between its most distant geographic populations. Here, we sequenced nine mitogenomes from diverse populations. All mitogenomes contained 37 genes, including 13 protein-coding genes (PCG), two rRNA genes, 22 tRNA genes and one control region. Their base composition, genetic distance and tRNA structure indeed revealed a wide differentiation between mitogenomes. Two P. tunguidus from Guangxi near Vietnam differed from the other seven by up to 10.1%. Their tRNA-Arg had a complete clover-leaf structure, whereas that of the others did not contain an entire dihydrouridine arm. The nine mitogenomes also differed in the length of rRNA. NJ, ML, and Bayesian analyses all split them into two clades, viz. the two P. tunguidus from Guangxi (Clade 1), and the other seven (Clade 2). Both the structure and phylogeny of the mitogenomes suggest that P. tunguidus has complex geographic origin, and its populations in Clade 1 have long lived in isolation from those in Clade 2. They currently reach the level of subspecies or cryptic species. An extensive phylogenetic analysis of Copepoda further verified that Diaptomidae is the most recently diverging family in Calanoida and that P. tunguidus is at the evolutionary apex of the family.
Affiliation(s)
- Xiao-Li Zhang
- Department of Ecology, Jinan University, Guangzhou, China
- Ping Liu
- Department of Ecology, Jinan University, Guangzhou, China
- Shao-Lin Xu
- Department of Ecology, Jinan University, Guangzhou, China
- Eric Zeus Rizo
- Department of Ecology, Jinan University, Guangzhou, China; Division of Biological Sciences, College of Arts and Sciences, University of the Philippines Visayas, Iloilo, Philippines
- Qun Zhang
- Department of Ecology, Jinan University, Guangzhou, China
- Henri J Dumont
- Department of Ecology, Jinan University, Guangzhou, China; Department of Biology, Ghent University, Ghent, Belgium
- Bo-Ping Han
- Department of Ecology, Jinan University, Guangzhou, China
15
de Oliveira JL, Morales AC, Hurst LD, Urrutia AO, Thompson CRL, Wolf JB. Inferring Adaptive Codon Preference to Understand Sources of Selection Shaping Codon Usage Bias. Mol Biol Evol 2021; 38:3247-3266. [PMID: 33871580 PMCID: PMC8321536 DOI: 10.1093/molbev/msab099] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/09/2023] Open
Abstract
Alternative synonymous codons are often used at unequal frequencies. Classically, studies of such codon usage bias (CUB) attempted to separate the impact of neutral from selective forces by assuming that deviations from a predicted neutral equilibrium capture selection. However, GC-biased gene conversion (gBGC) can also cause deviation from a neutral null. Alternatively, selection has been inferred from CUB in highly expressed genes, but the accuracy of this approach has not been extensively tested, and gBGC can interfere with such extrapolations (e.g., if expression and gene conversion rates covary). It is therefore critical to examine deviations from a mutational null in a species with no gBGC. To achieve this goal, we implement such an analysis in the highly AT rich genome of Dictyostelium discoideum, where we find no evidence of gBGC. We infer neutral CUB under mutational equilibrium to quantify "adaptive codon preference," a nontautologous genome wide quantitative measure of the relative selection strength driving CUB. We observe signatures of purifying selection consistent with selection favoring adaptive codon preference. Preferred codons are not GC rich, underscoring the independence from gBGC. Expression-associated "preference" largely matches adaptive codon preference but does not wholly capture the influence of selection shaping patterns across all genes, suggesting selective constraints associated specifically with high expression. We observe patterns consistent with effects on mRNA translation and stability shaping adaptive codon preference. Thus, our approach to quantifying adaptive codon preference provides a framework for inferring the sources of selection that shape CUB across different contexts within the genome.
Affiliation(s)
- Janaina Lima de Oliveira
- Instituto de Biologia, Universidade Federal da Bahia, Salvador, Bahia, 40170-115, Brazil; Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Atahualpa Castillo Morales
- Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Laurence D Hurst
- Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Araxi O Urrutia
- Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK; Instituto de Ecologia, UNAM, Ciudad de Mexico 04510, Mexico
- Christopher R L Thompson
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Jason B Wolf
- Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
16
Shi L, Liu L, Li X, Wu Y, Tian X, Shi Y, Wang Z. Phylogeny and evolution of Lasiopodomys in subfamily Arvivolinae based on mitochondrial genomics. PeerJ 2021; 9:e10850. [PMID: 33777513 PMCID: PMC7977381 DOI: 10.7717/peerj.10850] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/30/2020] [Accepted: 01/06/2021] [Indexed: 01/02/2023] Open
Abstract
The species of Lasiopodomys Lataste 1887 with their related genera remains undetermined owing to inconsistent morphological characteristics and molecular phylogeny. To investigate the phylogenetic relationship and speciation among species of the genus Lasiopodomys, we sequenced and annotated the whole mitochondrial genomes of three individual species, namely Lasiopodomys brandtii Radde 1861, L. mandarinus Milne-Edwards 1871, and Neodon (Lasiopodomys) fuscus Büchner 1889. The nucleotide sequences of the circular mitogenomes were identical for each individual species of L. brandtii, L. mandarinus, and N. fuscus. Each species contained 13 protein-coding genes (PCGs), 22 transfer RNAs, and 2 ribosomal RNAs, with mitochondrial genome lengths of 16,557 bp, 16,562 bp, and 16,324 bp, respectively. The mitogenomes and PCGs showed positive AT skew and negative GC skew. Mitogenomic phylogenetic analyses suggested that L. brandtii, L. mandarinus, and L. gregalis Pallas 1779 belong to the genus Lasiopodomys, whereas N. fuscus belongs to the genus Neodon grouped with N. irene. Lasiopodomys showed the closest relationship with Microtus fortis Büchner 1889 and M. kikuchii Kuroda 1920, which are considered as the paraphyletic species of genera Microtus. TMRCA and niche model analysis revealed that Lasiopodomys may have first appeared during the early Pleistocene epoch. Further, L. gregalis separated from others over 1.53 million years ago (Ma) and then diverged into L. brandtii and L. mandarinus 0.76 Ma. The relative contribution of climatic fluctuations to speciation and selection in this group requires further research.
Affiliation(s)
- Luye Shi
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
- Likuan Liu
- School of Life Sciences, Qinghai Normal University, Xining, Qinghai, China
- Xiujuan Li
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
- Yue Wu
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
- Xiangyu Tian
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
- Yuhua Shi
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
- Zhenlong Wang
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
17
Rice AM, Castillo Morales A, Ho AT, Mordstein C, Mühlhausen S, Watson S, Cano L, Young B, Kudla G, Hurst LD. Evidence for Strong Mutation Bias toward, and Selection against, U Content in SARS-CoV-2: Implications for Vaccine Design. Mol Biol Evol 2021; 38:67-83. [PMID: 32687176 PMCID: PMC7454790 DOI: 10.1093/molbev/msaa188] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Indexed: 02/07/2023] Open
Abstract
Large-scale re-engineering of synonymous sites is a promising strategy to generate vaccines either through synthesis of attenuated viruses or via codon-optimized genes in DNA vaccines. Attenuation typically relies on deoptimization of codon pairs and maximization of CpG dinucleotide frequencies. So as to formulate evolutionarily informed attenuation strategies that aim to force nucleotide usage against the direction favored by selection, here, we examine available whole-genome sequences of SARS-CoV-2 to infer patterns of mutation and selection on synonymous sites. Analysis of mutational profiles indicates a strong mutation bias toward U. In turn, analysis of observed synonymous site composition implicates selection against U. Accounting for dinucleotide effects reinforces this conclusion, observed UU content being a quarter of that expected under neutrality. Possible mechanisms of selection against U mutations include selection for higher expression, for high mRNA stability or lower immunogenicity of viral genes. Consistent with gene-specific selection against CpG dinucleotides, we observe systematic differences of CpG content between SARS-CoV-2 genes. We propose an evolutionarily informed approach to attenuation that, unusually, seeks to increase usage of the already most common synonymous codons. Comparable analysis of H1N1 and Ebola finds that GC3 deviated from neutral equilibrium is not a universal feature, cautioning against generalization of results.
Affiliation(s)
- Alan M Rice
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Atahualpa Castillo Morales
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Alexander T Ho
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Christine Mordstein
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, United Kingdom
- Stefanie Mühlhausen
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Samir Watson
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Laura Cano
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, United Kingdom
- Bethan Young
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, United Kingdom
- Grzegorz Kudla
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, United Kingdom
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
18
iterb-PPse: Identification of transcriptional terminators in bacterial by incorporating nucleotide properties into PseKNC. PLoS One 2020; 15:e0228479. [PMID: 32413030 PMCID: PMC7228126 DOI: 10.1371/journal.pone.0228479] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/15/2020] [Accepted: 05/01/2020] [Indexed: 11/19/2022] Open
Abstract
A terminator is a DNA sequence that gives the RNA polymerase the transcriptional termination signal. Identifying terminators correctly can improve genome annotation and, more importantly, has considerable application value in disease diagnosis and therapies. However, accurate prediction methods are lacking and urgently needed. Therefore, we proposed a prediction method "iterb-PPse" for terminators by incorporating 47 nucleotide properties into PseKNC-Ⅰ and PseKNC-Ⅱ and utilizing Extreme Gradient Boosting to predict terminators based on Escherichia coli and Bacillus subtilis. Combining with the preceding methods, we employed three new feature extraction methods, K-pwm, Base-content and Nucleotidepro, to formulate raw samples. The two-step method was applied to select features. When identifying terminators based on optimized features, we compared five single models as well as 16 ensemble models. As a result, the accuracy of our method on the benchmark dataset reached 99.88%, higher than the existing state-of-the-art predictor iTerm-PseKNC in 100 times five-fold cross-validation test. Its prediction accuracy for two independent datasets reached 94.24% and 99.45% respectively. For the convenience of users, we developed software on the basis of "iterb-PPse" with the same name. The open software and source code of "iterb-PPse" are available at https://github.com/Sarahyouzi/iterb-PPse.
19
Ding L, Zhou Q, Sun Y, Feoktistova NY, Liao J. Two novel cricetine mitogenomes: Insight into the mitogenomic characteristics and phylogeny in Cricetinae (Rodentia: Cricetidae). Genomics 2019; 112:1716-1725. [PMID: 31669701 DOI: 10.1016/j.ygeno.2019.09.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/17/2019] [Revised: 09/06/2019] [Accepted: 09/18/2019] [Indexed: 01/30/2023]
Abstract
Both Cricetus cricetus and Phodopus sungorus mitochondrial genomes (mitogenomes) were sequenced and elaborated for the first time in the present study. Their mitogenomes contained 37 genes and showed typical characteristics of the vertebrate mitogenome. Comparative analysis of 10 cricetine mitogenomes indicated that they shared similar characteristics with those of other cricetines in terms of genes arrangement, nucleotide composition, codon usage, tRNA structure, nucleotide skew and the origin of replication of light strand. Phylogenetic relationship of the subfamily Cricetinae was reconstructed using mitogenomes data with the methods of Bayesian Inference and Maximum Likelihood. Phylogenetic analysis indicated that Cricetulus kamensis was at basal position and phylogenetically distant from all other Cricetulus species but had a close relationship with the group of Phodopus, and supported that the genus Urocricetus deserved as a separate genus rank. The phylogenetic status of Tscherskia triton represented a separate clade corresponding to a diversified cricetine lineage (Cricetulus, Allocricetulus, and Cricetus).
Affiliation(s)
- Li Ding
- School of Life Sciences, Lanzhou University, Lanzhou 730000, PR China
- Quan Zhou
- School of Life Sciences, Lanzhou University, Lanzhou 730000, PR China
- Yuanhai Sun
- School of Life Sciences, Lanzhou University, Lanzhou 730000, PR China
- Natalia Yu Feoktistova
- A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow 119071, Russia
- Jicheng Liao
- School of Life Sciences, Lanzhou University, Lanzhou 730000, PR China
20
Bohlin J, Pettersson JHO. Evolution of Genomic Base Composition: From Single Cell Microbes to Multicellular Animals. Comput Struct Biotechnol J 2019; 17:362-370. [PMID: 30949307 PMCID: PMC6429543 DOI: 10.1016/j.csbj.2019.03.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/28/2018] [Revised: 02/28/2019] [Accepted: 03/01/2019] [Indexed: 01/07/2023] Open
Abstract
Whole genome sequencing (WGS) of thousands of microbial genomes has provided considerable insight into evolutionary mechanisms in the microbial world. While substantially fewer eukaryotic genomes are available for analyses the number is rapidly increasing. This mini-review summarizes broadly evolutionary dynamics of base composition in the different domains of life from the perspective of prokaryotes. Common and different evolutionary mechanisms influencing genomic base composition in eukaryotes and prokaryotes are discussed. The conclusion from the data currently available suggests that while there are similarities there are also striking differences in how genomic base composition has evolved within prokaryotes and eukaryotes. For instance, homologous recombination appears to increase GC content locally in eukaryotes due to a non-selective process termed GC-biased gene conversion (gBGC). For prokaryotes on the other hand, increase in genomic GC content seems to be driven by the environment and selection. We find that similar phenomena observed for some organisms in each respective domain may be caused by very different mechanisms: while gBGC and recombination rates appear to explain the negative correlation between GC3 (GC content based on the third codon nucleotides) and genome size in some eukaryotes uptake of AT rich DNA sequences is the main reason for a similar negative correlation observed in prokaryotes. We provide further examples that indicate that base composition in prokaryotes and eukaryotes have evolved under very different constraints.
Affiliation(s)
- Jon Bohlin
- Norwegian Institute of Public Health, Division of Infection Control and Environmental Health, Department of Infectious Disease Epidemiology and Modelling, Lovisenberggata 8, 0456 Oslo, Norway; Centre for Fertility and Health, Norwegian Institute of Public Health, PO-Box 222 Skøyen, N-0213 Oslo, Norway; Norwegian University of Life Sciences, Faculty of Veterinary Sciences, Production Animal Clinical Sciences, Ullevålsveien 72, 0454 Oslo, Norway
- John H-O Pettersson
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, New South Wales 2006, Australia; Zoonosis Science Center, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Public Health Agency of Sweden, Nobels väg 18, SE-171 82 Solna, Sweden
21
Yu P, Zhou L, Zhou XY, Yang WT, Zhang J, Zhang XJ, Wang Y, Gui JF. Unusual AT-skew of Sinorhodeus microlepis mitogenome provides new insights into mitogenome features and phylogenetic implications of bitterling fishes. Int J Biol Macromol 2019; 129:339-350. [PMID: 30738158 DOI: 10.1016/j.ijbiomac.2019.01.200] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/13/2018] [Revised: 01/17/2019] [Accepted: 01/29/2019] [Indexed: 12/25/2022]
Abstract
Sinorhodeus microlepis (S. microlepis) was recently described as a new species and represents a new genus, Sinorhodeus, of the subfamily Acheilognathinae. In this study, we sequenced the complete mitogenome of S. microlepis for the first time and compared it with the other 29 bitterling mitogenomes. The S. microlepis mitogenome is 16,591 bp in length and contains 37 genes. The gene distribution pattern is identical among the 30 bitterling mitogenomes. A significant linear correlation between A+T% and AT-skew was found among 29 bitterling mitogenomes, whereas S. microlepis shows an unusual, slightly negative AT-skew in tRNAs and PCGs. Bitterling mitogenomes exhibit highly conserved usage bias of start codons, relative synonymous codons and amino acids, as well as conserved overlaps and non-coding intergenic spacers. Phylogenetic trees constructed from 13 PCGs strongly support the polyphyly of the genus Acheilognathus and the paraphyly of Rhodeus and Tanakia. Based on the unusual characters of the S. microlepis mitogenome and the phylogenetic trees, S. microlepis should be a sister species to the genus Rhodeus that might have diverged about 13.69 Ma (95% HPD: 12.96-14.48 Ma).
Affiliation(s)
- Peng Yu
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, The Innovation Academy of Seed Design, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China; College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, China
- Li Zhou
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, The Innovation Academy of Seed Design, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiao-Ya Zhou
- College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, China
- Wen-Tao Yang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, The Innovation Academy of Seed Design, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jun Zhang
- College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, China
- Xiao-Juan Zhang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, The Innovation Academy of Seed Design, Chinese Academy of Sciences, Wuhan 430072, China
- Yang Wang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, The Innovation Academy of Seed Design, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jian-Fang Gui
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, The Innovation Academy of Seed Design, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
22
|
Zhang R, Zhang L, Wang W, Zhang Z, Du H, Qu Z, Li XQ, Xiang H. Differences in Codon Usage Bias between Photosynthesis-Related Genes and Genetic System-Related Genes of Chloroplast Genomes in Cultivated and Wild Solanum Species. Int J Mol Sci 2018; 19:E3142. [PMID: 30322061 PMCID: PMC6213243 DOI: 10.3390/ijms19103142] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 08/29/2018] [Revised: 09/30/2018] [Accepted: 10/04/2018] [Indexed: 12/20/2022] Open
Abstract
Solanum is one of the largest genera, including two important crops: potato (Solanum tuberosum) and tomato (Solanum lycopersicum). In this study we compared the chloroplast codon usage bias (CUB) among 12 Solanum species, between photosynthesis-related genes (Photo-genes) and genetic system-related genes (Genet-genes), and between cultivated species and wild relatives. The Photo-genes encode proteins for photosystems, the photosynthetic electron transport chain, and RuBisCO, while the Genet-genes encode proteins for ribosomal subunits, RNA polymerases, and maturases. The following findings about the Solanum chloroplast genome CUB were obtained: (1) the nucleotide composition, gene expression, and selective pressure are identified as the main factors affecting chloroplast CUB; (2) all these 12 chloroplast genomes prefer A/U over G/C and pyrimidines over purines at the third base of codons; (3) Photo-genes have higher codon adaptation indexes than Genet-genes, indicative of a higher gene expression level and a stronger adaptation of Photo-genes; (4) gene function is the primary factor affecting CUB of Photo-genes but not Genet-genes; (5) Photo-genes prefer pyrimidine over purine, whereas Genet-genes favor purine over pyrimidine, at the third position of codons; (6) Photo-genes are mainly affected by the selective pressure, whereas Genet-genes are under the underlying mutational bias; (7) S. tuberosum is more similar to Solanum commersonii than to Solanum bulbocastanum; (8) S. lycopersicum is greatly different from the seven wild relatives analyzed; (9) the CUB in codons for valine, aspartic acid, and threonine is the same between the two crop species, S. tuberosum and S. lycopersicum. These findings suggest that the chloroplast CUB contributed to the differential requirement of gene expression activity and function between Photo-genes and Genet-genes and to the performance of cultivated potato and tomato.
Affiliation(s)
- Ruizhi Zhang
- College of Pharmaceutical Sciences, Southwest University, Chongqing 400715, China
- Li Zhang
- Department of Math and Information, China West Normal University, Nanchong, Sichuan 637000, China
- Wei Wang
- College of Animal Science and Technology, Southwest University, Chongqing 400715, China
- Zhu Zhang
- College of Animal Science and Technology, Southwest University, Chongqing 400715, China
- Huihui Du
- College of Animal Science and Technology, Southwest University, Chongqing 400715, China
- Zheng Qu
- College of Animal Science and Technology, Southwest University, Chongqing 400715, China
- Xiu-Qing Li
- Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, 850 Lincoln Road, Fredericton, NB E3B 4Z7, Canada
- Heng Xiang
- College of Animal Science and Technology, Southwest University, Chongqing 400715, China
23
Apostolou-Karampelis K, Nikolaou C, Almirantis Y. A novel skew analysis reveals substitution asymmetries linked to genetic code GC-biases and PolIII α-subunit isoforms. DNA Res 2016; 23:353-63. [PMID: 27345720 PMCID: PMC4991834 DOI: 10.1093/dnares/dsw021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/11/2016] [Accepted: 05/09/2016] [Indexed: 11/30/2022] Open
Abstract
Strand biases reflect deviations from a null expectation of DNA evolution that assumes strand-symmetric substitution rates. Here, we present strong evidence that nearest-neighbour preferences are a strand-biased feature of bacterial genomes, indicating neighbour-dependent substitution asymmetries. To detect such asymmetries we introduce an alignment free index (relative abundance skews). The profiles of relative abundance skews along coding sequences can trace the phylogenetic relations of bacteria, suggesting that the patterns of neighbour-dependent substitution strand-biases are not common among different lineages, but are rather species-specific. Analysis of neighbour-dependent and codon-site skews sheds light on the origins of substitution asymmetries. Via a simple model we argue that the structure of the genetic code imposes position-dependent substitution strand-biases along coding sequences, as a response to GC mutation pressure. Thus, the organization of the genetic code per se can lead to an uneven distribution of nucleotides among different codon sites, even when requirements for specific codons and amino-acids are not accounted for. Moreover, our results suggest that strand-biases in replication fidelity of PolIII α-subunit induce substitution asymmetries, both neighbour-dependent and independent, on a genome scale. The role of DNA repair systems, such as transcription-coupled repair, is also considered.
Affiliation(s)
- Christoforos Nikolaou
- Computational Genomics Group, Department of Biology, University of Crete, 71409 Heraklion, Greece
- Yannis Almirantis
- Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310 Athens, Greece
24
Chen WH, Lu G, Bork P, Hu S, Lercher MJ. Energy efficiency trade-offs drive nucleotide usage in transcribed regions. Nat Commun 2016; 7:11334. [PMID: 27098217 PMCID: PMC4844684 DOI: 10.1038/ncomms11334] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 10/09/2015] [Accepted: 03/16/2016] [Indexed: 01/29/2023] Open
Abstract
Efficient nutrient usage is a trait under universal selection. A substantial part of cellular resources is spent on making nucleotides. We thus expect preferential use of cheaper nucleotides especially in transcribed sequences, which are often amplified thousand-fold compared with genomic sequences. To test this hypothesis, we derive a mutation-selection-drift equilibrium model for nucleotide skews (strand-specific usage of 'A' versus 'T' and 'G' versus 'C'), which explains nucleotide skews across 1,550 prokaryotic genomes as a consequence of selection on efficient resource usage. Transcription-related selection generally favours the cheaper nucleotides 'U' and 'C' at synonymous sites. However, the information encoded in mRNA is further amplified through translation. Due to unexpected trade-offs in the codon table, cheaper nucleotides encode on average energetically more expensive amino acids. These trade-offs apply to both strand-specific nucleotide usage and GC content, causing a universal bias towards the more expensive nucleotides 'A' and 'G' at non-synonymous coding sites.
Affiliation(s)
- Wei-Hua Chen
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Structural and Computational Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Guanting Lu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Peer Bork
- Structural and Computational Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Bioinformatics department, Max Delbrück Centre for Molecular Medicine, Berlin 13125, Germany
- Songnian Hu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Martin J Lercher
- Institute for Computer Science and Cluster of Excellence on Plant Sciences, Heinrich Heine University, Düsseldorf 40225, Germany
25
Genome Sequence of Rapid Beer-Spoiling Isolate Lactobacillus brevis BSO 464. Genome Announc 2015; 3:e01411-15. [PMID: 26634759 PMCID: PMC4669400 DOI: 10.1128/genomea.01411-15] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
The genome of brewery-isolate Lactobacillus brevis BSO 464 was sequenced and assembly produced a chromosome and eight plasmids. This bacterium tolerates dissolved CO2/pressure and can rapidly spoil packaged beer. This genome is useful for analyzing the genetics associated with beer spoilage by lactic acid bacteria.
26
Saha SK, Goswami A, Dutta C. Association of purine asymmetry, strand-biased gene distribution and PolC within Firmicutes and beyond: a new appraisal. BMC Genomics 2014; 15:430. [PMID: 24899249 PMCID: PMC4070872 DOI: 10.1186/1471-2164-15-430] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/07/2013] [Accepted: 05/08/2014] [Indexed: 11/10/2022] Open
Abstract
Background: The Firmicutes often possess three conspicuous genome features: marked Purine Asymmetry (PAS) across two strands of replication, Strand-biased Gene Distribution (SGD) and presence of two isoforms of DNA polymerase III alpha subunit, PolC and DnaE. Despite considerable research efforts, it is not clear whether the co-existence of PAS, PolC and/or SGD is an essential and exclusive characteristic of the Firmicutes. The nature of correlations, if any, between these three features within and beyond the lineages of Firmicutes has also remained elusive. The present study has been designed to address these issues.
Results: A large-scale analysis of diverse bacterial genomes indicates that PAS, PolC and SGD are neither essential nor exclusive features of the Firmicutes. PolC prevails in four bacterial phyla: Firmicutes, Fusobacteria, Tenericutes and Thermotogae, while PAS occurs only in subsets of Firmicutes, Fusobacteria and Tenericutes. There are five major compositional trends in Firmicutes: (I) an explicit PAS or G + A-dominance along the entire leading strand, (II) only G-dominance in the leading strand, (III) alternate stretches of purine-rich and pyrimidine-rich sequences, (IV) G + T dominance along the leading strand, and (V) no identifiable patterns in base usage. Presence of strong SGD has been observed not only in genomes having PAS, but also in genomes with G-dominance along their leading strands, an observation that defies the notion of co-occurrence of PAS and SGD in Firmicutes. The PolC-containing non-Firmicutes organisms often have alternate stretches of R-dominant and Y-dominant sequences along their genomes and most of them show relatively weak, but significant SGD. Firmicutes having G + A-dominance or G-dominance along the leading strand usually show distinct base usage patterns in three codon sites of genes. Probable molecular mechanisms that might have incurred such usage patterns have been proposed.
Conclusion: Co-occurrence of PAS, strong SGD and PolC should not be regarded as a genome signature of the Firmicutes. Presence of PAS in a species may warrant PolC and strong SGD, but PolC and/or SGD does not necessarily imply PAS.
Affiliation(s)
- Chitra Dutta
- Structural Biology & Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
27
|
Feng Y, Chen HL, Chiu CH. Differential genomic variation between short- and long-term bacterial evolution revealed by ultradeep sequencing. Genome Biol Evol 2013; 5:572-7. [PMID: 23531725 PMCID: PMC3622303 DOI: 10.1093/gbe/evt031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022] Open
Abstract
Mutation and selection are both thought to significantly affect the nucleotide composition of bacterial genomes. Earlier studies have compared closely related strains to obtain mutation patterns, on the hypothesis that these bacterial strains had diverged so recently that selection would not have had enough time to play its role. In this study, we used a SOLiD autosequencer based on a dual-base encoding scheme to sequence the genome of Staphylococcus aureus with a mapping coverage of over 5,000×. By directly counting the variation obtained from these ultradeep sequencing reads, we found that A → G was the predominant single-base substitution and 1 bp deletions were the major small indel. These patterns are completely different from those obtained by comparison of closely related S. aureus strains, where C → T accounted for a larger proportion of mutations and deletions were shown to occur at almost the same frequency as insertions. These findings suggest that the genomic differences between closely related bacterial strains have already undergone selection and are therefore not representative of spontaneous mutation.
Affiliation(s)
- Ye Feng
- Genomics Research Center, Harbin Medical University, People's Republic of China.
28
|
McLean MA, Tirosh I. Opposite GC skews at the 5' and 3' ends of genes in unicellular fungi. BMC Genomics 2011; 12:638. [PMID: 22208287 PMCID: PMC3315797 DOI: 10.1186/1471-2164-12-638] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/15/2011] [Accepted: 12/30/2011] [Indexed: 11/24/2022] Open
Abstract
Background: GC-skews have previously been linked to transcription in some eukaryotes. They have been associated with transcription start sites, with the coding strand G-biased in mammals and C-biased in fungi and invertebrates.
Results: We show a consistent and highly significant pattern of GC-skew within genes of almost all unicellular fungi. The pattern of GC-skew is asymmetrical: the coding strand of genes is typically C-biased at the 5' ends but G-biased at the 3' ends, with intermediate skews at the middle of genes. Thus, the initiation, elongation, and termination phases of transcription are associated with different skews. This pattern influences the encoded proteins by generating differential usage of amino acids at the 5' and 3' ends of genes. These biases also affect fourfold-degenerate positions and extend into promoters and 3' UTRs, indicating that skews cannot be accounted for by selection for protein function or translation.
Conclusions: We propose two explanations: the mutational pressure hypothesis and the adaptive hypothesis. The mutational pressure hypothesis is that different co-factors bind to RNA pol II at different phases of transcription, producing different mutational regimes. The adaptive hypothesis is that cytidine triphosphate deficiency may lead to C-avoidance at the 3' ends of transcripts to control the flow of RNA pol II molecules and reduce their frequency of collisions.
Affiliation(s)
- Malcolm A McLean
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.