1
Cao J, Wang H, Cao Y, Kan S, Li J, Liu Y. Extreme Reconfiguration of Plastid Genomes in Papaveraceae: Rearrangements, Gene Loss, Pseudogenization, IR Expansion, and Repeats. Int J Mol Sci 2024; 25:2278. [PMID: 38396955 PMCID: PMC10888665 DOI: 10.3390/ijms25042278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 02/08/2024] [Accepted: 02/10/2024] [Indexed: 02/25/2024]
Abstract
The plastid genomes (plastomes) of angiosperms are typically highly conserved, with extreme reconfiguration being uncommon, although reports of such events have emerged in some lineages. In this study, we conducted a comprehensive comparison of the complete plastomes from twenty-two species, covering seventeen genera from three subfamilies (Fumarioideae, Hypecooideae, and Papaveroideae) of Papaveraceae. Our results revealed a high level of variability in the plastid genome size of Papaveraceae, ranging from 151,864 bp to 219,144 bp in length, which might be triggered by the expansion of the IR region and a large number of repeat sequences. Moreover, we detected numerous large-scale rearrangements, primarily occurring in the plastomes of Fumarioideae and Hypecooideae. Frequent gene loss or pseudogenization were also observed for ndhs, accD, clpP, infA, rpl2, rpl20, rpl32, rps16, and several tRNA genes, particularly in Fumarioideae and Hypecooideae, which might be associated with the structural variation in their plastomes. Furthermore, we found that the plastomes of Fumarioideae exhibited a higher GC content and more repeat sequences than those of Papaveroideae. Our results showed that Papaveroideae generally displayed a relatively conserved plastome, with the exception of Eomecon chionantha, while Fumarioideae and Hypecooideae typically harbored highly reconfigurable plastomes, showing high variability in the genome size, gene content, and gene order. This study provides insights into the plastome evolution of Papaveraceae and may contribute to the development of effective molecular markers.
Affiliation(s)
- Jialiang Cao
  - College of Plant Protection, Henan Agricultural University, Zhengzhou 450002, China
- Hongwei Wang
  - College of Plant Protection, Henan Agricultural University, Zhengzhou 450002, China
- Yanan Cao
  - College of Plant Protection, Henan Agricultural University, Zhengzhou 450002, China
- Shenglong Kan
  - Marine College, Shandong University, Weihai 264209, China
- Jiamei Li
  - College of Life Sciences, Henan Agricultural University, Zhengzhou 450046, China
- Yanyan Liu
  - College of Plant Protection, Henan Agricultural University, Zhengzhou 450002, China
2
Forsdyke DR. Speciation, natural selection, and networks: three historians versus theoretical population geneticists. Theory Biosci 2024; 143:1-26. [PMID: 38282046 DOI: 10.1007/s12064-024-00412-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 01/06/2024] [Indexed: 01/30/2024]
Abstract
In 1913, the geneticist William Bateson called for a halt in studies of genetic phenomena until evolutionary fundamentals had been sufficiently addressed at the molecular level. Nevertheless, in the 1960s, the theoretical population geneticists celebrated a "modern synthesis" of the teachings of Mendel and Darwin, with an exclusive role for natural selection in speciation. This was supported, albeit with minor reservations, by historians Mark Adams and William Provine, who taught it to generations of students. In subsequent decades, doubts were raised by molecular biologists and, despite the deep influence of various mentors, Adams and Provine noted serious anomalies and began to question traditional "just-so-stories." They were joined in challenging the genetic orthodoxy by a scientist-historian, Donald Forsdyke, who suggested that a "collective variation" postulated by Darwin's young research associate, George Romanes, and a mysterious "residue" postulated by Bateson, might relate to differences in short runs of DNA bases (oligonucleotides). The dispute between a small network of historians and a large network of geneticists can be understood in the context of national politics. Contrasts are drawn between democracies, where capturing the narrative makes reversal difficult, and dictatorships, where overthrow of a supportive dictator can result in rapid reversal.
Affiliation(s)
- Donald R Forsdyke
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, K7L3N6, Canada.
3
Tietze L, Lale R. Importance of the 5' regulatory region to bacterial synthetic biology applications. Microb Biotechnol 2021; 14:2291-2315. [PMID: 34171170 PMCID: PMC8601185 DOI: 10.1111/1751-7915.13868] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/03/2021] [Revised: 06/03/2021] [Accepted: 06/04/2021] [Indexed: 01/02/2023]
Abstract
The field of synthetic biology is evolving at a fast pace. It is advancing beyond single-gene alterations in single hosts to the logical design of complex circuits and the development of integrated synthetic genomes. Recent breakthroughs in deep learning, which is increasingly used in de novo assembly of DNA components with predictable effects, are also aiding the discipline. Despite advances in computing, the field is still reliant on the availability of pre-characterized DNA parts, whether natural or synthetic, to regulate gene expression in bacteria and make valuable compounds. In this review, we discuss the different bacterial synthetic biology methodologies employed in the creation of 5' regulatory regions - promoters, untranslated regions and 5'-end of coding sequences. We summarize methodologies and discuss their significance for each of the functional DNA components, and highlight the key advances made in bacterial engineering by concentrating on their flaws and strengths. We end the review by outlining the issues that the discipline may face in the near future.
Affiliation(s)
- Lisa Tietze
  - PhotoSynLab, Department of Biotechnology, Faculty of Natural Sciences, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Rahmi Lale
  - PhotoSynLab, Department of Biotechnology, Faculty of Natural Sciences, Norwegian University of Science and Technology, Trondheim N-7491, Norway
4
Forsdyke DR. Success of alignment-free oligonucleotide (k-mer) analysis confirms relative importance of genomes not genes in speciation and phylogeny. Biol J Linn Soc Lond 2019. [DOI: 10.1093/biolinnean/blz096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
The utility of DNA sequence substrings (k-mers) in alignment-free phylogenetic classification, including that of bacteria and viruses, is increasingly recognized. However, its biological basis eludes many 21st century practitioners. A path from the 19th century recognition of the informational basis of heredity to the modern era can be discerned. Crick’s DNA ‘unpairing postulate’ predicted that recombinational pairing of homologous DNAs during meiosis would be mediated by short k-mers in the loops of stem-loop structures extruded from classical duplex helices. The complementary ‘kissing’ duplex loops – like tRNA anticodon–codon k-mer duplexes – would seed a more extensive pairing that would then extend until limited by lack of homology or other factors. Indeed, this became the principle behind alignment-based methods that assessed similarity by degree of DNA–DNA reassociation in vitro. These are now seen as less sensitive than alignment-free methods that are closely consistent, both theoretically and mechanistically, with chromosomal anti-recombination models for the initiation of divergence into new species. The analytical power of k-mer differences supports the theses that evolutionary advance sometimes serves the needs of nucleic acids (genomes) rather than proteins (genes), and that such differences can play a role in early speciation events.
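As an illustration of what an alignment-free k-mer comparison can look like, the following minimal Python sketch builds relative k-mer frequency profiles for two sequences and measures how far apart they are. The function names, the default k = 4, and the choice of Euclidean distance are assumptions made for this example; the abstract does not specify a particular k or metric.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=4):
    """Relative frequency of each overlapping k-mer (A/C/G/T only) in seq."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles (illustrative metric)."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(m, 0.0) - q.get(m, 0.0)) ** 2 for m in keys))
```

Identical sequences give a distance of zero, and sequences with no k-mers in common give the largest distances, so a table of pairwise distances could in principle feed a clustering step without any alignment.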
Affiliation(s)
- Donald R Forsdyke
  - Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, Ontario, Canada
5
Codon Usage Heterogeneity in the Multipartite Prokaryote Genome: Selection-Based Coding Bias Associated with Gene Location, Expression Level, and Ancestry. mBio 2019; 10:mBio.00505-19. [PMID: 31138741 PMCID: PMC6538778 DOI: 10.1128/mbio.00505-19] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/18/2023]
Abstract
Prokaryotes represent an ancestral lineage in the tree of life and constitute optimal resources for investigating the evolution of genomes in unicellular organisms. Many bacterial species possess multipartite genomes offering opportunities to study functional variations among replicons, how and where new genes integrate into a genome, and how genetic information within a lineage becomes encoded and evolves. To analyze these issues, we focused on the model soil bacterium Sinorhizobium meliloti, which harbors a chromosome, a chromid (pSymB), a megaplasmid (pSymA), and, in many strains, one or more accessory plasmids. The analysis of several genomes, together with 1.4 Mb of accessory plasmid DNA that we purified and sequenced, revealed clearly different functional profiles associated with each genomic entity. pSymA, in particular, exhibited remarkable interstrain variation and a high density of singletons (unique, exclusive genes) featuring functionalities and modal codon usages that were very similar to those of the plasmidome. All this evidence reinforces the idea of a close relationship between pSymA and the plasmidome. Correspondence analyses revealed that adaptation of codon usages to the translational machinery increased from plasmidome to pSymA to pSymB to chromosome, corresponding as such to the ancestry of each replicon in the lineage. We demonstrated that chromosomal core genes gradually adapted to the translational machinery, reminiscent of observations in several bacterial taxa for genes with high expression levels. Such findings indicate a previously undiscovered codon usage adaptation associated with the chromosomal core information that likely operates to improve bacterial fitness. 
We present a comprehensive model illustrating the central findings described here, discussed in the context of the changes occurring during the evolution of a multipartite prokaryote genome.

IMPORTANCE: Bacterial genomes usually include many thousands of genes which are expressed with diverse spatial-temporal patterns and intensities. A well-known evidence is that highly expressed genes, such as the ribosomal and other translation-related proteins (RTRPs), have accommodated their codon usage to optimize translation efficiency and accuracy. Using a bioinformatic approach, we identify core-genes sets with different ancestries, and demonstrate that selection processes that optimize codon usage are not restricted to RTRPs but extended at a genome-wide scale. Such findings highlight, for the first time, a previously undiscovered adaptation strategy associated with the chromosomal-core information. Contrasted with the translationally more adapted genes, singletons (i.e., exclusive genes, including those of the plasmidome) appear as the gene pool with the less-ameliorated codon usage in the lineage. A comprehensive summary describing the inter- and intra-replicon heterogeneity of codon usages in a complex prokaryote genome is presented.
6
Schikora-Tamarit MÀ, Carey LB. Poor codon optimality as a signal to degrade transcripts with frameshifts. Transcription 2018; 9:327-333. [PMID: 30105929 DOI: 10.1080/21541264.2018.1511676] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
Frameshifting errors are common and mRNA quality control pathways, such as nonsense-mediated decay (NMD), exist to degrade these aberrant transcripts. Recent work has shown the existence of a genetic link between NMD and codon-usage mediated mRNA decay. Here we present computational evidence that these pathways are synergic for removing frameshifts.
Affiliation(s)
- Miquel Àngel Schikora-Tamarit
  - Systems Bioengineering Program, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Lucas B Carey
  - Systems Bioengineering Program, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
7
Mason PH, Domínguez D JF, Winter B, Grignolio A. Hidden in plain view: degeneracy in complex systems. Biosystems 2014; 128:1-8. [PMID: 25543071 DOI: 10.1016/j.biosystems.2014.12.003] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 12/15/2014] [Revised: 12/20/2014] [Accepted: 12/23/2014] [Indexed: 12/27/2022]
Abstract
Degeneracy is a word with two meanings. The popular usage of the word denotes deviance and decay. In scientific discourse, degeneracy refers to the idea that different pathways can lead to the same output. In the biological sciences, the concept of degeneracy has been ignored for a few key reasons. Firstly, the word "degenerate" in popular culture has negative, emotionally powerful associations that do not inspire scientists to consider its technical meaning. Secondly, the tendency of searching for single causes of natural and social phenomena means that scientists can overlook the multi-stranded relationships between cause and effect. Thirdly, degeneracy and redundancy are often confused with each other. Degeneracy refers to dissimilar structures that are functionally similar while redundancy refers to identical structures. Degeneracy can give rise to novelty in ways that redundancy cannot. From genetic codes to immunology, vaccinology and brain development, degeneracy is a crucial part of how complex systems maintain their functional integrity. This review article discusses how the scientific concept of degeneracy was imported into genetics from physics and was later introduced to immunology and neuroscience. Using examples of degeneracy in immunology, neuroscience and linguistics, we demonstrate that degeneracy is a useful way of understanding how complex systems function. Reviewing the history and theoretical scope of degeneracy allows its usefulness to be better appreciated, its coherency to be further developed, and its application to be more quickly realized.
Affiliation(s)
- P H Mason
  - Woolcock Institute of Medical Research, University of Sydney, 431 Glebe Point Road, Glebe, 2037 NSW, Australia.
- J F Domínguez D
  - Experimental Neuropsychology Research Unit, School of Psychological Sciences, Monash University, Australia
- B Winter
  - Cognitive and Information Sciences, University of California, Merced 5200 North Lake Rd., Merced, CA 95343, USA
- A Grignolio
  - Section and Museum of History of Medicine, University of Rome "La Sapienza", viale dell'Università, 34a 00185 Rome, Italy
8
Ponce de Leon M, de Miranda AB, Alvarez-Valin F, Carels N. The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins. Bioinform Biol Insights 2014; 8:93-108. [PMID: 24899802 PMCID: PMC4039185 DOI: 10.4137/bbi.s13161] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/08/2013] [Revised: 11/24/2013] [Accepted: 11/24/2013] [Indexed: 01/02/2023]
Abstract
For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. 
Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional constraints on proteins.
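The kind of per-position statistic that underlies a purine bias across codon positions can be sketched in a few lines of Python. This is only an illustrative calculation, not the authors' pipeline: the function name is hypothetical, and it assumes the input is an in-frame coding sequence starting at the first codon position.

```python
def purine_fraction_by_codon_position(cds):
    """Fraction of purines (A or G) at codon positions 1, 2 and 3.

    Assumes cds is in frame and starts at codon position 1; a trailing
    incomplete codon and any ambiguous bases are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    purines = [0, 0, 0]
    totals = [0, 0, 0]
    for i in range(usable):
        base = cds[i]
        if base in "ACGT":
            totals[i % 3] += 1
            if base in "AG":
                purines[i % 3] += 1
    return [p / t if t else 0.0 for p, t in zip(purines, totals)]
```

For the toy sequence `ATGGCTAAA` (codons ATG, GCT, AAA) the three fractions are 1, 1/3 and 2/3. On real genes, comparing the three values is one simple way to see whether purines are favored at particular codon positions.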
Affiliation(s)
- Miguel Ponce de Leon
  - Sección Biomatemática, Facultad de Ciencias, Universidad de la República, Iguá, Montevideo, Uruguay
- Antonio Basilio de Miranda
  - Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil
- Fernando Alvarez-Valin
  - Sección Biomatemática, Facultad de Ciencias, Universidad de la República, Iguá, Montevideo, Uruguay
- Nicolas Carels
  - Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil
9
Forsdyke DR. Implications of HIV RNA structure for recombination, speciation, and the neutralism-selectionism controversy. Microbes Infect 2013; 16:96-103. [PMID: 24211872 DOI: 10.1016/j.micinf.2013.10.017] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 08/22/2013] [Revised: 10/24/2013] [Accepted: 10/24/2013] [Indexed: 11/29/2022]
Abstract
The conflict between the needs to encode both a protein (impaired by non-synonymous mutation), and nucleic acid structure (impaired by synonymous or non-synonymous mutation), can sometimes be resolved in favour of the nucleic acid because its structure is critical for a selectively advantageous genome-wide activity--recombination. However, above a sequence difference threshold, recombination is impaired. It may then be advantageous for new species to arise. Building on the work of Grantham and others critical of the neutralist viewpoint, heuristic support for this hypothesis emerged from studies of the base composition and structure of retroviral genomes. The extreme enrichment in the purine A of the RNA of human immunodeficiency virus (HIV-1), parallels the mild purine-loading of the RNAs of most organisms, for which there is an adaptive explanation--immune evasion. However, human T cell leukaemia virus (HTLV-1), with the potential to invade the same host cell, shows extreme enrichment in the pyrimidine C. Assuming the low GC% HIV and the high GC% HTLV-1 to share a common ancestor, it was postulated that differences in GC% had arisen to prevent homologous recombination between these emerging lentiviral species. Sympatrically isolated by this intracellular reproductive barrier, prototypic HIV-1 seized the AU-rich (low GC%) high ground (thus committing to purine A rather than purine G). Prototypic HTLV-1 forwent this advantage and evolved an independent evolutionary strategy--similar to that of the GC%-rich Epstein-Barr virus--profound latency maintained by transcription of one purine-rich mRNA. The evidence supporting these interpretations is reviewed.
Affiliation(s)
- Donald R Forsdyke
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON K7L3N6, Canada.
10
Collin MA, Edgerly JS, Hayashi CY. Comparison of fibroin cDNAs from webspinning insects: insight into silk formation and function. ZOOLOGY 2011; 114:239-46. [PMID: 21741226 DOI: 10.1016/j.zool.2011.01.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/22/2010] [Revised: 01/20/2011] [Accepted: 01/23/2011] [Indexed: 10/18/2022]
Abstract
Embiopterans (webspinning insects) are renowned for their prolific use of silk. These organisms spin silk to construct elaborate networks of tubes in which they live, forage, and reproduce. The silken galleries are essential for protecting these soft-bodied insects from predators and other environmental hazards. Despite the ecological importance of embiopteran silk, very little is known about its constituent proteins. Here, we characterize the silk protein cDNAs from four embiopteran species to better understand the function and evolution of these adaptive molecules. We show that webspinner fibroins (silk proteins) are highly repetitive in sequence and possess several conserved characteristics, despite differences in habitat preferences across species. The most striking similarities are in the codon usage biases of the fibroin genes, particularly in the repetitive regions, as well as sequence conservation of the carboxyl-terminal regions of the fibroins. Based on analyses of the silk genes, we propose hypotheses regarding codon bias and its effect on the translation and replication of these unusual genes. Furthermore, we discuss the significance of specific fibroin motifs to the mechanical and structural characteristics of silk fibers. Lastly, we report that the conservation of webspinner fibroin carboxyl-terminal regions suggests that fiber formation may occur through a mechanism analogous to that found in Lepidoptera. From these results, insight is gained into the tempo and mode of evolution that has shaped embiopteran fibroins.
Affiliation(s)
- Matthew A Collin
  - Department of Biology, University of California, Riverside, CA 92521, USA.
11
Korzinov OM, Astakhova TV, Vlasov PK, Roytberg MA. Statistical analysis of DNA sequences in the neighborhood of splice sites. Mol Biol 2011. [DOI: 10.1134/s0026893308010202] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
12
Scherrer and Jost’s symposium: the gene concept in 2008. Theory Biosci 2009; 128:157-61. [DOI: 10.1007/s12064-009-0071-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/16/2008] [Accepted: 02/03/2009] [Indexed: 10/20/2022]
13
Yang CM. On the structural regularity in nucleobases and amino acids and relationship to the origin and evolution of the genetic code. ORIGINS LIFE EVOL B 2005; 35:275-95. [PMID: 16228642 DOI: 10.1007/s11084-005-1078-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 10/30/2003] [Revised: 02/19/2004] [Accepted: 02/19/2004] [Indexed: 10/25/2022]
Abstract
To explore how chemical structures of both nucleobases and amino acids may have played a role in shaping the genetic code, numbers of sp2 hybrid nitrogen atoms in nucleobases were taken as a determinative measure for empirical stereo-electronic property to analyze the genetic code. Results revealed that amino acid hydropathy correlates strongly with the sp2 nitrogen atom numbers in nucleobases rather than with the overall electronic property such as redox potentials of the bases, reflecting that stereo-electronic property of bases may play a role. In the rearranged code, five simple but stereo-structurally distinctive amino acids (Gly, Pro, Val, Thr and Ala) and their codon quartets form a crossed intersection "core". Secondly, a re-categorization of the amino acids according to their beta-carbon stereochemistry, verified by charge density (at beta-carbon) calculation, results in five groups of stereo-structurally distinctive amino acids, the group leaders of which are Gly, Pro, Val, Thr and Ala, remarkably overlapping the above "core". These two lines of independent observations provide empirical arguments for a contention that a seemingly "frozen" "core" could have formed at a certain evolutionary stage. The possible existence of this codon "core" is in conformity with a previous evolutionary model whereby stereochemical interactions may have shaped the code. Moreover, the genetic code listed in UCGA succession together with this codon "core" has recently facilitated an identification of the unprecedented icosikaioctagon symmetry and bi-pyramidal nature of the genetic code.
Affiliation(s)
- Chi Ming Yang
  - Neurochemistry and System Chemical Biology, Nankai University, Tian Jin, 300071, China.
14
Hu J, Wang J, Xu J, Li W, Han Y, Li Y, Ji J, Ye J, Xu Z, Zhang Z, Wei W, Li S, Wang J, Wang J, Yu J, Yang H. Evolution and variation of the SARS-CoV genome. GENOMICS PROTEOMICS & BIOINFORMATICS 2005; 1:216-25. [PMID: 15629034 PMCID: PMC5172238 DOI: 10.1016/s1672-0229(03)01027-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 01/26/2023]
Abstract
Knowledge of the evolution of pathogens is of great medical and biological significance to the prevention, diagnosis, and therapy of infectious diseases. In order to understand the origin and evolution of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus), we collected complete genome sequences of all viruses available in GenBank, and made comparative analyses with the SARS-CoV. Genomic signature analysis demonstrates that the coronaviruses all take the TGTT as their richest tetranucleotide except the SARS-CoV. A detailed analysis of the forty-two complete SARS-CoV genome sequences revealed the existence of two distinct genotypes, and showed that these isolates could be classified into four groups. Our manual analysis of the BLASTN results demonstrates that the HE (hemagglutinin-esterase) gene exists in the SARS-CoV, and many mutations made it unfamiliar to us.
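A genomic signature of the kind mentioned above, the most abundant tetranucleotide in a genome, can be found with a simple overlapping-window count. The Python sketch below is a hedged illustration only: the function name is hypothetical, it counts a single strand, and it does not reproduce whatever normalization the authors may have applied.

```python
from collections import Counter


def richest_kmer(genome, k=4):
    """Return (k-mer, count) for the most frequent overlapping k-mer, or None.

    Counts one strand only and skips windows containing non-ACGT symbols.
    """
    genome = genome.upper()
    counts = Counter()
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if set(window) <= set("ACGT"):
            counts[window] += 1
    if not counts:
        return None
    return counts.most_common(1)[0]
```

Sequences deposited as RNA genomes are usually written with T in place of U, which is why the alphabet check uses ACGT; a sequence written with U would need converting first.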
Affiliation(s)
- Jianfei Hu
  - College of Life Sciences, Peking University, Beijing 100871, China
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Jing Wang
  - College of Life Sciences, Peking University, Beijing 100871, China
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Jing Xu
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Wei Li
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Yujun Han
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Yan Li
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Jia Ji
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Jia Ye
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
  - James D. Watson Institute of Genome Sciences, Zhijiang Campus, Zhejiang University and Hangzhou Genomics Institute, Hangzhou 310008, China
- Zhao Xu
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Zizhang Zhang
  - College of Materials Science and Chemical Engineering, Yuquan Campus, Zhejiang University, Hangzhou 310027, China
- Wei Wei
  - James D. Watson Institute of Genome Sciences, Zhijiang Campus, Zhejiang University and Hangzhou Genomics Institute, Hangzhou 310008, China
- Songgang Li
  - College of Life Sciences, Peking University, Beijing 100871, China
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Jun Wang
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
- Jian Wang
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
  - James D. Watson Institute of Genome Sciences, Zhijiang Campus, Zhejiang University and Hangzhou Genomics Institute, Hangzhou 310008, China
- Jun Yu
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
  - James D. Watson Institute of Genome Sciences, Zhijiang Campus, Zhejiang University and Hangzhou Genomics Institute, Hangzhou 310008, China
  - Corresponding authors.
- Huanming Yang
  - Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
  - James D. Watson Institute of Genome Sciences, Zhijiang Campus, Zhejiang University and Hangzhou Genomics Institute, Hangzhou 310008, China
  - Corresponding authors.
15
Chen SL, Lee W, Hottes AK, Shapiro L, McAdams HH. Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci U S A 2004; 101:3480-5. [PMID: 14990797 PMCID: PMC373487 DOI: 10.1073/pnas.0307827100] [Citation(s) in RCA: 230] [Impact Index Per Article: 11.0] [Indexed: 11/18/2022]
Abstract
Analysis of genome-wide codon bias shows that only two parameters effectively differentiate the genome-wide codon bias of 100 eubacterial and archaeal organisms. The first parameter correlates with genome GC content, and the second parameter correlates with context-dependent nucleotide bias. Both of these parameters may be calculated from intergenic sequences. Therefore, genome-wide codon bias in eubacteria and archaea may be predicted from intergenic sequences that are not translated. When these two parameters are calculated for genes from nonmammalian eukaryotic organisms, genes from the same organism again have similar values, and genome-wide codon bias may also be predicted from intergenic sequences. In mammals, genes from the same organism are similar only in the second parameter, because GC content varies widely among isochores. Our results suggest that, in general, genome-wide codon bias is determined primarily by mutational processes that act throughout the genome, and only secondarily by selective forces acting on translated sequences.
Affiliation(s)
- Swaine L Chen
- Department of Developmental Biology, Stanford University School of Medicine, Beckman Center, B300, Stanford, CA 94304, USA.
|
16
|
Abstract
Rich and Ayala propose that the zero rate of non-amino-acid-changing (synonymous) mutations in some proteins of Plasmodium falciparum reflects a recent population bottleneck. Alternatively, Arnot and Saul propose sequence conservation in response to selective pressures other than the pressure to encode protein. Among these are fold pressure and purine-loading pressure. Genomes adapt to these by acquisition of introns and/or low-complexity (simple-sequence) segments in proteins. Adaptive explanations include facilitation of intragenic recombination (and hence diversification of the encoded protein) by DNA stem-loop secondary structures.
Affiliation(s)
- Donald R Forsdyke
- Dept of Biochemistry, Queen's University, Kingston, Ontario, Canada K7L3N6.
|
17
|
Wada A, Suyama A. Third letters in codons counterbalance the (G + C)-content of their first and second letters. FEBS Lett 2002. [DOI: 10.1016/0014-5793(85)80389-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 10/17/2022]
|
18
|
Lao PJ, Forsdyke DR. Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine. Genome Res 2000; 10:228-36. [PMID: 10673280 PMCID: PMC310832 DOI: 10.1101/gr.10.2.228] [Citation(s) in RCA: 83] [Impact Index Per Article: 3.3] [Received: 08/23/1999] [Accepted: 12/16/1999] [Indexed: 11/24/2022]
Abstract
When transcription is to the right of the promoter, the "top," mRNA-synonymous strand of DNA tends to be purine-rich. When transcription is to the left of the promoter, the top, mRNA-template strand tends to be pyrimidine-rich. This transcription-direction rule suggests that there has been an evolutionary selection pressure for the purine-loading of RNAs. The politeness hypothesis states that purine-loading prevents distracting RNA-RNA interactions and excessive formation of double-stranded RNA, which might trigger various intracellular alarms. Because RNA-RNA interactions have a distinct entropy-driven component, the pressure for the evolution of purine-loading might be greater in organisms living at high temperatures. In support of this, we find that Chargaff differences (a measure of purine-loading) are greater in thermophiles than in nonthermophiles and extend to both purine bases. In thermophiles the pressure to purine-load affects codon choice, indicating that some features of their amino acid composition (e.g., high levels of glutamic acid) might reflect purine-loading pressure (i.e., constraints on mRNA) rather than direct constraints on protein structure and function.
Affiliation(s)
- P J Lao
- Department of Biochemistry, Queen's University, Kingston, Ontario, K7L 3N6, Canada
|
19
|
Abstract
The nuclear genomes of vertebrates are mosaics of isochores, very long stretches (>>300kb) of DNA that are homogeneous in base composition and are compositionally correlated with the coding sequences that they embed. Isochores can be partitioned in a small number of families that cover a range of GC levels (GC is the molar ratio of guanine+cytosine in DNA), which is narrow in cold-blooded vertebrates, but broad in warm-blooded vertebrates. This difference is essentially due to the fact that the GC-richest 10-15% of the genomes of the ancestors of mammals and birds underwent two independent compositional transitions characterized by strong increases in GC levels. The similarity of isochore patterns across mammalian orders, on the one hand, and across avian orders, on the other, indicates that these higher GC levels were then maintained, at least since the appearance of ancestors of warm-blooded vertebrates. After a brief review of our current knowledge on the organization of the vertebrate genome, evidence will be presented here in favor of the idea that the generation and maintenance of the GC-richest isochores in the genomes of warm-blooded vertebrates were due to natural selection.
Affiliation(s)
- G Bernardi
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Napoli, Italy.
|
20
|
D'Onofrio G, Jabbari K, Musto H, Bernardi G. The correlation of protein hydropathy with the base composition of coding sequences. Gene 1999; 238:3-14. [PMID: 10570978 DOI: 10.1016/s0378-1119(99)00257-7] [Citation(s) in RCA: 71] [Impact Index Per Article: 2.7] [Indexed: 11/20/2022]
Abstract
The "universal correlation" (D'Onofrio, G., Bernardi, G., 1992. A universal compositional correlation among codon positions. Gene 110, 81-88.) that holds between <GC3> and <GC1> or <GC2> (<GC> values are the average values of the coding sequences of each genome analyzed) at both the inter- and intra-genomic level, was re-analyzed on a vastly larger dataset. The results showed a slight, but significant, difference in the <GC3> vs. <GC1> correlations exhibited by prokaryotes and eukaryotes. This finding prompted an analysis of the correlation between <GC3> and the amino acid frequencies in the encoded proteins, which has shown that positive correlations exist between <GC3> values of coding sequences and the hydropathy of the corresponding proteins. These correlations are due to the fact that hydrophobic and amphypathic amino acids increase, whereas hydrophilic amino acids decrease with increasing <GC3> values. Hydropathy values of prokaryotic proteins are systematically higher than those of eukaryotes, but the slopes of the regression lines are identical. The lower hydrophobicity of eukaryotic proteins is due to differences in the amino acid composition. In particular, the twofold higher cysteine (and disulfide bond) level of eukaryotic proteins compared to prokaryotic proteins most probably compensates for their lower hydrophobicity. This supports the viewpoint that hydrophobicity plays a structural and functional role as far as protein stability is concerned.
Affiliation(s)
- G D'Onofrio
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Napoli, Italy
|
21
|
D'Onofrio G, Jabbari K, Musto H, Alvarez-Valin F, Cruveiller S, Bernardi G. Evolutionary genomics of vertebrates and its implications. Ann N Y Acad Sci 1999; 870:81-94. [PMID: 10415475 DOI: 10.1111/j.1749-6632.1999.tb08867.x] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.1] [Indexed: 12/01/2022]
Abstract
The discovery that the vertebrate genomes of warm-blooded vertebrates are mosaics of isochores, long DNA segments homogeneous in base composition, yet belonging to families covering a broad spectrum of GC levels, has led to two major observations. The first is that gene density is strikingly non-uniform in the genome of all vertebrates, gene concentration increasing with increasing GC levels. (Although the genomes of cold-blooded vertebrates are characterized by smaller compositional heterogeneities than those of warm-blooded vertebrates and high GC levels are not attained, their gene distribution is basically similar to that of warm-blooded vertebrates.) The second observation is that the GC-richest and gene-richest isochores underwent a compositional transition (characterized by a strong increase in GC level) between cold- and warm-blooded vertebrates. Evidence to be discussed favors the idea that this compositional transition and the ensuing highly heterogeneous compositional pattern was due to, and was maintained by, natural selection.
Affiliation(s)
- G D'Onofrio
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod 2, Paris, France.
|
22
|
Abstract
In genetic systems there is a non-trivial interface between the sequence of symbols which constitutes the chromosome, or 'genotype', and the products which this sequence encodes--the 'phenotype'. This interface can be thought of as a 'computer'. In this case the chromosome is viewed as an algorithm and the phenotype as the result of the computation. In general, only a small fraction of all possible sequences of symbols makes any sense for a given computer. The difficulty of finding meaningful algorithms by random mutation is known as the brittleness problem. In this paper we show that mutation and crossover favor the emergence of an algorithmic language which facilitates the production of meaningful sequences following random mutations of the genotype. We base our conclusions on an analysis of the population dynamics of a variant of Kitano's neurogenetic model wherein the chromosome encodes the rules for cellular division and the phenotype is a 16-cell organism interpreted as a connectivity matrix for a feed-forward neural network. We show that an algorithmic language emerges, describe this language in extenso, and show how it helps to solve the brittleness problem.
Affiliation(s)
- O A Palacios
- Facultad de Ingenieria, UNAM, México D.F., México
|
23
|
|
24
|
Musto H, Rodriguez-Maseda H, Bernardi G. Compositional properties of nuclear genes from Plasmodium falciparum. Gene 1995; 152:127-32. [PMID: 7828919 DOI: 10.1016/0378-1119(94)00708-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 0.9] [Indexed: 01/27/2023]
Abstract
We have analyzed the compositional distributions of coding sequences and their different codon positions, as well as the codon usage of the nuclear genes of Plasmodium falciparum, a parasite characterized by an extremely GC-poor genome. As expected, coding sequences are AT-rich, codon usage is strongly biased towards A or T in third codon positions, and some particular amino acids (aa) are especially abundant in the encoded proteins. Remarkably, however, no difference was detected between housekeeping (HK) and antigen (Ag) genes, in spite of differences in expression level and evolutionary constraints. Moreover, all the features found in P. falciparum are very similar to those found in a bacterium characterized by a very GC-poor genome, Staphylococcus aureus. These findings stress the importance of compositional constraints in determining codon usage and aa utilisation.
Affiliation(s)
- H Musto
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, Paris, France
|
25
|
Abstract
Nucleotide and amino acid sequences can be analyzed and compared by their oligomer compositions. Such methods are fundamentally different from comparison methods based on sequence alignment. They are analogous to the linguistic analysis of human texts. The methods have a wide range of sensitivity and can identify homologous as well as functionally and taxonomically related sequences. Significant sequence dissimilarity can also be identified enabling detection of foreign DNA sequences in genomes, genetic libraries and databases. The simplicity and speed of linguistic methods make them very suitable for database searching and maintenance and as a preliminary step to more specific and time-consuming analysis methods.
Affiliation(s)
- S Pietrokovski
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
|
26
|
Abstract
In this paper we discuss and demonstrate the importance of several factors relative to the relationship between time and evolution of biosequences. In both quantitative and qualitative measurements of the genetic distances, the compositional constraints of the nucleotide sequences play a very important role. We demonstrate that when homologous sequences significantly differ in base composition we get erratic branching order and/or wrong evaluation of the evolutionary rates. We must consider that every gene may have a different evolutionary dynamic along its sequence, generally linked to its functional constraints; this too can seriously affect its clock-like behavior. We report some cases showing how these factors can affect the quantitative measurements of the genetic distances of biosequences.
Affiliation(s)
- C Saccone
- Dipartimento di Biochimica e Biologia Molecolare, Universita degli Studi, Bari, Italy
|
27
|
Zhang CT, Chou KC. Graphic analysis of codon usage strategy in 1490 human proteins. J Protein Chem 1993; 12:329-35. [PMID: 8397791 DOI: 10.1007/bf01028195] [Citation(s) in RCA: 27] [Impact Index Per Article: 0.8] [Indexed: 01/30/2023]
Abstract
The frequencies of bases A (adenine), C (cytosine), G (guanine), and T (thymine) occurring in codon position i, denoted by ai, ci, gi, and ti, respectively (i = 1, 2, 3), have been calculated and diagrammatized for the 1490 human proteins in the codon usage table for primate genes compiled recently. Based on the characteristic graphs thus obtained, an overall picture of codon base distribution has been provided, and the relevant biological implication discussed. For the first codon position, it is shown in most cases that G is the most dominant base, and that the relationship g1 > a1 > c1 > t1 generally holds true. For the second codon position, A is generally the most dominant base and G is the one with the lowest occurrence frequency, with the relationship of a2 > t2 > c2 > g2. As to the third codon position, the values of g3 + c3 vary from 0.27 to 1, roughly keeping the relationship of c3 > g3 > a3 = t3 for the majority of cases. Interestingly, if the average frequencies for bases A, C, G, and T are defined as a = (a1 + a2 + a3)/3, c = (c1 + c2 + c3)/3, g = (g1 + g2 + g3)/3, and t = (t1 + t2 + t3)/3, respectively, we find that a^2 + c^2 + g^2 + t^2 < 1/3 is valid almost without exception. Such a characteristic inequality might reflect some inherent rule of codon usage, although its biological implication is unclear. (ABSTRACT TRUNCATED AT 250 WORDS)
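The position-specific frequencies described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' program: the function names and the toy sequence are hypothetical, and it assumes that the inequality refers to the sum of squared average frequencies, a^2 + c^2 + g^2 + t^2.

```python
from collections import Counter

BASES = "ACGT"


def codon_position_frequencies(cds):
    """Return {position: {base: frequency}} for codon positions 1, 2, 3."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    freqs = {}
    for offset in range(3):
        column = cds[offset:usable:3]  # every base at this codon position
        counts = Counter(column)
        freqs[offset + 1] = {b: counts[b] / len(column) for b in BASES}
    return freqs


def mean_base_frequencies(freqs):
    """Average each base over the three positions, e.g. a = (a1 + a2 + a3) / 3."""
    return {b: sum(freqs[i][b] for i in (1, 2, 3)) / 3 for b in BASES}


def sum_of_squares(mean_freqs):
    """Sum of squared average frequencies (assumed reading of the inequality)."""
    return sum(v * v for v in mean_freqs.values())


if __name__ == "__main__":
    toy_cds = "ATGGCCAAGTTTGGA"  # hypothetical 5-codon sequence, not real data
    per_position = codon_position_frequencies(toy_cds)
    averages = mean_base_frequencies(per_position)
    print(per_position)
    print(averages)
    print(sum_of_squares(averages) < 1 / 3)
```

Because the four average frequencies sum to 1, the sum of squares cannot fall below 1/4, so a value under 1/3 indicates an overall base composition that is fairly close to uniform.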
Affiliation(s)
- C T Zhang
- Computational Chemistry, Upjohn Research Laboratories, Kalamazoo, Michigan 49001
|
28
|
Chou KC, Zhang CT. Diagrammatization of codon usage in 339 human immunodeficiency virus proteins and its biological implication. AIDS Res Hum Retroviruses 1992; 8:1967-76. [PMID: 1493047 DOI: 10.1089/aid.1992.8.1967] [Citation(s) in RCA: 71] [Impact Index Per Article: 2.2] [Indexed: 12/27/2022]
Abstract
The occurrence frequencies of bases A (adenine), C (cytosine), G (guanine), and T (thymine) occurring in the 1st, 2nd, and 3rd codon positions in the codon usage table of viral genes for the 339 human immunodeficiency virus (HIV) proteins compiled recently have been calculated and diagrammatized. For comparison, the corresponding diagrammatic representations for the 2681 human proteins from the codon usage table for primate genes are also presented. The analyzed results based on these characteristic diagrams indicate that considerably similar features have been found between HIV and human proteins for the 1st and 2nd codon positions; i.e., they are all occupied predominantly by purine, especially base A. However, a significant difference in the 3rd codon position between HIV and human proteins has been observed; i.e., human proteins are of high C + G content and low A + G content in the 3rd codon position, whereas the case is just the opposite for HIV proteins. The biological implication of such a duality on the codon bias of HIV against human proteins is discussed. It is suggested that the 1st and 2nd codon positions can be termed as the structure-determining position, and the 3rd codon position termed as the species-determining position. The diagrammatic representation and analysis method described here possess a great potential for the study of molecular evolution from the viewpoint of the genetic code for which data have been accumulated rapidly and will continue to grow at a much faster pace.
Affiliation(s)
- K C Chou
- Upjohn Research Laboratories, Kalamazoo, MI 49001
|
29
|
Caron F, Ruiz F. A method for the amplification of Paramecium micronuclear DNA by polymerase chain reaction and its application to the central repeats of Paramecium primaurelia G surface antigen genes. J Protozool 1992; 39:312-8. [PMID: 1578405 DOI: 10.1111/j.1550-7408.1992.tb01321.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 12/27/2022]
Abstract
This paper describes a method which allows the amplification of Paramecium micronuclear DNA. Amacronucleate cells are first obtained by an appropriate treatment with nocodazole, a microtubule depolymerizing agent which blocks the elongation of the macronucleus and the distribution of the micronuclei at cell division between the two daughter cells; then, DNA from such cells is amplified by the polymerase chain reaction technique. We have applied this method to the problem of the central repeats of the G surface antigen of P. primaurelia (strain 156). The central repeats consist of a 74 amino acid sequence repeated in tandem. The sequence identity of these repeats is also found in the nucleotide sequence even at silent codon positions, suggesting the existence of a mechanism of identity maintenance acting at the nucleotide level. Mechanisms based on RNA secondary structure which are frequently proposed as an explanation of this phenomenon are unlikely to be valid in this case. One can, therefore, imagine that these repeats might originate from one micronuclear sequence through duplicative processes which could occur during the formation of the macronucleus. We have used the described technique to amplify the micronuclear version of the central repeats and showed that it is identical to the macronuclear version, thus ruling out the above hypothesis. Therefore, intragenic recombination appears to be the most likely explanation of the sequence identity of these central repeats.
Affiliation(s)
- F Caron
- Laboratoire de Génétique Moléculaire, Ecole Normale Supérieure, Paris, France
|
30
|
D'Onofrio G, Mouchiroud D, Aïssani B, Gautier C, Bernardi G. Correlations between the compositional properties of human genes, codon usage, and amino acid composition of proteins. J Mol Evol 1991; 32:504-10. [PMID: 1908021 DOI: 10.1007/bf02102652] [Citation(s) in RCA: 124] [Impact Index Per Article: 3.6] [Indexed: 12/29/2022]
Abstract
We have analyzed the correlation that exists between the GC levels of third and first or second codon position for about 1400 human coding sequences. The linear relationship that was found indicates that the large differences in GC level of third codon positions of human genes are paralleled by smaller differences in GC levels of first and second codon positions. Whereas third codon position differences correspond to very large differences in codon usage within the human genome, the first and second codon position differences correspond to smaller, yet very remarkable, differences in the amino acid composition of encoded proteins. Because GC levels of codon positions are linearly correlated with the GC levels of the isochores harboring the corresponding genes, both codon usage and amino acid composition are different for proteins encoded by genes located in isochores of different GC levels. Furthermore, we have also shown that a linear relationship with a unit slope and a correlation coefficient of 0.77 exists between GC levels of introns and exons from the 238 human genes currently available for this analysis. Introns are, however, about 5% lower in GC, on average, than exons from the same genes.
Affiliation(s)
- G D'Onofrio
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, Paris, France
|
31
|
Pietrokovski S, Hirshon J, Trifonov EN. Linguistic measure of taxonomic and functional relatedness of nucleotide sequences. J Biomol Struct Dyn 1990; 7:1251-68. [PMID: 2363847 DOI: 10.1080/07391102.1990.10508563] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.1] [Indexed: 12/31/2022]
Abstract
The frequencies of "words", oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence "texts". Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.
Affiliation(s)
- S Pietrokovski
- Department of Polymer Research, Weizmann Institute of Science, Rehovot, Israel
|
32
|
Gusein-Zade SM, Borodovsky MYu. An improved distribution of codon frequencies allowing for inhomogeneity of DNA's primary-structure evolution. J Biomol Struct Dyn 1990; 7:1185-97. [PMID: 2361006 DOI: 10.1080/07391102.1990.10508555] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022]
Abstract
This is a sequel to the paper, where a model which describes ranged series of codon frequencies was proposed. The model was tested against the empirical distributions obtained for the best studied species and was on the whole found to be in fairly good agreement with the available data. The few deviations from the model's predictions were found to have a monotypic regularity. In the present paper we proceed on the assumption that the deviations are due to inhomogeneous conditions of molecular evolution within a genome. This approach makes it possible to elaborate the theory presented earlier. An improved model is derived for the ranged distribution of codon frequencies, which is then tested against the experimental data.
|
33
|
Abstract
For the first time it is shown that each of the three codon bases has a general correlation with a different, predictable amino acid property, depending on position within the codon. In addition to the previously recognized link between the mid-base and the hydrophobic-hydrophilic spectrum, we show that, with the exception of G, the first base is generally invariant within a synthetic pathway. G-coded amino acids show a different order, being found only at the head of the synthetic pathways. The redundancy of the nature of the third base has a previously unrecognised relationship with molecular weight. The bases U and A (transversions) are associated with the most sharply defined or opposite states in both the first and second position, C somewhat less so or intermediate, and G neutral. The apparently systematic nature of these relationships has profound implications for the origin of the genetic code. It appears to be the remains of the first language of the cell, predating the tRNA/ribosome system, persisting with remarkably little change at a deeper level of organisation than the codon language.
Affiliation(s)
- F J Taylor
- Department of Botany, University of British Columbia, Vancouver, Canada
|
34
|
Mita K, Ichimura S, Zama M, James TC. Specific codon usage pattern and its implications on the secondary structure of silk fibroin mRNA. J Mol Biol 1988; 203:917-25. [PMID: 3210244 DOI: 10.1016/0022-2836(88)90117-9] [Citation(s) in RCA: 80] [Impact Index Per Article: 2.2] [Indexed: 01/04/2023]
Abstract
We have identified two distinctive regions of the repetitive unit nucleotide sequence of fibroin mRNA of Bombyx mori. The codon usage for the major amino acids, glycine, alanine and serine is distinctly different in these two regions, indicating that it is determined by the fibroin mRNA or gene structure but not by the tRNA population. Comparative computer analyses of nucleotide substitutions in the unit sequence suggest that selection has operated on the codon usage to optimize the secondary structure characteristic of the fibroin mRNA.
Affiliation(s)
- K Mita
- Division of Chemistry, National Institute of Radiological Sciences, Chiba, Japan
|
35
|
Brinkmann H, Martinez P, Quigley F, Martin W, Cerff R. Endosymbiotic origin and codon bias of the nuclear gene for chloroplast glyceraldehyde-3-phosphate dehydrogenase from maize. J Mol Evol 1987; 26:320-8. [PMID: 3131533 DOI: 10.1007/bf02101150] [Citation(s) in RCA: 100] [Impact Index Per Article: 2.6] [Indexed: 01/04/2023]
Abstract
The nuclei of plant cells harbor genes for two types of glyceraldehyde-3-phosphate dehydrogenases (GAPDH) displaying a sequence divergence corresponding to the prokaryote/eukaryote separation. This strongly supports the endosymbiotic theory of chloroplast evolution and in particular the gene transfer hypothesis suggesting that the gene for the chloroplast enzyme, initially located in the genome of the endosymbiotic chloroplast progenitor, was transferred during the course of evolution into the nuclear genome of the endosymbiotic host. Codon usage in the gene for chloroplast GAPDH of maize is radically different from that employed by present-day chloroplasts and from that of the cytosolic (glycolytic) enzyme from the same cell. This reveals the presence of subcellular selective pressures which appear to be involved in the optimization of gene expression in the economically important graminaceous monocots.
Affiliation(s)
- H Brinkmann
- Laboratoire de Biologie Moléculaire Végétale, CNRS UA 1178, Université de Grenoble I, Saint Martin D'Hères, France
|
36
|
Li WH. Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J Mol Evol 1987; 24:337-45. [PMID: 3110426 DOI: 10.1007/bf02134132] [Citation(s) in RCA: 174] [Impact Index Per Article: 4.6] [Indexed: 01/04/2023]
Abstract
The population dynamics of nearly neutral mutations are studied using a single-site and a multisite model. In the latter model, the nucleotides in a sequence are completely linked and the selection schemes employed are additive, multiplicative, and additive with a threshold. Although the third selection scheme is very different from the first two, the three schemes produce identical results for a wide range of parameter values. Thus the present study provides a general theory for the population dynamics of nearly neutral mutations because the results can also be used to draw inferences about other selection schemes such as stabilizing selection and synergistic selection. It is shown that the number of slightly deleterious mutations accumulated in a sequence can be considerably larger under the multisite model than under the single-site model, particularly if the sequence is long or if the mutation rate per site is high. The results show that even a very slight selective difference between synonymous codons can produce a strong bias in codon usage. Three alternative explanations for the strong bias in codon usage in bacterial and yeast genes are considered. The implications of the present results for molecular evolution are discussed.
|
37
|
Cantatore P, Saccone C. Organization, structure, and evolution of mammalian mitochondrial genes. Int Rev Cytol 1987; 108:149-208. [PMID: 3312065 DOI: 10.1016/s0074-7696(08)61438-2] [Citation(s) in RCA: 48] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023]
Affiliation(s)
- P Cantatore
- Department of Biochemistry and Molecular Biology, University of Bari, Italy
|
38
|
Abstract
The genes for four glycolytic enzymes of Trypanosoma brucei have been analyzed. The proteins encoded by these genes show 38-57% identity with their counterparts in other organisms, whether pro- or eukaryotic. These data are consistent with a phylogenetic tree in which trypanosomes diverged very early from the main branch of the eukaryotic lineage. No definite conclusion can be drawn yet about the evolutionary origin of glycosomes, the microbodies of trypanosomes which contain most enzymes of the glycolytic pathway. A bias could be observed in the codon usage of the glycolytic genes and genes for other housekeeping proteins, indicating that trypanosomes may have selected a nucleotide sequence that enables efficient translation. However, the genes for variant surface glycoproteins (VSGs) do not show such a bias. This lack of preference for special codons is explained by the high evolutionary rate that could be observed for VSG genes.
|
39
|
|
40
|
Lanave C, Tommasi S, Preparata G, Saccone C. Transition and transversion rate in the evolution of animal mitochondrial DNA. Biosystems 1986; 19:273-83. [PMID: 3801602 DOI: 10.1016/0303-2647(86)90004-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]
Abstract
We present a further application of the stochastic model previously described (Lanave et al., 1984, 1985) for measuring the nucleotide substitution rate in the mammalian evolution of the mitochondrial DNA (mtDNA). The applicability of this method depends on the validity of "stationarity conditions" (equal nucleotide frequencies at first, second and third silent codon positions in homologous protein coding genes). In the comparison of homologous sequences satisfying the stationarity condition at the silent sites, only the four codon families (quartets) for which both transitions and transversions are silent at the third position are considered here. This has allowed us to estimate the transition and transversion rates for any pair of species. We have analyzed the third silent codon position of the triplet rat-mouse-cow, of a series of slightly divergent primates and of two Drosophila species. In terms of two external dating inputs we have then determined the phylogenetic trees for rat, mouse, and cow as well as for a number of primates including man. The phylogenetic tree that we have derived for the triplet rat, mouse and cow agrees with the one we had previously determined by analyzing the first, second and third silent codon positions (in both duets and quartets) of mt genes (Lanave et al., 1985). For primates our method leads to the following branching order from the oldest to the most recent: Gibbon, Orangutan, Gorilla, Chimpanzee and Man. In absolute time, fixing the distance Chimpanzee-Man as 5 million years (Myr) we estimate the dating of the divergence nodes as: Gorilla 7 Myr; Orangutan 16 Myr; Gibbon 20 Myr. In all cases analyzed, the transition rate has been found to be substantially higher than the transversion rate. Moreover we have found that the transition/transversion ratio is different in the various lineages. We suggest that this fact is probably related to the nucleotide frequencies at the third silent codon position.
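The distinction between transitions and transversions at third codon positions can be shown with a short Python sketch. It only counts observed differences between two aligned, gap-free coding sequences. It is not the stochastic model of Lanave et al., it does not restrict the comparison to quartet codon families, and the function names and toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x, y):
    """Return 'transition', 'transversion', or None (identical or ambiguous)."""
    x, y = x.upper(), y.upper()
    valid = PURINES | PYRIMIDINES
    if x == y or x not in valid or y not in valid:
        return None
    # Purine<->purine or pyrimidine<->pyrimidine is a transition;
    # a change between the two classes is a transversion.
    same_class = (x in PURINES) == (y in PURINES)
    return "transition" if same_class else "transversion"


def third_position_counts(seq_a, seq_b):
    """Count raw differences at third codon positions of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    counts = {"transition": 0, "transversion": 0}
    usable = len(seq_a) - len(seq_a) % 3  # ignore a trailing partial codon
    for i in range(2, usable, 3):  # indices 2, 5, 8, ... are third positions
        kind = classify_substitution(seq_a[i], seq_b[i])
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Toy alanine codons (GCN), a quartet family; not real data.
    print(third_position_counts("GCAGCCGCGGCT", "GCGGCTGCCGCT"))
```

Raw counts like these underestimate the true number of substitutions once sequences have diverged, which is why a model-based estimate such as the one described in the abstract is needed for rates and divergence times.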
|
41
|
Wada A, Suyama A. Local stability of DNA and RNA secondary structure and its relation to biological functions. Progress in Biophysics and Molecular Biology 1986; 47:113-57. [PMID: 2424044 DOI: 10.1016/0079-6107(86)90012-x] [Citation(s) in RCA: 128] [Impact Index Per Article: 3.3] [Indexed: 12/31/2022]
|
42
|
Palmieri M, Carsana A, Furia A, Libonati M. Sequence analysis of a cloned cDNA coding for bovine seminal ribonuclease. European Journal of Biochemistry 1985; 152:275-7. [PMID: 3840434 DOI: 10.1111/j.1432-1033.1985.tb09194.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.6] [Indexed: 01/07/2023]
Abstract
The sequence of a cloned cDNA coding for bovine seminal ribonuclease, an enzyme secreted in the bull seminal vesicles, was determined. The cDNA starts at the amino acid residue 47 and terminates 12 nucleotides beyond the consensus sequence AAUAAA in the 3' non-coding region of the mRNA. Northern blotting analysis shows that the mRNA for bovine seminal ribonuclease consists of about 950 nucleotides, a value that is similar to that of other mRNAs coding for ribonucleases of the pancreatic type.
|
43
|
|
44
|
Abstract
Multi-dimensional scaling is applied to our codon space data on the protein coding sequences of DNA from a wide variety of organisms in an attempt to find the smallest number of parameters which will accurately represent these sequences. I find that a three-dimensional representation is satisfactory. One of the three resulting co-ordinates separates eukaryotes and their associated viruses from prokaryotes and their associated phages, while an orthogonal co-ordinate separates those organisms capable of synthesizing proteins (eukaryotes and prokaryotes) from those not so capable (viruses and phages). Mitochondria show no relation in our plots to any of these groups.
|
45
|
Abstract
We construct a "codon space" in which a given DNA sequence can be plotted as a function of its base composition in each of the three codon positions. We demonstrate that the base composition is very highly nonrandom, with sequences from more primitive organisms having the least random compositions. By using cluster analysis on the points plotted in codon space we show that there is a strong correlation between base composition and type of organism, with the most primitive organisms having the highest A or T content in the second and third codon positions. A smooth transition toward lower A + T and higher G + C content is observed in the second and third codon positions as the evolutionary complexity of the organism increases. Besides this general trend, more detailed structure can be observed in the clustering that will become clearer as the data base is increased.
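The "codon space" coordinates described in this abstract can be sketched as the base frequencies at each of the three codon positions. The code below is a minimal illustration under assumptions: the input is an in-frame coding sequence over A, C, G, T, a trailing partial codon is dropped, and the function name `codon_position_composition` is hypothetical. The abstract does not specify the authors' exact coordinate definition or their clustering procedure, so neither is reproduced here.

```python
from collections import Counter


def codon_position_composition(seq):
    """Return base frequencies at codon positions 1, 2 and 3.

    The result is a list of three dicts mapping each of A, C, G, T to
    its frequency at that codon position. Assumes `seq` is in frame.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one codon")
    composition = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at pos
        counts = Counter(bases)
        composition.append({b: counts.get(b, 0) / len(bases) for b in "ACGT"})
    return composition


# Hypothetical toy sequence: codons ATG GCC AAA TAA.
point = codon_position_composition("ATGGCCAAATAA")
```

Each sequence then corresponds to one point whose coordinates are these twelve frequencies, and points from many sequences could be passed to any standard clustering routine.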
|
46
|
Mahler HR. The exon:intron structure of some mitochondrial genes and its relation to mitochondrial evolution. International Review of Cytology 1983; 82:1-98. [PMID: 6352548 DOI: 10.1016/s0074-7696(08)60823-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.4] [Indexed: 01/19/2023]
|
47
|
Moorman AF, De Boer PA, De Laaf RT, Destrée OH. Primary structure of the histone H2A and H2B genes and their flanking sequences in a minor histone gene cluster of Xenopus laevis. FEBS Lett 1982; 144:235-41. [PMID: 7117538 DOI: 10.1016/0014-5793(82)80645-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
|
48
|
Varenne S, Knibiehler M, Cavard D, Morlon J, Lazdunski C. Variable rate of polypeptide chain elongation for colicins A, E2 and E3. J Mol Biol 1982; 159:57-70. [PMID: 6813508 DOI: 10.1016/0022-2836(82)90031-6] [Citation(s) in RCA: 53] [Impact Index Per Article: 1.2] [Indexed: 01/22/2023]
|
49
|
Kocherlakota RR, Acland ND. Ambiguity and the evolution of the genetic code. Origins of Life 1982; 12:71-80. [PMID: 7133671 DOI: 10.1007/bf00926913] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 01/23/2023]
Abstract
The evolution of the genetic code is an extremely complex problem. The addition of a new method by which the code could evolve, however, allows much to be explained about the way in which the present codes (gamma 3 and gamma 3) originated. The idea that ambiguity would allow the length of the codon to change is very useful, since it predicts the distribution of the 4-blocs and 2-blocs in the code, determines where variations in the code are probable, and presents a scenario for the evolution of the code.
|
50
|
|