1
Cohen D. General Designs Reveal Distinct Codes in Protein-Coding and Non-Coding Human DNA. Genes (Basel) 2022; 13:1970. [PMID: 36360206 PMCID: PMC9690640 DOI: 10.3390/genes13111970] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 08/27/2023] Open
Abstract
This study seeks to investigate distinct signatures and codes within different genomic sequence locations of the human genome. The promoter and other non-coding regions contain sites for the binding of biological particles, for processes such as transcription regulation. The specific rules and sequence codes that govern this remain poorly understood. To derive these (codes), the general designs of sequence are investigated. Genomic signatures are a powerful tool for assessing the general designs of sequence, and cross-comparing different genomic regions for their distinct sequence properties. Through these genomic signatures, the relative non-random properties of sequences are also assessed. Furthermore, a binary components analysis is carried out making use of information theory ideas, to study the RY (purine/pyrimidine), WS (weak/strong) and KM (keto/amino) signatures in the sequences. From this comparison, it is possible to identify the relative importance of these properties within the various protein-coding and non-coding genomic locations. The results show that coding DNA has a strongly non-random WS signature, which reflects the genetic code, and the hydrogen-bond base pairing of codon-anti-codon interactions. In contrast, non-coding locations, such as the promoter, contain a distinct genomic signature. A prominent feature throughout non-coding DNA is a highly non-random RY signature, which is very different in nature to coding DNA, and suggests a structural-based RY code. This marks progress towards deciphering the unknown code(s) in non-protein-coding DNA, and a further understanding of the coding DNA. Additionally, it unravels how DNA carries information. These findings have implications for the most fundamental principles of biology, including knowledge of gene regulation, development and disease.
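The binary components analysis mentioned in this abstract can be illustrated with a minimal Python sketch. It recodes a DNA string into the RY, WS and KM two-letter alphabets and computes a Shannon entropy over short words. The function names, the use of overlapping words, and the choice of word entropy as the information-theory measure are assumptions made here for illustration; the abstract does not state the author's exact procedure.

```python
from collections import Counter
from math import log2

# Two-letter groupings named in the abstract (standard IUPAC classes).
BINARY_CODES = {
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},  # purine / pyrimidine
    "WS": {"A": "W", "T": "W", "C": "S", "G": "S"},  # weak / strong
    "KM": {"G": "K", "T": "K", "A": "M", "C": "M"},  # keto / amino
}


def recode(seq, scheme):
    """Translate a DNA string into the two-letter alphabet of one scheme.

    Characters other than A, C, G, T (for example N) are skipped.
    """
    table = BINARY_CODES[scheme]
    return "".join(table[base] for base in seq.upper() if base in table)


def word_entropy(seq, k):
    """Shannon entropy in bits of the overlapping k-letter words in seq."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(words)
    counts = Counter(words)
    return -sum(c / total * log2(c / total) for c in counts.values())
```

Under this sketch, a recoded sequence whose k-letter word entropy falls well below the maximum of k bits would count as less random in that signature, for example `word_entropy(recode(seq, "RY"), 3)` compared with the same call for `"WS"`.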
Affiliation(s)
- Dana Cohen
- Ronin Institute, 127 Haddon Pl, Montclair, NJ 07043-2314, USA
2
Forsdyke DR. Success of alignment-free oligonucleotide (k-mer) analysis confirms relative importance of genomes not genes in speciation and phylogeny. Biol J Linn Soc Lond 2019. [DOI: 10.1093/biolinnean/blz096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
The utility of DNA sequence substrings (k-mers) in alignment-free phylogenetic classification, including that of bacteria and viruses, is increasingly recognized. However, its biological basis eludes many 21st century practitioners. A path from the 19th century recognition of the informational basis of heredity to the modern era can be discerned. Crick’s DNA ‘unpairing postulate’ predicted that recombinational pairing of homologous DNAs during meiosis would be mediated by short k-mers in the loops of stem-loop structures extruded from classical duplex helices. The complementary ‘kissing’ duplex loops – like tRNA anticodon–codon k-mer duplexes – would seed a more extensive pairing that would then extend until limited by lack of homology or other factors. Indeed, this became the principle behind alignment-based methods that assessed similarity by degree of DNA–DNA reassociation in vitro. These are now seen as less sensitive than alignment-free methods that are closely consistent, both theoretically and mechanistically, with chromosomal anti-recombination models for the initiation of divergence into new species. The analytical power of k-mer differences supports the theses that evolutionary advance sometimes serves the needs of nucleic acids (genomes) rather than proteins (genes), and that such differences can play a role in early speciation events.
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, Ontario, Canada
3
Apostolou-Karampelis K, Nikolaou C, Almirantis Y. A novel skew analysis reveals substitution asymmetries linked to genetic code GC-biases and PolIII α-subunit isoforms. DNA Res 2016; 23:353-63. [PMID: 27345720 PMCID: PMC4991834 DOI: 10.1093/dnares/dsw021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/11/2016] [Accepted: 05/09/2016] [Indexed: 11/30/2022] Open
Abstract
Strand biases reflect deviations from a null expectation of DNA evolution that assumes strand-symmetric substitution rates. Here, we present strong evidence that nearest-neighbour preferences are a strand-biased feature of bacterial genomes, indicating neighbour-dependent substitution asymmetries. To detect such asymmetries we introduce an alignment free index (relative abundance skews). The profiles of relative abundance skews along coding sequences can trace the phylogenetic relations of bacteria, suggesting that the patterns of neighbour-dependent substitution strand-biases are not common among different lineages, but are rather species-specific. Analysis of neighbour-dependent and codon-site skews sheds light on the origins of substitution asymmetries. Via a simple model we argue that the structure of the genetic code imposes position-dependent substitution strand-biases along coding sequences, as a response to GC mutation pressure. Thus, the organization of the genetic code per se can lead to an uneven distribution of nucleotides among different codon sites, even when requirements for specific codons and amino-acids are not accounted for. Moreover, our results suggest that strand-biases in replication fidelity of PolIII α-subunit induce substitution asymmetries, both neighbour-dependent and independent, on a genome scale. The role of DNA repair systems, such as transcription-coupled repair, is also considered.
Affiliation(s)
- Christoforos Nikolaou
- Computational Genomics Group, Department of Biology, University of Crete, 71409 Heraklion, Greece
- Yannis Almirantis
- Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310 Athens, Greece
4
Tubiana L, Božič AL, Micheletti C, Podgornik R. Synonymous mutations reduce genome compactness in icosahedral ssRNA viruses. Biophys J 2015; 108:194-202. [PMID: 25564866 DOI: 10.1016/j.bpj.2014.10.070] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 08/19/2014] [Revised: 09/29/2014] [Accepted: 10/08/2014] [Indexed: 12/15/2022] Open
Abstract
Recent studies have shown that single-stranded (ss) viral RNAs fold into more compact structures than random RNA sequences with similar chemical composition and identical length. Based on this comparison, it has been suggested that wild-type viral RNA may have evolved to be atypically compact so as to aid its encapsidation and assist the viral assembly process. To further explore the compactness selection hypothesis, we systematically compare the predicted sizes of >100 wild-type viral sequences with those of their mutants, which are evolved in silico and subject to a number of known evolutionary constraints. In particular, we enforce mutation synonymity, preserve the codon-bias, and leave untranslated regions intact. It is found that progressive accumulation of these restricted mutations still suffices to completely erase the characteristic compactness imprint of the viral RNA genomes, making them in this respect physically indistinguishable from randomly shuffled RNAs. This shows that maintaining the physical compactness of the genome is indeed a primary factor among ssRNA viruses' evolutionary constraints, contributing also to the evidence that synonymous mutations in viral ssRNA genomes are not strictly neutral.
Affiliation(s)
- Luca Tubiana
- Department of Theoretical Physics, Jožef Stefan Institute, Ljubljana, Slovenia.
- Anže Lošdorfer Božič
- Department of Theoretical Physics, Jožef Stefan Institute, Ljubljana, Slovenia; Max Planck Institute for Biology of Ageing, Cologne, Germany
- Rudolf Podgornik
- Department of Theoretical Physics, Jožef Stefan Institute, Ljubljana, Slovenia; Department of Physics, Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia; Department of Physics, University of Massachusetts, Amherst, Massachusetts
5
Rishishwar L, Pant B, Pant K, Pardasani KR. Mining genomic patterns in Mycobacterium tuberculosis H37Rv using a web server Tuber-Gene. GENOMICS PROTEOMICS & BIOINFORMATICS 2011; 9:171-8. [PMID: 22196360 PMCID: PMC5054438 DOI: 10.1016/s1672-0229(11)60020-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2011] [Accepted: 09/01/2011] [Indexed: 11/24/2022]
Abstract
Mycobacterium tuberculosis (MTB), causative agent of tuberculosis, is one of the most dreaded diseases of the century. It has long been studied by researchers throughout the world using various wet-lab and dry-lab techniques. In this study, we focus on mining useful patterns at genomic level that can be applied for in silico functional characterization of genes from the MTB complex. The model developed on the basis of the patterns found in this study can correctly identify 99.77% of the input genes from the genome of MTB strain H37Rv. The model was tested against four other MTB strains and the homologue M. bovis to further evaluate its generalization capability. The mean prediction accuracy was 85.76%. It was also observed that the GC content remained fairly constant throughout the genome, implicating the absence of any pathogenicity island transferred from other organisms. This study reveals that dinucleotide composition is an efficient functional class discriminator for MTB complex. To facilitate the application of this model, a web server Tuber-Gene has been developed, which can be freely accessed at http://www.bifmanit.org/tb2/.
Affiliation(s)
- Lavanya Rishishwar
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.
6
Jin NZ, Liu ZX, Qiu WY. Frequency and Correlation of Nearest Neighboring Nucleotides in Human Genome. CHINESE J CHEM PHYS 2009. [DOI: 10.1088/1674-0068/22/01/27-33] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
7
Antezana MA, Jordan IK. Highly conserved regimes of neighbor-base-dependent mutation generated the background primary-structural heterogeneities along vertebrate chromosomes. PLoS One 2008; 3:e2145. [PMID: 18478116 PMCID: PMC2366069 DOI: 10.1371/journal.pone.0002145] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/20/2007] [Accepted: 03/17/2008] [Indexed: 01/01/2023] Open
Abstract
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences" even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a regime of mutation of which central features have been maintained by natural selection. We find indeed that these preferences are conserved in vertebrates even more rigidly than codon occurrences and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with the R(2)s reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the site preceding and at that following each mutating site, because when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones. 
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation and that this entails marked differences between the spectra of amino-acid mutations that either mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.
Affiliation(s)
- Marcos A Antezana
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America.
8
Koopman WJM, Gort G. Significance tests and weighted values for AFLP similarities, based on Arabidopsis in silico AFLP fragment length distributions. Genetics 2005; 167:1915-28. [PMID: 15342529 PMCID: PMC1471014 DOI: 10.1534/genetics.103.015693] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
Many AFLP studies include relatively unrelated genotypes that contribute noise to data sets instead of signal. We developed: (1) estimates of expected AFLP similarities between unrelated genotypes, (2) significance tests for AFLP similarities, enabling the detection of unrelated genotypes, and (3) weighted similarity coefficients, including band position information. Detection of unrelated genotypes and use of weighted similarity coefficients will make the analysis of AFLP data sets more informative and more reliable. Test statistics and weighted coefficients were developed for total numbers of shared bands and for Dice, Jaccard, Nei and Li, and simple matching (dis)similarity coefficients. Theoretical and in silico AFLP fragment length distributions (FLDs) were examined as a basis for the tests. The in silico AFLP FLD based on the Arabidopsis thaliana genome sequence was the most appropriate for angiosperms. The G + C content of the selective nucleotides in the in silico AFLP procedure significantly influenced the FLD. Therefore, separate test statistics were calculated for AFLP procedures with high, average, and low G + C contents in the selective nucleotides. The test statistics are generally applicable for angiosperms with a G + C content of approximately 35-40%, but represent conservative estimates for genotypes with higher G + C contents. For the latter, test statistics based on a rice genome sequence are more appropriate.
Affiliation(s)
- Wim J M Koopman
- Nationaal Herbarium Nederland--Wageningen Branch, Biosystematics Group, Wageningen University, 6703 BL Wageningen, The Netherlands.
9
Kiriakidou M, Nelson PT, Kouranov A, Fitziev P, Bouyioukos C, Mourelatos Z, Hatzigeorgiou A. A combined computational-experimental approach predicts human microRNA targets. Genes Dev 2004; 18:1165-78. [PMID: 15131085 PMCID: PMC415641 DOI: 10.1101/gad.1184704] [Citation(s) in RCA: 550] [Impact Index Per Article: 27.5] [Indexed: 12/19/2022]
Abstract
A new paradigm of gene expression regulation has emerged recently with the discovery of microRNAs (miRNAs). Most, if not all, miRNAs are thought to control gene expression, mostly by base pairing with miRNA-recognition elements (MREs) found in their messenger RNA (mRNA) targets. Although a large number of human miRNAs have been reported, many of their mRNA targets remain unknown. Here we used a combined bioinformatics and experimental approach to identify important rules governing miRNA-MRE recognition that allow prediction of human miRNA targets. We describe a computational program, "DIANA-microT", that identifies mRNA targets for animal miRNAs and predicts mRNA targets, bearing single MREs, for human and mouse miRNAs.
Affiliation(s)
- Marianthi Kiriakidou
- Department of Pathology, School of Medicine, Center for Bioinformatics, and Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
10
Abstract
MicroRNAs (miRNAs) can play important gene regulatory roles in nematodes, insects, and plants by basepairing to mRNAs to specify posttranscriptional repression of these messages. However, the mRNAs regulated by vertebrate miRNAs are all unknown. Here we predict more than 400 regulatory target genes for the conserved vertebrate miRNAs by identifying mRNAs with conserved pairing to the 5' region of the miRNA and evaluating the number and quality of these complementary sites. Rigorous tests using shuffled miRNA controls supported a majority of these predictions, with the fraction of false positives estimated at 31% for targets identified in human, mouse, and rat and 22% for targets identified in pufferfish as well as mammals. Eleven predicted targets (out of 15 tested) were supported experimentally using a HeLa cell reporter system. The predicted regulatory targets of mammalian miRNAs were enriched for genes involved in transcriptional regulation but also encompassed an unexpectedly broad range of other functions.
Affiliation(s)
- Benjamin P Lewis
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
11
Nikolaou C, Almirantis Y. Mutually symmetric and complementary triplets: differences in their use distinguish systematically between coding and non-coding genomic sequences. J Theor Biol 2003; 223:477-87. [PMID: 12875825 DOI: 10.1016/s0022-5193(03)00123-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
The general property of asymmetry in word use in meaningful texts written in a variety of languages, motivates a quantification of the differences in the use of mutually symmetric triplets in genomic sequences. When this is done in the three reading frames, high values found for one of them are used as indication that the sequence is coding for a protein. Moreover, a similar quantification of the differences in the use of complementary triplets is introduced, again with predictive power of the coding character of a sequence. This method reflects the non-equivalence between sense and anti-sense strand of a coding segment. In both approaches, "linguistic asymmetry" in coding sequences is related to the form of the genetic code and to the bias in codon usage and amino acid use skews.
Affiliation(s)
- Christoforos Nikolaou
- National Research Center for Physical Sciences Demokritos, Institute of Biology, 15310 Athens, Greece
12
Som A, Sahoo S, Mukhopadhyay I, Chakrabarti J, Chaudhury R. Scaling violations in coding DNA. EUROPHYSICS LETTERS (EPL) 2003; 62:271-277. [DOI: 10.1209/epl/i2003-00341-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 07/19/2023]
13
Cutler JA, Mitchell MJ, Smith MP, Savidge GF. The identification and classification of 41 novel mutations in the factor VIII gene (F8C). Hum Mutat 2002; 19:274-8. [PMID: 11857744 DOI: 10.1002/humu.10056] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Indexed: 11/07/2022]
Abstract
Hemophilia A is a bleeding disorder caused by a quantitative or qualitative deficiency in the coagulation factor VIII. Causative mutations are heterogeneous in nature and are distributed throughout the FVIII gene. With the exception of mutations that result in prematurely truncated protein, it has proved difficult to correlate mutation type/amino acid substitution with severity of disease. We have identified 81 mutations in 96 unrelated patients, all of whom have typed negative for the common IVS-22 inversion mutation. Forty-one of these mutations are not recorded on F8C gene mutation databases. We have analyzed these 41 mutations with regard to location, whether or not each is a cross-species conserved region, and type of substitution and correlated this information with the clinical severity of the disease. Our findings support the view that the phenotypic result of a mutation in the FVIII gene correlates more with the position of the amino acid change within the 3D structure of the protein than with the actual nature of the alteration.
Affiliation(s)
- J A Cutler
- The Haemophilia Reference Centre, Centre for Thrombosis and Haemostasis, St Thomas' Hospital, Lambeth Palace Road, London, UK.
14
Abstract
In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences. These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software called SWORDS. Using sequences available in public domain databases housed in the Internet, we demonstrate how SWORDS can be conveniently used by molecular biologists and geneticists to unmask biologically important features hidden in large sequences and assess their statistical significance.
Affiliation(s)
- Probal Chaudhuri
- Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, 203 BT Road, Kolkata 700 108, India.
15
Häring D, Kypr J. Variations of the mononucleotide and short oligonucleotide distributions in the genomes of various organisms. J Theor Biol 1999; 201:141-56. [PMID: 10556022 DOI: 10.1006/jtbi.1999.1019] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 12/29/2022]
Abstract
We calculated the variation coefficients of the mononucleotide and short oligonucleotide distributions in over 1700 long genomic sequences originating from six organisms to demonstrate that the human and Escherichia coli genomic sequences were the least and the most uniform, respectively. The most non-random genomic distributions were exhibited by the four canonical nucleotides, followed by the strong and weak nucleotides, while the distributions of purine or pyrimidine nucleotides and especially the distributions of (A+C) and (G+T) were significantly more uniform even in the human genome. In the human and mouse genomes, the highest coefficients of variation were further observed with the oligonucleotides where CG was combined with the strong nucleotides while its combination with the weak nucleotides significantly decreased the variation which, however, was still very high. High variation was also exhibited by the remaining oligonucleotides composed exclusively of the strong nucleotides or those containing only weak nucleotides. On the other hand, the distributions of oligonucleotides containing similar and especially the same numbers of the strong and weak nucleotides, but no CG or TA dinucleotide, were the most uniform. The information following from the present analysis will be useful not only in the identification of important genomic regions but also in computer simulations of the genomic nucleotide sequences in order to trace and reproduce the pathways of genome evolution.
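The variation coefficient used in this abstract can be sketched in a few lines of Python: compute the frequency of one word in each sequence, then divide the standard deviation of those frequencies by their mean. This is only an illustrative reading of the abstract. The function names are hypothetical, and the use of overlapping word counts and of the population standard deviation are assumptions, since the abstract does not specify either.

```python
from statistics import mean, pstdev


def word_frequency(seq, word):
    """Fraction of overlapping windows in seq that equal word."""
    k = len(word)
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return 0.0
    hits = sum(1 for i in range(n_windows) if seq[i:i + k] == word)
    return hits / n_windows


def coefficient_of_variation(seqs, word):
    """Standard deviation of the word's frequency across seqs over its mean.

    Assumption: population standard deviation (pstdev); a sample
    standard deviation would give slightly larger values.
    """
    freqs = [word_frequency(s.upper(), word.upper()) for s in seqs]
    m = mean(freqs)
    return pstdev(freqs) / m if m > 0 else float("nan")
```

Under this sketch, a word whose coefficient is close to zero is distributed uniformly across the sequences, while a large coefficient marks a word such as CG whose frequency differs strongly between regions.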
Affiliation(s)
- D Häring
- Academy of Sciences of the Czech Republic, Královopolská 135, Brno, CZ-61265, Czech Republic
16
Oresic M, Shalloway D. Specific correlations between relative synonymous codon usage and protein secondary structure. J Mol Biol 1998; 281:31-48. [PMID: 9680473 DOI: 10.1006/jmbi.1998.1921] [Citation(s) in RCA: 121] [Impact Index Per Article: 4.7] [Indexed: 12/13/2022]
Abstract
We found significant species-specific correlations between the use of two synonymous codons and protein secondary structure units by comparing the three-dimensional structures of human and Escherichia coli proteins with their mRNA sequences. The correlations are not explained by codon-context, expression level, GC/AU content, or positional effects. The E. coli correlation is between Asn AAC and the C-terminal regions of beta-sheet segments; it may result from selection for translational accuracy, suggesting the hypothesis that downstream Asn residues are important for beta-sheet formation. The correlation in human proteins is between Asp GAU and the N termini of alpha-helices; it may be important for eukaryote-specific sequential, cotranslational folding. The kingdom-specific correlations may reflect kingdom-specific differences in translational mechanisms. The correlations may help identify residues that are important for secondary structure formation, be useful in secondary structure prediction algorithms, and have implications for recombinant gene expression.
Affiliation(s)
- M Oresic
- Section of Biochemistry Molecular and Cell Biology, Cornell University, Ithaca, NY 14853, USA
17
Modeling dependencies in pre-mRNA splicing signals. COMPUTATIONAL METHODS IN MOLECULAR BIOLOGY 1998. [DOI: 10.1016/s0167-7306(08)60465-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 01/09/2023]
18
Ponte I, Monsalves C, Cabañas M, Martínez P, Suau P. Sequence simplicity and evolution of the 3' untranslated region of the histone H1o gene. J Mol Evol 1996; 43:125-34. [PMID: 8660437 DOI: 10.1007/bf02337357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 02/01/2023]
Abstract
The H10 gene has a long 3' untranslated region (3'UTR) of 1,125 nucleotides in the rat and 1,310 in humans. Analysis of the sequences shows that they have features of simple DNA that suggest involvement of replication slippage in their evolution. These features include the length imbalance between the rat and human sequences; the abundance of single-base repeats, two-base runs and other simple motifs clustered along the sequence; and the presence of single-base repeat length polymorphisms in the rat and mouse sequences. Pairwise comparisons show numerous short insertions/deletions, often flanked by direct repeats. In addition, a proportion of short insertions/deletions results from length differences in conserved single-base repeats. Quantification of the sequence simplicity shows that simple sequences have been more actively incorporated in the human lineage than in the rodent lineage. The combination of insertions/deletions and nucleotide substitutions along the sequence gives rise to three main regions of homology: a highly variable central region flanked by more conserved regions nearest the coding region and the polyA addition site.
Affiliation(s)
- I Ponte
- Departamento de Bioquímica i Biología Molecular, Facultad de Ciencias, Universidad Autónoma de Barcelona, Barcelona, Spain
19
Leung MY, Marsh GM, Speed TP. Over- and underrepresentation of short DNA words in herpesvirus genomes. J Comput Biol 1996; 3:345-60. [PMID: 8891954 PMCID: PMC4076300 DOI: 10.1089/cmb.1996.3.345] [Citation(s) in RCA: 50] [Impact Index Per Article: 1.8] [Indexed: 02/02/2023] Open
Abstract
The relative abundance and rarity of DNA words have been recognized in previous biological studies to have implications for the regulation, repair, and evolutionary mechanisms of a genome. In this paper, we review several different measures of abundance and rarity of DNA words, including z-scores, representation ratios, and cross-ratios, that have appeared in the recent literature, and examine the concordance among them using the human cytomegalovirus genome sequence. We then rank all words of length k = 2, ..., 5 of seven herpesvirus genomes according to their abundance, as measured by one of the z-scores based upon a stationary Markov model of order k-2. Using a simple metric on the ranks of 2-words of the seven herpesvirus sequences, we construct an evolutionary tree. Several 3-words are observed to be consistently over- or underrepresented in all seven herpesviruses. Furthermore, clusters of some of the most over- and underrepresented 4- and 5-words in the genomes are identified with functional sites such as the origins of replication and regulatory signals of individual viruses.
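A representation ratio of the kind reviewed in this abstract can be illustrated for 2-words with a short Python sketch: the observed count of each dinucleotide is divided by the count expected if bases occurred independently, which corresponds to a Markov model of order k-2 = 0 for k = 2. The function name and the specific estimator are assumptions for illustration; the estimators and z-scores used in the paper itself may differ.

```python
from collections import Counter


def dinucleotide_representation(seq):
    """Observed/expected ratio for every dinucleotide present in seq.

    Expected counts assume independent bases, using the base
    frequencies estimated from seq itself.
    """
    seq = seq.upper()
    n = len(seq)
    base_counts = Counter(seq)
    n_pairs = n - 1
    pair_counts = Counter(seq[i:i + 2] for i in range(n_pairs))
    ratios = {}
    for pair, observed in pair_counts.items():
        freq_first = base_counts[pair[0]] / n
        freq_second = base_counts[pair[1]] / n
        expected = n_pairs * freq_first * freq_second
        ratios[pair] = observed / expected
    return ratios
```

Ratios above 1 indicate over-represented words and ratios below 1 under-represented ones; ranking the 16 dinucleotides by this value gives one simple ordering of the sort the abstract compares across genomes.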
Affiliation(s)
- M Y Leung
- Division of Mathematics and Statistics, University of Texas at San Antonio 78249, USA.
20
Pesole G, Attimonelli M, Saccone C. Linguistic approaches to the analysis of sequence information. Trends Biotechnol 1994; 12:401-8. [PMID: 7765386 DOI: 10.1016/0167-7799(94)90028-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 01/27/2023]
Abstract
Biological macromolecules have many features that resemble modern languages. Thus, linguistic approaches to the analysis of sequence information are becoming powerful tools for deciphering genetic texts. The methodologies used, to date, to determine the global parameters of the genetic language and meaningful patterns within it are described.
Affiliation(s)
- G Pesole
- Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy
|
21
|
Musto H, Alvarez F, Tort J, Maseda HR. Dinucleotide biases in the platyhelminth Schistosoma mansoni. Int J Parasitol 1994; 24:277-83. [PMID: 8026908 DOI: 10.1016/0020-7519(94)90039-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 01/28/2023]
Abstract
The analysis of dinucleotide biases in coding and flanking regions, introns, rDNA and repetitive sequences, in the flatworm Schistosoma mansoni is reported. Except for rDNA, all regions display CpG avoidance and TpG plus CpA excess, which might be evidence of the presence of 5mC. The distribution and hierarchies of dinucleotides differ from the data published for invertebrate and vertebrate coding sequences.
Affiliation(s)
- H Musto
- Sección Bioquímica, Facultad de Ciencias, Montevideo, Uruguay
|
22
|
Abstract
The pattern of 20,200 point substitutions in the 16 unique neighbor-pair environments has been determined from aligned gene/pseudogene sequences in the current database of human DNA sequences. Substitution rates, representing averages over those for different regions of the genome, are distributed over a 60-fold range with strong biases in particular neighbor-pair environments. The rates for substitutions involving the CG doublet are the most rapid overall, where changes of the C.G pair vary over a tenfold range depending on the type of substitution and the 5' neighbor-pair. In general, the rates are fastest in alternating purine-pyrimidine sequences and slowest in purine.pyrimidine tracts, suggesting that the frequencies of one or both key molecular misadventures that can occur during replication, dNTP misinsertion and transient misalignment, may be associated with structural alternations and flexibility of the backbone. By contrast, purine.pyrimidine tracts are less flexible, less prone to substitution, and therefore their proportions accumulate in sequences over time. Characteristic biases of the content and arrangement of oligonucleotide strings or tuples in all sequence elements, but particularly in non-coding regions, appear to be due to the pattern of different neighbor-dependent substitution rates. Computer simulations of numerous replicative cycles have been carried out with substitutions occurring on the same schedule found in this study for pseudogenes. Statistical analyses of tuple frequencies at periodic intervals during the simulation experiment indicate that sequences slowly change in lexical complexity toward a quasi-equilibrium state that corresponds to that for introns.
Affiliation(s)
- S T Hess
- Department of Biochemistry, Microbiology and Molecular Biology, University of Maine, Orono 04469
|
23
|
Mohrenweiser H. International Commission for Protection Against Environmental Mutagens and Carcinogens. Working paper no. 5. Impact of the molecular spectrum of mutational lesions on estimates of germinal gene-mutation rates. Mutat Res 1994; 304:119-37. [PMID: 7506352 DOI: 10.1016/0027-5107(94)90322-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.6] [Indexed: 01/25/2023]
Abstract
Review of the molecular characteristics of the variants identified at a series of disease loci suggests significant differences among loci in the relative frequency of nucleotide substitutions versus more complex events such as deletions. Some common features are repeatedly observed in each class of variant. For example, a high proportion of the nucleotide substitutions involve transitions of deoxycytidine and are suggested to result from deamination of cytosine at 5-methyl-CpG sites. Similarly, deletions of three or fewer nucleotides are relatively common in the non-nucleotide substitution class and these deletions are often associated with a seven-nucleotide core sequence. A significant fraction of the larger deletions and rearrangements may be associated with repetitive elements. Many of the deletion events do not appear to involve a chromosomal recombination mechanism. Mechanisms involving transcription slippage and chromatid exchange have been suggested as possible alternative mechanisms for generating deletion events. The spectrum of mutational events identified, e.g. nucleotide substitutions versus deletions, differs between loci and is probably a reflection of both the gene structure and the selective pressure to generate a disease phenotype. This locus specificity (at both the biological and molecular level) would appear to have significant potential to compromise estimates of increases in the gene germinal mutation rate following exposure to mutagenic agents.
|
24
|
Kalogeropoulos A. Linguistic analysis of chromosome III DNA sequence of Saccharomyces cerevisiae. Yeast 1993; 9:889-905. [PMID: 8212896 DOI: 10.1002/yea.320090809] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 01/29/2023] Open
Abstract
The analysis of the Saccharomyces cerevisiae chromosome III DNA sequence by computer ('in silico') permits the definition of its linguistic characteristics. These characteristics include the designation of non-randomly occurring oligonucleotides, their distribution along the chromosome, and the distribution of some particular homopolymers. All these elements may contribute to the understanding of the organization of information on the chromosome.
Affiliation(s)
- A Kalogeropoulos
- Institut de Génétique et Microbiologie, Centre Universitaire d'Orsay, France
|
25
|
Affiliation(s)
- C H Spruck
- Urologic Cancer Research Laboratory, Kenneth Norris Jr. Comprehensive Cancer Center, University of Southern California, Los Angeles 90033
|
26
|
Abstract
Statistical approaches help in the determination of significant configurations in protein and nucleic acid sequence data. Three recent statistical methods are discussed: (i) score-based sequence analysis that provides a means for characterizing anomalies in local sequence text and for evaluating sequence comparisons; (ii) quantile distributions of amino acid usage that reveal general compositional biases in proteins and evolutionary relations; and (iii) r-scan statistics that can be applied to the analysis of spacings of sequence markers.
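The third method named in this abstract, r-scan statistics, looks at the distances spanned by r consecutive spacings between sequence markers. The sketch below only computes those r-scan lengths from a list of marker positions. The function name and the example positions are hypothetical, and the significance thresholds used to declare clustering or overdispersion are not covered.

```python
def r_scans(positions, r):
    """Return the length spanned by every run of r consecutive spacings.

    `positions` are marker coordinates along a sequence; the i-th r-scan
    is the distance from marker i to marker i + r in sorted order.
    """
    pts = sorted(positions)
    if r < 1 or r >= len(pts):
        raise ValueError("r must satisfy 1 <= r < number of markers")
    return [pts[i + r] - pts[i] for i in range(len(pts) - r)]


# Hypothetical marker positions, for illustration only.
scans = r_scans([10, 12, 30, 31, 33, 80], r=2)
shortest, longest = min(scans), max(scans)
```

Unusually small minimal r-scans point to clusters of the marker, while unusually large maximal r-scans point to long marker-free stretches. Deciding what counts as unusual requires the distributional results the abstract refers to.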
Affiliation(s)
- S Karlin
- Department of Mathematics, Stanford University, CA 94305
|
27
|
Nussinov R. Distinct patterns in the dinucleotide nearest neighbors to G/C and A/T oligomers in eukaryotic sequences. J Mol Evol 1991; 33:259-66. [PMID: 1757996 DOI: 10.1007/bf02100677] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 12/28/2022]
Abstract
The eukaryotic and prokaryotic databases are scanned for potential nearest-neighbor doublet preferences at the 5' and 3' flanks of some oligomers. Here we focus on oligomers containing alternating nucleotides, i.e., UV, UVUV, and UUVV where U ≠ V. Strong, consistent trends are observed in eukaryotic sequences. A/T alternation oligomers are preferentially flanked by A/T. G/C flanks are disfavored. G/C alternation oligomers are preferentially flanked by G/C. A/T flanks are disfavored. These trends are consistent with those observed previously for homooligomer tracts (Nussinov et al. 1989a,b). G/C tracts are preferentially flanked by G/C. A/T nearest neighbors are disfavored. The reverse holds for A/T tracts. Additional patterns are described here as well. The possible origin of these DNA composition and sequence trends is discussed. These trends are suggested to stem from protein-DNA interaction constraints.
Affiliation(s)
- R Nussinov
- Laboratory of Mathematical Biology, NCI, NIH, Bethesda, MD 20892
|
28
|
Nussinov R. The ordering of nucleotides in the DNA: strong pyrimidine-purine patterns near homooligomer tracts. J Theor Biol 1991; 149:21-42. [PMID: 1881144 DOI: 10.1016/s0022-5193(05)80069-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 12/29/2022]
Abstract
Here, we study the frequencies of occurrence of homooligomers flanked by one base, XnU or UXn, where X = A, C, G, T and U ≠ X. Specifically, we search for preferences (or discriminations) in their nearest neighbor doublet, VV. Extensive analysis of the data base reveals striking patterns in such VVUXn or UXnVV oligomers (V = A, C, G, T). With very few exceptions, if the VV and Xn are composed of complementary nucleotides, those oligomers having a pyrimidine (Y)-purine (R) junction are preferred over those with an RY one. If the VV and Xn nucleotides are not complementary, the RY junction oligomers are preferred over their YR counterparts. These trends are observed consistently in eukaryotic and prokaryotic sequences. They are particularly striking in the YR > RY oligomers containing complementary nucleotides. The general preferences and discriminations described here are in the same direction as our previous results for homooligomer tracts. These recurrences, along with some additional universal "rules", aid in our understanding of the ordering of nucleotides in the DNA.
Affiliation(s)
- R Nussinov
- Laboratory of Mathematical Biology, NCI, NIH, Bethesda, MD 20892
|
29
|
Johnson AM. Comparison of dinucleotide frequency and codon usage in Toxoplasma and Plasmodium: evolutionary implications. J Mol Evol 1990; 30:383-7. [PMID: 2111850 DOI: 10.1007/bf02101892] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
Abstract
The weight-averaged observed/expected dinucleotide frequencies for the sum total of the coding regions of five Toxoplasma genes were compared with the same parameters previously determined for the coding regions of 21 Plasmodium genes. In addition, codon usage in the five Toxoplasma genes was compared with that in the 21 Plasmodium genes, and the percent distribution of amino acids in the Toxoplasma protein pool and the Plasmodium protein pool were compared with that in a general protein pool of 314 proteins. The results are consistent with the hypothesis that, contrary to currently held opinion, the genera Toxoplasma and Plasmodium are not especially closely related.
Affiliation(s)
- A M Johnson
- Department of Clinical Microbiology, Flinders University School of Medicine, Flinders Medical Centre, South Australia
|
30
|
Pevzner PA, Borodovsky MYu, Mironov AA. Linguistics of nucleotide sequences. I: The significance of deviations from mean statistical characteristics and prediction of the frequencies of occurrence of words. J Biomol Struct Dyn 1989; 6:1013-26. [PMID: 2531596 DOI: 10.1080/07391102.1989.10506528] [Citation(s) in RCA: 75] [Impact Index Per Article: 2.1] [Indexed: 01/01/2023]
Abstract
Mathematical models of the generation of genetic texts appeared simultaneously with the first sequencing of DNA. They are used to establish functional and evolutionary relations between genetic texts, to predict the number and distribution of specific sites in a sequence and to identify "meaningful" words. The present paper deals with two problems: 1) The significance of deviations from the mean statistical characteristics in a genetic text. Anyone who has addressed himself to the statistical analysis of sequenced DNA is familiar with the question: what deviations from the expected frequencies of occurrence of particular words testify to the "biological" significance of those words? We propose a formula for the variance of the number of occurrences of a word in the text, with allowance for word overlaps, making it possible to assess the significance of the deviations from the expected statistical characteristics. 2) A new method for predicting the frequencies of occurrence of particular words in a genetic text using the statistical characteristics of "spaced" L-grams. The method can be used for predicting the number of restriction sites in human DNA and in planning experiments on the physical mapping and sequencing of the human genome.
Affiliation(s)
- P A Pevzner
- Institute for Genetics of Microorganisms, Moscow, USSR
|
31
|
Nussinov R, Sarai A, Smythers GW, Jernigan RL. Sequence context of oligomer tracts in eukaryotic DNA: biological and conformational implications. J Biomol Struct Dyn 1988; 6:543-62. [PMID: 3271538 DOI: 10.1080/07391102.1988.10506506] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 01/05/2023]
Abstract
Recent studies of homooligomer tracts suggest different characteristics from random sequence DNA. (dA).(dT) and (dG).(dC) tracts are frequent in upstream regions and in some cases have been shown to be essential for regulation. Here we examine homooligomer occurrences in non-coding and coding eukaryotic sequences, focusing on the context in which the homooligomers occur. This analysis of sequences in the junction areas yields distinct and consistent characteristics. In particular, the nucleotide interrupting a run is most frequently complementary to the run. The base next to it is most frequently identical to the one constituting the run. For A or T runs the least frequent nearest and next to nearest neighbors are G or C. For G or C tracts the least frequent are A or T. Complementary oligomers behave similarly. These and additional trends are strongest for run lengths ≥ 3. The computations are carried out on the whole eukaryotic database of more than 4 × 10^6 nucleotides, separately for coding and non-coding regions. These same trends are evident for both groups, but are somewhat stronger for the non-coding regions. The context in which the homooligomers occur may yield some clues to DNA conformation and its biological implications.
Affiliation(s)
- R Nussinov
- Laboratory of Mathematical Biology, National Cancer Institute, Bethesda, Maryland 20892
|
32
|
Hanai R, Wada A. The effects of guanine and cytosine variation on dinucleotide frequency and amino acid composition in the human genome. J Mol Evol 1988; 27:321-5. [PMID: 3146642 DOI: 10.1007/bf02101194] [Citation(s) in RCA: 29] [Impact Index Per Article: 0.8] [Indexed: 01/04/2023]
Abstract
One hundred twelve human DNA sequences were analyzed with respect to dinucleotide frequency and amino acid composition. The variation in guanine and cytosine (G + C) content revealed: (1) at 2-3 and 3-1 doublet positions CG discrimination is attenuated at high G + C, but TA disfavor is enhanced, and (2) several amino acids are subject to G + C change. These findings have been reported in part for collections of sequences from various species. The present study confirms that in a single organism, the human, the G + C effects do exist. Aspects of the argument that connects G + C with protein thermal stability are also discussed.
Affiliation(s)
- R Hanai
- Department of Physics, Faculty of Science, University of Tokyo, Japan
|
33
|
Wong JT, Cedergren R. Natural selection versus primitive gene structure as determinant of codon usage. Eur J Biochem 1986; 159:175-80. [PMID: 3091367 DOI: 10.1111/j.1432-1033.1986.tb09849.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023]
Abstract
Different codons are not utilized equally in known gene sequences. One of the important biases of codon usage is observed in the form of an enrichment of RNY codons, especially within RNN codon families. Such biases could represent the residue of a primitive repeating-RNY gene structure, or the outcome of natural selection, or both. Analyses based on the rates of silent substitutions, the frequencies of base doublets, and synonymous codon ratios for Escherichia coli, yeast, Drosophila and Xenopus proteins have been performed. The results rule out any significant support for a primitive repeating-RNY or repeating-RRY gene structure, and establish the important role of natural selection in determining the choice of codons. With strong intervention by natural selection, the relationship between primitive gene structure and codon usage necessarily becomes minimal.
|
34
|
Brendel V, Beckmann JS, Trifonov EN. Linguistics of nucleotide sequences: morphology and comparison of vocabularies. J Biomol Struct Dyn 1986; 4:11-21. [PMID: 3078230 DOI: 10.1080/07391102.1986.10507643] [Citation(s) in RCA: 120] [Impact Index Per Article: 3.2] [Indexed: 01/04/2023]
Abstract
The concept of "words" in continuous languages devoid of blanks is introduced and an operational definition of words is given. With this novel concept, nucleotide sequences become an object for linguistic analysis. The typical word size of the nucleotide language is found to be 3 to 5 (tri- to pentamers). Different genomes have distinct vocabularies. Comparison of these vocabularies can serve as a basis for revealing functional and evolutionary relatedness of sequences.
Affiliation(s)
- V Brendel
- Department of Polymer Research, Weizmann Institute of Science Rehovot, Israel
|
35
|
Kolaskar AS, Reddy BV. Contextual constraints on codon pair usage: structural and biological implications. J Biomol Struct Dyn 1986; 3:725-38. [PMID: 3271046 DOI: 10.1080/07391102.1986.10508458] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 01/05/2023]
Abstract
Complementary DNA sequence data of 278 protein coding genes from prokaryotic systems have been analysed at the level of near neighbour codon pairs. Our analysis points out that constraints exist even at the level of near neighbour codon pairs. These constraints are in addition to those which arise due to relative levels of tRNA. Codon pairs whose occurrence values in the data base differ from their expected values neither share a common secondary structure nor have better stabilization due to high base stacking. Our study points out that there are strong interactions between constituent codons in these codon pairs. These strongly interacting codon pairs, we suggest, are involved in the formation of three dimensional structural elements of cDNA/mRNA and interact with the ribosome and thus modulate translation.
Affiliation(s)
- A S Kolaskar
- Centre for Cellular and Molecular Biology, Hyderabad, India
|
36
|
Abstract
The entire biosynthetic pathway of PTH has been elucidated from the determination of the chromosomal location to the eventual secretion of the hormone from the cell. The human gene is present on the short arm of chromosome 11, and restriction site polymorphisms near the gene have been detected. The PTH genes and cDNAs have been isolated and characterized in the bovine, human, and rat species. The gene contains two introns, which are in the same position in each species, and dissect the gene into 3 exons that code, respectively, for the 5' untranslated region, the signal peptide, and PTH plus the 3' untranslated region. The mRNAs are about twice as long as necessary to code for preProPTH and contain a 7-methylguanosine cap at the 5' terminus and polyadenylic acid at the 3' terminus. The bovine and human mRNAs are heterogeneous at the 5' terminus, the basis of which is two TATA sequences in the 5' flanking regions of the gene. In contrast, the rat gene contains a single TATA sequence and the mRNA has a single 5' terminus. The initial translational product of the mRNA is preProPTH, and the pre-peptide of 25 amino acids is equivalent to signal peptides of other secreted and membrane proteins. The genes of the three species are very homologous in the region that codes for preProPTH. Substantial homology is also retained in the gene flanking regions, introns, and mRNA untranslated regions. Silent sites are also conserved more than would be expected, particularly between the human and bovine sequences. The bovine and human sequences are more closely related than the rat is to either the human or bovine. These studies of the basic molecular biology of PTH will provide the framework for future analysis of significant biological and medical questions. In vitro mutagenesis techniques should soon provide information about the elements of the gene involved in regulating transcription and about functional elements of the signal peptide. Eventually, signals involved in directing the ProPTH molecule to secretory granules as well as the biologically active regions of PTH, itself, will be examined by these methods. The molecular biological studies, combined with the development of dispersed cell cultures, provide the opportunity to study the effects of chronic changes in calcium on gene transcription and mRNA metabolism. The restriction site polymorphisms associated with the human PTH gene will allow a search for correlations between PTH gene structure and parathyroid disease. (ABSTRACT TRUNCATED AT 400 WORDS)
|
37
|
Lennon GG, Nussinov R. Eukaryotic oligomer frequencies are correlated with certain DNA helical parameters. J Theor Biol 1985; 116:427-33. [PMID: 4058030 DOI: 10.1016/s0022-5193(85)80279-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Indexed: 01/08/2023]
Abstract
Nucleotide sequences were converted into purine (R)-pyrimidine (Y) series and divided into several groups, embracing higher and lower organisms. The frequencies of R-Y doublets, triplets and quartets in each were calculated. Whereas eukaryotes uniformly show RR + YY > RY + YR, in bacteria and phage no such relationship is observed. The triplet and quartet patterns in higher organisms differ from those seen in prokaryotes. In the higher organisms a correlation is observed between the frequencies of triplets and quartets and some DNA structural parameters. Specifically, the most frequent triplets are those with minimal torsion angle deviations from a regular B-DNA. The most frequent quartets are those with minimal roll angle deviations. No such correlations are observed in prokaryotes. We therefore propose that in eukaryotic DNA, tight, smooth packaging imposes sequence constraints.
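The first step of this analysis, recoding a sequence as a purine (R) / pyrimidine (Y) series and comparing RR + YY against RY + YR doublets, can be sketched as follows. This is a minimal illustration with hypothetical function names. Treating ambiguous symbols such as N as unknown and dropping any doublet that contains them is an assumption, not something stated in the abstract.

```python
from collections import Counter

PURINES = set("AG")
PYRIMIDINES = set("CTU")


def to_ry(seq):
    """Recode a nucleotide sequence as R (purine), Y (pyrimidine) or N (other)."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
        else:
            out.append("N")  # assumption: ambiguous symbols are kept as N
    return "".join(out)


def ry_doublet_counts(seq):
    """Count overlapping R/Y doublets, skipping any doublet containing N."""
    ry = to_ry(seq)
    pairs = (ry[i:i + 2] for i in range(len(ry) - 1))
    return Counter(p for p in pairs if "N" not in p)


def homo_vs_alternating(seq):
    """Return (RR + YY, RY + YR) doublet totals for a sequence."""
    counts = ry_doublet_counts(seq)
    return counts["RR"] + counts["YY"], counts["RY"] + counts["YR"]
```

For example, `homo_vs_alternating("AAGGCCTT")` gives `(6, 1)`, since the recoded series `RRRRYYYY` contains a single R-to-Y junction.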
|
38
|
Kolaskar AS, Reddy BVB. Complimentary DNA sequence data analysis of prokaryotic systems. J Biosci 1985. [DOI: 10.1007/bf02716766] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
|
39
|
Abstract
Analysis of the sequence data available today, comprising more than 500,000 bases, confirms the previously observed phenomenon that there are distinct dinucleotide preferences in DNA sequences. Consistent behaviour is observed in the major sequence groups analysed here in prokaryotes, eukaryotes and mitochondria. Some doublet preferences are common to all groups and are found in most sequences of the Los Alamos Library. The patterns seen in such large data sets are very significant statistically and biologically. Since they are present in numerous and diverse nucleotide sequences, one may conclude that they confer evolutionary advantages on the organism. In eukaryotes RR and YY dinucleotides are preferred over YR and RY (where R is a purine and Y a pyrimidine). Since opposite-chain nearest-neighbour purine clashes are major determinants of DNA structure, it appears that the tight packaging of DNA in nucleosomes disfavors, in general, such (YR and RY) steric repulsion.
|
40
|
Abstract
This paper is concerned primarily with how information is stored in viral DNA. The general problem of defining information content is discussed and a procedure for analysis extended from that of Gatlin (1972) is developed. Long range correlations in base sequences are analyzed for several viral genomes. The relationship of these correlations to the existence of strong codon biases is examined and the consequences discussed.
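The abstract does not spell out its procedure, so the sketch below only illustrates the Gatlin-style quantities it builds on, as they are commonly presented: D1, the divergence from equiprobable base usage, and D2, the divergence from independence between neighbouring bases. The function name is hypothetical, only nearest-neighbour dependence is measured, and the long-range extension described in the abstract is not attempted.

```python
import math
from collections import Counter


def gatlin_divergences(seq, alphabet_size=4):
    """Return (D1, D2) in bits for a sequence.

    D1 = log2(alphabet_size) - H1, where H1 is the single-base entropy.
    D2 = H1 - HM, where HM is the entropy of a base conditioned on the
    preceding base (a first-order Markov estimate).
    """
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two symbols")

    base_counts = Counter(seq)
    h1 = -sum((c / n) * math.log2(c / n) for c in base_counts.values())

    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])  # how often each base starts a pair
    m = n - 1
    h_m = -sum(
        (c / m) * math.log2(c / first_counts[a])
        for (a, _b), c in pair_counts.items()
    )

    return math.log2(alphabet_size) - h1, h1 - h_m
```

A perfectly periodic sequence such as `"ACGT" * 10` has uniform base usage (D1 close to 0) but fully determined neighbours (D2 close to 2 bits), which shows how the two terms separate composition from ordering.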
|
41
|
Abstract
The relationship between degeneracy in the genetic code and the occurrence of a strong codon bias is examined, with particular reference to a group of viral genomes. The present paper shows how codon bias may have been imposed by thermodynamic considerations at the time the primitive DNA first formed in the primordial soup. Using a four-state Ising-like model with stacking interactions between successive base pairs, we show how primeval periodic DNA polymers could have arisen, the remnants of which are still observed in codon biases today.
|
42
|
Abstract
The sequence of bovine parathyroid hormone mRNA has been determined by sequence analysis of near full-length cloned DNA complementary to the mRNA. Restriction fragments hybridized to the mRNA and extended toward the 5' terminus with reverse transcriptase were analyzed to derive the sequence not present in cDNA. The reverse transcripts were heterogeneous in length with three major stopping points within 8 nucleotides of each other and a minor stop about 30 bases further toward the 5' terminus of the mRNA. The sequence of the gene corresponding to the minor reverse transcript begins with the sequence 5' XXXATATATAAAA which contains the consensus sequence for a TATA box, a putative eukaryotic promoter sequence. Assuming that the major reverse transcriptase stop nearest the 5' terminus of the mRNA, which is 24 bases downstream from the TATA box, represents the beginning of bovine PTH mRNA, the mRNA contains 672 nucleotides, 100 in the 5' noncoding region, 348 in the coding region and 224 in the 3' noncoding region. Bovine PTH mRNA contains 38% G and C bases. The 3' noncoding region is particularly rich in A and U bases with the last 100 nucleotides of the molecules containing 46% U and 32% A. As with other mRNAs, the sequences CG and UAG occur much less than expected. The 5' noncoding region does not contain an AUG before the initiator codon and contains two potential regions that could base-pair with sequences near the 3' terminus of 18S ribosomal RNA. The sequence AAUAAA is present 14 nucleotides from the polyadenylic acid at the 3' terminus. Bovine PTH mRNA exhibits extensive homology with human PTH mRNA.
|
43
|
Boothroyd JC, Paynter CA, Coleman SL, Cross GA. Complete nucleotide sequence of complementary DNA coding for a variant surface glycoprotein from Trypanosoma brucei. J Mol Biol 1982; 157:547-56. [PMID: 7120401 DOI: 10.1016/0022-2836(82)90475-2] [Citation(s) in RCA: 48] [Impact Index Per Article: 1.1] [Indexed: 01/23/2023]
|