1
Chaley M, Kutyrkin V. Stochastic models for description of structural-statistical properties in DNA sequences. J Theor Biol 2019; 496:110126. [PMID: 31866393 DOI: 10.1016/j.jtbi.2019.110126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2019] [Revised: 12/02/2019] [Accepted: 12/18/2019] [Indexed: 10/25/2022]
Abstract
New stochastic models based on the notion of a stochastic codon are proposed. These models, represented by special random strings, describe practical structural-statistical properties that are characteristic of coding DNA from both prokaryotic and eukaryotic genomes. In this case, coding regions are considered as realizations of random strings. The models explain the existence, in coding regions, of latent profile periodicity with a period that is not only equal to three but may also be a multiple of three. For sequences whose latent profile period is a multiple of three but not equal to three, the proposed models ensure the existence of a special property of 3-regularity, which is, in practice, recognized in all coding sequences of the genomes analyzed. The feasibility of the proposed stochastic models was tested in numerical experiments with binary re-encoded paragraphs of literary texts (in English and Italian), used as an analog of DNA coding regions.
Affiliation(s)
- Maria Chaley
- Institute of Mathematical Problems of Biology RAS - Branch of Keldysh Institute of Applied Mathematics RAS, Professor Vitkevich St., 1, 142290 Pushchino, Russia.
- Vladimir Kutyrkin
- Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia.
2
Chaley M, Kutyrkin V. Stochastic model of homogeneous coding and latent periodicity in DNA sequences. J Theor Biol 2016; 390:106-16. [DOI: 10.1016/j.jtbi.2015.11.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/19/2015] [Revised: 09/18/2015] [Accepted: 11/14/2015] [Indexed: 11/24/2022]
3
Chaley M, Kutyrkin V. Spectral-Statistical Approach for Revealing Latent Regular Structures in DNA Sequence. Methods Mol Biol 2016; 1415:315-340. [PMID: 27115640 DOI: 10.1007/978-1-4939-3572-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Methods of the spectral-statistical approach (2S-approach) for revealing latent periodicity in DNA sequences are described. Results are presented from an analysis of data in the HeteroGenome database, which collects sequences similar to approximate tandem repeats in the genomes of model organisms. As a further development of the spectral-statistical approach, techniques for recognizing latent profile periodicity are considered. These techniques are based on an extension of the notion of an approximate tandem repeat. Examples are given of correlation between latent profile periodicity revealed in CDSs and structural-functional properties of the proteins.
Affiliation(s)
- Maria Chaley
- Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st., 4, 142290, Pushchino, Russia.
- Vladimir Kutyrkin
- Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005, Moscow, Russia
4
Chaley M, Kutyrkin V, Tulbasheva G, Teplukhina E, Nazipova N. HeteroGenome: database of genome periodicity. Database: The Journal of Biological Databases and Curation 2014; 2014:bau040. [PMID: 24857969 PMCID: PMC4038257 DOI: 10.1093/database/bau040] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
We present the first release of the HeteroGenome database collecting latent periodicity regions in genomes. Tandem repeats and highly divergent tandem repeats, along with regions of a new type of periodicity known as profile periodicity, have been collected for the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster. We obtained the data with the aid of a spectral-statistical approach to searching for reliable latent periodicity regions (with periods up to 2000 bp) in DNA sequences. The original two-level mode of data presentation (a broad view of the region of latent periodicity and a second level indicating conserved fragments of its structure) was further developed to enable us to estimate, without redundancy, that latent periodicity regions make up ∼10% of the analyzed genomes. Analysis of the quantitative and qualitative content of the periodicity regions located on all chromosomes of the analyzed organisms revealed dominant characteristic types of periodicity in the genomes. The pattern of density distribution of latent periodicity regions along a chromosome unambiguously characterizes each chromosome in the genome. Database URL: http://www.jcbi.ru/lp_baze/
Affiliation(s)
- Maria Chaley
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Vladimir Kutyrkin
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Gayane Tulbasheva
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Elena Teplukhina
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Nafisa Nazipova
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
5
Abstract
Novel methods for identifying a new type of DNA latent periodicity, called latent profile periodicity or latent profility, are used to search for periodic structures in genes. These methods reveal two distinct levels of organization of genetic information encoding. It is shown that latent profility in genes may correlate with specific structural features of their encoded proteins.
Affiliation(s)
- Maria Chaley
- Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st., 4, 142290 Pushchino, Russia.
7
Chaley MB, Nazipova NN, Kutyrkin VA. Statistical methods for detecting latent periodicity patterns in biological sequences: The case of small-size samples. Pattern Recognition and Image Analysis 2009. [DOI: 10.1134/s1054661809020217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]