1
|
Suvorova YM, Korotkova MA, Skryabin KG, Korotkov EV. Search for potential reading frameshifts in cds from Arabidopsis thaliana and other genomes. DNA Res 2019; 26:157-170. [PMID: 30726896 PMCID: PMC6476729 DOI: 10.1093/dnares/dsy046] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/27/2018] [Accepted: 12/07/2018] [Indexed: 01/01/2023] Open
Abstract
A new mathematical method for detecting potential reading frameshifts in protein-coding sequences (cds) was developed. The algorithm adjusts to the triplet periodicity of each analysed sequence using dynamic programming and a genetic algorithm, and requires no preliminary training. Using the developed method, cds from the Arabidopsis thaliana genome were analysed. In total, the algorithm found 9,930 sequences containing one or more potential reading frameshifts, which is ∼21% of all analysed sequences of the genome. The Type I and Type II error rates were estimated as 11% and 30%, respectively. Similar results were obtained for the genomes of Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Rattus norvegicus and Xenopus tropicalis. The developed algorithm was also tested on 17 bacterial genomes. We compared our results with previously obtained data on the search for potential reading frameshifts in these genomes. The study discusses the possibility that the reading frameshift is a relatively frequent mutation that could participate in the creation of new genes and proteins.
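The triplet periodicity that this abstract relies on can be illustrated with a small sketch. It is not the authors' dynamic-programming and genetic-algorithm method; the function names `tp_matrix`, `tp_information` and `best_phase` are hypothetical, and the overlap score is an assumption chosen for brevity. The sketch counts bases per codon position, measures how strongly position and base are coupled, and shows that a segment downstream of a one-base deletion matches the reference matrix best at a shifted phase.

```python
import math
from collections import Counter

BASES = "ACGT"

def tp_matrix(seq, phase=0):
    """Count bases per codon position; position = (index - phase) mod 3."""
    counts = Counter()
    for i, base in enumerate(seq.upper()):
        if base in BASES:
            counts[((i - phase) % 3, base)] += 1
    return counts

def tp_information(counts):
    """Mutual information (in nats) between codon position and base."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    row, col = Counter(), Counter()
    for (pos, base), n in counts.items():
        row[pos] += n
        col[base] += n
    return sum((n / total) * math.log(n * total / (row[pos] * col[base]))
               for (pos, base), n in counts.items())

def best_phase(segment, reference):
    """Phase (0, 1 or 2) at which the segment's matrix overlaps the reference most."""
    def overlap(phase):
        seg = tp_matrix(segment, phase)
        return sum(n * reference[key] for key, n in seg.items())
    return max(range(3), key=overlap)

cds = "ATG" * 20                 # toy sequence with perfect triplet periodicity
reference = tp_matrix(cds)
shifted = cds[1:]                # simulate deletion of the first base
```

On this toy input, `best_phase(cds, reference)` is 0 and `best_phase(shifted, reference)` is 2, which is the kind of phase change a frameshift search looks for.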
Affiliation(s)
- Y M Suvorova
  - Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- M A Korotkova
  - National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russia
- K G Skryabin
  - Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- E V Korotkov
  - Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia; National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russia
|
2
|
Suvorova YM, Korotkov EV. New Method for Potential Fusions Detection in Protein-Coding Sequences. J Comput Biol 2019; 26:1253-1261. [PMID: 31211597 DOI: 10.1089/cmb.2019.0122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022] Open
Abstract
Gene fusion is known to be one of the mechanisms of new gene formation. Most bioinformatics methods for studying fused genes are based on sequence similarity search. However, if the ancestral sequences were lost during evolution or changed too much, it is impossible to detect the fusion in this way. Previously, we developed a method of searching for triplet periodicity (TP) change points in protein-coding sequences (CDS) and showed the possible relation of this phenomenon to gene formation as a result of fusion. In this study, we improved the TP change point detection method and studied the genes of six eukaryotic genomes. At a type I error probability of 2%-3%, TP change points were found in 20%-40% of genes. Further analysis showed that about 30% of the TP change points can be explained by amino acid repeats. Another 30% may be fused genes for which alignments were detected by the BLAST program. We believe that the remaining cases may be fused genes whose ancestral sequences have been lost. The method is more sensitive to TP changes and allowed us to find two to three times more significant TP change points than our previous method.
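A minimal sketch of what a TP change point search could look like is given below. It is an illustration under stated assumptions, not the published method: it scans in-frame split points and scores each one with a standard likelihood-ratio (G) statistic for homogeneity of the two 3x4 position-by-base count tables. The names `counts_3x4`, `g_statistic`, `best_change_point` and the `min_len` parameter are hypothetical.

```python
import math

BASES = "ACGT"

def counts_3x4(seq):
    """3x4 table of base counts by codon position (index mod 3)."""
    table = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq):
        j = BASES.find(base)
        if j >= 0:
            table[i % 3][j] += 1
    return table

def g_statistic(left, right):
    """Likelihood-ratio statistic for homogeneity of two 3x4 count tables."""
    n_left = sum(map(sum, left))
    n_right = sum(map(sum, right))
    g = 0.0
    for r in range(3):
        for c in range(4):
            pooled = left[r][c] + right[r][c]
            for observed, n in ((left[r][c], n_left), (right[r][c], n_right)):
                if observed:
                    expected = pooled * n / (n_left + n_right)
                    g += 2.0 * observed * math.log(observed / expected)
    return g

def best_change_point(seq, min_len=30):
    """Return (score, split index) of the in-frame split with the largest G."""
    seq = seq.upper()
    start = min_len + (-min_len) % 3      # keep split points in frame
    best = (0.0, None)
    for k in range(start, len(seq) - min_len + 1, 3):
        g = g_statistic(counts_3x4(seq[:k]), counts_3x4(seq[k:]))
        if g > best[0]:
            best = (g, k)
    return best

toy = "ATG" * 30 + "CCA" * 30             # two regions with different TP
score, split = best_change_point(toy)
```

For the toy sequence the split is found at index 90, the junction of the two regions. Assessing significance (the type I error level mentioned in the abstract) would need a null model, which this sketch does not attempt.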
Affiliation(s)
- Yulia M Suvorova
  - Federal State Institution "Federal Research Centre "Fundamentals of Biotechnology" of the Russian Academy of Sciences", Moscow, Russian Federation
- Eugene V Korotkov
  - Federal State Institution "Federal Research Centre "Fundamentals of Biotechnology" of the Russian Academy of Sciences", Moscow, Russian Federation; Applied Mathematics Department, National Research Nuclear University MEPhI, Moscow, Russian Federation
|
3
|
Suvorova YM, Pugacheva VM, Korotkov EV. A Database of Potential Reading Frame Shifts in Coding Sequences from Different Eukaryotic Genomes. Biophysics (Nagoya-shi) 2019. [DOI: 10.1134/s0006350919030217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
|
4
|
Li Z, Guan Y, Yuan X, Zheng P, Zhu H. Prediction of Sphingosine protein-coding regions with a self adaptive spectral rotation method. PLoS One 2019; 14:e0214442. [PMID: 30943219 PMCID: PMC6447165 DOI: 10.1371/journal.pone.0214442] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/08/2019] [Accepted: 03/13/2019] [Indexed: 01/08/2023] Open
Abstract
Identifying protein coding regions in DNA sequences by computational methods is an active research topic. Welan gum produced by Sphingomonas sp. WG has great application potential in the oil recovery and concrete construction industries. Predicting the coding regions in the Sphingomonas sp. WG genome and elucidating the mechanism underlying Welan gum synthesis are important issues at present. In this study, we apply a self adaptive spectral rotation (SASR) method, which is based on the Triplet Periodicity property, to predict the coding regions in the whole-genome data of Sphingomonas sp. WG without any previous training process, and 1115 suspected gene fragments are obtained. The suspected gene fragments are subjected to a similarity search against the non-redundant protein sequences (nr) database of NCBI with blastx, and 762 of them are labeled as genes in the nr database.
Affiliation(s)
- Zhongwei Li
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao, Shandong, China
- Yanan Guan
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao, Shandong, China
- Xiang Yuan
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao, Shandong, China
- Pan Zheng
  - Department of Accounting and Information Systems, University of Canterbury, Christchurch, New Zealand
- Hu Zhu
  - College of Chemistry and Materials, Fujian Normal University, Fuzhou, China
|
5
|
Chechetkin VR, Lobzin VV. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique. J Theor Biol 2017; 426:162-179. [PMID: 28552553 DOI: 10.1016/j.jtbi.2017.05.033] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/08/2017] [Revised: 04/23/2017] [Accepted: 05/23/2017] [Indexed: 12/15/2022]
Abstract
The use of state-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools has led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on chromosome folding and large-scale genome organization. Could part of the information on chromosome folding be obtained directly from the underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to replication and transcription and can also be used to assess temperature or supercoiling effects on chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, a study of large-scale genome organization for bacteriophage PHIX174 and the bacterium Caulobacter crescentus is also included. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge of chromosome architecture and genome organization.
Affiliation(s)
- V R Chechetkin
  - Engelhardt Institute of Molecular Biology of Russian Academy of Sciences, Vavilov str., 32, Moscow 119334, Russia; Theoretical Department of Division for Perspective Investigations, Troitsk Institute of Innovation and Thermonuclear Investigations (TRINITI), Moscow, Troitsk District 108840, Russia
- V V Lobzin
  - School of Physics, University of Sydney, Sydney, NSW 2006, Australia
|
6
|
Messaoudi I, Elloumi Oueslati A, Lachiri Z. Inferring Helitron Structures from 1D and 2D Representations Based on the Chaos Game Theory. Ing Rech Biomed 2017. [DOI: 10.1016/j.irbm.2017.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
7
|
Yin C, Wang J. Periodic power spectrum with applications in detection of latent periodicities in DNA sequences. J Math Biol 2016; 73:1053-1079. [PMID: 26942584 DOI: 10.1007/s00285-016-0982-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/21/2015] [Revised: 02/19/2016] [Indexed: 12/27/2022]
Abstract
Periodic elements play important roles in genomic structures and functions, yet some complex periodic elements in genomes are difficult to detect by conventional methods such as digital signal processing and statistical analysis. We propose a periodic power spectrum (PPS) method for analyzing periodicities of DNA sequences. The PPS method employs periodic nucleotide distributions of DNA sequences and directly calculates power spectra at specific periodicities. The magnitude of a PPS reflects the strength of a signal on periodic positions. In comparison with Fourier transform, the PPS method avoids spectral leakage, and reduces background noise that appears high in Fourier power spectrum. Thus, the PPS method can effectively capture hidden periodicities in DNA sequences. Using a sliding window approach, the PPS method can precisely locate periodic regions in DNA sequences. We apply the PPS method for detection of hidden periodicities in different genome elements, including exons, microsatellite DNA sequences, and whole genomes. The results show that the PPS method can minimize the impact of spectral leakage and thus capture true hidden periodicities in genomes. In addition, performance tests indicate that the PPS method is more effective and efficient than a fast Fourier transform. The computational complexity of the PPS algorithm is [Formula: see text]. Therefore, the PPS method may have a broad range of applications in genomic analysis. The MATLAB programs for implementing the PPS method are available from MATLAB Central ( http://www.mathworks.com/matlabcentral/fileexchange/55298 ).
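The idea of computing power at one specific periodicity from a periodic nucleotide distribution can be sketched as follows. This is not the authors' PPS definition (their MATLAB code is the reference for that); it only uses the standard identity that the Fourier sum of a base indicator sequence at frequency 1/p reduces to a sum over the counts of that base at positions congruent to q modulo p. The names `congruence_counts` and `power_at_period` are hypothetical.

```python
import cmath
from collections import Counter

def congruence_counts(seq, period):
    """Count each base at positions n with n mod period == q."""
    counts = Counter()
    for n, base in enumerate(seq.upper()):
        if base in "ACGT":
            counts[(base, n % period)] += 1
    return counts

def power_at_period(seq, period):
    """Sum over bases of |sum_q count(base, q) * exp(-2*pi*i*q/period)|^2."""
    counts = congruence_counts(seq, period)
    power = 0.0
    for base in "ACGT":
        amplitude = sum(counts[(base, q)] * cmath.exp(-2j * cmath.pi * q / period)
                        for q in range(period))
        power += abs(amplitude) ** 2
    return power

toy = "ATG" * 12   # length 36, perfectly 3-periodic
```

Because only `period` complex terms per base are summed after a single counting pass, the cost for one period is linear in the sequence length. On the toy input the power at period 3 is 432 and the power at period 4 vanishes up to rounding.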
Affiliation(s)
- Changchuan Yin
  - Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Chicago, IL, 60607-7045, USA
- Jiasong Wang
  - Department of Mathematics, Nanjing University, Nanjing, Jiangsu, 210093, China
|
8
|
Chaley M, Kutyrkin V. Stochastic model of homogeneous coding and latent periodicity in DNA sequences. J Theor Biol 2016; 390:106-16. [DOI: 10.1016/j.jtbi.2015.11.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/19/2015] [Revised: 09/18/2015] [Accepted: 11/14/2015] [Indexed: 11/24/2022]
|
9
|
Chaley M, Kutyrkin V. Spectral-Statistical Approach for Revealing Latent Regular Structures in DNA Sequence. Methods Mol Biol 2016; 1415:315-340. [PMID: 27115640 DOI: 10.1007/978-1-4939-3572-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Methods of the spectral-statistical approach (2S-approach) for revealing latent periodicity in DNA sequences are described. Results are presented from an analysis of the HeteroGenome database, which collects sequences similar to approximate tandem repeats in the genomes of model organisms. As a further development of the spectral-statistical approach, techniques for recognizing latent profile periodicity are considered. These techniques are based on an extension of the notion of approximate tandem repeat. Examples are given of the correlation between latent profile periodicity revealed in CDSs and the structural-functional properties of the proteins.
Affiliation(s)
- Maria Chaley
  - Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st., 4, 142290, Pushchino, Russia
- Vladimir Kutyrkin
  - Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005, Moscow, Russia
|
10
|
Suvorova YM, Korotkova MA, Korotkov EV. Comparative analysis of periodicity search methods in DNA sequences. Comput Biol Chem 2014; 53 Pt A:43-8. [PMID: 25218218 DOI: 10.1016/j.compbiolchem.2014.08.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Accepted: 07/11/2014] [Indexed: 11/30/2022]
Abstract
To determine the periodicity of a DNA sequence, different spectral approaches are applied: discrete Fourier transform (DFT), autocorrelation (CORR), information decomposition (ID), hybrid method (HYB), concept of spectral envelope for spectral analysis (SE), normalized autocorrelation (CORR_N) and profile analysis (PA). In this work, we investigated whether each of these methods can find the true period length, depending on the average number of accumulated changes in DNA bases (PM). The results show that for short periods (≤4 bp), it is possible to use the hybrid method (HYB), which combines properties of autocorrelation, Fourier transform, and information decomposition (ID). For larger period lengths (>4 bp) with point mutation (PM) values of 1.0 or more per nucleotide, it is preferable to use the information decomposition (ID) method, as the other spectral approaches cannot correctly determine the period length present in the analyzed sequence.
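Of the approaches listed, autocorrelation is the simplest to sketch. The block below is a generic match-based autocorrelation, assumed here for illustration; the exact CORR and CORR_N definitions compared in the paper may differ. The names `match_autocorrelation` and `estimate_period` are hypothetical.

```python
def match_autocorrelation(seq, lag):
    """Fraction of positions i where seq[i] equals seq[i + lag]."""
    pairs = len(seq) - lag
    if pairs <= 0:
        return 0.0
    matches = sum(1 for a, b in zip(seq, seq[lag:]) if a == b)
    return matches / pairs

def estimate_period(seq, max_lag):
    """Smallest lag in 1..max_lag with the highest match fraction."""
    # max() keeps the first maximal lag, so multiples of the period do not win ties.
    return max(range(1, max_lag + 1),
               key=lambda lag: match_autocorrelation(seq, lag))

toy = "ACGTT" * 20   # unmutated repeat with period 5
```

For the unmutated toy repeat the estimate is 5. As the abstract notes, accumulated point mutations degrade such estimates, which is what motivates comparing methods as a function of PM.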
Affiliation(s)
- Yulia M Suvorova
  - Centre of Bioengineering Russian Academy of Sciences, Prospect 60-tya Oktyabrya 7/1, Moscow 117312, Russian Federation
- Maria A Korotkova
  - National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe Shosse, 31, Moscow 115522, Russian Federation
- Eugene V Korotkov
  - Centre of Bioengineering Russian Academy of Sciences, Prospect 60-tya Oktyabrya 7/1, Moscow 117312, Russian Federation; National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe Shosse, 31, Moscow 115522, Russian Federation
|
11
|
Messaoudi I, Oueslati AE, Lachiri Z. Wavelet analysis of frequency chaos game signal: a time-frequency signature of the C. elegans DNA. EURASIP J Bioinform Syst Biol 2014; 2014:16. [PMID: 28194166 PMCID: PMC5270495 DOI: 10.1186/s13637-014-0016-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/30/2013] [Accepted: 08/26/2014] [Indexed: 11/10/2022]
Abstract
Challenging tasks are encountered in the field of bioinformatics. The choice of the genomic sequence's mapping technique is one of the most demanding. A judicious choice helps in examining the distribution of periodic patterns that accord with the underlying structure of genomes. Despite that, the search for a coding technique that can highlight all the information contained in the DNA has not yet attracted the attention it deserves. In this paper, we propose a new mapping technique based on the chaos game theory that we call the frequency chaos game signal (FCGS). The particularity of the FCGS coding resides in exploiting the statistical properties of the genomic sequence itself. This may reflect important structural and organizational features of DNA. To prove the usefulness of the FCGS approach in the detection of different local periodic patterns, we use wavelet analysis because it provides access to information that can be obscured by other time-frequency methods such as Fourier analysis. Thus, we apply the continuous wavelet transform (CWT) with the complex Morlet wavelet as the mother wavelet function. Scalograms for the organism Caenorhabditis elegans (C. elegans) exhibit a multitude of periodic organizations of specific DNA sequences.
Affiliation(s)
- Imen Messaoudi
  - Ecole Nationale d'Ingénieurs de Tunis, LR Signal, Images et Technologies de l'Information, Université de Tunis El Manar, BP 37, le Belvédère, Tunis, 1002 Tunisia
- Afef Elloumi Oueslati
  - Ecole Nationale d'Ingénieurs de Tunis, LR Signal, Images et Technologies de l'Information, Université de Tunis El Manar, BP 37, le Belvédère, Tunis, 1002 Tunisia
- Zied Lachiri
  - Ecole Nationale d'Ingénieurs de Tunis, LR Signal, Images et Technologies de l'Information, Université de Tunis El Manar, BP 37, le Belvédère, Tunis, 1002 Tunisia; Département de Génie Physique et Instrumentation, INSAT, Centre Urbain Cedex, BP 676, Tunis, 1080 Tunisia
|
12
|
Messaoudi I, Elloumi-Oueslati A, Lachiri Z. Building Specific Signals from Frequency Chaos Game and Revealing Periodicities Using a Smoothed Fourier Analysis. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:863-877. [PMID: 26356859 DOI: 10.1109/tcbb.2014.2315991] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
Investigating the roles and functions of DNA within genomes is becoming a primary focus of genomic research. Research is therefore moving towards cooperation between different scientific disciplines, which aims at facilitating the interpretation of genetic information. Signal processing tools appear to be very suitable for characterizing the DNA of living organisms. However, a DNA sequence must be converted into a numerical sequence before processing, which defines the concept of DNA coding. In line with this, we propose a new one-dimensional model based on the chaos game representation theory, called the Frequency Chaos Game Signal (FCGS). Then, we perform a Smoothed Fourier Transform to enhance hidden periodicities in C. elegans DNA sequences. Through this study, we demonstrate the performance of our coding approach in highlighting characteristic periodicities. Indeed, several periodicities are shown to be involved in the 1D spectra and the 2D spectrograms of FCGSs. To further investigate the contribution of our method to the enhancement of characteristic spectral attributes, a comparison with a range of binary indicators is established.
|
13
|
Valenzuela CY. The structure of selective dinucleotide interactions and periodicities in D melanogaster mtDNA. Biol Res 2014; 47:18. [PMID: 25027717 PMCID: PMC4101722 DOI: 10.1186/0717-6287-47-18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/25/2014] [Accepted: 04/26/2014] [Indexed: 10/28/2022] Open
Abstract
BACKGROUND: We found a strong selective 3-sites periodicity of deviations from randomness of the dinucleotide (DN) distribution, where both bases of DN were separated by 1, 2, K sites in prokaryotes and mtDNA. Three main aspects are studied: I) the specific 3 K-sites periodic structure of the 16 DN; II) discarding the possibility that the periodicity was produced by the highly nonrandom interactive association of contiguous bases, by studying the interaction of non-contiguous bases, the first one chosen every I sites and the second chosen J sites downstream; III) the difference between this selective periodicity of association (distance to randomness) of the four bases and the described fixed periodicities of base sequences. RESULTS: I) The 16 pairs presented a consistent periodicity in the strength of association of both bases of the pairs; the most deviated pairs are those where G and C are involved and the least deviated ones are those where A and T are involved. II) We found significant non-random interactions when the first nucleotide is chosen every I sites and the second J sites downstream until I=J=76. III) We showed conclusive differences between these internucleotide association periodicities and sequence periodicities. CONCLUSIONS: This relational selective periodicity is different from sequence periodicities and indicates that any base strongly interacts with the bases of the residual genome; this interaction and periodicity is highly structured and systematic for every pair of bases. This interaction should be destroyed in a few generations by recurrent mutation; it is only compatible with the Synthetic Theory of Evolution and agrees with Wright's adaptive landscape conception and evolution by shifting balanced adaptive peaks.
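One generic way to quantify "deviation from randomness" for base pairs at a fixed separation is a chi-square statistic on the 4x4 table of (first base, second base) counts. The sketch below assumes that approach for illustration only; the statistic actually used in the paper is not given in the abstract. The function name `pair_chi_square` is hypothetical, and `k` is taken here to be the index offset between the two bases.

```python
from collections import Counter

def pair_chi_square(seq, k):
    """Chi-square of the 4x4 table of bases at positions i and i + k."""
    seq = seq.upper()
    pairs = Counter()
    for a, b in zip(seq, seq[k:]):
        if a in "ACGT" and b in "ACGT":
            pairs[(a, b)] += 1
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    first, second = Counter(), Counter()
    for (a, b), n in pairs.items():
        first[a] += n
        second[b] += n
    chi2 = 0.0
    for a in "ACGT":
        for b in "ACGT":
            expected = first[a] * second[b] / total   # independence model
            if expected > 0:
                chi2 += (pairs[(a, b)] - expected) ** 2 / expected
    return chi2

toy = "ACGT" * 25   # bases at offset 4 are always identical
```

Plotting `pair_chi_square(seq, k)` against `k` for a real genome would be one way to look for the 3-site periodicity the abstract describes; the toy input only shows that a perfectly dependent pair yields a large value (288 here).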
|
14
|
SNR of DNA sequences mapped by general affine transformations of the indicator sequences. J Math Biol 2012; 67:433-51. [DOI: 10.1007/s00285-012-0564-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/21/2011] [Revised: 07/02/2012] [Indexed: 10/28/2022]
|
15
|
Rivard SR, Mailloux JG, Beguenane R, Bui HT. Design of high-performance parallelized gene predictors in MATLAB. BMC Res Notes 2012; 5:183. [PMID: 22490084 PMCID: PMC3444342 DOI: 10.1186/1756-0500-5-183] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/07/2011] [Accepted: 04/10/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). FINDINGS: Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. CONCLUSIONS: The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
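Goertzel's algorithm evaluates a single DFT bin with a two-term recurrence, which is why it suits period-3 gene prediction, where only the bin at k = N/3 is needed. The sketch below is a plain sequential Python illustration, not the paper's parallel MATLAB design; the names `goertzel_power` and `period3_power` are hypothetical, and truncating the sequence to a multiple of three is an assumption made so that N/3 is an integer.

```python
import math

def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k of a real sequence, via Goertzel."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def period3_power(seq):
    """Sum over the four base indicator sequences of the power at bin N/3."""
    seq = seq.upper()
    n = len(seq) - len(seq) % 3        # truncate so that N/3 is an integer
    if n == 0:
        return 0.0
    seq = seq[:n]
    return sum(
        goertzel_power([1.0 if b == base else 0.0 for b in seq], n // 3)
        for base in "ACGT"
    )
```

In a gene predictor this quantity would typically be evaluated over a sliding window; the four indicator sequences and the windows are independent of one another, which is the kind of parallelism the abstract exploits.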
Affiliation(s)
- Sylvain Robert Rivard
  - Département des sciences appliquées, Université du Québec à Chicoutimi, 555 blvd de l'Université, Chicoutimi, QC G7H 2B1, Canada
|
16
|
Rizvi AZ, Venu Gopal T, Bhattacharya C. Schematic for efficient computation of GC, GC3, and AT3 bias spectra of genome. Bioinformation 2012; 8:163-6. [PMID: 22368390 PMCID: PMC3283890 DOI: 10.6026/97320630008163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2012] [Accepted: 01/07/2012] [Indexed: 11/23/2022] Open
Abstract
Selection of synonymous codons for an amino acid is biased in the protein translation process. This biased selection causes repetition of synonymous codons in structural parts of the genome, which accounts for high N/3 peaks in the DNA spectrum. The period-3 spectral property is utilized here to produce a 3-phase network model, based on polyphase filterbank concepts, for derivation of codon bias spectra (CBS). Modification of parameters in this model can produce GC, GC3, and AT3 bias spectra. A complete schematic on the LabVIEW platform is presented here for efficient and parallel computation of GC, GC3, and AT3 bias spectra of genomes, along with results of CBS patterns. We have performed correlation coefficient analysis of GC, GC3, and AT3 bias spectra with codon bias patterns of CBS to assess the biological and statistical significance of this model.
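The quantities named here have simple scalar counterparts that can be sketched directly. The block below is not the LabVIEW filterbank schematic and computes no spectra; it only splits a coding sequence into its three codon-position streams (a polyphase split by three) and reports GC, GC3 and AT3 fractions. The names `phase_streams`, `base_fraction` and `bias_summary` are hypothetical.

```python
def phase_streams(cds):
    """Split a coding sequence into its three codon-position streams."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    return [cds[p:usable:3] for p in range(3)]

def base_fraction(stream, bases):
    """Fraction of unambiguous bases in the stream that belong to `bases`."""
    valid = [b for b in stream if b in "ACGT"]
    if not valid:
        return 0.0
    return sum(1 for b in valid if b in bases) / len(valid)

def bias_summary(cds):
    """GC over all positions, plus GC and AT at third codon positions."""
    first, second, third = phase_streams(cds)
    return {
        "GC": base_fraction(first + second + third, "GC"),
        "GC3": base_fraction(third, "GC"),
        "AT3": base_fraction(third, "AT"),
    }
```

For example, `bias_summary("ATGGCA")` gives GC3 = 0.5 and AT3 = 0.5, since the third codon positions are G and A. Evaluating these fractions in a sliding window would give a position-dependent profile, which is closer in spirit to the bias spectra of the abstract.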
Affiliation(s)
- Ahsan Z Rizvi
  - Department of Electronics Engineering, Defence Institute of Advanced Technology, Girinagar, Pune, 411025, India
- T Venu Gopal
  - Department of Electronics Engineering, Defence Institute of Advanced Technology, Girinagar, Pune, 411025, India
- C Bhattacharya
  - Department of Electronics Engineering, Defence Institute of Advanced Technology, Girinagar, Pune, 411025, India
|
17
|
Sánchez J. 3-base periodicity in coding DNA is affected by intercodon dinucleotides. Bioinformation 2011; 6:327-9. [PMID: 21814388 PMCID: PMC3143393 DOI: 10.6026/97320630006327] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/07/2011] [Accepted: 07/12/2011] [Indexed: 01/29/2023] Open
Abstract
All coding DNAs exhibit 3-base periodicity (TBP), which may be defined as the tendency of nucleotides and higher order n-tuples, e.g. trinucleotides (triplets), to be preferentially spaced by 3, 6, 9, etc. bases, and we have proposed an association between TBP and clustering of same-phase triplets. Here we investigated whether TBP was affected by intercodon dinucleotide tendencies and whether clustering of same-phase triplets was involved. Under a constant protein sequence, intercodon dinucleotide frequencies depend on the distribution of synonymous codons. Possible effects were therefore revealed by randomly exchanging synonymous codons without altering protein sequences and then documenting changes in TBP via the frequency distribution of distances (FDD) of DNA triplets. A tripartite positive correlation was found between intercodon dinucleotide frequencies, clustering of same-phase triplets and TBP. Intercodon C|A (where "|" indicates the boundary between codons) was more frequent in native human DNA than in the codon-shuffled sequences; higher C|A frequency occurred along with more frequent clustering of C|AN triplets (where N jointly represents A, C, G and T) and with intense CAN TBP. The opposite was found for C|G, which was less frequent in native than in shuffled sequences; lower C|G frequency occurred together with reduced clustering of C|GN triplets and with less intense CGN TBP. We hence propose that intercodon dinucleotides affect TBP via same-phase triplet clustering. A possible biological relevance of our findings is briefly discussed.
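The codon-shuffling control described here can be sketched as follows. This is one possible reading, assumed for illustration: codons are permuted only among positions that encode the same amino acid, so the protein sequence and the overall codon usage are both preserved while intercodon dinucleotides change. The standard genetic code is used; `shuffle_synonymous` and `intercodon_dinucleotides` are hypothetical names, and the input is assumed to contain only A, C, G and T.

```python
import random
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

def split_codons(cds):
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def translate(cds):
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))

def shuffle_synonymous(cds, rng=None):
    """Permute codons among positions that encode the same amino acid."""
    if rng is None:
        rng = random.Random()
    codons = split_codons(cds)
    positions_by_amino = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions_by_amino[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for positions in positions_by_amino.values():
        pool = [codons[i] for i in positions]
        rng.shuffle(pool)
        for i, codon in zip(positions, pool):
            shuffled[i] = codon
    return "".join(shuffled)

def intercodon_dinucleotides(cds):
    """Count 'X|Y' pairs: last base of one codon, first base of the next."""
    codons = split_codons(cds)
    return Counter(a[2] + "|" + b[0] for a, b in zip(codons, codons[1:]))
```

Comparing `intercodon_dinucleotides(native)` with the counts averaged over many `shuffle_synonymous(native)` replicates would give the kind of native-versus-shuffled contrast reported for C|A and C|G.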
Affiliation(s)
- Joaquín Sánchez
  - Facultad de Medicina, Universidad Autónoma del Estado de Morelos, Cuernavaca, 62020 México
|