1
Das L, Nanda S, Das JK. An integrated approach for identification of exon locations using recursive Gauss Newton tuned adaptive Kaiser window. Genomics 2018; 111:284-296. [PMID: 30342085 DOI: 10.1016/j.ygeno.2018.10.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/01/2018] [Revised: 09/11/2018] [Accepted: 10/11/2018] [Indexed: 11/27/2022]
Abstract
Identification of exon locations in a DNA sequence has been considered the most demanding and challenging research topic in the field of bioinformatics. This work proposes a robust approach that combines trigonometric mapping with an adaptively tuned Kaiser window to locate the protein-coding regions (exons) in a genetic sequence. For better convergence and improved accuracy, the side-lobe height control parameter (β) of the Kaiser window is made adaptive so that it tracks the changing dynamics of the genetic sequence. The adaptive Kaiser algorithm uses recursive Gauss-Newton tuning, which in turn uses the covariance of the error signal to tune β; its improved tracking is demonstrated through numerous simulation results under a variety of practical test conditions. A detailed comparative analysis with existing mapping schemes, windowing techniques, and other signal processing methods such as SVD, AN, DFT, STDFT, WT, and ST is also included to highlight the strength and efficiency of the proposed approach. Moreover, some critical performance parameters are computed to investigate the effectiveness and robustness of the algorithm. The proposed approach has also been applied to a number of benchmark gene sets, including Mus musculus, Homo sapiens, and C. elegans, where it predicted exon locations more efficiently than the other existing mapping methods.
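The role of β as a side-lobe control parameter can be illustrated with a small numerical sketch. This is not the authors' adaptive algorithm (the recursive Gauss-Newton update is not specified in the abstract); it only measures how the peak side-lobe level of a fixed Kaiser window changes with β. The function name `peak_sidelobe_db`, the window length of 101, and the FFT size are assumptions chosen for illustration.

```python
import numpy as np

def peak_sidelobe_db(beta, length=101, nfft=8192):
    """Peak side-lobe level (dB relative to the main-lobe peak) of a Kaiser window.

    Hypothetical helper for illustration; `length` and `nfft` are arbitrary choices.
    """
    w = np.kaiser(length, beta)
    # Zero-padded spectrum, normalized so the main-lobe peak (bin 0) equals 1.
    mag = np.abs(np.fft.rfft(w, nfft))
    mag /= mag[0]
    # Walk down the main lobe until the magnitude stops decreasing (first null).
    k = 1
    while k < len(mag) - 1 and mag[k + 1] < mag[k]:
        k += 1
    # Everything from the first null onward is side-lobe territory.
    return 20.0 * np.log10(mag[k:].max())

for beta in (0.0, 2.0, 5.0, 8.0):
    print(f"beta={beta:4.1f}  peak side lobe = {peak_sidelobe_db(beta):7.2f} dB")
```

Larger β lowers the side lobes at the cost of a wider main lobe, which is the trade-off an adaptive scheme would be adjusting along the sequence.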
Affiliation(s)
- Lopamudra Das, School of Electronics Engineering, KIIT University, Bhubaneswar, India.
- Sarita Nanda, School of Electronics Engineering, KIIT University, Bhubaneswar, India.
- J K Das, School of Electronics Engineering, KIIT University, Bhubaneswar, India.
2
Hota MK, Srivastava VK. A multirate DSP structure for the identification of protein-coding regions. Int J Biomath 2017. [DOI: 10.1142/s1793524517501121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
The identification of protein-coding regions in DNA sequences using digital signal processing methods is one of the central issues in bioinformatics. In this paper, a multirate structure is proposed for the identification of protein-coding regions whose input sampling rate is the same as its output sampling rate. The multirate structure consists of a cascade of a decimation filter, a kernel filter, and an interpolation filter. The decimation filter is a complex filter, the kernel filter is an FIR lowpass filter, and the interpolation filter is a moving average filter. Polyphase decomposition is applied to both the decimation filter and the interpolation filter for a computationally efficient implementation. The potential of the proposed method is evaluated in comparison with existing methods using standard datasets. The results show that the proposed method improves the identification accuracy of protein-coding regions to a great extent compared to its counterparts.
Affiliation(s)
- Malaya Kumar Hota, School of Electronics Engineering, VIT University, Vellore 632014, Tamilnadu, India
- Vinay Kumar Srivastava, Department of Electronics and Communication Engineering, Motilal Nehru National Institute of Technology, Allahabad 211004, Uttar Pradesh, India
3
Marhon SA, Kremer SC. Prediction of Protein Coding Regions Using a Wide-Range Wavelet Window Method. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:742-753. [PMID: 26415183 DOI: 10.1109/tcbb.2015.2476789] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]
Abstract
Prediction of protein coding regions is an important topic in the field of genomic sequence analysis. Several spectrum-based techniques for the prediction of protein coding regions have been proposed. However, the outstanding issue in most of the proposed techniques is that these techniques depend on an experimentally-selected, predefined value of the window length. In this paper, we propose a new Wide-Range Wavelet Window (WRWW) method for the prediction of protein coding regions. The analysis of the proposed wavelet window shows that its frequency response can adapt its width to accommodate the change in the window length so that it can allow or prevent frequencies other than the basic frequency in the analysis of DNA sequences. This feature makes the proposed window capable of analyzing DNA sequences with a wide range of the window lengths without degradation in the performance. The experimental analysis of applying the WRWW method and other spectrum-based methods to five benchmark datasets has shown that the proposed method outperforms other methods along a wide range of the window lengths. In addition, the experimental analysis has shown that the proposed method is dominant in the prediction of both short and long exons.
4
Yin C, Yin XE, Wang J. A Novel Method for Comparative Analysis of DNA Sequences by Ramanujan-Fourier Transform. J Comput Biol 2014; 21:867-79. [DOI: 10.1089/cmb.2014.0120] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Affiliation(s)
- Changchuan Yin, College of Information Systems and Technology, University of Phoenix, Chicago, Illinois
- Jiasong Wang, Department of Mathematics, Nanjing University, Nanjing, Jiangsu, China
5
Roy M, Barman S. Effective gene prediction by high resolution frequency estimator based on least-norm solution technique. EURASIP Journal on Bioinformatics & Systems Biology 2014; 2014:2. [PMID: 24386895 PMCID: PMC3895782 DOI: 10.1186/1687-4153-2014-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/20/2013] [Accepted: 12/15/2013] [Indexed: 11/10/2022]
Abstract
The linear algebraic concept of a subspace plays a significant role in recent techniques of spectrum estimation. In this article, the authors use the noise subspace concept to find hidden periodicities in DNA sequences. With the vast growth of genomic sequences, the demand to accurately identify the protein-coding regions in DNA is rising. Several DNA feature extraction techniques that draw on various fields have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions, completely eliminating background noise. The proposed method is compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method.
Affiliation(s)
- Manidipa Roy, The Calcutta Technical School, Govt. of West Bengal, 110, S.N. Banerjee Road, Kolkata 700013, India
- Soma Barman, Institute of Radio Physics & Electronics, University of Calcutta, 92, A.P.C. Road, Kolkata 700 009, India
6
Shakya DK, Saxena R, Sharma SN. An adaptive window length strategy for eukaryotic CDS prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1241-1252. [PMID: 24384711 DOI: 10.1109/tcbb.2013.76] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 06/03/2023]
Abstract
Signal processing-based algorithms for identification of coding sequences (CDS) in eukaryotes are non-data-driven and exploit the presence of three-base periodicity in these regions for their detection. Three-base periodicity is commonly detected using the short-time Fourier transform (STFT), which uses a window function of fixed length. As the length of the protein-coding and noncoding regions varies widely, the identification accuracy of STFT-based algorithms is poor. In this paper, a novel signal processing-based algorithm is developed by enabling window length adaptation in the STFT of DNA sequences to improve the identification of three-base periodicity. The length of the window function is made adaptive in coding regions to maximize the magnitude of the period-3 measure, whereas in the noncoding regions, the window length is tailored to minimize this measure. Simulation results on benchmark data sets demonstrate the advantage of this algorithm when compared with other non-data-driven methods for CDS prediction.
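The fixed-window STFT baseline that this abstract argues against can be sketched as follows. This is a minimal illustration, not the paper's adaptive algorithm: it assumes binary indicator sequences for the four bases (a common mapping that the abstract does not name), a rectangular window by default, and a hypothetical function name `period3_profile`. The period-3 measure is the summed spectral power at frequency 1/3 cycles per base.

```python
import numpy as np

def period3_profile(seq, window_len=90, window=None):
    """Sliding-window period-3 power of a DNA string (fixed window length).

    Assumptions for illustration: binary indicator mapping, rectangular
    window unless `window` (an array of `window_len` weights) is supplied.
    """
    seq = seq.upper()
    if window is None:
        window = np.ones(window_len)
    # DFT kernel evaluated at 1/3 cycles per base, pre-multiplied by the window.
    n = np.arange(window_len)
    weights = window * np.exp(-2j * np.pi * n / 3)
    n_windows = len(seq) - window_len + 1
    profile = np.zeros(n_windows)
    for base in "ACGT":
        # Indicator sequence: 1 where this base occurs, 0 elsewhere.
        indicator = np.array([1.0 if c == base else 0.0 for c in seq])
        for start in range(n_windows):
            segment = indicator[start:start + window_len]
            profile[start] += abs(np.sum(segment * weights)) ** 2
    return profile

# Synthetic demo: a strictly period-3 stretch flanked by random bases.
rng = np.random.default_rng(0)
flank = "".join(rng.choice(list("ACGT"), size=300))
demo_seq = flank + "ATG" * 100 + flank
demo_profile = period3_profile(demo_seq, window_len=90)
print("window inside periodic stretch:", demo_profile[350])
print("largest value in left flank:   ", demo_profile[:200].max())
```

An adaptive strategy of the kind described above would vary `window_len` along the sequence instead of holding it fixed; the criterion for doing so is given only qualitatively in the abstract, so it is not reproduced here.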
Affiliation(s)
- Rajiv Saxena, Jaypee University of Engineering and Technology, Raghogarh, Guna
7
Jin J, An J. Robust discriminant analysis and its application to identify protein coding regions of rice genes. Math Biosci 2011; 232:96-100. [PMID: 21575644 DOI: 10.1016/j.mbs.2011.04.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/21/2010] [Revised: 04/18/2011] [Accepted: 04/25/2011] [Indexed: 10/18/2022]
Abstract
Identification of protein coding regions is fundamentally a statistical pattern recognition problem. Discriminant analysis is a statistical technique for classifying a set of observations into predefined classes, and it is useful for solving such problems. It is well known that outliers are present in virtually every data set in any application domain, and classical discriminant analysis methods (including linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)) do not work well if the data set has outliers. To overcome this difficulty, a robust statistical method is used in this paper. We choose four different coding characters as discriminant variables, and the method of robust discriminant analysis yields a satisfactory result.
Affiliation(s)
- Jiao Jin, Department of Statistics and Financial Mathematics, School of Mathematical Sciences, Beijing Normal University, Ministry of Education, Beijing, China
8
Marhon SA, Kremer SC. Gene Prediction Based on DNA Spectral Analysis: A Literature Review. J Comput Biol 2011; 18:639-76. [PMID: 21381961 DOI: 10.1089/cmb.2010.0184] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Indexed: 11/13/2022]
Affiliation(s)
- Sajid A. Marhon, School of Computer Science, University of Guelph, Guelph, Ontario, Canada
- Stefan C. Kremer, School of Computer Science, University of Guelph, Guelph, Ontario, Canada
9
Chen B, Ji P. Visualization of the protein-coding regions with a self adaptive spectral rotation approach. Nucleic Acids Res 2010; 39:e3. [PMID: 20947567 PMCID: PMC3017620 DOI: 10.1093/nar/gkq891] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
Identifying protein-coding regions in DNA sequences is an active issue in computational biology. In this study, we present a self adaptive spectral rotation (SASR) approach, which visualizes coding regions in DNA sequences based on the triplet periodicity property, without any preceding training process. It is proposed to help with rough prediction of coding regions when the extra information that other outstanding methods require for training is not available. In this approach, at each position in the DNA sequence, a Fourier spectrum is calculated from the posterior subsequence. From these spectra, a random walk in the complex plane is generated as the SASR's graphic output. Applications of the SASR to real DNA data show that patterns in the graphic output reveal the locations of the coding regions and the frame shifts between them: arcs indicate coding regions, stable points indicate non-coding regions, and the shapes of corners reveal frame shifts. Tests on a genomic data set from Saccharomyces cerevisiae reveal that the graphic patterns for coding and non-coding regions differ to a great extent, so that the coding regions can be visually distinguished. Meanwhile, a time cost test shows that the SASR can be easily implemented with a computational complexity of O(N).
Affiliation(s)
- Bo Chen, Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.
10
Marhon S, Kremer SC. Theoretical justification of computing the 3-base periodicity using nucleotide distribution variance. Biosystems 2010; 101:185-6. [PMID: 20633601 DOI: 10.1016/j.biosystems.2010.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/10/2010] [Accepted: 07/05/2010] [Indexed: 10/19/2022]
Abstract
In a previous paper (Yin and Yau, 2005), a novel method was proposed to measure the power spectrum of a DNA sequence at frequency N/3 in order to distinguish protein-coding and non-coding regions in DNA sequences. This was accomplished by computing the distribution of the four nucleotides in the three reading frames (codon positions) and identifying variance as an indicator of 3-base periodicity. That work included an empirical justification for the claim that there exists a linear, 3:2 correlation between the variance and the power spectrum. In this note, we provide a theoretical justification for that observation in the form of a mathematical proof of this correlation. This work thus provides a more rigorous justification for the use of the variance instead of the more computationally expensive power spectrum, allowing users of this technique to apply it with absolute confidence that no compromise in accuracy is incurred.
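The 3:2 relation can be checked numerically with a short sketch. Two assumptions are made here that the abstract does not state: the sequence is mapped to four binary indicator sequences, and the "variance" for each nucleotide is taken as the unnormalized sum of squared deviations of its three codon-position counts from their mean. Under those assumptions the power at frequency 1/3 equals exactly 1.5 times the summed deviations, because each cross term in the squared magnitude carries a factor cos(2π/3) = -1/2. The function names are hypothetical.

```python
import numpy as np

def power_at_one_third(seq):
    """Summed spectral power of the four indicator sequences at 1/3 cycles/base."""
    n = np.arange(len(seq))
    kernel = np.exp(-2j * np.pi * n / 3)
    total = 0.0
    for base in "ACGT":
        # Boolean mask selecting the positions where this base occurs.
        mask = np.array([c == base for c in seq], dtype=bool)
        total += abs(np.sum(kernel[mask])) ** 2
    return total

def frame_deviation_sum(seq):
    """Sum over bases of squared deviations of the three codon-position counts.

    Assumed (unnormalized) definition of the variance term.
    """
    total = 0.0
    for base in "ACGT":
        # Count occurrences of this base at positions 0, 1, 2 modulo 3.
        counts = np.array([seq[j::3].count(base) for j in range(3)], dtype=float)
        total += np.sum((counts - counts.mean()) ** 2)
    return total

rng = np.random.default_rng(1)
random_seq = "".join(rng.choice(list("ACGT"), size=301))
print(power_at_one_third(random_seq), 1.5 * frame_deviation_sum(random_seq))
```

The count-based quantity needs only one pass over the sequence and no complex arithmetic, which is the computational saving the note refers to.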
Affiliation(s)
- Sajid Marhon, School of Computer Science, University of Guelph, 50 Stone Road East, Guelph, Ontario N1G 2W1, Canada.