251
|
Potential non homologous protein targets of Mycobacterium tuberculosis H37Rv identified from protein–protein interaction network. J Theor Biol 2014; 361:152-8. [DOI: 10.1016/j.jtbi.2014.07.031] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 02/19/2014] [Revised: 07/26/2014] [Accepted: 07/28/2014] [Indexed: 01/09/2023]
|
252
|
Lin H, Deng EZ, Ding H, Chen W, Chou KC. iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Res 2014; 42:12961-72. [PMID: 25361964 PMCID: PMC4245931 DOI: 10.1093/nar/gku1019] [Citation(s) in RCA: 398] [Impact Index Per Article: 39.8] [Indexed: 11/14/2022]
Abstract
The σ54 promoters are unique in prokaryotic genomes and responsible for transcribing carbon- and nitrogen-related genes. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying the σ54 promoters. Here, a predictor called ‘iPro54-PseKNC’ was developed. In the predictor, the samples of DNA sequences were formulated by a novel feature vector called ‘pseudo k-tuple nucleotide composition’, which was further optimized by the incremental feature selection procedure. The performance of iPro54-PseKNC was examined by rigorous jackknife cross-validation tests on a stringent benchmark data set. As a user-friendly web-server, iPro54-PseKNC is freely accessible at http://lin.uestc.edu.cn/server/iPro54-PseKNC. For the convenience of the vast majority of experimental scientists, a step-by-step protocol guide was provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented in this paper just for its integrity. Meanwhile, we also discovered through an in-depth statistical analysis that the distribution of distances between the transcription start sites and the translation initiation sites was governed by the gamma distribution, which may provide a fundamental physical principle for studying the σ54 promoters.
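The abstract above builds its feature vector on k-tuple nucleotide composition. As a minimal sketch of only the conventional k-tuple part (the pseudo components derived from physicochemical properties are not reproduced here), the following computes normalized k-tuple frequencies for a DNA sequence. The function name `ktuple_composition` and the choice to skip windows containing non-ACGT symbols are assumptions made for illustration, not details taken from the paper.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return normalized frequencies of all 4**k k-tuples in a DNA sequence.

    Illustrative only: this is plain k-tuple composition, not the full
    pseudo k-tuple nucleotide composition described in the abstract.
    """
    seq = seq.upper()
    # Fixed lexicographic order over the alphabet ACGT: AA, AC, AG, AT, CA, ...
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with N or other symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


features = ktuple_composition("ACGTACGT", k=2)
```

With k = 2 the vector has 16 components that sum to 1; larger k captures longer-range sequence order at the cost of a dimension that grows as 4^k.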
Affiliation(s)
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Belmont, MA, USA
- En-Ze Deng
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China; Gordon Life Science Institute, Belmont, MA, USA
- Kuo-Chen Chou
- Gordon Life Science Institute, Belmont, MA, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
|
253
|
Xu R, Zhou J, Liu B, He Y, Zou Q, Wang X, Chou KC. Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach. J Biomol Struct Dyn 2014; 33:1720-30. [PMID: 25252709 DOI: 10.1080/07391102.2014.968624] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 10/24/2022]
Abstract
DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desired to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Owing to the fact that all biological species have developed beginning from a very limited number of ancestral species, it is important to take into account the evolutionary information in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating the evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. It was observed by comparing the new predictor with the existing methods via both jackknife test and independent data-set test that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach to extract evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.
Affiliation(s)
- Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen 518055, Guangdong, China
|
254
|
Transmission of intra-cellular genetic information: a system proposal. J Theor Biol 2014; 358:208-31. [PMID: 24928152 DOI: 10.1016/j.jtbi.2014.05.040] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/20/2013] [Revised: 05/05/2014] [Accepted: 05/27/2014] [Indexed: 11/21/2022]
Abstract
One of the great challenges of the scientific community on theories of genetic information, genetic communication and genetic coding is to determine a mathematical structure related to DNA sequences. In this paper we propose a model of an intra-cellular transmission system of genetic information similar to a model of a power and bandwidth efficient digital communication system in order to identify a mathematical structure in DNA sequences where such sequences are biologically relevant. The model of a transmission system of genetic information is concerned with the identification, reproduction and mathematical classification of the nucleotide sequence of single stranded DNA by the genetic encoder. Hence, a genetic encoder is devised where labelings and cyclic codes are established. The establishment of the algebraic structure of the corresponding codes alphabets, mappings, labelings, primitive polynomials (p(x)) and code generator polynomials (g(x)) are quite important in characterizing error-correcting codes subclasses of G-linear codes. These latter codes are useful for the identification, reproduction and mathematical classification of DNA sequences. The characterization of this model may contribute to the development of a methodology that can be applied in mutational analysis and polymorphisms, production of new drugs and genetic improvement, among other things, resulting in the reduction of time and laboratory costs.
|
255
|
An effective haplotype assembly algorithm based on hypergraph partitioning. J Theor Biol 2014; 358:85-92. [DOI: 10.1016/j.jtbi.2014.05.034] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/13/2014] [Revised: 05/08/2014] [Accepted: 05/25/2014] [Indexed: 11/20/2022]
|
256
|
Prediction of CpG island methylation status by integrating DNA physicochemical properties. Genomics 2014; 104:229-33. [DOI: 10.1016/j.ygeno.2014.08.011] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 05/27/2014] [Revised: 08/04/2014] [Accepted: 08/19/2014] [Indexed: 12/22/2022]
|
257
|
Hayat M, Iqbal N. Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. Computer Methods and Programs in Biomedicine 2014; 116:184-192. [PMID: 24997484 DOI: 10.1016/j.cmpb.2014.06.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 02/13/2014] [Revised: 06/09/2014] [Accepted: 06/13/2014] [Indexed: 06/03/2023]
Abstract
Proteins control all biological functions in living species. Protein structures fall into four major classes: all-α, all-β, α+β, and α/β. Each class performs a different function according to its nature. Owing to the large number of protein sequences accumulating in the databanks, identifying protein structure classes through conventional methods is difficult with respect to cost and time. Given the importance of protein structure classes, it is thus highly desirable to develop a computational model for discriminating protein structure classes with high accuracy. For this purpose, we propose an in silico method incorporating Pseudo Average Chemical Shift and Support Vector Machine. Two feature extraction schemes, namely Pseudo Amino Acid Composition and Pseudo Average Chemical Shift, are used to extract valuable information from protein sequences. The performance of the proposed model is assessed on four benchmark datasets, 25PDB, 1189, 640 and 399, using the jackknife test. The success rates of the proposed model are 84.2%, 85.0%, 86.4%, and 89.2%, respectively, on the four datasets. The empirical results suggest that the proposed model performs favorably compared with existing models reported so far and might be useful for future research.
Affiliation(s)
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Nadeem Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
|
258
|
Chen W, Zhang X, Brooker J, Lin H, Zhang L, Chou KC. PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics 2014; 31:119-20. [PMID: 25231908 DOI: 10.1093/bioinformatics/btu602] [Citation(s) in RCA: 181] [Impact Index Per Article: 18.1] [Indexed: 11/13/2022]
Abstract
Summary: The avalanche of genomic sequences generated in the post-genomic age requires efficient computational methods for rapidly and accurately identifying biological features from sequence information. Towards this goal, we developed a freely available and open-source package, called PseKNC-General (the general form of pseudo k-tuple nucleotide composition), that allows for fast and accurate computation of all the widely used nucleotide structural and physicochemical properties of both DNA and RNA sequences. PseKNC-General can generate several modes of pseudo nucleotide compositions, including conventional k-tuple nucleotide compositions, Moreau-Broto autocorrelation coefficient, Moran autocorrelation coefficient, Geary autocorrelation coefficient, Type I PseKNC and Type II PseKNC. In every mode, >100 physicochemical properties are available to choose from. Moreover, it is flexible enough to allow users to calculate PseKNC with user-defined properties. The package can be run on Linux, Mac and Windows systems and also provides a graphical user interface. Availability and implementation: The package is freely available at: http://lin.uestc.edu.cn/server/pseknc.
Affiliation(s)
- Wei Chen, Xitong Zhang, Jordan Brooker, Hao Lin, Liqing Zhang, Kuo-Chen Chou
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063009, China, Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, School of Life Science and Technology, Bioinformatics and Computer-Aided Drug Discovery, Gordon Life Science Institute, Boston, MA 02478, Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904, Department of Computer Science, Vassar College, Poughkeepsie, NY 12604, USA, Excellence in Genomic Medicine Research, Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China and Excellence in Genomic Medicine Research, Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
259
|
Zhang Q, Li H, Zhao X, Zheng Y, Zhou D. Distribution bias of the sequence matching between exons and introns in exon joint and EJC binding region in C. elegans. J Theor Biol 2014; 364:295-304. [PMID: 25234235 DOI: 10.1016/j.jtbi.2014.09.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/01/2014] [Revised: 08/30/2014] [Accepted: 09/04/2014] [Indexed: 11/17/2022]
Abstract
We propose that there are matching relations between mRNA sequences and their corresponding post-spliced introns, and that introns play a significant role in the process of gene expression. To reveal the sequence matching features, the Smith-Waterman local alignment method is applied to C. elegans mRNA sequences to obtain optimal matched segments between exon-exon sequences and their corresponding introns. The distribution of matching frequency on exon-exon sequences and the sequence characteristics of the optimal matched segments are studied. Results show that the distributions of matching frequency in the exon-exon junction region differ markedly, and the exon boundary is revealed. The distributions of the length and matching rate of the optimal matched segments are consistent with the sequence features of siRNA and miRNA. The optimal matched segments have distinctive sequence characteristics compared with their host sequences. For first introns and long introns, the matching frequency of optimal matched segments with high GC content, abundant CG dinucleotides and high λCG values is at its minimum in the exon junction complex (EJC) binding region. High λCG values in optimal matched segments are the main feature distinguishing the EJC binding region. The results indicate that the EJC and introns have competitive and cooperative relations in binding to protein coding sequences, and that intron sequences and protein coding sequences show concerted evolution.
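The abstract above relies on Smith-Waterman local alignment to find optimal matched segments. A minimal sketch of that algorithm with a linear gap penalty follows; the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function name are assumptions chosen for illustration, since the abstract does not state the parameters the authors used.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring parameters are illustrative assumptions, not the paper's settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, seg_a, seg_b = smith_waterman("TTTACGTTT", "GGACGGG")
```

Resetting negative scores to zero is what makes the alignment local: the traceback stops where the running score would otherwise go negative, so only the best-matching segment is reported.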
Affiliation(s)
- Qiang Zhang
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Hong Li
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Xiaoqing Zhao
- Biotechnology Research Centre, Inner Mongolia Academy of Agricultural and Animal Husbandry Science, Hohhot, 010021, China
- Yan Zheng
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Deliang Zhou
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
|
260
|
Alharbi BA, Alshammari TH, Felton NL, Zhurkin VB, Cui F. nuMap: a web platform for accurate prediction of nucleosome positioning. Genomics Proteomics & Bioinformatics 2014; 12:249-53. [PMID: 25220945 PMCID: PMC4411418 DOI: 10.1016/j.gpb.2014.08.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/05/2014] [Revised: 08/03/2014] [Accepted: 08/05/2014] [Indexed: 11/27/2022]
Abstract
Nucleosome positioning is critical for gene expression and of major biological interest. The high cost of experimentally mapping nucleosomal arrangement signifies the need for computational approaches to predict nucleosome positions at high resolution. Here, we present a web-based application to fulfill this need by implementing two models, YR and W/S schemes, for the translational and rotational positioning of nucleosomes, respectively. Our methods are based on sequence-dependent anisotropic bending that dictates how DNA is wrapped around a histone octamer. This application allows users to specify a number of options such as schemes and parameters for threading calculation and provides multiple layout formats. The nuMap is implemented in Java/Perl/MySQL and is freely available for public use at http://numap.rit.edu. The user manual, implementation notes, description of the methodology and examples are available at the site.
Affiliation(s)
- Bader A Alharbi
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, NY 14623, USA
- Thamir H Alshammari
- B. Thomas Golisano College of Computing & Information Sciences, Rochester Institute of Technology, Rochester, NY 14623, USA
- Nathan L Felton
- Information Technology Services, Rochester Institute of Technology, Rochester, NY 14623, USA
- Victor B Zhurkin
- Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Feng Cui
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, NY 14623, USA
|
261
|
Liu B, Xu J, Lan X, Xu R, Zhou J, Wang X, Chou KC. iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition. PLoS One 2014; 9:e106691. [PMID: 25184541 PMCID: PMC4153653 DOI: 10.1371/journal.pone.0106691] [Citation(s) in RCA: 208] [Impact Index Per Article: 20.8] [Received: 06/20/2014] [Accepted: 07/31/2014] [Indexed: 11/18/2022]
Abstract
Playing crucial roles in various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression, DNA-binding proteins are essential ingredients for both eukaryotic and prokaryotic proteomes. With the avalanche of protein sequences generated in the postgenomic age, it is a critical challenge to develop automated methods for accurately and rapidly identifying DNA-binding proteins based on their sequence information alone. Here, a novel predictor, called "iDNA-Prot|dis", was established by incorporating the amino acid distance-pair coupling information and the amino acid reduced alphabet profile into the general pseudo amino acid composition (PseAAC) vector. The former can capture the characteristics of DNA-binding proteins so as to enhance its prediction quality, while the latter can reduce the dimension of PseAAC vector so as to speed up its prediction process. It was observed by the rigorous jackknife and independent dataset tests that the new predictor outperformed the existing predictors for the same purpose. As a user-friendly web-server, iDNA-Prot|dis is accessible to the public at http://bioinformatics.hitsz.edu.cn/iDNA-Prot_dis/. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step protocol guide is provided on how to use the web-server to get their desired results without the need to follow the complicated mathematical equations that are presented in this paper just for the integrity of its developing process. It is anticipated that the iDNA-Prot|dis predictor may become a useful high throughput tool for large-scale analysis of DNA-binding proteins, or at the very least, play a complementary role to the existing predictors in this regard.
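The two ingredients named in the abstract above, distance-pair counts and a reduced amino acid alphabet, can be illustrated with a small sketch. The five-group alphabet in `GROUPS`, the function name, and the normalization are all hypothetical choices made for this example; they are not the groupings or the exact formulation used by iDNA-Prot|dis.

```python
from collections import Counter

# Hypothetical 5-group reduced alphabet, for illustration only
# (not the grouping used in the paper).
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar uncharged
    "c": "KRDE",    # charged
    "s": "GP",      # special conformation
}
REDUCE = {aa: g for g, members in GROUPS.items() for aa in members}


def distance_pair_features(protein, max_dist=2):
    """Composition plus distance-pair frequencies over the reduced alphabet."""
    # Map each residue to its group letter; unknown symbols are dropped.
    reduced = [REDUCE[aa] for aa in protein.upper() if aa in REDUCE]
    letters = sorted(GROUPS)
    n = len(reduced)

    # Single-letter composition.
    single = Counter(reduced)
    features = [single[g] / n if n else 0.0 for g in letters]

    # Frequencies of ordered letter pairs separated by distance d = 1..max_dist.
    for d in range(1, max_dist + 1):
        pairs = Counter(zip(reduced, reduced[d:]))
        total = max(n - d, 0)
        for x in letters:
            for y in letters:
                features.append(pairs[(x, y)] / total if total else 0.0)
    return features


vec = distance_pair_features("AKAK", max_dist=1)
```

In this sketch the vector length is 5 + 25·max_dist, whereas the same construction over the full 20-letter alphabet would give 20 + 400·max_dist, which shows how a reduced alphabet shrinks the feature dimension.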
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
- Gordon Life Science Institute, Belmont, Massachusetts, United States of America
- * E-mail: (BL); (KCC)
- Jinghao Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Xun Lan
- Stanford University, Stanford, California, United States of America
- Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Jiyun Zhou
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Belmont, Massachusetts, United States of America
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- * E-mail: (BL); (KCC)
|
262
|
Lin TH, Tsai TL. Constructing a linear QSAR for some metabolizable drugs by human or pig flavin-containing monooxygenases using some molecular features selected by a genetic algorithm trained SVM. J Theor Biol 2014; 356:85-97. [DOI: 10.1016/j.jtbi.2014.04.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/20/2013] [Revised: 04/01/2014] [Accepted: 04/16/2014] [Indexed: 10/25/2022]
|
263
|
Li L, Yu S, Xiao W, Li Y, Li M, Huang L, Zheng X, Zhou S, Yang H. Prediction of bacterial protein subcellular localization by incorporating various features into Chou's PseAAC and a backward feature selection approach. Biochimie 2014; 104:100-7. [PMID: 24929100 DOI: 10.1016/j.biochi.2014.06.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 05/01/2014] [Accepted: 06/01/2014] [Indexed: 02/08/2023]
|
264
|
Chou's pseudo amino acid composition improves sequence-based antifreeze protein prediction. J Theor Biol 2014; 356:30-5. [DOI: 10.1016/j.jtbi.2014.04.006] [Citation(s) in RCA: 116] [Impact Index Per Article: 11.6] [Received: 02/25/2014] [Revised: 03/28/2014] [Accepted: 04/02/2014] [Indexed: 11/22/2022]
|
265
|
Prediction of DNase I hypersensitive sites by using pseudo nucleotide compositions. ScientificWorldJournal 2014; 2014:740506. [PMID: 25215331 PMCID: PMC4152949 DOI: 10.1155/2014/740506] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 07/11/2014] [Accepted: 08/03/2014] [Indexed: 11/19/2022]
Abstract
DNase I hypersensitive sites (DHS) are associated with a wide variety of regulatory DNA elements. Knowledge about the locations of DHS is helpful for deciphering the function of noncoding genomic regions. With the rapid accumulation of genome sequences in the postgenomic age, it is highly desirable to develop cost-effective computational methods to identify DHS. In the present work, a support vector machine based model was proposed to identify DHS by using the pseudo dinucleotide composition. In the jackknife test, the proposed model obtained an accuracy of 83%, which is competitive with that of the existing method. This result suggests that the proposed model may become a useful tool for DHS identification.
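The jackknife test cited in the abstract above is leave-one-out cross-validation: each sample is held out in turn, the model is trained on the rest, and the held-out sample is predicted. The sketch below shows that loop in plain Python. To keep it self-contained, a 1-nearest-neighbour rule stands in for the support vector machine used in the paper, and the toy feature vectors are invented for illustration.

```python
def jackknife_accuracy(samples, labels, classify):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbor(train_x, train_y, query):
    """1-NN classifier, swapped in here as a stand-in for the paper's SVM."""
    def sqdist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    best = min(range(len(train_x)), key=lambda k: sqdist(train_x[k], query))
    return train_y[best]


# Toy two-class data (hypothetical feature vectors, not real DHS features).
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]
acc = jackknife_accuracy(X, y, nearest_neighbor)
```

Because every sample is tested exactly once and the split is deterministic, the jackknife yields a single, reproducible accuracy for a given dataset, which is one reason it is often preferred over random subsampling for benchmark comparisons.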
|
266
|
Xu Y, Wen X, Wen LS, Wu LY, Deng NY, Chou KC. iNitro-Tyr: prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition. PLoS One 2014; 9:e105018. [PMID: 25121969 PMCID: PMC4133382 DOI: 10.1371/journal.pone.0105018] [Citation(s) in RCA: 167] [Impact Index Per Article: 16.7] [Received: 04/16/2014] [Accepted: 07/16/2014] [Indexed: 12/31/2022] Open
Abstract
Nitrotyrosine is one of the post-translational modifications (PTMs) in proteins that occurs when their tyrosine residue is nitrated. Compared with healthy people, a remarkably increased level of nitrotyrosine is detected in those suffering from rheumatoid arthritis, septic shock, and coeliac disease. Given an uncharacterized protein sequence that contains many tyrosine residues, which of them can be nitrated and which cannot? This is a challenging problem, directly related not only to an in-depth understanding of the PTM's mechanism but also to nitrotyrosine-based drug development. Particularly, with the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop a high throughput tool in this regard. Here, a new predictor called "iNitro-Tyr" was developed by incorporating the position-specific dipeptide propensity into the general pseudo amino acid composition for discriminating the nitrotyrosine sites from non-nitrotyrosine sites in proteins. It was demonstrated via rigorous jackknife tests that the new predictor not only can yield a higher success rate but also is much more stable and less noisy. A web-server for iNitro-Tyr is accessible to the public at http://app.aporc.org/iNitro-Tyr/. For the convenience of most experimental scientists, we have further provided a step-by-step protocol guide, by which users can easily get their desired results without the need to follow the complicated mathematics that were presented in this paper just for the integrity of its development process. It has not escaped our notice that the approach presented here can also be used to deal with the other PTM sites in proteins.
Affiliation(s)
- Yan Xu
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, China
| | - Xin Wen
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, China
| | - Li-Shu Wen
- College of Sciences, Liaoning Shiyou University, FuShun, China
| | - Ling-Yun Wu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
| | - Nai-Yang Deng
- College of Science, China Agricultural University, Beijing, China
| | - Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Gordon Life Science Institute, Boston, Massachusetts, United States of America
|
267
|
Xing YQ, Liu GQ, Zhao XJ, Zhao HY, Cai L. Genome-wide characterization and prediction of Arabidopsis thaliana replication origins. Biosystems 2014; 124:1-6. [PMID: 25050475 DOI: 10.1016/j.biosystems.2014.07.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/11/2013] [Revised: 03/25/2014] [Accepted: 07/15/2014] [Indexed: 01/25/2023]
Abstract
Identification of replication origins is crucial for the faithful duplication of genomic DNA. The frequencies of single nucleotides and dinucleotides, GC/AT bias and GC/AT profile in the vicinity of Arabidopsis thaliana replication origins were analyzed in the present work. The guanine content or cytosine content is higher in origin of replication (Ori) than in non-Ori. The SS (S=G or C) dinucleotides are favoured in Ori whereas WW (W=A or T) dinucleotides are favoured in non-Ori. GC/AT bias and GC/AT profile in Ori are significantly different from those in non-Ori. Furthermore, by inputting DNA sequence features into a support vector machine, we distinguished between the Ori and non-Ori regions in A. thaliana. The total prediction accuracy is about 69.5% as evaluated by 10-fold cross-validation. This result suggested that, apart from DNA sequence, deciphering the selection of replication origins must integrate many other factors including nucleosome positioning, DNA methylation, histone modification, etc. In addition, by comparing predictive performance we found that the predictive accuracy of SVM using sequence features in the context of the WS language is significantly better than that of the RY language. Furthermore, the same conclusion was also obtained in S. cerevisiae and D. melanogaster.
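The SS and WW dinucleotide statistics mentioned in this abstract can be illustrated with a minimal sketch. The function name `ss_ww_fractions` and the choice to skip dinucleotides containing ambiguous bases are assumptions made for illustration; this is not the authors' code, and it does not reproduce their SVM model.

```python
def ss_ww_fractions(dna):
    """Return (SS fraction, WW fraction) over overlapping dinucleotides.

    S = G or C (strong), W = A or T (weak). Dinucleotides containing
    any character other than A, C, G, T are skipped (an assumption).
    """
    seq = dna.upper()
    strong, weak = set("GC"), set("AT")
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if all(base in "ACGT" for base in p)]
    if not valid:
        return 0.0, 0.0
    ss = sum(1 for p in valid if p[0] in strong and p[1] in strong)
    ww = sum(1 for p in valid if p[0] in weak and p[1] in weak)
    return ss / len(valid), ww / len(valid)


if __name__ == "__main__":
    print(ss_ww_fractions("GGCCAATT"))
```

Under the abstract's finding, a higher SS fraction would be expected in Ori regions and a higher WW fraction in non-Ori regions.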
Affiliation(s)
- Yong-Qiang Xing
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
| | - Guo-Qing Liu
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
| | - Xiu-Juan Zhao
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
| | - Hong-Yu Zhao
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; Inner Mongolia Key Laboratory of Biomass-Energy Conversion, Baotou, 014010, China
| | - Lu Cai
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; Inner Mongolia Key Laboratory of Biomass-Energy Conversion, Baotou, 014010, China.
|
268
|
Protein binding site prediction by combining hidden Markov support vector machine and profile-based propensities. ScientificWorldJournal 2014; 2014:464093. [PMID: 25133234 PMCID: PMC4122092 DOI: 10.1155/2014/464093] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 06/04/2014] [Accepted: 07/01/2014] [Indexed: 11/22/2022] Open
Abstract
Identification of protein binding sites is critical for studying the function of the proteins. In this paper, we proposed a method for protein binding site prediction, which combined the order profile propensities and hidden Markov support vector machine (HM-SVM). This method applied the sequential labeling technique to the field of protein binding site prediction. The input features of HM-SVM include the profile-based propensities, the Position-Specific Score Matrix (PSSM), and Accessible Surface Area (ASA). When tested on different data sets, the proposed method showed promising results, and outperformed some closely related methods by more than 10% in terms of AUC.
|
269
|
Chen W, Feng PM, Deng EZ, Lin H, Chou KC. iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. Anal Biochem 2014; 462:76-83. [PMID: 25016190 DOI: 10.1016/j.ab.2014.06.022] [Citation(s) in RCA: 218] [Impact Index Per Article: 21.8] [Received: 05/20/2014] [Revised: 06/26/2014] [Accepted: 06/27/2014] [Indexed: 01/25/2023]
Abstract
Translation is a key process for gene expression. Timely identification of the translation initiation site (TIS) is very important for conducting in-depth genome analysis. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying TIS. Although some computational methods were proposed in this regard, none of them considered the global or long-range sequence-order effects of DNA, and hence their prediction quality was limited. To account for such effects, a new predictor, called "iTIS-PseTNC," was developed by incorporating the physicochemical properties into the pseudo trinucleotide composition, quite similar to the PseAAC (pseudo amino acid composition) approach widely used in computational proteomics. It was observed by the rigorous cross-validation test on the benchmark dataset that the overall success rate achieved by the new predictor in identifying TIS locations was over 97%. As a web server, iTIS-PseTNC is freely accessible at http://lin.uestc.edu.cn/server/iTIS-PseTNC. To maximize the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to obtain the desired results without the need to go through detailed mathematical equations, which are presented in this paper just for the integrity of the new prediction method.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China; Gordon Life Science Institute, Boston, MA 02478, USA.
| | - Peng-Mian Feng
- School of Public Health, Hebei United University, Tangshan 063000, China.
| | - En-Ze Deng
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Hao Lin
- Gordon Life Science Institute, Boston, MA 02478, USA; Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Kuo-Chen Chou
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China; Gordon Life Science Institute, Boston, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia.
|
270
|
A set of descriptors for identifying the protein-drug interaction in cellular networking. J Theor Biol 2014; 359:120-8. [PMID: 24949993 DOI: 10.1016/j.jtbi.2014.06.008] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 02/13/2014] [Revised: 06/02/2014] [Accepted: 06/06/2014] [Indexed: 12/24/2022]
Abstract
The study of protein-drug interactions is a significant issue for drug development. Unfortunately, it is both expensive and time-consuming to perform physical experiments to determine whether a drug and a protein interact with each other. Some previous attempts to design an automated system to perform this task were based on knowledge of the 3D structure of a protein, which is not always available in practice. With the availability of protein sequences generated in the post-genomic age, however, a sequence-based solution to this problem is necessary. Following other works in this area, we propose a new machine learning system based on several protein descriptors extracted from several protein representations, such as variants of the position specific scoring matrix (PSSM) of proteins, the amino-acid sequence, and a matrix representation of a protein. The prediction engine is operated by an ensemble of support vector machines (SVMs), with each SVM trained on a specific descriptor and the results of each SVM combined by the sum rule. The overall success rate achieved by our final ensemble is notably higher than previous results obtained on the same datasets using the same testing protocols reported in the literature. MATLAB code and the datasets used in our experiments are freely available for future comparison at http://www.dei.unipd.it/node/2357.
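The sum-rule fusion named in this abstract can be sketched as follows. This is a generic illustration of the sum rule, not the authors' MATLAB code: the function name `sum_rule` is hypothetical, and it assumes each classifier has already produced one score per class on a comparable scale.

```python
def sum_rule(score_lists):
    """Fuse per-classifier class scores by the sum rule.

    score_lists: one list of per-class scores for each classifier,
    all of the same length (assumed comparable in scale).
    Returns (combined scores, index of the winning class).
    """
    if not score_lists:
        raise ValueError("at least one classifier is required")
    n_classes = len(score_lists[0])
    if any(len(scores) != n_classes for scores in score_lists):
        raise ValueError("all classifiers must score the same classes")
    # Add up the scores each classifier assigns to each class.
    combined = [sum(scores[c] for scores in score_lists)
                for c in range(n_classes)]
    # The predicted class is the one with the largest summed score.
    best = max(range(n_classes), key=lambda c: combined[c])
    return combined, best


if __name__ == "__main__":
    # Three hypothetical SVMs scoring (non-interacting, interacting).
    print(sum_rule([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]))
```

In practice the individual SVM outputs would usually need normalizing to a common range before summation, since raw decision values from different descriptors are not directly comparable.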
|
271
|
iCTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels. Biomed Res Int 2014; 2014:286419. [PMID: 24991545 PMCID: PMC4058692 DOI: 10.1155/2014/286419] [Citation(s) in RCA: 137] [Impact Index Per Article: 13.7] [Received: 03/13/2014] [Revised: 04/22/2014] [Accepted: 05/07/2014] [Indexed: 11/30/2022]
Abstract
Conotoxins are small disulfide-rich neurotoxic peptides, which can bind to ion channels with very high specificity and modulate their activities. Over the last few decades, conotoxins have been drug candidates for treating chronic pain, epilepsy, spasticity, and cardiovascular diseases. According to their functions and targets, conotoxins are generally categorized into three types: potassium-channel type, sodium-channel type, and calcium-channel type. With the avalanche of peptide sequences generated in the postgenomic age, it is urgent and challenging to develop an automated method for rapidly and accurately identifying the types of conotoxins based on their sequence information alone. To address this challenge, a new predictor, called iCTX-Type, was developed by incorporating the dipeptide occurrence frequencies of a conotoxin sequence into a 400-D (dimensional) general pseudo amino acid composition, followed by a feature optimization procedure to reduce the sample representation from a 400-D to a 50-D vector. The overall success rate achieved by iCTX-Type via a rigorous cross-validation was over 91%, outperforming its counterpart (RBF network). Besides, iCTX-Type is so far the only predictor in this area with its web-server available, and hence is particularly useful for most experimental scientists to get their desired results without the need to follow the complicated mathematics involved.
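The 400-D dipeptide representation described in this abstract can be sketched as a vector of occurrence frequencies over the 20 x 20 ordered amino acid pairs. The function name `dipeptide_composition` and the handling of non-standard residues (pairs containing them are skipped) are assumptions for illustration; the feature optimization step that reduces 400-D to 50-D is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered pairs, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide occurrence frequencies.

    Overlapping dipeptides are counted; pairs that contain a
    non-standard residue are skipped (an assumption of this sketch).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_composition("ACAC")
    print(len(vec), vec[DIPEPTIDES.index("AC")])
```

Each sequence, whatever its length, maps to a fixed-length vector, which is what allows a standard classifier to be trained on peptides of varying size.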
|
272
|
Human proteins characterization with subcellular localizations. J Theor Biol 2014; 358:61-73. [PMID: 24862400 DOI: 10.1016/j.jtbi.2014.05.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/09/2014] [Revised: 05/04/2014] [Accepted: 05/05/2014] [Indexed: 11/20/2022]
Abstract
Proteins are responsible for performing the vast majority of cellular functions which are critical to a cell's survival. The knowledge of the subcellular localization of proteins can provide valuable information about their molecular functions. Therefore, one of the fundamental goals in cell biology and proteomics is to analyze the subcellular localizations and functions of these proteins. Recent large-scale human genomics and proteomics studies have made it possible to characterize human proteins at a subcellular localization level. In this study, according to the annotation in Swiss-Prot, 8842 human proteins were classified into seven subcellular localizations. Human proteins in the seven subcellular localizations were compared by using topological properties, biological properties, codon usage indices, mRNA expression levels, protein complexity and physicochemical properties. All these properties were found to be significantly different in the seven categories. In addition, based on these properties and pseudo-amino acid compositions, a machine learning classifier was built for the prediction of protein subcellular localization. The study presented here was an attempt to address the aforementioned properties for comparing human proteins of different subcellular localizations. We hope our findings presented in this study may provide important help for the prediction of protein subcellular localization and for understanding the general function of human proteins in cells.
|
273
|
iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition. Biomed Res Int 2014; 2014:623149. [PMID: 24967386 PMCID: PMC4055483 DOI: 10.1155/2014/623149] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 02/19/2014] [Revised: 04/22/2014] [Accepted: 04/23/2014] [Indexed: 11/17/2022]
Abstract
In eukaryotic genes, exons are generally interrupted by introns. Accurately removing introns and joining exons together are essential processes in eukaryotic gene expression. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapid and effective detection of splice sites that play important roles in gene structure annotation and even in RNA splicing. Although a series of computational methods were proposed for splice site identification, most of them neglected the intrinsic local structural properties. In the present study, a predictor called “iSS-PseDNC” was developed for identifying splice sites. In the new predictor, the sequences were formulated by a novel feature-vector called “pseudo dinucleotide composition” (PseDNC) into which six DNA local structural properties were incorporated. It was observed by the rigorous cross-validation tests on two benchmark datasets that the overall success rates achieved by iSS-PseDNC in identifying splice donor site and splice acceptor site were 85.45% and 87.73%, respectively. It is anticipated that iSS-PseDNC may become a useful tool for identifying splice sites and that the six DNA local structural properties described in this paper may provide novel insights for in-depth investigations into the mechanism of RNA splicing.
|
274
|
Xu Y, Wen X, Shao XJ, Deng NY, Chou KC. iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition. Int J Mol Sci 2014; 15:7594-610. [PMID: 24857907 PMCID: PMC4057693 DOI: 10.3390/ijms15057594] [Citation(s) in RCA: 177] [Impact Index Per Article: 17.7] [Received: 02/07/2014] [Revised: 04/04/2014] [Accepted: 04/17/2014] [Indexed: 11/16/2022] Open
Abstract
Post-translational modifications (PTMs) play crucial roles in various cell functions and biological processes. Protein hydroxylation is one type of PTM that usually occurs at the sites of proline and lysine. Given an uncharacterized protein sequence, which site of its Pro (or Lys) can be hydroxylated and which site cannot? This is a challenging problem, not only for in-depth understanding of the hydroxylation mechanism, but also for drug development, because protein hydroxylation is closely relevant to major diseases, such as stomach and lung cancers. With the avalanche of protein sequences generated in the post-genomic age, it is highly desired to develop computational methods to address this problem. In view of this, a new predictor called “iHyd-PseAAC” (identify hydroxylation by pseudo amino acid composition) was proposed by incorporating the dipeptide position-specific propensity into the general form of pseudo amino acid composition. It was demonstrated by rigorous cross-validation tests on stringent benchmark datasets that the new predictor is quite promising and may become a useful high throughput tool in this area. A user-friendly web-server for iHyd-PseAAC is accessible at http://app.aporc.org/iHyd-PseAAC/. Furthermore, for the convenience of the majority of experimental scientists, a step-by-step guide on how to use the web-server is given. Users can easily obtain their desired results by following these steps without the need of understanding the complicated mathematical equations presented in this paper just for its integrity.
Affiliation(s)
- Yan Xu
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China.
| | - Xin Wen
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China.
| | - Xiao-Jian Shao
- Department of Mathematics and Information Science, Binzhou University, Binzhou 256603, China.
| | - Nai-Yang Deng
- College of Science, China Agricultural University, Beijing 100083, China.
| | - Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia.
|
275
|
Chen W, Lei TY, Jin DC, Lin H, Chou KC. PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition. Anal Biochem 2014; 456:53-60. [PMID: 24732113 DOI: 10.1016/j.ab.2014.04.001] [Citation(s) in RCA: 304] [Impact Index Per Article: 30.4] [Received: 12/29/2013] [Revised: 03/20/2014] [Accepted: 04/01/2014] [Indexed: 10/25/2022]
Abstract
The pseudo oligonucleotide composition, or pseudo K-tuple nucleotide composition (PseKNC), can be used to represent a DNA or RNA sequence with a discrete model or vector yet still keep considerable sequence order information, particularly the global or long-range sequence order information, via the physicochemical properties of its constituent oligonucleotides. Therefore, the PseKNC approach may hold very high potential for enhancing the power in dealing with many problems in computational genomics and genome sequence analysis. However, dealing with different DNA or RNA problems may need different kinds of PseKNC. Here, we present a flexible and user-friendly web server for PseKNC (at http://lin.uestc.edu.cn/pseknc/default.aspx) by which users can easily generate many different modes of PseKNC according to their need by selecting various parameters and physicochemical properties. Furthermore, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the current web server to generate their desired PseKNC without the need to follow the complicated mathematical equations, which are presented in this article just for the integrity of PseKNC formulation and its development. It is anticipated that the PseKNC web server will become a very useful tool in computational genomics and genome sequence analysis.
Affiliation(s)
- Wei Chen
- School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China; Gordon Life Science Institute, Belmont, MA 02478, USA.
| | - Tian-Yu Lei
- School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China
| | - Dian-Chuan Jin
- School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China
| | - Hao Lin
- Gordon Life Science Institute, Belmont, MA 02478, USA; Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Kuo-Chen Chou
- School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China; Gordon Life Science Institute, Belmont, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia.
|
276
|
Zuo Y, Zhang P, Liu L, Li T, Peng Y, Li G, Li Q. Sequence-specific flexibility organization of splicing flanking sequence and prediction of splice sites in the human genome. Chromosome Res 2014; 22:321-34. [PMID: 24728765 DOI: 10.1007/s10577-014-9414-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/25/2014] [Revised: 03/24/2014] [Accepted: 03/26/2014] [Indexed: 12/15/2022]
Abstract
A growing number of reported results on nucleosome positioning and histone modifications show that DNA structure plays a well-established role in splicing. In this study, a set of DNA geometric flexibility parameters derived from molecular dynamics (MD) simulations was introduced to discuss the structural organization around splice sites at the DNA level. The obtained profiles of specific flexibility/stiffness around splice sites indicated that the DNA physical-geometry deformation could be used as an alternative way to describe the splicing junction region. Using structural flexibility as a discriminatory parameter, we developed a hybrid computational model for predicting potential splicing sites. Better prediction performance was achieved when the model was evaluated on the benchmark dataset. Our results showed that the mechanical deformability character of a splice junction is closely correlated with both the splice site strength and structural information in its flanking sequences.
Affiliation(s)
- Yongchun Zuo
- The Key Laboratory of National Education Ministry for Mammalian Reproductive Biology and Biotechnology, Inner Mongolia University, Hohhot, 010021, China,
|
277
|
Fan YN, Xiao X, Min JL, Chou KC. iNR-Drug: predicting the interaction of drugs with nuclear receptors in cellular networking. Int J Mol Sci 2014; 15:4915-37. [PMID: 24651462 PMCID: PMC3975431 DOI: 10.3390/ijms15034915] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 01/13/2014] [Revised: 02/12/2014] [Accepted: 02/16/2014] [Indexed: 12/20/2022] Open
Abstract
Nuclear receptors (NRs) are closely associated with various major diseases such as cancer, diabetes, inflammatory disease, and osteoporosis. Therefore, NRs have become a frequent target for drug development. During the process of developing drugs against these diseases by targeting NRs, we often face a problem: given an NR and a chemical compound, can we identify whether they really interact with each other in a cell? To address this problem, a predictor called "iNR-Drug" was developed. In the predictor, the drug compound concerned was formulated by a 256-D (dimensional) vector derived from its molecular fingerprint, and the NR by a 500-D vector formed by incorporating its sequential evolution information and physicochemical features into the general form of pseudo amino acid composition, and the prediction engine was operated by the SVM (support vector machine) algorithm. Compared with the existing prediction methods in this area, iNR-Drug not only can yield a higher success rate, but also features a user-friendly web-server established at http://www.jci-bioinfo.cn/iNR-Drug/, which is particularly useful for most experimental scientists to obtain their desired data in a timely manner. It is anticipated that the iNR-Drug server may become a useful high throughput tool for both basic research and drug development, and that the current approach may be easily extended to study the interactions of drugs with other targets as well.
Affiliation(s)
- Yue-Nong Fan
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen 333046, Jiangxi, China.
| | - Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen 333046, Jiangxi, China.
| | - Jian-Liang Min
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen 333046, Jiangxi, China.
| | - Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia.
|