9
|
Liu B, Liu F, Fang L, Wang X, Chou KC. repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics 2014; 31:1307-9. [PMID: 25504848 DOI: 10.1093/bioinformatics/btu820] [Citation(s) in RCA: 203] [Impact Index Per Article: 20.3] [Received: 10/01/2014] [Accepted: 12/05/2014] [Indexed: 12/29/2022]
Abstract
To develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is finding a suitable approach to represent DNA sequences effectively. To facilitate the study of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. The package provides 15 features in three groups. The first group calculates three nucleic acid composition features describing the local sequence information by means of k-mers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector while still keeping considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated from both built-in and user-defined properties using repDNA. AVAILABILITY AND IMPLEMENTATION: The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. CONTACT: bliu@insun.hit.edu.cn or kcchou@gordonlifescience.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
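The first feature group (k-mer based nucleic acid composition) can be illustrated with a minimal sketch. This is not repDNA's API: the function name `kmer_composition` and its parameters are hypothetical, and the normalization (dividing by the number of counted k-mers) is an assumption about one common convention.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer.

    Hypothetical helper for illustration only; not part of repDNA.
    """
    seq = seq.upper()
    # All 4**k possible k-mers over the alphabet, in a fixed order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count overlapping k-mers with a sliding window of width k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers over the given alphabet contribute to the total.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


vec = kmer_composition("ACGTACGT", k=2)
```

The resulting vector has a fixed length of 4^k regardless of sequence length, which is what makes it usable as input to a standard classifier.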
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Fule Liu
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Longyun Fang
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Xiaolong Wang
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Kuo-Chen Chou
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
|
11
|
Chen W, Zhang X, Brooker J, Lin H, Zhang L, Chou KC. PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics 2014; 31:119-20. [PMID: 25231908 DOI: 10.1093/bioinformatics/btu602] [Citation(s) in RCA: 181] [Impact Index Per Article: 18.1] [Indexed: 11/13/2022]
Abstract
SUMMARY: The avalanche of genomic sequences generated in the post-genomic age requires efficient computational methods for rapidly and accurately identifying biological features from sequence information. Towards this goal, we developed a freely available and open-source package, called PseKNC-General (the general form of pseudo k-tuple nucleotide composition), that allows for fast and accurate computation of all the widely used nucleotide structural and physicochemical properties of both DNA and RNA sequences. PseKNC-General can generate several modes of pseudo nucleotide compositions, including conventional k-tuple nucleotide compositions, Moreau-Broto autocorrelation coefficient, Moran autocorrelation coefficient, Geary autocorrelation coefficient, Type I PseKNC and Type II PseKNC. In every mode, >100 physicochemical properties are available to choose from. Moreover, it is flexible enough to allow users to calculate PseKNC with user-defined properties. The package runs on Linux, Mac and Windows systems and also provides a graphical user interface. AVAILABILITY AND IMPLEMENTATION: The package is freely available at: http://lin.uestc.edu.cn/server/pseknc.
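As a rough illustration of one of the modes named above, the sketch below computes a normalized Moreau-Broto autocorrelation over the dinucleotides of a sequence: the average product of a property value at positions i and i + lag. It is not the PseKNC-General implementation. The function names are hypothetical, the per-dinucleotide property (`gc_count`, the number of G or C bases) is a placeholder standing in for a real physicochemical property table, and normalization and standardization conventions differ between tools.

```python
def gc_count(dinucleotide):
    """Placeholder property: number of G/C bases in a dinucleotide.

    Stands in for a real physicochemical property; illustration only.
    """
    return sum(base in "GC" for base in dinucleotide)


def moreau_broto(seq, lag, prop=gc_count):
    """Normalized Moreau-Broto autocorrelation at the given lag.

    Computes (1 / (n - lag)) * sum_i P(i) * P(i + lag), where P(i) is the
    property value of the i-th overlapping dinucleotide and n is the
    number of dinucleotides. Hypothetical helper, not a package API.
    """
    seq = seq.upper()
    # Property value for every overlapping dinucleotide along the sequence.
    values = [prop(seq[i:i + 2]) for i in range(len(seq) - 1)]
    n = len(values)
    if lag < 1 or lag >= n:
        raise ValueError("lag must satisfy 1 <= lag < number of dinucleotides")
    return sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)


ac_lag1 = moreau_broto("ACGT", lag=1)
ac_lag2 = moreau_broto("ACGT", lag=2)
```

Evaluating the function for lags 1 through some maximum gives a fixed-length vector per property, independent of sequence length.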
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063009, China, Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, School of Life Science and Technology, Bioinformatics and Computer-Aided Drug Discovery, Gordon Life Science Institute, Boston, MA 02478, Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904, Department of Computer Science, Vassar College, Poughkeepsie, NY 12604, USA, Excellence in Genomic Medicine Research, Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China and Excellence in Genomic Medicine Research, Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Xitong Zhang
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063009, China, Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, School of Life Science and Technology, Bioinformatics and Computer-Aided Drug Discovery, Gordon Life Science Institute, Boston, MA 02478, Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904, Department of Computer Science, Vassar College, Poughkeepsie, NY 12604, USA, Excellence in Genomic Medicine Research, Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China and Excellence in Genomic Medicine Research, Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Jordan Brooker
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063009, China, Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, School of Life Science and Technology, Bioinformatics and Computer-Aided Drug Discovery, Gordon Life Science Institute, Boston, MA 02478, Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904, Department of Computer Science, Vassar College, Poughkeepsie, NY 12604, USA, Excellence in Genomic Medicine Research, Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China and Excellence in Genomic Medicine Research, Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Hao Lin
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063009, China, Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, School of Life Science and Technology, Bioinformatics and Computer-Aided Drug Discovery, Gordon Life Science Institute, Boston, MA 02478, Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904, Department of Computer Science, Vassar College, Poughkeepsie, NY 12604, USA, Excellence in Genomic Medicine Research, Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China and Excellence in Genomic Medicine Research, Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Liqing Zhang
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063009, China, Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, School of Life Science and Technology, Bioinformatics and Computer-Aided Drug Discovery, Gordon Life Science Institute, Boston, MA 02478, Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904, Department of Computer Science, Vassar College, Poughkeepsie, NY 12604, USA, Excellence in Genomic Medicine Research, Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China and Excellence in Genomic Medicine Research, Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Kuo-Chen Chou
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063009, China, Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, School of Life Science and Technology, Bioinformatics and Computer-Aided Drug Discovery, Gordon Life Science Institute, Boston, MA 02478, Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904, Department of Computer Science, Vassar College, Poughkeepsie, NY 12604, USA, Excellence in Genomic Medicine Research, Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China and Excellence in Genomic Medicine Research, Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
|