201
|
Abstract
Nucleosome positioning is an important process required for proper genome packing and for making the genome accessible to execute the genetic program in a cell-specific, timely manner. In recent years, hundreds of papers have been devoted to the bioinformatics, physics and biology of nucleosome positioning. The purpose of this review is to cover a practical aspect of this field, namely, to provide a guide to the multitude of nucleosome positioning resources available online. These include almost 300 experimental datasets of genome-wide nucleosome occupancy profiles determined in different cell types and more than 40 computational tools for the analysis of experimental nucleosome positioning data and prediction of intrinsic nucleosome formation probabilities from the DNA sequence. A manually curated, up-to-date list of these resources will be maintained at http://generegulation.info.
|
202
|
Kabir M, Iqbal M, Ahmad S, Hayat M. iTIS-PseKNC: Identification of Translation Initiation Site in human genes using pseudo k-tuple nucleotides composition. Comput Biol Med 2015; 66:252-7. [PMID: 26433457 DOI: 10.1016/j.compbiomed.2015.09.010] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 09/13/2015] [Accepted: 09/14/2015] [Indexed: 10/23/2022]
Abstract
Translation is an essential genetic process for understanding the mechanism of gene expression. Due to the large number of protein sequences generated in the post-genomic era, conventional methods are unable to identify translation initiation sites (TIS) in human genes in a timely and accurate manner. It is thus highly desirable to develop an automatic and accurate computational model for the identification of TIS. Considerable improvements have been achieved in developing computational models; however, the development of accurate and reliable automated systems for TIS identification in human genes is still a challenging task. In this connection, we propose iTIS-PseKNC, a novel protocol for the identification of TIS. Three sequence representation methods, namely dinucleotide composition, pseudo-dinucleotide composition and trinucleotide composition, are used to extract numerical descriptors. Support Vector Machine (SVM), K-nearest neighbor and Probabilistic Neural Network classifiers are assessed for their performance using the constructed descriptors. The proposed model, iTIS-PseKNC, achieved 99.40% accuracy using the jackknife test. The experimental results validated the superior performance of iTIS-PseKNC over the existing methods reported in the literature. It is anticipated that the iTIS-PseKNC predictor will be useful for basic research studies.
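The dinucleotide and trinucleotide composition descriptors mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain normalized k-mer frequencies over the alphabet A, C, G, T; the function names are hypothetical, and the pseudo-dinucleotide component of the published method is not reproduced here.

```python
from itertools import product


def kmer_composition(seq, k):
    """Return normalized frequencies of all 4**k DNA k-mers in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows containing N or other symbols are skipped
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def composition_features(seq):
    """Concatenate dinucleotide (16) and trinucleotide (64) composition vectors."""
    return kmer_composition(seq, 2) + kmer_composition(seq, 3)


features = composition_features("ATGGCCATGGCGCCCAGAACTGAGATCAATAGTACC")
print(len(features))  # 80 descriptors per sequence
```

A vector of this kind could then be passed to any standard classifier; which classifier and which k to use are choices the abstract leaves to the full paper.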
Affiliation(s)
- Muhammad Kabir: Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Muhammad Iqbal: Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Saeed Ahmad: Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Maqsood Hayat: Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
|
203
|
Identifying Antioxidant Proteins by Using Optimal Dipeptide Compositions. Interdiscip Sci 2015; 8:186-191. [DOI: 10.1007/s12539-015-0124-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/26/2015] [Revised: 06/29/2015] [Accepted: 08/29/2015] [Indexed: 11/26/2022]
|
204
|
Identification and analysis of the N(6)-methyladenosine in the Saccharomyces cerevisiae transcriptome. Sci Rep 2015; 5:13859. [PMID: 26343792 PMCID: PMC4561376 DOI: 10.1038/srep13859] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Received: 03/13/2015] [Accepted: 08/10/2015] [Indexed: 12/14/2022] Open
Abstract
Knowledge of the distribution of N6-methyladenosine (m6A) is invaluable for understanding RNA biological functions. However, limitations in experimental methods impede progress towards the identification of m6A sites. As a complement to experimental methods, a support vector machine-based method is proposed to identify m6A sites in the Saccharomyces cerevisiae genome. In this model, RNA sequences are encoded by their nucleotide chemical properties and accumulated nucleotide frequency information. In the jackknife test, the proposed model identified m6A sites with an accuracy of 78.15%. For the convenience of experimental scientists, a web server for the proposed model is provided at http://lin.uestc.edu.cn/server/m6Apred.php.
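The encoding named in this abstract (nucleotide chemical property plus accumulated nucleotide frequency) can be sketched as follows. The abstract does not give the exact scheme, so the property triples below (ring structure, functional group, hydrogen bonding) and the prefix-density definition are assumptions based on how this kind of encoding is commonly described; `encode_rna` is a hypothetical name.

```python
# Assumed binary property triples: (purine ring, amino group, weak hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_rna(seq):
    """Encode each position as (ring, group, bond, density).

    density at position i is the number of occurrences of that nucleotide
    in the prefix seq[0..i] divided by the prefix length (assumed definition).
    Raises KeyError for symbols outside A, C, G, U (T is read as U).
    """
    seq = seq.upper().replace("T", "U")
    counts = {}
    encoded = []
    for i, nt in enumerate(seq, start=1):
        counts[nt] = counts.get(nt, 0) + 1
        encoded.append(NCP[nt] + (counts[nt] / i,))
    return encoded


for row in encode_rna("GGACU"):
    print(row)
```

Flattening the per-position tuples of a fixed-length window around a candidate adenosine would give one feature vector per site, which is the kind of input a support vector machine expects.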
|
205
|
Survey of Programs Used to Detect Alternative Splicing Isoforms from Deep Sequencing Data In Silico. Biomed Res Int 2015; 2015:831352. [PMID: 26421304 PMCID: PMC4573434 DOI: 10.1155/2015/831352] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/26/2014] [Revised: 02/17/2015] [Accepted: 03/02/2015] [Indexed: 11/29/2022]
Abstract
Next-generation sequencing techniques have been rapidly emerging. However, the massive sequencing reads hide a great deal of unknown important information. Advances have enabled researchers to discover alternative splicing (AS) sites and isoforms using computational approaches instead of molecular experiments. Given the importance of AS for gene expression and protein diversity in eukaryotes, detecting alternative splicing and isoforms represents a hot topic in systems biology and epigenetics research. The computational methods applied to AS prediction have improved since the emergence of next-generation sequencing. In this study, we introduce state-of-the-art research on AS and then compare the research methods and software tools available for AS based on next-generation sequencing reads. Finally, we discuss the prospects of computational methods related to AS.
|
206
|
Kou G, Feng Y. Identify five kinds of simple super-secondary structures with quadratic discriminant algorithm based on the chemical shifts. J Theor Biol 2015; 380:392-8. [DOI: 10.1016/j.jtbi.2015.06.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/09/2015] [Revised: 06/02/2015] [Accepted: 06/04/2015] [Indexed: 10/23/2022]
|
207
|
Zhao X, Ning Q, Chai H, Ai M, Ma Z. PGlcS: Prediction of protein O-GlcNAcylation sites with multiple features and analysis. J Theor Biol 2015; 380:524-9. [DOI: 10.1016/j.jtbi.2015.06.026] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 01/22/2015] [Revised: 06/01/2015] [Accepted: 06/02/2015] [Indexed: 10/23/2022]
|
208
|
Zepeda Mendoza ML, Sicheritz-Pontén T, Gilbert MTP. Environmental genes and genomes: understanding the differences and challenges in the approaches and software for their analyses. Brief Bioinform 2015; 16:745-58. [PMID: 25673291 PMCID: PMC4570204 DOI: 10.1093/bib/bbv001] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 11/07/2014] [Revised: 12/16/2014] [Indexed: 01/19/2023] Open
Abstract
DNA-based taxonomic and functional profiling is widely used for the characterization of organismal communities across a rapidly increasing array of research areas that include the role of microbiomes in health and disease, biomonitoring, and estimation of both microbial and metazoan species richness. Two principal approaches are currently used to assign taxonomy to DNA sequences: DNA metabarcoding and metagenomics. When initially developed, each of these approaches mandated their own particular methods for data analysis; however, with the development of high-throughput sequencing (HTS) techniques they have begun to share many aspects in data set generation and processing. In this review we aim to define the current characteristics, goals and boundaries of each field, and describe the different software used for their analysis. We argue that an appreciation of the potential and limitations of each method can help underscore the improvements required by each field so as to better exploit the richness of current HTS-based data sets.
|
209
|
Iqbal S, Mishra A, Hoque MT. Improved prediction of accessible surface area results in efficient energy function application. J Theor Biol 2015; 380:380-91. [DOI: 10.1016/j.jtbi.2015.06.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 04/10/2015] [Revised: 05/15/2015] [Accepted: 06/02/2015] [Indexed: 01/16/2023]
|
210
|
Kabir M, Hayat M. iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou’s PseAAC to formulate DNA samples. Mol Genet Genomics 2015; 291:285-96. [DOI: 10.1007/s00438-015-1108-5] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Received: 06/25/2015] [Accepted: 08/19/2015] [Indexed: 10/23/2022]
|
211
|
Association analysis between the distributions of histone modifications and gene expression in the human embryonic stem cell. Gene 2015; 575:90-100. [PMID: 26302750 DOI: 10.1016/j.gene.2015.08.041] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 02/06/2015] [Revised: 05/13/2015] [Accepted: 08/20/2015] [Indexed: 12/17/2022]
Abstract
It is well known that histone modifications are associated with gene expression. To study this relationship further, ChIP-seq data for 16 histone modifications and mRNA-seq data from the human embryonic stem cell line H1 are chosen. The distributions of histone modifications in the regions flanking transcription start sites (TSSs) are computed separately for highly expressed and lowly expressed genes. Four types of distributions of histone modifications in the regions flanking TSSs are detected, together with the spatial patterning of the correlations between histone modifications and gene expression. Our results suggest that, for each histone modification, the correlations in regions overlapped by peaks are higher than in non-overlapped regions. In addition, to assess the effect of the cooperative action of histone modifications on gene expression, five histone modification clusters are found in highly expressed and lowly expressed genes, and a histone modification and gene expression interaction network is constructed. To further explore which region is the main target region for each specific histone modification, the human genes are divided into five functional regions. The results indicate that histone modifications are mostly located in the promoters of highly expressed genes versus the exons of lowly expressed genes, and that exons have a smaller range of normalized tag counts than other gene elements in the two groups of genes. Finally, the type specificity and regional bias of histone modifications are analyzed for 11 key transcription factor genes regulating stem cell renewal.
|
212
|
Ali F, Hayat M. Classification of membrane protein types using Voting Feature Interval in combination with Chou's Pseudo Amino Acid Composition. J Theor Biol 2015; 384:78-83. [PMID: 26297889 DOI: 10.1016/j.jtbi.2015.07.034] [Citation(s) in RCA: 112] [Impact Index Per Article: 12.4] [Received: 05/08/2015] [Revised: 07/15/2015] [Accepted: 07/29/2015] [Indexed: 12/11/2022]
Abstract
Membrane proteins are a major constituent of the cell and perform numerous crucial functions. These functions are mostly determined by the membrane protein's type. Initially, membrane protein types were classified through traditional methods, and reasonable results were obtained. However, owing to the rapid growth of protein sequences in databases, classification through conventional methods is very difficult or sometimes impossible, because it is laborious and time-consuming. Therefore, a powerful new discriminating model is indispensable for classifying membrane protein types with high precision. In this work, a promising classification model with effective discriminating power for membrane protein types is developed. In our classification model, salient features of protein sequences are extracted via Pseudo Amino Acid Composition. Five classification algorithms were utilized. Among them, Voting Feature Interval achieved outstanding performance on all three datasets. The accuracy of the proposed model is 93.9% on dataset S1, 89.33% on S2 and 86.9% on S3, using the 10-fold cross-validation test. The success rates show that the proposed model outperforms the existing models reported in the literature so far and could play a substantial role in drug design and the pharmaceutical industry.
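The 10-fold cross-validation test mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the authors' protocol: `k_fold_indices`, `cross_val_accuracy` and the `fit_predict` callback are hypothetical names, and the shuffling and fold assignment are assumptions. Setting `k` equal to the number of samples gives the jackknife (leave-one-out) test.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit_predict, k=10, seed=0):
    """Hold out each fold once, train on the rest, and pool the correct predictions.

    fit_predict(train_X, train_y, test_X) must return one label per test sample.
    """
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in fold])
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


def majority_class(train_X, train_y, test_X):
    """Toy baseline: always predict the most frequent training label."""
    label = max(set(train_y), key=train_y.count)
    return [label] * len(test_X)


X = [[i] for i in range(30)]
y = [0] * 20 + [1] * 10
print(cross_val_accuracy(X, y, majority_class, k=10))
```

Any classifier can be plugged in through `fit_predict`, so the same loop serves to compare several algorithms on identical folds.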
Affiliation(s)
- Farman Ali: Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Maqsood Hayat: Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
|
213
|
Zheng W, Ruan J, Hu G, Wang K, Hanlon M, Gao J. Analysis of Conformational B-Cell Epitopes in the Antibody-Antigen Complex Using the Depth Function and the Convex Hull. PLoS One 2015; 10:e0134835. [PMID: 26244562 PMCID: PMC4526569 DOI: 10.1371/journal.pone.0134835] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 02/27/2015] [Accepted: 07/14/2015] [Indexed: 01/05/2023] Open
Abstract
The prediction of conformational B-cell epitopes plays an important role in immunoinformatics. Several computational methods have been proposed that discriminate epitopes from non-epitopes on the basis of the solvent-accessible surface, but the performance of existing methods is far from satisfactory. In this paper, depth functions and the k-th surface convex hull are used to analyze epitopes and exposed non-epitopes. On each layer of the protein, we compute relative solvent accessibility and four different types of depth functions, i.e., Chakravarty depth, DPX, half-sphere exposure and half-space depth, to analyze the location of epitopes on different layers of the proteins. We found that conformational B-cell epitopes are rich in charged residues Asp, Glu, Lys, Arg, His; aliphatic residues Gly, Pro; non-charged residues Asn, Gln; and aromatic residue Tyr. Conformational B-cell epitopes are rich in coils. Conservation of epitopes is not significantly lower than that of exposed non-epitopes. The average depths (obtained by four methods) of epitopes are significantly lower than those of non-epitopes on the surface, according to the Wilcoxon rank sum test. Epitopes are more likely to be located in the outer layer of the convex hull of a protein. On the benchmark dataset, the cumulative 10th convex hull covers 84.6% of exposed residues on the protein surface area and nearly 95% of epitope sites. These findings may be helpful in building a predictor for epitopes.
Affiliation(s)
- Wei Zheng: School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Jishou Ruan: School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China; State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, People’s Republic of China
- Gang Hu: School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Kui Wang: School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Michelle Hanlon: Department of Physical Sciences, Grant MacEwan University, Alberta, Canada
- Jianzhao Gao: School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
|
214
|
Liu Y, Munteanu CR, Fernández Blanco E, Tan Z, Santos Del Riego A, Pazos A. Prediction of Nucleotide Binding Peptides Using Star Graph Topological Indices. Mol Inform 2015; 34:736-41. [PMID: 27491034 DOI: 10.1002/minf.201500064] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/29/2015] [Accepted: 07/06/2015] [Indexed: 01/14/2023]
Abstract
Nucleotide binding proteins are involved in many important cellular processes, such as transmission of genetic information or energy transfer and storage. Therefore, the screening of new peptides for this biological function is an important research topic. The current study proposes a mixed methodology to obtain the first classification model that is able to predict new nucleotide binding peptides using only the amino acid sequence. The methodology uses Star graph molecular descriptors of the peptide sequences and machine learning techniques to find the best classifier. The best model is a Random Forest classifier based on two features of the embedded and non-embedded graphs. The performance of the model is excellent, considering similar models in the field, with an Area Under the Receiver Operating Characteristic Curve (AUROC) value of 0.938 and a true positive rate (TPR) of 0.886 (test subset). The prediction of new nucleotide binding peptides with this model could be useful for drug target studies in drug development.
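The two metrics reported in this abstract, AUROC and TPR, can be computed from labels and scores as sketched below. This is a generic illustration with made-up toy data; the function names are hypothetical. AUROC is computed here through its pairwise interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
def true_positive_rate(y_true, y_pred):
    """TPR (sensitivity): fraction of actual positives predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores, invented for illustration only.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(labels, scores))                                        # 0.75
print(true_positive_rate(labels, [int(s >= 0.5) for s in scores]))  # 0.5
```

The pairwise form is quadratic in the number of samples, which is fine for small test subsets; a rank-based formulation would be the usual choice for large ones.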
Affiliation(s)
- Yong Liu: Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160; Faculty of Veterinary Medicine and Animal Science, Autonomous University of the State of Mexico, Toluca, 50090, México; Key Laboratory of Subtropical Agro-ecological Engineering, Institute of Subtropical Agriculture, the Chinese Academy of Sciences, Changsha, Hunan, 410125, P. R. China
- Cristian R Munteanu: Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
- Enrique Fernández Blanco: Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
- Zhiliang Tan: Key Laboratory of Subtropical Agro-ecological Engineering, Institute of Subtropical Agriculture, the Chinese Academy of Sciences, Changsha, Hunan, 410125, P. R. China
- Antonino Santos Del Riego: Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
- Alejandro Pazos: Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
|
215
|
Yu JF, Chen QL, Ren J, Yang YL, Wang JH, Sun X. Analysis of the multi-copied genes and the impact of the redundant protein coding sequences on gene annotation in prokaryotic genomes. J Theor Biol 2015; 376:8-14. [PMID: 25865522 DOI: 10.1016/j.jtbi.2015.04.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/13/2014] [Revised: 03/09/2015] [Accepted: 04/01/2015] [Indexed: 10/23/2022]
Abstract
The important roles of duplicated genes in evolutionary processes have been recognized in bacteria, archaebacteria and eukaryotes, but few studies have addressed multi-copied protein coding genes that share 100% sequence identity. In this paper, the multi-copied protein coding genes in a number of prokaryotic genomes are first comprehensively analyzed. The results show that 0-15.93% of the protein coding genes in each genome are multi-copied genes and 0-16.49% of the protein coding genes in each genome are highly similar, with sequence identity ≥ 80%. Function and COG (Clusters of Orthologous Groups of proteins) analysis shows that 64.64% of multi-copied genes have transposase function and 86.28% of the COG-assigned multi-copied genes fall under COG code 'L'. Furthermore, the impact of redundant protein coding sequences on gene prediction results is studied. The results show that protein coding sequence redundancy cannot be ignored, and that the consistency of the gene annotation results before and after excluding the redundant sequences is negatively correlated with the degree of sequence redundancy among the protein coding sequences in the training set.
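The notion of multi-copied genes used in this abstract (coding sequences sharing 100% identity) can be illustrated with a short sketch. It assumes that identity means an exact, case-insensitive string match on the coding sequence; whether the study also compared reverse complements or protein translations is not stated in the abstract. The function name and toy sequences are hypothetical.

```python
from collections import defaultdict


def multi_copied_genes(genes):
    """Group genes with identical coding sequences.

    genes: dict mapping gene name -> coding sequence.
    Returns (groups, fraction), where groups lists the names of genes that
    share a sequence with at least one other gene, and fraction is the share
    of all genes that belong to such a group.
    """
    by_sequence = defaultdict(list)
    for name, seq in genes.items():
        by_sequence[seq.upper()].append(name)
    groups = [names for names in by_sequence.values() if len(names) > 1]
    n_multi = sum(len(names) for names in groups)
    fraction = n_multi / len(genes) if genes else 0.0
    return groups, fraction


# Toy genome with invented sequences, for illustration only.
toy = {
    "geneA": "ATGAAATAG",
    "geneB": "ATGAAATAG",
    "geneC": "ATGCCCTAA",
    "geneD": "atgaaatag",
}
groups, fraction = multi_copied_genes(toy)
print(groups, fraction)  # [['geneA', 'geneB', 'geneD']] 0.75
```

Removing all but one member of each group before training a gene finder is one simple way to exclude the redundant sequences the abstract refers to.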
Affiliation(s)
- Jia-Feng Yu: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Qing-Li Chen: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; College of life science, Shandong Normal University, Jinan 250358, China
- Jing Ren: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Yan-Ling Yang: School of Physics and Electronic Information, Dezhou University, Dezhou 253023, China
- Ji-Hua Wang: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; School of Physics and Electronic Information, Dezhou University, Dezhou 253023, China
- Xiao Sun: State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
|
216
|
iPPI-Esml: An ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. J Theor Biol 2015; 377:47-56. [DOI: 10.1016/j.jtbi.2015.04.011] [Citation(s) in RCA: 243] [Impact Index Per Article: 27.0] [Received: 01/29/2015] [Revised: 04/07/2015] [Accepted: 04/09/2015] [Indexed: 12/24/2022]
|
217
|
Liu B, Liu F, Fang L, Wang X, Chou KC. repRNA: a web server for generating various feature vectors of RNA sequences. Mol Genet Genomics 2015; 291:473-81. [DOI: 10.1007/s00438-015-1078-7] [Citation(s) in RCA: 102] [Impact Index Per Article: 11.3] [Received: 04/24/2015] [Accepted: 06/04/2015] [Indexed: 10/23/2022]
|
218
|
TargetFreeze: Identifying Antifreeze Proteins via a Combination of Weights using Sequence Evolutionary Information and Pseudo Amino Acid Composition. J Membr Biol 2015; 248:1005-14. [PMID: 26058944 DOI: 10.1007/s00232-015-9811-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 03/26/2015] [Accepted: 05/19/2015] [Indexed: 11/26/2022]
Abstract
Antifreeze proteins (AFPs) are indispensable for living organisms to survive in extremely cold environments and have a variety of potential biotechnological applications. The accurate prediction of antifreeze proteins has become an important and urgent issue. Although considerable progress has been made, AFP prediction is still a challenging problem due to the diversity of species. In this study, we propose a new sequence-based AFP predictor called TargetFreeze. TargetFreeze utilizes an enhanced feature representation method that combines multiple protein features in a weighted manner and uses a support vector machine as the prediction engine. Computer experiments on benchmark datasets demonstrate the superiority of the proposed TargetFreeze over most recently released AFP predictors. We also implemented a user-friendly web server, which is openly accessible for academic use and is available at http://csbio.njust.edu.cn/bioinf/TargetFreeze. TargetFreeze supplements existing AFP predictors and will have potential applications in AFP-related biotechnology fields.
|
219
|
Marrero-Ponce Y, Contreras-Torres E, García-Jacas CR, Barigye SJ, Cubillán N, Alvarado YJ. Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes. J Theor Biol 2015; 374:125-37. [DOI: 10.1016/j.jtbi.2015.03.026] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/17/2014] [Revised: 02/23/2015] [Accepted: 03/20/2015] [Indexed: 12/11/2022]
|
220
|
Giancarlo R, Rombo SE, Utro F. Epigenomic k-mer dictionaries: shedding light on how sequence composition influences in vivo nucleosome positioning. Bioinformatics 2015; 31:2939-46. [DOI: 10.1093/bioinformatics/btv295] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 10/30/2014] [Accepted: 05/04/2015] [Indexed: 12/28/2022] Open
|
221
|
Briefing in application of machine learning methods in ion channel prediction. ScientificWorldJournal 2015; 2015:945927. [PMID: 25961077 PMCID: PMC4415473 DOI: 10.1155/2015/945927] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/31/2014] [Accepted: 09/11/2014] [Indexed: 01/09/2023] Open
Abstract
In cells, ion channels are one of the most important classes of membrane proteins, allowing inorganic ions to move across the membrane. A wide range of biological processes involve, and are regulated by, the opening and closing of ion channels. Ion channels can be classified into numerous classes, and different types of ion channels exhibit different functions. Thus, the correct identification of ion channels and their types using computational methods will provide in-depth insights into their function in various biological processes. In this review, we briefly introduce and discuss recent progress in ion channel prediction using machine learning methods.
|
222
|
Liu B, Liu F, Wang X, Chen J, Fang L, Chou KC. Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Res 2015; 43:W65-71. [PMID: 25958395 PMCID: PMC4489303 DOI: 10.1093/nar/gkv458] [Citation(s) in RCA: 558] [Impact Index Per Article: 62.0] [Received: 01/16/2015] [Accepted: 04/27/2015] [Indexed: 11/12/2022] Open
Abstract
With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems in computational biology is how to effectively formulate the sequence of a biological sample (such as DNA, RNA or protein) with a discrete model or a vector that can reflect its sequence pattern information or capture its key features. Although several web servers and stand-alone tools have been developed to address this problem, each of these tools can handle only one type of sample. Furthermore, the number of their built-in properties is limited, and hence it is often difficult for users to formulate the biological sequences according to their desired features or properties. In this article, we propose a much more flexible web server with a much larger number of built-in properties, called Pse-in-One (http://bioinformatics.hitsz.edu.cn/Pse-in-One/), which can, through its 28 different modes, generate nearly all the possible feature vectors for DNA, RNA and protein sequences. In particular, it can also generate feature vectors with properties defined by users themselves. These feature vectors can be easily combined with machine-learning algorithms to develop computational predictors and analysis methods for various tasks in bioinformatics and systems biology. It is anticipated that the Pse-in-One web server will become a very useful tool in computational proteomics, genomics, and biological sequence analysis. Moreover, to maximize users’ convenience, its stand-alone version can also be downloaded from http://bioinformatics.hitsz.edu.cn/Pse-in-One/download/ and run directly on Windows, Linux, Unix and Mac OS.
Affiliation(s)
- Bin Liu: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Gordon Life Science Institute, Belmont, MA 02478, USA
- Fule Liu: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Xiaolong Wang: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Junjie Chen: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Longyun Fang: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Kuo-Chen Chou: Gordon Life Science Institute, Belmont, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
223
|
Liu B, Chen J, Wang X. Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis. Mol Genet Genomics 2015; 290:1919-31. [DOI: 10.1007/s00438-015-1044-4] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 03/05/2015] [Accepted: 04/06/2015] [Indexed: 02/07/2023]
|
224
|
Wang X, Zhang W, Zhang Q, Li GZ. MultiP-SChlo: multi-label protein subchloroplast localization prediction with Chou's pseudo amino acid composition and a novel multi-label classifier. Bioinformatics 2015; 31:2639-45. [PMID: 25900916 DOI: 10.1093/bioinformatics/btv212] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Received: 12/19/2014] [Accepted: 04/13/2015] [Indexed: 01/11/2023] Open
Abstract
MOTIVATION Identifying protein subchloroplast localization in the chloroplast organelle is very helpful for understanding the function of chloroplast proteins. A few computational prediction methods for protein subchloroplast localization already exist. However, these methods ignored proteins with multiple subchloroplast locations when constructing prediction models, so they can predict only one of the subchloroplast locations of such multilabel proteins. RESULTS To address this problem, a novel multilabel classifier that uses label-specific features and label correlations simultaneously was developed for predicting protein subchloroplast location(s) for proteins with single or multiple location sites. As an initial study, the overall accuracy of the proposed algorithm reaches 55.52%, which is high enough to make it a promising tool for further studies. AVAILABILITY AND IMPLEMENTATION An online web server for the proposed algorithm, named MultiP-SChlo, was developed and is freely accessible at http://biomed.zzuli.edu.cn/bioinfo/multip-schlo/. CONTACT pandaxiaoxi@gmail.com or gzli@tongji.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiao Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
| | - Weiwei Zhang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
| | - Qiuwen Zhang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
| | - Guo-Zheng Li
- Department of Control Science and Engineering, Tongji University, Shanghai 201804, China
| |
|
225
|
Liu B, Fang L, Liu F, Wang X, Chen J, Chou KC. Identification of real microRNA precursors with a pseudo structure status composition approach. PLoS One 2015; 10:e0121501. [PMID: 25821974 PMCID: PMC4378912 DOI: 10.1371/journal.pone.0121501] [Citation(s) in RCA: 179] [Impact Index Per Article: 19.9] [Received: 10/17/2014] [Accepted: 01/31/2015] [Indexed: 01/08/2023]
Abstract
Containing about 22 nucleotides, a microRNA (abbreviated miRNA) is a small non-coding RNA molecule that functions in transcriptional and post-transcriptional regulation of gene expression. The human genome may encode over 1000 miRNAs. Although poorly characterized, miRNAs are widely regarded as important regulators of biological processes. Aberrant expression of miRNAs has been observed in many cancers and other disease states, indicating that they are deeply implicated in these diseases, particularly in carcinogenesis. Therefore, it is important for both basic research and miRNA-based therapy to discriminate real pre-miRNAs from false ones (such as hairpin sequences with similar stem-loops). In particular, with the avalanche of RNA sequences generated in the postgenomic age, computational sequence-based methods for this task are highly desirable. Here two new predictors, called “iMcRNA-PseSSC” and “iMcRNA-ExPseSSC”, are proposed for identifying human pre-microRNAs by incorporating global or long-range structure-order information in a manner similar to the pseudo amino acid composition approach. Rigorous cross-validations on a much larger and more stringent newly constructed benchmark dataset showed that the two new predictors (accessible at http://bioinformatics.hitsz.edu.cn/iMcRNA/) outperformed or were highly comparable with the best existing predictors in this area.
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Gordon Life Science Institute, Belmont, Massachusetts, United States of America
| | - Longyun Fang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Fule Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Junjie Chen
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Belmont, Massachusetts, United States of America
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
226
|
Abbas MM, Mohie-Eldin MM, EL-Manzalawy Y. Assessing the effects of data selection and representation on the development of reliable E. coli sigma 70 promoter region predictors. PLoS One 2015; 10:e0119721. [PMID: 25803493 PMCID: PMC4372424 DOI: 10.1371/journal.pone.0119721] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/12/2014] [Accepted: 01/26/2015] [Indexed: 11/27/2022]
Abstract
As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for annotating functional elements (e.g., transcriptional regulatory elements) becomes more pressing. Promoters are key regulatory elements that recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). Identifying promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are difficult to determine experimentally. Machine learning represents a promising and cost-effective approach for computational identification of prokaryotic promoter regions. However, the quality of the predictors depends on several factors, including: i) training data; ii) data representation; iii) classification algorithms; iv) evaluation procedures. In this work, we create several variants of E. coli promoter data sets and use them to experimentally examine the effect of these factors on the predictive performance of E. coli σ70 promoter models. Our results suggest that under some combinations of the first three factors, a prediction model might perform very well in cross-validation experiments while its performance on independent test data is very poor. This emphasizes the importance of evaluating promoter region predictors using independent test data, which corrects for the over-optimistic performance that might be estimated using the cross-validation procedure. Our analysis of the tested models shows that good prediction models often perform well regardless of how the non-promoter data were obtained. On the other hand, poor prediction models seem to be more sensitive to the choice of non-promoter sequences. Interestingly, the best performing sequence-based classifiers outperform the best performing structure-based classifiers in both cross-validation and independent test performance evaluation experiments. Finally, we propose a meta-predictor method combining two top performing sequence-based and structure-based classifiers and compare its performance with some of the state-of-the-art E. coli σ70 promoter prediction methods.
Affiliation(s)
- Mostafa M. Abbas
- KINDI Center for Computing Research, College of Engineering, Qatar University, Doha, Qatar
| | | | - Yasser EL-Manzalawy
- Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
- College of Information Sciences, Penn State University, University Park, United States of America
| |
|
227
|
Tanchotsrinon W, Lursinsap C, Poovorawan Y. A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition. BMC Bioinformatics 2015; 16:71. [PMID: 25880169 PMCID: PMC4375884 DOI: 10.1186/s12859-015-0493-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/23/2014] [Accepted: 02/06/2015] [Indexed: 02/02/2023]
Abstract
BACKGROUND Human Papillomavirus (HPV) genotyping is an important approach to fighting cervical cancer because it provides relevant information for risk stratification in diagnosis and a better understanding of the relationship between HPV and carcinogenesis. This paper proposes two new feature extraction techniques, ChaosCentroid and ChaosFrequency, for predicting HPV genotypes associated with the cancer. An additional, diverse set of 12 HPV genotypes, i.e. types 6, 11, 16, 18, 31, 33, 35, 45, 52, 53, 58, and 66, was studied. In the proposed techniques, a partitioned Chaos Game Representation (CGR) is used to represent HPV genomes. ChaosCentroid captures the structure of sequences in terms of the centroid of each sub-region, with the Euclidean distances among the centroids and the center of the CGR serving as the relations of all sub-regions. ChaosFrequency extracts the statistical distribution of mono-, di-, or higher order nucleotides along HPV genomes and forms a matrix of the frequency of dots in each sub-region. For performance evaluation, four different types of classifiers, i.e. Multi-layer Perceptron, Radial Basis Function, K-Nearest Neighbor, and Fuzzy K-Nearest Neighbor, were used, and the best results from each classifier were compared with the NCBI genotyping tool. RESULTS The experimental results obtained by the four classifiers follow the same trend. ChaosCentroid gave considerably higher performance than ChaosFrequency when the input length is one, but it was moderately lower than ChaosFrequency when the input length is two. Both proposed techniques yielded almost or exactly the best performance when the input length is more than three. However, there is no significant difference between the proposed techniques and the comparative alignment method. CONCLUSIONS The proposed alignment-free and scale-independent method can successfully transform HPV genomes with 7,000 - 10,000 base pairs into features of 1 - 11 dimensions. This signifies that ChaosCentroid and ChaosFrequency can serve as effective feature extraction techniques for predicting HPV genotypes.
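The chaos game representation that this abstract builds on can be illustrated with a minimal Python sketch. It is not the authors' ChaosCentroid or ChaosFrequency code: the corner layout, the function names `cgr_points` and `grid_frequencies`, and the `k` parameter are all assumptions chosen for illustration. Each base moves the current point halfway toward that base's corner of the unit square, and the points are then counted per cell of a regular partition.

```python
from collections import Counter

# Assumed corner layout for the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR points of a DNA sequence.

    Starting from the center of the unit square, each base moves the
    current point halfway toward the corner assigned to that base.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def grid_frequencies(points, k=1):
    """Count points in each (row, col) cell of a 2**k x 2**k partition."""
    n = 2 ** k
    counts = Counter()
    for x, y in points:
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        counts[(row, col)] += 1
    return counts


if __name__ == "__main__":
    pts = cgr_points("ACGTACGGT")
    print(grid_frequencies(pts, k=1))
```

With `k=1` under this layout, each quadrant collects exactly the points whose last base maps to that corner, so the four counts equal the mononucleotide counts; larger `k` resolves longer suffixes.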
Affiliation(s)
- Watcharaporn Tanchotsrinon
- Advanced Virtual and Intelligent Computing Research Center (AVIC), Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Phayathai Road, Bangkok, Thailand.
| | - Chidchanok Lursinsap
- Advanced Virtual and Intelligent Computing Research Center (AVIC), Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Phayathai Road, Bangkok, Thailand.
| | - Yong Poovorawan
- Center of Excellence in Clinical Virology, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Phayathai Road, Bangkok, Thailand.
| |
|
228
|
Liu B, Fang L, Liu F, Wang X, Chou KC. iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach. J Biomol Struct Dyn 2015; 34:223-35. [DOI: 10.1080/07391102.2015.1014422] [Citation(s) in RCA: 96] [Impact Index Per Article: 10.7] [Indexed: 12/31/2022]
|
229
|
Ding Y, Wang X, Mou Z. Communities in the iron superoxide dismutase amino acid network. J Theor Biol 2015; 367:278-285. [PMID: 25500180 DOI: 10.1016/j.jtbi.2014.11.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2014] [Revised: 11/24/2014] [Accepted: 11/28/2014] [Indexed: 10/24/2022]
Abstract
Amino acid network (AAN) analysis is a new way to reveal the relationship between protein structure and function. We constructed six different types of AANs based on iron superoxide dismutase (Fe-SOD) three-dimensional structure information. These Fe-SOD AANs have clear community structures when modularized by different methods. In particular, the detected communities are related to Fe-SOD secondary structures. Regular structures show better correlations with detected communities than irregular structures, and loops weaken these correlations, which suggests that secondary structure is the unit element in the Fe-SOD folding process. In addition, a comparative analysis of the communities of mesophilic and thermophilic Fe-SOD AANs revealed that thermostable Fe-SOD AANs had more highly associated community structures than mesophilic ones. Thermophilic Fe-SOD AANs also showed higher similarity between communities and secondary structures than mesophilic Fe-SOD AANs. The communities in Fe-SOD AANs show that dense interactions in modules can help to stabilize thermophilic Fe-SOD.
Affiliation(s)
- Yanrui Ding
- School of Digital Media, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China; Key Laboratory of Industrial Biotechnology, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China.
| | - Xueqin Wang
- School of Digital Media, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China
| | - Zhaolin Mou
- School of Digital Media, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China
| |
|
230
|
Xu R, Zhou J, Wang H, He Y, Wang X, Liu B. Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation. BMC Syst Biol 2015; 9 Suppl 1:S10. [PMID: 25708928 PMCID: PMC4331676 DOI: 10.1186/1752-0509-9-s1-s10] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Indexed: 02/05/2023]
Abstract
BACKGROUND DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature for DNA-binding protein identification. However, most of them cannot provide a valuable knowledge base for understanding DNA-protein interactions. RESULTS We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, the PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by the distance transformation scheme. Lastly, the resulting uniform numeric representations are input into an SVM classifier for prediction, which determines whether a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the present model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than that of most existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, and outperformed some existing state-of-the-art methods. CONCLUSIONS The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT was constructed and is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
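The ACC and MCC figures quoted in this abstract follow the standard confusion-matrix definitions, which a short Python sketch makes concrete. The function names and the counts in the usage lines are illustrative assumptions only; they are not the study's confusion matrix, which the abstract does not report.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up counts for illustration; not taken from the paper.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"ACC = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.4f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported next to accuracy when the two classes differ in size.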
Affiliation(s)
- Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Jiyun Zhou
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Hongpeng Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Yulan He
- School of Engineering & Applied Science, Aston University, Birmingham, UK
| | - Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| |
|
231
|
Panwar B, Raghava GPS. Identification of protein-interacting nucleotides in a RNA sequence using composition profile of tri-nucleotides. Genomics 2015; 105:197-203. [PMID: 25640448 DOI: 10.1016/j.ygeno.2015.01.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/02/2014] [Revised: 01/21/2015] [Accepted: 01/23/2015] [Indexed: 10/24/2022]
Abstract
RNA-protein interactions play diverse roles in cells, so identifying the RNA-protein interface is essential for biologists to understand their function. In the past, several methods have been developed for predicting RNA-interacting residues in proteins, but limited efforts have been made to identify protein-interacting nucleotides in RNAs. To discriminate protein-interacting from non-interacting nucleotides, we used various classifiers (NaiveBayes, NaiveBayesMultinomial, BayesNet, ComplementNaiveBayes, MultilayerPerceptron, J48, SMO, RandomForest and SVM(light)) to develop prediction models using various features, and achieved at best 83.92% sensitivity, 84.82% specificity, 84.62% accuracy and a Matthews correlation coefficient of 0.62 with SVM(light)-based models. We observed that certain tri-nucleotides, such as ACA, ACC, AGA, CAC, CCA, GAG, UGA, and UUU, are preferred in protein interaction. All the models were developed using a non-redundant dataset and evaluated using five-fold cross-validation. A web server called RNApin has been developed for the scientific community (http://crdd.osdd.net/raghava/rnapin/).
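A tri-nucleotide composition of the kind this abstract refers to can be computed as in the Python sketch below. This is a generic illustration, not the RNApin feature pipeline: the function name, the overlapping-window choice, and the handling of `T` and ambiguous symbols are assumptions.

```python
from itertools import product


def trinucleotide_composition(seq, alphabet="ACGU"):
    """Return the relative frequency of each of the 64 tri-nucleotides.

    Overlapping windows of length 3 are counted; windows containing
    symbols outside the alphabet are ignored. DNA-style 'T' is mapped
    to 'U' so that either notation can be passed in.
    """
    seq = seq.upper().replace("T", "U")
    keys = ["".join(p) for p in product(alphabet, repeat=3)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    # Avoid division by zero for sequences shorter than three bases.
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


if __name__ == "__main__":
    comp = trinucleotide_composition("ACACAGAGUUU")
    top = sorted(comp.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(top)
```

The resulting 64-dimensional vector has a fixed length regardless of sequence length, which is what makes composition features convenient as classifier input.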
Affiliation(s)
- Bharat Panwar
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Sector 39A, Chandigarh, India.
| | - Gajendra P S Raghava
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Sector 39A, Chandigarh, India. http://www.imtech.res.in/raghava/
| |
|
232
|
Liu Z, Xiao X, Qiu WR, Chou KC. iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition. Anal Biochem 2015; 474:69-77. [PMID: 25596338 DOI: 10.1016/j.ab.2014.12.009] [Citation(s) in RCA: 212] [Impact Index Per Article: 23.6] [Received: 10/04/2014] [Revised: 12/05/2014] [Accepted: 12/08/2014] [Indexed: 12/11/2022]
Abstract
Predominantly occurring on cytosine, DNA methylation is a process by which cells can modify their DNA to change the expression of gene products. It plays very important roles not only in development but also in the formation of nearly all types of cancer. Therefore, knowledge of DNA methylation sites is significant for both basic research and drug development. Given an uncharacterized DNA sequence containing many cytosine residues, which ones can be methylated and which cannot? With the avalanche of DNA sequences generated during the postgenomic age, computational methods for accurately identifying methylation sites in DNA are highly desirable. Using the trinucleotide composition, pseudo amino acid components, and a dataset-optimizing technique, we have developed a new predictor called "iDNA-Methyl" that achieves remarkably higher success rates in identifying DNA methylation sites than the existing predictors. A user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/iDNA-Methyl, where users can easily get their desired results. We anticipate that the web-server predictor will become a very useful high-throughput tool for basic research and drug development, and that the novel approach and technique can also be used to investigate many other DNA-related problems and genome analysis.
Affiliation(s)
- Zi Liu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
| | - Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China; Gordon Life Science Institute, Boston, MA 02478, USA.
| | - Wang-Ren Qiu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China.
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
|
233
|
Xiao X, Min JL, Lin WZ, Liu Z, Cheng X, Chou KC. iDrug-Target: predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach. J Biomol Struct Dyn 2015; 33:2221-33. [DOI: 10.1080/07391102.2014.998710] [Citation(s) in RCA: 146] [Impact Index Per Article: 16.2] [Indexed: 10/24/2022]
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333046, China
- Information School, ZheJiang Textile & Fashion College, NingBo 315211, China
- Gordon Life Science Institute, 53 South Cottage Road, Boston 02478, MA, USA
| | - Jian-Liang Min
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333046, China
| | - Wei-Zhong Lin
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333046, China
| | - Zi Liu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333046, China
| | - Xiang Cheng
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333046, China
| | - Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Gordon Life Science Institute, 53 South Cottage Road, Boston 02478, MA, USA
| |
|
234
|
Ganjtabesh M, Montaseri S, Zare-Mirakabad F. Using temperature effects to predict the interactions between two RNAs. J Theor Biol 2015; 364:98-102. [PMID: 25218429 DOI: 10.1016/j.jtbi.2014.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2014] [Revised: 08/30/2014] [Accepted: 09/02/2014] [Indexed: 10/24/2022]
Abstract
MOTIVATION The interaction of two RNA molecules is considered an important factor that regulates gene expression in the post-transcriptional process. Most of the ncRNAs prevent the translation of their target mRNA(s) by forming stable bindings with them. Although several computational methods have been proposed to predict the interactions between two RNAs, none of them can produce reliable and accurate results. RESULTS In this paper, a new approach called tempRNAs is presented to accurately predict the interaction structure between two RNAs based on a gradual temperature decrease. For each specified temperature, the algorithm has two main steps. First, the secondary structure of each RNA is determined using the previous base pairs as constraints. Second, the two RNAs are concatenated and the interaction between them is calculated according to the previous base pairs. The secondary structures are determined based on the minimum free energy model. The proposed algorithm is evaluated on a set of known interacting RNA pairs. The results show that the proposed method is more accurate than other state-of-the-art algorithms, namely inRNAs and RactIP.
Affiliation(s)
- Mohammad Ganjtabesh
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran.
| | - Soheila Montaseri
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran.
| | - Fatemeh Zare-Mirakabad
- Department of Computer Science, Faculty of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran.
| |
|
235
|
Kumar R, Srivastava A, Kumari B, Kumar M. Prediction of β-lactamase and its class by Chou’s pseudo-amino acid composition and support vector machine. J Theor Biol 2015; 365:96-103. [DOI: 10.1016/j.jtbi.2014.10.008] [Citation(s) in RCA: 125] [Impact Index Per Article: 13.9] [Received: 08/05/2014] [Revised: 10/01/2014] [Accepted: 10/06/2014] [Indexed: 01/01/2023]
|
236
|
Chen W, Lin H, Chou KC. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. Mol Biosyst 2015; 11:2620-34. [DOI: 10.1039/c5mb00155b] [Citation(s) in RCA: 262] [Impact Index Per Article: 29.1] [Indexed: 11/21/2022]
Abstract
With the avalanche of DNA/RNA sequences generated in the post-genomic age, it is urgent to develop automated methods for analyzing the relationship between the sequences and their functions.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
| | - Hao Lin
- Gordon Life Science Institute, Boston, USA
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics
| | - Kuo-Chen Chou
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
| |
|
237
|
Ruiz-Blanco YB, Marrero-Ponce Y, Prieto PJ, Salgado J, García Y, Sotomayor-Torres CM. A Hooke's law-based approach to protein folding rate. J Theor Biol 2015; 364:407-17. [DOI: 10.1016/j.jtbi.2014.09.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/04/2014] [Revised: 08/28/2014] [Accepted: 09/02/2014] [Indexed: 10/24/2022]
|
238
|
Liu B, Fang L, Chen J, Liu F, Wang X. miRNA-dis: microRNA precursor identification based on distance structure status pairs. Mol Biosyst 2015; 11:1194-204. [DOI: 10.1039/c5mb00050e] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 01/27/2023]
Abstract
MicroRNA precursor identification is an important task in bioinformatics.
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Shenzhen, China
| | - Longyun Fang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Shenzhen, China
| | - Junjie Chen
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Shenzhen, China
| | - Fule Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Shenzhen, China
| | - Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Shenzhen, China
| |
|
239
|
Bag S, Ramaiah S, Anbarasu A. fabp4 is central to eight obesity associated genes: A functional gene network-based polymorphic study. J Theor Biol 2015; 364:344-54. [DOI: 10.1016/j.jtbi.2014.09.034] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 06/17/2014] [Revised: 08/27/2014] [Accepted: 09/23/2014] [Indexed: 01/04/2023]
|
240
|
Zhu PP, Li WC, Zhong ZJ, Deng EZ, Ding H, Chen W, Lin H. Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. Mol Biosyst 2015; 11:558-63. [DOI: 10.1039/c4mb00645c] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Indexed: 11/21/2022]
Abstract
Mycobacterium tuberculosis is a bacterium that causes tuberculosis, one of the most prevalent infectious diseases.
Affiliation(s)
- Pan-Pan Zhu
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - Wen-Chao Li
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - Zhe-Jin Zhong
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - En-Ze Deng
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| |
|
241
|
Li L, Yu S, Xiao W, Li Y, Hu W, Huang L, Zheng X, Zhou S, Yang H. Protein submitochondrial localization from integrated sequence representation and SVM-based backward feature extraction. Mol Biosyst 2015; 11:170-7. [PMID: 25335193 DOI: 10.1039/c4mb00340c] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
The mitochondrion, a tiny energy factory, plays an important role in various biological processes in most eukaryotic cells.
Affiliation(s)
- Liqi Li: Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Sanjiu Yu: Institute of Cardiovascular Diseases of PLA, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Weidong Xiao: Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Yongsheng Li: Institute of Cancer, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Wenjuan Hu: Department of Pathophysiology and High Altitude Pathology, College of High Altitude Military Medicine, Third Military Medical University, Chongqing 400038, China
- Lan Huang: Institute of Cardiovascular Diseases of PLA, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Xiaoqi Zheng: Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
- Shiwen Zhou: National Drug Clinical Trial Institution, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Hua Yang: Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
|
242
|
Khan ZU, Hayat M, Khan MA. Discrimination of acidic and alkaline enzyme using Chou’s pseudo amino acid composition in conjunction with probabilistic neural network model. J Theor Biol 2015; 365:197-203. [DOI: 10.1016/j.jtbi.2014.10.014] [Citation(s) in RCA: 110] [Impact Index Per Article: 12.2] [Received: 07/02/2014] [Revised: 09/09/2014] [Accepted: 10/11/2014] [Indexed: 12/11/2022]
|
243
|
Liu B, Liu F, Fang L, Wang X, Chou KC. repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics 2014; 31:1307-9. [PMID: 25504848 DOI: 10.1093/bioinformatics/btu820] [Citation(s) in RCA: 203] [Impact Index Per Article: 20.3] [Received: 10/01/2014] [Accepted: 12/05/2014] [Indexed: 12/29/2022]
Abstract
In developing powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent the DNA sequences. To facilitate the studies of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups composed of 15 features. The first group calculates three nucleic acid composition features describing the local sequence information by means of k-mers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated with repDNA from both built-in and user-defined properties. Availability and implementation: The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. Contact: bliu@insun.hit.edu.cn or kcchou@gordonlifescience.org. Supplementary information: Supplementary data are available at Bioinformatics online.
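The first two feature groups named in this abstract can be illustrated with a minimal, standard-library sketch. It does not use or reproduce the repDNA API: the function names `kmer_composition` and `dinucleotide_autocorrelation` and the `TOY_PROPERTY` table (GC count per dinucleotide) are hypothetical, chosen only for illustration, and the autocorrelation shown is one common mean-centred form rather than the package's exact definition.

```python
from itertools import product

ALPHABET = "ACGT"

# Hypothetical "physicochemical" property: number of G/C bases in a dinucleotide.
TOY_PROPERTY = {a + b: (a in "GC") + (b in "GC") for a in ALPHABET for b in ALPHABET}


def kmer_composition(seq, k=2):
    """Return normalized frequencies of all 4**k k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def dinucleotide_autocorrelation(seq, prop, lag=1):
    """Mean product of mean-centred property values of dinucleotides `lag` apart."""
    seq = seq.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n = len(values) - lag
    if n <= 0:
        raise ValueError("sequence too short for this lag")
    mean = sum(values) / len(values)
    return sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n)) / n


if __name__ == "__main__":
    example = "GACTGAACTGCACTTTGGTTTCATATTATTTGCTC"
    print(kmer_composition(example, k=2))
    print(dinucleotide_autocorrelation(example, TOY_PROPERTY, lag=2))
```

The k-mer vector has a fixed length of 4^k regardless of sequence length, which is what makes it usable as input to a conventional classifier; the autocorrelation value adds one number per property and lag that depends on the order of the dinucleotides, not only on their counts.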
Affiliation(s)
- Bin Liu: School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Gordon Life Science Institute, Belmont, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Fule Liu: School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Gordon Life Science Institute, Belmont, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Longyun Fang: School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Gordon Life Science Institute, Belmont, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Xiaolong Wang: School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Gordon Life Science Institute, Belmont, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Kuo-Chen Chou: School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Gordon Life Science Institute, Belmont, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
|
244
|
Chen J, Tang YY, Chen CLP, Fang B, Lin Y, Shang Z. Multi-Label Learning With Fuzzy Hypergraph Regularization for Protein Subcellular Location Prediction. IEEE Trans Nanobioscience 2014; 13:438-47. [DOI: 10.1109/tnb.2014.2341111] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
|
245
|
Wen J, Zhang Y, Yau SS. k-mer Sparse matrix model for genetic sequence and its applications in sequence comparison. J Theor Biol 2014; 363:145-50. [DOI: 10.1016/j.jtbi.2014.08.028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/06/2014] [Revised: 07/14/2014] [Accepted: 08/17/2014] [Indexed: 10/24/2022]
|
246
|
3D QSAR studies, pharmacophore modeling and virtual screening on a series of steroidal aromatase inhibitors. Int J Mol Sci 2014; 15:20927-47. [PMID: 25405729 PMCID: PMC4264204 DOI: 10.3390/ijms151120927] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 07/27/2014] [Revised: 09/28/2014] [Accepted: 10/22/2014] [Indexed: 12/12/2022] Open
Abstract
Aromatase inhibitors are the most important targets in the treatment of estrogen-dependent cancers. To search for potent steroidal aromatase inhibitors (SAIs) with fewer side effects and to overcome cellular resistance, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on a series of SAIs to build 3D QSAR models. Reliable and predictive CoMFA and CoMSIA models were obtained, with the following statistics (CoMFA: q2 = 0.636, r2ncv = 0.988, r2pred = 0.658; CoMSIA: q2 = 0.843, r2ncv = 0.989, r2pred = 0.601). This 3D QSAR approach provides significant insights that can be used to develop novel and potent SAIs. In addition, the genetic algorithm with linear assignment of hypermolecular alignment of database (GALAHAD) was used to derive 3D pharmacophore models. The selected pharmacophore model contains two acceptor atoms and four hydrophobic centers, and was used as a 3D query for virtual screening against the NCI2000 database. Six hit compounds were obtained and their biological activities were further predicted by the CoMFA and CoMSIA models, which are expected to aid the design of potent and novel SAIs.
|
247
|
Tian F, Zhou P, Kang W, Luo L, Fan X, Yan J, Liang H. The small-molecule inhibitor selectivity between IKKα and IKKβ kinases in NF-κB signaling pathway. J Recept Signal Transduct Res 2014; 35:307-18. [PMID: 25386663 DOI: 10.3109/10799893.2014.980950] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
The enzyme complex IκB kinase (IKK) is an essential activator of the NF-κB signaling pathway involved in propagating the cellular response to inflammation. The complex contains two functional subunits, IKKα and IKKβ, which are structurally conserved kinases; selective inhibition of each would result in distinct biological effects. However, most existing IKK inhibitors show moderate or high promiscuity for the two homologous kinases. Understanding the molecular mechanism and biological implications underlying the specific interactions in IKK-ligand recognition is thus fundamentally important for the rational design of selective IKK inhibitors. In the current work, we integrated molecular docking, quantum mechanics/molecular mechanics calculation and Poisson-Boltzmann/surface area analysis to investigate the structural basis and energetic properties of the selective binding of small-molecule ligands to IKKα and IKKβ. It was found that the selectivity is primarily determined by the size and topology difference in the ATP-binding pockets of the IKKα and IKKβ kinase domains; bulky inhibitor molecules commonly have, respectively, low and appropriate affinities towards IKKα and IKKβ, and thus exhibit relatively high selectivity for IKKβ over IKKα, whereas small ligands can only bind weakly to both kinases with low selectivity. In addition, the conformation, arrangement and distribution of residues in IKK pockets are also responsible for the exquisite specificity of ligand binding to IKKα and IKKβ. Next, a novel quantitative structure-selectivity relationship model was developed to characterize the relative contribution of each kinase residue to inhibitor selectivity and to predict the selectivity and specificity for a number of known IKK inhibitors.
Results showed that the active-site residues contribute significantly to the selectivity by directly interacting with inhibitor ligands, while those protein portions far away from the kinase active sites may also play an important role in determining the selectivity through long-range non-bonded forces and indirect allosteric effect.
Affiliation(s)
- Feifei Tian: State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, Third Military Medical University, Chongqing, China; School of Life Science and Engineering, Southwest Jiaotong University, Chengdu, China
- Peng Zhou: Center of Bioinformatics (COBI), School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Wenyuan Kang: School of Life Science and Engineering, Southwest Jiaotong University, Chengdu, China
- Li Luo: State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, Third Military Medical University, Chongqing, China
- Xia Fan: State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, Third Military Medical University, Chongqing, China
- Jun Yan: State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, Third Military Medical University, Chongqing, China
- Huaping Liang: State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, Third Military Medical University, Chongqing, China
|
248
|
Ding H, Li D. Identification of mitochondrial proteins of malaria parasite using analysis of variance. Amino Acids 2014; 47:329-33. [DOI: 10.1007/s00726-014-1862-4] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 09/29/2014] [Accepted: 10/27/2014] [Indexed: 10/24/2022]
|
249
|
Qiu WR, Xiao X, Lin WZ, Chou KC. iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model. J Biomol Struct Dyn 2014; 33:1731-42. [PMID: 25248923 DOI: 10.1080/07391102.2014.968875] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Indexed: 10/24/2022]
Abstract
As one of the most important posttranslational modifications (PTMs), ubiquitination plays an important role in regulating a variety of biological processes, such as signal transduction, cell division, apoptosis, and immune response. Ubiquitination is also named "lysine ubiquitination" because it occurs when a ubiquitin is covalently attached to lysine (K) residues of target proteins. Given an uncharacterized protein sequence that contains many lysine residues, which of them are ubiquitination sites and which are not? With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable for both basic research and drug development to develop an automated method for rapidly and accurately annotating the ubiquitination sites in proteins. In view of this, a new predictor called "iUbiq-Lys" was developed based on evolutionary information, a gray system model, and the general form of pseudo-amino acid composition. Rigorous cross-validation demonstrated that the new predictor remarkably outperformed all its counterparts. As a web-server, iUbiq-Lys is accessible to the public at http://www.jci-bioinfo.cn/iUbiq-Lys. For the convenience of most experimental scientists, we have further provided a step-by-step guide by which users can easily get their desired results without needing to follow the complicated mathematics, which are presented in this paper only for the integrity of the development process.
Affiliation(s)
- Wang-Ren Qiu: Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
|
250
|
Liu L, Zhang SW, Zhang YC, Liu H, Zhang L, Chen R, Huang Y, Meng J. Decomposition of RNA methylome reveals co-methylation patterns induced by latent enzymatic regulators of the epitranscriptome. Mol Biosyst 2014; 11:262-74. [PMID: 25370990 DOI: 10.1039/c4mb00604f] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/18/2022]
Abstract
Biochemical modifications to mRNA, especially N6-methyladenosine (m6A) and 5-methylcytosine (m5C), have recently been shown to be associated with crucial biological functions. Despite these intriguing advances, little is known so far about the dynamic landscape of the RNA methylome across different cell types and how the epitranscriptome is regulated at the system level by enzymes, i.e., RNA methyltransferases and demethylases. To investigate this issue, a meta-analysis of m6A MeRIP-Seq datasets collected from 10 different experimental conditions (cell type/tissue or treatment) is performed, and the combinatorial epitranscriptome, which consists of 42 758 m6A sites, is extracted and divided into 3 clusters, in which the methylation sites are likely to be hyper- or hypo-methylated simultaneously (or co-methylated), indicating the sharing of a common methylation regulator. Four different clustering approaches are used to unveil the co-methylation patterns: K-means, hierarchical clustering (HC), Bayesian factor regression model (BFRM) and nonnegative matrix factorization (NMF). To validate whether the patterns correspond to enzymatic regulators, i.e., RNA methyltransferases or demethylases, the target sites of a known m6A regulator, fat mass and obesity-associated protein (FTO), are identified from an independent mouse MeRIP-Seq dataset and lifted to human. Our study shows that 3 out of the 4 clustering approaches used can successfully identify a group of methylation sites overlapping with FTO target sites at a significance level of 0.05 (after multiple hypothesis adjustment), among which the result of NMF is the most significant (p-value 2.81×10⁻⁶). We defined a new approach for evaluating the consistency between two clustering results, which shows that the clustering results of different methods are highly correlated, strongly indicating the existence of co-methylation patterns.
Consistent with recent studies, a number of cancer and neuronal disease-related biomolecular functions are enriched in the identified clusters; these are biological functions that can be regulated at the epitranscriptional level, indicating the pharmaceutical prospects of RNA N6-methyladenosine-related studies. This result successfully reveals the linkage between the global RNA co-methylation patterns embedded in the epitranscriptomic data under multiple experimental conditions and the latent enzymatic regulators, suggesting a promising direction towards a more comprehensive understanding of the epitranscriptome.
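The validation step in this abstract, checking whether a cluster of methylation sites overlaps a known regulator's target sites more than expected by chance, can be sketched as follows. The abstract does not say which test or which multiple-testing adjustment was used, so the one-sided hypergeometric test and the Bonferroni correction below are assumptions made purely for illustration. The function names are hypothetical, and apart from the 42 758 sites mentioned in the abstract, the cluster size, target count and overlap in the usage lines are made-up numbers.

```python
from math import comb


def overlap_p_value(universe, cluster, targets, overlap):
    """One-sided hypergeometric tail: P(X >= overlap).

    universe: total number of sites considered
    cluster:  number of sites in the cluster being tested
    targets:  number of sites that are known regulator targets
    overlap:  observed number of cluster sites that are also targets
    """
    upper = min(cluster, targets)
    total = comb(universe, cluster)
    tail = sum(
        comb(targets, x) * comb(universe - targets, cluster - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value (assumed adjustment, capped at 1)."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Hypothetical counts; only the universe size comes from the abstract.
    p = overlap_p_value(universe=42758, cluster=300, targets=500, overlap=12)
    print(p, bonferroni(p, n_tests=3))
```

Testing each of the 3 clusters against the same target set is what motivates an adjustment of this kind; any other family-wise or false-discovery correction could be substituted without changing the overlap test itself.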
Affiliation(s)
- Lian Liu: Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710027, China.
|