1
Zheng J, Xiao X, Qiu WR. iCDI-W2vCom: Identifying the Ion Channel-Drug Interaction in Cellular Networking Based on word2vec and node2vec. Front Genet 2021; 12:738274. [PMID: 34567088 PMCID: PMC8458815 DOI: 10.3389/fgene.2021.738274] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/08/2021] [Accepted: 08/02/2021] [Indexed: 12/04/2022]
Abstract
Ion channels are the second-largest family of drug targets. Ion channel dysfunction may lead to a number of diseases, such as Alzheimer’s disease, epilepsy, cephalalgia, and type II diabetes. For predicting ion channel–drug interactions, computational approaches are effective and efficient compared with costly, labor-intensive, and time-consuming experimental methods. Most existing methods can only handle ion channels with known 3D structures; however, the 3D structures of most ion channels are still unknown. Many sequence-based predictors have been developed to address this challenge, but most of their results still need improvement, or they lack prediction web servers. In this paper, a sequence-based classifier called “iCDI-W2vCom” was developed to identify interactions between ion channels and drugs. In the predictor, the drug compound was formulated by SMILES-word2vec, FP2-word2vec, SMILES-node2vec, and ECFPs as a 1184D vector, the ion channel was represented by word2vec as a 64D vector, and the prediction engine was the LightGBM classifier. The accuracy and AUC achieved by iCDI-W2vCom under fivefold cross-validation were 91.95% and 0.9703, outperforming other existing predictors in this area. A user-friendly web server for iCDI-W2vCom was established at http://www.jci-bioinfo.cn/icdiw2v. The proposed method may also be useful for predicting other target–drug interactions.
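The abstract describes concatenating a 1184D drug vector with a 64D word2vec ion-channel vector before classification. The sketch below shows one plausible way to build such a pair vector, assuming (hypothetically) that the protein is tokenized into overlapping 3-mers and that the 64D protein vector is the mean of its word vectors. The embedding table is random toy data rather than a trained word2vec model, the drug vector is a zero placeholder, and the helper names are illustrative, not the iCDI-W2vCom code.

```python
import random

def protein_words(seq, k=3):
    """Split a protein sequence into overlapping k-mer 'words' (k=3 is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def average_embedding(words, table, dim):
    """Average the vectors of words present in the embedding table (others are skipped)."""
    acc, n = [0.0] * dim, 0
    for word in words:
        vec = table.get(word)
        if vec is None:
            continue
        acc = [a + v for a, v in zip(acc, vec)]
        n += 1
    return [a / n for a in acc] if n else acc

def pair_vector(drug_vec, protein_vec):
    """Concatenate drug and ion-channel features into one classifier input."""
    return list(drug_vec) + list(protein_vec)

if __name__ == "__main__":
    rng = random.Random(0)
    words = protein_words("MKTAYIAKQRQISFVKSHFSRQ")
    # Toy 64D table standing in for a trained word2vec model (hypothetical values).
    table = {w: [rng.uniform(-1.0, 1.0) for _ in range(64)] for w in dict.fromkeys(words)}
    protein_vec = average_embedding(words, table, 64)
    drug_vec = [0.0] * 1184  # placeholder for the 1184D drug descriptor
    print(len(pair_vector(drug_vec, protein_vec)))  # 1248
```

In the paper's pipeline, vectors like these would then be passed to a LightGBM classifier; that step is left out so the sketch has no third-party dependencies.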
Affiliation(s)
- Jie Zheng
- Department of Computer Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- Department of Computer Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Wang-Ren Qiu
- Department of Computer Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
2
Recent Advances in the Prediction of Protein Structural Classes: Feature Descriptors and Machine Learning Algorithms. CRYSTALS 2021. [DOI: 10.3390/cryst11040324] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/12/2023]
Abstract
In the postgenomic age, rapid growth in the number of sequence-known proteins has been accompanied by much slower growth in the number of structure-known proteins (as a result of experimental limitations), and a widening gap between the two is evident. Because protein function is linked to protein structure, successful prediction of protein structure is of significant importance in protein function identification. Foreknowledge of protein structural class can help improve protein structure prediction with significant medical and pharmaceutical implications. Thus, a fast, suitable, reliable, and reasonable computational method for protein structural class prediction has become pivotal in bioinformatics. Here, we review recent efforts in protein structural class prediction from protein sequence, with particular attention paid to new feature descriptors, which extract information from protein sequence, and the use of machine learning algorithms in both feature selection and the construction of new classification models. These new feature descriptors include amino acid composition, sequence order, physicochemical properties, multiprofile Bayes, and secondary structure-based features. Machine learning methods, such as artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), random forest, deep learning, and examples of their application are discussed in detail. We also present our view on possible future directions, challenges, and opportunities for the applications of machine learning algorithms for prediction of protein structural classes.
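Amino acid composition, the first descriptor listed above, is the simplest sequence-based feature: a 20D vector of residue frequencies. A minimal sketch follows; ignoring non-standard letters such as X or B is an assumption made here, not a rule from the review.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """20D amino acid composition: fraction of each standard residue in seq."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:  # non-standard letters are skipped
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```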
3
iDRP-PseAAC: Identification of DNA Replication Proteins Using General PseAAC and Position Dependent Features. Int J Pept Res Ther 2021; 27:1315-1329. [PMID: 33584161 PMCID: PMC7869428 DOI: 10.1007/s10989-021-10170-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/18/2021] [Indexed: 10/25/2022]
Abstract
DNA replication is an essential process in all living organisms, particularly eukaryotes. The prevalence of DNA replication is significant for an evolutionary transition at the beginning of life. DNA replication proteins are those that support the replication process, and they are also reported to be important in drug design and discovery. DNA replication proteins therefore play a very important role in the human body, and studying their mechanism first requires identifying them. Experimental identification, however, is time-consuming, costly, and laborious. To cope with this issue, a computational method is required for predicting these proteins, yet no prior method exists. This study constructs a novel prediction model for this purpose. The model is based on an artificial neural network trained on PseAAC features that integrate position-relative features and sequence statistical moments. The highest overall accuracies, obtained with tenfold cross-validation and the jackknife test, were 96.22% and 98.56%, respectively. The experimental results demonstrate that the proposed predictor surpasses existing models and can serve as a time- and cost-effective strategy for designing novel drugs against contemporary bacterial infections.
4
Abstract
During the last three decades or so, many efforts have been made to study the protein cleavage sites of disease-causing enzymes, such as HIV (human immunodeficiency virus) protease and the SARS (severe acute respiratory syndrome) coronavirus main proteinase. This mini-review makes it increasingly clear that the motivation driving these studies is well founded and that their results are very rewarding, particularly for developing peptide drugs.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
5
Chou KC. An Insightful 10-year Recollection Since the Emergence of the 5-steps Rule. Curr Pharm Des 2020; 25:4223-4234. [PMID: 31782354 DOI: 10.2174/1381612825666191129164042] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/18/2019] [Accepted: 11/25/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE One of the most challenging problems is how to formulate a biological sequence as a vector while largely retaining its sequence-order information. METHODS To address this problem, the approach of Pseudo Amino Acid Components, or PseAAC, has been developed. RESULTS AND CONCLUSION This 10-year recollection makes it increasingly clear that the proposal has indeed been very powerful.
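As a rough illustration of how PseAAC retains some sequence-order information, here is a sketch of its type-1 (parallel-correlation) form: 20 normalized composition terms plus λ correlation factors θ_j, where θ_j averages the squared difference of standardized residue properties between residues j positions apart, weighted by w. Chou's original formulation used hydrophobicity, hydrophilicity, and side-chain mass; here the caller supplies the property tables, the defaults λ = 3 and w = 0.05 are illustrative choices, and the toy property used for testing is made up.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def standardize(prop):
    """Rescale one residue property to zero mean and unit population std over the 20 residues."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mu, sd = statistics.fmean(vals), statistics.pstdev(vals)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}

def pseaac(seq, props, lam=3, w=0.05):
    """Type-1 PseAAC: 20 composition terms followed by lam correlation terms."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    if not 0 < lam < n:
        raise ValueError("need 0 < lam < sequence length")
    scaled = [standardize(p) for p in props]

    def corr(a, b):
        # Mean squared difference over all supplied (standardized) properties.
        return sum((h[b] - h[a]) ** 2 for h in scaled) / len(scaled)

    # theta_j: average correlation between residues j positions apart.
    thetas = [
        sum(corr(residues[i], residues[i + j]) for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

By construction the 20 + λ components sum to 1, so the vector length is fixed regardless of protein length.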
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, Massachusetts 02478, United States; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
6
Chou KC. Impacts of Pseudo Amino Acid Components and 5-steps Rule to Proteomics and Proteome Analysis. Curr Top Med Chem 2019; 19:2283-2300. [DOI: 10.2174/1568026619666191018100141] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/07/2019] [Revised: 08/18/2019] [Accepted: 08/26/2019] [Indexed: 01/27/2023]
Abstract
Stimulated by the 5-steps rule during the last decade or so, computational proteomics has made remarkable progress in three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; and (3) post-translational modification (PTM) site prediction. The results of these predictions are useful not only for an in-depth study of protein functions and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancer, Alzheimer’s, and Parkinson’s. Moreover, since the targets to be predicted may have multi-label features, two sets of metrics are introduced: one for inspecting the global prediction quality and the other for the local prediction quality. All the predictors covered in this review have a user-friendly web server, through which most experimental scientists can easily obtain their desired data without going through the complicated mathematics.
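The abstract does not spell out the metrics. One widely used set of example-based multi-label metrics in this line of work is aiming, coverage, accuracy, absolute-true, and absolute-false; whether these correspond exactly to the "global" set meant here is an assumption. A sketch, taking the true and predicted label sets for each of N samples and the total number of labels M:

```python
def multilabel_metrics(true_sets, pred_sets, num_labels):
    """Example-based multi-label metrics averaged over N samples.

    true_sets[k] and pred_sets[k] are Python sets of label ids; num_labels is M.
    """
    n = len(true_sets)
    totals = dict.fromkeys(
        ["aiming", "coverage", "accuracy", "absolute_true", "absolute_false"], 0.0
    )
    for t, p in zip(true_sets, pred_sets):
        inter, union = len(t & p), len(t | p)
        totals["aiming"] += inter / len(p) if p else 0.0      # precision-like
        totals["coverage"] += inter / len(t) if t else 0.0    # recall-like
        # Convention assumed here: two empty label sets count as a perfect match.
        totals["accuracy"] += inter / union if union else 1.0
        totals["absolute_true"] += 1.0 if t == p else 0.0     # exact-match rate
        totals["absolute_false"] += (union - inter) / num_labels  # Hamming-loss-like
    return {name: value / n for name, value in totals.items()}
```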
Affiliation(s)
- Kuo-Chen Chou
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
7
Chou KC. Proposing Pseudo Amino Acid Components is an Important Milestone for Proteome and Genome Analyses. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09910-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/28/2022]
8
Terán JE, Marrero-Ponce Y, Contreras-Torres E, García-Jacas CR, Vivas-Reyes R, Terán E, Torres FJ. Tensor Algebra-based Geometrical (3D) Biomacro-Molecular Descriptors for Protein Research: Theory, Applications and Comparison with other Methods. Sci Rep 2019; 9:11391. [PMID: 31388082 PMCID: PMC6684663 DOI: 10.1038/s41598-019-47858-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/22/2019] [Accepted: 07/22/2019] [Indexed: 11/16/2022]
Abstract
In this report, a new type of tridimensional (3D) biomacro-molecular descriptor for proteins is proposed. These descriptors use multi-linear algebra concepts based on 3-linear forms (i.e., Canonical Trilinear (Tr), Trilinear Cubic (TrC), Trilinear-Quadratic-Bilinear (TrQB), and so on) as a specific case of N-linear algebraic forms. The kth 3-tuple similarity-dissimilarity spatial matrices (tensor form) are used to transform and represent the chemical information available in the relationships among three amino acids of a protein. Several metrics (Minkowski-type, wave-edge, etc.) and multi-metrics (triangle area, bond angle, etc.) are proposed for extracting interaction information, as well as probabilistic transformations (e.g., simple stochastic and mutual probability) for matrix normalization. A generalized procedure is proposed for descriptor calculation, in which amino-acid-level indices are fused using aggregation operators. The results demonstrate that the newly proposed 3D biomacro-molecular indices perform better than other approaches in SCOP-based discrimination and in predicting protein folding rates with simple linear parametric models. It can be concluded that the proposed method defines 3D biomacro-molecular descriptors containing orthogonal information capable of providing better models for applications in protein science.
Affiliation(s)
- Julio E Terán
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador; Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, and Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- Yovani Marrero-Ponce
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador; Universidad de San Buenaventura - Cartagena - Facultad de Ciencias de la Salud - Grupo de Investigación Microbiología & Ambiente (GIMA) - Calle Real de Ternera, Diagonal 32, No. 30-966, Cartagena, Código postal: 1300 10, Colombia
- Ernesto Contreras-Torres
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- César R García-Jacas
- Cátedras CONACYT - Departamento de Ciencia de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
- Ricardo Vivas-Reyes
- Grupo de Química Cuántica y Teórica de la Universidad de Cartagena-Facultad de Ciencias Exactas y Naturales. Programa de Química. Campus de San Pablo and Grupo GINUMED Corporacion Universitaria Rafael Nuñez. Facultad de Salud. Programa de Medicina, Cartagena, Colombia; Grupo CipTec, Facultad de Ingenierias. Fundacion Universitaria Tecnologico Comfenalco - Cartagena, Cartagena, Bolívar, Colombia
- Enrique Terán
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- F Javier Torres
- Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, and Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
9
Contreras-Torres E. Predicting structural classes of proteins by incorporating their global and local physicochemical and conformational properties into general Chou's PseAAC. J Theor Biol 2018; 454:139-145. [DOI: 10.1016/j.jtbi.2018.05.033] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 04/25/2018] [Revised: 05/23/2018] [Accepted: 05/28/2018] [Indexed: 11/24/2022]
10
Nojoomi S, Koehl P. String kernels for protein sequence comparisons: improved fold recognition. BMC Bioinformatics 2017; 18:137. [PMID: 28245816 PMCID: PMC5331664 DOI: 10.1186/s12859-017-1560-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/21/2016] [Accepted: 02/23/2017] [Indexed: 11/28/2022]
Abstract
BACKGROUND The amino acid sequence of a protein is the blueprint from which its structure, and ultimately its function, can be derived. Therefore, sequence comparison methods remain essential for determining similarity between proteins. Traditional approaches for comparing two protein sequences begin with the strings of letters (amino acids) that represent the sequences, generate textual alignments between these strings, and score each alignment. When the similarity between the two sequences is low, however, the quality of the corresponding alignment is usually poor, leading to poor recognition of similarity. RESULTS In this study, we develop an alignment-free alternative to these methods based on the concept of string kernels. Starting from recently proposed kernels on the discrete space of protein sequences (Shen et al, Found. Comput. Math., 2013,14:951-984), we introduce our own version, SeqKernel. Its implementation depends on two parameters: a coefficient that tunes the substitution matrix and the maximum length of the k-mers it includes. We provide an exhaustive analysis of the impact of these two parameters on the performance of SeqKernel for fold recognition. We show that, with the right choice of parameters, the SeqKernel similarity measure improves fold recognition compared with traditional alignment-based methods. We illustrate the application of SeqKernel to inferring the phylogeny of RNA polymerases and show that it performs as well as methods based on multiple sequence alignments. CONCLUSION We have presented and characterized a new alignment-free method based on a mathematical kernel for scoring the similarity of protein sequences. We discuss possible improvements of this method, as well as an extension of its applications to other modeling methods that rely on sequence comparison.
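SeqKernel itself combines k-mers with a tunable substitution matrix, which is not reproduced here. As a simpler stand-in that shows the alignment-free idea, the sketch below implements the classic k-mer spectrum kernel (exact k-mer matches only, summed over k = 1..max_k) with cosine normalization; the max_k default of 3 is arbitrary.

```python
import math
from collections import Counter

def spectrum(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x, y, max_k=3):
    """Sum over k = 1..max_k of the dot product of k-mer count vectors."""
    total = 0
    for k in range(1, max_k + 1):
        sx, sy = spectrum(x, k), spectrum(y, k)
        total += sum(count * sy[kmer] for kmer, count in sx.items())
    return total

def normalized_kernel(x, y, max_k=3):
    """Cosine-normalized kernel, so that K(x, x) = 1 for any non-empty x."""
    return spectrum_kernel(x, y, max_k) / math.sqrt(
        spectrum_kernel(x, x, max_k) * spectrum_kernel(y, y, max_k)
    )
```

No alignment is computed at any point: similarity comes only from shared k-mer counts, which is why such kernels stay usable when sequence identity is too low for a reliable alignment.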
Affiliation(s)
- Saghi Nojoomi
- Biotechnology program, University of California, Davis, 1, Shields Avenue, Davis, CA, 95616 USA
- Patrice Koehl
- Department of Computer Science and Genome Center, 1, Shields Avenue, Davis, CA, 95616 USA
11
Liu B, Wu H, Chou KC. Pse-in-One 2.0: An Improved Package of Web Servers for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences. ACTA ACUST UNITED AC 2017. [DOI: 10.4236/ns.2017.94007] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Indexed: 11/20/2022]
12
Marrero-Ponce Y, Contreras-Torres E, García-Jacas CR, Barigye SJ, Cubillán N, Alvarado YJ. Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes. J Theor Biol 2015; 374:125-37. [DOI: 10.1016/j.jtbi.2015.03.026] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/17/2014] [Revised: 02/23/2015] [Accepted: 03/20/2015] [Indexed: 12/11/2022]
13
Chen W, Lin H, Chou KC. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. MOLECULAR BIOSYSTEMS 2015; 11:2620-34. [DOI: 10.1039/c5mb00155b] [Citation(s) in RCA: 262] [Impact Index Per Article: 29.1] [Indexed: 11/21/2022]
Abstract
With the avalanche of DNA/RNA sequences generated in the post-genomic age, it is urgent to develop automated methods for analyzing the relationship between the sequences and their functions.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
- Hao Lin
- Gordon Life Science Institute, Boston, USA; Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics
- Kuo-Chen Chou
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
14
Qin SW, Li Z, Jin Y, Zhang SP. Shape similarity comparison of protein CPK models based on improved L₁-medial skeleton. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2014; 25:747-759. [PMID: 25079211 DOI: 10.1080/1062936x.2014.942696] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/03/2023]
Abstract
We propose a new method to analyse the similarity of protein CPK models. In the proposed method we first construct the skeleton of protein models by an improved L1-medial skeleton extraction. The skeleton information is then used to form a local radius descriptor. Finally, the shape similarity of protein models is compared by using the local radius descriptor based on the absolute degree of grey incidence. Experimental results show that the improved L1-medial skeleton of protein models can describe the shapes of the protein models well. The local descriptor based on the skeleton combined with the absolute degree of grey incidence shows satisfactory performance for comparing the shape similarity of protein CPK models.
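The absolute degree of grey incidence mentioned above is a standard grey-system measure. In its usual textbook form, each sequence x_1..x_n is shifted to start at zero, summarized as s = Σ_{k=2}^{n-1} (x_k − x_1) + (x_n − x_1)/2, and two sequences are compared as ε = (1 + |s_0| + |s_i|) / (1 + |s_0| + |s_i| + |s_i − s_0|). The sketch applies this directly to numeric sequences; how the paper derives those sequences from the skeleton-based local radius descriptor is not shown, so the inputs here are hypothetical.

```python
def zero_start_sum(x):
    """s = sum_{k=2}^{n-1} (x_k - x_1) + (x_n - x_1) / 2 for a sequence x_1..x_n."""
    z = [v - x[0] for v in x]
    return sum(z[1:-1]) + 0.5 * z[-1]

def absolute_grey_incidence(x0, xi):
    """Absolute degree of grey incidence between two equal-length sequences (in (0, 1])."""
    if len(x0) != len(xi) or len(x0) < 2:
        raise ValueError("sequences must have equal length >= 2")
    s0, si = zero_start_sum(x0), zero_start_sum(xi)
    base = 1 + abs(s0) + abs(si)
    return base / (base + abs(si - s0))
```

Because both sequences are shifted to start at zero, the measure ignores a constant offset and responds only to differences in shape, which suits a shape-similarity comparison.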
Affiliation(s)
- S W Qin
- College of Science, Zhejiang Sci-Tech University, Hangzhou, China
15
Ding S, Yan S, Qi S, Li Y, Yao Y. A protein structural classes prediction method based on PSI-BLAST profile. J Theor Biol 2014; 353:19-23. [DOI: 10.1016/j.jtbi.2014.02.034] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 12/16/2013] [Revised: 01/27/2014] [Accepted: 02/24/2014] [Indexed: 11/27/2022]
16
Nanni L, Lumini A, Brahnam S. An empirical study of different approaches for protein classification. ScientificWorldJournal 2014; 2014:236717. [PMID: 25028675 PMCID: PMC4084589 DOI: 10.1155/2014/236717] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 03/24/2014] [Revised: 05/05/2014] [Accepted: 05/07/2014] [Indexed: 01/05/2023]
Abstract
Many domains would benefit from reliable and efficient systems for automatic protein classification. An area of particular interest in recent studies on automatic protein classification is the exploration of new methods for extracting features from a protein that work well for specific problems. These methods, however, are not generalizable and have proven useful in only a few domains. Our goal is to evaluate several feature extraction approaches for representing proteins by testing them across multiple datasets. Different types of protein representations are evaluated: those starting from the position-specific scoring matrix (PSSM) of the proteins, those derived from the amino acid sequence, two matrix representations, and features taken from the 3D tertiary structure of the protein. We also test new variants of protein descriptors. We develop our system experimentally by comparing and combining different descriptors taken from the protein representations. Each descriptor is used to train a separate support vector machine (SVM), and the results are combined by the sum rule. Some stand-alone descriptors work well on some datasets but not on others. Through fusion, the different descriptors provide good performance across all tested datasets, in some cases better than the state of the art.
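The sum rule used to combine the SVMs adds up each classifier's per-class scores and picks the class with the largest total. Because raw SVM outputs can sit on different scales, the sketch below first min-max normalizes each classifier's scores; that normalization step is an assumption, not necessarily the one used in the paper.

```python
def sum_rule(score_lists):
    """Fuse per-class scores from several classifiers by the sum rule.

    score_lists holds one list of per-class scores per classifier, all in the same
    class order. Returns (fused_scores, index_of_winning_class).
    """
    n_classes = len(score_lists[0])
    fused = [0.0] * n_classes
    for scores in score_lists:
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for c, s in enumerate(scores):
            # Min-max normalize within this classifier (assumed); a flat output adds 0.
            fused[c] += (s - lo) / span if span else 0.0
    best = max(range(n_classes), key=fused.__getitem__)
    return fused, best
```

For example, two of three classifiers favouring class 1 yields fused scores [1.0, 2.0] and a final prediction of class 1.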
Affiliation(s)
- Loris Nanni
- Dipartimento di Ingegneria dell'Informazione, Via Gradenigo 6/A, 35131 Padova, Italy
- Sheryl Brahnam
- Computer Information Systems, Missouri State University, 901 South National, Springfield, MO 65804, USA
17
Wang J, Li Y, Liu X, Dai Q, Yao Y, He P. High-accuracy prediction of protein structural classes using PseAA structural properties and secondary structural patterns. Biochimie 2014; 101:104-12. [PMID: 24412731 DOI: 10.1016/j.biochi.2013.12.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/22/2013] [Accepted: 12/30/2013] [Indexed: 10/25/2022]
Abstract
Since the introduction of PseAAs and functional domains, promising results have been achieved in protein structural class prediction, but challenges remain in representing PseAA structural correlations and structural domains. This paper proposes a high-accuracy prediction method using novel PseAA structural properties and secondary structural patterns, which reflect the long-range and local structural properties of the PseAAs and certain compact structural domains. The proposed method was tested against competing prediction methods in four experiments, and the results indicate that it achieved the best performance. Its overall accuracies on the 25PDB, D640, FC699, and 1189 datasets are 88.8%, 90.9%, 96.4%, and 87.4%, which are 4.5%, 7.6%, 2%, and 3.9% higher, respectively, than those of the best existing method. These findings can guide the development of more powerful methods for protein structural class prediction. The software and supplementary material are freely available at http://bioinfo.zstu.edu.cn/PseAA-SSP.
Affiliation(s)
- Junru Wang
- College of Mechanical Engineering and Automation, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Yan Li
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China.
- Yuhua Yao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Pingan He
- College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
18
Xiao X, Min JL, Wang P, Chou KC. iGPCR-drug: a web server for predicting interaction between GPCRs and drugs in cellular networking. PLoS One 2013; 8:e72234. [PMID: 24015221 PMCID: PMC3754978 DOI: 10.1371/journal.pone.0072234] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Received: 04/25/2013] [Accepted: 07/08/2013] [Indexed: 11/19/2022]
Abstract
Involved in many diseases such as cancer, diabetes, and neurodegenerative, inflammatory, and respiratory disorders, G-protein-coupled receptors (GPCRs) are among the most frequent targets of therapeutic drugs. Determining purely by experimental techniques whether a drug and a GPCR interact in a cellular network is time-consuming and expensive. Although some computational methods have been developed for this purpose based on knowledge of the 3D (three-dimensional) structure of the protein, their usage is quite limited because the 3D structures of most GPCRs are still unknown. To overcome this situation, a sequence-based classifier called "iGPCR-drug" was developed to predict the interactions between GPCRs and drugs in cellular networking. In the predictor, the drug compound is formulated by a 2D (two-dimensional) fingerprint as a 256D vector, the GPCR by the PseAAC (pseudo amino acid composition) generated with grey model theory, and the prediction engine is the fuzzy K-nearest neighbour algorithm. Moreover, a user-friendly web server for iGPCR-drug was established at http://www.jci-bioinfo.cn/iGPCR-Drug/. For the convenience of most experimental scientists, a step-by-step guide explains how to use the web server to get the desired results without following the complicated mathematical equations, which are presented in the paper only for completeness. The overall success rate achieved by iGPCR-drug in the jackknife test was 85.5%, which is remarkably higher than that of the existing peer method developed in 2010, for which no web server was ever established. It is anticipated that iGPCR-Drug may become a useful high-throughput tool for both basic research and drug development, and that the approach presented here can also be extended to study other drug–target interaction networks.
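The fuzzy K-nearest neighbour rule used as the prediction engine assigns class memberships from the K nearest training samples, weighting each neighbour by an inverse power of its distance. A sketch with crisp training labels, Euclidean distance, and fuzzifier m = 2; the paper's distance measure, K, and m are not given here, so these choices are assumptions.

```python
import math

def fuzzy_knn(train, query, k=3, m=2.0):
    """Fuzzy K-nearest-neighbour classification with crisp training labels.

    train is a list of (feature_vector, label); m > 1 is the fuzzifier.
    Returns (predicted_label, {label: membership}).
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be > 1")
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    weights = {}
    for vec, label in neighbours:
        d = math.dist(vec, query)
        # Inverse-distance weight 1 / d^(2/(m-1)); a tiny floor avoids division by zero.
        w = 1.0 / max(d, 1e-12) ** (2.0 / (m - 1.0))
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    memberships = {label: w / total for label, w in weights.items()}
    return max(memberships, key=memberships.get), memberships
```

Unlike plain KNN voting, the memberships sum to 1 and give a graded confidence for each class, which a web server can report alongside the predicted label.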
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Information School, ZheJiang Textile and Fashion College, NingBo, China
- Gordon Life Science Institute, Belmont, Massachusetts, United States of America
- Jian-Liang Min
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Pu Wang
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Gordon Life Science Institute, Belmont, Massachusetts, United States of America
19
Xiao X, Min JL, Wang P, Chou KC. iCDI-PseFpt: identify the channel-drug interaction in cellular networking with PseAAC and molecular fingerprints. J Theor Biol 2013; 337:71-9. [PMID: 23988798 DOI: 10.1016/j.jtbi.2013.08.013] [Citation(s) in RCA: 104] [Impact Index Per Article: 9.5] [Received: 02/19/2013] [Revised: 07/26/2013] [Accepted: 08/14/2013] [Indexed: 12/29/2022]
Abstract
Many crucial functions in life, such as heartbeat, sensory transduction, and central nervous system responses, are controlled by cell signaling via various ion channels. Therefore, ion channels have become an excellent drug target, and the study of ion channel–drug interaction networks is an important topic for drug development. However, it is both time-consuming and costly to determine by experimental techniques whether a drug and a protein ion channel interact in a cellular network. Although some computational methods have been developed for this purpose based on knowledge of the 3D (three-dimensional) structure of the protein, their usage is quite limited because the 3D structures of most protein ion channels are still unknown. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop a sequence-based computational method to address this problem. To take up the challenge, we developed a new predictor called iCDI-PseFpt, in which the protein ion-channel sample is formulated by the PseAAC (pseudo amino acid composition) generated with gray model theory, the drug compound by the 2D molecular fingerprint, and the operation engine is the fuzzy K-nearest neighbor algorithm. The overall success rate achieved by iCDI-PseFpt in jackknife cross-validation was 87.27%, which is remarkably higher than that of any existing predictor in this area. As a user-friendly web server, iCDI-PseFpt is freely accessible to the public at http://www.jci-bioinfo.cn/iCDI-PseFpt/. Furthermore, for the convenience of most experimental scientists, a step-by-step guide explains how to use the web server to get the desired results without following the complicated mathematical equations, which are presented in the paper only for completeness. It has not escaped our notice that the current approach can also be used to study other drug–target interaction networks.
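The abstract says the PseAAC was generated with gray model theory. A common building block in that theory is the GM(1,1) model, whose two fitted coefficients (the development coefficient a and the grey input b) can serve as features. The sketch fits them by ordinary least squares from a numeric series; how iCDI-PseFpt maps a protein sequence to such a series is not described here, so the input is hypothetical.

```python
from itertools import accumulate

def gm11_coefficients(x0):
    """Fit GM(1,1) and return (a, b) from the grey differential equation
    x0(k) + a * z1(k) = b, k = 2..n."""
    n = len(x0)
    if n < 3:
        raise ValueError("GM(1,1) needs at least 3 observations")
    x1 = list(accumulate(x0))  # 1-AGO (accumulated) series
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]  # background values
    y = list(x0[1:])
    # Least squares for y = slope * z1 + intercept, i.e. slope = -a and intercept = b.
    m = len(z1)
    sz, sy = sum(z1), sum(y)
    szz = sum(z * z for z in z1)
    szy = sum(z * v for z, v in zip(z1, y))
    det = m * szz - sz * sz
    if det == 0:
        raise ValueError("degenerate series: background values are all equal")
    slope = (m * szy - sz * sy) / det
    intercept = (sy - slope * sz) / m
    return -slope, intercept
```

A constant series such as [2, 2, 2, 2] gives a = 0 and b = 2, since each observation then equals the grey input with no development trend.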
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China; Information School, Zhe-Jiang Textile & Fashion College, Ning-Bo 315211, China; Gordon Life Science Institute, 53 South Cottage Road, Belmont, MA 02478, United States.
- Jian-Liang Min
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China.
- Pu Wang
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China.
- Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia; Gordon Life Science Institute, 53 South Cottage Road, Belmont, MA 02478, United States.
20
An empirical study on the matrix-based protein representations and their combination with sequence-based approaches. Amino Acids 2012; 44:887-901. [PMID: 23108592 DOI: 10.1007/s00726-012-1416-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 03/15/2012] [Accepted: 10/03/2012] [Indexed: 10/27/2022]
Abstract
Many domains have a stake in the development of reliable systems for automatic protein classification. Of particular interest in recent studies of automatic protein classification is the exploration of new methods for extracting features from a protein that enhance classification for specific problems. These methods have proven very useful in one or two domains, but they have failed to generalize well across several domains (i.e., classification problems). In this paper, we evaluate several feature extraction approaches for representing proteins with the aim of sequence-based protein classification. The protein representations evaluated start from: the position specific scoring matrix (PSSM) of the protein; the amino acid sequence; and a matrix representation of the protein, of dimension (protein length) × 20, obtained by using substitution matrices to represent each amino acid as a vector. A valuable result is that a texture descriptor can be extracted from the PSSM representation that improves on the performance of standard PSSM-based descriptors. Experimentally, we develop our systems by comparing several protein descriptors on nine different datasets. Each descriptor is used to train a support vector machine (SVM) or an ensemble of SVMs. Although different stand-alone descriptors work well on some datasets (but not on others), we have discovered that fusion among classifiers trained using different descriptors obtains good performance across all the tested datasets. The Matlab code and datasets used in this paper are available at http://www.bias.csr.unibo.it\nanni\PSSM.rar.
21
Cheng X, Xiao X, Wu ZC, Wang P, Lin WZ. Swfoldrate: predicting protein folding rates from amino acid sequence with sliding window method. Proteins 2012; 81:140-8. [PMID: 22933332 DOI: 10.1002/prot.24171] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 05/02/2012] [Revised: 07/20/2012] [Accepted: 08/25/2012] [Indexed: 01/18/2023]
Abstract
Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between the sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rates require the tertiary structure of a protein as input. In this study, the long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. Optimal features were selected by combining forward feature selection with sequential backward selection. The method was evaluated on a large dataset using the jackknife cross-validation test. Built on numerous physicochemical and statistical features of proteins and a nonlinear support vector machine (SVM) regression model, the predictor achieved excellent agreement between predicted and experimentally observed folding rates, with a correlation coefficient of 0.9313 and a standard error of 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp.
Affiliation(s)
- Xiang Cheng
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
22
Xia XY, Ge M, Wang ZX, Pan XM. Accurate prediction of protein structural class. PLoS One 2012; 7:e37653. [PMID: 22723837 PMCID: PMC3378576 DOI: 10.1371/journal.pone.0037653] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 01/14/2012] [Accepted: 04/12/2012] [Indexed: 11/18/2022] Open
Abstract
Because of the increasing gap between the data from sequencing and structural genomics, accurately predicting the structural class of a protein domain solely from its primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on a widely used, non-homologous benchmark dataset, 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain into a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than the results of methods based on predicted secondary structure.
Affiliation(s)
- Xia-Yu Xia
- Ministry of Education, The Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Meng Ge
- Ministry of Education, The Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Zhi-Xin Wang
- Ministry of Education, The Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Xian-Ming Pan
- Ministry of Education, The Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
23
24
Ye H, Tang K, Yang L, Cao Z, Li Y. Study of drug function based on similarity of pathway fingerprint. Protein Cell 2012; 3:132-9. [PMID: 22426982 DOI: 10.1007/s13238-012-2011-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/29/2011] [Accepted: 01/04/2012] [Indexed: 02/06/2023] Open
Abstract
Drugs sharing similar therapeutic functions may not bind to the same group of targets. However, their targets may be involved in similar pathway profiles that are associated with certain pathological processes. In this study, a pathway fingerprint was introduced to indicate the profile of significant pathways influenced by the targets of a drug. A drug-drug network was then constructed based on significant similarity of pathway fingerprints. In this way, the functions of a drug may be hinted at by the enriched therapeutic functions of its neighboring drugs. In a test on 911 FDA-approved drugs with more than one known target, 471 drugs could be connected into networks. A total of 760 significant drug-therapeutic function associations were generated, around 60% of which were supported by the scientific literature or by ATC codes of drug functional classification. Therefore, pathway fingerprints may be useful for further study of the potential functions of known drugs or the unknown functions of new drugs.
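The abstract does not say which similarity measure or significance test links two pathway fingerprints. As a rough sketch only, the code below assumes each fingerprint is a binary vector stored as a set of pathway identifiers, uses the Tanimoto (Jaccard) coefficient, and links drugs whose similarity reaches a fixed threshold. The drug names, pathway IDs, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations

def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two binary fingerprints stored as sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def build_drug_network(fingerprints: dict, threshold: float = 0.5):
    """Link every pair of drugs whose pathway fingerprints are at least `threshold` similar."""
    edges = []
    for d1, d2 in combinations(sorted(fingerprints), 2):
        sim = tanimoto(fingerprints[d1], fingerprints[d2])
        if sim >= threshold:
            edges.append((d1, d2, sim))
    return edges

# Hypothetical drugs, each mapped to the pathways significantly affected by its targets.
fps = {
    "drugA": {"p1", "p2", "p3"},
    "drugB": {"p1", "p2", "p4"},
    "drugC": {"p7", "p8"},
}
edges = build_drug_network(fps, threshold=0.5)
```

Here drugA and drugB share two of four pathways (similarity 0.5) and are linked, while drugC stays isolated; the therapeutic functions of a drug's neighbors in such a network are what the study uses as hints for its own function.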
Affiliation(s)
- Hao Ye
- State Key Laboratory of Bioreactor Engineering, East China University of Science & Technology, Shanghai, 200237, China
25
Qiu Z, Wang X. Prediction of protein-protein interaction sites using patch-based residue characterization. J Theor Biol 2011; 293:143-50. [PMID: 22037062 DOI: 10.1016/j.jtbi.2011.10.021] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 06/25/2011] [Revised: 09/13/2011] [Accepted: 10/15/2011] [Indexed: 10/15/2022]
Abstract
Identifying protein-protein interaction sites provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Using a patch-based model for residue characterization, we trained random forest classifiers for residue-based interface prediction, followed by a clustering procedure that produces patches for patch-based interface prediction. For residue-based interface prediction, our method achieves a specificity of 0.7 and a sensitivity of 0.78. For patch-based interface prediction, a success rate of 0.80 is achieved. Using the same datasets, we also compared it with several published methods. The results show that our method is a successful predictor for both residue-based and patch-based interface prediction.
Affiliation(s)
- Zhijun Qiu
- The State Key Laboratory of Structural Analysis of Industrial Equipment, Dalian University of Technology, 2 Ling-Gong Road, Dalian 116024, China
26
Lin WZ, Fang JA, Xiao X, Chou KC. iDNA-Prot: identification of DNA binding proteins using random forest with grey model. PLoS One 2011; 6:e24756. [PMID: 21935457 PMCID: PMC3174210 DOI: 10.1371/journal.pone.0024756] [Citation(s) in RCA: 194] [Impact Index Per Article: 14.9] [Received: 07/24/2011] [Accepted: 08/16/2011] [Indexed: 11/18/2022] Open
Abstract
DNA-binding proteins play crucial roles in various cellular processes. Developing high-throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating features extracted from protein sequences via the “grey model” into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding or non-DNA-binding proteins based on their amino acid sequence information alone. The overall success rate of iDNA-Prot was 83.96%, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the proteins included has pairwise sequence identity to any other in a same subset. In addition to achieving a high success rate, iDNA-Prot requires remarkably less computational time than the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web-server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results.
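The "grey model" referred to here is commonly the GM(1,1) model of grey system theory. A minimal sketch of standard GM(1,1) parameter estimation follows. How iDNA-Prot converts a protein sequence into a numeric series, and which fitted coefficients become PseAAC components, is not stated in the abstract, so the input series below is purely hypothetical.

```python
def gm11_params(x0):
    """Estimate the development coefficient a and grey input b of a GM(1,1) model.

    Model: x0(k) + a * z1(k) = b for k = 2..n, where x1 is the accumulated series
    of x0 and z1(k) is the mean of consecutive x1 values (background values).
    """
    if len(x0) < 3:
        raise ValueError("GM(1,1) needs at least three points")
    # Accumulated generating operation (AGO): running sum of the raw series.
    x1, running = [], 0.0
    for v in x0:
        running += v
        x1.append(running)
    # Background values z1(k) = 0.5 * (x1(k) + x1(k-1)).
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    # Least-squares fit of y = -a * z + b via the 2x2 normal equations.
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = n * szz - sz * sz
    if det == 0:
        raise ValueError("degenerate series")
    slope = (n * szy - sz * sy) / det   # slope of y against z, equal to -a
    intercept = (sy - slope * sz) / n   # intercept, equal to b
    return -slope, intercept

# Hypothetical numeric series (e.g. a physicochemical property along a sequence).
a, b = gm11_params([10.0, 8.0, 6.4, 5.12])
```

A positive development coefficient a indicates a decaying series, as in this example; a constant series gives a = 0 and b equal to the constant.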
Affiliation(s)
- Wei-Zhong Lin
- Information Science and Technology School, Donghua University, Shanghai, China
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Jian-An Fang
- Information Science and Technology School, Donghua University, Shanghai, China
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Gordon Life Science Institute, San Diego, California, United States of America
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, California, United States of America
27
Jingbo X, Silan Z, Feng S, Huijuan X, Xuehai H, Xiaohui N, Zhi L. Using the concept of pseudo amino acid composition to predict resistance gene against Xanthomonas oryzae pv. oryzae in rice: An approach from chaos games representation. J Theor Biol 2011; 284:16-23. [DOI: 10.1016/j.jtbi.2011.06.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/08/2010] [Revised: 06/02/2011] [Accepted: 06/03/2011] [Indexed: 10/18/2022]
28
Self-similarity analysis of eubacteria genome based on weighted graph. J Theor Biol 2011; 280:10-8. [PMID: 21496459 PMCID: PMC7094106 DOI: 10.1016/j.jtbi.2011.03.033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/18/2010] [Revised: 03/08/2011] [Accepted: 03/26/2011] [Indexed: 11/22/2022]
Abstract
We introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes. The usual aim of similarity comparison between genomes is to discover the evolutionary distance among different genomes. Few studies focus on the overall statistical characteristics of each gene compared with the other genes in the same genome. In our model, each genome is represented by a weighted graph whose topology describes the similarity relationships among genes in that genome. Based on weighted graph theory, we extract several quantitative statistical variables from the topology and give the distribution of some variables derived from the largest social structure in the topology. The 23 eubacteria recently studied by Sorimachi and Okayasu are clearly classified into two groups by their double-logarithmic point plots describing the similarity relationships among genes of the largest social structure in the genome. The results show that the proposed model may provide new insights into the structures and evolution patterns determined from complete genomes.
29
Lee YT. Structure activity relationship analysis of phenolic acid phenethyl esters on oral and human breast cancers: The grey GM(0, N) approach. Comput Biol Med 2011; 41:506-11. [DOI: 10.1016/j.compbiomed.2011.04.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/04/2010] [Revised: 03/31/2011] [Accepted: 04/29/2011] [Indexed: 11/26/2022]
30
Liu T, Geng X, Zheng X, Li R, Wang J. Accurate prediction of protein structural class using auto covariance transformation of PSI-BLAST profiles. Amino Acids 2011; 42:2243-9. [PMID: 21698456 DOI: 10.1007/s00726-011-0964-5] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 01/24/2011] [Accepted: 06/11/2011] [Indexed: 02/07/2023]
Abstract
Computational prediction of protein structural class based solely on sequence data remains a challenging problem in protein science. Existing methods differ in the protein sequence representation models and prediction engines they adopt. In this study, a powerful feature extraction method that combines the position-specific scoring matrix (PSSM) with the auto covariance (AC) transformation is introduced. A sample protein is thus represented by a series of discrete components that partially incorporate the long-range sequence-order information and the evolutionary information reflected in the PSI-BLAST profile. To verify the performance of our method, jackknife cross-validation tests are performed on four widely used benchmark datasets. Comparison with existing methods shows that our method provides state-of-the-art performance for structural class prediction. A Web server that implements the proposed method is freely available at http://202.194.133.5/xinxi/AAC_PSSM_AC/index.htm.
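The AC transformation described above turns a variable-length L × 20 PSSM into a fixed-length vector. A minimal sketch follows, assuming the commonly used form AC(lag, j) = 1/(L − lag) · Σ_i (P[i][j] − mean_j)(P[i + lag][j] − mean_j). Whether the paper normalizes the PSSM first, and which lag range it uses, is not stated in the abstract; the toy two-column matrix stands in for a real PSI-BLAST profile, which is not generated here.

```python
def pssm_auto_covariance(pssm, max_lag):
    """Auto covariance features of an L x n_cols PSSM for lags 1..max_lag.

    AC(lag, j) = 1/(L - lag) * sum_i (P[i][j] - mean_j) * (P[i + lag][j] - mean_j)
    Returns a flat list of max_lag * n_cols values, ordered by lag, then column.
    """
    L = len(pssm)
    if max_lag >= L:
        raise ValueError("max_lag must be smaller than the sequence length")
    n_cols = len(pssm[0])
    # Mean of each PSSM column over all residues.
    means = [sum(row[j] for row in pssm) / L for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            acc = sum((pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                      for i in range(L - lag))
            features.append(acc / (L - lag))
    return features

# Toy 4 x 2 "PSSM"; a real profile would be L x 20.
toy = [[1, 0], [3, 0], [1, 0], [3, 0]]
ac = pssm_auto_covariance(toy, 2)
```

Because each feature averages over L − lag residue pairs, proteins of different lengths map to vectors of the same size (max_lag × 20 for a real PSSM).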
Affiliation(s)
- Taigang Liu
- College of Information Sciences and Engineering, Shandong Agricultural University, Taian, 271018, China
31
Optimal atomic-resolution structures of prion AGAAAAGA amyloid fibrils. J Theor Biol 2011; 279:17-28. [DOI: 10.1016/j.jtbi.2011.02.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/28/2010] [Revised: 02/05/2011] [Accepted: 02/16/2011] [Indexed: 11/20/2022]
32
González-Díaz H, Prado-Prado F, Sobarzo-Sánchez E, Haddad M, Maurel Chevalley S, Valentin A, Quetin-Leclercq J, Dea-Ayuela MA, Teresa Gomez-Muños M, Munteanu CR, José Torres-Labandeira J, García-Mera X, Tapia RA, Ubeira FM. NL MIND-BEST: A web server for ligands and proteins discovery—Theoretic-experimental study of proteins of Giardia lamblia and new compounds active against Plasmodium falciparum. J Theor Biol 2011; 276:229-49. [DOI: 10.1016/j.jtbi.2011.01.010] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 10/04/2010] [Revised: 12/02/2010] [Accepted: 01/10/2011] [Indexed: 10/18/2022]
33
Feature importance analysis in guide strand identification of microRNAs. Comput Biol Chem 2011; 35:131-6. [PMID: 21704258 DOI: 10.1016/j.compbiolchem.2011.04.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/22/2011] [Revised: 03/22/2011] [Accepted: 04/23/2011] [Indexed: 11/22/2022]
Abstract
MicroRNA (miRNA), the guide strand of the transient miRNA:miRNA* duplex, is a negative regulator of gene expression. It is critical in maintaining normal physiological processes such as development, differentiation, and apoptosis in many organisms. With increasing miRNA data, it is desirable to design machine learning methods to identify the guide strand. In this study, random forest models based on local sequence-structure features were proposed to identify miRNA in four species. The accuracies achieved were 86.51% for Homo sapiens, 81.66% for Ornithorhynchus anatinus, 82.33% for Mus musculus, and 85.71% for Schmidtea mediterranea. Furthermore, an importance analysis of the feature elements was carried out using the conditional feature importance strategy. The analysis revealed that most of the significant elements were related to the guanine-cytosine (GC) base pair. We believe that our method could be beneficial for annotating the function of miRNA and could help further understanding of the RNA interference mechanism.
34
Mahdavi A, Jahandideh S. Application of density similarities to predict membrane protein types based on pseudo-amino acid composition. J Theor Biol 2011; 276:132-7. [PMID: 21296088 DOI: 10.1016/j.jtbi.2011.01.048] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 08/23/2010] [Revised: 01/28/2011] [Accepted: 01/30/2011] [Indexed: 11/26/2022]
Abstract
Cell membranes maintain the integrity of living cells. Although the stability of a biological membrane is maintained by the lipid bilayer, membrane proteins perform most of its specific functions, such as signal transduction and transmembrane transport. It is therefore plausible that membrane proteins are attractive drug targets. In this article, based on the concept of using the pseudo-amino acid composition to define a protein, three different density similarities are developed for predicting membrane protein type. The predicted results showed that the proposed approach can remarkably improve prediction accuracy and might become a useful tool for predicting other attributes of proteins as well.
Affiliation(s)
- Abbas Mahdavi
- Department of Statistics, Faculty of Science, Shiraz University, Shiraz, Iran
35
Chou KC. Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol 2010; 273:236-47. [PMID: 21168420 PMCID: PMC7125570 DOI: 10.1016/j.jtbi.2010.12.024] [Citation(s) in RCA: 956] [Impact Index Per Article: 68.3] [Received: 10/21/2010] [Revised: 12/08/2010] [Accepted: 12/13/2010] [Indexed: 11/29/2022]
Abstract
With the accomplishment of human genome sequencing, the number of sequence-known proteins has increased explosively. In contrast, the pace of determining their biological attributes is much slower. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become increasingly large. This unbalanced situation, which has critically limited our ability to utilize newly discovered proteins in a timely manner for basic research and drug development, has called for computational methods or high-throughput automated tools that identify various attributes of uncharacterized proteins quickly and reliably based on their sequence information alone. Indeed, during the last two decades or so, many such methods have been established in the hope of bridging this gap. In developing these methods, the following often needed to be considered: (1) benchmark dataset construction, (2) protein sample formulation, (3) operating algorithm (or engine), (4) anticipated accuracy, and (5) web-server establishment. In this review, we discuss each of the five procedures, with a special focus on the introduction of pseudo amino acid composition (PseAAC), its different modes and applications, and its recent development, particularly how to use the general formulation of PseAAC to reflect the core and essential features that are deeply hidden in complicated protein sequences.
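PseAAC extends the 20 amino acid composition frequencies with λ sequence-order correlation factors. Chou's original type-1 formulation averages the correlation over several standardized physicochemical properties; as a simplifying assumption, the sketch below uses only the Kyte-Doolittle hydropathy scale, with illustrative values λ = 3 and weight w = 0.05. It is a sketch of the general idea, not the exact formulation reviewed in the paper, and the example sequence is arbitrary.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as the single illustrative property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def pseaac(seq, lam=3, w=0.05, prop=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional PseAAC-style vector for `seq` (one property only)."""
    seq = seq.upper()
    if any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError("only the 20 standard amino acids are supported")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    # Standardize the property over the 20 amino acids (mean 0, population sd 1).
    values = list(prop.values())
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    h = {aa: (v - mu) / sd for aa, v in prop.items()}
    # Conventional composition: normalized occurrence frequency of each residue.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    # Sequence-order correlation factors: mean squared property difference at gap k.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        thetas.append(sum(diffs) / len(diffs))
    denom = sum(freqs) + w * sum(thetas)
    # First 20 components: composition; last lam components: weighted order factors.
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]

vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
```

The weight w controls how much the sequence-order terms count relative to plain composition; because both parts share one denominator, the components always sum to 1.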
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, 13784 Torrey Del Mar Drive, San Diego, CA 92130, USA.
36
iFC²: an integrated web-server for improved prediction of protein structural class, fold type, and secondary structure content. Amino Acids 2010; 40:963-73. [PMID: 20730460 DOI: 10.1007/s00726-010-0721-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/09/2010] [Accepted: 08/06/2010] [Indexed: 10/19/2022]
Abstract
Several descriptors of protein structure at the sequence and residue levels have recently been proposed. They are widely adopted in the analysis and prediction of the structural and functional characteristics of proteins. Numerous in silico methods have been developed for sequence-based prediction of these descriptors. However, many of them do not have a public web-server, and only a few integrate multiple descriptors to improve the predictions. We introduce the iFC² (integrated prediction of fold, class, and content) server, the first to integrate three modern predictors of sequence-level descriptors. They concern fold type (PFRES), structural class (SCEC), and secondary structure content (PSSC-core). The server exploits relations among the three descriptors to implement a cross-evaluation procedure that improves over the predictions of the individual methods. iFC² annotates fold and class predictions as potentially correct/incorrect. When tested on datasets with low-similarity chains, for fold prediction iFC² labels 82% of the PFRES predictions as correct, and the accuracy of these predictions equals 72%. The accuracy of the remaining 28% of the PFRES predictions equals 38%. Similarly, our server labels over 79% of SCEC predictions as correct, and these are shown to be 98% accurate, while the remaining SCEC predictions are only 15% accurate. These results are shown to be competitive with recent relevant web-servers. Predictions on CASP8 targets show that the content predicted by iFC² is competitive with the content computed from the tertiary structures predicted by the three best-performing methods in CASP8. The iFC² server is available at http://biomine.ece.ualberta.ca/1D/1D.html.
37
A study of entropy/clarity of genetic sequences using metric spaces and fuzzy sets. J Theor Biol 2010; 267:95-105. [PMID: 20708019 DOI: 10.1016/j.jtbi.2010.08.010] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 01/21/2010] [Revised: 07/22/2010] [Accepted: 08/06/2010] [Indexed: 11/22/2022]
Abstract
The study of genetic sequences is of great importance in biology and medicine. Sequence analysis and taxonomy are two major fields of application of bioinformatics. In the present paper, we extend the notions of entropy and clarity to the use of different metrics and apply them to the Fuzzy Polynucleotide Space (FPS). Applications of these notions to selected polynucleotides and complete genomes, both in the I(12×k) space and using their representation in the FPS, are presented. Our results show that the values of fuzzy entropy/clarity are indicative of the degree of complexity necessary for describing the polynucleotides in the FPS, although in the latter case the interpretation is slightly different from that in the I(12×k) hypercube. Fuzzy entropy/clarity, along with the use of appropriate metrics, can contribute to sequence analysis and taxonomy.
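In a fuzzy polynucleotide space a sequence is a point in a unit hypercube, i.e. a fuzzy set of memberships in [0, 1]. A minimal sketch of one standard fuzzy entropy, Kosko's ratio of the distance to the nearest crisp vertex over the distance to the farthest one, follows, with the metric left as a Minkowski exponent p to echo the paper's use of different metrics. Whether this matches the paper's exact definitions is an assumption, and clarity is not reproduced here.

```python
def fuzzy_entropy(memberships, p=1.0):
    """Kosko-style fuzzy entropy: d(A, nearest vertex) / d(A, farthest vertex).

    Distances use the Minkowski metric with exponent p (p=1 is the Hamming/L1 case).
    """
    if not memberships or any(not 0.0 <= m <= 1.0 for m in memberships):
        raise ValueError("memberships must be a non-empty list of values in [0, 1]")

    def minkowski(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

    # Nearest crisp set rounds each membership; the farthest crisp set is its complement.
    near = [1.0 if m >= 0.5 else 0.0 for m in memberships]
    far = [1.0 - v for v in near]
    # The farthest vertex is at least 0.5 away in every coordinate, so no division by zero.
    return minkowski(memberships, near) / minkowski(memberships, far)

crisp = fuzzy_entropy([0.0, 1.0])        # a crisp set has entropy 0
maximal = fuzzy_entropy([0.5, 0.5])      # the hypercube centre has entropy 1
partial = fuzzy_entropy([0.25, 0.75])
```

Entropy is 0 at the vertices of the hypercube (ordinary crisp sequences) and grows toward 1 at its centre, which is one way to read "degree of complexity" for a point in the FPS.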
38
Yu L, Guo Y, Li Y, Li G, Li M, Luo J, Xiong W, Qin W. SecretP: identifying bacterial secreted proteins by fusing new features into Chou's pseudo-amino acid composition. J Theor Biol 2010; 267:1-6. [PMID: 20691704 DOI: 10.1016/j.jtbi.2010.08.001] [Citation(s) in RCA: 98] [Impact Index Per Article: 7.0] [Received: 05/31/2010] [Revised: 07/30/2010] [Accepted: 08/01/2010] [Indexed: 11/17/2022]
Abstract
Protein secretion plays an important role in bacterial lifestyles. Secreted proteins are crucial for bacterial pathogenesis because they allow bacteria to interact with their environments, in particular by delivering pathogenic and symbiotic bacteria into their eukaryotic hosts. Therefore, identification of bacterial secreted proteins has become an important step in the study of various diseases and the corresponding drugs. In this paper, by fusing several new features into Chou's pseudo-amino acid composition (PseAAC), two support vector machine (SVM)-based ternary classifiers are developed to predict the secreted proteins of Gram-negative and Gram-positive bacteria. For the two types of bacteria, our method achieves high accuracies of 94.03% and 94.36%, respectively, in distinguishing classically secreted, non-classically secreted, and non-secreted proteins. To compare the practical ability of our method in identifying bacterial secreted proteins with that of six published methods, proteins in Escherichia coli and Bacillus subtilis were collected to construct test sets for Gram-negative and Gram-positive bacteria, and the prediction results of our method are comparable to those of the existing methods. When applied to two public independent data sets for predicting non-classically secreted proteins (NCSPs), it also yields satisfactory results for Gram-negative bacterial proteins. The prediction server SecretP can be accessed at http://cic.scu.edu.cn/bioinformatics/secretPV2/index.htm.
Affiliation(s)
- Lezheng Yu
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
39
Ji G, Wu X, Shen Y, Huang J, Quinn Li Q. A classification-based prediction model of messenger RNA polyadenylation sites. J Theor Biol 2010; 265:287-96. [DOI: 10.1016/j.jtbi.2010.05.015] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 12/13/2009] [Revised: 03/21/2010] [Accepted: 05/13/2010] [Indexed: 12/30/2022]
40
Huang W, Zhang J, Wang Y, Huang D. A simple method to analyze the similarity of biological sequences based on the fuzzy theory. J Theor Biol 2010; 265:323-8. [DOI: 10.1016/j.jtbi.2010.05.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/29/2009] [Revised: 04/01/2010] [Accepted: 05/07/2010] [Indexed: 11/28/2022]
41
Li Z, Zhou X, Dai Z, Zou X. Classification of G-protein coupled receptors based on support vector machine with maximum relevance minimum redundancy and genetic algorithm. BMC Bioinformatics 2010; 11:325. [PMID: 20550715 PMCID: PMC2905366 DOI: 10.1186/1471-2105-11-325] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 12/18/2009] [Accepted: 06/16/2010] [Indexed: 11/25/2022] Open
Abstract
Background: Because a priori knowledge about the function of G protein-coupled receptors (GPCRs) can provide useful information for pharmaceutical research, determining their function is a meaningful topic in protein science. However, with the rapid increase of GPCR sequences entering databanks, the gap between the number of known sequences and the number of known functions is widening rapidly, and determining their function by experimental techniques alone is both time-consuming and expensive. Therefore, it is vitally important to develop a computational method for quick and accurate classification of GPCRs. Results: In this study, a novel three-layer predictor based on support vector machine (SVM) and feature selection is developed for predicting and classifying GPCRs directly from amino acid sequence data. Maximum relevance minimum redundancy (mRMR) is applied to pre-evaluate features with discriminative information, while a genetic algorithm (GA) is used to find optimized feature subsets. SVM is used to construct the classification models. The overall accuracies of the three-layer predictor at the superfamily, family, and subfamily levels were obtained by cross-validation tests on two non-redundant datasets. The results are about 0.5% to 16% higher than those of GPCR-CA and GPCRPred. Conclusion: The high success rates indicate that the proposed predictor is a useful automated tool for predicting GPCRs. GPCR-SVMFS, a corresponding executable program for GPCR prediction and classification, can be obtained free of charge on request from the authors.
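The mRMR step can be illustrated with the common greedy "difference" form: at each step, pick the feature whose mutual information with the class, minus its mean mutual information with the features already chosen, is largest. The sketch below covers only that step on discrete features; the GA search and SVM stages are omitted, continuous sequence features would first need discretization (an assumption), and the toy features are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr(features, labels, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy with chosen features."""
    if k <= 0:
        return []
    relevance = [mutual_information(f, labels) for f in features]
    # Start with the single most relevant feature.
    selected = [max(range(len(features)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(features)):
        best, best_score = None, -math.inf
        for i in range(len(features)):
            if i in selected:
                continue
            redundancy = sum(mutual_information(features[i], features[s])
                             for s in selected) / len(selected)
            score = relevance[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

# Hypothetical discrete features: f1 duplicates f0, f2 carries independent information.
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [2 * a + b for a, b in zip(f0, f2)]
chosen = mrmr([f0, f1, f2], labels, k=2)
```

All three features are equally relevant here, but once one copy of the high-order bit is chosen, its duplicate adds no new information, so mRMR prefers the independent feature f2 as its second pick.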
Affiliation(s)
- Zhanchao Li
- School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, PR China
42
High performance set of PseAAC and sequence based descriptors for protein classification. J Theor Biol 2010; 266:1-10. [PMID: 20558184 DOI: 10.1016/j.jtbi.2010.06.006] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Received: 03/01/2010] [Revised: 05/31/2010] [Accepted: 06/02/2010] [Indexed: 11/21/2022]
Abstract
The study of reliable automatic systems for protein classification is important for several domains, including the discovery of novel drugs and vaccines. The last decade has seen a number of advances in the development of reliable systems for classifying proteins. Of particular interest has been the exploration of new methods for extracting features from a protein that enhance classification for a given problem. Most methods developed to date, however, have been evaluated in only one or two application areas, and methods that generalize well across many application areas and datasets have not been explored. The aim of this study is to find a general method, or an ensemble of methods, that works well on different protein classification datasets and problems. To this end, we evaluate several feature extraction approaches for representing proteins starting from their amino acid sequence, as well as different combinations of feature descriptors, using an ensemble of classifiers (support vector machines). In our experiments, more than ten different protein descriptors are compared on nine different datasets. We develop our system using a blind testing protocol, in which the parameters of the system are optimized on one dataset and then validated on the other datasets (and so on for each dataset). Although different stand-alone classifiers work well on some datasets and not on others, we found that fusion among different methods achieves good performance across all the tested datasets, especially when using the weighted sum rule. Our feature descriptor combinations include two new descriptors, one based on wavelets and the other based on amino acid groups; within our system, both outperform their standard implementations. As baselines we also consider the simple amino acid composition (AC) and dipeptide composition (2G), since they have been widely used for protein classification. Our proposed method outperforms AC and 2G.
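The two baselines named above, and the weighted sum rule used for fusion, are simple enough to sketch. The code below is a minimal illustration, not the authors' implementation: it assumes the 20 standard residues, ignores non-standard letters (and any adjacent pair containing one), and leaves the fusion weights to the caller, since the abstract does not say how they were chosen or how scores were normalized. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """20-D AC vector: fraction of each standard residue in the sequence."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]  # drop non-standard letters
    n = len(residues)
    counts = Counter(residues)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-D 2G vector: fraction of each ordered pair of adjacent residues."""
    seq = seq.upper()
    # Keep only adjacent pairs in which both residues are standard.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS]
    n = len(pairs)
    counts = Counter(pairs)
    return [counts[d] / n if n else 0.0 for d in DIPEPTIDES]


def weighted_sum_rule(score_lists, weights):
    """Fuse per-classifier score vectors (all the same length) by a weighted sum."""
    return [sum(w * scores[i] for w, scores in zip(weights, score_lists))
            for i in range(len(score_lists[0]))]


if __name__ == "__main__":
    ac = amino_acid_composition("AACD")
    dc = dipeptide_composition("AACD")
    print(len(ac), ac[AMINO_ACIDS.index("A")])
    print(len(dc), dc[DIPEPTIDES.index("AC")])
    print(weighted_sum_rule([[0.9, 0.1], [0.4, 0.6]], [0.5, 0.5]))
```

Each fused score is a per-class combination of the individual classifiers' outputs; the class with the largest fused score would be the prediction.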
43
Chauhan JS, Mishra NK, Raghava GPS. Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information. BMC Bioinformatics 2010; 11:301. [PMID: 20525281 PMCID: PMC3098072 DOI: 10.1186/1471-2105-11-301] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 02/11/2010] [Accepted: 06/03/2010] [Indexed: 11/17/2022]
Abstract
Background Guanosine triphosphate (GTP)-binding proteins play an important role in the regulation of G-proteins. Prediction of GTP interacting residues in a protein is therefore one of the major challenges in computational biology. In this study, we attempt to develop a computational method for predicting GTP interacting residues in a protein with high accuracy (Acc), precision (Prec) and recall (Rc). Result All models developed in this study were trained and tested on a non-redundant (40% similarity) dataset using five-fold cross-validation. First, we developed neural-network-based models using single sequence and PSSM profile, achieving maximum Matthews Correlation Coefficients (MCC) of 0.24 (Acc 61.30%) and 0.39 (Acc 68.88%), respectively. Second, we developed support vector machine (SVM) based models using single sequence and PSSM profile, achieving maximum MCC of 0.37 (Prec 0.73, Rc 0.57, Acc 67.98%) and 0.55 (Prec 0.80, Rc 0.73, Acc 77.17%), respectively. In this work, we introduce for the first time the concept of predicting GTP interacting dipeptides (two consecutive GTP interacting residues) and tripeptides (three consecutive GTP interacting residues). An SVM-based model for predicting GTP interacting dipeptides using the PSSM profile achieved MCC 0.64 with precision 0.87, recall 0.74 and accuracy 81.37%. Similarly, an SVM-based model for predicting GTP interacting tripeptides using the PSSM profile achieved MCC 0.70 with precision 0.93, recall 0.73 and accuracy 83.98%. Conclusion These results show that the PSSM-based method performs better than the single-sequence-based method, and that prediction models based on dipeptides or tripeptides are more accurate than the traditional model based on single residues. A web server, "GTPBinder" (http://www.imtech.res.in/raghava/gtpbinder/), based on these models has been developed for predicting GTP interacting residues in a protein.
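The metrics reported above (Acc, Prec, Rc, MCC) follow standard formulas, and the dipeptide/tripeptide targets can be derived from per-residue labels. The sketch below shows both under stated assumptions: it is not GTPBinder's code, the function names are hypothetical, and treating every window that is not fully interacting as a negative is an assumption, since the abstract defines only the positive case.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and Matthews correlation coefficient from counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"acc": acc, "prec": prec, "rec": rec, "mcc": mcc}


def window_labels(residue_labels, k):
    """Label each run of k consecutive residues 1 if all k are interacting, else 0.

    k=2 gives dipeptide labels and k=3 gives tripeptide labels.
    """
    return [int(all(residue_labels[i:i + k]))
            for i in range(len(residue_labels) - k + 1)]


if __name__ == "__main__":
    print(binary_metrics(tp=8, fp=2, tn=8, fn=2))
    print(window_labels([1, 1, 0, 1, 1, 1], 2))  # [1, 0, 0, 1, 1]
    print(window_labels([1, 1, 0, 1, 1, 1], 3))  # [0, 0, 0, 1]
```

MCC is reported alongside accuracy because, unlike accuracy, it stays informative when interacting residues are a small minority of the data.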
Affiliation(s)
- Jagat S Chauhan
- Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh, India
44
Yan S, Wu G. Linking mutated primary structure of adrenoleukodystrophy protein with X-linked adrenoleukodystrophy. Comput Methods Biomech Biomed Engin 2010; 13:403-11. [DOI: 10.1080/10255840903279974] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
45
Wang S, Tian F, Qiu Y, Liu X. Bilateral similarity function: a novel and universal method for similarity analysis of biological sequences. J Theor Biol 2010; 265:194-201. [PMID: 20399215 DOI: 10.1016/j.jtbi.2010.04.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 12/17/2009] [Revised: 04/11/2010] [Accepted: 04/12/2010] [Indexed: 11/26/2022]
Abstract
In this paper, a bilateral similarity function is designed for analyzing the similarities of biological sequences such as DNA, RNA secondary structures and proteins. The defined function performs comprehensive comparison between sequences remarkably well, taking into account both the Hamming distance between the two compared sequences and the corresponding location differences. Compared with existing methods for similarity analysis, the examination of similarities/dissimilarities shows that the proposed method, with a computational complexity of O(N), is effective for all three kinds of biological sequences and is universally applicable to them.
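The abstract does not give the formula of the bilateral similarity function, so the sketch below does not implement it. It only illustrates, under the assumption of equal-length sequences, the two ingredients the abstract names: the Hamming distance and the locations at which the sequences differ. Both are computed in a single linear pass, consistent with the stated O(N) complexity. The function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ (O(N))."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_positions(a, b):
    """0-based indices of the differing positions, i.e. where the mismatches lie."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(hamming_distance("ACGT", "AGGA"))    # 2
    print(mismatch_positions("ACGT", "AGGA"))  # [1, 3]
```

Because both functions compare characters only, the same code applies to DNA, protein letters, or RNA secondary structures written as dot-bracket strings.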
Affiliation(s)
- Shiyuan Wang
- College of Communication Engineering, Chongqing University, Chongqing 400044, China.
46
Nanni L, Shi JY, Brahnam S, Lumini A. Protein classification using texture descriptors extracted from the protein backbone image. J Theor Biol 2010; 264:1024-32. [PMID: 20307550 DOI: 10.1016/j.jtbi.2010.03.020] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/07/2009] [Revised: 01/28/2010] [Accepted: 03/11/2010] [Indexed: 10/19/2022]
Abstract
In this work, we propose a method for protein classification that combines different texture descriptors extracted from the 2-D distance matrix obtained from the 3-D tertiary structure of a given protein. Instead of considering all atoms in the protein, the distance matrix is calculated using only the atoms that belong to the protein backbone. The positive results reported in this paper offer further experimental confirmation that the distance matrix contains sufficient information for describing a protein. Moreover, we show that combining features extracted from the primary structure with features extracted from the distance matrix increases the performance of our classification system; we demonstrate this by comparing the performance of an ensemble of classifiers that uses the combined features. The classifiers used in our experiments are support vector machines and random subspaces of support vector machines. The experimental results, validated on three different datasets (protein fold recognition, DNA-binding protein recognition, and biological process and molecular function recognition) with different texture feature extraction methods (variants of local binary patterns, Radon feature transform based approaches, and Haralick descriptors), demonstrate the effectiveness of the proposed approach. Particularly interesting are the results for the classification of 27 types of structural properties, where our approach achieves a significant improvement over other reported methods.
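To make the pipeline concrete, the sketch below builds a backbone distance matrix from 3-D coordinates and extracts one texture descriptor from it. Assumptions: the caller supplies one coordinate triple per backbone atom (for example Cα only), and the descriptor is the basic 8-neighbour local binary pattern (LBP) histogram, not the LBP variants, Radon-based features or Haralick descriptors evaluated by the authors. No normalization or binning is applied, and the function names are hypothetical.

```python
from math import dist

# Eight neighbours of a cell, visited clockwise from the top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def distance_matrix(coords):
    """Pairwise Euclidean distances between backbone atom coordinates (x, y, z)."""
    return [[dist(p, q) for q in coords] for p in coords]


def lbp_histogram(matrix):
    """256-bin histogram of basic 3x3 local binary pattern codes over interior cells."""
    rows, cols = len(matrix), len(matrix[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = matrix[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as large as the centre.
                if matrix[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    return hist


if __name__ == "__main__":
    backbone = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]  # toy coordinates
    dm = distance_matrix(backbone)
    print(dm[0][1], dm[1][2])
    print(sum(lbp_histogram(dm)))  # one interior cell in a 3x3 matrix
```

Treating the distance matrix as a grey-level image is what lets generic image texture descriptors such as LBP be reused for proteins; the resulting histogram would then be fed to the SVM ensemble.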
Affiliation(s)
- Loris Nanni
- DEIS, IEIIT-CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy.
47
He Z, Zhang J, Shi XH, Hu LL, Kong X, Cai YD, Chou KC. Predicting drug-target interaction networks based on functional groups and biological features. PLoS One 2010; 5:e9603. [PMID: 20300175 PMCID: PMC2836373 DOI: 10.1371/journal.pone.0009603] [Citation(s) in RCA: 189] [Impact Index Per Article: 13.5] [Received: 12/13/2009] [Accepted: 02/16/2010] [Indexed: 11/19/2022]
Abstract
BACKGROUND The study of drug-target interaction networks is an important topic in drug development. Determining compound-protein interactions or potential drug-target interactions by experiments alone is both time-consuming and costly. As a complement, in silico prediction methods can provide very useful information in a timely manner. METHODS/PRINCIPAL FINDINGS To this end, drug compounds are encoded by functional groups, and proteins are encoded by biological features including biochemical and physicochemical properties. Optimal features are selected using the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of treating the target proteins as a single family, they are divided into four groups: enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. Accordingly, four independent predictors are established with the Nearest Neighbor algorithm as their operation engine, each predicting the interactions between drugs and one of the four protein groups. The overall success rates of the four predictors in jackknife cross-validation tests are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. CONCLUSION/SIGNIFICANCE Our results indicate that the network prediction system thus established is quite promising and encouraging.
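The evaluation described above pairs a nearest-neighbour predictor with the jackknife (leave-one-out) test. The sketch below shows that combination under stated assumptions: each drug-protein pair is already encoded as one numeric feature vector (for example, drug and protein features concatenated), distance is Euclidean, and ties go to the first training sample. The abstract does not specify the distance measure the authors used, so this is an illustration rather than their method; the function names are hypothetical.

```python
from math import dist


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training vector closest (Euclidean) to the query."""
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def jackknife_accuracy(xs, ys):
    """Leave-one-out success rate: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


if __name__ == "__main__":
    # Hypothetical drug-protein pair vectors; label 1 = interacting, 0 = not.
    pairs = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    labels = [0, 0, 1, 1]
    print(jackknife_accuracy(pairs, labels))  # 1.0
```

In the setup the abstract describes, one such predictor would be run separately for each of the four target groups, giving one jackknife success rate per group.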
Affiliation(s)
- Zhisong He
- CAS-MPG Partner Institute of Computational Biology, Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
- Centre for Computational Systems Biology, Fudan University, Shanghai, China
- Jian Zhang
- Department of Ophthalmology, Yangpu District Central Hospital, Shanghai, China
- Xiao-He Shi
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS) and Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, China
- Le-Le Hu
- Institute of System Biology, Shanghai University, Shanghai, China
- Xiangyin Kong
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS) and Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, China
- State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai, China
- * E-mail: (XK); (YDC)
- Yu-Dong Cai
- Institute of System Biology, Shanghai University, Shanghai, China
- Gordon Life Science Institute, San Diego, California, United States of America
- * E-mail: (XK); (YDC)
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, California, United States of America
48
Esmaeili M, Mohabatkar H, Mohsenzadeh S. Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses. J Theor Biol 2010; 263:203-9. [DOI: 10.1016/j.jtbi.2009.11.016] [Citation(s) in RCA: 241] [Impact Index Per Article: 17.2] [Received: 07/19/2009] [Revised: 11/18/2009] [Accepted: 11/20/2009] [Indexed: 01/25/2023]
49
Mizianty MJ, Kurgan L. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences. BMC Bioinformatics 2009; 10:414. [PMID: 20003388 PMCID: PMC2805645 DOI: 10.1186/1471-2105-10-414] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Received: 08/06/2009] [Accepted: 12/13/2009] [Indexed: 11/13/2022]
Abstract
Background Knowledge of structural class is used by numerous methods for identifying structural/functional characteristics of proteins and could be used to detect remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and are designed for high-identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique in that it allows selection of any subset of the classes. MODAS is also the first to use a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers tailored to each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets, and comparison with over two dozen modern competing predictors, show that MODAS provides the best overall accuracy, ranging between 80% and 96.7% (83.5% for the twilight-zone datasets) depending on the dataset. This translates into 19% and 8% error rate reductions compared with the best-performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
Conclusions The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
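The abstract does not state whether the "19% and 8% error rate reduction" figures are relative or absolute, nor the underlying error rates. Assuming they are relative reductions, the arithmetic is sketched below with hypothetical accuracies (not values from the MODAS paper): raising accuracy from 80% to 84% lowers the error rate from 20% to 16%, which is a 20% relative reduction but only 4 percentage points in absolute terms. The function name is hypothetical.

```python
def relative_error_reduction(acc_baseline, acc_new):
    """Fraction by which the error rate (1 - accuracy) falls relative to a baseline."""
    err_baseline = 1.0 - acc_baseline
    err_new = 1.0 - acc_new
    if err_baseline == 0:
        raise ValueError("baseline error rate is zero")
    return (err_baseline - err_new) / err_baseline


if __name__ == "__main__":
    # Hypothetical accuracies, not taken from the MODAS paper.
    print(round(relative_error_reduction(0.80, 0.84), 3))  # 0.2
```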
Affiliation(s)
- Marcin J Mizianty
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada.
50
Yan SM, Wu G. Trends in global warming and evolution of matrix protein 2 family from influenza A virus. Interdiscip Sci 2009; 1:272-9. [PMID: 20640805 PMCID: PMC7091293 DOI: 10.1007/s12539-009-0053-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/01/2009] [Revised: 05/22/2009] [Accepted: 05/25/2009] [Indexed: 05/29/2023]
Abstract
Global warming is an important factor affecting biological evolution, and influenza is an important disease that threatens humans with possible epidemics or pandemics. In this study, we attempted to analyze the trends in global warming and in the evolution of the matrix protein 2 family from influenza A virus, because this protein is a target of anti-flu drugs and its mutation would significantly affect resistance to these drugs. The evolution of matrix protein 2 of influenza A virus from 1959 to 2008 was characterized using the unpredictable portion of amino-acid pair predictability. The trend in this evolution was then compared with the trends in global temperature, in the temperatures of the northern and southern hemispheres, and in the temperature at the influenza A virus sampling sites, as well as with the species carrying influenza A virus. The results showed similar trends in global warming and in the evolution of M2 proteins, although we could not correlate them at this stage of the study. The study suggests a potential impact of global warming on the evolution of proteins from influenza A virus.
Affiliation(s)
- Shao-Min Yan
- National Engineering Research Center for Non-food Biorefinery, Guangxi Academy of Sciences, Nanning, Guangxi 530007 P.R. China
- Guang Wu
- Computational Mutation Project, DreamSciTech Consulting, Shenzhen, Guangdong, 518054 P.R. China