Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Raghava GPS, Han JH. Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein. BMC Bioinformatics 2005;6:59. [PMID: 15773999 PMCID: PMC1083413 DOI: 10.1186/1471-2105-6-59] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Received: 08/14/2004] [Accepted: 03/17/2005] [Indexed: 11/29/2022]


Cited by Other Article(s)

Khandia R, Garg R, Pandey MK, Khan AA, Dhanda SK, Malik A, Gurjar P. Determination of codon pattern and evolutionary forces acting on genes linked to inflammatory bowel disease. Int J Biol Macromol 2024;278:134480. [PMID: 39116987 DOI: 10.1016/j.ijbiomac.2024.134480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/25/2024] [Accepted: 07/31/2024] [Indexed: 08/10/2024]

Malaina I, Gonzalez-Melero L, Martínez L, Salvador A, Sanchez-Diez A, Asumendi A, Margareto J, Carrasco-Pujante J, Legarreta L, García MA, Pérez-Pinilla MB, Izu R, Martínez de la Fuente I, Igartua M, Alonso S, Hernandez RM, Boyano MD. Computational and Experimental Evaluation of the Immune Response of Neoantigens for Personalized Vaccine Design. Int J Mol Sci 2023;24:9024. [PMID: 37240369 PMCID: PMC10219310 DOI: 10.3390/ijms24109024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 05/16/2023] [Accepted: 05/17/2023] [Indexed: 05/28/2023]

Affiliation(s)

Iker Malaina: Department of Mathematics, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain
Lorena Gonzalez-Melero: NanoBioCel Research Group, Laboratory of Pharmaceutics, School of Pharmacy, University of the Basque Country (UPV/EHU), 01006 Vitoria-Gasteiz, Spain; Bioaraba, NanoBioCel Research Group, 01009 Vitoria-Gasteiz, Spain
Luis Martínez: Department of Mathematics, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain; Basque Center for Applied Mathematics BCAM, 48009 Bilbao, Spain
Aiala Salvador: NanoBioCel Research Group, Laboratory of Pharmaceutics, School of Pharmacy, University of the Basque Country (UPV/EHU), 01006 Vitoria-Gasteiz, Spain; Bioaraba, NanoBioCel Research Group, 01009 Vitoria-Gasteiz, Spain; Biomedical Research Networking Centre in Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Institute of Health Carlos III, 28029 Madrid, Spain
Ana Sanchez-Diez: Department of Dermatology, Basurto University Hospital, 48013 Bilbao, Spain; Biocruces Bizkaia Health Research Institute, 48903 Barakaldo, Spain
Aintzane Asumendi: Biocruces Bizkaia Health Research Institute, 48903 Barakaldo, Spain; Department of Cell Biology and Histology, Faculty of Medicine and Nursing, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain
Javier Margareto: Technological Services Division, Health and Quality of Life, TECNALIA, 01510 Miñano, Spain
Jose Carrasco-Pujante: Department of Mathematics, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain; Basque Center for Applied Mathematics BCAM, 48009 Bilbao, Spain
Leire Legarreta: Department of Mathematics, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain; Basque Center for Applied Mathematics BCAM, 48009 Bilbao, Spain
María Asunción García: Department of Mathematics, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain; Basque Center for Applied Mathematics BCAM, 48009 Bilbao, Spain
Martín Blas Pérez-Pinilla: Department of Mathematics, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain; Basque Center for Applied Mathematics BCAM, 48009 Bilbao, Spain
Rosa Izu: Department of Dermatology, Basurto University Hospital, 48013 Bilbao, Spain; Biocruces Bizkaia Health Research Institute, 48903 Barakaldo, Spain
Ildefonso Martínez de la Fuente: Department of Mathematics, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain; Basque Center for Applied Mathematics BCAM, 48009 Bilbao, Spain; CEBAS-CSIC Institute, Department of Nutrition, 30100 Murcia, Spain
Manoli Igartua: NanoBioCel Research Group, Laboratory of Pharmaceutics, School of Pharmacy, University of the Basque Country (UPV/EHU), 01006 Vitoria-Gasteiz, Spain; Bioaraba, NanoBioCel Research Group, 01009 Vitoria-Gasteiz, Spain; Biomedical Research Networking Centre in Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Institute of Health Carlos III, 28029 Madrid, Spain
Santos Alonso: Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain
Rosa Maria Hernandez: NanoBioCel Research Group, Laboratory of Pharmaceutics, School of Pharmacy, University of the Basque Country (UPV/EHU), 01006 Vitoria-Gasteiz, Spain; Bioaraba, NanoBioCel Research Group, 01009 Vitoria-Gasteiz, Spain; Biomedical Research Networking Centre in Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Institute of Health Carlos III, 28029 Madrid, Spain
María Dolores Boyano: Biocruces Bizkaia Health Research Institute, 48903 Barakaldo, Spain; Department of Cell Biology and Histology, Faculty of Medicine and Nursing, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain


Jaiswal M, Singh A, Kumar S. PTPAMP: prediction tool for plant-derived antimicrobial peptides. Amino Acids 2023;55:1-17. [PMID: 35864258 DOI: 10.1007/s00726-022-03190-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Accepted: 07/12/2022] [Indexed: 01/28/2023]

Agrawal P, Bhalla S, Chaudhary K, Kumar R, Sharma M, Raghava GPS. In Silico Approach for Prediction of Antifungal Peptides. Front Microbiol 2018. [PMID: 29535692 PMCID: PMC5834480 DOI: 10.3389/fmicb.2018.00323] [Citation(s) in RCA: 92] [Impact Index Per Article: 15.3] [Indexed: 12/14/2022]

Bessière C, Taha M, Petitprez F, Vandel J, Marin JM, Bréhélin L, Lèbre S, Lecellier CH. Probing instructions for expression regulation in gene nucleotide compositions. PLoS Comput Biol 2018;14:e1005921. [PMID: 29293496 PMCID: PMC5766238 DOI: 10.1371/journal.pcbi.1005921] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/11/2017] [Revised: 01/12/2018] [Accepted: 12/10/2017] [Indexed: 01/22/2023]

Codon usage and amino acid usage influence genes expression level. Genetica 2017;146:53-63. [DOI: 10.1007/s10709-017-9996-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/08/2017] [Accepted: 10/09/2017] [Indexed: 11/30/2022]

Yerukala Sathipati S, Ho SY. Identifying the miRNA signature associated with survival time in patients with lung adenocarcinoma using miRNA expression profiles. Sci Rep 2017;7:7507. [PMID: 28790336 PMCID: PMC5548864 DOI: 10.1038/s41598-017-07739-y] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 01/03/2017] [Accepted: 07/04/2017] [Indexed: 12/19/2022]

Kang S, Odom OW, Thangamani S, Herrin DL. Toward mosquito control with a green alga: Expression of Cry toxins of Bacillus thuringiensis subsp. israelensis (Bti) in the chloroplast of Chlamydomonas. J Appl Phycol 2017;29:1377-1389. [PMID: 28713202 PMCID: PMC5509220 DOI: 10.1007/s10811-016-1008-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 05/08/2023]

Bae YA. Codon Usage Patterns of Tyrosinase Genes in Clonorchis sinensis. Korean J Parasitol 2017;55:175-183. [PMID: 28506040 PMCID: PMC5450960 DOI: 10.3347/kjp.2017.55.2.175] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/13/2017] [Revised: 04/05/2017] [Accepted: 04/06/2017] [Indexed: 11/28/2022]

Baruah VJ, Satapathy SS, Powdel BR, Konwarh R, Buragohain AK, Ray SK. Comparative analysis of codon usage bias in Crenarchaea and Euryarchaea genome reveals differential preference of synonymous codons to encode highly expressed ribosomal and RNA polymerase proteins. J Genet 2016;95:537-49. [PMID: 27659324 DOI: 10.1007/s12041-016-0667-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/15/2023]

Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression. Int J Genomics 2015;2015:269127. [PMID: 26114098 PMCID: PMC4465843 DOI: 10.1155/2015/269127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/18/2015] [Accepted: 05/11/2015] [Indexed: 11/24/2022]

An unsupervised approach to predict functional relations between genes based on expression data. Biomed Res Int 2014;2014:154594. [PMID: 24800208 PMCID: PMC3988973 DOI: 10.1155/2014/154594] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/01/2013] [Revised: 01/31/2014] [Accepted: 02/03/2014] [Indexed: 11/17/2022]

Hybrid approach for predicting coreceptor used by HIV-1 from its V3 loop amino acid sequence. PLoS One 2013;8:e61437. [PMID: 23596523 PMCID: PMC3626595 DOI: 10.1371/journal.pone.0061437] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 04/04/2012] [Accepted: 03/13/2013] [Indexed: 12/18/2022]

Abstract

Background

HIV-1 infects the host cell by interacting with the primary receptor CD4 and a coreceptor, CCR5 or CXCR4. Maraviroc, a CCR5 antagonist, binds to the CCR5 receptor. Thus, it is important to identify the coreceptor used by the HIV strains dominating in the patient. In the past, a number of experimental assays and in silico techniques have been developed for predicting coreceptor tropism. The prediction accuracy of these methods is excellent when predicting CCR5 (R5) tropic sequences but is relatively poor for CXCR4 (X4) tropic sequences. Therefore, any new method for accurate determination of coreceptor usage would be of paramount importance to the successful management of HIV-infected individuals.

Results

The dataset used in this study comprised 1799 R5-tropic and 598 X4-tropic third variable (V3) sequences of HIV-1. We compared the amino acid composition of both types of V3 sequences and observed that certain types of residues, e.g., Asparagine and Isoleucine, were preferred in R5-tropic sequences whereas residues like Lysine, Arginine, and Tryptophan were preferred in X4-tropic sequences. Initially, Support Vector Machine-based models were developed using amino acid composition, dipeptide composition, and split amino acid composition, which achieved accuracy up to 90%. We used BLAST to discriminate R5- and X4-tropic sequences and correctly predicted 93.16% of R5- and 75.75% of X4-tropic sequences. In order to improve the prediction accuracy, a Hybrid model was developed that achieved 91.66% sensitivity, 81.77% specificity, 89.19% accuracy and 0.72 Matthews Correlation Coefficient. The performance of our models was also evaluated on an independent dataset (256 R5- and 81 X4-tropic sequences) and achieved maximum accuracy of 84.87% with Matthews Correlation Coefficient 0.63.

Conclusion

This study describes a highly efficient method for predicting HIV-1 coreceptor usage from V3 sequences. In order to provide a service to the scientific community, a webserver HIVcoPred was developed (http://www.imtech.res.in/raghava/hivcopred/) for predicting the coreceptor usage.


Gautam A, Chaudhary K, Kumar R, Sharma A, Kapoor P, Tyagi A, Raghava GPS. In silico approaches for designing highly effective cell penetrating peptides. J Transl Med 2013;11:74. [PMID: 23517638 PMCID: PMC3615965 DOI: 10.1186/1479-5876-11-74] [Citation(s) in RCA: 207] [Impact Index Per Article: 18.8] [Received: 10/09/2012] [Accepted: 03/11/2013] [Indexed: 11/23/2022]

Abstract

Background

Cell penetrating peptides have gained much recognition as a versatile transport vehicle for the intracellular delivery of a wide range of cargoes (e.g. oligonucleotides, small molecules, proteins) that otherwise lack bioavailability, thus offering great potential as future therapeutics. Keeping in mind the therapeutic importance of these peptides, we have developed in silico methods for the prediction of cell penetrating peptides, which can be used for rapid screening of such peptides prior to their synthesis.

Methods

In the present study, support vector machine (SVM)-based models have been developed for predicting and designing highly effective cell penetrating peptides. Various features like amino acid composition, dipeptide composition, binary profile of patterns, and physicochemical properties have been used as input features. The main dataset used in this study consists of 708 peptides. In addition, we have identified various motifs in cell penetrating peptides, and used these motifs for developing a hybrid prediction model. Performance of our method was evaluated on an independent dataset and also compared with that of the existing methods.

Results

In cell penetrating peptides, certain residues (e.g. Arg, Lys, Pro, Trp, Leu, and Ala) are preferred at specific locations. Thus, it was possible to discriminate cell penetrating peptides from non-cell penetrating peptides based on amino acid composition. All models were evaluated using a five-fold cross-validation technique. We achieved a maximum accuracy of 97.40% using the hybrid model that combines motif information and the binary profile of the peptides. On an independent dataset, we achieved a maximum accuracy of 81.31% with an MCC of 0.63.

Conclusion

The present study demonstrates that features like amino acid composition, binary profile of patterns, and motifs can be used to train an SVM classifier that can predict cell penetrating peptides with higher accuracy. The hybrid model described in this study achieved higher accuracy than the previous methods and thus may complement the existing methods. Based on the above study, a user-friendly web server, CellPPD, has been developed to help biologists predict and design CPPs with ease. The CellPPD web server is freely accessible at http://crdd.osdd.net/raghava/cellppd/.
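Amino acid composition and dipeptide composition, named among the input features in the abstract above, are fixed-length encodings of a variable-length sequence: 20 residue fractions and 400 adjacent-pair fractions. The sketch below is one common way to compute them and is not taken from the CellPPD source. The residue ordering, the choice to skip non-standard residues, and the example peptide are all assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids in an (assumed) alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard residue, ignoring other symbols."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    return [residues.count(aa) / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent standard residues."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs made entirely of standard residues.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(pairs)
    return [pairs.count(dp) / total if total else 0.0 for dp in DIPEPTIDES]


# Arbitrary example peptide, chosen only to exercise the functions.
peptide = "ACDKKR"
aac = amino_acid_composition(peptide)
dpc = dipeptide_composition(peptide)
print(len(aac), len(dpc))  # feature vector lengths: 20 and 400
```

Either vector, or their concatenation, could then be passed to a classifier such as an SVM; that step is omitted here to keep the sketch free of third-party dependencies.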


Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition. BMC Bioinformatics 2012;13 Suppl 17:S3. [PMID: 23282103 PMCID: PMC3521471 DOI: 10.1186/1471-2105-13-s17-s3] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Existing methods for predicting protein solubility on overexpression in Escherichia coli advance performance by using ensemble classifiers such as two-stage support vector machine (SVM) based classifiers and a number of feature types such as physicochemical properties, amino acid and dipeptide composition, accompanied with feature selection. It is desirable to develop a simple and easily interpretable method for predicting protein solubility, compared to existing complex SVM-based methods.

RESULTS

This study proposes a novel scoring card method (SCM) that uses dipeptide composition only to estimate solubility scores of sequences for predicting protein solubility. SCM calculates the propensities of 400 individual dipeptides to be soluble using statistical discrimination between soluble and insoluble proteins of a training data set. The propensity scores of all dipeptides are then further optimized using an intelligent genetic algorithm. The solubility score of a sequence is determined by the weighted sum of all propensity scores and dipeptide composition. To evaluate SCM by performance comparisons, four data sets with different sizes and degrees of variation in experimental conditions were used. The results show that the simple method SCM with interpretable propensities of dipeptides has promising performance, compared with existing SVM-based ensemble methods with a number of feature types. Furthermore, the propensities of dipeptides and solubility scores of sequences can provide insights into protein solubility. For example, the analysis of dipeptide scores shows a high propensity of α-helix structure and thermophilic proteins to be soluble.

CONCLUSIONS

The propensities of individual dipeptides to be soluble vary for proteins under altered experimental conditions. For accurately predicting protein solubility using SCM, it is better to customize the score card of dipeptide propensities by using a training data set obtained under the same specified experimental conditions. The proposed method SCM, with solubility scores and dipeptide propensities, can be easily applied to protein function prediction problems in which dipeptide composition features play an important role.

AVAILABILITY

The used datasets, source codes of SCM, and supplementary files are available at http://iclab.life.nctu.edu.tw/SCM/.
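The scoring step described in the abstract above, a weighted sum of dipeptide propensity scores with the dipeptide composition as weights, can be sketched in a few lines. This is an illustration of that one sentence only, not the released SCM source: the propensity values in `toy_propensity` are invented, the statistical estimation and genetic-algorithm optimization of the score card are not shown, and treating missing dipeptides as score 0.0 is an assumption.

```python
from collections import Counter


def dipeptide_fractions(seq):
    """Map each adjacent residue pair in seq to its fraction of all pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    counts = Counter(pairs)
    return {dp: n / total for dp, n in counts.items()} if total else {}


def scm_score(seq, propensity, default=0.0):
    """Weighted sum of propensity scores, weighted by dipeptide composition.

    `propensity` is a score card mapping dipeptide -> propensity score;
    dipeptides absent from the card fall back to `default` (an assumption).
    """
    fractions = dipeptide_fractions(seq.upper())
    return sum(frac * propensity.get(dp, default) for dp, frac in fractions.items())


# Invented score card and sequence, for illustration only.
toy_propensity = {"AK": 0.8, "KA": 0.6, "KK": 0.2}
print(scm_score("AKAKK", toy_propensity))
```

In a full pipeline the resulting score would presumably be compared against a threshold selected on training data to call a protein soluble or insoluble; the threshold choice is outside this sketch.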


Song J, Tan H, Wang M, Webb GI, Akutsu T. TANGLE: two-level support vector regression approach for protein backbone torsion angle prediction from primary sequences. PLoS One 2012;7:e30361. [PMID: 22319565 PMCID: PMC3271071 DOI: 10.1371/journal.pone.0030361] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 10/05/2011] [Accepted: 12/14/2011] [Indexed: 12/29/2022]

Identification of mannose interacting residues using local composition. PLoS One 2011;6:e24039. [PMID: 21931639 PMCID: PMC3172211 DOI: 10.1371/journal.pone.0024039] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 03/26/2011] [Accepted: 07/29/2011] [Indexed: 01/24/2023]

Abstract

Background

Mannose binding proteins (MBPs) play a vital role in several biological functions such as defense mechanisms. These proteins bind to mannose on the surface of a wide range of pathogens and help in eliminating these pathogens from our body. Thus, it is important to identify mannose interacting residues (MIRs) in order to understand mechanism of recognition of pathogens by MBPs.

Results

This paper describes modules developed for predicting MIRs in a protein. Support vector machine (SVM) based models have been developed on 120 mannose binding protein chains, where no two chains have more than 25% sequence similarity. SVM models were developed on two types of datasets: 1) main dataset consists of 1029 mannose interacting and 1029 non-interacting residues, 2) realistic dataset consists of 1029 mannose interacting and 10320 non-interacting residues. In this study, firstly, we developed standard modules using binary and PSSM profile of patterns and got maximum MCC around 0.32. Secondly, we developed SVM modules using composition profile of patterns and achieved maximum MCC around 0.74 with accuracy 86.64% on main dataset. Thirdly, we developed a model on a realistic dataset and achieved maximum MCC of 0.62 with accuracy 93.08%. Based on this study, a standalone program and web server have been developed for predicting mannose interacting residues in proteins (http://www.imtech.res.in/raghava/premier/).

Conclusions

Compositional analysis of mannose interacting and non-interacting residues shows that certain types of residues are preferred in mannose interaction. It was also observed that the neighborhood of mannose interacting residues shows a preference for certain types of residues. Composition of patterns/peptides/segments has been used for predicting MIRs and achieved reasonably high accuracy. It is possible that this novel strategy may be effective for predicting other types of interacting residues. This study will be useful in annotating the function of proteins as well as in understanding the role of mannose in the immune system.


Van Damme P, Hole K, Pimenta-Marques A, Helsens K, Vandekerckhove J, Martinho RG, Gevaert K, Arnesen T. NatF contributes to an evolutionary shift in protein N-terminal acetylation and is important for normal chromosome segregation. PLoS Genet 2011;7:e1002169. [PMID: 21750686 PMCID: PMC3131286 DOI: 10.1371/journal.pgen.1002169] [Citation(s) in RCA: 146] [Impact Index Per Article: 11.2] [Received: 01/17/2011] [Accepted: 05/20/2011] [Indexed: 01/31/2023]

Abstract

N-terminal acetylation (N-Ac) is a highly abundant eukaryotic protein modification. Proteomics revealed a significant increase in the occurrence of N-Ac from lower to higher eukaryotes, but evidence explaining the underlying molecular mechanism(s) is currently lacking. We first analysed protein N-termini and their acetylation degrees, suggesting that evolution of substrates is not a major cause for the evolutionary shift in N-Ac. Further, we investigated the presence of putative N-terminal acetyltransferases (NATs) in higher eukaryotes. The purified recombinant human and Drosophila homologues of a novel NAT candidate were subjected to in vitro peptide library acetylation assays. This provided evidence for its NAT activity targeting Met-Lys- and other Met-starting protein N-termini, and the enzyme was termed Naa60p and its activity NatF. Its in vivo activity was investigated by ectopically expressing human Naa60p in yeast followed by N-terminal COFRADIC analyses. hNaa60p acetylated distinct Met-starting yeast protein N-termini and increased general acetylation levels, thereby altering yeast in vivo acetylation patterns towards those of higher eukaryotes. Further, its activity in human cells was verified by overexpression and knockdown of hNAA60 followed by N-terminal COFRADIC. NatF's cellular impact was demonstrated in Drosophila cells where NAA60 knockdown induced chromosomal segregation defects. In summary, our study revealed a novel major protein modifier contributing to the evolution of N-Ac, redundancy among NATs, and an essential regulator of normal chromosome segregation. With the characterization of NatF, the co-translational N-Ac machinery appears complete since all the major substrate groups in eukaryotes are accounted for.

Small chemical groups are commonly attached to proteins in order to control their activity, localization, and stability. An abundant protein modification is N-terminal acetylation, in which an N-terminal acetyltransferase (NAT) catalyzes the transfer of an acetyl group to the very N-terminal amino acid of the protein. When going from lower to higher eukaryotes there is a significant increase in the occurrence of N-terminal acetylation. We demonstrate here that this is partly because higher eukaryotes uniquely express NatF, an enzyme capable of acetylating a large group of protein N-termini including those previously found to display an increased N-acetylation potential in higher eukaryotes. Thus, the current study has possibly identified the last major component of the eukaryotic machinery responsible for co-translational N-acetylation of proteins. All eukaryotic proteins start with methionine, which is co-translationally cleaved when the second amino acid is small. Thereafter, NatA may acetylate these newly exposed N-termini. Interestingly, NatF also has the potential to act on these types of N-termini where the methionine was not cleaved. At the cellular level, we further found that NatF is essential for normal chromosome segregation during cell division.


Ivanisenko VA, Demenkov PS, Ivanisenko TV, Kolchanov NA. [Protein Structure Discovery: software package to perform computational proteomics tasks]. Russ J Bioorg Chem 2011;37:22-35. [PMID: 21460878 DOI: 10.1134/s1068162011010080] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]

Panwar B, Raghava GPS. Predicting sub-cellular localization of tRNA synthetases from their primary structures. Amino Acids 2011;42:1703-13. [DOI: 10.1007/s00726-011-0872-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/11/2010] [Accepted: 02/21/2011] [Indexed: 11/25/2022]

Misawa K, Kikuno RF. Relationship between amino acid composition and gene expression in the mouse genome. BMC Res Notes 2011;4:20. [PMID: 21272306 PMCID: PMC3038927 DOI: 10.1186/1756-0500-4-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 06/25/2010] [Accepted: 01/27/2011] [Indexed: 11/10/2022]

Panwar B, Raghava GPS. Prediction and classification of aminoacyl tRNA synthetases using PROSITE domains. BMC Genomics 2010;11:507. [PMID: 20860794 PMCID: PMC2997003 DOI: 10.1186/1471-2164-11-507] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 01/15/2010] [Accepted: 09/22/2010] [Indexed: 12/02/2022]

Abstract

Background

Aminoacyl tRNA synthetases (aaRSs) catalyse the first step of protein synthesis in all organisms. They are responsible for the precise attachment of amino acids to their cognate transfer RNAs. There are twenty different types of aaRSs, unique for each amino acid. These aaRSs have been divided into two classes, each comprising ten enzymes. It is important to predict and classify aaRSs in order to understand protein synthesis.

Results

In this study, all models were developed on a non-redundant dataset containing 117 aaRSs and an equal number of non-aaRSs, in which no two sequences have more than 30% similarity. First, we applied the similarity search technique, BLAST, and achieved a maximum accuracy of 67.52%. We observed that 62% of tRNA synthetases contain one or more domains from amongst the following four PROSITE domains: PS50862, PS00178, PS50860 and PS50861. An SVM-based model was developed to discriminate between aaRSs and non-aaRSs, and achieved a maximum MCC of 0.68 with accuracy of 83.73%, using selective dipeptide composition. We developed a hybrid approach and achieved a maximum MCC of 0.72 with accuracy of 85.49%, where the SVM model was developed using selected dipeptide composition and information on the four PROSITE domains. We further developed an SVM-based model for classifying the aaRSs into class-1 and class-2, using selective dipeptide composition, and achieved an MCC of 0.79. We also observed that two domains (PS00178, PS50889) in class-1 and three domains (PS50862, PS50860, PS50861) in class-2 were preferred. A hybrid method was developed using these domains as descriptors, along with selected dipeptide composition, and achieved an MCC of 0.87 with a sensitivity of 94.55% and an accuracy of 93.19%. All models were evaluated using a five-fold cross-validation technique.

Conclusions

We have analyzed protein sequences of aaRSs (class-1 and class-2) and non-aaRSs and identified interesting patterns. The high accuracy achieved by our SVM models using selected dipeptide composition demonstrates that certain types of dipeptide are preferred in aaRSs. We were able to identify PROSITE domains that are preferred in aaRSs and their classes, providing interesting insights into tRNA synthetases. The method developed in this study will be useful for researchers studying aaRS enzymes and tRNA biology. The web-server based on the above study, is available at http://www.imtech.res.in/raghava/icaars/.
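Five-fold cross-validation, the evaluation protocol named in the abstract above, partitions the dataset into five disjoint folds and uses each fold once as the test set while training on the other four. The sketch below generates such index splits with the standard library only. The shuffling seed and the round-robin fold assignment are assumptions, and the split is not stratified by class, which the original study may or may not have done.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # round-robin into k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 117 aaRSs plus 117 non-aaRSs, as stated in the abstract, gives 234 samples.
for fold_no, (train, test) in enumerate(k_fold_indices(234, k=5), start=1):
    print(f"fold {fold_no}: {len(train)} train, {len(test)} test")
```

Metrics such as accuracy or MCC would then be computed on each held-out fold and averaged, or pooled across folds, depending on the reporting convention.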


Metabolic flux distributions: genetic information, computational predictions, and experimental validation. Appl Microbiol Biotechnol 2010;86:1243-55. [DOI: 10.1007/s00253-010-2506-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 12/22/2009] [Revised: 02/10/2010] [Accepted: 02/11/2010] [Indexed: 01/15/2023]

Liu X, Zhang J, Ni F, Dong X, Han B, Han D, Ji Z, Zhao Y. Genome wide exploration of the origin and evolution of amino acids. BMC Evol Biol 2010;10:77. [PMID: 20230639 PMCID: PMC2853539 DOI: 10.1186/1471-2148-10-77] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 02/15/2009] [Accepted: 03/15/2010] [Indexed: 11/10/2022]

Song J, Tan H, Shen H, Mahmood K, Boyd SE, Webb GI, Akutsu T, Whisstock JC. Cascleave: towards more accurate prediction of caspase substrate cleavage sites. Bioinformatics 2010;26:752-60. [PMID: 20130033 DOI: 10.1093/bioinformatics/btq043] [Citation(s) in RCA: 132] [Impact Index Per Article: 9.4] [Indexed: 01/21/2023]

Song J, Tan H, Mahmood K, Law RHP, Buckle AM, Webb GI, Akutsu T, Whisstock JC. Prodepth: predict residue depth by support vector regression approach from protein sequences only. PLoS One 2009;4:e7072. [PMID: 19759917 PMCID: PMC2742725 DOI: 10.1371/journal.pone.0007072] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 04/14/2009] [Accepted: 08/20/2009] [Indexed: 11/24/2022] Open

Abstract

Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes to what extent a residue is buried in the protein structure space. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid types. Accurate prediction of RD has many potentially important applications in the field of structural bioinformatics, for example, facilitating the identification of functionally important residues, or residues in the folding nucleus, or enzyme active sites from sequence information. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes including both local and global sequence characteristics and examined their respective prediction performances. For the objective evaluation of our approach, we used 5-fold cross-validation to assess the prediction accuracies and showed that the overall best performance could be achieved with a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, after incorporating the relevant multiple sequence features. The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features could influence the prediction performance marginally. We highlight two examples as a comparison in order to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter in the field of protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis.
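The abstract reports performance as a correlation coefficient (CC) and a root mean square error (RMSE) between observed and predicted RD values. A minimal sketch of both measures, assuming the CC is the Pearson coefficient, is shown below; the function names and the example values are hypothetical.

```python
import math


def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Illustrative depth values only, not data from the paper.
observed_rd = [3.1, 4.8, 6.2, 5.0, 3.9]
predicted_rd = [3.4, 4.5, 5.6, 5.3, 4.4]
print(round(pearson_cc(observed_rd, predicted_rd), 3), round(rmse(observed_rd, predicted_rd), 3))
```

The two measures are complementary: CC captures whether the predicted profile follows the observed trend, while RMSE captures the absolute size of the errors in the units of the depth measure.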

Tartaglia GG, Pechmann S, Dobson CM, Vendruscolo M. A relationship between mRNA expression levels and protein solubility in E. coli. J Mol Biol 2009;388:381-9. [PMID: 19281824 DOI: 10.1016/j.jmb.2009.03.002] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 07/25/2008] [Revised: 02/26/2009] [Accepted: 03/03/2009] [Indexed: 10/21/2022]

Miura F, Kawaguchi N, Yoshida M, Uematsu C, Kito K, Sakaki Y, Ito T. Absolute quantification of the budding yeast transcriptome by means of competitive PCR between genomic and complementary DNAs. BMC Genomics 2008;9:574. [PMID: 19040753 PMCID: PMC2612024 DOI: 10.1186/1471-2164-9-574] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.3] [Received: 09/17/2008] [Accepted: 11/29/2008] [Indexed: 11/10/2022] Open

Zhang H, Zhang T, Chen K, Shen S, Ruan J, Kurgan L. Sequence based residue depth prediction using evolutionary information and predicted secondary structure. BMC Bioinformatics 2008;9:388. [PMID: 18803867 PMCID: PMC2567998 DOI: 10.1186/1471-2105-9-388] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 04/10/2008] [Accepted: 09/20/2008] [Indexed: 11/29/2022] Open

Abstract

Background

Residue depth allows determining how deeply a given residue is buried, in contrast to the solvent accessibility that differentiates between buried and solvent-exposed residues. When compared with the solvent accessibility, the depth allows studying deep-level structures and functional sites, and formation of the protein folding nucleus. Accurate prediction of residue depth would provide valuable information for fold recognition, prediction of functional sites, and protein design.

Results

A new method, RDPred, for the real-value depth prediction from protein sequence is proposed. RDPred combines information extracted from the sequence, PSI-BLAST scoring matrices, and secondary structure predicted with PSIPRED. Three-fold/ten-fold cross validation based tests performed on three independent, low-identity datasets show that the distance based depth (computed using MSMS) predicted by RDPred is characterized by 0.67/0.67, 0.66/0.67, and 0.64/0.65 correlation with the actual depth, by the mean absolute errors equal 0.56/0.56, 0.61/0.60, and 0.58/0.57, and by the mean relative errors equal 17.0%/16.9%, 18.2%/18.1%, and 17.7%/17.6%, respectively. The mean absolute and the mean relative errors are shown to be statistically significantly better when compared with a method recently proposed by Yuan and Wang [Proteins 2008; 70:509–516]. The results show that three-fold cross validation underestimates the variability of the prediction quality when compared with the results based on the ten-fold cross validation. We also show that the hydrophilic and flexible residues are predicted more accurately than hydrophobic and rigid residues. Similarly, the charged residues that include Lys, Glu, Asp, and Arg are the most accurately predicted. Our analysis reveals that evolutionary information encoded using PSSM is characterized by stronger correlation with the depth for hydrophilic amino acids (AAs) and aliphatic AAs when compared with hydrophobic AAs and aromatic AAs. Finally, we show that the secondary structure of coils and strands is useful in depth prediction, in contrast to helices that have relatively uniform distribution over the protein depth. Application of the predicted residue depth to prediction of buried/exposed residues shows consistent improvements in detection rates of both buried and exposed residues when compared with the competing method. 
Finally, we contrasted the prediction performance among distance based (MSMS and DPX) and volume based (SADIC) depth definitions. We found that the distance based indices are harder to predict due to the more complex nature of the corresponding depth profiles.
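The results above compare three-fold and ten-fold cross validation. As a minimal sketch of how a dataset can be partitioned into k folds, the generator below shuffles sample indices and yields one train/test split per fold. The function name `k_fold_indices`, the fixed seed, and the choice to split at the level of individual samples are assumptions for illustration; the abstract does not describe how the folds were constructed.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # A fixed seed keeps the partition reproducible between runs.
    random.Random(seed).shuffle(indices)
    # Striding assigns samples to folds so that fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(n_samples=10, k=3):
    print(len(train), len(test))
```

With fewer folds each model is trained on a smaller share of the data and fewer performance estimates are available, which is one reason three-fold and ten-fold results can differ in their apparent variability.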

Conclusion

The proposed method, RDPred, provides statistically significantly better predictions of residue depth when compared with the competing method. The predicted depth can be used to provide improved prediction of both buried and exposed residues. The prediction of exposed residues has implications in characterization/prediction of interactions with ligands and other proteins, while the prediction of buried residues could be used in the context of folding predictions and simulations.

Song J, Tan H, Takemoto K, Akutsu T. HSEpred: predict half-sphere exposure from protein sequences. Bioinformatics 2008;24:1489-97. [DOI: 10.1093/bioinformatics/btn222] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022] Open

Otaki JM, Gotoh T, Yamamoto H. Potential implications of availability of short amino acid sequences in proteins: an old and new approach to protein decoding and design. Biotechnol Annu Rev 2008;14:109-41. [PMID: 18606361 DOI: 10.1016/s1387-2656(08)00004-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]

Abstract

The three-dimensional structure of a protein molecule is primarily determined by its amino acid sequence, and thus the elucidation of general rules embedded in amino acid sequences is of great importance in protein science and engineering. To extract valuable information from sequences, we propose an analytical method in which a protein sequence is considered to be constructed by serial superimpositions of short amino acid sequences of n amino acid sets, especially triplets (3-aa sets). Using the comprehensive nonredundant protein database, we first examined "availability" of all possible combinatorial sets of 8,000 triplet species. Availability score was mathematically defined as an indicator for the relative "preference" or "avoidance" for a given short constituent sequence to be used in protein chain. Availability scores of real proteins were clearly biased against those of randomly generated proteins. We found many triplet species that occurred in the database more than expected or less than expected. Such bias was extended to longer sets, and we found that some species of pentats (5-aa sets) that occurred reasonably frequently in the randomly generated protein population did not occur at all in any real proteins known today. Availability score was dependent on species, potentially serving as a phylogenetic indicator. Furthermore, we suggest possibilities of various biotechnological applications of characteristic short sequences such as human-specific and pathogen-specific short sequences obtained from availability analysis. Availability score was also dependent on secondary structures, potentially serving as a structural indicator. Availability analysis on triplets may be combined with a comprehensive data collection on the φ and ψ peptide-bond angles of the amino acid at the center of each triplet, i.e., a collection of Ramachandran plots for each triplet. These triplet characters, together with other physicochemical data, will provide us with basic information between protein sequence and structure, by which structure prediction and engineering may be greatly facilitated. Availability analysis may also be useful in identifying word processing units in amino acid sequences based on an analogy to natural languages. Together with other approaches, availability analysis will elucidate general rules hidden in the primary sequences and eventually contribute to rebuilding the paradigm of protein science.
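The abstract does not give the mathematical definition of the availability score, so the following is not the authors' formula. It is a simple observed-versus-expected ratio for triplets, where the expected count assumes residues occur independently at their overall frequencies. The function name `triplet_observed_expected` and the toy sequences are hypothetical.

```python
from collections import Counter


def triplet_observed_expected(sequences):
    """Map each observed triplet to observed count / expected count.

    Expected counts assume independent residues drawn at their overall
    frequencies (an assumption made for this sketch).
    """
    aa_counts = Counter()
    tri_counts = Counter()
    for seq in sequences:
        aa_counts.update(seq)
        tri_counts.update(seq[i:i + 3] for i in range(len(seq) - 2))
    n_aa = sum(aa_counts.values())
    n_tri = sum(tri_counts.values())
    ratios = {}
    for triplet, observed in tri_counts.items():
        expected = n_tri
        for aa in triplet:
            expected *= aa_counts[aa] / n_aa
        ratios[triplet] = observed / expected
    return ratios


# Toy input only; a real analysis would read a nonredundant protein database.
ratios = triplet_observed_expected(["MKTAYIAKQR", "MKTLLVAKQR"])
print(sorted(ratios.items(), key=lambda item: -item[1])[:3])
```

Ratios above 1 indicate triplets seen more often than this independence model predicts, and ratios below 1 indicate triplets seen less often. Triplets that never occur are absent from the returned dictionary, which corresponds to a ratio of 0.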

Deschavanne P, Tufféry P. Exploring an alignment free approach for protein classification and structural class prediction. Biochimie 2007;90:615-25. [PMID: 18067866 DOI: 10.1016/j.biochi.2007.11.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 06/19/2007] [Accepted: 11/09/2007] [Indexed: 11/25/2022]

Adaptation of model proteins from cold to hot environments involves continuous and small adjustments of average parameters related to amino acid composition. J Theor Biol 2007;250:156-71. [PMID: 17950361 DOI: 10.1016/j.jtbi.2007.09.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 05/16/2007] [Revised: 08/29/2007] [Accepted: 09/01/2007] [Indexed: 10/22/2022]

Abstract

The growth temperature adaptation of six model proteins has been studied in 42 microorganisms belonging to the eubacterial and archaeal kingdoms, covering optimum growth temperatures from 7 to 103 °C. The selected proteins include three elongation factors involved in translation, the enzymes glyceraldehyde-3-phosphate dehydrogenase and superoxide dismutase, and the cell division protein FtsZ. The common strategy of protein adaptation from cold to hot environments implies the occurrence of small changes in the amino acid composition, without altering the overall structure of the macromolecule. These continuous adjustments were investigated through parameters related to the amino acid composition of each protein. The average value per residue of mass, volume and accessible surface area allowed an evaluation of the usage of bulky residues, whereas the average hydrophobicity reflected that of hydrophobic residues. The specific proportion of bulky and hydrophobic residues in each protein almost linearly increased with the temperature of the host microorganism. This finding agrees with the structural and functional properties exhibited by proteins in differently adapted sources, thus explaining the great compactness or the high flexibility exhibited by (hyper)thermophilic or psychrophilic proteins, respectively. Indeed, heat-adapted proteins incline toward the usage of heavier-size and more hydrophobic residues with respect to mesophiles, whereas the cold-adapted macromolecules show the opposite behavior with a certain preference for smaller-size and less hydrophobic residues. An investigation on the different increase of bulky residues along with the growth temperature observed in the six model proteins suggests the relevance of the possible different role and/or structure organization played by protein domains.
The significance of the linear correlations between growth temperature and parameters related to the amino acid composition improved when the analysis was collectively carried out on all model proteins.

Wu G, Nie L, Freeland SJ. The effects of differential gene expression on coding sequence features: Analysis by one-way ANOVA. Biochem Biophys Res Commun 2007;358:1108-13. [PMID: 17517370 DOI: 10.1016/j.bbrc.2007.05.043] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 05/07/2007] [Accepted: 05/08/2007] [Indexed: 10/23/2022]

Prediction of highly expressed genes in microbes based on chromatin accessibility. BMC Mol Biol 2007;8:11. [PMID: 17295928 PMCID: PMC1805505 DOI: 10.1186/1471-2199-8-11] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 10/25/2006] [Accepted: 02/13/2007] [Indexed: 12/22/2022] Open

Machine learning techniques in disease forecasting: a case study on rice blast prediction. BMC Bioinformatics 2006;7:485. [PMID: 17083731 PMCID: PMC1647291 DOI: 10.1186/1471-2105-7-485] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.5] [Received: 04/19/2006] [Accepted: 11/03/2006] [Indexed: 11/22/2022] Open

Abstract

Background

Diverse modeling approaches, viz. neural networks and multiple regression, have been followed to date for disease prediction in plant populations. However, due to their inability to predict the value of unknown data points and their longer training times, there is a need to exploit new prediction software for a better understanding of plant-pathogen-environment relationships. Further, there is no online tool available which can help plant researchers or farmers in the timely application of control measures. This paper introduces a new prediction approach based on support vector machines for developing weather-based prediction models of plant diseases.

Results

Six significant weather variables were selected as predictor variables. Two series of models (cross-location and cross-year) were developed and validated using a five-fold cross validation procedure. For cross-year models, the conventional multiple regression (REG) approach achieved an average correlation coefficient (r) of 0.50, which increased to 0.60 and percent mean absolute error (%MAE) decreased from 65.42 to 52.24 when back-propagation neural network (BPNN) was used. With generalized regression neural network (GRNN), the r increased to 0.70 and %MAE also improved to 46.30, which further increased to r = 0.77 and %MAE = 36.66 when support vector machine (SVM) based method was used. Similarly, cross-location validation achieved r = 0.48, 0.56 and 0.66 using REG, BPNN and GRNN respectively, with their corresponding %MAE as 77.54, 66.11 and 58.26. The SVM-based method outperformed all the three approaches by further increasing r to 0.74 with improvement in %MAE to 44.12. Overall, this SVM-based prediction approach will open new vistas in the area of forecasting plant diseases of various crops.

Conclusion

Our case study demonstrated that SVM is better than existing machine learning techniques and conventional REG approaches in forecasting plant diseases. In this direction, we have also developed an SVM-based web server for rice blast prediction, a first of its kind worldwide, which can help the plant science community and farmers in their decision making process. The server is freely available at .

Song J, Burrage K. Predicting residue-wise contact orders in proteins by support vector regression. BMC Bioinformatics 2006;7:425. [PMID: 17014735 PMCID: PMC1618864 DOI: 10.1186/1471-2105-7-425] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 05/26/2006] [Accepted: 10/03/2006] [Indexed: 11/10/2022] Open

Abstract

Background

The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships.
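To make the quantity described above concrete, the sketch below computes a simple RWCO profile from one representative coordinate per residue: for each residue, it sums the sequence separations to all residues in spatial contact and divides by the chain length. The hard distance cutoff of 8 Å, the minimum separation of 3, the normalization by chain length, and the function name `residue_wise_contact_order` are assumptions for illustration; the abstract does not state the exact contact function used.

```python
import math


def residue_wise_contact_order(coords, cutoff=8.0, min_separation=3):
    """Return one RWCO value per residue.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    cutoff: contact distance threshold (assumed value).
    min_separation: smallest |i - j| counted as a contact (assumed value).
    """
    n = len(coords)
    profile = []
    for i in range(n):
        total = 0
        for j in range(n):
            separation = abs(i - j)
            if separation >= min_separation and math.dist(coords[i], coords[j]) < cutoff:
                total += separation
        # Normalizing by chain length keeps values comparable across proteins.
        profile.append(total / n)
    return profile


# Toy chain of five points along a line, 3 units apart (not real protein data).
toy_coords = [(3.0 * k, 0.0, 0.0) for k in range(5)]
print(residue_wise_contact_order(toy_coords, cutoff=10.0))
```

Residues with many spatial neighbours that are far away in sequence receive high values, which is why RWCO is described as a measure of long-range contacts.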

Results

We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods.

Conclusion

The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.

Tuikkala J, Elo L, Nevalainen OS, Aittokallio T. Improving missing value estimation in microarray data with gene ontology. Bioinformatics 2005;22:566-72. [PMID: 16377613 DOI: 10.1093/bioinformatics/btk019] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.3] [Indexed: 11/13/2022] Open

Arakawa K, Suzuki H, Fujishima K, Fujimoto K, Ueda S, Matsui M, Tomita M. A Comprehensive Software Suite for the Analysis of cDNAs. Genomics Proteomics Bioinformatics 2005;3:179-88. [PMID: 16487083 PMCID: PMC5172547 DOI: 10.1016/s1672-0229(05)03023-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]