301
|
Bassett T, Harpur B, Poon HY, Kuo KH, Lee CH. Effective stimulation of growth in MCF-7 human breast cancer cells by inhibition of syntaxin18 by external guide sequence and ribonuclease P. Cancer Lett 2008; 272:167-75. [PMID: 18722709 DOI: 10.1016/j.canlet.2008.07.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 04/27/2008] [Revised: 04/22/2008] [Accepted: 07/10/2008] [Indexed: 10/21/2022]
Abstract
Syntaxin18 (Stx18) is an endoplasmic reticulum (ER)-membrane bound SNARE protein involved in membrane trafficking between the ER and Golgi as well as in phagocytosis. Stx18 has also been shown to physically interact with proteins involved in the cell cycle and apoptosis. These findings suggest a possible role for Stx18 in regulating cell growth. In this study, we used a theoretically designed external guide sequence molecule, which utilizes RNase P to cleave Stx18 mRNA, to down-regulate Stx18 levels in MCF-7 human breast cancer cells. We showed that down-regulation of Stx18 leads to significant enhancement of growth in MCF-7 cells. Consistent with this finding was the observation that over-expression of Stx18 using the CMV promoter led to suppression of cell growth. Over-expressing Stx18 had no effect on c-myc mRNA expression and half-life, suggesting that the mechanism does not involve control at the transcriptional or post-transcriptional level of the c-myc gene. Finally, we showed that Stx18 is over-expressed in clinical human breast cancer. Overall, this study showed that Stx18 plays a role in the growth of human breast cancer cells and provided the basis for further investigation to determine whether it can be used as a prognostic marker and as a molecular target in the treatment of breast cancer.
Affiliation(s)
- Tyler Bassett
- Chemistry Program, University of Northern British Columbia, 3333 University Way, Prince George, BC, Canada V2N 4Z9
|
302
|
Cui J, Liu Q, Puett D, Xu Y. Computational prediction of human proteins that can be secreted into the bloodstream. Bioinformatics 2008; 24:2370-5. [PMID: 18697770 DOI: 10.1093/bioinformatics/btn418] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
We present a novel computational method for predicting which proteins from highly and abnormally expressed genes in diseased human tissues, such as cancers, can be secreted into the bloodstream, suggesting possible marker proteins for follow-up serum proteomic studies. A main challenging issue in tackling this problem is that our understanding about the downstream localization after proteins are secreted outside the cells is very limited and not sufficient to provide useful hints about secretion to the bloodstream. To bypass this difficulty, we have taken a data mining approach by first collecting, through extensive literature searches, human proteins that are known to be secreted into the bloodstream due to various pathological conditions as detected by previous proteomic studies, and then asking the question: 'what do these secreted proteins have in common in terms of their physical and chemical properties, amino acid sequence and structural features that can be used to predict them?' We have identified a list of features, such as signal peptides, transmembrane domains, glycosylation sites, disordered regions, secondary structural content, hydrophobicity and polarity measures that show relevance to protein secretion. Using these features, we have trained a support vector machine-based classifier to predict protein secretion to the bloodstream. On a large test set containing 98 secretory proteins and 6601 non-secretory proteins of human, our classifier achieved approximately 90% prediction sensitivity and approximately 98% prediction specificity. Several additional datasets are used to further assess the performance of our classifier. On a set of 122 proteins that were found to be of abnormally high abundance in human blood due to various cancers, our program predicted 62 as blood-secreted proteins. 
By applying our program to abnormally highly expressed genes in gastric cancer and lung cancer tissues detected through microarray gene expression studies, we predicted 13 and 31 as blood secreted, respectively, suggesting that they could serve as potential biomarkers for these two cancers, respectively. Our study demonstrated that our method can provide highly useful information to link genomic and proteomic studies for disease biomarker discovery. Our software can be accessed at http://csbl1.bmb.uga.edu/cgi-bin/Secretion/secretion.cgi.
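Note: the sensitivity and specificity quoted in this abstract are ratios taken from a confusion matrix. The following is a minimal Python sketch of that arithmetic, not the authors' code. The class sizes (98 secretory, 6601 non-secretory) come from the abstract; the individual cell counts are hypothetical values chosen only to be consistent with the reported approximate rates.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    return sensitivity, specificity


# Hypothetical cell counts (assumption): only the row totals, 98 and 6601,
# are taken from the abstract.
tp, fn = 88, 10      # secretory proteins: predicted secreted / missed
tn, fp = 6469, 132   # non-secretory proteins: rejected / falsely predicted

sens, spec = sensitivity_specificity(tp, fn, tn, fp)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

With a negative class this much larger than the positive class, even a 2% false-positive rate produces more false positives than true positives, which is why both rates are reported together.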
Affiliation(s)
- Juan Cui
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
|
303
|
Shazman S, Mandel-Gutfreund Y. Classifying RNA-binding proteins based on electrostatic properties. PLoS Comput Biol 2008; 4:e1000146. [PMID: 18716674 PMCID: PMC2518515 DOI: 10.1371/journal.pcbi.1000146] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Received: 09/12/2007] [Accepted: 06/26/2008] [Indexed: 01/15/2023] Open
Abstract
Protein structure can provide new insight into the biological function of a protein and can enable the design of better experiments to learn its biological roles. Moreover, deciphering the interactions of a protein with other molecules can contribute to the understanding of the protein's function within cellular processes. In this study, we apply a machine learning approach for classifying RNA-binding proteins based on their three-dimensional structures. The method is based on characterizing unique properties of electrostatic patches on the protein surface. Using an ensemble of general protein features and specific properties extracted from the electrostatic patches, we have trained a support vector machine (SVM) to distinguish RNA-binding proteins from other positively charged proteins that do not bind nucleic acids. Specifically, the method was applied on proteins possessing the RNA recognition motif (RRM) and successfully classified RNA-binding proteins from RRM domains involved in protein–protein interactions. Overall the method achieves 88% accuracy in classifying RNA-binding proteins, yet it cannot distinguish RNA from DNA binding proteins. Nevertheless, by applying a multiclass SVM approach we were able to classify the RNA-binding proteins based on their RNA targets, specifically, whether they bind a ribosomal RNA (rRNA), a transfer RNA (tRNA), or messenger RNA (mRNA). Finally, we present here an innovative approach that does not rely on sequence or structural homology and could be applied to identify novel RNA-binding proteins with unique folds and/or binding motifs. Gene expression in all living organisms is regulated by a complex set of events at both transcriptional and posttranscriptional levels. RNA-binding proteins play a key role in posttranscriptional events including splicing, stability, transport, and translation. Nowadays, there is increasing evidence that many other cellular processes may be mediated by RNA. 
Identifying new proteins involved in interaction with RNA is thus essential to unraveling the cellular processes in which these interactions are involved. In the current study we present a successful computational approach for classifying RNA-binding proteins and distinguishing them from other proteins based on structural and electrostatic properties. We test the method on a unique protein domain, the RNA recognition motif (RRM), which mediates both RNA and protein interactions. We show that we can discriminate RNA-binding RRMs from protein-binding RRMs. Further, we demonstrate that we can classify known RNA-binding proteins based on their RNA target (mRNA, rRNA, or tRNA). Our method does not rely on any kind of evolutionary information and thus can be applied to identify RNA-binding proteins with novel modes of RNA recognition.
Affiliation(s)
- Shula Shazman
- Faculty of Biology, Technion—Israel Institute of Technology, Haifa, Israel
|
304
|
Kosinski J, Plotz G, Guarné A, Bujnicki JM, Friedhoff P. The PMS2 subunit of human MutLalpha contains a metal ion binding domain of the iron-dependent repressor protein family. J Mol Biol 2008; 382:610-27. [PMID: 18619468 DOI: 10.1016/j.jmb.2008.06.056] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 03/19/2008] [Revised: 06/13/2008] [Accepted: 06/23/2008] [Indexed: 12/22/2022]
Abstract
DNA mismatch repair (MMR) is responsible for correcting replication errors. MutLalpha, one of the main players in MMR, has been recently shown to harbor an endonuclease/metal-binding activity, which is important for its function in vivo. This endonuclease activity has been confined to the C-terminal domain of the hPMS2 subunit of the MutLalpha heterodimer. In this work, we identify a striking sequence-structure similarity of hPMS2 to the metal-binding/dimerization domain of the iron-dependent repressor protein family and present a structural model of the metal-binding domain of MutLalpha. According to our model, this domain of MutLalpha comprises at least three highly conserved sequence motifs, which are also present in most MutL homologs from bacteria that do not rely on the endonuclease activity of MutH for strand discrimination. Furthermore, based on our structural model, we predict that MutLalpha is a zinc ion binding protein and confirm this prediction by way of biochemical analysis of zinc ion binding using the full-length and C-terminal domain of MutLalpha. Finally, we demonstrate that the conserved residues of the metal ion binding domain are crucial for MMR activity of MutLalpha in vitro.
Affiliation(s)
- Jan Kosinski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland
|
305
|
Zhang HL, Lin HH, Tao L, Ma XH, Dai JL, Jia J, Cao ZW. Prediction of antibiotic resistance proteins from sequence-derived properties irrespective of sequence similarity. Int J Antimicrob Agents 2008; 32:221-6. [PMID: 18583101 DOI: 10.1016/j.ijantimicag.2008.03.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/28/2008] [Revised: 03/13/2008] [Accepted: 03/15/2008] [Indexed: 11/29/2022]
Abstract
Increasing antibiotic resistance has become a worldwide challenge to the clinical treatment of infectious diseases. The identification of antibiotic resistance proteins (ARPs) would be helpful in the discovery of new therapeutic targets and the design of novel drugs to control the potential spread of antibiotic resistance. In this work, a support vector machine (SVM)-based ARP prediction system was developed using 1308 ARPs and 15587 non-ARPs. Its performance was evaluated using 313 ARPs and 7156 non-ARPs. The computed prediction accuracy was 88.5% for ARPs and 99.2% for non-ARPs. A potential application of this method is the identification of ARPs non-homologous to proteins of known function. Further genome screening found that ca. 3.5% and 3.2% of proteins in Escherichia coli and Staphylococcus aureus, respectively, are potential ARPs. These results suggest the usefulness of SVMs for facilitating the identification of ARPs. The software can be accessed at SARPI (Server for Antibiotic Resistance Protein Identification).
Affiliation(s)
- H L Zhang
- Department of Pharmacy, 18 Science Drive 4, National University of Singapore, Singapore 117543, Singapore
|
306
|
Ma XH, Wang R, Yang SY, Li ZR, Xue Y, Wei YC, Low BC, Chen YZ. Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds. J Chem Inf Model 2008; 48:1227-37. [PMID: 18533644 DOI: 10.1021/ci800022e] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 01/04/2023]
Abstract
Virtual screening performance of support vector machines (SVM) depends on the diversity of training active and inactive compounds. While diverse inactive compounds can be routinely generated, the number and diversity of known actives are typically low. We evaluated the performance of SVM trained by sparsely distributed actives in six MDDR biological target classes composed of a high number of known actives (983-1645) of high, intermediate, and low structural diversity (muscarinic M1 receptor agonists, NMDA receptor antagonists, thrombin inhibitors, HIV protease inhibitors, cephalosporins, and renin inhibitors). SVM trained by regularly sparse data sets of 100 actives show improved yields at substantially reduced false-hit rates compared to those of published studies and those of Tanimoto-based similarity searching method based on the same data sets and molecular descriptors. SVM trained by very sparse data sets of 40 actives (2.4%-4.1% of the known actives) predicted 17.5-39.5%, 23.0-48.1%, and 70.2-92.4% of the remaining 943-1605 actives in the high, intermediate, and low diversity classes, respectively, 13.8-68.7% of which are outside the training compound families. SVM predicted 99.97% and 97.1% of the 9.997 M PUBCHEM and 167K remaining MDDR compounds as inactive and 2.6%-8.3% of the 19,495-38,483 MDDR compounds similar to the known actives as active. These suggest that SVM has substantial capability in identifying novel active compounds from sparse active data sets at low false-hit rates.
Affiliation(s)
- X H Ma
- Centre for Computational Science and Engineering, National University of Singapore, Singapore
|
307
|
Ishihama Y, Schmidt T, Rappsilber J, Mann M, Hartl FU, Kerner MJ, Frishman D. Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics 2008; 9:102. [PMID: 18304323 PMCID: PMC2292177 DOI: 10.1186/1471-2164-9-102] [Citation(s) in RCA: 353] [Impact Index Per Article: 22.1] [Received: 01/24/2008] [Accepted: 02/27/2008] [Indexed: 11/10/2022] Open
Abstract
Background Knowledge about the abundance of molecular components is an important prerequisite for building quantitative predictive models of cellular behavior. Proteins are central components of these models, since they carry out most of the fundamental processes in the cell. Thus far, protein concentrations have been difficult to measure on a large scale, but proteomic technologies have now advanced to a stage where this information becomes readily accessible. Results Here, we describe an experimental scheme to maximize the coverage of proteins identified by mass spectrometry of a complex biological sample. Using a combination of LC-MS/MS approaches with protein and peptide fractionation steps we identified 1103 proteins from the cytosolic fraction of the Escherichia coli strain MC4100. A measure of abundance is presented for each of the identified proteins, based on the recently developed emPAI approach which takes into account the number of sequenced peptides per protein. The values of abundance are within a broad range and accurately reflect independently measured copy numbers per cell. As expected, the most abundant proteins were those involved in protein synthesis, most notably ribosomal proteins. Proteins involved in energy metabolism as well as those with binding function were also found in high copy number while proteins annotated with the terms metabolism, transcription, transport, and cellular organization were rare. The barrel-sandwich fold was found to be the structural fold with the highest abundance. Highly abundant proteins are predicted to be less prone to aggregation based on their length, pI values, and occurrence patterns of hydrophobic stretches. We also find that abundant proteins tend to be predominantly essential. Additionally we observe a significant correlation between protein and mRNA abundance in E. coli cells. Conclusion Abundance measurements for more than 1000 E. coli proteins presented in this work represent the most complete study of protein abundance in a bacterial cell so far. We show significant associations between the abundance of a protein and its properties and functions in the cell. In this way, we provide both data and novel insights into the role of protein concentration in this model organism.
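Note: the emPAI measure mentioned in this abstract is commonly defined as 10 raised to the ratio of observed to observable peptides, minus 1, and relative abundance is obtained by normalizing over all identified proteins. The Python sketch below illustrates that definition only. The protein names and peptide counts are invented for the example and are not data from the study.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index for one protein."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def protein_content_mol_percent(empai_values):
    """Normalize emPAI values so that they sum to 100 (mol %)."""
    total = sum(empai_values.values())
    return {name: 100 * value / total for name, value in empai_values.items()}


# Hypothetical (observed, observable) peptide counts, for illustration only.
peptide_counts = {"protA": (12, 15), "protB": (3, 20), "protC": (1, 25)}
indices = {name: empai(obs, total) for name, (obs, total) in peptide_counts.items()}
for name, share in protein_content_mol_percent(indices).items():
    print(f"{name}: emPAI={indices[name]:.3f}  content={share:.1f} mol%")
```

The exponential form reflects the roughly logarithmic relationship between peptide coverage and protein amount, so a protein with full peptide coverage scores far higher than one with half coverage.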
Affiliation(s)
- Yasushi Ishihama
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan.
|
308
|
Yang JY, Zhou Y, Yu ZG, Anh V, Zhou LQ. Human Pol II promoter recognition based on primary sequences and free energy of dinucleotides. BMC Bioinformatics 2008; 9:113. [PMID: 18294399 PMCID: PMC2292139 DOI: 10.1186/1471-2105-9-113] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 08/13/2007] [Accepted: 02/24/2008] [Indexed: 01/29/2023] Open
Abstract
Background Promoter region plays an important role in determining where the transcription of a particular gene should be initiated. Computational prediction of eukaryotic Pol II promoter sequences is one of the most significant problems in sequence analysis. Existing promoter prediction methods are still far from being satisfactory. Results We attempt to recognize the human Pol II promoter sequences from the non-promoter sequences which are made up of exon and intron sequences. Four methods are used: two kinds of multifractal analysis performed on the numeric sequences obtained from the dinucleotide free energy, Z curve analysis and global descriptor of the promoter/non-promoter primary sequences. A total of 141 parameters are extracted from these methods and categorized into seven groups (methods). They are used to generate certain spaces and then each promoter/non-promoter sequence is represented by a point in the corresponding space. All the 120 possible combinations of the seven methods are tested. Based on Fisher's linear discriminant algorithm, with a relatively smaller number of parameters (96 and 117), we get satisfactory discriminant accuracies. Particularly, in the case of 117 parameters, the accuracies for the training and test sets reach 90.43% and 89.79%, respectively. A comparison with five other existing methods indicates that our methods have a better performance. Using the global descriptor method (36 parameters), 17 of the 18 experimentally verified promoter sequences of human chromosome 22 are correctly identified. Conclusion The high accuracies achieved suggest that the methods of this paper are useful for understanding the difficult problem of promoter prediction.
Affiliation(s)
- Jian-Yi Yang
- School of Mathematics and Computational Science, Xiangtan University, Hunan 411105, China.
|
309
|
Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs. BMC Bioinformatics 2008; 9:101. [PMID: 18282281 PMCID: PMC2335299 DOI: 10.1186/1471-2105-9-101] [Citation(s) in RCA: 117] [Impact Index Per Article: 7.3] [Received: 09/06/2007] [Accepted: 02/18/2008] [Indexed: 12/02/2022] Open
Abstract
Background As one of the most common protein post-translational modifications, glycosylation is involved in a variety of important biological processes. Computational identification of glycosylation sites in protein sequences becomes increasingly important in the post-genomic era. A new encoding scheme was employed to improve the prediction of mucin-type O-glycosylation sites in mammalian proteins. Results A new protein bioinformatics tool, CKSAAP_OGlySite, was developed to predict mucin-type O-glycosylation serine/threonine (S/T) sites in mammalian proteins. Using the composition of k-spaced amino acid pairs (CKSAAP) based encoding scheme, the proposed method was trained and tested in a new and stringent O-glycosylation dataset with the assistance of Support Vector Machine (SVM). When the ratio of O-glycosylation to non-glycosylation sites in training datasets was set as 1:1, 10-fold cross-validation tests showed that the proposed method yielded a high accuracy of 83.1% and 81.4% in predicting O-glycosylated S and T sites, respectively. Based on the same datasets, CKSAAP_OGlySite resulted in a higher accuracy than the conventional binary encoding based method (about +5.0%). When trained and tested in 1:5 datasets, the CKSAAP encoding showed a more significant improvement than the binary encoding. We also merged the training datasets of S and T sites and integrated the prediction of S and T sites into one single predictor (i.e. S+T predictor). Either in 1:1 or 1:5 datasets, the performance of this S+T predictor was always slightly better than those predictors where S and T sites were independently predicted, suggesting that the molecular recognition of O-glycosylated S/T sites seems to be similar and the increase of the S+T predictor's accuracy may be a result of expanded training datasets. Moreover, CKSAAP_OGlySite was also shown to have better performance when benchmarked against two existing predictors. 
Conclusion Because of the CKSAAP encoding's ability to reflect characteristics of the sequences surrounding mucin-type O-glycosylation sites, CKSAAP_OGlySite has proved more powerful than the conventional binary encoding based method. This suggests that it can be offered to the biological community as a competitive mucin-type O-glycosylation site predictor. CKSAAP_OGlySite is now available at .
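Note: the composition of k-spaced amino acid pairs (CKSAAP) named in this abstract counts, for each gap size k, how often each ordered residue pair occurs with exactly k residues between the two members, then normalizes by the number of such pairs in the sequence window. The Python sketch below is an illustration under stated assumptions, not the CKSAAP_OGlySite implementation: it uses only the 20 standard residues, ignores any other character, and the window string and `k_max` default are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order shared by every gap size.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window, k_max=2):
    """Encode a sequence window as 400 * (k_max + 1) pair frequencies."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1  # pairs available at this gap size
        for i in range(max(n_pairs, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
        denom = n_pairs if n_pairs > 0 else 1
        features.extend(counts[p] / denom for p in PAIRS)
    return features


vector = cksaap("ASTPS", k_max=1)  # hypothetical 5-residue window
print(len(vector))
```

The resulting fixed-length vector can be passed to any standard classifier, which is what makes the encoding convenient for SVM training on windows centered on candidate S/T sites.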
|
310
|
Vilasi S, Ragone R. Abundance of intrinsic disorder in SV-IV, a multifunctional androgen-dependent protein secreted from rat seminal vesicle. FEBS J 2008; 275:763-74. [DOI: 10.1111/j.1742-4658.2007.06242.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
|
311
|
EL-Manzalawy Y, Dobbs D, Honavar V. Predicting flexible length linear B-cell epitopes. Comput Syst Bioinformatics Conf 2008; 7:121-132. [PMID: 19642274 PMCID: PMC3400678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Identifying B-cell epitopes plays an important role in vaccine design, immunodiagnostic tests, and antibody production. Therefore, computational tools for reliably predicting B-cell epitopes are highly desirable. We explore two machine learning approaches for predicting flexible length linear B-cell epitopes. The first approach utilizes four sequence kernels for determining a similarity score between any arbitrary pair of variable length sequences. The second approach utilizes four different methods of mapping a variable length sequence into a fixed length feature vector. Based on our empirical comparisons, we propose FBCPred, a novel method for predicting flexible length linear B-cell epitopes using the subsequence kernel. Our results demonstrate that FBCPred significantly outperforms all other classifiers evaluated in this study. An implementation of FBCPred and the datasets used in this study are publicly available through our linear B-cell epitope prediction server, BCPREDS, at: http://ailab.cs.iastate.edu/bcpreds/.
Affiliation(s)
- Yasser EL-Manzalawy
- Artificial Intelligence Laboratory, Iowa State University, Ames, IA 50010, USA
- Department of Computer Science, Iowa State University, Ames, IA 50010, USA
- Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, IA 50010, USA
- Drena Dobbs
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, USA
- Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA 50010, USA
- Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, IA 50010, USA
- Vasant Honavar
- Artificial Intelligence Laboratory, Iowa State University, Ames, IA 50010, USA
- Department of Computer Science, Iowa State University, Ames, IA 50010, USA
- Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA 50010, USA
- Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, IA 50010, USA
|
312
|
Majumder HK. Searching the Tritryp genomes for drug targets. Adv Exp Med Biol 2008; 625:133-40. [PMID: 18365664 PMCID: PMC7123030 DOI: 10.1007/978-0-387-77570-8_11] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
Abstract
The recent publication of the complete genome sequences of Leishmania major, Trypanosoma brucei and Trypanosoma cruzi revealed that each genome contains 8300-12,000 protein-coding genes, of which approximately 6500 are common to all three genomes, and ushers in a new, post-genomic, era for trypanosomatid drug discovery. This vast amount of new information makes possible more comprehensive and accurate target identification using several new computational approaches, including identification of metabolic "choke-points", searching the parasite proteomes for orthologues of known drug targets, and identification of parasite proteins likely to interact with known drugs and drug-like small molecules. In this chapter, we describe several databases (such as GENEDB, BRENDA, KEGG, METACYC, the THERAPEUTIC TARGET DATABASE, and CHEMBANK) and algorithms (including PATHOLOGIC, PATHWAY HUNTER TOOL, and AUTODOCK) which have been developed to facilitate the bioinformatic analyses underlying these approaches. While target identification is only the first step in the drug development pipeline, these new approaches give rise to renewed optimism for the discovery of new drugs to combat the devastating diseases caused by these parasites. Traditionally, drug discovery in the trypanosomatids (and other organisms) has proceeded from two different starting points: screening large numbers of existing compounds for activity against whole parasites or more focused screening of compounds for activity against defined molecular targets. Most existing anti-trypanosomatid drugs were developed using the former approach, although the latter has gained much attention in the last twenty years under the rubric of "rational drug design". Until recently, one of the major bottlenecks in anti-trypanosomatid drug development has been our ability to identify good targets, since only a very small percentage of the total number of trypanosomatid genes were known. 
That has now changed forever, with the recent (July, 2005) publication of the "Tritryp" (Trypanosoma brucei, Trypanosoma cruzi and Leishmania major) genome sequences. This vast amount of information now makes possible several new approaches for target identification and ushers in a post-genomic era for trypanosomatid drug discovery.
Affiliation(s)
- Hemanta K. Majumder
- Molecular Parasitology Laboratory, Indian Institute of Chemical Biology, Kolkata, India
|
313
|
Han LY, Ma XH, Lin HH, Jia J, Zhu F, Xue Y, Li ZR, Cao ZW, Ji ZL, Chen YZ. A support vector machines approach for virtual screening of active compounds of single and multiple mechanisms from large libraries at an improved hit-rate and enrichment factor. J Mol Graph Model 2007; 26:1276-86. [PMID: 18218332 DOI: 10.1016/j.jmgm.2007.12.002] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 04/29/2007] [Revised: 12/05/2007] [Accepted: 12/05/2007] [Indexed: 01/04/2023]
Abstract
Support vector machines (SVM) and other machine-learning (ML) methods have been explored as ligand-based virtual screening (VS) tools for facilitating lead discovery. While exhibiting good hit selection performance, in screening large compound libraries, these methods tend to produce a lower hit-rate than those of the best performing VS tools, partly because their training-sets contain a limited spectrum of inactive compounds. We tested whether the performance of SVM can be improved by using training-sets of diverse inactive compounds. In retrospective database screening of active compounds of single mechanism (HIV protease inhibitors, DHFR inhibitors, dopamine antagonists) and multiple mechanisms (CNS active agents) from large libraries of 2.986 million compounds, the yields, hit-rates, and enrichment factors of our SVM models are 52.4-78.0%, 4.7-73.8%, and 214-10,543, respectively, compared to those of 62-95%, 0.65-35%, and 20-1200 by structure-based VS and 55-81%, 0.2-0.7%, and 110-795 by other ligand-based VS tools in screening libraries of ≥1 million compounds. The hit-rates are comparable and the enrichment factors are substantially better than the best results of other VS tools. 24.3-87.6% of the predicted hits are outside the known hit families. SVM appears to be potentially useful for facilitating lead discovery in VS of large compound libraries.
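Note: yield, hit-rate, and enrichment factor as used in this abstract can be read with their usual virtual-screening definitions: yield is the share of known actives that are retrieved, hit-rate is the share of predicted hits that are known actives, and the enrichment factor is the hit-rate divided by the fraction of actives in the whole library. The Python sketch below assumes those definitions, which the abstract itself does not spell out, and uses invented counts rather than values from the study.

```python
def screening_metrics(true_hits, predicted_hits, total_actives, library_size):
    """Return (yield, hit_rate, enrichment_factor) for one screening run."""
    yield_fraction = true_hits / total_actives   # known actives recovered
    hit_rate = true_hits / predicted_hits        # precision of the hit list
    random_rate = total_actives / library_size   # hit-rate of random picking
    enrichment_factor = hit_rate / random_rate
    return yield_fraction, hit_rate, enrichment_factor


# Hypothetical run (assumption): 100 known actives in a library of 1,000,000
# compounds, with 200 compounds predicted active, 50 of them known actives.
y, h, ef = screening_metrics(
    true_hits=50, predicted_hits=200, total_actives=100, library_size=1_000_000
)
print(f"yield={y:.1%} hit-rate={h:.1%} enrichment={ef:.0f}")
```

Because the enrichment factor scales with library size for a fixed hit list, values from libraries of different sizes are not directly comparable, which is worth keeping in mind when reading the ranges quoted above.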
Affiliation(s)
- L Y Han
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543, Singapore
|
314
|
Sarac OS, Gürsoy-Yüzügüllü O, Cetin-Atalay R, Atalay V. Subsequence-based feature map for protein function classification. Comput Biol Chem 2007; 32:122-30. [PMID: 18243801 DOI: 10.1016/j.compbiolchem.2007.11.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 08/09/2007] [Accepted: 11/30/2007] [Indexed: 11/19/2022]
Abstract
Automated classification of proteins is indispensable for further in vivo investigation of the excessive number of unknown sequences generated by large-scale molecular biology techniques. This study describes a discriminative system based on feature space mapping, called subsequence profile map (SPMap), for functional classification of protein sequences. SPMap takes into account the information coming from the subsequences of a protein. A group of protein sequences that belong to the same level of classification is decomposed into fixed-length subsequences, which are clustered to obtain a representative feature space mapping. Mapping is defined as the distribution of the subsequences of a protein sequence over these clusters. The resulting feature space representation is used to train discriminative classifiers for functional families. The aim of this approach is to incorporate information coming from important subregions that are conserved over a family of proteins while avoiding the difficult task of explicit motif identification. The performance of the method was assessed through tests on various protein classification tasks. Our results showed that SPMap is capable of high accuracy classification in most of these tasks. Furthermore, SPMap is fast and scalable enough to handle large datasets.
Affiliation(s)
- Omer Sinan Sarac
- Department of Computer Engineering, Middle East Technical University, 06531 Ankara, Turkey
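The mapping step described in the SPMap abstract above (fixed-length subsequences distributed over clusters) can be illustrated with a minimal sketch. It assumes the cluster representatives are already available; the function names, the Hamming-distance assignment, and the toy centroids are illustrative assumptions, since the abstract does not specify how the authors cluster or assign subsequences.

```python
from collections import Counter


def subsequences(seq, k):
    """Return all overlapping windows of length k from seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def nearest_cluster(sub, centroids):
    """Index of the closest centroid (assumed metric: Hamming distance).

    Ties are resolved in favour of the lowest index.
    """
    return min(range(len(centroids)), key=lambda i: hamming(sub, centroids[i]))


def profile_map(seq, centroids):
    """Map a sequence to the distribution of its subsequences over clusters."""
    k = len(centroids[0])
    subs = subsequences(seq, k)
    if not subs:  # sequence shorter than the window length
        return [0.0] * len(centroids)
    counts = Counter(nearest_cluster(s, centroids) for s in subs)
    return [counts[i] / len(subs) for i in range(len(centroids))]


# Hypothetical centroids and sequence, for illustration only.
centroids = ["ACD", "KLM", "GGG"]
vector = profile_map("ACDKLMGGG", centroids)
```

The resulting fixed-length vector is the kind of representation that could then be passed to a discriminative classifier such as an SVM.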
|
315
|
Xu H, Xu H, Lin M, Wang W, Li Z, Huang J, Chen Y, Chen X. Learning the drug target-likeness of a protein. Proteomics 2007; 7:4255-63. [DOI: 10.1002/pmic.200700062] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 12/18/2022]
|
316
|
Mitra J, Mundra P, Kulkarni BD, Jayaraman VK. Using Recurrence Quantification Analysis Descriptors for Protein Sequence Classification with Support Vector Machines. J Biomol Struct Dyn 2007; 25:289-98. [DOI: 10.1080/07391102.2007.10507177] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
|
317
|
Kumar M, Gromiha MM, Raghava GPS. Identification of DNA-binding proteins using support vector machines and evolutionary profiles. BMC Bioinformatics 2007; 8:463. [PMID: 18042272 PMCID: PMC2216048 DOI: 10.1186/1471-2105-8-463] [Citation(s) in RCA: 196] [Impact Index Per Article: 11.5] [Received: 06/04/2007] [Accepted: 11/27/2007] [Indexed: 11/10/2022] [Open access]
Abstract
Background: Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation, as these proteins play a crucial role in gene regulation. In this paper, we developed various SVM modules for predicting DNA-binding domains and proteins. All models were trained and tested on multiple datasets of non-redundant proteins. Results: SVM models have been developed on DNAaset, which consists of 1153 DNA-binding and an equal number of non-DNA-binding proteins, and achieved maximum accuracies of 72.42% and 71.59% using amino acid and dipeptide compositions, respectively. The performance of the SVM model improved from 72.42% to 74.22% when evolutionary information in the form of PSSM profiles was used as input instead of amino acid composition. In addition, SVM models have been developed on DNAset, which consists of 146 DNA-binding and 250 non-binding chains/domains, and achieved maximum accuracies of 79.80% and 86.62% using amino acid composition and PSSM profiles, respectively. The SVM models developed in this study perform better than existing methods on a blind dataset. Conclusion: A highly accurate method has been developed for predicting DNA-binding proteins using SVM and PSSM profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been used successfully for predicting DNA-binding proteins. A web server, DNAbinder, has been developed for identifying DNA-binding proteins and domains from query amino acid sequences.
Affiliation(s)
- Manish Kumar
- Bioinformatics Centre, Institute of Microbial Technology, Sector 39A, Chandigarh-160036, India.
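The amino acid and dipeptide composition features mentioned in the Kumar et al. abstract above can be sketched as follows. This is a minimal illustration that assumes compositions are expressed as fractions over the 20 standard residues; the abstract does not give the exact definition or scaling used by the authors, and the function names and example sequence are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard residue in seq (20 features)."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 features)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    return [pairs.count(a + b) / total for a, b in product(AMINO_ACIDS, repeat=2)]


# Toy sequence, for illustration only.
aac = amino_acid_composition("MKKLAA")
dpc = dipeptide_composition("MKKLAA")
```

Either vector has a fixed length regardless of sequence length, which is what lets it serve as input to an SVM.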
|
318
|
Faulon JL, Misra M, Martin S, Sale K, Sapra R. Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor. Bioinformatics 2007; 24:225-33. [PMID: 18037612 DOI: 10.1093/bioinformatics/btm580] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Identifying protein enzymatic or pharmacological activities is an important area of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. There is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein-chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. RESULTS: Our method relies on expressing proteins and chemicals with a common cheminformatics representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Such predictions cannot be made with current machine-learning techniques, which require binding information for individual reactions or individual targets.
Affiliation(s)
- Jean-Loup Faulon
- Sandia National Laboratories, Computational Biosciences Department, P.O. Box 5800, Albuquerque, NM 87185-1413, USA.
|
319
|
Nagarajan V, Elasri MO. Structure and function predictions of the Msa protein in Staphylococcus aureus. BMC Bioinformatics 2007; 8 Suppl 7:S5. [PMID: 18047728 PMCID: PMC2099497 DOI: 10.1186/1471-2105-8-s7-s5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023] [Open access]
Abstract
BACKGROUND: Staphylococcus aureus is a human pathogen that causes a wide variety of life-threatening infections using a large number of virulence factors. One of the major global regulators used by S. aureus is the staphylococcal accessory regulator (sarA). We have identified and characterized a new gene (modulator of sarA: msa) that modulates the expression of sarA. Genetic and functional analysis shows that msa has a global effect on gene expression in S. aureus. However, the mechanism of Msa function is still unknown. Function predictions of Msa are complicated by the fact that it does not have a homologous partner in any other organism. This work aims at predicting the structure and function of the Msa protein. RESULTS: Preliminary sequence analysis showed that Msa is a putative membrane protein. It would therefore be very difficult to purify and crystallize Msa in order to acquire structural information about this protein. We have used several computational tools to predict the physico-chemical properties, secondary structural features, topology, 3D tertiary structure, binding sites, motifs/patterns/domains and cellular location. We have built a consensus derived from analysis using different algorithms to predict several structural features. We confirm that Msa is a putative membrane protein with three transmembrane regions. We also predict that Msa has phosphorylation sites and binding sites, suggesting functions in signal transduction. CONCLUSION: Based on our predictions, we hypothesise that Msa is a novel signal transducer that might be involved in the interaction of S. aureus with its environment.
Affiliation(s)
- Vijayaraj Nagarajan
- Department of Biological Sciences, The University of Southern Mississippi, Hattiesburg, MS 39406, USA.
|
320
|
Syntactic structures in languages and biology. Cogn Process 2007; 9:153-8. [PMID: 17952479 DOI: 10.1007/s10339-007-0194-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/22/2006] [Revised: 09/04/2007] [Accepted: 09/21/2007] [Indexed: 10/22/2022]
Abstract
Both natural languages and cell biology make use of one-dimensional encryption. Their investigation calls for syntactic deciphering of the text and semantic understanding of the resulting structures. Here we discuss recently published algorithms that allow for such searches: automatic distillation of structure (ADIOS) that is successful in discovering syntactic structures in linguistic texts and its motif extraction (MEX) component that can be used for uncovering motifs in DNA and protein sequences. The underlying principles of these syntactic algorithms and some of their results will be described.
|
321
|
Li Q, Lai L. Prediction of potential drug targets based on simple sequence properties. BMC Bioinformatics 2007; 8:353. [PMID: 17883836 PMCID: PMC2082046 DOI: 10.1186/1471-2105-8-353] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Received: 03/19/2007] [Accepted: 09/20/2007] [Indexed: 02/02/2023] [Open access]
Abstract
Background: During the past decades, research and development in drug discovery have attracted much attention and effort. However, only 324 drug targets are known for clinical drugs up to now. Identifying potential drug targets is the first step in the process of modern drug discovery for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and the pharmaceutical industry. If a protein can be predicted in advance for its potential application as a drug target, the drug discovery process targeting this protein will be greatly accelerated. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets. Results: Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non-drug targets at an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research. Conclusion: We have developed a drug target prediction method based solely on protein sequence information, without knowledge of family/domain annotation or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as genome-scale drug target predictions.
Affiliation(s)
- Qingliang Li
- Beijing National Laboratory for Molecular Sciences, State Key Laboratory of Structural Chemistry for Stable and Unstable Species, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China.
|
322
|
Rashid M, Saha S, Raghava GPS. Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs. BMC Bioinformatics 2007; 8:337. [PMID: 17854501 PMCID: PMC2147037 DOI: 10.1186/1471-2105-8-337] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.4] [Received: 05/15/2007] [Accepted: 09/13/2007] [Indexed: 11/17/2022] [Open access]
Abstract
Background: In the past, a number of methods have been developed for predicting the subcellular location of eukaryotic, prokaryotic (Gram-negative and Gram-positive bacteria) and human proteins, but no method has been developed for mycobacterial proteins, which may represent a repertoire of potent immunogens of this dreaded pathogen. In this study, an attempt has been made to develop a method for predicting the subcellular location of mycobacterial proteins. Results: The models were trained and tested on 852 mycobacterial proteins and evaluated using a five-fold cross-validation technique. First, an SVM (Support Vector Machine) model was developed using amino acid composition, and an overall accuracy of 82.51% was achieved with an average accuracy (mean of class-wise accuracy) of 68.47%. In order to utilize evolutionary information, an SVM model was developed using PSSM (Position-Specific Scoring Matrix) profiles obtained from PSI-BLAST (Position-Specific Iterated BLAST), and the overall accuracy achieved was 86.62% with an average accuracy of 73.71%. In addition, HMM (Hidden Markov Model), MEME/MAST (Multiple Em for Motif Elicitation/Motif Alignment and Search Tool) and hybrid models that combined two or more models were also developed. We achieved a maximum overall accuracy of 86.8% with an average accuracy of 89.00% using a combination of the PSSM-based SVM model and MEME/MAST. The performance of our method was compared with that of the existing methods developed for predicting subcellular locations of Gram-positive bacterial proteins. Conclusion: A highly accurate method has been developed for predicting the subcellular location of mycobacterial proteins. This method also predicts a very important class of proteins, namely membrane-attached proteins. This method will be useful in annotating newly sequenced or hypothetical mycobacterial proteins. Based on the above study, a freely accessible web server, TBpred (http://www.imtech.res.in/raghava/tbpred/), has been developed.
Affiliation(s)
- Mamoon Rashid
- Bioinformatics Centre, Institute of Microbial Technology, Sector-39A, Chandigarh, India
- Sudipto Saha
- Bioinformatics Centre, Institute of Microbial Technology, Sector-39A, Chandigarh, India
- Gajendra PS Raghava
- Bioinformatics Centre, Institute of Microbial Technology, Sector-39A, Chandigarh, India
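The Rashid et al. abstract above reports both an overall accuracy and an average accuracy (the mean of class-wise accuracies). The difference between the two can be shown with a small sketch; the class names and counts below are invented for illustration and are not taken from the paper.

```python
def overall_accuracy(per_class):
    """Correct predictions over all samples.

    per_class maps a class name to a (correct, total) tuple.
    """
    correct = sum(c for c, _ in per_class.values())
    total = sum(t for _, t in per_class.values())
    return correct / total


def average_accuracy(per_class):
    """Unweighted mean of the per-class accuracies."""
    return sum(c / t for c, t in per_class.values()) / len(per_class)


# Hypothetical, deliberately imbalanced counts: (correct, total) per class.
toy = {"cytoplasmic": (90, 100), "secreted": (5, 10)}
overall = overall_accuracy(toy)
average = average_accuracy(toy)
```

With imbalanced classes the two measures diverge: the large class dominates the overall accuracy, while each class contributes equally to the average.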
|
323
|
Ong SAK, Lin HH, Chen YZ, Li ZR, Cao Z. Efficacy of different protein descriptors in predicting protein functional families. BMC Bioinformatics 2007; 8:300. [PMID: 17705863 PMCID: PMC1997217 DOI: 10.1186/1471-2105-8-300] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Received: 11/01/2006] [Accepted: 08/17/2007] [Indexed: 02/02/2023] [Open access]
Abstract
Background: Sequence-derived structural and physicochemical descriptors have frequently been used in machine learning prediction of protein functional families; thus there is a need to comparatively evaluate the effectiveness of these descriptor-sets by using the same method and parameter optimization algorithm, and to examine whether the combined use of these descriptor-sets helps to improve predictive performance. Six individual descriptor-sets and four combination-sets were evaluated in support vector machines (SVM) prediction of six protein functional families. Results: The performance of these descriptor-sets was ranked by Matthews correlation coefficient (MCC), and the sets were categorized into two groups based on their performance. While there is no overwhelmingly favourable choice of descriptor-sets, certain trends were found. The combination-sets tend to give slightly but consistently higher MCC values and thus overall best performance, such that three out of four combination-sets show slightly better performance compared to one out of six individual descriptor-sets. Conclusion: Our study suggests that currently used descriptor-sets are generally useful for classifying proteins and that the prediction performance may be enhanced by exploring combinations of descriptors.
Affiliation(s)
- Serene AK Ong
- Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 08-14, 3 Science Drive 2, Singapore 117543, Singapore
- Hong Huang Lin
- Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 08-14, 3 Science Drive 2, Singapore 117543, Singapore
- Yu Zong Chen
- Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 08-14, 3 Science Drive 2, Singapore 117543, Singapore
- Ze Rong Li
- College of Chemistry, Sichuan University, Chengdu, 610064, P.R. China
- Zhiwei Cao
- Shanghai Center for Bioinformatics Technology, 100, Qinzhou Road, Shanghai 200235 P.R. China
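The Matthews correlation coefficient (MCC) used to rank descriptor-sets in the Ong et al. abstract above has a standard definition for binary classification, sketched below from confusion-matrix counts. The function name and the convention of returning 0.0 when the denominator vanishes are choices made for this illustration.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; 0.0 is returned when any marginal total
    is zero, which is a common convention rather than part of the formula.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


perfect = mcc(tp=50, tn=50, fp=0, fn=0)
inverted = mcc(tp=0, tn=0, fp=50, fn=50)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it a reasonable single-number summary when positive and negative classes differ in size.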
|
324
|
Abstract
The increasing availability of data related to genes, proteins and their modulation by small molecules has provided a vast amount of biological information leading to the emergence of systems biology and the broad use of simulation tools for data analysis. However, there is a critical need to develop cheminformatics tools that can integrate chemical knowledge with these biological databases and simulation approaches, with the goal of creating systems chemical biology.
Affiliation(s)
- Tudor I Oprea
- Division of Biocomputing, MSC11 6145, University of New Mexico School of Medicine, 2703 Frontier NE, Albuquerque, New Mexico 87131, USA.
|
325
|
Kunik V, Meroz Y, Solan Z, Sandbank B, Weingart U, Ruppin E, Horn D. Functional representation of enzymes by specific peptides. PLoS Comput Biol 2007; 3:e167. [PMID: 17722976 PMCID: PMC1950953 DOI: 10.1371/journal.pcbi.0030167] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 01/17/2007] [Accepted: 07/10/2007] [Indexed: 11/19/2022] [Open access]
Abstract
Predicting the function of a protein from its sequence is a long-standing goal of bioinformatic research. While sequence similarity is the most popular tool used for this purpose, sequence motifs may also subserve this goal. Here we develop a motif-based method consisting of applying an unsupervised motif extraction algorithm (MEX) to all enzyme sequences, and filtering the results by the four-level classification hierarchy of the Enzyme Commission (EC). The resulting motifs serve as specific peptides (SPs), appearing on single branches of the EC. In contrast to previous motif-based methods, the new method does not require any preprocessing by multiple sequence alignment, nor does it rely on over-representation of motifs within EC branches. The SPs obtained comprise on average 8.4 ± 4.5 amino acids, and specify the functions of 93% of all enzymes, which is much higher than the coverage of 63% provided by ProSite motifs. The SP classification thus compares favorably with previous function annotation methods and successfully demonstrates an added value in extreme cases where sequence similarity fails. Interestingly, SPs cover most of the annotated active and binding site amino acids, and occur in active-site neighboring 3-D pockets in a highly statistically significant manner. The latter are assumed to have strong biological relevance to the activity of the enzyme. Further filtering of SPs by biological functional annotations results in reduced small subsets of SPs that possess very large enzyme coverage. Overall, SPs both form a very useful tool for enzyme functional classification and bear responsibility for the catalytic biological function carried out by enzymes. Sequence motifs are known to provide information about functional properties of proteins. In the past, many approaches have looked for deterministic motifs in protein sequences, by searching for functionally over-represented k-mers, with moderate levels of success. 
Here we revisit and renew the utility of deterministic motifs, by searching for them in a partially unsupervised and context-dependent manner. Using a novel motif extraction algorithm, MEX, deterministic sequence motifs are extracted from Swiss Prot data containing more than 50,000 enzymes. They are then filtered by the Enzyme Commission classification hierarchy to produce sets of specific peptides (SPs). The latter specify enzyme function for 93% of the data, comparing well with existing approaches for enzyme classification. Importantly, SPs are found to have biological significance. A majority of all known active and binding sites of enzymes are covered by SPs, and many SPs are found to lie within spatial pockets in the neighborhood of the active sites. Both these results have extremely high statistical significance. A user-friendly tool that displays the hits of SPs for any protein sequence that is presented as a query, together with the EC assignments due to these SPs, is available at http://adios.tau.ac.il/SPSearch.
Affiliation(s)
- Vered Kunik
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Yasmine Meroz
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
- Zach Solan
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
- Ben Sandbank
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Uri Weingart
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
- Eytan Ruppin
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- David Horn
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
- * To whom correspondence should be addressed. E-mail:
|
326
|
Fujishima K, Komasa M, Kitamura S, Suzuki H, Tomita M, Kanai A. Proteome-wide prediction of novel DNA/RNA-binding proteins using amino acid composition and periodicity in the hyperthermophilic archaeon Pyrococcus furiosus. DNA Res 2007; 14:91-102. [PMID: 17573465 PMCID: PMC2779898 DOI: 10.1093/dnares/dsm011] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022] [Open access]
Abstract
Proteins play a critical role in complex biological systems, yet about half of the proteins in publicly available databases are annotated as functionally unknown. Proteome-wide functional classification using bioinformatics approaches thus is becoming an important method for revealing unknown protein functions. Using the hyperthermophilic archaeon Pyrococcus furiosus as a model species, we used the support vector machine (SVM) method to discriminate DNA/RNA-binding proteins from proteins with other functions, using amino acid composition and periodicities as feature vectors. We defined this value as the composition score (CO) and periodicity score (PD). The P. furiosus proteins were classified into three classes (I–III) on the basis of the two-dimensional correlation analysis of CO score and PD score. As a result, approximately 87% of the functionally known proteins categorized as class I proteins (CO score + PD score > 0.6) were found to be DNA/RNA-binding proteins. Applying the two-dimensional correlation analysis to the 994 hypothetical proteins in P. furiosus, a total of 151 proteins were predicted to be novel DNA/RNA-binding protein candidates. DNA/RNA-binding activities of randomly chosen hypothetical proteins were experimentally verified. Six out of seven candidate proteins in class I possessed DNA/RNA-binding activities, supporting the efficacy of our method.
Affiliation(s)
- Kosuke Fujishima
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa 252-8520, Japan
- Mizuki Komasa
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa 252-8520, Japan
- Sayaka Kitamura
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa 252-8520, Japan
- Haruo Suzuki
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa 252-8520, Japan
- Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-8520, Japan
- Akio Kanai
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-8520, Japan
- To whom correspondence should be addressed. Tel. +81 235-29-0524. Fax. +81 235-29-0525. E-mail:
|
327
|
Xu JR, Zhang JX, Han BC, Liang L, Ji ZL. CytoSVM: an advanced server for identification of cytokine-receptor interactions. Nucleic Acids Res 2007; 35:W538-42. [PMID: 17526528 PMCID: PMC1933174 DOI: 10.1093/nar/gkm254] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] [Open access]
Abstract
The interactions between cytokines and their complementary receptors are the gateways to properly understanding a large variety of cytokine-specific cellular activities such as immunological responses and cell differentiation. To discover novel cytokine-receptor interactions, an advanced support vector machines (SVMs) model, CytoSVM, was constructed in this study. This model was iteratively trained using 449 mammalian (excluding rat) cytokine-receptor interactions and about 1 million virtually generated positive and negative vectors in an enriched way. Final independent evaluation on rat data yielded a sensitivity of 97.4%, a specificity of 99.2% and a Matthews correlation coefficient (MCC) of 0.89. This performance is better than that of normal SVM-based models. Based on this well-optimized model, a web-based server was created to accept a primary protein sequence and present its probabilities of interacting with one or several cytokines. Moreover, this model was applied to identify putative cytokine-receptor pairs in the whole genomes of human and mouse. Excluding currently known cytokine-receptor interactions, a total of 1609 novel cytokine-receptor pairs were discovered from the human genome with probability ∼80% after further transmembrane analysis. These cover 220 novel receptors (excluding their isoforms) for 126 human cytokines. The screening results have been deposited in a database. Both the server and the database can be freely accessed at http://bioinf.xmu.edu.cn/software/cytosvm/cytosvm.php.
Affiliation(s)
- Jin-Rui Xu
- Key Laboratory for Cell Biology & Tumor Cell Engineering, the Ministry of Education of China, School of Life Sciences and The Key Laboratory for Chemical Biology of Fujian Province, Xiamen University, Xiamen 361005, FuJian Province, P R China
- Jing-Xian Zhang
- Key Laboratory for Cell Biology & Tumor Cell Engineering, the Ministry of Education of China, School of Life Sciences and The Key Laboratory for Chemical Biology of Fujian Province, Xiamen University, Xiamen 361005, FuJian Province, P R China
- Bu-Cong Han
- Key Laboratory for Cell Biology & Tumor Cell Engineering, the Ministry of Education of China, School of Life Sciences and The Key Laboratory for Chemical Biology of Fujian Province, Xiamen University, Xiamen 361005, FuJian Province, P R China
- Liang Liang
- Key Laboratory for Cell Biology & Tumor Cell Engineering, the Ministry of Education of China, School of Life Sciences and The Key Laboratory for Chemical Biology of Fujian Province, Xiamen University, Xiamen 361005, FuJian Province, P R China
- Zhi-Liang Ji
- Key Laboratory for Cell Biology & Tumor Cell Engineering, the Ministry of Education of China, School of Life Sciences and The Key Laboratory for Chemical Biology of Fujian Province, Xiamen University, Xiamen 361005, FuJian Province, P R China
- *To whom correspondence should be addressed. Tel: 86-0592-2182897; Fax: 86-0592-2181015;
|
328
|
Ung CY, Li H, Cao ZW, Li YX, Chen YZ. Are herb-pairs of traditional Chinese medicine distinguishable from others? Pattern analysis and artificial intelligence classification study of traditionally defined herbal properties. J Ethnopharmacol 2007; 111:371-7. [PMID: 17267151 DOI: 10.1016/j.jep.2006.11.037] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 09/24/2006] [Revised: 11/24/2006] [Accepted: 11/28/2006] [Indexed: 05/13/2023]
Abstract
Multi-herb prescriptions of traditional Chinese medicine (TCM) often include special herb-pairs for mutual enhancement, assistance, and restraint. These TCM herb-pairs have been assembled and interpreted based on traditionally defined herbal properties (TCM-HPs) without knowledge of mechanism of their assumed synergy. While these mechanisms are yet to be determined, properties of TCM herb-pairs can be investigated to determine if they exhibit features consistent with their claimed unique synergistic combinations. We analyzed distribution patterns of TCM-HPs of TCM herb-pairs to detect signs indicative of possible synergy and used artificial intelligence (AI) methods to examine whether combination of their TCM-HPs are distinguishable from those of non-TCM herb-pairs assembled by random combinations and by modification of known TCM herb-pairs. Patterns of the majority of 394 known TCM herb-pairs were found to exhibit signs of herb-pair correlation. Three AI systems, trained and tested by using 394 TCM herb-pairs and 2470 non-TCM herb-pairs, correctly classified 72.1-87.9% of TCM herb-pairs and 91.6-97.6% of the non-TCM herb-pairs. The best AI system predicted 96.3% of the 27 known non-TCM herb-pairs and 99.7% of the other 1,065,100 possible herb-pairs as non-TCM herb-pairs. Our studies suggest that TCM-HPs of known TCM herb-pairs contain features distinguishable from those of non-TCM herb-pairs consistent with their claimed synergistic or modulating combinations.
Affiliation(s)
- Choong Yong Ung
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543, Singapore
|
329
|
Kunik V, Solan Z, Edelman S, Ruppin E, Horn D. Motif extraction and protein classification. Proc IEEE Comput Syst Bioinform Conf 2007:80-5. [PMID: 16447965 DOI: 10.1109/csb.2005.39] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
Abstract
We present a novel unsupervised method for extracting meaningful motifs from biological sequence data. This de novo motif extraction (MEX) algorithm is data driven, finding motifs that are not necessarily over-represented in the data. Applying MEX to the oxidoreductases class of enzymes, containing approximately 7000 enzyme sequences, a relatively small set of motifs is obtained. This set spans a motif-space that is used for functional classification of the enzymes by an SVM classifier. The classification based on MEX motifs surpasses that of two other SVM based methods: SVMProt, a method based on the analysis of physical-chemical properties of a protein generated from its sequence of amino acids, and SVM applied to a Smith-Waterman distances matrix. Our findings demonstrate that the MEX algorithm extracts relevant motifs, supporting a successful sequence-to-function classification.
Affiliation(s)
- Vered Kunik
- School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel.
|
330
|
Han LY, Zheng CJ, Xie B, Jia J, Ma XH, Zhu F, Lin HH, Chen X, Chen YZ. Support vector machines approach for predicting druggable proteins: recent progress in its exploration and investigation of its usefulness. Drug Discov Today 2007; 12:304-13. [PMID: 17395090 DOI: 10.1016/j.drudis.2007.02.015] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 09/26/2006] [Revised: 01/30/2007] [Accepted: 02/20/2007] [Indexed: 02/07/2023]
Abstract
Identification and validation of viable targets is an important first step in drug discovery, and new methods and integrated approaches are continuously explored to improve the discovery rate and exploration of new drug targets. An in silico machine learning method, support vector machines, has been explored as a new method for predicting druggable proteins from amino acid sequence independent of sequence similarity, thereby facilitating the prediction of druggable proteins that exhibit no or low homology to known targets.
Affiliation(s)
- Lian Yi Han
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk Soc 1, Level 7, 3 Science Drive 2, Singapore 117543
|
331
|
Bi R, Zhou Y, Lu F, Wang W. Predicting Gene Ontology functions based on support vector machines and statistical significance estimation. Neurocomputing 2007. [DOI: 10.1016/j.neucom.2006.10.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
|
332
|
Martin S, Brown WM, Faulon JL. Using product kernels to predict protein interactions. Advances in Biochemical Engineering/Biotechnology 2007; 110:215-45. [PMID: 17922100 DOI: 10.1007/10_2007_084] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
Abstract
There is a wide variety of experimental methods for the identification of protein interactions. This variety has in turn spurred the development of numerous different computational approaches for modeling and predicting protein interactions. These methods range from detailed structure-based methods capable of operating on only a single pair of proteins at a time to approximate statistical methods capable of making predictions on multiple proteomes simultaneously. In this chapter, we provide a brief discussion of the relative merits of different experimental and computational methods available for identifying protein interactions. Then we focus on the application of our particular (computational) method using Support Vector Machine product kernels. We describe our method in detail and discuss the application of the method for predicting protein-protein interactions, beta-strand interactions, and protein-chemical interactions.
Affiliation(s)
- Shawn Martin
- Computational Biology, Sandia National Laboratories, PO Box 5800, Albuquerque, NM 87185-1316, USA.
|
333
|
Zheng CJ, Han LY, Yap CW, Ji ZL, Cao ZW, Chen YZ. Therapeutic targets: progress of their exploration and investigation of their characteristics. Pharmacol Rev 2006; 58:259-79. [PMID: 16714488 DOI: 10.1124/pr.58.2.4] [Citation(s) in RCA: 132] [Impact Index Per Article: 7.3] [Indexed: 01/04/2023] Open
Abstract
Modern drug discovery is primarily based on the search and subsequent testing of drug candidates acting on a preselected therapeutic target. Progress in genomics, protein structure, proteomics, and disease mechanisms has led to a growing interest in and effort for finding new targets and more effective exploration of existing targets. The number of reported targets of marketed and investigational drugs has significantly increased in the past 8 years. There are 1535 targets collected in the therapeutic target database compared with approximately 500 targets reported in a 1996 review. Knowledge of these targets is helpful for molecular dissection of the mechanism of action of drugs and for predicting features that guide new drug design and the search for new targets. This article summarizes the progress of target exploration and investigates the characteristics of the currently explored targets to analyze their sequence, structure, family representation, pathway association, tissue distribution, and genome location features for finding clues useful for searching for new targets. Possible "rules" to guide the search for druggable proteins and the feasibility of using a statistical learning method for predicting druggable proteins directly from their sequences are discussed.
Affiliation(s)
- C J Zheng
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Singapore, Singapore
|
334
|
Lin HH, Han LY, Zhang HL, Zheng CJ, Xie B, Cao ZW, Chen YZ. Prediction of the functional class of metal-binding proteins from sequence derived physicochemical properties by support vector machine approach. BMC Bioinformatics 2006; 7 Suppl 5:S13. [PMID: 17254297 PMCID: PMC1764469 DOI: 10.1186/1471-2105-7-s5-s13] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022] Open
Abstract
Metal-binding proteins play important roles in structural stability, signaling, regulation, transport, immune response, metabolism control, and metal homeostasis. Because of their functional and sequence diversity, it is desirable to explore additional methods for predicting metal-binding proteins irrespective of sequence similarity. This work explores support vector machines (SVM) as such a method. SVM prediction systems were developed by using 53,333 metal-binding and 147,347 non-metal-binding proteins, and evaluated by an independent set of 31,448 metal-binding and 79,051 non-metal-binding proteins. The computed prediction accuracy is 86.3%, 81.6%, 83.5%, 94.0%, 81.2%, 85.4%, 77.6%, 90.4%, 90.9%, 74.9% and 78.1% for calcium-binding, cobalt-binding, copper-binding, iron-binding, magnesium-binding, manganese-binding, nickel-binding, potassium-binding, sodium-binding, zinc-binding, and all metal-binding proteins respectively. The accuracy for the non-member proteins of each class is 88.2%, 99.9%, 98.1%, 91.4%, 87.9%, 94.5%, 99.2%, 99.9%, 99.9%, 98.0%, and 88.0% respectively. Comparable accuracies were obtained by using a different SVM kernel function. Our method predicts 67% of the 87 metal-binding proteins non-homologous to any protein in the Swissprot database and 85.3% of the 333 proteins of known metal-binding domains as metal-binding. These suggest the usefulness of SVM for facilitating the prediction of metal-binding proteins. Our software can be accessed at the SVMProt server http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
Affiliation(s)
- HH Lin
- Bioinformatics and Drug Design Group, Department of Pharmacy and Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- LY Han
- Bioinformatics and Drug Design Group, Department of Pharmacy and Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- HL Zhang
- Bioinformatics and Drug Design Group, Department of Pharmacy and Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- CJ Zheng
- Bioinformatics and Drug Design Group, Department of Pharmacy and Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- B Xie
- Bioinformatics and Drug Design Group, Department of Pharmacy and Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- ZW Cao
- Shanghai Center for Bioinformatics Technology, 100, Qinzhou Road, Shanghai 200235 P.R. China
- YZ Chen
- Bioinformatics and Drug Design Group, Department of Pharmacy and Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- Shanghai Center for Bioinformatics Technology, 100, Qinzhou Road, Shanghai 200235 P.R. China
|
335
|
Wang Y, Xue ZD, Shi XH, Xu J. Prediction of π-turns in proteins using PSI-BLAST profiles and secondary structure information. Biochem Biophys Res Commun 2006; 347:574-80. [PMID: 16844090 DOI: 10.1016/j.bbrc.2006.06.066] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/25/2006] [Accepted: 06/14/2006] [Indexed: 11/28/2022]
Abstract
Due to the structural and functional importance of tight turns, some methods have been proposed to predict gamma-turns, beta-turns, and alpha-turns in proteins. In the past, studies of pi-turns were made, but not a single prediction approach has been developed so far. It will be useful to develop a method for identifying pi-turns in a protein sequence. In this paper, the support vector machine (SVM) method has been introduced to predict pi-turns from the amino acid sequence. The training and testing of this approach is performed with a newly collected data set of 640 non-homologous protein chains containing 1931 pi-turns. Different sequence encoding schemes have been explored in order to investigate their effects on the prediction performance. With multiple sequence alignment and predicted secondary structure, the final SVM model yields a Matthews correlation coefficient (MCC) of 0.556 by a 7-fold cross-validation. A web server implementing the prediction method is available at the following URL: http://210.42.106.80/piturn/.
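For readers unfamiliar with the Matthews correlation coefficient quoted in this abstract, the following is a minimal sketch of the standard definition computed from confusion-matrix counts. The function name and the convention of returning 0.0 when the denominator vanishes are assumptions made for illustration; nothing here is taken from the authors' software.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (an assumed convention),
    since the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), which makes it a more informative summary than raw accuracy when, as with turn residues, the positive class is rare.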
Affiliation(s)
- Yan Wang
- Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan City, China.
|
336
|
Chen W, Zhang J, Dong C, Yang B, Li Y, Liu C, Hu Y. Identification of Transmembrane Domain of a Membrane Associated Protein NS5 of Dendrolimus punctatus Cytoplasmic Polyhedrosis Virus. BMB Rep 2006; 39:412-7. [PMID: 16889685 DOI: 10.5483/bmbrep.2006.39.4.412] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022] Open
Abstract
We examined the intracellular localization of the NS5 protein of Dendrolimus punctatus cytoplasmic polyhedrosis virus (DpCPV) by expressing an NS5-GFP fusion protein and proteins from deletion mutants of NS5 in Spodoptera frugiperda (Sf-9) insect cells infected with recombinant baculovirus. The NS5 protein was found at the plasma membrane of the cells, and the N-terminal portion of the protein played a key role in this localization. A transmembrane region was identified in the N-terminal portion of the protein, and the detailed transmembrane domain (SQIHMVWVKSGLVFF, 57-71aa) of the N-terminal portion of NS5 was further determined, in agreement with the predicted results. These findings suggest that NS5 might have an important function in the viral life cycle.
Affiliation(s)
- Wuguo Chen
- State Key Laboratory of Virology and Department of Biotechnology, College of Life Sciences, Wuhan University, Wuhan, Hubei 430072, P. R. China
|
337
|
Cui J, Han LY, Lin HH, Tang ZQ, Jiang L, Cao ZW, Chen YZ. MHC-BPS: MHC-binder prediction server for identifying peptides of flexible lengths from sequence-derived physicochemical properties. Immunogenetics 2006; 58:607-13. [PMID: 16832638 DOI: 10.1007/s00251-006-0117-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 01/16/2006] [Accepted: 03/16/2006] [Indexed: 10/24/2022]
Abstract
Major histocompatibility complex (MHC)-binding peptides are essential for antigen recognition by T-cell receptors and are being explored for vaccine design. Computational methods have been developed for predicting MHC-binding peptides of fixed lengths, based on the training of relatively few non-binders. It is desirable to introduce methods applicable for peptides of flexible lengths and trained by using more diverse sets of non-binders. MHC-BPS is a web-based MHC-binder prediction server that uses support vector machines for predicting peptide binders of flexible lengths for 18 MHC class I and 12 class II alleles from sequence-derived physicochemical properties, which were trained by using 4,208 to 3,252 binders and 234,333 to 168,793 non-binders, and evaluated by an independent set of 545 to 476 binders and 110,564 to 84,430 non-binders. The binder prediction accuracies are 86 to 99% for 25 and 70 to 80% for five alleles, and the non-binder accuracies are 96 to 99% for 30 alleles. A screening of HIV-1 genome identifies 0.01 to 5% and 5 to 8% of the constituent peptides as binders for 24 and 6 alleles, respectively, including 75 to 100% of the known epitopes. This method correctly predicts 73.3% of the 15 newly published epitopes in the last 4 months of 2005. MHC-BPS is available at http://bidd.cz3.nus.edu.sg/mhc/ .
Affiliation(s)
- Juan Cui
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
|
338
|
Han L, Cui J, Lin H, Ji Z, Cao Z, Li Y, Chen Y. Recent progresses in the application of machine learning approach for predicting protein functional class independent of sequence similarity. Proteomics 2006; 6:4023-37. [PMID: 16791826 DOI: 10.1002/pmic.200500938] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.8] [Indexed: 01/22/2023]
Abstract
Protein sequence contains clues to its function. Functional prediction from sequence presents a challenge particularly for proteins that have low or no sequence similarity to proteins of known function. Recently, machine learning methods have been explored for predicting functional class of proteins from sequence-derived properties independent of sequence similarity, which showed promising potential for low- and non-homologous proteins. These methods can thus be explored as potential tools to complement alignment- and clustering-based methods for predicting protein function. This article reviews the strategies, current progresses, and underlying difficulties in using machine learning methods for predicting the functional class of proteins. The relevant software and web-servers are described. The reported prediction performances in the application of these methods are also presented, which need to be interpreted with caution as they are dependent on such factors as datasets used and choice of parameters.
Affiliation(s)
- Lianyi Han
- Department of Computational Science, National University of Singapore, Singapore, Singapore
|
339
|
Li ZR, Lin HH, Han LY, Jiang L, Chen X, Chen YZ. PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res 2006; 34:W32-7. [PMID: 16845018 PMCID: PMC1538821 DOI: 10.1093/nar/gkl305] [Citation(s) in RCA: 203] [Impact Index Per Article: 11.3] [Received: 12/23/2005] [Revised: 01/17/2006] [Accepted: 04/10/2006] [Indexed: 02/01/2023] Open
Abstract
Sequence-derived structural and physicochemical features have frequently been used in the development of statistical learning models for predicting proteins and peptides of different structural, functional and interaction profiles. PROFEAT (Protein Features) is a web server for computing commonly-used structural and physicochemical features of proteins and peptides from amino acid sequence. It computes six feature groups composed of ten features that include 51 descriptors and 1447 descriptor values. The computed features include amino acid composition, dipeptide composition, normalized Moreau-Broto autocorrelation, Moran autocorrelation, Geary autocorrelation, sequence-order-coupling number, quasi-sequence-order descriptors and the composition, transition and distribution of various structural and physicochemical properties. It can also compute the preceding autocorrelation descriptors based on user-defined properties. Our computational algorithms were extensively tested and the computed protein features have been used in a number of published works for predicting proteins of functional classes, protein-protein interactions and MHC-binding peptides. PROFEAT is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/prof/prof.cgi.
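To make the two simplest descriptors named in this abstract concrete, the sketch below computes amino acid composition (20 values) and dipeptide composition (400 values) from a sequence string. It is an illustration only, not PROFEAT's code: the function names are hypothetical, values are returned as fractions (the server's own scaling may differ), and non-standard residue symbols are simply skipped, which is an assumption.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each ordered pair of adjacent residues (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Assumption: pairs containing a non-standard symbol are ignored.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(valid)
    return {a + b: counts[a + b] / len(valid)
            for a, b in product(AMINO_ACIDS, repeat=2)}
```

Both functions return fixed-length dictionaries regardless of sequence length, which is what allows sequences of different lengths to be fed to a classifier such as an SVM as equal-sized feature vectors.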
Affiliation(s)
- Z. R. Li
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- College of Chemistry, Sichuan University, Chengdu, 610064, P. R. China
- H. H. Lin
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- L. Y. Han
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- L. Jiang
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- X. Chen
- Department of Biotechnology, Zhejiang University, Hangzhou, 310029, P. R. China
- Y. Z. Chen
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
- Shanghai Center for Bioinformation Technology, Shanghai, 201203, P. R. China
|
340
|
Zhang GQ, Cao ZW, Luo QM, Cai YD, Li YX. Operon prediction based on SVM. Comput Biol Chem 2006; 30:233-40. [PMID: 16716751 DOI: 10.1016/j.compbiolchem.2006.03.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 12/02/2005] [Revised: 03/17/2006] [Accepted: 03/24/2006] [Indexed: 11/27/2022]
Abstract
The operon is a specific functional organization of genes found in bacterial genomes. Most genes within operons share common features. The support vector machine (SVM) approach is here used to predict operons at the genomic level. Four features were chosen as SVM input vectors: the intergenic distances, the number of common pathways, the number of conserved gene pairs and the mutual information of phylogenetic profiles. The analysis reveals that these common properties are indeed characteristic of the genes within operons and are different from those of non-operonic genes. Jackknife testing indicates that these input feature vectors, employed with an RBF kernel SVM, achieve high accuracy. To validate the method, Escherichia coli K12 and Bacillus subtilis were taken as benchmark genomes of known operon structure, and the prediction results in both show that the SVM can detect operon genes in target genomes efficiently and offers a satisfactory balance between sensitivity and specificity.
Affiliation(s)
- Guo-qing Zhang
- Hubei Bioinformatics and Molecular Imaging Key Laboratory, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
|
341
|
Yu X, Cao J, Cai Y, Shi T, Li Y. Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines. J Theor Biol 2006; 240:175-84. [PMID: 16274699 DOI: 10.1016/j.jtbi.2005.09.018] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.4] [Received: 02/09/2005] [Revised: 09/09/2005] [Accepted: 09/09/2005] [Indexed: 11/18/2022]
Abstract
In the post-genome era, the prediction of protein function is one of the most demanding tasks in the study of bioinformatics. Machine learning methods, such as support vector machines (SVMs), greatly help to improve the classification of protein function. In this work, we integrated SVMs, protein sequence amino acid composition, and associated physicochemical properties into the prediction of nucleic-acid-binding proteins. We developed binary classifiers for rRNA-, RNA-, and DNA-binding proteins, which play an important role in the control of many cell processes. Each SVM predicts whether a protein belongs to the rRNA-, RNA-, or DNA-binding protein class. Self-consistency and jackknife tests were performed on protein data sets in which the sequence identity was <25%. Test results show that the accuracies of the rRNA-, RNA-, and DNA-binding SVM predictions are approximately 84%, 78%, and 72%, respectively. Predictions were also performed on the ambiguous and negative data sets. The results demonstrate that the scores predicted for proteins in the ambiguous data set by the RNA- and DNA-binding SVM models were distributed around zero, while most proteins in the negative data set received negative scores from all three SVMs. The score distributions agree well with the prior knowledge of those proteins and show the effectiveness of sequence-associated physicochemical properties in protein function prediction. The software is available from the author upon request.
Affiliation(s)
- Xiaojing Yu
- Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Graduate School of the Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, PR China
|
342
|
Soeria-Atmadja D, Wallman M, Björklund AK, Isaksson A, Hammerling U, Gustafsson MG. External cross-validation for unbiased evaluation of protein family detectors: application to allergens. Proteins 2006; 61:918-25. [PMID: 16231294 DOI: 10.1002/prot.20656] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Key issues in protein science and computational biology are design and evaluation of algorithms aimed at detection of proteins that belong to a specific family, as defined by structural, evolutionary, or functional criteria. In this context, several validation techniques are often used to compare different parameter settings of the detector, and to subsequently select the setting that yields the smallest error rate estimate. A frequently overlooked problem associated with this approach is that this smallest error rate estimate may have a large optimistic bias. Based on computer simulations, we show that a detector's error rate estimate can be overly optimistic and propose a method to obtain unbiased performance estimates of a detector design procedure. The method is founded on an external 10-fold cross-validation (CV) loop that embeds an internal validation procedure used for parameter selection in detector design. The detectors generated in each of the 10 iterations are evaluated on held-out examples exclusively available in the external CV iterations. Notably, the average of these 10 performance estimates is not associated with a final detector, but rather with the average performance of the design procedure used. We apply the external CV loop to the particular problem of detecting potentially allergenic proteins, using a previously reported design procedure. Unbiased performance estimates of the allergen detector design procedure are presented together with information about which algorithms and parameter settings are most frequently selected.
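The external cross-validation loop described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' procedure: the detector is a one-dimensional k-nearest-neighbour classifier, the tuned parameter is k, the inner validation is 5-fold CV, and the data are synthetic. All function names are hypothetical.

```python
import random


def k_folds(indices, k):
    """Split a list of indices into k nearly equal folds."""
    return [indices[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def error_rate(train, test, k):
    """Fraction of test points misclassified by k-NN fitted on train."""
    return sum(knn_predict(train, x, k) != y for x, y in test) / len(test)


def split(data, held):
    """Return (training part, held-out part) for one fold."""
    held_set = set(held)
    train = [d for i, d in enumerate(data) if i not in held_set]
    return train, [data[i] for i in held]


def cv_error(data, k, n_folds):
    """Ordinary cross-validated error for one fixed parameter setting."""
    folds = k_folds(list(range(len(data))), n_folds)
    errors = [error_rate(*split(data, held), k) for held in folds]
    return sum(errors) / len(errors)


def select_k(data, candidates, n_folds):
    """Internal validation: choose the k with the lowest inner-CV error."""
    return min(candidates, key=lambda k: cv_error(data, k, n_folds))


def external_cv(data, candidates, outer_folds=10, inner_folds=5):
    """External loop: parameter selection never sees the held-out fold."""
    folds = k_folds(list(range(len(data))), outer_folds)
    errors, chosen = [], []
    for held in folds:
        train, test = split(data, held)
        k = select_k(train, candidates, inner_folds)  # tuned on train only
        chosen.append(k)
        errors.append(error_rate(train, test, k))
    return sum(errors) / len(errors), chosen


# Synthetic two-class data: class 0 centred at 0.0, class 1 centred at 2.0.
rng = random.Random(0)
data = [(rng.gauss(2.0 * label, 1.0), label)
        for label in (0, 1) for _ in range(50)]
rng.shuffle(data)
estimate, chosen = external_cv(data, candidates=[1, 3, 5, 7])
```

As the abstract notes, `estimate` describes the design procedure (tune k, then fit) rather than any single final detector, and `chosen` records which setting each outer iteration selected. Reporting the minimum of `cv_error` over the candidates on the full data set instead would be the optimistically biased estimate the authors warn about.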
|
343
|
DeBolt S, Cook DR, Ford CM. L-tartaric acid synthesis from vitamin C in higher plants. Proc Natl Acad Sci U S A 2006; 103:5608-13. [PMID: 16567629 PMCID: PMC1459401 DOI: 10.1073/pnas.0510864103] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.4] [Indexed: 11/18/2022] Open
Abstract
The biosynthetic pathway of L-tartaric acid, the form most commonly encountered in nature, and its catabolic ties to vitamin C, remain a challenge to plant scientists. Vitamin C and L-tartaric acid are plant-derived metabolites with intrinsic human value. In contrast to most fruits during development, grapes accumulate L-tartaric acid, which remains within the berry throughout ripening. Berry taste and the organoleptic properties and aging potential of wines are intimately linked to levels of L-tartaric acid present in the fruit, and those added during vinification. Elucidation of the reactions relating L-tartaric acid to vitamin C catabolism in the Vitaceae showed that they proceed via the oxidation of L-idonic acid, the proposed rate-limiting step in the pathway. Here we report the use of transcript and metabolite profiling to identify candidate cDNAs from genes expressed at developmental times and in tissues appropriate for L-tartaric acid biosynthesis in grape berries. Enzymological analyses of one candidate confirmed its activity in the proposed rate-limiting step of the direct pathway from vitamin C to tartaric acid in higher plants. Surveying organic acid content in Vitis and related genera, we have identified a non-tartrate-forming species in which this gene is deleted. This species accumulates more than three times the vitamin C levels of comparably ripe berries of tartrate-accumulating species, suggesting that modulation of tartaric acid biosynthesis may provide a rational basis for the production of grapes rich in vitamin C.
Affiliation(s)
- Seth DeBolt
- School of Agriculture, Food, and Wine, University of Adelaide, Adelaide, SA 5005, Australia
- Department of Plant Pathology, University of California, Davis, CA 95616-8680
- Cooperative Research Centre for Viticulture, P.O. Box 145, Glen Osmond, SA 5064, Australia
- Douglas R. Cook
- Department of Plant Pathology, University of California, Davis, CA 95616-8680
- Christopher M. Ford
- School of Agriculture, Food, and Wine, University of Adelaide, Adelaide, SA 5005, Australia
- Cooperative Research Centre for Viticulture, P.O. Box 145, Glen Osmond, SA 5064, Australia
- To whom correspondence should be addressed. E-mail:
|
344
|
Cui J, Han LY, Li H, Ung CY, Tang ZQ, Zheng CJ, Cao ZW, Chen YZ. Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties. Mol Immunol 2006; 44:514-20. [PMID: 16563508 DOI: 10.1016/j.molimm.2006.02.010] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Received: 11/07/2005] [Revised: 02/06/2006] [Accepted: 02/14/2006] [Indexed: 11/21/2022]
Abstract
BACKGROUND: Computational methods have been developed for predicting allergen proteins from sequence segments that show identity, homology, or motif match to a known allergen. These methods achieve good prediction accuracies, but are less effective for novel proteins with no similarity to any known allergen. METHODS: This work tests the feasibility of using a statistical learning method, support vector machines (SVM), for predicting such proteins. The prediction system is trained and tested by using 1005 allergen proteins from the Allergome database and 22,469 non-allergen proteins from 7871 Pfam families. RESULTS: Testing results by an independent set of 229 allergen and 6717 non-allergen proteins from 7871 Pfam families show that 93.0% and 99.9% of these are correctly predicted, which are comparable to the best results of other methods. Of the 18 novel allergen proteins non-homologous to any other proteins in the Swissprot database, 88.9% are correctly predicted. A further screening of 168,128 proteins in the Swissprot database finds that 2.9% of the proteins are predicted as allergen proteins, which is consistent with the estimated numbers from motif-based methods. CONCLUSIONS: Our study suggests that SVM is a potentially useful method for predicting allergen proteins and it has certain capability for predicting novel allergen proteins. Our software can be accessed at .
Affiliation(s)
- Juan Cui
- Bioinformatics and Drug Design Group, Department of Pharmacy and Computational Science, National University of Singapore, Blk SoC 1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore
|
345
|
Lin HH, Han LY, Zhang HL, Zheng CJ, Xie B, Chen YZ. Prediction of the functional class of lipid binding proteins from sequence-derived properties irrespective of sequence similarity. J Lipid Res 2006; 47:824-31. [PMID: 16443826 DOI: 10.1194/jlr.m500530-jlr200] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 11/20/2022] Open
Abstract
Lipid binding proteins play important roles in signaling, regulation, membrane trafficking, immune response, lipid metabolism, and transport. Because of their functional and sequence diversity, it is desirable to explore additional methods for predicting lipid binding proteins irrespective of sequence similarity. This work explores the use of support vector machines (SVMs) as such a method. SVM prediction systems are developed using 14,776 lipid binding and 133,441 nonlipid binding proteins and are evaluated by an independent set of 6,768 lipid binding and 64,761 nonlipid binding proteins. The computed prediction accuracy is 78.9, 79.5, 82.2, 79.5, 84.4, 76.6, 90.6, 79.0, and 89.9% for lipid degradation, lipid metabolism, lipid synthesis, lipid transport, lipid binding, lipopolysaccharide biosynthesis, lipoprotein, lipoyl, and all lipid binding proteins, respectively. The accuracy for the nonmember proteins of each class is 99.9, 99.2, 99.6, 99.8, 99.9, 99.8, 98.5, 99.9, and 97.0%, respectively. Comparable accuracies are obtained when homologous proteins are considered as one, or by using a different SVM kernel function. Our method predicts 86.8% of the 76 lipid binding proteins nonhomologous to any protein in the Swiss-Prot database and 89.0% of the 73 known lipid binding domains as lipid binding. These findings suggest the usefulness of SVMs for facilitating the prediction of lipid binding proteins. Our software can be accessed at the SVMProt server (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi).
Affiliation(s)
- H H Lin
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Singapore 117543
|
346
|
Cui J, Han LY, Cai CZ, Zheng CJ, Ji ZL, Chen YZ. Prediction of functional class of novel bacterial proteins without the use of sequence similarity by a statistical learning method. J Mol Microbiol Biotechnol 2006; 9:86-100. [PMID: 16319498 DOI: 10.1159/000088839] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022] Open
Abstract
A substantial percentage of the putative protein-encoding open reading frames (ORFs) in bacterial genomes have no homolog of known function, and their function cannot be confidently assigned on the basis of sequence similarity. Methods not based on sequence similarity are needed and are being developed. One method, SVMProt (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi), predicts protein functional family irrespective of sequence similarity (Nucleic Acids Res. 2003;31:3692-3697). While it has been tested on a large number of proteins, its capability for non-homologous proteins has so far been evaluated for a relatively small number of proteins, and additional tests are needed to more fully assess SVMProt. In this work, 90 novel bacterial proteins (non-homologous to known proteins) are used to evaluate the capability of SVMProt. These proteins are such that none of their homologs are in the Swiss-Prot database, their functions are not clearly described in the literature, and they themselves and their homologs are not included in the training sets of SVMProt. They represent proteins whose function cannot be confidently predicted by sequence similarity methods at present. The predicted functional class of 76.7% of these proteins shows various levels of consistency with the literature-described function, compared to the overall accuracy of 87% for the SVMProt functional class assignment of 34,582 proteins that have at least one homolog of known function. Our study suggests that SVMProt is capable of assigning functional class for novel bacterial proteins at a level not much lower than that of sequence alignment methods for homologous proteins.
Affiliation(s)
- J Cui
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Singapore
|
347
|
Lin HH, Han LY, Cai CZ, Ji ZL, Chen YZ. Prediction of transporter family from protein sequence by support vector machine approach. Proteins 2005; 62:218-31. [PMID: 16287089 DOI: 10.1002/prot.20605] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Indexed: 11/07/2022]
Abstract
Transporters play key roles in cellular transport and metabolic processes, and in facilitating drug delivery and excretion. These proteins are classified into families based on the transporter classification (TC) system. Determination of the TC family of transporters facilitates the study of their cellular and pharmacological functions. Methods for predicting TC family without sequence alignments or clustering are particularly useful for studying novel transporters whose function cannot be determined by sequence similarity. This work explores the use of a machine learning method, support vector machines (SVMs), for predicting the family of transporters from their sequence without the use of sequence similarity. A total of 10,636 transporters in 13 TC subclasses, 1914 transporters in eight TC families, and 168,341 nontransporter proteins are used to train and test the SVM prediction system. Testing results by using a separate set of 4351 transporters and 83,151 nontransporter proteins show that the overall accuracy for predicting members of these TC subclasses and families is 83.4% and 88.0%, respectively, and that of nonmembers is 99.3% and 96.6%, respectively. The accuracies for predicting members and nonmembers of individual TC subclasses are in the range of 70.7-96.1% and 97.6-99.9%, respectively, and those of individual TC families are in the range of 60.6-97.1% and 91.5-99.4%, respectively. A further test by using 26,139 transmembrane proteins outside each of the 13 TC subclasses shows that 90.4-99.6% of these are correctly predicted. Our study suggests that the SVM is potentially useful for facilitating functional study of transporters irrespective of sequence similarity.
Affiliation(s)
- H H Lin
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Singapore
|
348
|
Han LY, Zheng CJ, Lin HH, Cui J, Li H, Zhang HL, Tang ZQ, Chen YZ. Prediction of functional class of novel plant proteins by a statistical learning method. The New Phytologist 2005; 168:109-21. [PMID: 16159326 DOI: 10.1111/j.1469-8137.2005.01482.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/04/2023]
Abstract
In plant genomes, the function of a substantial percentage of the putative protein-coding open reading frames (ORFs) is unknown. These ORFs have no significant sequence similarity to known proteins, which complicates the task of functional study of these proteins. Efforts are being made to explore methods that are complementary to, or may be used in combination with, sequence alignment and clustering methods. A web-based protein functional class prediction software, SVMProt, has shown some capability for predicting functional class of distantly related proteins. Here the usefulness of SVMProt for functional study of novel plant proteins is evaluated. To test SVMProt, 49 plant proteins (without a sequence homolog in the Swiss-Prot protein database, not in the SVMProt training set, and with functional indications provided in the literature) were selected from a comprehensive search of MEDLINE abstracts and Swiss-Prot databases in 1999-2004. These represent unique proteins the function of which, at present, cannot be confidently predicted by sequence alignment and clustering methods. The predicted functional class of 31 proteins was consistent, and that of four other proteins was weakly consistent, with published functions. Overall, the functional class of 71.4% of these proteins was consistent, or weakly consistent, with functional indications described in the literature. SVMProt shows a certain level of ability to provide useful hints about the functions of novel plant proteins with no similarity to known proteins.
Affiliation(s)
- L Y Han
- Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
|
349
|
Solan Z, Horn D, Ruppin E, Edelman S. Unsupervised learning of natural languages. Proc Natl Acad Sci U S A 2005; 102:11629-34. [PMID: 16087885 PMCID: PMC1187953 DOI: 10.1073/pnas.0409746102] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.3] [Indexed: 11/18/2022]
Abstract
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The ADIOS (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.
Affiliation(s)
- Zach Solan
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978, Israel
|
350
|
Han L, Cai C, Ji Z, Chen Y. Prediction of functional class of novel viral proteins by a statistical learning method irrespective of sequence similarity. Virology 2005; 331:136-43. [PMID: 15582660 PMCID: PMC7111859 DOI: 10.1016/j.virol.2004.10.020] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 07/17/2004] [Revised: 09/15/2004] [Accepted: 10/09/2004] [Indexed: 11/19/2022]
Abstract
The function of a substantial percentage of the putative protein-coding open reading frames (ORFs) in viral genomes is unknown. As their sequence is not similar to that of proteins of known function, the function of these ORFs cannot be assigned on the basis of sequence similarity. Methods that complement, or can be used in combination with, sequence similarity-based approaches are being explored. The web-based software SVMProt (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi) to some extent assigns protein functional family irrespective of sequence similarity and has been found to be useful for studying distantly related proteins [Cai, C.Z., Han, L.Y., Ji, Z.L., Chen, X., Chen, Y.Z., 2003. SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res. 31(13): 3692–3697]. Here 25 novel viral proteins are selected to test the capability of SVMProt for functional family assignment of viral proteins whose function cannot be confidently predicted by sequence similarity methods at present. These proteins have no sequence homolog in the Swiss-Prot database, have their precise function provided in the literature, and are not included in the training sets of SVMProt. The predicted functional classes of 72% of these proteins match the literature-described function, compared to the overall accuracy of 87% for SVMProt functional class assignment of 34,582 proteins. This suggests that SVMProt is to some extent capable of functional class assignment irrespective of sequence similarity and is potentially useful for facilitating functional study of novel viral proteins.
Affiliation(s)
- L.Y. Han
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Block SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore
- C.Z. Cai
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Block SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore
- Department of Applied Physics, Chongqing University, Chongqing 400044, PR China
- Z.L. Ji
- Department of Biology, School of Life Sciences, Xiamen University, Xiamen 361000, FuJian Province, PR China
- Y.Z. Chen
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Block SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore
- Corresponding author. Fax: +65 6774 6756.
|