Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cheng J, Baldi P. A machine learning information retrieval approach to protein fold recognition. Bioinformatics 2006;22:1456-63. [PMID: 16547073 DOI: 10.1093/bioinformatics/btl102] [Citation(s) in RCA: 156] [Impact Index Per Article: 8.7] [Indexed: 11/13/2022]

Cited by Other Article(s)

ProFold: Protein Fold Classification with Additional Structural Features and a Novel Ensemble Classifier. BioMed Research International 2016;2016:6802832. [PMID: 27660761 PMCID: PMC5021882 DOI: 10.1155/2016/6802832] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 06/01/2016] [Revised: 07/15/2016] [Accepted: 08/07/2016] [Indexed: 11/17/2022]

Zhang L, Wang H, Yan L, Su L, Xu D. OMPcontact: An Outer Membrane Protein Inter-Barrel Residue Contact Prediction Method. J Comput Biol 2016;24:217-228. [PMID: 27513917 DOI: 10.1089/cmb.2015.0236] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]

Cui X, Lu Z, Wang S, Jing-Yan Wang J, Gao X. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics 2016;32:i332-i340. [PMID: 27307635 PMCID: PMC4908355 DOI: 10.1093/bioinformatics/btw271] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 11/16/2022]

Abstract

MOTIVATION

Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information.

METHOD

We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration.

RESULTS

We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

AVAILABILITY AND IMPLEMENTATION

Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx

CONTACT

xin.gao@kaust.edu.sa

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Xu J, Zhang J. Impact of structure space continuity on protein fold classification. Sci Rep 2016;6:23263. [PMID: 27006112 PMCID: PMC4804218 DOI: 10.1038/srep23263] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/24/2015] [Accepted: 03/03/2016] [Indexed: 11/09/2022]

Improving Protein Fold Recognition by Deep Learning Networks. Sci Rep 2015;5:17573. [PMID: 26634993 PMCID: PMC4669437 DOI: 10.1038/srep17573] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 02/05/2015] [Accepted: 11/02/2015] [Indexed: 12/31/2022]

Cabezas-Cruz A, Valdés JJ, Lancelot J, Pierce RJ. Fast evolutionary rates associated with functional loss in class I glucose transporters of Schistosoma mansoni. BMC Genomics 2015;16:980. [PMID: 26584526 PMCID: PMC4653847 DOI: 10.1186/s12864-015-2144-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/13/2015] [Accepted: 10/26/2015] [Indexed: 11/24/2022]

Abstract

Background

The trematode parasite, Schistosoma mansoni, has evolved to switch from oxidative phosphorylation to glycolysis in the presence of glucose immediately after invading the human host. This metabolic switch is dependent on extracellular glucose concentration. Four glucose transporters are encoded in the genome of S. mansoni; however, only two were shown to facilitate glucose diffusion.

Results

By modeling the phase of human host infection, we showed that transporter transcript expression profiles of recently transformed schistosomula have two opposing responses to increased glucose concentrations. Concurring with the transcription profiles, our phylogenetic analyses revealed that S. mansoni glucose transporters belong to two separate clusters, one associated with class I glucose transporters from vertebrates and insects, and the other specific to parasitic Platyhelminthes. To study the evolutionary paths of both groups and their functional implications, we determined evolutionary rates, relative divergence times, genomic organization and performed structural analyses with the protein sequences. We finally used the modelled structures of the S. mansoni glucose transporters to biophysically (i) analyze the dynamics of key residues during glucose binding, (ii) test glucose stability within the active site, and (iii) demonstrate glucose diffusion. The two S. mansoni Platyhelminthes-specific glucose transporters, which seem to be younger than the other two, exhibit slower rates of molecular evolution, are encoded by intron-poor genes, and transport glucose. Interestingly, our molecular dynamic analyses suggest that S. mansoni class I glucose transporters are not able to transport glucose.

Conclusions

The glucose transporter family in S. mansoni exhibits different evolutionary histories. Our results suggested that S. mansoni class I glucose transporters lost their capacity to transport glucose and that this function evolved independently in the Platyhelminthes-specific glucose transporters. Finally, taking into account the differences in the dynamics of glucose transport of the Platyhelminthes-specific transporters of S. mansoni compared to that of humans, we conclude that S. mansoni glucose transporters may be targets for rationally designed drugs against schistosomiasis.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-2144-6) contains supplementary material, which is available to authorized users.

Lyons J, Dehzangi A, Heffernan R, Yang Y, Zhou Y, Sharma A, Paliwal K. Advancing the Accuracy of Protein Fold Recognition by Utilizing Profiles From Hidden Markov Models. IEEE Trans Nanobioscience 2015. [DOI: 10.1109/tnb.2015.2457906] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 11/10/2022]

Pinheiro-Silva R, Borges L, Coelho LP, Cabezas-Cruz A, Valdés JJ, do Rosário V, de la Fuente J, Domingos A. Gene expression changes in the salivary glands of Anopheles coluzzii elicited by Plasmodium berghei infection. Parasit Vectors 2015;8:485. [PMID: 26395987 PMCID: PMC4580310 DOI: 10.1186/s13071-015-1079-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/02/2015] [Accepted: 09/09/2015] [Indexed: 11/10/2022]

Abstract

Background

Malaria is a devastating infectious disease caused by Plasmodium parasites transmitted through the bites of infected Anopheles mosquitoes. Salivary glands are the only mosquito tissue invaded by Plasmodium sporozoites, being a key stage for the effective parasite transmission, making the study of Anopheles sialome highly relevant.

Methods

RNA-sequencing was used to compare differential gene expression in salivary glands of uninfected and Plasmodium berghei-infected Anopheles coluzzii mosquitoes. RNA-seq results were validated by quantitative RT-PCR. The transmembrane glucose transporter gene AGAP007752 was selected for functional analysis by RNA interference. The effect of gene silencing on infection level was evaluated. The putative function and tertiary structure of the protein was assessed.

Results

RNA-seq data showed that 2588 genes were differentially expressed in mosquito salivary glands in response to P. berghei infection, of which 1578 were upregulated and 1010 downregulated. Metabolism, Immunity, Replication/Transcription/Translation, Proteolysis and Transport were the mosquito gene functional classes most affected by parasite infection. Endopeptidase coding genes were the most abundant within the differentially expressed genes in infected salivary glands (P < 0.001). Based on its putative function and expression level, the transmembrane glucose transporter gene, AGAP007752, was selected for functional analysis by RNA interference. The results demonstrated that the number of sporozoites was 44.3 % lower in mosquitoes fed on infected mice after AGAP007752 gene knockdown when compared to control (P < 0.01).

Conclusions

Our hypothesis is that the protein encoded by the gene AGAP007752 may play a role in An. coluzzii salivary gland infection by the Plasmodium parasite, working as a sporozoite receptor and/or promoting a favorable environment for the capacity of sporozoites.

Electronic supplementary material

The online version of this article (doi:10.1186/s13071-015-1079-8) contains supplementary material, which is available to authorized users.

Iqbal S, Mishra A, Hoque MT. Improved prediction of accessible surface area results in efficient energy function application. J Theor Biol 2015;380:380-91. [DOI: 10.1016/j.jtbi.2015.06.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 04/10/2015] [Revised: 05/15/2015] [Accepted: 06/02/2015] [Indexed: 01/16/2023]

Li C, Lin X, Hui C, Lam KM, Zhang S. Computer-Aided Diagnosis for Distinguishing Pancreatic Mucinous Cystic Neoplasms From Serous Oligocystic Adenomas in Spectral CT Images. Technol Cancer Res Treat 2014;15:44-54. [PMID: 25520271 DOI: 10.1177/1533034614563013] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/24/2014] [Accepted: 11/10/2014] [Indexed: 12/15/2022]

Abstract

OBJECTIVE

This preliminary study aims to verify the effectiveness of the additional information provided by spectral computed tomography (CT) with the proposed computer-aided diagnosis (CAD) scheme to differentiate pancreatic serous oligocystic adenomas (SOAs) from mucinous cystic neoplasms of pancreas cystic lesions.

MATERIALS AND METHODS

This study was conducted from January 2010 to October 2013. Twenty-three patients (5 men and 18 women; mean age, 43.96 years old) with SOA and 19 patients (3 men and 16 women; mean age, 41.74 years old) with MCN were included in this retrospective study. Two types of features were collected by dual-energy spectral CT imaging as follows: conventional and additional quantitative spectral CT features. Classification results of the CAD scheme were compared using the conventional features and full feature data set. Important features were selected using support vector machine classification method combined with feature-selection technique. The optimal cutoff values of selected features were determined through receiver-operating characteristic curve analyses.

RESULTS

Combining conventional features with additional spectral CT features improved the overall accuracy from 88.37% to 93.02%. The selected features of the proposed CAD scheme were tumor size, contour, location, and low-energy CT values (43 keV). Iodine-water basis material pair densities in both arterial phase (AP) and portal venous phase (PP) were important factors for differential diagnosis of SOA and MCN. The optimal cutoff values of long axis, short axis, 40 keV monochromatic CT value in AP, iodine (water) density in AP, 43 keV monochromatic CT value in PP, and iodine (water) density in PP were 3.4 mm, 3.1 mm, 35.7 HU, 0.32533 mg/mL, 39.4 HU, and 0.348 mg/mL, respectively.

CONCLUSION

The combination of conventional features and additional information provided by dual-energy spectral CT shows a high accuracy in the CAD scheme. The quantitative information of spectral CT may prove useful in the diagnosis and classification of SOAs and MCNs with machine learning algorithms.

Eickholt J, Wang Z. PCP-ML: protein characterization package for machine learning. BMC Res Notes 2014;7:810. [PMID: 25406415 PMCID: PMC4246511 DOI: 10.1186/1756-0500-7-810] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/27/2014] [Accepted: 10/31/2014] [Indexed: 11/10/2022]

Jo T, Cheng J. Improving protein fold recognition by random forest. BMC Bioinformatics 2014;15 Suppl 11:S14. [PMID: 25350499 PMCID: PMC4251042 DOI: 10.1186/1471-2105-15-s11-s14] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Indexed: 11/25/2022]

Reconstructing protein structures by neural network pairwise interaction fields and iterative decoy set construction. Biomolecules 2014;4:160-80. [PMID: 24970210 PMCID: PMC4030983 DOI: 10.3390/biom4010160] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/24/2013] [Revised: 01/22/2014] [Accepted: 01/30/2014] [Indexed: 11/17/2022]

Abbasi E, Ghatee M, Shiri M. FRAN and RBF-PSO as two components of a hyper framework to recognize protein folds. Comput Biol Med 2013;43:1182-91. [DOI: 10.1016/j.compbiomed.2013.05.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 04/10/2012] [Revised: 05/21/2013] [Accepted: 05/22/2013] [Indexed: 10/26/2022]

Kuksa PP. Biological sequence classification with multivariate string kernels. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013;10:1201-1210. [PMID: 24384708 DOI: 10.1109/tcbb.2013.15] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]

Wang H, He Z, Zhang C, Zhang L, Xu D. Transmembrane protein alignment and fold recognition based on predicted topology. PLoS One 2013;8:e69744. [PMID: 23894534 PMCID: PMC3716705 DOI: 10.1371/journal.pone.0069744] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 11/29/2012] [Accepted: 06/15/2013] [Indexed: 11/18/2022]

Guilloux A, Caudron B, Jestin JL. A method to predict edge strands in beta-sheets from protein sequences. Comput Struct Biotechnol J 2013;7:e201305001. [PMID: 24688737 PMCID: PMC3962219 DOI: 10.5936/csbj.201305001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/22/2013] [Revised: 05/27/2013] [Accepted: 05/30/2013] [Indexed: 12/15/2022]

Wang R, Gao X. A Two-Layer Learning Architecture for Multi-Class Protein Folds Classification. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

Kaushik S, Mutt E, Chellappan A, Sankaran S, Srinivasan N, Sowdhamini R. Improved detection of remote homologues using cascade PSI-BLAST: influence of neighbouring protein families on sequence coverage. PLoS One 2013;8:e56449. [PMID: 23437136 PMCID: PMC3577913 DOI: 10.1371/journal.pone.0056449] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/16/2012] [Accepted: 01/13/2013] [Indexed: 12/31/2022]

Abstract

Background

Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed manner of identifying remote homologues effectively. In this study, examination of serine proteases of prolyl oligopeptidase, rhomboid and subtilisin protein families was carried out using plant serine proteases as queries from two genomes including A. thaliana and O. sativa and 13 other families of unrelated folds to identify the distant homologues which could not be obtained using PSI-BLAST.

Methodology/Principal Findings

We have proposed to start with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach like Cascade PSI-BLAST. We found that classical sequence based approaches, like PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied to an enriched sequence database of homologous domains and we obtained an overall average coverage of 88% at family, 77% at superfamily or fold level along with specificity of ∼100% and Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in SCOP database. Further investigation with statistical tests, like jackknifing, helped us to better understand the influence of neighbouring protein families.

Conclusions/Significance

Our study suggests that employment of multiple queries of a family for the Cascade PSI-BLAST searches is useful for predicting distant relationships effectively even at superfamily level. We have proposed a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that prior selection of sequences as query and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the ‘bridging’ role of related families.

SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS. BMC Bioinformatics 2013;14 Suppl 1:S9. [PMID: 23369106 PMCID: PMC3548692 DOI: 10.1186/1471-2105-14-s1-s9] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]

Abstract

Background

The capability of correlating specific genotypes with human diseases is a complex issue in spite of all the advantages arising from high-throughput technologies, such as Genome Wide Association Studies (GWAS). New tools for genetic variants interpretation and for Single Nucleotide Polymorphisms (SNPs) prioritization are actually needed. Given a list of the most relevant SNPs statistically associated with a specific pathology as a result of a genotype study, a critical issue is the identification of genes that are effectively related to the disease by re-scoring the importance of the identified genetic variations. Vice versa, given a list of genes, it can be of great importance to predict which SNPs can be involved in the onset of a particular disease, in order to focus the research on their effects.

Results

We propose a new bioinformatics approach to support biological data mining in the analysis and interpretation of SNPs associated with pathologies. This system can be employed to design custom genotyping chips for disease-oriented studies and to re-score GWAS results. The proposed method relies (1) on the data integration of public resources using a gene-centric database design, (2) on the evaluation of a set of static biomolecular annotations, defined as features, and (3) on the SNP scoring function, which computes SNP scores using parameters and weights set by users. We employed a machine learning classifier to set default feature weights and an ontological annotation layer to enable the enrichment of the input gene set. We implemented our method as a web tool called SNPranker 2.0 (http://www.itb.cnr.it/snpranker), improving our first published release of this system. A user-friendly interface allows the user to input a list of genes, SNPs or a biological process, and to customize the feature set with relative weights. As a result, SNPranker 2.0 returns a list of SNPs, localized within input and ontologically enriched genes, combined with their prioritization scores.

Conclusions

Different databases and resources are already available for SNPs annotation, but they do not prioritize or re-score SNPs relying on a-priori biomolecular knowledge. SNPranker 2.0 attempts to fill this gap through a user-friendly integrated web resource. End users, such as researchers in medical genetics and epidemiology, may find in SNPranker 2.0 a new tool for data mining and interpretation able to support SNPs analysis. Possible scenarios are GWAS data re-scoring, SNPs selection for custom genotyping arrays and SNPs/diseases association studies.

eThread: a highly optimized machine learning-based approach to meta-threading and the modeling of protein tertiary structures. PLoS One 2012. [PMID: 23185577 PMCID: PMC3503980 DOI: 10.1371/journal.pone.0050200] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 01/17/2023]

3D profile-based approach to proteome-wide discovery of novel human chemokines. PLoS One 2012;7:e36151. [PMID: 22586462 PMCID: PMC3346806 DOI: 10.1371/journal.pone.0036151] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/01/2012] [Accepted: 03/27/2012] [Indexed: 12/29/2022]

Abstract

Chemokines are small secreted proteins with important roles in immune responses. They consist of a conserved three-dimensional (3D) structure, so-called IL8-like chemokine fold, which is supported by disulfide bridges characteristic of this protein family. Sequence- and profile-based computational methods have been proficient in discovering novel chemokines by making use of their sequence-conserved cysteine patterns. However, it has been recently shown that some chemokines escaped annotation by these methods due to low sequence similarity to known chemokines and to different arrangement of cysteines in sequence and in 3D. Innovative methods overcoming the limitations of current techniques may allow the discovery of new remote homologs in the still functionally uncharacterized fraction of the human genome. We report a novel computational approach for proteome-wide identification of remote homologs of the chemokine family that uses fold recognition techniques in combination with a scaffold-based automatic mapping of disulfide bonds to define a 3D profile of the chemokine protein family. By applying our methodology to all currently uncharacterized human protein sequences, we have discovered two novel proteins that, without having significant sequence similarity to known chemokines or characteristic cysteine patterns, show strong structural resemblance to known anti-HIV chemokines. Detailed computational analysis and experimental structural investigations based on mass spectrometry and circular dichroism support our structural predictions and highlight several other chemokine-like features. The results obtained support their functional annotation as putative novel chemokines and encourage further experimental characterization. The identification of remote homologs of human chemokines may provide new insights into the molecular mechanisms causing pathologies such as cancer or AIDS, and may contribute to the development of novel treatments. Besides, the genome-wide applicability of our methodology based on 3D protein family profiles may open up new possibilities for improving and accelerating protein function annotation processes.

Pires DEV, de Melo-Minardi RC, dos Santos MA, da Silveira CH, Santoro MM, Meira W. Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns. BMC Genomics 2011;12 Suppl 4:S12. [PMID: 22369665 PMCID: PMC3287581 DOI: 10.1186/1471-2164-12-s4-s12] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 12/27/2022]

Abstract

Background

The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method: Cutoff Scanning Matrix (CSM). CSM generates feature vectors that represent distance patterns between protein residues. These feature vectors are then used as evidence for classification. Singular value decomposition is used as a preprocessing step to reduce dimensionality and noise. The aspect of protein function considered in the present work is enzyme activity. A series of experiments was performed on datasets based on Enzyme Commission (EC) numbers and mechanistically different enzyme superfamilies as well as other datasets derived from SCOP release 1.75.
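As a rough illustration of the kind of distance-pattern feature vector described above, the following minimal sketch counts, for each cutoff in a scanned range, how many residue pairs lie within that distance. It is an assumption-laden toy, not the authors' implementation: the function name `csm_features`, the use of one 3D point per residue, and the default cutoff range and step are all hypothetical choices made for this example, and the singular value decomposition step mentioned in the abstract is omitted.

```python
import math
from itertools import combinations


def csm_features(coords, d_min=0.0, d_max=30.0, step=1.0):
    """Return one count per cutoff: the number of point pairs within that distance.

    coords: iterable of (x, y, z) tuples, assumed here to be one point per residue.
    d_min, d_max, step: hypothetical scanning range; not taken from the paper.
    """
    # All pairwise Euclidean distances between the given points.
    pair_dists = [math.dist(a, b) for a, b in combinations(coords, 2)]
    # Build the list of cutoffs by index to avoid accumulating float error.
    n_steps = int(round((d_max - d_min) / step))
    cutoffs = [d_min + i * step for i in range(n_steps + 1)]
    # One feature per cutoff: how many pairs are at least that close.
    return [sum(1 for d in pair_dists if d <= c) for c in cutoffs]
```

For three points forming a 3-4-5 right triangle, `csm_features([(0, 0, 0), (3, 0, 0), (0, 4, 0)], 0.0, 6.0, 1.0)` returns `[0, 0, 0, 1, 2, 3, 3]`: the counts are cumulative, so the vector is non-decreasing in the cutoff.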

Results

CSM was able to achieve a precision of up to 99% after SVD preprocessing for a database derived from manually curated protein superfamilies and up to 95% for a dataset of the 950 most-populated EC numbers. Moreover, we conducted experiments to verify our ability to assign SCOP class, superfamily, family and fold to protein domains. An experiment using the whole set of domains found in last SCOP version yielded high levels of precision and recall (up to 95%). Finally, we compared our structural classification results with those in the literature to place this work into context. Our method was capable of significantly improving the recall of a previous study while preserving a compatible precision level.

Conclusions

We showed that the patterns derived from CSMs could effectively be used to predict protein function and thus help with automatic function annotation. We also demonstrated that our method is effective in structural classification tasks. These facts reinforce the idea that the pattern of inter-residue distances is an important component of family structural signatures. Furthermore, singular value decomposition provided a consistent increase in precision and recall, which makes it an important preprocessing step when dealing with noisy data.

PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences. J Struct Funct Genomics 2011;12:181-9. [DOI: 10.1007/s10969-011-9119-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/15/2011] [Accepted: 11/24/2011] [Indexed: 10/14/2022]

Chen Y, Chen F, Yang JY, Yang MQ. Ensemble voting system for multiclass protein fold recognition. Int J Pattern Recogn 2011. [DOI: 10.1142/s0218001408006454] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]

Yang Y, Faraggi E, Zhao H, Zhou Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011;27:2076-82. [PMID: 21666270 DOI: 10.1093/bioinformatics/btr350] [Citation(s) in RCA: 245] [Impact Index Per Article: 18.8] [Indexed: 11/13/2022]

Ashkenazy H, Unger R, Kliger Y. Hidden conformations in protein structures. Bioinformatics 2011;27:1941-7. [DOI: 10.1093/bioinformatics/btr292] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]

Yang JY, Chen X. Improving taxonomy-based protein fold recognition by using global and local features. Proteins 2011;79:2053-64. [DOI: 10.1002/prot.23025] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 12/12/2010] [Revised: 02/05/2011] [Accepted: 03/03/2011] [Indexed: 11/05/2022]

Pandit SB, Skolnick J. TASSER_low-zsc: an approach to improve structure prediction using low z-score-ranked templates. Proteins 2011;78:2769-80. [PMID: 20635423 DOI: 10.1002/prot.22791] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/27/2022]

Hu Y, Dong X, Wu A, Cao Y, Tian L, Jiang T. Incorporation of local structural preference potential improves fold recognition. PLoS One 2011;6:e17215. [PMID: 21365008 PMCID: PMC3041821 DOI: 10.1371/journal.pone.0017215] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 11/01/2010] [Accepted: 01/25/2011] [Indexed: 11/19/2022]

Chen H, Kihara D. Effect of using suboptimal alignments in template-based protein structure prediction. Proteins 2011;79:315-34. [PMID: 21058297 PMCID: PMC3058269 DOI: 10.1002/prot.22885] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

Ahmed F, Benedito VA, Zhao PX. Mining Functional Elements in Messenger RNAs: Overview, Challenges, and Perspectives. FRONTIERS IN PLANT SCIENCE 2011;2:84. [PMID: 22639614 PMCID: PMC3355573 DOI: 10.3389/fpls.2011.00084] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/31/2011] [Accepted: 11/03/2011] [Indexed: 05/03/2023]

Zhou H, Skolnick J. Improving threading algorithms for remote homology modeling by combining fragment and template comparisons. Proteins 2010;78:2041-8. [PMID: 20455261 DOI: 10.1002/prot.22717] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]

Kumar A, Cowen L. Recognition of beta-structural motifs using hidden Markov models trained with simulated evolution. Bioinformatics 2010;26:i287-93. [PMID: 20529918 PMCID: PMC2881384 DOI: 10.1093/bioinformatics/btq199] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open

Jain P, Hirst JD. Automatic structure classification of small proteins using random forest. BMC Bioinformatics 2010;11:364. [PMID: 20594334 PMCID: PMC2916923 DOI: 10.1186/1471-2105-11-364] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2010] [Accepted: 07/01/2010] [Indexed: 11/29/2022] Open

Karakaş M, Woetzel N, Meiler J. BCL::contact-low confidence fold recognition hits boost protein contact prediction and de novo structure determination. J Comput Biol 2010;17:153-68. [PMID: 19772383 DOI: 10.1089/cmb.2009.0030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Wang Z, Eickholt J, Cheng J. MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8. Bioinformatics 2010;26:882-8. [PMID: 20150411 PMCID: PMC2844995 DOI: 10.1093/bioinformatics/btq058] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2009] [Revised: 02/02/2010] [Accepted: 02/08/2010] [Indexed: 11/14/2022] Open

Yan RX, Si JN, Wang C, Zhang Z. DescFold: a web server for protein fold recognition. BMC Bioinformatics 2009;10:416. [PMID: 20003426 PMCID: PMC2803855 DOI: 10.1186/1471-2105-10-416] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2009] [Accepted: 12/14/2009] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Machine learning-based methods have been proven to be powerful in developing new fold recognition tools. In our previous work [Zhang, Kochhar and Grigorov (2005) Protein Science, 14: 431-444], a machine learning-based method called DescFold was established by using Support Vector Machines (SVMs) to combine the following four descriptors: a profile-sequence-alignment-based descriptor using Psi-blast e-values and bit scores, a sequence-profile-alignment-based descriptor using Rps-blast e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. In this work, we focus on the improvement of DescFold by incorporating more powerful descriptors and setting up a user-friendly web server.

RESULTS

In seeking more powerful descriptors, the profile-profile alignment score generated by the COMPASS algorithm was first considered as a new descriptor (i.e., PPA). When considering a profile-profile alignment between two proteins in the context of fold recognition, one protein is regarded as a template (i.e., its 3D structure is known). Instead of a sequence profile derived from a Psi-blast search, a structure-seeded profile for the template protein was generated by searching its structural neighbors with the assistance of the TM-align structural alignment algorithm. Moreover, the COMPASS algorithm was used again to derive a profile-structural-profile-alignment-based descriptor (i.e., PSPA). We trained and tested the new DescFold on a total of 1,835 highly diverse proteins extracted from SCOP version 1.73. When the PPA and PSPA descriptors were introduced, the new DescFold improved fold recognition performance substantially. Using the SCOP_1.73_40% dataset as the fold library, the DescFold web server was then constructed on the basis of the trained SVM models. To provide a large-scale test for the new DescFold, a stringent test set of 1,866 proteins was selected from SCOP version 1.75. With the false positive rate controlled below 5%, the new DescFold correctly recognizes structural homologs at the fold level for nearly 46% of the test proteins. We also benchmarked the DescFold method against several well-established fold recognition algorithms on the LiveBench targets and the Lindahl dataset.
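The "false positive rate control" reported above can be illustrated with a small sketch. It assumes each query-template pair has a classifier score and a true/false fold label; the function name `threshold_at_fpr` and the toy data are hypothetical, and the exact evaluation protocol used for DescFold is not given in this abstract.

```python
def threshold_at_fpr(scores, labels, max_fpr=0.05):
    """Return (cutoff, sensitivity) for the lowest score cutoff whose
    false positive rate stays at or below max_fpr.

    scores: classifier outputs, higher means more likely same fold.
    labels: truthy for true fold-level pairs, falsy otherwise.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative pair")

    best = (float("inf"), 0.0)  # no pair accepted yet
    # Lower the cutoff step by step; false positives can only grow.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
        if fp / negatives > max_fpr:
            break
        best = (cutoff, tp / positives)
    return best


# Toy data, not taken from the paper.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
cutoff, sensitivity = threshold_at_fpr(toy_scores, toy_labels, max_fpr=0.05)
```

Fixing the false positive rate first and then reading off the sensitivity is what makes figures such as "nearly 46% at less than 5%" comparable across methods.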

CONCLUSIONS

Intensive benchmarking shows that the new DescFold method performs very competitively with some well-established fold recognition methods, suggesting that it can serve as a useful tool to assist in template-based protein structure prediction. The DescFold server is freely accessible at http://202.112.170.199/DescFold/index.html.

Chen CC, Hwang JK, Yang JM. (PS)2-v2: template-based protein structure prediction server. BMC Bioinformatics 2009;10:366. [PMID: 19878598 PMCID: PMC2775752 DOI: 10.1186/1471-2105-10-366] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2009] [Accepted: 10/31/2009] [Indexed: 03/11/2024] Open

Abstract

Background

Template selection and target-template alignment are critical steps for template-based modeling (TBM) methods. Identifying templates in the twilight zone of 15-25% sequence similarity between targets and templates is still difficult for template-based protein structure prediction. This study presents the (PS)²-v2 server, based on our original server with numerous enhancements and modifications, to improve reliability and applicability.

Results

To detect homologous proteins with remote similarity, the (PS)²-v2 server utilizes the S2A2 matrix, which is a 60 × 60 substitution matrix using the secondary structure propensities of 20 amino acids, and the position-specific sequence profile (PSSM) generated by PSI-BLAST. In addition, our server uses multiple templates and multiple models to build and assess models. Our method was evaluated on the Lindahl benchmark for fold recognition and the ProSup benchmark for sequence alignment. Evaluation results indicated that our method outperforms sequence-profile approaches and has performance comparable to that of structure-based methods on these benchmarks. Finally, we tested our method using the 154 TBM targets of the CASP8 (Critical Assessment of Techniques for Protein Structure Prediction) dataset. Experimental results show that (PS)²-v2 is ranked 6th among 72 servers and is faster than the top five servers, which utilize ab initio methods.
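One way to picture a 60 × 60 matrix over 20 amino acids is to index each residue by its amino acid together with one of three secondary-structure states (20 × 3 = 60). The sketch below only illustrates that indexing; the three-state alphabet `HEC`, the ordering, and the toy matrix values are assumptions, not the published S2A2 matrix.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
SS_STATES = "HEC"  # helix, strand, coil: assumed three-state alphabet


def s2a2_index(aa, ss):
    """Map an (amino acid, secondary structure) pair to a row/column in 0..59."""
    return SS_STATES.index(ss) * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)


def pair_score(matrix, a, b):
    """Look up the substitution score for two (amino acid, state) pairs."""
    return matrix[s2a2_index(*a)][s2a2_index(*b)]


# Toy matrix for illustration only: 1 on the diagonal, 0 elsewhere.
size = len(AMINO_ACIDS) * len(SS_STATES)
toy_matrix = [[1 if i == j else 0 for j in range(size)] for i in range(size)]

same = pair_score(toy_matrix, ("A", "H"), ("A", "H"))
different = pair_score(toy_matrix, ("A", "H"), ("A", "E"))
```

With this layout, the same amino acid in a helix and in a strand occupies different rows, so a substitution score can depend on structural context as well as residue identity.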

Conclusion

Experimental results demonstrate that (PS)²-v2 with the S2A2 matrix is useful for template selection and target-template alignment by blending the amino acid and structural propensities. The multiple-template and multiple-model strategies are able to significantly improve the accuracy of target-template alignments in the twilight zone. We believe that this server is useful in structure prediction and modeling, especially in detecting homologous templates with sequence similarity in the twilight zone.

Wang Z, Tegge AN, Cheng J. Evaluating the absolute quality of a single protein model using structural features and support vector machines. Proteins 2009;75:638-47. [PMID: 19004001 DOI: 10.1002/prot.22275] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Jain P, Hirst JD. Exploring protein structural dissimilarity to facilitate structure classification. BMC STRUCTURAL BIOLOGY 2009;9:60. [PMID: 19765314 PMCID: PMC2754988 DOI: 10.1186/1472-6807-9-60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/27/2009] [Accepted: 09/19/2009] [Indexed: 12/04/2022]

Horst J, Samudrala R. Diversity of protein structures and difficulties in fold recognition: the curious case of protein G. F1000 BIOLOGY REPORTS 2009;1:69. [PMID: 20209018 PMCID: PMC2832337 DOI: 10.3410/b1-69] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Dong Q, Zhou S, Guan J. A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation. Bioinformatics 2009;25:2655-62. [DOI: 10.1093/bioinformatics/btp500] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Zhang H, Zhang T, Chen K, Shen S, Ruan J, Kurgan L. On the relation between residue flexibility and local solvent accessibility in proteins. Proteins 2009;76:617-36. [DOI: 10.1002/prot.22375] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Liu Y, Carbonell J, Gopalakrishnan V, Weigele P. Conditional graphical models for protein structural motif recognition. J Comput Biol 2009;16:639-57. [PMID: 19432536 DOI: 10.1089/cmb.2008.0176] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Deschavanne P, Tufféry P. Enhanced protein fold recognition using a structural alphabet. Proteins 2009;76:129-37. [DOI: 10.1002/prot.22324] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Lee SY, Lee JY, Jung KS, Ryu KH. A 9-state hidden Markov model using protein secondary structure information for protein fold recognition. Comput Biol Med 2009;39:527-34. [DOI: 10.1016/j.compbiomed.2009.03.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2008] [Revised: 01/20/2009] [Accepted: 03/11/2009] [Indexed: 11/30/2022]

Zhou H, Skolnick J. Protein structure prediction by pro-Sp3-TASSER. Biophys J 2009;96:2119-27. [PMID: 19289038 DOI: 10.1016/j.bpj.2008.12.3898] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2008] [Revised: 11/12/2008] [Accepted: 12/03/2008] [Indexed: 12/29/2022] Open

Abstract

An automated protein structure prediction algorithm, pro-sp3-Threading/ASSEmbly/Refinement (TASSER), is described and benchmarked. Structural templates are identified using five different scoring functions derived from the previously developed threading methods PROSPECTOR_3 and SP(3). Top templates identified by each scoring function are combined to derive contact and distance restraints for subsequent model refinement by short TASSER simulations. For Medium/Hard targets (those with moderate to poor quality templates and/or alignments), alternative template alignments are also generated by parametric alignment and the top models selected by TASSER-QA are included in the contact and distance restraint derivation. Then, multiple short TASSER simulations are used to generate an ensemble of full-length models. Subsequently, the top models are selected from the ensemble by TASSER-QA and used to derive TASSER contacts and distance restraints for another round of full TASSER refinement. The final models are selected from both rounds of TASSER simulations by TASSER-QA. We compare pro-sp3-TASSER with our previously developed MetaTASSER method (enhanced with chunk-TASSER for Medium/Hard targets) on a representative test data set of 723 proteins <250 residues in length. For the 348 proteins classified as easy targets (those templates with good alignments and global structure similarity to the target), the cumulative TM-score of the best of top five models by pro-sp3-TASSER shows a 2.1% improvement over MetaTASSER. For the 155/220 medium/hard targets, the improvements in TM-score are 2.8% and 2.2%, respectively. All improvements are statistically significant. More importantly, the number of foldable targets (those having models whose TM-score to native >0.4 in the top five clusters) increases from 472 to 497 for all targets, and the relative increases for medium and hard targets are 10% and 15%, respectively.
A server that implements the above algorithm is available at http://cssb.biology.gatech.edu/skolnick/webservice/pro-sp3-TASSER/. The source code is also available upon request.
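The "foldable" criterion above (TM-score to native > 0.4) can be sketched as follows. The code uses the standard TM-score definition with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8, evaluated for one fixed superposition; the real TM-score maximizes over superpositions, which is omitted here. The function names and the 0.5 Å floor on d0 for short chains are assumptions of this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score, in angstroms."""
    if target_length <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, target_length):
    """TM-score for one given superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the native (target) structure.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def count_foldable(best_tm_scores, cutoff=0.4):
    """Count targets whose best model exceeds the TM-score cutoff."""
    return sum(1 for score in best_tm_scores if score > cutoff)


perfect = tm_score_fixed([0.0] * 100, 100)          # every residue superposed exactly
foldable = count_foldable([0.39, 0.40, 0.41, 0.80])  # toy scores, not from the paper
```

Because each aligned residue contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1 regardless of protein size, which is why a single cutoff such as 0.4 can be applied across all 723 test proteins.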

Gao X, Bu D, Xu J, Li M. Improving consensus contact prediction via server correlation reduction. BMC STRUCTURAL BIOLOGY 2009;9:28. [PMID: 19419562 PMCID: PMC2689239 DOI: 10.1186/1472-6807-9-28] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/25/2008] [Accepted: 05/06/2009] [Indexed: 11/10/2022]

Abstract

Background

Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them.

Results

In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method assuming that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by using maximum likelihood estimation and extracts independent latent servers from them by using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06. These methods demonstrate an average accuracy of 13.0%, 10.8%, 25.8% and 21.2%, respectively.

Conclusion

Reducing server correlation and optimally combining independent latent servers yield a significant improvement over traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction.

Jain P, Garibaldi JM, Hirst JD. Supervised machine learning algorithms for protein structure classification. Comput Biol Chem 2009;33:216-23. [PMID: 19473879 DOI: 10.1016/j.compbiolchem.2009.04.004] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2008] [Revised: 03/25/2009] [Accepted: 04/23/2009] [Indexed: 10/20/2022]