Reference Citation Analysis

For:	Sim J, Kim SY, Lee J. PPRODO: prediction of protein domain boundaries using neural networks. Proteins 2006;59:627-32. [PMID: 15789433 DOI: 10.1002/prot.20442] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Indexed: 11/06/2022]

Cited by Other Article(s)

Mahmud S, Guo Z, Quadir F, Liu J, Cheng J. Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps. BMC Bioinformatics 2022;23:283. [PMID: 35854211 PMCID: PMC9295499 DOI: 10.1186/s12859-022-04829-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 07/08/2022] [Indexed: 01/25/2023]

Mulnaes D, Golchin P, Koenig F, Gohlke H. TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning. J Chem Theory Comput 2021;17:4599-4613. [PMID: 34161735 DOI: 10.1021/acs.jctc.1c00129] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]

Wang Y, Zhang H, Zhong H, Xue Z. Protein domain identification methods and online resources. Comput Struct Biotechnol J 2021;19:1145-1153. [PMID: 33680357 PMCID: PMC7895673 DOI: 10.1016/j.csbj.2021.01.041] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/08/2020] [Revised: 01/25/2021] [Accepted: 01/26/2021] [Indexed: 01/03/2023]

Shi Q, Chen W, Huang S, Jin F, Dong Y, Wang Y, Xue Z. DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network. Bioinformatics 2020;35:5128-5136. [PMID: 31197306 DOI: 10.1093/bioinformatics/btz464] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/18/2019] [Revised: 05/07/2019] [Accepted: 06/05/2019] [Indexed: 11/13/2022]

Hong SH, Joo K, Lee J. ConDo: protein domain boundary prediction using coevolutionary information. Bioinformatics 2020;35:2411-2417. [PMID: 30500873 DOI: 10.1093/bioinformatics/bty973] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/16/2018] [Revised: 11/15/2018] [Accepted: 11/29/2018] [Indexed: 11/13/2022]

Chen G, Chen J, Liu H, Chen S, Zhang Y, Li P, Thierry-Mieg D, Thierry-Mieg J, Mattes W, Ning B, Shi T. Comprehensive Identification and Characterization of Human Secretome Based on Integrative Proteomic and Transcriptomic Data. Front Cell Dev Biol 2019;7:299. [PMID: 31824949 PMCID: PMC6881247 DOI: 10.3389/fcell.2019.00299] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/27/2019] [Accepted: 11/07/2019] [Indexed: 12/25/2022]

Affiliation(s)

Geng Chen The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
Jiwei Chen The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
Huanlong Liu The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
Shuangguan Chen The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
Yang Zhang The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
Peng Li The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
Danielle Thierry-Mieg National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
Jean Thierry-Mieg National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
William Mattes National Center for Toxicological Research, Food and Drug Administration, Jefferson City, AR, United States
Baitang Ning National Center for Toxicological Research, Food and Drug Administration, Jefferson City, AR, United States
Tieliu Shi The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China

Wang Y, Wang J, Li R, Shi Q, Xue Z, Zhang Y. ThreaDomEx: a unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly. Nucleic Acids Res 2019;45:W400-W407. [PMID: 28498994 PMCID: PMC5793814 DOI: 10.1093/nar/gkx410] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/03/2017] [Accepted: 04/28/2017] [Indexed: 12/21/2022]

Bioinformatics and Translation Elongation. Bioinformatics and the Cell 2018:197-238. [PMCID: PMC7121122 DOI: 10.1007/978-3-319-90684-3_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/11/2023]

Richa T, Ide S, Suzuki R, Ebina T, Kuroda Y. Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers. J Comput Aided Mol Des 2016;31:237-244. [PMID: 28028736 DOI: 10.1007/s10822-016-9999-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2016] [Accepted: 12/10/2016] [Indexed: 10/20/2022]

Chatterjee P, Basu S, Zubek J, Kundu M, Nasipuri M, Plewczynski D. PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach. J Mol Model 2016;22:72. [PMID: 26969678 PMCID: PMC4788683 DOI: 10.1007/s00894-016-2933-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/30/2015] [Accepted: 02/17/2016] [Indexed: 01/04/2023]

Mabrouk M, Werner T, Schneider M, Putz I, Brock O. Analysis of free modeling predictions by RBO aleph in CASP11. Proteins 2015;84 Suppl 1:87-104. [PMID: 26492194 DOI: 10.1002/prot.24950] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/01/2015] [Revised: 09/28/2015] [Accepted: 10/19/2015] [Indexed: 12/15/2022]

Xue Z, Jang R, Govindarajoo B, Huang Y, Wang Y. Extending Protein Domain Boundary Predictors to Detect Discontinuous Domains. PLoS One 2015;10:e0141541. [PMID: 26502173 PMCID: PMC4621036 DOI: 10.1371/journal.pone.0141541] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/03/2015] [Accepted: 10/10/2015] [Indexed: 11/18/2022]

Mabrouk M, Putz I, Werner T, Schneider M, Neeb M, Bartels P, Brock O. RBO Aleph: leveraging novel information sources for protein structure prediction. Nucleic Acids Res 2015;43:W343-8. [PMID: 25897112 PMCID: PMC4489312 DOI: 10.1093/nar/gkv357] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/24/2015] [Accepted: 04/03/2015] [Indexed: 02/02/2023]

Shatnawi M, Zaki N. Inter-domain linker prediction using amino acid compositional index. Comput Biol Chem 2015;55:23-30. [DOI: 10.1016/j.compbiolchem.2015.01.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/26/2014] [Revised: 01/22/2015] [Accepted: 01/22/2015] [Indexed: 10/24/2022]

PDP-RF: Protein Domain Boundary Prediction Using Random Forest Classifier. Lecture Notes in Computer Science 2015. [DOI: 10.1007/978-3-319-19941-2_42] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]

Shatnawi M, Zaki N, Yoo PD. Protein inter-domain linker prediction using Random Forest and amino acid physiochemical properties. BMC Bioinformatics 2014;15 Suppl 16:S8. [PMID: 25521329 PMCID: PMC4290662 DOI: 10.1186/1471-2105-15-s16-s8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]

H-DROP: an SVM based helical domain linker predictor trained with features optimized by combining random forest and stepwise selection. J Comput Aided Mol Des 2014;28:831-9. [PMID: 24965847 DOI: 10.1007/s10822-014-9763-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/23/2014] [Accepted: 06/09/2014] [Indexed: 10/25/2022]

Xue Z, Xu D, Wang Y, Zhang Y. ThreaDom: extracting protein domain boundary information from multiple threading alignments. Bioinformatics 2013;29:i247-56. [PMID: 23812990 PMCID: PMC3694664 DOI: 10.1093/bioinformatics/btt209] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 11/15/2022]

Abstract

Motivation: Protein domains are subunits that can fold and evolve independently. Identification of domain boundary locations is often the first step in protein folding and function annotations. Most of the current methods deduce domain boundaries by sequence-based analysis, which has low accuracy. There is no efficient method for predicting discontinuous domains that consist of segments from separated sequence regions. As template-based methods are most efficient for protein 3D structure modeling, combining multiple threading alignment information should increase the accuracy and reliability of computational domain predictions.

Results: We developed a new protein domain predictor, ThreaDom, which deduces domain boundary locations based on multiple threading alignments. The core of the method development is the derivation of a domain conservation score that combines information from template domain structures and terminal and internal alignment gaps. Tested on 630 non-redundant sequences, without using homologous templates, ThreaDom generates correct single- and multi-domain classifications in 81% of cases, where 78% have the domain linker assigned within ±20 residues. In a second test on 486 proteins with discontinuous domains, ThreaDom achieves an average precision of 84% and a recall of 65% in domain boundary prediction. Finally, ThreaDom was examined on 56 targets from CASP8 and had a domain overlap rate of 73, 87 and 85% with the target for Free Modeling, Hard multiple-domain and discontinuous domain proteins, respectively, which are significantly higher than those of most domain predictors in CASP8. Similar results were achieved on the targets from the more recent CASP9 and CASP10 experiments.

Availability: http://zhanglab.ccmb.med.umich.edu/ThreaDom/.

Contact: zhng@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

PPM-Dom: A novel method for domain position prediction. Comput Biol Chem 2013;47:8-15. [DOI: 10.1016/j.compbiolchem.2013.06.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/07/2013] [Revised: 06/05/2013] [Accepted: 06/05/2013] [Indexed: 02/05/2023]

Zhang XY, Lu LJ, Song Q, Yang QQ, Li DP, Sun JM, Li TH, Cong PS. DomHR: accurately identifying domain boundaries in proteins using a hinge region strategy. PLoS One 2013;8:e60559. [PMID: 23593247 PMCID: PMC3623903 DOI: 10.1371/journal.pone.0060559] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/03/2012] [Accepted: 02/27/2013] [Indexed: 11/18/2022]

Xia X. Position weight matrix, Gibbs sampler, and the associated significance tests in motif characterization and prediction. Scientifica 2012;2012:917540. [PMID: 24278755 PMCID: PMC3820676 DOI: 10.6064/2012/917540] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 08/22/2012] [Accepted: 10/11/2012] [Indexed: 05/31/2023]

Cheng J, Li J, Wang Z, Eickholt J, Deng X. The MULTICOM toolbox for protein structure prediction. BMC Bioinformatics 2012;13:65. [PMID: 22545707 PMCID: PMC3495398 DOI: 10.1186/1471-2105-13-65] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 01/20/2012] [Accepted: 04/30/2012] [Indexed: 12/31/2022]

Abstract

Background

As genome sequencing is becoming routine in biomedical research, the total number of protein sequences is increasing exponentially, recently reaching over 108 million. However, only a tiny portion of these proteins (i.e. ~75,000 or < 0.07%) have solved tertiary structures determined by experimental techniques. The gap between protein sequence and structure continues to enlarge rapidly as the throughput of genome sequencing techniques is much higher than that of protein structure determination techniques. Computational software tools for predicting protein structure and structural features from protein sequences are crucial to make use of this vast repository of protein resources.

Results

To meet the need, we have developed a comprehensive MULTICOM toolbox consisting of a set of protein structure and structural feature prediction tools. These tools include secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, fold recognition, multiple template combination and alignment, template-based tertiary structure modeling, protein model quality assessment, and mutation stability prediction.

Conclusions

These tools have been rigorously tested by many users over the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9) from 2006 to 2010, achieving state-of-the-art or near state-of-the-art performance. To facilitate bioinformatics research and technological development in the field, we have made the MULTICOM toolbox freely available as web services and/or software packages for academic use and scientific research. It is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.

Eickholt J, Deng X, Cheng J. DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning. BMC Bioinformatics 2011;12:43. [PMID: 21284866 PMCID: PMC3036623 DOI: 10.1186/1471-2105-12-43] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 10/15/2010] [Accepted: 02/01/2011] [Indexed: 11/17/2022]

Ebina T, Toh H, Kuroda Y. DROP: an SVM domain linker predictor trained with optimal features selected by random forest. Bioinformatics 2010;27:487-94. [PMID: 21169376 DOI: 10.1093/bioinformatics/btq700] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 11/13/2022]

Yoo PD, Shwen Ho Y, Ng J, Charleston M, Saksena NK, Yang P, Zomaya AY. Hierarchical kernel mixture models for the prediction of AIDS disease progression using HIV structural gp120 profiles. BMC Genomics 2010;11 Suppl 4:S22. [PMID: 21143806 PMCID: PMC3005921 DOI: 10.1186/1471-2164-11-s4-s22] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022]

Gorbalenya AE, Lieutaud P, Harris MR, Coutard B, Canard B, Kleywegt GJ, Kravchenko AA, Samborskiy DV, Sidorov IA, Leontovich AM, Jones TA. Practical application of bioinformatics by the multidisciplinary VIZIER consortium. Antiviral Res 2010;87:95-110. [PMID: 20153379 PMCID: PMC7172516 DOI: 10.1016/j.antiviral.2010.02.005] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 12/22/2009] [Revised: 02/03/2010] [Accepted: 02/04/2010] [Indexed: 01/03/2023]

Affiliation(s)

Alexander E. Gorbalenya Molecular Virology Laboratory, Department of Medical Microbiology, Center for Infectious Diseases, Leiden University Medical Center, P.O. Box 9600, E4-P, 2300 RC Leiden, The Netherlands; A.N. Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow 119899, Russia
Philippe Lieutaud Laboratoire Architecture et Fonction des Macromolécules Biologiques, UMR 6098, AFMB-CNRS-ESIL, Case 925, 163 Avenue de Luminy, 13288 Marseille, France
Mark R. Harris Department of Cell and Molecular Biology, Uppsala University, Biomedical Center, Box 596, SE-751 24 Uppsala, Sweden
Bruno Coutard Laboratoire Architecture et Fonction des Macromolécules Biologiques, UMR 6098, AFMB-CNRS-ESIL, Case 925, 163 Avenue de Luminy, 13288 Marseille, France
Bruno Canard Laboratoire Architecture et Fonction des Macromolécules Biologiques, UMR 6098, AFMB-CNRS-ESIL, Case 925, 163 Avenue de Luminy, 13288 Marseille, France
Gerard J. Kleywegt Department of Cell and Molecular Biology, Uppsala University, Biomedical Center, Box 596, SE-751 24 Uppsala, Sweden
Alexander A. Kravchenko A.N. Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow 119899, Russia
Dmitry V. Samborskiy Molecular Virology Laboratory, Department of Medical Microbiology, Center for Infectious Diseases, Leiden University Medical Center, P.O. Box 9600, E4-P, 2300 RC Leiden, The Netherlands
Igor A. Sidorov Molecular Virology Laboratory, Department of Medical Microbiology, Center for Infectious Diseases, Leiden University Medical Center, P.O. Box 9600, E4-P, 2300 RC Leiden, The Netherlands
Andrey M. Leontovich A.N. Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow 119899, Russia
T. Alwyn Jones Department of Cell and Molecular Biology, Uppsala University, Biomedical Center, Box 596, SE-751 24 Uppsala, Sweden

Chen P, Liu C, Burge L, Li J, Mohammad M, Southerland W, Gloster C, Wang B. DomSVR: domain boundary prediction with support vector regression from sequence information alone. Amino Acids 2010;39:713-26. [PMID: 20165918 DOI: 10.1007/s00726-010-0506-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 09/23/2009] [Accepted: 01/25/2010] [Indexed: 11/24/2022]

Yoo PD, Zhou BB, Zomaya AY. A modular kernel approach for integrative analysis of protein domain boundaries. BMC Genomics 2009;10 Suppl 3:S21. [PMID: 19958485 PMCID: PMC2788374 DOI: 10.1186/1471-2164-10-s3-s21] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]

Xue B, Faraggi E, Zhou Y. Predicting residue-residue contact maps by a two-layer, integrated neural-network method. Proteins 2009;76:176-83. [PMID: 19137600 DOI: 10.1002/prot.22329] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 11/07/2022]

Walsh I, Martin AJM, Mooney C, Rubagotti E, Vullo A, Pollastri G. Ab initio and homology based prediction of protein domains by recursive neural networks. BMC Bioinformatics 2009;10:195. [PMID: 19558651 PMCID: PMC2711945 DOI: 10.1186/1471-2105-10-195] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 10/17/2008] [Accepted: 06/26/2009] [Indexed: 11/10/2022]

Abstract

Background

Proteins, especially larger ones, are often composed of individual evolutionary units, domains, which have their own function and structural fold. Predicting domains is an important intermediate step in protein analyses, including the prediction of protein structures.

Results

We describe novel systems for the prediction of protein domain boundaries powered by Recursive Neural Networks. The systems rely on a combination of primary sequence and evolutionary information, predictions of structural features such as secondary structure, solvent accessibility and residue contact maps, and structural templates, both annotated for domains (from the SCOP dataset) and unannotated (from the PDB). We gauge the contribution of contact maps, and PDB and SCOP templates independently and for different ranges of template quality. We find that accurately predicted contact maps are informative for the prediction of domain boundaries, while the same is not true for contact maps predicted ab initio. We also find that gap information from PDB templates is informative, but, not surprisingly, less so than SCOP annotations. We test both systems trained on templates of all qualities, and systems trained only on templates of marginal similarity to the query (less than 25% sequence identity). While the first batch of systems produces near perfect predictions in the presence of fair to good templates, the second batch outperforms or matches ab initio predictors down to essentially any level of template quality.

We test all systems in 5-fold cross-validation on a large non-redundant set of multi-domain and single domain proteins. The final predictors are state-of-the-art, with a template-less prediction boundary recall of 50.8% (precision 38.7%) within ± 20 residues and a single domain recall of 80.3% (precision 78.1%). The SCOP-based predictors achieve a boundary recall of 74% (precision 77.1%) again within ± 20 residues, and classify single domain proteins as such in over 85% of cases, when we allow a mix of bad and good quality templates. If we only allow marginal templates (max 25% sequence identity to the query) the scores remain high, with boundary recall and precision of 59% and 66.3%, and 80% of all single domain proteins predicted correctly.
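The ±20-residue criterion quoted above can be illustrated with a minimal Python sketch. It assumes a simple matching rule that is not taken from the abstract: a predicted boundary counts as correct if any true boundary lies within the tolerance, and a true boundary counts as recovered if any prediction lies within the tolerance. The function name, the many-to-one matching, and the example positions are all hypothetical; the authors' exact protocol may differ.

```python
from typing import Sequence, Tuple


def boundary_precision_recall(
    predicted: Sequence[int],
    true: Sequence[int],
    tolerance: int = 20,
) -> Tuple[float, float]:
    """Return (precision, recall) for domain boundary positions.

    Assumed rule (illustrative only): a position is a hit when some
    position in the other list is at most `tolerance` residues away.
    """

    def near(pos: int, others: Sequence[int]) -> bool:
        return any(abs(pos - other) <= tolerance for other in others)

    correct_predictions = sum(near(p, true) for p in predicted)
    recovered_boundaries = sum(near(t, predicted) for t in true)

    precision = correct_predictions / len(predicted) if predicted else 0.0
    recall = recovered_boundaries / len(true) if true else 0.0
    return precision, recall


# Hypothetical protein: three true boundaries, two predicted ones.
precision, recall = boundary_precision_recall([95, 240], [100, 180, 300])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this rule, only the prediction at residue 95 falls within 20 residues of a true boundary (residue 100), so one of two predictions is correct and one of three true boundaries is recovered.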

Conclusion

The systems presented here may prove useful in large-scale annotation of protein domains in proteins of unknown structure. The methods are available as public web servers at the address: and we plan on running them on a multi-genomic scale and make the results public in the near future.

Ebina T, Toh H, Kuroda Y. Loop-length-dependent SVM prediction of domain linkers for high-throughput structural proteomics. Biopolymers 2009;92:1-8. [PMID: 18844295 DOI: 10.1002/bip.21105] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/11/2022]

Bondugula R, Lee MS, Wallqvist A. FIEFDom: a transparent domain boundary recognition system using a fuzzy mean operator. Nucleic Acids Res 2008;37:452-62. [PMID: 19056827 PMCID: PMC2632928 DOI: 10.1093/nar/gkn944] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]

Wu Y, Dousis AD, Chen M, Li J, Ma J. OPUS-Dom: applying the folding-based method VECFOLD to determine protein domain boundaries. J Mol Biol 2008;385:1314-29. [PMID: 19026662 DOI: 10.1016/j.jmb.2008.10.093] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 08/27/2008] [Revised: 10/29/2008] [Accepted: 10/31/2008] [Indexed: 10/21/2022]

Yoo PD, Sikder AR, Taheri J, Zhou BB, Zomaya AY. DomNet: Protein Domain Boundary Prediction Using Enhanced General Regression Network and New Profiles. IEEE Trans Nanobioscience 2008;7:172-81. [DOI: 10.1109/tnb.2008.2000747] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/09/2022]

Yoo PD, Sikder AR, Zhou BB, Zomaya AY. Improved general regression network for protein domain boundary prediction. BMC Bioinformatics 2008;9 Suppl 1:S12. [PMID: 18315843 PMCID: PMC2259413 DOI: 10.1186/1471-2105-9-s1-s12] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]

Manjasetty BA, Turnbull AP, Panjikar S, Büssow K, Chance MR. Automated technologies and novel techniques to accelerate protein crystallography for structural genomics. Proteomics 2008;8:612-25. [PMID: 18210369 DOI: 10.1002/pmic.200700687] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 11/05/2022]

Ye L, Liu T, Wu Z, Zhou R. Sequence-based protein domain boundary prediction using BP neural network with various property profiles. Proteins 2008;71:300-7. [PMID: 17932915 DOI: 10.1002/prot.21745] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]

Tress M, Cheng J, Baldi P, Joo K, Lee J, Seo JH, Lee J, Baker D, Chivian D, Kim D, Ezkurdia I. Assessment of predictions submitted for the CASP7 domain prediction category. Proteins 2008;69 Suppl 8:137-51. [PMID: 17680686 DOI: 10.1002/prot.21675] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 11/07/2022]

Gimona M. Protein Linguistics and the Modular Code of the Cytoskeleton. Biosemiotics 2008:189-206. [DOI: 10.1007/978-1-4020-6340-4_8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]

Zhou H, Xue B, Zhou Y. DDOMAIN: Dividing structures into domains using a normalized domain-domain interaction profile. Protein Sci 2007;16:947-55. [PMID: 17456745 PMCID: PMC2206635 DOI: 10.1110/ps.062597307] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 10/23/2022]

Abstract

Dividing protein structures into domains has proven useful for more accurate structural and functional characterization of proteins. Here, we develop a method, called DDOMAIN, that divides structure into DOMAINs using a normalized contact-based domain-domain interaction profile. Results of DDOMAIN are compared to AUTHORS annotations (domain definitions given by the authors who solved the protein structures), as well as to popular SCOP and CATH annotations by human experts and automatic programs. DDOMAIN's automatic annotations are most consistent with the AUTHORS annotations (90% agreement in number of domains and 88% agreement in both number of domains and at least 85% overlap in domain assignment of residues) if its three adjustable parameters are trained by the AUTHORS annotations. By comparison, the agreement is 83% (81% with at least 85% overlap criterion) between SCOP-trained DDOMAIN and SCOP annotations and 77% (73%) between CATH-trained DDOMAIN and CATH annotations. The agreement between DDOMAIN and AUTHORS annotations goes beyond single-domain proteins (97%, 82%, and 56% for single-, two-, and three-domain proteins, respectively). For an "easy" data set of proteins whose CATH and SCOP annotations agree with each other in number of domains, the agreement is 90% (89%) between "easy-set"-trained DDOMAIN and CATH/SCOP annotations. The consistency between SCOP-trained DDOMAIN and SCOP annotations is superior to that of two other recently developed, SCOP-trained, automatic methods, PDP (protein domain parser) and DomainParser 2. We also tested a simple consensus method made of PDP, DomainParser 2, and DDOMAIN and a different version of DDOMAIN based on a more sophisticated statistical energy function. The DDOMAIN server and its executable are available in the services section on http://sparks.informatics.iupui.edu.

Cheng J. DOMAC: an accurate, hybrid protein domain prediction server. Nucleic Acids Res 2007;35:W354-6. [PMID: 17553833 PMCID: PMC1933197 DOI: 10.1093/nar/gkm390] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]

Sikder AR, Zomaya AY. Improving the performance of DomainDiscovery of protein domain boundary assignment using inter-domain linker index. BMC Bioinformatics 2006;7 Suppl 5:S6. [PMID: 17254311 PMCID: PMC1764483 DOI: 10.1186/1471-2105-7-s5-s6] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]

Joshi RR, Samant VV. Bayesian data mining of protein domains gives an efficient predictive algorithm and new insight. J Mol Model 2006;13:275-82. [PMID: 17028865 DOI: 10.1007/s00894-006-0141-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/31/2006] [Accepted: 07/28/2006] [Indexed: 11/29/2022]

Dong Q, Wang X, Lin L. Novel knowledge-based mean force potential at the profile level. BMC Bioinformatics 2006;7:324. [PMID: 16803615 PMCID: PMC1534065 DOI: 10.1186/1471-2105-7-324] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 03/19/2006] [Accepted: 06/27/2006] [Indexed: 11/10/2022] Open

Abstract

Background

The development and testing of functions for the modeling of protein energetics is an important part of current research aimed at understanding protein structure and function. Knowledge-based mean force potentials are derived from statistical analyses of interacting groups in experimentally determined protein structures. Current knowledge-based mean force potentials are developed at the atom or amino acid level, and the evolutionary information contained in sequence profiles has not been investigated. Based on these observations, a class of novel knowledge-based mean force potentials at the profile level is presented, which uses the evolutionary information of profiles to develop more powerful statistical potentials.

Results

The frequency profiles are calculated directly from the multiple sequence alignments output by PSI-BLAST and converted into binary profiles with a probability threshold. As a result, the protein sequences are represented as sequences of binary profiles rather than sequences of amino acids. Similar to the knowledge-based potentials at the residue level, a class of novel potentials at the profile level is introduced. We develop four types of profile-level statistical potentials: distance-dependent, contact, Φ/Ψ dihedral angle, and accessible surface statistical potentials. These potentials are first evaluated by fold assessment between the correct and incorrect models generated by comparative modeling from our own and other groups. They are then used to recognize the native structures from well-constructed decoy sets. Experimental results show that all the knowledge-based mean force potentials at the profile level outperform those at the residue level. Significant improvements are obtained for the distance-dependent and accessible surface potentials (5–6%). The contact and Φ/Ψ dihedral angle potentials show only a slight improvement (1–2%). Decoy set evaluation results show that the distance-dependent profile-level potentials even outperform other atom-level potentials. We also demonstrate that profile-level statistical potentials can improve the performance of threading.

Conclusion

The knowledge-based mean force potentials at the profile level can provide better discriminatory ability than those at the residue level, so they will be useful for protein structure prediction and model refinement.
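
The conversion described in the Results above, from a frequency profile to a binary profile via a probability threshold, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the threshold value of 0.3, the strict `>` comparison, and the choice to skip gap characters are all hypothetical, since the abstract gives none of these details, and the toy alignment stands in for real PSI-BLAST output.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def frequency_profile(alignment):
    """Per-column amino acid frequencies of an aligned set of sequences.

    ASSUMPTION: gaps and non-standard symbols are skipped when counting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(residues)
        profile.append(
            [residues.count(aa) / total if total else 0.0 for aa in AMINO_ACIDS]
        )
    return profile


def binary_profile(profile, threshold=0.3):
    """One 20-character bit string per column: '1' where frequency > threshold.

    ASSUMPTION: the threshold value and the strict comparison are
    illustrative choices, not values taken from the paper.
    """
    return [
        "".join("1" if freq > threshold else "0" for freq in column)
        for column in profile
    ]


# Toy alignment of four sequences, two columns (made-up data).
toy_alignment = ["AC", "AD", "GC", "A-"]
for bits in binary_profile(frequency_profile(toy_alignment)):
    print(bits)
```

Each sequence position is then represented by a bit string rather than a single amino acid letter, which is the representation the abstract says the profile-level potentials are built on.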

Miyazaki S, Kuroda Y, Yokoyama S. Identification of putative domain linkers by a neural network - application to a large sequence database. BMC Bioinformatics 2006;7:323. [PMID: 16800897 PMCID: PMC1538634 DOI: 10.1186/1471-2105-7-323] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 02/24/2006] [Accepted: 06/27/2006] [Indexed: 11/10/2022] Open

Tai CH, Lee WJ, Vincent JJ, Lee B. Evaluation of domain prediction in CASP6. Proteins 2006;61 Suppl 7:183-192. [PMID: 16187361 DOI: 10.1002/prot.20736] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022]

Joshi RR, Samant VV. Fast prediction of protein domain boundaries using conserved local patterns. J Mol Model 2006;12:943-52. [PMID: 16649034 DOI: 10.1007/s00894-006-0116-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/30/2005] [Accepted: 03/09/2006] [Indexed: 10/24/2022]

Abstract

We have found certain conserved motifs and secondary structural patterns present in the vicinity of interior domain boundary points (dbps) by a data-driven approach without any a priori constraint on the type and number of such features, and without any requirement of sequence homology. We have used these motifs and patterns to rerank the solutions obtained by the well-known domain guess by size (DGS) algorithm. We predict, overall, five solutions. The overall (i.e., top five) predictions of our method [domain boundary prediction using conserved patterns (DPCP)] improve on the average accuracy of the top five solutions of DGS, from 71.74 to 82.88 % for two-continuous-domain proteins and from 21.38 to 80.56 % for two-discontinuous-domain proteins. Considering only the top solution, the gains in accuracy are from 0 to 72.74 % for two-continuous-domain proteins with chain lengths up to 300 residues, and from 0 to 62.85 % for those with up to 400 residues. In the case of discontinuous domains, top_min solutions (the minimum number of solutions required for predicting all dbps of a protein) of DPCP improve the average accuracy of DGS prediction from 12.5 to 76.3 % in proteins with chain lengths up to 300 residues, and from 13.33 to 70.84 % for proteins with up to 400 residues. In our validation experiments, the performance of DPCP was also found to be superior to that of domain identification from secondary structure element alignment (DomSSEA), the best method reported so far for efficient prediction of domain boundaries using predicted secondary structure. The average accuracies of the topmost solution of DomSSEA are 61 and 52 % for proteins with up to 300 and 400 residues, respectively, in the case of continuous domains; the corresponding accuracies for the discontinuous case are 28 and 21 %.

Dong Q, Wang X, Lin L, Xu Z. Domain boundary prediction based on profile domain linker propensity index. Comput Biol Chem 2006;30:127-33. [PMID: 16531120 DOI: 10.1016/j.compbiolchem.2006.01.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 10/23/2005] [Revised: 12/29/2005] [Accepted: 01/08/2006] [Indexed: 11/19/2022]

Hondoh T, Kato A, Yokoyama S, Kuroda Y. Computer-aided NMR assay for detecting natively folded structural domains. Protein Sci 2006;15:871-83. [PMID: 16522794 PMCID: PMC2242495 DOI: 10.1110/ps.051880406] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 10/24/2022]

Gimona M. Protein linguistics — a grammar for modular protein assembly? Nat Rev Mol Cell Biol 2006;7:68-73. [PMID: 16493414 DOI: 10.1038/nrm1785] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Indexed: 12/11/2022]