Reference Citation Analysis

For: Su QJ, Lu L, Saxonov S, Brutlag DL. eBLOCKs: enumerating conserved protein blocks to achieve maximal sensitivity and specificity. Nucleic Acids Res 2005;33:D178-82. [PMID: 15608172 PMCID: PMC540014 DOI: 10.1093/nar/gki060] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/17/2022]

Cited by Other Article(s)

Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proc Natl Acad Sci U S A 2017;114:11703-11708. [PMID: 29078314 PMCID: PMC5676897 DOI: 10.1073/pnas.1707642114] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Indexed: 01/13/2023]

Abstract

We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected.

Proteins share similar segments with one another. Such “reused parts”—which have been successfully incorporated into other proteins—are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes”—segments of at least 35 residues of similar sequence and structure—reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.

Kubrycht J, Sigler K, Souček P, Hudeček J. Structures composing protein domains. Biochimie 2013;95:1511-24. [DOI: 10.1016/j.biochi.2013.04.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/22/2013] [Accepted: 04/02/2013] [Indexed: 12/21/2022]

Chua HN, Wong L. Predicting Protein Functions from Protein Interaction Networks. Int J Knowl Discov Bioinform 2012. [DOI: 10.4018/ijkdb.2012100104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/14/2023]

Rorick M. Quantifying protein modularity and evolvability: a comparison of different techniques. Biosystems 2012;110:22-33. [PMID: 22796584 DOI: 10.1016/j.biosystems.2012.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/15/2011] [Revised: 06/20/2012] [Accepted: 06/27/2012] [Indexed: 10/28/2022]

Rorick MM, Wagner GP. Protein structural modularity and robustness are associated with evolvability. Genome Biol Evol 2011;3:456-75. [PMID: 21602570 PMCID: PMC3134980 DOI: 10.1093/gbe/evr046] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 01/21/2023]

Bernardes JS, Carbone A, Zaverucha G. A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models. BMC Bioinformatics 2011;12:83. [PMID: 21429187 PMCID: PMC3078102 DOI: 10.1186/1471-2105-12-83] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/21/2010] [Accepted: 03/23/2011] [Indexed: 11/23/2022]

Nelson KJ, Knutson ST, Soito L, Klomsiri C, Poole LB, Fetrow JS. Analysis of the peroxiredoxin family: using active-site structure and sequence information for global classification and residue analysis. Proteins 2011;79:947-64. [PMID: 21287625 PMCID: PMC3065352 DOI: 10.1002/prot.22936] [Citation(s) in RCA: 136] [Impact Index Per Article: 10.5] [Received: 05/28/2010] [Revised: 10/13/2010] [Accepted: 10/25/2010] [Indexed: 12/25/2022]

Trifonov EN, Frenkel ZM. Evolution of protein modularity. Curr Opin Struct Biol 2009;19:335-40. [PMID: 19386484 DOI: 10.1016/j.sbi.2009.03.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 02/05/2009] [Accepted: 03/16/2009] [Indexed: 10/20/2022]

Rajasekaran S, Balla S, Gradie P, Gryk MR, Kadaveru K, Kundeti V, Maciejewski MW, Mi T, Rubino N, Vyas J, Schiller MR. Minimotif miner 2nd release: a database and web system for motif search. Nucleic Acids Res 2009;37:D185-90. [PMID: 18978024 PMCID: PMC2686579 DOI: 10.1093/nar/gkn865] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 08/15/2008] [Accepted: 10/16/2008] [Indexed: 11/24/2022]

Affiliation(s)

Sanguthevar Rajasekaran, Sudha Balla, Patrick Gradie, Michael R. Gryk, Krishna Kadaveru, Vamsi Kundeti, Mark W. Maciejewski, Tian Mi, Nicholas Rubino, Jay Vyas, Martin R. Schiller: Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06029-2155, Department of Molecular, Microbial, and Structural Biology, Biological System Modeling Group, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT 06030-3305 and Memorial Sloan-Kettering Cancer Center, NY 10021, USA

Liu B, Wang X, Lin L, Dong Q, Wang X. A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis. BMC Bioinformatics 2008;9:510. [PMID: 19046430 PMCID: PMC2613933 DOI: 10.1186/1471-2105-9-510] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.8] [Received: 03/23/2008] [Accepted: 12/01/2008] [Indexed: 11/23/2022]

Abstract

BACKGROUND

Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences.

RESULTS

In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments outputted by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the occurrence times of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods.

CONCLUSION

The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, Top-n-gram is a good building block of the protein sequences and can be widely used in many tasks of the computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the prediction of protein binding sites.
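
The feature construction summarized in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define a Top-n-gram precisely, so the sketch assumes that a Top-n-gram is the ordered string of the n most frequent amino acids in one column of the frequency profile, and that the vocabulary consists of all ordered n-tuples of distinct amino acids. The names `top_n_grams` and `feature_vector` and the profile layout (one dict per sequence position) are hypothetical. PSI-BLAST profile generation, LSA and SVM training are omitted.

```python
from itertools import permutations

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def top_n_grams(profile, n=2):
    """Return one Top-n-gram per profile column.

    `profile` is a list of dicts mapping amino acid -> frequency
    (hypothetical layout; missing amino acids count as 0.0).
    Assumption: a Top-n-gram is the n most frequent amino acids of a
    column, in descending frequency; ties are broken alphabetically.
    """
    grams = []
    for column in profile:
        ranked = sorted(AMINO_ACIDS, key=lambda aa: (-column.get(aa, 0.0), aa))
        grams.append("".join(ranked[:n]))
    return grams


def feature_vector(profile, n=2):
    """Count occurrences of each Top-n-gram in a fixed-dimension vector."""
    # Assumed vocabulary: every ordered n-tuple of distinct amino acids.
    vocabulary = ["".join(p) for p in permutations(AMINO_ACIDS, n)]
    index = {gram: i for i, gram in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for gram in top_n_grams(profile, n):
        vector[index[gram]] += 1
    return vector


if __name__ == "__main__":
    # Toy three-column profile, invented for illustration only.
    toy_profile = [
        {"A": 0.6, "C": 0.3, "D": 0.1},
        {"A": 0.6, "C": 0.3, "D": 0.1},
        {"W": 0.5, "Y": 0.4, "F": 0.1},
    ]
    print(top_n_grams(toy_profile, n=2))
    print(sum(feature_vector(toy_profile, n=2)))
```

Because the vector length depends only on n and not on the sequence length, proteins of different lengths map to vectors of the same dimension, which is what a classifier such as an SVM requires.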

Ben-Hur A, Brutlag D. Sequence Motifs: Highly Predictive Features of Protein Function. Feature Extraction 2008. [DOI: 10.1007/978-3-540-35488-8_32] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 02/11/2023]

Wase NV, Wright PC. Systems biology of cyanobacterial secondary metabolite production and its role in drug discovery. Expert Opin Drug Discov 2008;3:903-29. [DOI: 10.1517/17460441.3.8.903] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 11/05/2022]

Wang H, Segal E, Ben-Hur A, Li QR, Vidal M, Koller D. InSite: a computational method for identifying protein-protein interaction binding sites on a proteome-wide scale. Genome Biol 2008;8:R192. [PMID: 17868464 PMCID: PMC2375030 DOI: 10.1186/gb-2007-8-9-r192] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 03/07/2007] [Revised: 07/25/2007] [Accepted: 09/14/2007] [Indexed: 12/30/2022]

Hsu CM, Chen CY, Liu BJ. MAGIIC-PRO: detecting functional signatures by efficient discovery of long patterns in protein sequences. Nucleic Acids Res 2008;36:1400-6. [PMID: 18314547 PMCID: PMC3143912 DOI: 10.1093/nar/gkm717] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]

Håndstad T, Hestnes AJH, Sætrom P. Motif kernel generated by genetic programming improves remote homology and fold detection. BMC Bioinformatics 2007;8:23. [PMID: 17254344 PMCID: PMC1794419 DOI: 10.1186/1471-2105-8-23] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 07/07/2006] [Accepted: 01/25/2007] [Indexed: 11/10/2022]

Hsu CM, Chen CY, Liu BJ. MAGIIC-PRO: detecting functional signatures by efficient discovery of long patterns in protein sequences. Nucleic Acids Res 2006;34:W356-61. [PMID: 16845025 PMCID: PMC1538832 DOI: 10.1093/nar/gkl309] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]

Ben-Hur A, Noble WS. Choosing negative examples for the prediction of protein-protein interactions. BMC Bioinformatics 2006;7 Suppl 1:S2. [PMID: 16723005 PMCID: PMC1810313 DOI: 10.1186/1471-2105-7-s1-s2] [Citation(s) in RCA: 155] [Impact Index Per Article: 8.6] [Indexed: 11/10/2022]

Gutman R, Berezin C, Wollman R, Rosenberg Y, Ben-Tal N. QuasiMotiFinder: protein annotation by searching for evolutionarily conserved motif-like patterns. Nucleic Acids Res 2005;33:W255-61. [PMID: 15980465 PMCID: PMC1160256 DOI: 10.1093/nar/gki496] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]