Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Liu J, Rost B. Domains, motifs and clusters in the protein universe. Curr Opin Chem Biol 2003;7:5-11. [PMID: 12547420 DOI: 10.1016/s1367-5931(02)00003-0] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Number

Cited by Other Article(s)

Schütze K, Heinzinger M, Steinegger M, Rost B. Nearest neighbor search on embeddings rapidly identifies distant protein relations. FRONTIERS IN BIOINFORMATICS 2022;2:1033775. [PMID: 36466147 PMCID: PMC9714024 DOI: 10.3389/fbinf.2022.1033775] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Accepted: 10/31/2022] [Indexed: 11/29/2023] Open

Marquet C, Heinzinger M, Olenyi T, Dallago C, Erckert K, Bernhofer M, Nechaev D, Rost B. Embeddings from protein language models predict conservation and variant effects. Hum Genet 2022;141:1629-1647. [PMID: 34967936 PMCID: PMC8716573 DOI: 10.1007/s00439-021-02411-y] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2021] [Accepted: 12/06/2021] [Indexed: 12/13/2022]

Abstract

The emergence of SARS-CoV-2 variants stressed the demand for tools allowing to interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation almost as accurately from single sequences as ConSeq using MSAs (two-state Matthews Correlation Coefficient-MCC-for ProtT5 embeddings of 0.596 ± 0.006 vs. 0.608 ± 0.006 for ConSeq). Inputting the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) revealed our approach as competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others, neither consistently nor statistically significantly, independently of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. Our method predicted SAV effects for the entire human proteome (~ 20 k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA , and PredictProtein.

Collapse

Affiliation(s)

Céline Marquet Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany. TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Michael Heinzinger Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Tobias Olenyi Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Christian Dallago Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Kyra Erckert Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Michael Bernhofer Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Dmitrii Nechaev Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Burkhard Rost Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany

Collapse

Littmann M, Heinzinger M, Dallago C, Weissenow K, Rost B. Protein embeddings and deep learning predict binding residues for various ligand classes. Sci Rep 2021;11:23916. [PMID: 34903827 PMCID: PMC8668950 DOI: 10.1038/s41598-021-03431-4] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2021] [Accepted: 12/02/2021] [Indexed: 01/27/2023] Open

Computational and experimental elucidation of Plasmodium falciparum phosphoethanolamine methyltransferase inhibitors: Pivotal drug target. PLoS One 2019;14:e0221032. [PMID: 31437171 PMCID: PMC6705855 DOI: 10.1371/journal.pone.0221032] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2019] [Accepted: 07/29/2019] [Indexed: 11/19/2022] Open

Reeb J, Hecht M, Mahlich Y, Bromberg Y, Rost B. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease. PLoS Comput Biol 2016;12:e1005047. [PMID: 27536940 PMCID: PMC4990455 DOI: 10.1371/journal.pcbi.1005047] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2016] [Accepted: 07/04/2016] [Indexed: 11/19/2022] Open

Abstract

Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains barred. Here, we present new results emphasizing earlier work that suggested some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we put OMIM SAVs into mouse orthologs. Overall, fewer variants were predicted with effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease.

The variations in the genetic sequence between individuals affect the gene-product, i.e. the protein differently. Some variants have no measurable effect (are neutral), while others affect protein function. Some of those effects are so severe they cause so called monogenic Mendelian diseases, i.e. diseases triggered by a single letter change. Some in silico methods predict the molecular impact of sequence variation. However, both experimental and computational analyses struggle to generalize from the effect upon molecular protein function to the effect upon the organism such as a disease. Here, we confirmed that methods predicting molecular effects correctly capture the type of effects causing Mendelian diseases in human and introduced a data set for animal diseases that was also captured by predictions methods. Predicted effects were less when in silico testing human variants in an animal model (here mouse). This is important to know because “mouse models” are common to study human diseases. Overall, we provided some evidence for a link between the molecular level and some type of disease.

Collapse

Ahmad M, Jung LT, Bhuiyan MAA. On fuzzy semantic similarity measure for DNA coding. Comput Biol Med 2015;69:144-51. [PMID: 26773936 DOI: 10.1016/j.compbiomed.2015.12.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2015] [Revised: 12/22/2015] [Accepted: 12/23/2015] [Indexed: 11/28/2022]

Rappoport N, Stern A, Linial N, Linial M. Entropy-driven partitioning of the hierarchical protein space. Bioinformatics 2015;30:i624-30. [PMID: 25161256 PMCID: PMC4147929 DOI: 10.1093/bioinformatics/btu478] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023] Open

Abstract

Motivation: Modern protein sequencing techniques have led to the determination of >50 million protein sequences. ProtoNet is a clustering system that provides a continuous hierarchical agglomerative clustering tree for all proteins. While ProtoNet performs unsupervised classification of all included proteins, finding an optimal level of granularity for the purpose of focusing on protein functional groups remain elusive. Here, we ask whether knowledge-based annotations on protein families can support the automatic unsupervised methods for identifying high-quality protein families. We present a method that yields within the ProtoNet hierarchy an optimal partition of clusters, relative to manual annotation schemes. The method’s principle is to minimize the entropy-derived distance between annotation-based partitions and all available hierarchical partitions. We describe the best front (BF) partition of 2 478 328 proteins from UniRef50. Of 4 929 553 ProtoNet tree clusters, BF based on Pfam annotations contain 26 891 clusters. The high quality of the partition is validated by the close correspondence with the set of clusters that best describe thousands of keywords of Pfam. The BF is shown to be superior to naïve cut in the ProtoNet tree that yields a similar number of clusters. Finally, we used parameters intrinsic to the clustering process to enrich a priori the BF’s clusters. We present the entropy-based method’s benefit in overcoming the unavoidable limitations of nested clusters in ProtoNet. We suggest that this automatic information-based cluster selection can be useful for other large-scale annotation schemes, as well as for systematically testing and comparing putative families derived from alternative clustering methods.

Availability and implementation: A catalog of BF clusters for thousands of Pfam keywords is provided at http://protonet.cs.huji.ac.il/bestFront/

Contact: michall@cc.huji.ac.il

Collapse

Doerr D, Stoye J, Böcker S, Jahn K. Identifying gene clusters by discovering common intervals in indeterminate strings. BMC Genomics 2015;15 Suppl 6:S2. [PMID: 25571793 PMCID: PMC4274641 DOI: 10.1186/1471-2164-15-s6-s2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

Ben-Tal N, Kolodny R. Representation of the Protein Universe using Classifications, Maps, and Networks. Isr J Chem 2014. [DOI: 10.1002/ijch.201400001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

PPM-Dom: A novel method for domain position prediction. Comput Biol Chem 2013;47:8-15. [DOI: 10.1016/j.compbiolchem.2013.06.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2013] [Revised: 06/05/2013] [Accepted: 06/05/2013] [Indexed: 02/05/2023]

Ezkurdia I, Tress ML. Protein structural domains: definition and prediction. CURRENT PROTOCOLS IN PROTEIN SCIENCE 2011;Chapter 2:2.14.1-2.14.16. [PMID: 22045561 DOI: 10.1002/0471140864.ps0214s66] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]

Angadi UB, Venkatesulu M. Structural SCOP superfamily level classification using unsupervised machine learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;9:601-608. [PMID: 21844638 DOI: 10.1109/tcbb.2011.114] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]

Protein disorder--a breakthrough invention of evolution? Curr Opin Struct Biol 2011;21:412-8. [PMID: 21514145 DOI: 10.1016/j.sbi.2011.03.014] [Citation(s) in RCA: 112] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2011] [Revised: 03/29/2011] [Accepted: 03/29/2011] [Indexed: 11/21/2022]

Bannert C, Welfle A, aus dem Spring C, Schomburg D. BrEPS: a flexible and automatic protocol to compute enzyme-specific sequence profiles for functional annotation. BMC Bioinformatics 2010;11:589. [PMID: 21122127 PMCID: PMC3009691 DOI: 10.1186/1471-2105-11-589] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2010] [Accepted: 12/01/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Models for the simulation of metabolic networks require the accurate prediction of enzyme function. Based on a genomic sequence, enzymatic functions of gene products are today mainly predicted by sequence database searching and operon analysis. Other methods can support these techniques: We have developed an automatic method "BrEPS" that creates highly specific sequence patterns for the functional annotation of enzymes.

RESULTS

The enzymes in the UniprotKB are identified and their sequences compared against each other with BLAST. The enzymes are then clustered into a number of trees, where each tree node is associated with a set of EC-numbers. The enzyme sequences in the tree nodes are aligned with ClustalW. The conserved columns of the resulting multiple alignments are used to construct sequence patterns. In the last step, we verify the quality of the patterns by computing their specificity. Patterns with low specificity are omitted and recomputed further down in the tree. The final high-quality patterns can be used for functional annotation. We ran our protocol on a recent Swiss-Prot release and show statistics, as well as a comparison to PRIAM, a probabilistic method that is also specialized on the functional annotation of enzymes. We determine the amount of true positive annotations for five common microorganisms with data from BRENDA and AMENDA serving as standard of truth. BrEPS is almost on par with PRIAM, a fact which we discuss in the context of five manually investigated cases.

CONCLUSIONS

Our protocol computes highly specific sequence patterns that can be used to support the functional annotation of enzymes. The main advantages of our method are that it is automatic and unsupervised, and quite fast once the patterns are evaluated. The results show that BrEPS can be a valuable addition to the reconstruction of metabolic networks.

Collapse

Naamati G, Fromer M, Linial M. Expansion of tandem repeats in sea anemone Nematostella vectensis proteome: A source for gene novelty? BMC Genomics 2009;10:593. [PMID: 20003297 PMCID: PMC2805694 DOI: 10.1186/1471-2164-10-593] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2009] [Accepted: 12/10/2009] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The complete proteome of the starlet sea anemone, Nematostella vectensis, provides insights into gene invention dating back to the Cnidarian-Bilaterian ancestor. With the addition of the complete proteomes of Hydra magnipapillata and Monosiga brevicollis, the investigation of proteins having unique features in early metazoan life has become practical. We focused on the properties and the evolutionary trends of tandem repeat (TR) sequences in Cnidaria proteomes.

RESULTS

We found that 11-16% of N. vectensis proteins contain tandem repeats. Most TRs cover 150 amino acid segments that are comprised of basic units of 5-20 amino acids. In total, the N. Vectensis proteome has about 3300 unique TR-units, but only a small fraction of them are shared with H. magnipapillata, M. brevicollis, or mammalian proteomes. The overall abundance of these TRs stands out relative to that of 14 proteomes representing the diversity among eukaryotes and within the metazoan world. TR-units are characterized by a unique composition of amino acids, with cysteine and histidine being over-represented. Structurally, most TR-segments are associated with coiled and disordered regions. Interestingly, 80% of the TR-segments can be read in more than one open reading frame. For over 100 of them, translation of the alternative frames would result in long proteins. Most domain families that are characterized as repeats in eukaryotes are found in the TR-proteomes from Nematostella and Hydra.

CONCLUSIONS

While most TR-proteins have originated from prediction tools and are still awaiting experimental validations, supportive evidence exists for hundreds of TR-units in Nematostella. The existence of TR-proteins in early metazoan life may have served as a robust mode for novel genes with previously overlooked structural and functional characteristics.

Collapse

Punta M, Love J, Handelman S, Hunt JF, Shapiro L, Hendrickson WA, Rost B. Structural genomics target selection for the New York consortium on membrane protein structure. ACTA ACUST UNITED AC 2009;10:255-68. [PMID: 19859826 PMCID: PMC2780672 DOI: 10.1007/s10969-009-9071-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2009] [Accepted: 09/30/2009] [Indexed: 01/02/2023]

Walsh I, Martin AJM, Mooney C, Rubagotti E, Vullo A, Pollastri G. Ab initio and homology based prediction of protein domains by recursive neural networks. BMC Bioinformatics 2009;10:195. [PMID: 19558651 PMCID: PMC2711945 DOI: 10.1186/1471-2105-10-195] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2008] [Accepted: 06/26/2009] [Indexed: 11/10/2022] Open

Abstract

Background

Proteins, especially larger ones, are often composed of individual evolutionary units, domains, which have their own function and structural fold. Predicting domains is an important intermediate step in protein analyses, including the prediction of protein structures.

Results

We describe novel systems for the prediction of protein domain boundaries powered by Recursive Neural Networks. The systems rely on a combination of primary sequence and evolutionary information, predictions of structural features such as secondary structure, solvent accessibility and residue contact maps, and structural templates, both annotated for domains (from the SCOP dataset) and unannotated (from the PDB). We gauge the contribution of contact maps, and PDB and SCOP templates independently and for different ranges of template quality. We find that accurately predicted contact maps are informative for the prediction of domain boundaries, while the same is not true for contact maps predicted ab initio. We also find that gap information from PDB templates is informative, but, not surprisingly, less than SCOP annotations. We test both systems trained on templates of all qualities, and systems trained only on templates of marginal similarity to the query (less than 25% sequence identity). While the first batch of systems produces near perfect predictions in the presence of fair to good templates, the second batch outperforms or match ab initio predictors down to essentially any level of template quality.

We test all systems in 5-fold cross-validation on a large non-redundant set of multi-domain and single domain proteins. The final predictors are state-of-the-art, with a template-less prediction boundary recall of 50.8% (precision 38.7%) within ± 20 residues and a single domain recall of 80.3% (precision 78.1%). The SCOP-based predictors achieve a boundary recall of 74% (precision 77.1%) again within ± 20 residues, and classify single domain proteins as such in over 85% of cases, when we allow a mix of bad and good quality templates. If we only allow marginal templates (max 25% sequence identity to the query) the scores remain high, with boundary recall and precision of 59% and 66.3%, and 80% of all single domain proteins predicted correctly.

Conclusion

The systems presented here may prove useful in large-scale annotation of protein domains in proteins of unknown structure. The methods are available as public web servers at the address: and we plan on running them on a multi-genomic scale and make the results public in the near future.

Collapse

Kuzniar A, Lin K, He Y, Nijveen H, Pongor S, Leunissen JAM. ProGMap: an integrated annotation resource for protein orthology. Nucleic Acids Res 2009;37:W428-34. [PMID: 19494185 PMCID: PMC2703891 DOI: 10.1093/nar/gkp462] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Fulton DL, Sundararajan S, Badis G, Hughes TR, Wasserman WW, Roach JC, Sladek R. TFCat: the curated catalog of mouse and human transcription factors. Genome Biol 2009;10:R29. [PMID: 19284633 PMCID: PMC2691000 DOI: 10.1186/gb-2009-10-3-r29] [Citation(s) in RCA: 153] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2008] [Revised: 02/26/2009] [Accepted: 03/12/2009] [Indexed: 11/20/2022] Open

Wang M, Caetano-Anollés G. The evolutionary mechanics of domain organization in proteomes and the rise of modularity in the protein world. Structure 2009;17:66-78. [PMID: 19141283 DOI: 10.1016/j.str.2008.11.008] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2008] [Revised: 10/27/2008] [Accepted: 11/13/2008] [Indexed: 10/21/2022]

Wrzeszczynski KO, Rost B. Cell cycle kinases predicted from conserved biophysical properties. Proteins 2009;74:655-68. [PMID: 18704950 DOI: 10.1002/prot.22181] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Nair R, Liu J, Soong TT, Acton TB, Everett JK, Kouranov A, Fiser A, Godzik A, Jaroszewski L, Orengo C, Montelione GT, Rost B. Structural genomics is the largest contributor of novel structural leverage. ACTA ACUST UNITED AC 2009;10:181-91. [PMID: 19194785 PMCID: PMC2705706 DOI: 10.1007/s10969-008-9055-6] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2008] [Accepted: 12/08/2008] [Indexed: 11/28/2022]

Loewenstein Y, Portugaly E, Fromer M, Linial M. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Bioinformatics 2008;24:i41-9. [PMID: 18586742 PMCID: PMC2718652 DOI: 10.1093/bioinformatics/btn174] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Abstract

MOTIVATION

UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets.

APPLICATION

We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation.

RESULTS

We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families.

AVAILABILITY

A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.

Collapse

Sequence similarity network reveals common ancestry of multidomain proteins. PLoS Comput Biol 2008;4:e1000063. [PMID: 18475320 PMCID: PMC2377100 DOI: 10.1371/journal.pcbi.1000063] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2007] [Accepted: 03/18/2008] [Indexed: 11/25/2022] Open

Abstract

We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain architectures, we show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them. Our study demonstrates the utility of mining network structure for evolutionary information, suggesting this is a fertile approach for investigating evolutionary processes in the post-genomic era.

New genes evolve through the duplication and modification of existing genes. As a result, genes that share common ancestry tend to have similar structure and function. Computational methods that use common ancestry have been extraordinarily successful in inferring function. The practice of discerning evolutionary relationships is stymied, however, by modular sequences made up of two or more domains. When two genes share some domains but not others, it is difficult to distinguish a case of common ancestry from insertion of the same domain into both genes. We present a formal framework to define how multidomain genes are related, and propose a novel method for rapid, robust characterization of evolutionary relationships. In an empirical comparison with the current state of the art, we demonstrate superior performance of our method using a large hand-curated set of sequences known to share common ancestry. The success of our method derives from its unique ability to infer evolutionary history from local topology in the sequence similarity network. This represents a departure from the view that protein family classification must be restricted to families with conserved architecture. By exploiting the structure of the sequence similarity network, our approach surmounts this limitation and opens the door to studies of the role of modularity in protein evolution.

Collapse

Schenk G, Margraf T, Torda AE. Protein sequence and structure alignments within one framework. Algorithms Mol Biol 2008;3:4. [PMID: 18380904 PMCID: PMC2390564 DOI: 10.1186/1748-7188-3-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2008] [Accepted: 04/01/2008] [Indexed: 11/19/2022] Open

Carrière C, Mornon JP, Venien-Bryan C, Boisset N, Callebaut I. Calcineurin B-like domains in the large regulatory α/β subunits of phosphorylase kinase. Proteins 2008;71:1597-606. [DOI: 10.1002/prot.22006] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Tress M, Cheng J, Baldi P, Joo K, Lee J, Seo JH, Lee J, Baker D, Chivian D, Kim D, Ezkurdia I. Assessment of predictions submitted for the CASP7 domain prediction category. Proteins 2008;69 Suppl 8:137-51. [PMID: 17680686 DOI: 10.1002/prot.21675] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Fretwell JF, K. Ismail SM, Cummings JM, Selby TL. Characterization of a randomized FRET library for protease specificity determination. MOLECULAR BIOSYSTEMS 2008;4:862-70. [DOI: 10.1039/b709290c] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Heger A, Korpelainen E, Hupponen T, Mattila K, Ollikainen V, Holm L. PairsDB atlas of protein sequence space. Nucleic Acids Res 2007;36:D276-80. [PMID: 17986464 PMCID: PMC2238971 DOI: 10.1093/nar/gkm879] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Wang M, Yafremava LS, Caetano-Anollés D, Mittenthal JE, Caetano-Anollés G. Reductive evolution of architectural repertoires in proteomes and the birth of the tripartite world. Genes Dev 2007;17:1572-85. [PMID: 17908824 PMCID: PMC2045140 DOI: 10.1101/gr.6454307] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2007] [Accepted: 08/23/2007] [Indexed: 11/25/2022]

Mezei M, Zhou MM. Pspace: a program that assesses protein space. SOURCE CODE FOR BIOLOGY AND MEDICINE 2007;2:6. [PMID: 17956630 PMCID: PMC2231351 DOI: 10.1186/1751-0473-2-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/06/2007] [Accepted: 10/23/2007] [Indexed: 11/10/2022]

Aragues R, Sali A, Bonet J, Marti-Renom MA, Oliva B. Characterization of protein hubs by inferring interacting motifs from protein interactions. PLoS Comput Biol 2007;3:1761-71. [PMID: 17941705 PMCID: PMC1976338 DOI: 10.1371/journal.pcbi.0030178] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2007] [Accepted: 07/27/2007] [Indexed: 12/19/2022] Open

Abstract

The characterization of protein interactions is essential for understanding biological systems. While genome-scale methods are available for identifying interacting proteins, they do not pinpoint the interacting motifs (e.g., a domain, sequence segments, a binding site, or a set of residues). Here, we develop and apply a method for delineating the interacting motifs of hub proteins (i.e., highly connected proteins). The method relies on the observation that proteins with common interaction partners tend to interact with these partners through a common interacting motif. The sole input for the method are binary protein interactions; neither sequence nor structure information is needed. The approach is evaluated by comparing the inferred interacting motifs with domain families defined for 368 proteins in the Structural Classification of Proteins (SCOP). The positive predictive value of the method for detecting proteins with common SCOP families is 75% at sensitivity of 10%. Most of the inferred interacting motifs were significantly associated with sequence patterns, which could be responsible for the common interactions. We find that yeast hubs with multiple interacting motifs are more likely to be essential than hubs with one or two interacting motifs, thus rationalizing the previously observed correlation between essentiality and the number of interacting partners of a protein. We also find that yeast hubs with multiple interacting motifs evolve slower than the average protein, contrary to the hubs with one or two interacting motifs. The proposed method will help us discover unknown interacting motifs and provide biological insights about protein hubs and their roles in interaction networks.

Recent advances in experimental methods have produced a deluge of protein–protein interactions data. However, these methods do not supply information on which specific protein regions are physically in contact during the interactions. Identifying these regions (interfaces) is fundamental for scientific disciplines that require detailed characterizations of protein interactions. In this work, we present a computational method that identifies groups of proteins with similar interfaces. This is achieved by relying on the observation that proteins with common interaction partners tend to interact through similar interfaces. The proposed method retrieves protein interactions from public data repositories and groups proteins that share a sensible number of interacting partners. Proteins within the same group are then labeled with the same “interacting motif” identifier (iMotif). The evaluation performed using known protein domains and structural binding sites suggests that the method is better suited for proteins with multiple interacting partners (hubs). Using yeast data, we show that the cellular essentiality of a gene better correlates with the number of interacting motifs than with the absolute number of interactions.

Collapse

Religa TL, Johnson CM, Vu DM, Brewer SH, Dyer RB, Fersht AR. The helix-turn-helix motif as an ultrafast independently folding domain: the pathway of folding of Engrailed homeodomain. Proc Natl Acad Sci U S A 2007;104:9272-7. [PMID: 17517666 PMCID: PMC1890484 DOI: 10.1073/pnas.0703434104] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space? Malar J 2006;5:110. [PMID: 17112376 PMCID: PMC1665468 DOI: 10.1186/1475-2875-5-110] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2006] [Accepted: 11/17/2006] [Indexed: 11/21/2022] Open

Portugaly E, Linial N, Linial M. EVEREST: a collection of evolutionary conserved protein domains. Nucleic Acids Res 2006;35:D241-6. [PMID: 17099230 PMCID: PMC1669739 DOI: 10.1093/nar/gkl850] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Wang M, Caetano-Anollés G. Global phylogeny determined by the combination of protein domains in proteomes. Mol Biol Evol 2006;23:2444-54. [PMID: 16971695 DOI: 10.1093/molbev/msl117] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Lin K, Zhu L, Zhang DY. An initial strategy for comparing proteins at the domain architecture level. Bioinformatics 2006;22:2081-6. [PMID: 16837531 DOI: 10.1093/bioinformatics/btl366] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Portugaly E, Harel A, Linial N, Linial M. EVEREST: automatic identification and classification of protein domains in all protein sequences. BMC Bioinformatics 2006;7:277. [PMID: 16749920 PMCID: PMC1533870 DOI: 10.1186/1471-2105-7-277] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2006] [Accepted: 06/02/2006] [Indexed: 11/16/2022] Open

Abstract

Background

Proteins are comprised of one or several building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational methodologies for large-scale determination of protein domains and their boundaries. We provide and rigorously evaluate a novel set of domain families that is automatically generated from sequence data. Our domain family identification process, called EVEREST (EVolutionary Ensembles of REcurrent SegmenTs), begins by constructing a library of protein segments that emerge in an all vs. all pairwise sequence comparison. It then proceeds to cluster these segments into putative domain families. The selection of the best putative families is done using machine learning techniques. A statistical model is then created for each of the chosen families. This procedure is then iterated: the aforementioned statistical models are used to scan all protein sequences, to recreate a library of segments and to cluster them again.

Results

Processing the Swiss-Prot section of the UniProt Knoledgebase, release 7.2, EVEREST defines 20,230 domains, covering 85% of the amino acids of the Swiss-Prot database. EVEREST annotates 11,852 proteins (6% of the database) that are not annotated by Pfam A. In addition, in 43,086 proteins (20% of the database), EVEREST annotates a part of the protein that is not annotated by Pfam A. Performance tests show that EVEREST recovers 56% of Pfam A families and 63% of SCOP families with high accuracy, and suggests previously unknown domain families with at least 51% fidelity. EVEREST domains are often a combination of domains as defined by Pfam or SCOP and are frequently sub-domains of such domains.

Conclusion

The EVEREST process and its output domain families provide an exhaustive and validated view of the protein domain world that is automatically generated from sequence data. The EVEREST library of domain families, accessible for browsing and download at [1], provides a complementary view to that provided by other existing libraries. Furthermore, since it is automatic, the EVEREST process is scalable and we will run it in the future on larger databases as well. The EVEREST source files are available for download from the EVEREST web site.

Collapse

Dehal PS, Boore JL. A phylogenomic gene cluster resource: the Phylogenetically Inferred Groups (PhIGs) database. BMC Bioinformatics 2006;7:201. [PMID: 16608522 PMCID: PMC1523372 DOI: 10.1186/1471-2105-7-201] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2005] [Accepted: 04/11/2006] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community.

DISCUSSION

The PhIGs database currently contains 23 completely sequenced genomes of fungi and metazoans, containing 409,653 genes that have been grouped into 42,645 gene clusters. Each gene cluster is built such that the gene sequence distances are consistent with the known organismal relationships and in so doing, maximizing the likelihood for the clusters to represent truly orthologous genes. The PhIGs website contains tools that allow the study of genes within their phylogenetic framework through keyword searches on annotations, such as GO and InterPro assignments, and sequence similarity searches by BLAST and HMM. In addition to displaying the evolutionary relationships of the genes in each cluster, the website also allows users to view the relative physical positions of homologous genes in specified sets of genomes.

SUMMARY

Accurate analyses of genes and genomes can only be done within their full phylogenetic context. The PhIGs database and corresponding website http://phigs.org address this problem for the scientific community. Our goal is to expand the content as more genomes are sequenced and use this framework to incorporate more analyses.

Collapse

Camoglu O, Can T, Singh AK. Integrating multi-attribute similarity networks for robust representation of the protein space. Bioinformatics 2006;22:1585-92. [PMID: 16595556 DOI: 10.1093/bioinformatics/btl130] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Uchiyama I. Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes. Nucleic Acids Res 2006;34:647-58. [PMID: 16436801 PMCID: PMC1351371 DOI: 10.1093/nar/gkj448] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Gewehr JE, Zimmer R. SSEP-Domain: protein domain prediction by alignment of secondary structure elements and profiles. Bioinformatics 2005;22:181-7. [PMID: 16267083 DOI: 10.1093/bioinformatics/bti751] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Dekker FJ, Koch MA, Waldmann H. Protein structure similarity clustering (PSSC) and natural product structure as inspiration sources for drug development and chemical genomics. Curr Opin Chem Biol 2005;9:232-9. [PMID: 15939324 DOI: 10.1016/j.cbpa.2005.03.003] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2005] [Accepted: 03/22/2005] [Indexed: 02/04/2023]

Donald JE, Shakhnovich EI. Determining functional specificity from protein sequences. Bioinformatics 2005;21:2629-35. [PMID: 15797914 DOI: 10.1093/bioinformatics/bti396] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Bae K, Mallick BK, Elsik CG. Prediction of protein interdomain linker regions by a hidden Markov model. Bioinformatics 2005;21:2264-70. [PMID: 15746283 DOI: 10.1093/bioinformatics/bti363] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Krause A, Stoye J, Vingron M. Large scale hierarchical clustering of protein sequences. BMC Bioinformatics 2005;6:15. [PMID: 15663796 PMCID: PMC547898 DOI: 10.1186/1471-2105-6-15] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2004] [Accepted: 01/22/2005] [Indexed: 11/10/2022] Open

Kaplan N, Friedlich M, Fromer M, Linial M. A functional hierarchical organization of the protein sequence space. BMC Bioinformatics 2004;5:196. [PMID: 15596019 PMCID: PMC544566 DOI: 10.1186/1471-2105-5-196] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2004] [Accepted: 12/14/2004] [Indexed: 11/10/2022] Open

Papandreou N, Berezovsky IN, Lopes A, Eliopoulos E, Chomilier J. Universal positions in globular proteins. From observation to simulation. ACTA ACUST UNITED AC 2004;271:4762-8. [PMID: 15606763 DOI: 10.1111/j.1432-1033.2004.04440.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Kifer I, Sasson O, Linial M. Predicting fold novelty based on ProtoNet hierarchical classification. Bioinformatics 2004;21:1020-7. [PMID: 15539447 DOI: 10.1093/bioinformatics/bti135] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Liu J, Hegyi H, Acton TB, Montelione GT, Rost B. Automatic target selection for structural genomics on eukaryotes. Proteins 2004;56:188-200. [PMID: 15211504 DOI: 10.1002/prot.20012] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]