Reference Citation Analysis

For: Pollack JD, Li Q, Pearl DK. Taxonomic utility of a phylogenetic analysis of phosphoglycerate kinase proteins of Archaea, Bacteria, and Eukaryota: Insights by Bayesian analyses. Mol Phylogenet Evol 2005;35:420-30. [PMID: 15804412 DOI: 10.1016/j.ympev.2005.02.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 04/15/2004] [Revised: 02/04/2005] [Accepted: 02/07/2005] [Indexed: 10/25/2022]

Cited by Other Article(s)

TMT-Based Quantitative Proteomics Analysis of the Fish-Borne Spoiler Shewanella putrefaciens Subjected to Cold Stress Using LC-MS/MS. J CHEM-NY 2021. [DOI: 10.1155/2021/8876986] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/03/2023]

Wellner A, Raitses Gurevich M, Tawfik DS. Mechanisms of protein sequence divergence and incompatibility. PLoS Genet 2013;9:e1003665. [PMID: 23935519 PMCID: PMC3723536 DOI: 10.1371/journal.pgen.1003665] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 01/23/2013] [Accepted: 06/10/2013] [Indexed: 12/27/2022]

Abstract

Alignments of orthologous protein sequences convey a complex picture. Some positions are utterly conserved, whilst others have diverged to variable degrees. Amongst the latter, many are non-exchangeable between extant sequences. How do functionally critical and highly conserved residues diverge? Why and how did these exchanges become incompatible within contemporary sequences? Our model is phosphoglycerate kinase (PGK), in which lysine 219 is an essential active-site residue completely conserved throughout Eukaryota and Bacteria, whereas serine at this position is found only in archaeal PGKs. The contemporary sequences tested exhibited complete loss of function upon exchanges at position 219. However, a directed evolution experiment revealed that two mutations were sufficient for human PGK to become functional with serine at position 219. These two mutations made position 219 permissive not only for serine and lysine, but also for a range of other amino acids seen in archaeal PGKs. The identified trajectories that enabled exchanges at 219 show marked sign epistasis: a relatively small loss of function with respect to one amino acid (lysine) versus a large gain with another (serine, and other amino acids). Our findings support the view that, as theoretically described, the trajectories underlying the divergence of critical positions are dominated by sign-epistatic interactions. Such trajectories are an outcome of rare mutational combinations. Nonetheless, as suggested by the laboratory-enabled K219S exchange, given enough time and variability in selection levels, even utterly conserved and functionally essential residues may change.

Orthologs are proteins in different species sharing the same function and structure. However, the mechanisms that underlie the divergence of different sequences from a single ancestor remain unclear, particularly because many amino acid exchanges between orthologs result in loss of function (incompatibility). We aimed to disentangle an ancient divergence event within the active site of a universally distributed enzyme that mediates ATP synthesis. Using laboratory evolution experiments, we found that an exchange in a functionally critical active-site residue, which is incompatible within contemporary orthologs, is enabled by a few mutations. These mutations lead to transition sequences in which, unlike the extant sequences, a wide range of amino acids is tolerated. Our experiment reveals the properties of these transition sequences, which may resemble the historical ancestral states that underlay this divergence event, and the mechanisms that led to incompatibility within the contemporary orthologs. Our results support theoretical predictions and reshape our understanding of protein structure-function relationships. That a given position is entirely conserved and essential for function does not indicate that it will never exchange; rather, the exchange may depend on changes in many other positions.
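
Sign epistasis, as invoked in this abstract, can be made concrete with four genotypes. The Python sketch below is an illustration only, not the authors' analysis: it treats K219S as mutation A, lumps the permissive background into a single mutation B (the paper reports two mutations), and uses hypothetical activity values chosen only to mirror the qualitative pattern described (a small loss with lysine, a large gain with serine). Sign epistasis is present when the sign of one mutation's effect depends on the other mutation; it is reciprocal when this holds for both.

```python
def classify_epistasis(w_ab: float, w_Ab: float, w_aB: float, w_AB: float) -> str:
    """Classify the interaction between two mutations, a->A and b->B.

    Arguments are activity (or fitness) values of the four genotypes:
    w_ab is the starting background and w_AB is the double mutant.
    """
    # Effect of mutation A on the b and B backgrounds
    dA_on_b = w_Ab - w_ab
    dA_on_B = w_AB - w_aB
    # Effect of mutation B on the a and A backgrounds
    dB_on_a = w_aB - w_ab
    dB_on_A = w_AB - w_Ab

    # A mutation shows sign epistasis if its effect flips sign with background
    sign_A = dA_on_b * dA_on_B < 0
    sign_B = dB_on_a * dB_on_A < 0

    if sign_A and sign_B:
        return "reciprocal sign epistasis"
    if sign_A or sign_B:
        return "sign epistasis"
    # No sign flips: compare the double mutant with the additive expectation
    if abs((w_AB - w_ab) - (dA_on_b + dB_on_a)) > 1e-9:
        return "magnitude epistasis"
    return "no epistasis (additive)"


# Hypothetical values: wild type (K219) = 1.0; K219S alone = 0.0;
# permissive background with K = 0.8 (small loss); permissive background with S = 0.9 (large gain).
print(classify_epistasis(1.0, 0.0, 0.8, 0.9))
```

With these made-up numbers, both mutations change the sign of their effect depending on background, so the function reports reciprocal sign epistasis.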

Pollack JD, Gerard D, Pearl DK. Uniquely localized intra-molecular amino acid concentrations at the glycolytic enzyme catalytic/active centers of Archaea, Bacteria and Eukaryota are associated with their proposed temporal appearances on earth. ORIGINS LIFE EVOL B 2013;43:161-87. [PMID: 23715690 DOI: 10.1007/s11084-013-9331-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/24/2012] [Accepted: 04/04/2013] [Indexed: 11/27/2022]

Abstract

The distributions of amino acids at the most-conserved sites nearest the catalytic/active centers (C/AC) in 4,645 sequences of ten enzymes of the glycolytic Embden-Meyerhof-Parnas pathway in Archaea, Bacteria and Eukaryota are similar to the proposed temporal order of the amino acids' appearance on Earth. Glycine, isoleucine, leucine, valine, glutamic acid and possibly lysine, often described as prebiotic (i.e., existing or occurring before the emergence of life), were localized in positionally and conservationally defined aggregations in all enzymes of all Domains. The distributions of all 20 biologic amino acids in the most-conserved sites nearest their C/ACs were quite different both from their distributions in less-conserved sites further from the C/ACs and from the distribution of all amino acids regardless of position or conservation. The major concentration of glycine, perhaps the earliest prebiotic amino acid, occupies ≈16% of all the most-conserved sites within a radius of ≈7-8 Å from their C/ACs and decreases linearly towards the molecule's periphery. Spatially localized major concentrations of isoleucine, leucine and valine lie at mid-conserved sites at intermediate distances from their C/ACs in protein interiors. Lysine and glutamic acid comprise ≈25-30% of all amino acids within an irregular volume bounded by radii of ≈24-28 Å from their C/ACs, at the most-distant, least-conserved sites. These previously unreported characteristics, namely the spatially and conservationally identified concentrations of these amino acids in Archaea, Bacteria and Eukaryota, suggest a common structural organization of glycolytic enzymes that may be relevant to their evolution and to that of other proteins. We discuss our data in relation to enzyme evolution and to the amino acids' reported putative prebiotic temporal appearances on Earth, abundances, biological "cost", neighbor-sequence preferences or "ordering", and some thermodynamic parameters.

Tree preserving embedding. Proc Natl Acad Sci U S A 2011;108:16916-21. [PMID: 21949369 DOI: 10.1073/pnas.1018393108] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]

Debruyne M, Verdonck T. Robust kernel principal component analysis and classification. ADV DATA ANAL CLASSI 2010. [DOI: 10.1007/s11634-010-0068-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 10/19/2022]

Pollack JD, Pan X, Pearl DK. Concentration of specific amino acids at the catalytic/active centers of highly-conserved "housekeeping" enzymes of central metabolism in archaea, bacteria and Eukaryota: is there a widely conserved chemical signal of prebiotic assembly? ORIGINS LIFE EVOL B 2010;40:273-302. [PMID: 20069373 DOI: 10.1007/s11084-009-9188-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/24/2009] [Accepted: 11/04/2009] [Indexed: 10/20/2022]

Kuystermans D, Dunn MJ, Al-Rubeai M. A proteomic study of cMyc improvement of CHO culture. BMC Biotechnol 2010;10:25. [PMID: 20307306 PMCID: PMC2859402 DOI: 10.1186/1472-6750-10-25] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 05/14/2009] [Accepted: 03/22/2010] [Indexed: 02/07/2023]

Debruyne M. An outlier map for Support Vector Machine classification. Ann Appl Stat 2009. [DOI: 10.1214/09-aoas256] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]

Cytological characterization of YpsB, a novel component of the Bacillus subtilis divisome. J Bacteriol 2008;190:7096-107. [PMID: 18776011 DOI: 10.1128/jb.00064-08] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 11/20/2022]

Kertész-Farkas A, Dhir S, Sonego P, Pacurar M, Netoteia S, Nijveen H, Kuzniar A, Leunissen JAM, Kocsor A, Pongor S. Benchmarking protein classification algorithms via supervised cross-validation. J Biochem Biophys Methods 2008;70:1215-23. [PMID: 17604112 DOI: 10.1016/j.jbbm.2007.05.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/05/2007] [Revised: 05/20/2007] [Accepted: 05/23/2007] [Indexed: 11/30/2022]

Abstract

Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups that differ vastly in the number of members, average protein size, within-group similarity, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates of how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., the selection of test and training sets according to the known subtypes within a database, has been used successfully in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced-size model datasets suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection, which currently includes protein sequence data (protein domains and entire proteins), protein structure data and reading-frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms, such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with the comparison algorithms BLAST, Smith-Waterman and Needleman-Wunsch, as well as the 3D comparison methods DALI and PRIDE. The resulting datasets provide lower and, in our opinion, more realistic estimates of classifier performance than do random cross-validation schemes.
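
Supervised cross-validation, as summarized in this abstract, assigns whole known subtypes to the test set so that evaluation mimics generalization to a distantly related subtype. The Python sketch below shows one simple instance of that principle, a leave-one-subtype-out split; it is not the authors' benchmark procedure, which also mixes in random sampling and works at several levels of a classification hierarchy. The record layout `(protein_id, class_label, subtype_label)` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: (protein_id, class_label, subtype_label)
Record = Tuple[str, str, str]


def leave_one_subtype_out(
    records: List[Record],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_subtype, train_ids, test_ids), one split per subtype.

    All members of the held-out subtype go to the test set, so the model is
    scored only on a subtype it never saw during training.
    """
    by_subtype: Dict[str, List[str]] = defaultdict(list)
    for protein_id, _class_label, subtype in records:
        by_subtype[subtype].append(protein_id)

    for held_out in sorted(by_subtype):
        test_ids = by_subtype[held_out]
        train_ids = [pid for pid, _cls, st in records if st != held_out]
        yield held_out, train_ids, test_ids


# Hypothetical toy data: two classes, three subtypes.
records = [
    ("p1", "kinase", "PGK"),
    ("p2", "kinase", "PGK"),
    ("p3", "kinase", "hexokinase"),
    ("p4", "protease", "serine"),
]
for subtype, train, test in leave_one_subtype_out(records):
    print(subtype, train, test)
```

Unlike random k-fold splitting, no member of a held-out subtype can leak into training, which is consistent with the lower performance estimates the abstract reports for supervised schemes.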

Brown DR, Whitcomb RF, Bradbury JM. Revised minimal standards for description of new species of the class Mollicutes (division Tenericutes). Int J Syst Evol Microbiol 2008;57:2703-2719. [PMID: 17978244 DOI: 10.1099/ijs.0.64722-0] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022]

Abstract

Minimal standards for novel species of the class Mollicutes (trivial term, mollicutes), last published in 1995, require revision. The International Committee on Systematics of Prokaryotes Subcommittee on the Taxonomy of Mollicutes proposes herein revised standards that reflect recent advances in molecular systematics and the species concept for prokaryotes. The mandatory requirements are: (i) deposition of the type strain into two recognized culture collections, preferably located in different countries; (ii) deposition of the 16S rRNA gene sequence into a public database, and a phylogenetic analysis of the relationships among the 16S rRNA gene sequences of the novel species and its neighbours; (iii) deposition of antiserum against the type strain into a recognized collection; (iv) demonstration, by using the combination of 16S rRNA gene sequence analyses, serological analyses and supplementary phenotypic data, that the type strain differs significantly from all previously named species; and (v) assignment to an order, a family and a genus in the class, with an appropriate specific epithet. The 16S rRNA gene sequence provides the primary basis for assignment to hierarchical rank, and may also constitute evidence of species novelty, but serological and supplementary phenotypic data must be presented to substantiate this. Serological methods have been documented to be congruent with DNA-DNA hybridization data and with 16S rRNA gene placements. The novel species must be tested serologically to the greatest extent that the investigators deem feasible against all neighbouring species whose 16S rRNA gene sequences show >0.94 similarity. The investigator is responsible for justifying which characters are most meaningful for assignment to the part of the mollicute phylogenetic tree in which a novel species is located, and for providing the means by which novel species can be identified by other investigators. 
The publication of the description should appear in a journal having wide circulation. If the journal is not the International Journal of Systematic and Evolutionary Microbiology, copies of the publication must be submitted to that journal so that the name may be considered for inclusion in a Validation List as required by the International Code of Bacteriological Nomenclature (the Bacteriological Code). Updated informal descriptions of the class Mollicutes and some of its constituent higher taxa are available as supplementary material in IJSEM Online.
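
The serological-testing requirement in these standards selects neighbouring species by a 16S rRNA gene similarity cutoff. The Python sketch below is a minimal illustration, assuming the pairwise similarities to the novel species have already been computed and are expressed as fractions; the function name, species names and values are hypothetical. It applies the strict ">0.94" comparison stated in the abstract.

```python
from typing import Dict, List


def neighbours_for_serology(similarity: Dict[str, float], cutoff: float = 0.94) -> List[str]:
    """Return the named species whose 16S rRNA gene similarity to the novel
    species is strictly greater than the cutoff (the standards' '>0.94')."""
    return sorted(name for name, value in similarity.items() if value > cutoff)


# Hypothetical similarity values, expressed as fractions (0.97 = 97%).
example = {"Species A": 0.97, "Species B": 0.94, "Species C": 0.955, "Species D": 0.90}
print(neighbours_for_serology(example))
```

Because the comparison is strict, a species at exactly 0.94 (Species B here) falls outside the list; the standards still leave the extent of testing to what investigators deem feasible.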

Tree-Based Algorithms for Protein Classification. ACTA ACUST UNITED AC 2008. [DOI: 10.1007/978-3-540-76803-6_6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]

Sonego P, Pacurar M, Dhir S, Kertész-Farkas A, Kocsor A, Gáspári Z, Leunissen JA, Pongor S. A Protein Classification Benchmark collection for machine learning. Nucleic Acids Res 2006;35:D232-6. [PMID: 17142240 PMCID: PMC1669728 DOI: 10.1093/nar/gkl812] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022]

Kaján L, Kertész-Farkas A, Franklin D, Ivanova N, Kocsor A, Pongor S. Application of a simple likelihood ratio approximant to protein sequence classification. Bioinformatics 2006;22:2865-9. [PMID: 17090576 DOI: 10.1093/bioinformatics/btl512] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]

Baik JY, Lee MS, An SR, Yoon SK, Joo EJ, Kim YH, Park HW, Lee GM. Initial transcriptome and proteome analyses of low culture temperature-induced expression in CHO cells producing erythropoietin. Biotechnol Bioeng 2006;93:361-71. [PMID: 16187333 DOI: 10.1002/bit.20717] [Citation(s) in RCA: 113] [Impact Index Per Article: 6.3] [Indexed: 11/07/2022]

Kocsor A, Kertész-Farkas A, Kaján L, Pongor S. Application of compression-based distance measures to protein sequence classification: a methodological study. Bioinformatics 2005;22:407-12. [PMID: 16317070 DOI: 10.1093/bioinformatics/bti806] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]