1
Price MN, Arkin AP. A fast comparative genome browser for diverse bacteria and archaea. PLoS One 2024; 19:e0301871. [PMID: 38593165 PMCID: PMC11003636 DOI: 10.1371/journal.pone.0301871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 03/22/2024] [Indexed: 04/11/2024] Open
Abstract
Genome sequencing has revealed an incredible diversity of bacteria and archaea, but there are no fast and convenient tools for browsing across these genomes. It is cumbersome to view the prevalence of homologs for a protein of interest, or the gene neighborhoods of those homologs, across the diversity of the prokaryotes. We developed a web-based tool, fast.genomics, that uses two strategies to support fast browsing across the diversity of prokaryotes. First, the database of genomes is split up. The main database contains one representative from each of the 6,377 genera that have a high-quality genome, and additional databases for each taxonomic order contain up to 10 representatives of each species. Second, homologs of proteins of interest are identified quickly by using accelerated searches, usually in a few seconds. Once homologs are identified, fast.genomics can quickly show their prevalence across taxa, view their neighboring genes, or compare the prevalence of two different proteins. Fast.genomics is available at https://fast.genomics.lbl.gov.
Affiliation(s)
- Morgan N. Price
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Adam P. Arkin
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, United States of America
2
Haft DH. In silico discovery of the myxosortases that process MYXO-CTERM and three novel prokaryotic C-terminal protein-sorting signals that share invariant Cys residues. J Bacteriol 2024; 206:e0017323. [PMID: 38084967 PMCID: PMC10810001 DOI: 10.1128/jb.00173-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/10/2023] [Indexed: 01/26/2024] Open
Abstract
The LPXTG protein-sorting signal, found in surface proteins of various Gram-positive pathogens, was the founding member of a growing panel of prokaryotic small C-terminal sorting domains. Sortase A cleaves LPXTG, exosortases (XrtA and XrtB) cleave the PEP-CTERM sorting signal, archaeosortase A cleaves PGF-CTERM, and rhombosortase cleaves GlyGly-CTERM domains. Four sorting signal domains without previously known processing proteases are the MYXO-CTERM, JDVT-CTERM, Synerg-CTERM, and CGP-CTERM domains. These exhibit the standard tripartite architecture of a short signature motif, a hydrophobic transmembrane segment, and an Arg-rich cluster. Each has an invariant cysteine in its signature motif. Computational evidence strongly suggests that each of these four Cys-containing sorting signals is processed, at least in part, by a cognate family of glutamic-type intramembrane endopeptidases related to the eukaryotic type II CAAX-processing protease Rce1. For the MYXO-CTERM sorting signals of different lineages, their sorting enzymes, called myxosortases, include MrtX (MXAN_2755 in Myxococcus xanthus), MrtC, and MrtP, all with radically different N-terminal domains but with a conserved core. Related predicted sorting enzymes were also identified for JDVT-CTERM (MrtJ), Synerg-CTERM (MrtS), and CGP-CTERM (MrtA). This work establishes a major new family of protein-sorting housekeeping endopeptidases contributing to the surface attachment of proteins in prokaryotes.
IMPORTANCE Homologs of the eukaryotic type II CAAX-box protease Rce1, a membrane-embedded endopeptidase found in yeast and human ER and involved in sorting proteins to their proper cellular locations, are abundant in prokaryotes but not well understood there. This bioinformatics paper identifies several subgroups of the family as cognate endopeptidases for four protein-sorting signals processed by previously unknown machinery. Sorting signals with newly identified processing enzymes include three novel ones, but also MYXO-CTERM, which had been the focus of previous experimental work in the model fruiting and gliding bacterium Myxococcus xanthus. The new findings will substantially improve our understanding of Cys-containing C-terminal protein-sorting signals and of protein trafficking generally in bacteria and archaea.
Affiliation(s)
- Daniel H. Haft
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
3
Eight Unexpected Selenoprotein Families in Organometallic Biochemistry in Clostridium difficile, in ABC Transport, and in Methylmercury Biosynthesis. J Bacteriol 2023; 205:e0025922. [PMID: 36598231 PMCID: PMC9879109 DOI: 10.1128/jb.00259-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023] Open
Abstract
The bioinformatics of a nine-gene locus, designated selenocysteine-assisted organometallic (SAO), was investigated after identifying six new selenoprotein families and constructing hidden Markov models (HMMs) that find and annotate members of those families. Four are selenoproteins in most SAO loci, including Clostridium difficile. They include two ABC transporter subunits, namely, permease SaoP, with selenocysteine (U) at the channel-gating position, and substrate-binding subunit SaoB. Cytosolic selenoproteins include SaoL, homologous to MerB organomercurial lyases from mercury resistance loci, and SaoT, related to thioredoxins. SaoL, SaoB, and surface protein SaoC (an occasional selenoprotein) share an unusual CU dipeptide motif, which is rare in selenoproteins but found in selenoprotein variants of mercury resistance transporter subunit MerT. A nonselenoprotein, SaoE, shares homology with Cu/Zn efflux and arsenical efflux pumps. The organization of the SAO system suggests substrate interaction with surface-exposed selenoproteins, followed by import, metabolism that may cleave a carbon-to-heavy metal bond, and finally metal efflux. A novel type of mercury resistance is possible, but SAO instead may support fermentative metabolism, with selenocysteine-mediated formation of organometallic intermediates, followed by import, degradation, and metal efflux. Phylogenetic profiling shows SAO loci consistently co-occur with Stickland fermentation markers but even more consistently with 8Fe-9S cofactor-type double-cubane proteins. Hypothesizing that the SAO system forms organometallic intermediates, we investigated the known methylmercury formation protein families HgcA and HgcB. Both families contained overlooked selenoproteins. Most HgcAs have a CU motif N terminal to their previously accepted start sites. Seeking additional rare and overlooked selenoproteins may help reveal more cryptic aspects of microbial biochemistry.
IMPORTANCE This work adds 8 novel prokaryotic selenoproteins to the 80 or so families previously known. It describes the SAO (selenocysteine-assisted organometallic) locus, with the most selenoproteins of any known system. The rare CU motif recurs throughout, suggesting the formation and degradation of organometallic compounds. That suggestion triggered a reexamination of HgcA and HgcB, which are methylmercury formation proteins that can adversely impact food safety. Both are selenoproteins, once corrected, with HgcA again showing a CU motif. The SAO system is plausibly a mercury resistance locus for selenium-dependent anaerobes. But instead, it may exploit heavy metals as cofactors in organometallic intermediate-forming pathways that circumvent high activation energies and facilitate the breakdown of otherwise poorly accessible nutrients. SAO could provide an edge that helps Clostridium difficile, an important pathogen, establish disease.
4
Price MN, Deutschbauer AM, Arkin AP. Four families of folate-independent methionine synthases. PLoS Genet 2021; 17:e1009342. [PMID: 33534785 PMCID: PMC7857596 DOI: 10.1371/journal.pgen.1009342] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/12/2020] [Accepted: 01/05/2021] [Indexed: 11/29/2022] Open
Abstract
Although most organisms synthesize methionine from homocysteine and methyl folates, some have “core” methionine synthases that lack folate-binding domains and use other methyl donors. In vitro, the characterized core synthases use methylcobalamin as a methyl donor, but in vivo, they probably rely on corrinoid (vitamin B12-binding) proteins. We identified four families of core methionine synthases that are distantly related to each other (under 30% pairwise amino acid identity). From the characterized enzymes, we identified the families MesA, which is found in methanogens, and MesB, which is found in anaerobic bacteria and archaea with the Wood-Ljungdahl pathway. A third uncharacterized family, MesC, is found in anaerobic archaea that have the Wood-Ljungdahl pathway and lack known forms of methionine synthase. We predict that most members of the MesB and MesC families accept methyl groups from the iron-sulfur corrinoid protein of that pathway. The fourth family, MesD, is found only in aerobic bacteria. Using transposon mutants and complementation, we show that MesD does not require 5-methyltetrahydrofolate or cobalamin. Instead, MesD requires an uncharacterized protein family (DUF1852) and oxygen for activity. Methionine is one of the amino acids that make up proteins, and the final step in methionine synthesis is the transfer of a methyl group. In most organisms, the methyl group is obtained from methyl folates, but some anaerobic bacteria and archaea are thought to use corrinoid (vitamin B12-binding) proteins instead. By analyzing the sequences of the potential methionine synthases across the genomes of diverse bacteria and archaea, we identified four families of folate-independent methionine synthases. For three of these families, we can use co-occurrence with corrinoid proteins to predict their likely partners. We show that the fourth family does not require vitamin B12; instead, it obtains methyl groups from an oxygen-dependent partner protein. 
Our results will help us understand the growth requirements of diverse bacteria and archaea.
Affiliation(s)
- Morgan N. Price
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, United States of America
- * E-mail: (MNP); (APA)
- Adam M. Deutschbauer
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Adam P. Arkin
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Department of Bioengineering, University of California, Berkeley, California, United States of America
- * E-mail: (MNP); (APA)
5
Posttranslational Methylation of Arginine in Methyl Coenzyme M Reductase Has a Profound Impact on both Methanogenesis and Growth of Methanococcus maripaludis. J Bacteriol 2020; 202:JB.00654-19. [PMID: 31740491 DOI: 10.1128/jb.00654-19] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/16/2019] [Accepted: 11/09/2019] [Indexed: 02/05/2023] Open
Abstract
Catalyzing the key step for anaerobic production and/or oxidation of methane and likely other short-chain alkanes, methyl coenzyme M reductase (Mcr) and its homologs play a key role in the global carbon cycle. The McrA subunit possesses up to five conserved posttranslational modifications (PTMs) at its active site. It was previously suggested that methanogenesis marker protein 10 (Mmp10) could play an important role in methanogenesis. To systematically examine its physiological role, mmpX (locus tag MMP1554), the gene encoding Mmp10 in Methanococcus maripaludis, was deleted with a new genetic tool, resulting in the complete loss of the 5-C-(S)-methylarginine PTM of residue 275 in the McrA subunit. When the ΔmmpX mutant was complemented with the wild-type gene expressed by either a strong or a weak promoter, methylation was fully restored. Compared to the parental strain, maximal rates of methane formation by whole cells were reduced by 40 to 60% in the ΔmmpX mutant. The reduction in activity was fully reversed by the complement with the strong promoter. Site-directed mutagenesis of mmpX resulted in a differential loss of arginine methylation among the mutants in vivo, suggesting that activities of Mmp10 directly modulated methylation. R275 was present in a highly conserved PXRR275(A/S)R(G/A) signature sequence in McrAs. The only other protein in M. maripaludis containing a similar sequence was not methylated, suggesting that Mmp10 is specific for McrA. In conclusion, Mmp10 modulates the methyl-Arg PTM on McrA in a highly specific manner, which has a profound impact on Mcr activity.
IMPORTANCE Mcr is the key enzyme in methanogenesis and a promising candidate for bioengineering the conversion of methane to liquid fuel. Our knowledge of Mcr is still limited. In terms of complexity, uniqueness, and environmental importance, Mcr is more comparable to photosynthetic reaction centers than conventional enzymes. PTMs have long been hypothesized to play key roles in modulating Mcr activity. Here, we directly link the mmpX gene to the arginine PTM of Mcr, demonstrate its association with methanogenesis activity, and offer insights into its substrate specificity and putative cofactor binding sites. This is also the first time that a PTM of McrA has been shown to have a substantial impact on both methanogenesis and growth in the absence of additional stressors.
6
Dong SH, Liu A, Mahanta N, Mitchell DA, Nair SK. Mechanistic Basis for Ribosomal Peptide Backbone Modifications. ACS Central Science 2019; 5:842-851. [PMID: 31139720 PMCID: PMC6535971 DOI: 10.1021/acscentsci.9b00124] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 02/08/2019] [Indexed: 05/16/2023]
Abstract
YcaO enzymes are known to catalyze the ATP-dependent formation of azoline heterocycles, thioamides, and (macro)lactamidines on peptide substrates. These enzymes are found in multiple biosynthetic pathways, including those for several different classes of ribosomally synthesized and post-translationally modified peptides (RiPPs). However, there are major knowledge gaps in the mechanistic and structural underpinnings that govern each of the known YcaO-mediated modifications. Here, we present the first structure of any YcaO enzyme bound to its peptide substrate in the active site, specifically that from Methanocaldococcus jannaschii which is involved in the thioamidation of the α-subunit of methyl-coenzyme M reductase (McrA). The structural data are leveraged to identify and test the residues involved in substrate binding and catalysis by site-directed mutagenesis. We also show that thioamide-forming YcaOs can carry out the cyclodehydration of a related peptide substrate, which underscores the mechanistic conservation across the YcaO family and allows for the extrapolation of mechanistic details to azoline-forming YcaOs involved in RiPP biosynthesis. A bioinformatic survey of all YcaOs highlights the diverse sequence space in azoline-forming YcaOs and suggests their early divergence from a common ancestor. The data presented within provide a detailed molecular framework for understanding this family of enzymes, which reconcile several decades of prior data on RiPP cyclodehydratases. These studies also provide the foundational knowledge to impact our mechanistic understanding of additional RiPP biosynthetic classes.
Affiliation(s)
- Shi-Hui Dong
- Department of Biochemistry, Carl R. Woese Institute for Genomic Biology, Department of Microbiology, Department of Chemistry, and Center for Biophysics and Quantitative Biology, University of Illinois, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
- Andi Liu
- Department of Biochemistry, Carl R. Woese Institute for Genomic Biology, Department of Microbiology, Department of Chemistry, and Center for Biophysics and Quantitative Biology, University of Illinois, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
- Nilkamal Mahanta
- Department of Biochemistry, Carl R. Woese Institute for Genomic Biology, Department of Microbiology, Department of Chemistry, and Center for Biophysics and Quantitative Biology, University of Illinois, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
- Douglas A. Mitchell
- Department of Biochemistry, Carl R. Woese Institute for Genomic Biology, Department of Microbiology, Department of Chemistry, and Center for Biophysics and Quantitative Biology, University of Illinois, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
- Satish K. Nair
- Department of Biochemistry, Carl R. Woese Institute for Genomic Biology, Department of Microbiology, Department of Chemistry, and Center for Biophysics and Quantitative Biology, University of Illinois, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
7
Assembly of Methyl Coenzyme M Reductase in the Methanogenic Archaeon Methanococcus maripaludis. J Bacteriol 2018; 200:JB.00746-17. [PMID: 29339414 DOI: 10.1128/jb.00746-17] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 12/09/2017] [Accepted: 01/04/2018] [Indexed: 01/22/2023] Open
Abstract
Methyl coenzyme M reductase (MCR) is a complex enzyme that catalyzes the final step in biological methanogenesis. To better understand its assembly, the recombinant MCR from the thermophile Methanothermococcus okinawensis (rMCRok) was expressed in the mesophile Methanococcus maripaludis. The rMCRok was posttranslationally modified correctly and contained McrD and the unique nickel tetrapyrrole coenzyme F430. Subunits of the native M. maripaludis (MCRmar) were largely absent, suggesting that the recombinant enzyme was formed by an assembly of cotranscribed subunits. Strong support for this hypothesis was obtained by expressing a chimeric operon comprising the His-tagged mcrA from M. maripaludis and the mcrBDCG from M. okinawensis in M. maripaludis. The His-tagged purified rMCR then contained the M. maripaludis McrA and the M. okinawensis McrBDG. The present study prompted us to form a working model for MCR assembly, which can be further tested by the heterologous expression system established here.
IMPORTANCE Approximately 1.6% of the net primary production of plants, algae, and cyanobacteria is processed by biological methane production in anoxic environments. This accounts for about 74% of the total global methane production, up to 25% of which is consumed by anaerobic oxidation of methane (AOM). Methyl coenzyme M reductase (MCR) is the key enzyme in both methanogenesis and AOM. MCR is assembled as a dimer of two heterotrimers, where posttranslational modifications and F430 cofactors are embedded in the active sites. However, this complex assembly process remains unknown. Here, we established a heterologous expression system for MCR to learn how MCR is assembled.
8
Nayak DD, Mahanta N, Mitchell DA, Metcalf WW. Post-translational thioamidation of methyl-coenzyme M reductase, a key enzyme in methanogenic and methanotrophic Archaea. eLife 2017; 6. [PMID: 28880150 PMCID: PMC5589413 DOI: 10.7554/elife.29218] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 06/02/2017] [Accepted: 08/11/2017] [Indexed: 12/14/2022] Open
Abstract
Methyl-coenzyme M reductase (MCR), found in strictly anaerobic methanogenic and methanotrophic archaea, catalyzes the reversible production and consumption of the potent greenhouse gas methane. The α subunit of MCR (McrA) contains several unusual post-translational modifications, including a rare thioamidation of glycine. Based on the presumed function of homologous genes involved in the biosynthesis of thioviridamide, a thioamide-containing natural product, we hypothesized that the archaeal tfuA and ycaO genes would be responsible for post-translational installation of thioglycine into McrA. Mass spectrometric characterization of McrA from the methanogenic archaeon Methanosarcina acetivorans lacking tfuA and/or ycaO revealed the presence of glycine, rather than thioglycine, supporting this hypothesis. Phenotypic characterization of the ∆ycaO-tfuA mutant revealed a severe growth rate defect on substrates with low free energy yields and at elevated temperatures (39°C - 45°C). Our analyses support a role for thioglycine in stabilizing the protein secondary structure near the active site.
Affiliation(s)
- Dipti D Nayak
- Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana, United States
- Nilkamal Mahanta
- Department of Chemistry, University of Illinois, Urbana, United States
- Douglas A Mitchell
- Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana, United States
- Department of Chemistry, University of Illinois, Urbana, United States
- Department of Microbiology, University of Illinois, Urbana, United States
- William W Metcalf
- Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana, United States
- Department of Microbiology, University of Illinois, Urbana, United States
9
Haft DR, Haft DH. A comprehensive software suite for protein family construction and functional site prediction. PLoS One 2017; 12:e0171758. [PMID: 28182651 PMCID: PMC5300114 DOI: 10.1371/journal.pone.0171758] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/29/2016] [Accepted: 01/25/2017] [Indexed: 11/18/2022] Open
Abstract
In functionally diverse protein families, conservation in short signature regions may outperform full-length sequence comparisons for identifying proteins that belong to a subgroup within which one specific aspect of their function is conserved. The SIMBAL workflow (Sites Inferred by Metabolic Background Assertion Labeling) is a data-mining procedure for finding such signature regions. It begins by using clues from genomic context, such as co-occurrence or conserved gene neighborhoods, to build a useful training set from a large number of uncharacterized but mutually homologous proteins. When training set construction is successful, the YES partition is enriched in proteins that share function with the user’s query sequence, while the NO partition is depleted. A selected query sequence is then mined for short signature regions whose closest matches overwhelmingly favor proteins from the YES partition. High-scoring signature regions typically contain key residues critical to functional specificity, so proteins with the highest sequence similarity across these regions tend to share the same function. The SIMBAL algorithm was described previously, but significant manual effort, expertise, and a supporting software infrastructure were required to prepare the requisite training sets. Here, we describe a new, distributable software suite that speeds up and simplifies the process for using SIMBAL, most notably by providing tools that automate training set construction. These tools have broad utility for comparative genomics, allowing for flexible collection of proteins or protein domains based on genomic context as well as homology, a capability that can greatly assist in protein family construction. Armed with this new software suite, SIMBAL can serve as a fast and powerful in silico alternative to direct experimentation for characterizing proteins and their functional interactions.
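The signature-region idea in this abstract can be illustrated with a deliberately simplified Python sketch. It is not the SIMBAL implementation: the published workflow relies on its own search and scoring machinery, whereas this stand-in uses ungapped identity counts and a plain top-k vote. The function names, window width, and `top_k` value are all hypothetical choices made for illustration.

```python
def best_window_identity(fragment: str, target: str) -> int:
    """Highest count of identical residues between `fragment` and any
    equally long ungapped window of `target` (a crude similarity proxy)."""
    width = len(fragment)
    if len(target) < width:
        return 0
    return max(
        sum(a == b for a, b in zip(fragment, target[i:i + width]))
        for i in range(len(target) - width + 1)
    )


def yes_fraction_by_window(query, yes_seqs, no_seqs, width=4, top_k=2):
    """Slide a window along `query`; for each window, report the fraction of
    its top_k closest training sequences that belong to the YES partition."""
    labeled = [(s, True) for s in yes_seqs] + [(s, False) for s in no_seqs]
    results = []
    for start in range(len(query) - width + 1):
        fragment = query[start:start + width]
        # Stable sort: ties keep their input order (YES sequences first).
        ranked = sorted(
            labeled,
            key=lambda item: best_window_identity(fragment, item[0]),
            reverse=True,
        )
        top = ranked[:top_k]
        results.append((start, sum(is_yes for _, is_yes in top) / len(top)))
    return results


# Toy data (invented): the C-terminal "CWHC" stretch is shared only with YES proteins.
scores = yes_fraction_by_window(
    "MKTLCWHC",
    yes_seqs=["GGGCWHCGG", "PPCWHCPPP"],
    no_seqs=["MKTLGGGGG", "MKTLPPPPP"],
)
```

Windows whose nearest neighbours come overwhelmingly from the YES partition are the candidate signature regions; in this toy case the window starting at position 4 scores 1.0, while the N-terminal window shared with the NO proteins scores 0.0.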
Affiliation(s)
- David Renfrew Haft
- J. Craig Venter Institute, Rockville, Maryland, United States of America
- Daniel H. Haft
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail:
10
Haft DH. Using comparative genomics to drive new discoveries in microbiology. Curr Opin Microbiol 2015; 23:189-96. [PMID: 25617609 DOI: 10.1016/j.mib.2014.11.017] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 09/27/2014] [Revised: 11/19/2014] [Accepted: 11/20/2014] [Indexed: 01/17/2023]
Abstract
Bioinformatics looks to many microbiologists like a service industry. In this view, annotation starts with what is known from experiments in the lab, makes reasonable inferences of which genes match other genes in function, builds databases to make all that we know accessible, but creates nothing truly new. Experiments lead, then biocuration and computational biology follow. But the astounding success of genome sequencing is changing the annotation paradigm. Every genome sequenced is an intercepted coded message from the microbial world, and as all cryptographers know, it is easier to decode a thousand messages than a single message. Some biology is best discovered not by phenomenology, but by decoding genome content, forming hypotheses, and doing the first few rounds of validation computationally. Through such reasoning, a role and function may be assigned to a protein with no sequence similarity to any protein yet studied. Experimentation can follow after the discovery to cement and to extend the findings. Unfortunately, this approach remains so unfamiliar to most bench scientists that lab work and comparative genomics typically segregate to different teams working on unconnected projects. This review will discuss several themes in comparative genomics as a discovery method, including highly derived data, use of patterns of design to reason by analogy, and in silico testing of computationally generated hypotheses.
11
Kinch LN, Grishin NV. Bioinformatics perspective on rhomboid intramembrane protease evolution and function. Biochimica et Biophysica Acta - Biomembranes 2013; 1828:2937-43. [PMID: 23845876 DOI: 10.1016/j.bbamem.2013.06.031] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 02/12/2013] [Revised: 06/25/2013] [Accepted: 06/27/2013] [Indexed: 10/26/2022]
Abstract
Endopeptidase classification based on catalytic mechanism and evolutionary history has proven to be invaluable to the study of proteolytic enzymes. Such general mechanism- and evolution-based groupings have launched experimental investigations, because knowledge gained for one family member tends to apply to the other closely related enzymes. The serine endopeptidases represent one of the most abundant and diverse groups, with their apparently successful proteolytic mechanism having arisen independently many times throughout evolution, giving rise to the well-studied soluble chymotrypsins and subtilisins, among many others. A large and diverse family of polytopic transmembrane proteins known as rhomboids has also evolved the serine protease mechanism. While the spatial structure, mechanism, and biochemical function of this family as intramembrane proteases have been established, the cellular roles of these enzymes as well as their natural substrates remain largely undetermined. While the evolutionary history of rhomboid proteases has been debated, sorting out the relationships among current day representatives should provide a solid basis for narrowing the knowledge gap between their biochemical and cellular functions. Indeed, some functional characteristics of rhomboid proteases can be gleaned from their evolutionary relationships. Finally, a specific case where phylogenetic profile analysis has identified proteins that contain a C-terminal processing motif (GlyGly-Cterm) as co-occurring with a set of bacterial rhomboid proteases provides an example of potential target identification through bioinformatics. This article is part of a Special Issue entitled: Intramembrane Proteases.
Affiliation(s)
- Lisa N Kinch
- Howard Hughes Medical Institute and Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
12
Szczesny P, Mykowiecka A, Pawłowski K, Grynberg M. Distinct protein classes in human red cell proteome revealed by similarity of phylogenetic profiles. PLoS One 2013; 8:e54471. [PMID: 23349899 PMCID: PMC3549994 DOI: 10.1371/journal.pone.0054471] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/20/2012] [Accepted: 12/12/2012] [Indexed: 01/16/2023] Open
Abstract
The minimal set of proteins necessary to maintain a vertebrate cell forms an interesting core of cellular machinery. The known proteome of the human red blood cell consists of about 1400 proteins. We treated this protein complement of one of the simplest human cells as a model and asked questions about its function and origins. The proteome was mapped onto phylogenetic profiles, i.e. vectors of species possessing homologues of human proteins. A novel clustering approach was devised, utilising similarity in the phylogenetic spread of homologues as a distance measure. The clustering based on phylogenetic profiles yielded several distinct protein classes differing in phylogenetic taxonomic spread, presumed evolutionary history and functional properties. Notably, small clusters of proteins common to vertebrates or Metazoa and other multicellular eukaryotes involve biological functions specific to multicellular organisms, such as apoptosis or cell-cell signaling, respectively. Also, a eukaryote-specific cluster is identified, featuring GTP-ase signalling and ubiquitination. Another cluster, made up of proteins found in most organisms, including bacteria and archaea, involves basic molecular functions such as oxidation-reduction and glycolysis. Approximately one third of erythrocyte proteins do not fall in any of the clusters, reflecting the complexity of protein evolution in comparison to our simple model. Basically, the clustering obtained divides the proteome into old and new parts, the former originating from bacterial ancestors, the latter from inventions within multicellular eukaryotes. Thus, the model human cell proteome appears to be made up of protein sets distinct in their history and biological roles. The current work shows that the phylogenetic profile concept allows protein clustering in a way relevant both to biological function and evolutionary history.
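As a rough illustration of clustering by phylogenetic profile, the Python sketch below represents each profile as the set of species in which a homologue is found and groups proteins whose profiles are similar. The Jaccard distance and the single-linkage threshold are assumptions chosen for brevity; the abstract does not specify the paper's actual distance measure or clustering procedure, and the protein names and species lists are invented.

```python
from itertools import combinations


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two species sets (0.0 means identical)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_profiles(profiles: dict, max_distance: float = 0.5) -> list:
    """Single-linkage grouping: any two proteins whose profiles lie within
    `max_distance` of each other end up in the same cluster."""
    parent = {name: name for name in profiles}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(profiles, 2):
        if jaccard_distance(profiles[p], profiles[q]) <= max_distance:
            parent[find(p)] = find(q)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Hypothetical profiles: two vertebrate-restricted proteins, two widespread ones.
profiles = {
    "protA": frozenset({"human", "mouse", "zebrafish"}),
    "protB": frozenset({"human", "mouse"}),
    "protC": frozenset({"human", "yeast", "E. coli", "M. jannaschii"}),
    "protD": frozenset({"human", "yeast", "E. coli"}),
}
clusters = cluster_profiles(profiles)
```

With these toy inputs the vertebrate-restricted pair and the widely distributed pair separate into two clusters, mirroring the "new" versus "old" split described in the abstract.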
Affiliation(s)
- Paweł Szczesny
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Department of Plant Molecular Biology, Institute of Experimental Plant Biology, University of Warsaw, Warsaw, Poland
- Krzysztof Pawłowski
- Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
- Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- * E-mail: (MG); (KP)
- Marcin Grynberg
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- * E-mail: (MG); (KP)
|
13
|
Haft DH, Selengut JD, Richter RA, Harkins D, Basu MK, Beck E. TIGRFAMs and Genome Properties in 2013. Nucleic Acids Res 2012. [PMID: 23197656 PMCID: PMC3531188 DOI: 10.1093/nar/gks1234] [Citation(s) in RCA: 404] [Impact Index Per Article: 31.1] [Indexed: 12/01/2022] Open
Abstract
TIGRFAMs, available online at http://www.jcvi.org/tigrfams, is a database of protein family definitions. Each entry features a seed alignment of trusted representative sequences, a hidden Markov model (HMM) built from that alignment, cutoff scores that let automated annotation pipelines decide which proteins are members, and annotations for transfer onto member proteins. Most TIGRFAMs models are designated equivalog, meaning they assign a specific name to proteins conserved in function from a common ancestral sequence. Models describing more functionally heterogeneous families are designated subfamily or domain, and assign less specific but more widely applicable annotations. The Genome Properties database, available at http://www.jcvi.org/genome-properties, specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome. TIGRFAMs and Genome Properties content are developed in concert because subsystem reconstruction for large numbers of genomes guides selection of seed alignment sequences and cutoff values during protein family construction. Both databases specialize heavily in bacterial and archaeal subsystems. At present, 4284 models appear in TIGRFAMs, while 628 systems are described by Genome Properties. Content derives both from subsystem discovery work and from biocuration of the scientific literature.
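How per-model cutoff scores might let a pipeline decide family membership can be sketched as follows. The abstract states only that each entry has cutoff scores; the two-threshold scheme used here (a "trusted" and a "noise" cutoff, with a gray zone between them), the field names, and the accession are assumptions for illustration, not the database's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyModel:
    """Hypothetical record for one protein family model."""
    accession: str         # illustrative identifier, not a real entry
    trusted_cutoff: float  # assumed: scores at or above this are accepted
    noise_cutoff: float    # assumed: scores below this are rejected


def classify_hit(model: FamilyModel, bit_score: float) -> str:
    """Classify an HMM hit against one model using its cutoffs."""
    if bit_score >= model.trusted_cutoff:
        return "member"
    if bit_score >= model.noise_cutoff:
        # Gray zone: a pipeline would likely withhold the specific annotation.
        return "uncertain"
    return "non-member"


def annotate(model: FamilyModel, hits: dict) -> dict:
    """Map each protein id in `hits` (id -> bit score) to its classification."""
    return {protein: classify_hit(model, score) for protein, score in hits.items()}
```

Keeping the cutoffs on the model rather than using one global threshold reflects the point made in the abstract that cutoff values are chosen per family during construction.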
Affiliation(s)
- Daniel H Haft
- Informatics, J Craig Venter Institute, Rockville, MD 20850, USA.
|
14
|
Archaeosortases and exosortases are widely distributed systems linking membrane transit with posttranslational modification. J Bacteriol 2011; 194:36-48. [PMID: 22037399 DOI: 10.1128/jb.06026-11] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Indexed: 11/20/2022] Open
Abstract
Multiple new prokaryotic C-terminal protein-sorting signals were found that reprise the tripartite architecture shared by LPXTG and PEP-CTERM: motif, TM helix, basic cluster. Defining hidden Markov models were constructed for all. PGF-CTERM occurs in 29 archaeal species, some of which have more than 50 proteins that share the domain. PGF-CTERM proteins include the major cell surface protein in Halobacterium, a glycoprotein with a partially characterized diphytanylglyceryl phosphate linkage near its C terminus. Comparative genomics identifies a distant exosortase homolog, designated archaeosortase A (ArtA), as the likely protein-processing enzyme for PGF-CTERM. Proteomics suggests that the PGF-CTERM region is removed. Additional systems include VPXXXP-CTERM/archaeosortase B in two of the same archaea and PEF-CTERM/archaeosortase C in four others. Bacterial exosortases often fall into subfamilies that partner with very different cohorts of extracellular polymeric substance biosynthesis proteins; several species have multiple systems. Variant systems include the VPDSG-CTERM/exosortase C system unique to certain members of the phylum Verrucomicrobia, VPLPA-CTERM/exosortase D in several alpha- and deltaproteobacterial species, and a dedicated (single-target) VPEID-CTERM/exosortase E system in alphaproteobacteria. Exosortase-related families XrtF in the class Flavobacteria and XrtG in Gram-positive bacteria mark distinctive conserved gene neighborhoods. A picture emerges of an ancient and now well-differentiated superfamily of deeply membrane-embedded protein-processing enzymes. Their target proteins are destined to transit cellular membranes during their biosynthesis, during which most undergo additional posttranslational modifications such as glycosylation.
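The tripartite architecture named above (motif, TM helix, basic cluster) can be illustrated with a crude sequence heuristic. The paper defines these signals with hidden Markov models; the sketch below is a deliberately simpler stand-in, and the hydrophobic residue set, window length, thresholds, and function name are all assumptions made for illustration.

```python
import re

# Assumed residue set for a rough hydrophobicity test; not from the paper.
HYDROPHOBIC = set("AVLIMFWG")


def has_tripartite_signal(protein, motif_regex, tail_len=45, tm_len=15,
                          min_hydrophobic=0.7, min_basic=2):
    """Return True if the C-terminal tail looks like motif + TM helix + basic cluster.

    Scans the last `tail_len` residues for `motif_regex`, then looks downstream
    for a `tm_len`-residue window that is mostly hydrophobic, followed by at
    least `min_basic` Lys/Arg residues before the C terminus.
    """
    tail = protein[-tail_len:]
    for match in re.finditer(motif_regex, tail):
        rest = tail[match.end():]
        for start in range(len(rest) - tm_len + 1):
            window = rest[start:start + tm_len]
            fraction = sum(res in HYDROPHOBIC for res in window) / tm_len
            if fraction < min_hydrophobic:
                continue
            after_helix = rest[start + tm_len:]
            if sum(res in "KR" for res in after_helix) >= min_basic:
                return True
    return False
```

For example, `has_tripartite_signal(seq, r"LP.TG")` would test a sequence for an LPXTG-style signal; a motif pattern for PEP-CTERM or PGF-CTERM would have to be supplied by the caller. A heuristic like this will produce false positives and negatives that a profile HMM would avoid.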
|