51
|
Peterhoff D, Zellner H, Guldan H, Merkl R, Sterner R, Babinger P. Corrigendum: Dimerization Determines Substrate Specificity of a Bacterial Prenyltransferase. Chembiochem 2012. [DOI: 10.1002/cbic.201200473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
|
52
|
Dietrich S, Borst N, Schlee S, Schneider D, Janda JO, Sterner R, Merkl R. Experimental assessment of the importance of amino acid positions identified by an entropy-based correlation analysis of multiple-sequence alignments. Biochemistry 2012; 51:5633-41. [PMID: 22737967 DOI: 10.1021/bi300747r] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
The analysis of a multiple-sequence alignment (MSA) with correlation methods identifies pairs of residue positions whose occupation with amino acids changes in a concerted manner. It is plausible to assume that positions that are part of many such correlation pairs are important for protein function or stability. We have used the algorithm H2r to identify positions k in the MSAs of the enzymes anthranilate phosphoribosyl transferase (AnPRT) and indole-3-glycerol phosphate synthase (IGPS) that show a high conn(k) value, i.e., a large number of significant correlations in which k is involved. The importance of the identified residues was experimentally validated by performing mutagenesis studies with sAnPRT and sIGPS from the archaeon Sulfolobus solfataricus. For sAnPRT, five H2r mutant proteins were generated by replacing nonconserved residues with alanine or the prevalent residue of the MSA. As a control, five residues with conn(k) values of zero were chosen randomly and replaced with alanine. The catalytic activities and conformational stabilities of the H2r and control mutant proteins were analyzed by steady-state enzyme kinetics and thermal unfolding studies. Compared to wild-type sAnPRT, the catalytic efficiencies (k_cat/K_M) were largely unaltered. In contrast, the apparent thermal unfolding temperature (T_M^app) was lowered in most proteins. Remarkably, the strongest observed destabilization (ΔT_M^app = 14 °C) was caused by the V284A exchange, which pertains to the position with the highest correlation signal [conn(k) = 11]. For sIGPS, six H2r mutant and four control proteins with alanine exchanges were generated and characterized. The k_cat/K_M values of four H2r mutant proteins were reduced between 13- and 120-fold, and their T_M^app values were decreased by up to 5 °C. For the sIGPS control proteins, the observed activity and stability decreases were much less severe. Our findings demonstrate that positions with high conn(k) values have an increased probability of being important for enzyme function or stability.
|
53
|
Peterhoff D, Zellner H, Guldan H, Merkl R, Sterner R, Babinger P. Dimerization Determines Substrate Specificity of a Bacterial Prenyltransferase. Chembiochem 2012; 13:1297-303. [DOI: 10.1002/cbic.201200127] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/21/2012] [Indexed: 01/19/2023]
|
54
|
Hain T, Ghai R, Billion A, Kuenne CT, Steinweg C, Izar B, Mohamed W, Mraheil MA, Domann E, Schaffrath S, Kärst U, Goesmann A, Oehm S, Pühler A, Merkl R, Vorwerk S, Glaser P, Garrido P, Rusniok C, Buchrieser C, Goebel W, Chakraborty T. Comparative genomics and transcriptomics of lineages I, II, and III strains of Listeria monocytogenes. BMC Genomics 2012; 13:144. [PMID: 22530965 PMCID: PMC3464598 DOI: 10.1186/1471-2164-13-144] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Received: 08/25/2011] [Accepted: 04/12/2012] [Indexed: 12/13/2022] Open
Abstract
Background: Listeria monocytogenes is a food-borne pathogen that causes infections with a high mortality rate and has served as an invaluable model for intracellular parasitism. Here, we report complete genome sequences for two L. monocytogenes strains belonging to serotype 4a (L99) and 4b (CLIP80459), and transcriptomes of representative strains from lineages I, II, and III, thereby permitting in-depth comparison of genome- and transcriptome-based data from three lineages of L. monocytogenes. Lineage III, represented by the 4a L99 genome, is known to contain strains less virulent for humans. Results: The genome analysis of the weakly pathogenic L99 serotype 4a provides extensive evidence of virulence gene decay, including loss of several important surface proteins. The 4b CLIP80459 genome, unlike the previously sequenced 4b F2365 genome, harbours an intact inlB invasion gene. These lineage I strains are characterized by the lack of prophage genes, as they share only a single prophage locus with other L. monocytogenes genomes 1/2a EGD-e and 4a L99. Comparative transcriptome analysis during intracellular growth uncovered adaptive expression level differences in lineages I, II and III of Listeria, notable amongst which was a strong intracellular induction of flagellar genes in strain 4a L99 compared to the other lineages. Furthermore, extensive differences between strains are manifest at levels of metabolic flux control and phosphorylated sugar uptake. Intriguingly, prophage gene expression was found to be a hallmark of intracellular gene expression. Deletion mutants in the single shared prophage locus of lineage II strain EGD-e 1/2a, the lma operon, revealed severe attenuation of virulence in a murine infection model. Conclusion: Comparative genomics and transcriptome analysis of L. monocytogenes strains from three lineages implicate prophage genes in intracellular adaptation and indicate that gene loss and decay may have led to the emergence of attenuated lineages.
|
55
|
Janda JO, Busch M, Kück F, Porfenenko M, Merkl R. CLIPS-1D: analysis of multiple sequence alignments to deduce for residue-positions a role in catalysis, ligand-binding, or protein structure. BMC Bioinformatics 2012; 13:55. [PMID: 22480135 PMCID: PMC3391178 DOI: 10.1186/1471-2105-13-55] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 12/22/2011] [Accepted: 04/05/2012] [Indexed: 11/12/2022] Open
Abstract
Background: One aim of the in silico characterization of proteins is to identify all residue-positions that are crucial for function or structure. Several sequence-based algorithms exist that predict functionally important sites. However, with respect to sequence information, many functionally and structurally important sites are hard to distinguish, and consequently a large number of incorrectly predicted functional sites have to be expected. This is why we set out to design a new classifier that differentiates between functionally and structurally important sites and to assess its performance on representative datasets. Results: We have implemented CLIPS-1D, which predicts a role in catalysis, ligand-binding, or protein structure for residue-positions in a mutually exclusive manner. By analyzing a multiple sequence alignment, the algorithm scores conservation as well as abundance of residues at individual sites and their local neighborhood and categorizes by means of a multiclass support vector machine. A cross-validation confirmed that residue-positions involved in catalysis were identified with state-of-the-art quality; the mean MCC-value was 0.34. For structurally important sites, prediction quality was considerably higher (mean MCC = 0.67). For ligand-binding sites, prediction quality was lower (mean MCC = 0.12), because binding sites and structurally important residue-positions share conservation and abundance values, which makes their separation difficult. We show that classification success varies for residues in a class-specific manner. This is why our algorithm computes residue-specific p-values, which allow for the statistical assessment of each individual prediction. CLIPS-1D is available as a Web service at http://www-bioinf.uni-regensburg.de/. Conclusions: CLIPS-1D is a classifier whose prediction quality has been determined separately for catalytic sites, ligand-binding sites, and structurally important sites. It generates hypotheses about residue-positions important for a set of homologous proteins and focuses on conservation and abundance signals. Thus, the algorithm can be applied in cases where function cannot be transferred from well-characterized proteins by means of sequence comparison.
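The MCC values quoted in this abstract refer to the Matthews correlation coefficient. The following is a minimal sketch of the standard binary definition, applied to one class at a time in a one-vs-rest fashion. The abstract does not state how the per-class values were derived, so the one-vs-rest treatment, the function names, and the example labels are assumptions made purely for illustration.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: report 0 when any marginal sum is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


def one_vs_rest_mcc(true_labels, predicted_labels, target):
    """MCC for a single class, treating all other classes as negatives.

    Assumption: per-class quality is assessed one-vs-rest; the source
    abstract does not specify this detail.
    """
    tp = tn = fp = fn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == target and prediction == target:
            tp += 1
        elif truth != target and prediction != target:
            tn += 1
        elif prediction == target:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)


# Hypothetical per-residue annotations (not data from the paper).
truth = ["cat", "lig", "str", "str", "none"]
predicted = ["cat", "str", "str", "str", "none"]
structure_mcc = one_vs_rest_mcc(truth, predicted, "str")
```

An MCC of 1 indicates perfect agreement, 0 corresponds to chance level, and -1 to complete disagreement, which is why it is often preferred over accuracy when classes are strongly imbalanced.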
|
56
|
Zellner H, Staudigel M, Trenner T, Bittkowski M, Wolowski V, Icking C, Merkl R. Prescont: Predicting protein-protein interfaces utilizing four residue properties. Proteins 2011; 80:154-68. [DOI: 10.1002/prot.23172] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 07/11/2011] [Revised: 08/18/2011] [Accepted: 08/29/2011] [Indexed: 12/26/2022]
|
57
|
Fink F, Hochrein J, Wolowski V, Merkl R, Gronwald W. PROCOS: computational analysis of protein-protein complexes. J Comput Chem 2011; 32:2575-86. [PMID: 21630291 DOI: 10.1002/jcc.21837] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 11/09/2010] [Revised: 04/15/2011] [Accepted: 04/15/2011] [Indexed: 11/11/2022]
Abstract
One of the main challenges in protein-protein docking is a meaningful evaluation of the many putative solutions. Here we present a program (PROCOS) that calculates, for a given complex, a probability-like measure of being native. In contrast to scores often used for analyzing complex structures, the calculated probabilities offer the advantage of providing a fixed range of expected values. This will allow, in principle, the comparison of models corresponding to different targets that were solved with the same algorithm. Judgments are based on distributions of properties derived from a large database of native and false complexes. For complex analysis, PROCOS uses these property distributions of native and false complexes together with a support vector machine (SVM). PROCOS was compared to the established scoring schemes of ZRANK and DFIRE. Employing a set of experimentally solved native complexes, high probability values above 50% were obtained for 90% of these structures. Next, the performance of PROCOS was tested on the 40 binary targets of the Dockground decoy set, on 14 targets of the RosettaDock decoy set and on 9 targets that participated in the CAPRI scoring evaluation. Again the advantage of using a probability-based scoring system becomes apparent, and a reasonable number of near-native complexes was found within the top-ranked complexes. In conclusion, a novel fully automated method is presented that allows the reliable evaluation of protein-protein complexes.
|
58
|
Fischer A, Seitz T, Lochner A, Sterner R, Merkl R, Bocola M. A fast and precise approach for computational saturation mutagenesis and its experimental validation by using an artificial (βα)8-barrel protein. Chembiochem 2011; 12:1544-50. [PMID: 21626637 DOI: 10.1002/cbic.201100051] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/20/2011] [Indexed: 11/09/2022]
Abstract
We present a computational saturation mutagenesis protocol (CoSM) that predicts the impact on stability of all possible amino acid substitutions for a given site at an internal protein interface. CoSM is an efficient algorithm that uses a combination of rotamer libraries, side-chain flips, energy minimization, and molecular dynamics equilibration. Because CoSM considers full side-chain and backbone flexibility in the local environment of the mutated position, amino acids larger than the wild-type residue are also modeled in a proper manner. To assess the performance of CoSM, the effect of point mutations on the stability of an artificial (βα)8-barrel protein that has been designed from identical (βα)4-half barrels was studied. In this protein, position 234(N) is a previously identified stability hot-spot that is located at the interface of the two half barrels. By using CoSM, changes in protein stability were predicted for all possible single point mutations replacing wild-type Val234(N). In parallel, the stabilities of 14 representative mutants covering all amino acid classes were experimentally determined. A linear correlation of computationally and experimentally determined energy values yielded an R² value of 0.90, which is statistically significant. This degree of coherence is stronger than the ones we obtained for established computational methods of mutational analysis.
|
59
|
Pürzer A, Grassmann F, Birzer D, Merkl R. Key2Ann: a tool to process sequence sets by replacing database identifiers with a human-readable annotation. J Integr Bioinform 2011. [DOI: 10.1515/jib-2011-153] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022] Open
Abstract
Summary: Deducing common properties or degrees of phylogenetic relationship by analyzing a grouping or clustering of sequence sets is a frequently used technique in computational biology. If interpreted by means of visual inspection, the conclusions depend for many of these applications on meaningful names for the input data. In accordance with the aim of the analysis, the sequences should be provided with names indicating the function of the genes or gene-products, the phylogenetic position or other properties characterizing the contributing species. However, sequences extracted from databases are most often annotated with identifiers which only implicitly contain the desired information. To solve this problem, we have designed and implemented a tool named Key2Ann, which replaces in multiple fasta files the database keys with short terms indicating the taxonomic position or other features like the gene name or the EC-number. In addition, properties like habitat, growth temperature or the degree of pathogenicity can be coded for microbial species. To allow for highest flexibility, the user can control the composition of the names by means of command line parameters. Key2Ann is written in Java and can be downloaded via http://www-bioinf.uni-regensburg.de/downl/Key2Ann.zip. We demonstrate the usage of Key2Ann by discussing three typical examples of phylogenetic analysis.
|
60
|
Wiedemann SM, Mildner SN, Bönisch C, Israel L, Maiser A, Matheisl S, Straub T, Merkl R, Leonhardt H, Kremmer E, Schermelleh L, Hake SB. Identification and characterization of two novel primate-specific histone H3 variants, H3.X and H3.Y. J Cell Biol 2010; 190:777-91. [PMID: 20819935 PMCID: PMC2935562 DOI: 10.1083/jcb.201002043] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.1] [Indexed: 01/10/2023]
Abstract
The expression of a new histone variant H3.Y increases during cellular stress to regulate cell cycle progression and gene expression. Nucleosomal incorporation of specialized histone variants is an important mechanism to generate different functional chromatin states. Here, we describe the identification and characterization of two novel primate-specific histone H3 variants, H3.X and H3.Y. Their messenger RNAs are found in certain human cell lines, in addition to several normal and malignant human tissues. In keeping with their primate specificity, H3.X and H3.Y are detected in different brain regions. Transgenic H3.X and H3.Y proteins are stably incorporated into chromatin in a similar fashion to the known H3 variants. Importantly, we demonstrate biochemically and by mass spectrometry that endogenous H3.Y protein exists in vivo, and that stress stimuli, such as starvation and cellular density, increase the abundance of H3.Y-expressing cells. Global transcriptome analysis revealed that knockdown of H3.Y affects cell growth and leads to changes in the expression of many genes involved in cell cycle control. Thus, H3.Y is a novel histone variant involved in the regulation of cellular responses to outside stimuli.
|
61
|
von Mandach C, Merkl R. Genes optimized by evolution for accurate and fast translation encode in Archaea and Bacteria a broad and characteristic spectrum of protein functions. BMC Genomics 2010; 11:617. [PMID: 21050470 PMCID: PMC3091758 DOI: 10.1186/1471-2164-11-617] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/10/2010] [Accepted: 11/04/2010] [Indexed: 11/13/2022] Open
Abstract
Background: In many microbial genomes, a strong preference for a small number of codons can be observed in genes whose products are needed by the cell in large quantities. This codon usage bias (CUB) improves translational accuracy and speed and is one of several factors optimizing cell growth. Whereas CUB and the overrepresentation of individual proteins have been studied in detail, it is still unclear which high-level metabolic categories are subject to translational optimization in different habitats. Results: In a systematic study of 388 microbial species, we have identified for each genome a specific subset of genes characterized by a marked CUB, which we named the effectome. As expected, gene products related to protein synthesis are abundant in both archaeal and bacterial effectomes. In addition, enzymes contributing to energy production and gene products involved in protein folding and stabilization are overrepresented. The comparison of genomes from eleven habitats shows that the environment has only a minor effect on the composition of the effectomes. As a paradigmatic example, we detailed the effectome content of 37 bacterial genomes that are most likely exposed to the strongest selective pressure towards translational optimization. These effectomes accommodate a broad range of protein functions like enzymes related to glycolysis/gluconeogenesis and the TCA cycle, ATP synthases, aminoacyl-tRNA synthetases, chaperones, proteases that degrade misfolded proteins, protectants against oxidative damage, as well as cold shock and outer membrane proteins. Conclusions: We showed that effectomes consist of specific subsets of the proteome that are involved in several cellular functions. As expected, some functions are related to cell growth and affect speed and quality of protein synthesis. Additionally, the effectomes contain enzymes of central metabolic pathways and cellular functions sustaining microbial life under stress situations. These findings indicate that cell growth is an important but not the only factor modulating translational accuracy and speed by means of CUB.
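To make the notion of codon usage bias more concrete, the sketch below counts the codons of a coding sequence and reports how often each codon of a synonymous family is used. It is not the procedure used in the study, which the abstract does not describe; the function names, the deliberately partial codon table, and the toy sequence are assumptions chosen for illustration.

```python
from collections import Counter

# Partial table of synonymous codons (standard genetic code).
# Only three amino acids are listed to keep the illustration short.
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["TTT", "TTC"],
}


def codon_counts(cds):
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable_length, 3))


def relative_usage(counts, family):
    """Fraction with which each codon of one synonymous family is used."""
    total = sum(counts[codon] for codon in family)
    if total == 0:
        return {codon: 0.0 for codon in family}
    return {codon: counts[codon] / total for codon in family}


# Toy sequence (hypothetical): codons AAA, AAG, AAA, GAA.
counts = codon_counts("AAAAAGAAAGAA")
lysine_usage = relative_usage(counts, SYNONYMOUS["Lys"])
```

A gene with a marked bias would show relative usage values far from a uniform split within each family; comparing such profiles across all genes of a genome is one plausible way to single out strongly biased genes.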
|
62
|
Richter M, Bosnali M, Carstensen L, Seitz T, Durchschlag H, Blanquart S, Merkl R, Sterner R. Computational and Experimental Evidence for the Evolution of a (βα)8-Barrel Protein from an Ancestral Quarter-Barrel Stabilised by Disulfide Bonds. J Mol Biol 2010; 398:763-73. [DOI: 10.1016/j.jmb.2010.03.057] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 02/19/2010] [Revised: 03/19/2010] [Accepted: 03/26/2010] [Indexed: 11/28/2022]
|
63
|
Felle M, Exler JH, Merkl R, Dachauer K, Brehm A, Grummt I, Längst G. DNA sequence encoded repression of rRNA gene transcription in chromatin. Nucleic Acids Res 2010; 38:5304-14. [PMID: 20421213 PMCID: PMC2938192 DOI: 10.1093/nar/gkq263] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022] Open
Abstract
Eukaryotic genomes are packaged into nucleosomes that occlude DNA from interacting with most DNA-binding proteins. Nucleosome positioning and chromatin organization are critical for gene regulation. We have investigated the mechanism by which nucleosomes are positioned at the promoters of active and silent rRNA genes (rDNA). The reconstitution of nucleosomes on rDNA results in sequence-dependent nucleosome positioning at the rDNA promoter that mimics the chromatin structure of silent rRNA genes in vivo, suggesting that active mechanisms are required to reorganize chromatin structure upon gene activation. Nucleosomes are excluded from positions observed at active rRNA genes, resulting in transcriptional repression on chromatin. We suggest that the repressed state is the default chromatin organization of the rDNA and that gene activation requires ATP-dependent chromatin remodelling activities that move the promoter-bound nucleosome about 22 bp upstream. We suggest that nucleosome remodelling precedes promoter-dependent transcriptional activation, as specific inhibition of ATP-dependent chromatin remodelling suppresses the initiation of RNA Polymerase I transcription in vitro. Once initiated, RNA Polymerase I is capable of elongating through reconstituted chromatin without apparent displacement of the nucleosomes. The results reveal the functional cooperation of DNA sequence and chromatin remodelling complexes in nucleosome positioning and in establishing the epigenetic active or silent state of rRNA genes.
|
64
|
Merkl R, Wiezer A. GO4genome: a prokaryotic phylogeny based on genome organization. J Mol Evol 2009; 68:550-62. [PMID: 19436929 PMCID: PMC3085772 DOI: 10.1007/s00239-009-9233-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/27/2008] [Revised: 03/10/2009] [Accepted: 04/03/2009] [Indexed: 11/24/2022]
Abstract
Determining the phylogeny of closely related prokaryotes may fail in an analysis of rRNA or a small set of sequences. Whole-genome phylogeny utilizes the maximally available sample space. For a precise determination of genome similarity, two aspects have to be considered when developing an algorithm of whole-genome phylogeny: (1) gene order conservation is a more precise signal than gene content; and (2) when using sequence similarity, failures in identifying orthologues or the in situ replacement of genes via horizontal gene transfer may give misleading results. GO4genome is a new paradigm, which is based on a detailed analysis of gene function and the location of the respective genes. For characterization of genes, the algorithm uses gene ontology enabling a comparison of function independent of evolutionary relationship. After the identification of locally optimal series of gene functions, their length distribution is utilized to compute a phylogenetic distance. The outcome is a classification of genomes based on metabolic capabilities and their organization. Thus, the impact of effects on genome organization that are not covered by methods of molecular phylogeny can be studied. Genomes of strains belonging to Escherichia coli, Shigella, Streptococcus, Methanosarcina, and Yersinia were analyzed. Differences from the findings of classical methods are discussed.
|
65
|
Fischer A, Enkler N, Neudert G, Bocola M, Sterner R, Merkl R. TransCent: computational enzyme design by transferring active sites and considering constraints relevant for catalysis. BMC Bioinformatics 2009; 10:54. [PMID: 19208235 PMCID: PMC2667513 DOI: 10.1186/1471-2105-10-54] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/02/2008] [Accepted: 02/10/2009] [Indexed: 11/23/2022] Open
Abstract
Background: Computational enzyme design is far from being applicable for the general case. Due to computational complexity and limited knowledge of the structure-function interplay, heuristic methods have to be used. Results: We have developed TransCent, a computational enzyme design method supporting the transfer of active sites from one enzyme to an alternative scaffold. In an optimization process, it balances requirements originating from four constraints. These are 1) protein stability, 2) ligand binding, 3) pKa values of active site residues, and 4) structural features of the active site. Each constraint is handled by an individual software module. Modules processing the first three constraints are based on state-of-the-art concepts, i.e. RosettaDesign, DrugScore, and PROPKA. To account for the fourth constraint, knowledge-based potentials are utilized. The contribution of modules to the performance of TransCent was evaluated by means of a recapitulation test. The redesign of oxidoreductase cytochrome P450 was analyzed in detail. As a first application, we present and discuss models for the transfer of active sites in enzymes sharing the frequently encountered triosephosphate isomerase fold. Conclusion: A recapitulation test on native enzymes showed that TransCent proposes active sites that resemble the native enzyme more than those generated by RosettaDesign alone. Additional tests demonstrated that each module contributes to the overall performance in a statistically significant manner.
|
67
|
Merkl R, Zwick M. H2r: identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments. BMC Bioinformatics 2008; 9:151. [PMID: 18366663 PMCID: PMC2323388 DOI: 10.1186/1471-2105-9-151] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 08/07/2007] [Accepted: 03/18/2008] [Indexed: 11/15/2022] Open
Abstract
Background: A multiple sequence alignment (MSA) generated for a protein can be used to characterise residues by means of a statistical analysis of single columns. In addition to the examination of individual positions, the investigation of co-variation of amino acid frequencies offers insights into function and evolution of the protein and residues. Results: We introduce conn(k), a novel parameter for the characterisation of individual residues. For each residue k, conn(k) is the number of most extreme signals of co-evolution in which k is involved. These signals were deduced from a normalised mutual information (MI) value U(k, l) computed for all pairs of residues k, l. We demonstrate that conn(k) is a more robust indicator than an individual MI-value for the prediction of residues most plausibly important for the evolution of a protein. This proposition was inferred by means of statistical methods. It was further confirmed by the analysis of several proteins. A server, which computes conn(k)-values, is available at . Conclusion: The algorithm H2r, which analyses MSAs and computes conn(k)-values, characterises a specific class of residues. In contrast to strictly conserved ones, these residues possess some flexibility in the composition of side chains. However, their allocation is sensibly balanced with several other positions, as indicated by conn(k).
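The idea behind conn(k) can be sketched as follows: compute a normalised mutual information for every pair of alignment columns, keep the highest-scoring pairs, and count for each column how often it occurs among them. The abstract specifies neither the normalisation nor the cutoff, so the symmetric normalisation U = 2(H(k) + H(l) - H(k, l)) / (H(k) + H(l)), the user-chosen `top_n` parameter, the absence of any gap handling, and all function names are assumptions of this sketch.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalised_mi(column_k, column_l):
    """Mutual information of two columns, scaled to the range [0, 1].

    Assumption: symmetric normalisation by the sum of column entropies.
    """
    h_k = entropy(column_k)
    h_l = entropy(column_l)
    h_kl = entropy(list(zip(column_k, column_l)))
    if h_k + h_l == 0:
        return 0.0  # two strictly conserved columns carry no signal
    return 2 * (h_k + h_l - h_kl) / (h_k + h_l)


def conn(msa, top_n):
    """Count, per column, its occurrences among the top_n highest-scoring pairs."""
    n_columns = len(msa[0])
    columns = [[sequence[i] for sequence in msa] for i in range(n_columns)]
    scored_pairs = sorted(
        ((normalised_mi(columns[k], columns[l]), k, l)
         for k, l in combinations(range(n_columns), 2)),
        reverse=True,
    )
    counts = [0] * n_columns
    for _, k, l in scored_pairs[:top_n]:
        counts[k] += 1
        counts[l] += 1
    return counts


# Toy alignment (hypothetical): columns 0 and 1 co-vary, column 2 is conserved.
toy_msa = ["AKL", "AKL", "GRL", "GRL"]
toy_conn = conn(toy_msa, top_n=1)
```

Counting occurrences among the strongest pairs, rather than relying on a single pairwise score, reflects the abstract's statement that conn(k) is a more robust indicator than an individual MI value.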
|
68
|
Merkl R. Modelling the evolution of the archeal tryptophan synthase. BMC Evol Biol 2007; 7:59. [PMID: 17425797 PMCID: PMC1854888 DOI: 10.1186/1471-2148-7-59] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 02/13/2007] [Accepted: 04/10/2007] [Indexed: 11/16/2022] Open
Abstract
Background: Microorganisms and plants are able to produce tryptophan. Enzymes catalysing the last seven steps of tryptophan biosynthesis are encoded in the canonical trp operon. The most frequently occurring trp genes are trpA and trpB, which code for the alpha and beta subunits of tryptophan synthase. In several prokaryotic genomes, two variants of trpB (named trpB1 or trpB2) occur in different combinations. The evolutionary history of these trpB genes is under debate. Results: In order to study the evolution of trp genes, completely sequenced archeal and bacterial genomes containing trpB were analysed. Phylogenetic trees indicated that TrpB sequences constitute four distinct groups; their composition is in agreement with the location of the respective genes. The first group consisted exclusively of trpB1 genes, most of which belonged to trp operons. Groups two to four contained trpB2 genes. The largest group (trpB2_o) contained trpB2 genes all located outside of operons. Most of these genes originated from species that additionally possess an operon-based trpB1. Groups three and four pertain to trpB2 genes of those genomes containing exclusively one or two trpB2 genes, but no trpB1. One group (trpB2_i) consisted of trpB2 genes located inside, the other (trpB2_a) of trpB2 genes located outside the trp operon. TrpA and TrpB form a heterodimer and cooperate biochemically. In order to characterise trpB variants and stages of TrpA/TrpB cooperation in silico, several approaches were combined. Phylogenetic trees were constructed for all trp genes; their structure was assessed via bootstrapping. Alternative models of trpB evolution were evaluated with parsimony arguments. The four groups of trpB variants were correlated with archeal speciation. Several stages of TrpA/TrpB cooperation were identified and trpB variants were characterised. Most plausibly, trpB2 represents the predecessor of the modern trpB gene, and trpB1 evolved in an ancestral bacterium. Conclusion: In archeal genomes, several stages of trpB evolution, TrpA/TrpB cooperation, and operon formation can be observed. Thus, archeal trp genes may serve as a model system for studying the evolution of protein-protein interactions and operon formation.
|
69
|
Waack S, Keller O, Asper R, Brodag T, Damm C, Fricke WF, Surovcik K, Meinicke P, Merkl R. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models. BMC Bioinformatics 2006; 7:142. [PMID: 16542435 PMCID: PMC1489950 DOI: 10.1186/1471-2105-7-142] [Citation(s) in RCA: 265] [Impact Index Per Article: 14.7] [Received: 12/09/2005] [Accepted: 03/16/2006] [Indexed: 01/25/2023] Open
Abstract
Background: Horizontal gene transfer (HGT) is considered a strong evolutionary force shaping the content of microbial genomes in a substantial manner. It is the difference in speed enabling the rapid adaptation to changing environmental demands that distinguishes HGT from gene genesis, duplications or mutations. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA have a considerable length, comprise several genes and are called genomic islands (GIs) or more specifically pathogenicity or symbiotic islands. Results: We have implemented the program SIGI-HMM that predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of codon usage (CU) of each individual gene of a genome under study. CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. Thus, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working on gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in Java and is publicly available. It accepts as input any file created according to the EMBL format. It generates output in the common GFF format readable by genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings. Its predictions were consistent both with annotated GIs and with predictions generated by different methods. Conclusion: SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows genomes to be analyzed interactively in detail and hypotheses about the origin of acquired genes to be generated or tested.
|
70
|
Wiezer A, Merkl R. A comparative categorization of gene flux in diverse microbial species. Genomics 2006; 86:462-75. [PMID: 16026964 DOI: 10.1016/j.ygeno.2005.05.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/30/2004] [Revised: 05/25/2005] [Accepted: 05/25/2005] [Indexed: 12/18/2022]
Abstract
Microbial genomes harbor genomic islands (GIs), genes presumably acquired via horizontal gene transfer (HGT). We compared GIs of hyperthermophilic, thermophilic, mesophilic, and pathogenic/nonpathogenic species and of small and large genomes. The COG database was used to characterize gene-encoded functions. Putative donors were determined to quantify gene flux between superkingdoms. In hyperthermophiles, more than 10% of the genes were on average acquired across the superkingdom border. For thermophiles and particularly mesophiles, we identified a nearly unidirectional export from bacteria to archaea. Additionally, we analyzed GI composition for Escherichia, and pairs of Listeria, Rhizobiales, Methanosarcinaceae, and Thermus thermophilus/Deinococcus radiodurans. For Escherichia and Listeria, the composition of GIs in pathogenic and nonpathogenic species did not differ significantly with respect to encoded COG classes. The analysis of related genomes showed that the composition of GIs cannot be explained with trends of gene content known to depend on genome size.
|
71
|
Merkl R. AMIGOS: a method for the inspection of genomic organisation or structure and its application to characterise conserved gene arrangements. In Silico Biol 2006; 6:281-306. [PMID: 16922692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
In order to identify and to characterise gene clusters conserved in microbial genomes, the algorithm AMIGOS was developed. It is based on a categorisation of genes using a predefined set of gene functions (GFs). After the categorisation of all genes of a genome and based on their location on a replicon, distances between GFs were determined and stored in genome-specific matrices. These matrices were used to identify GF clusters like those strictly conserved in 13 archaeal genomes, in 47 bacterial genomes and in the combination of the two sets. Within the combined set of these 60 microbial genomes, there exist only two strictly conserved clusters harbouring two ribosomal genes each, namely those for L4, L23 and L22, L29. In order to characterise less strictly conserved GF clusters, the content of genomes, i.e. the matrices, was analysed pairwise. Resulting clusters were merged into (meta-)clusters if their content overlapped. A scoring system named cons(CL) was developed. It quantifies conservedness of cluster membership for individual GFs. For the genome of Escherichia coli it was shown that a grouping of cluster elements by cons(CL) values dissected the clusters into smaller sets. These sets were frequently overlapped by known transcriptional units (TUs). This finding justifies the usage of cons(CL) scores to predict TU membership of genes. In addition, cons(CL) values provide a sound basis for non-homologous gene annotation. Based on cons(CL) values, examples of conserved clusters containing annotated genes and single ones with unknown function are given.
|
72
|
Merkl R. A comparative categorization of protein function encoded in bacterial or archeal genomic islands. J Mol Evol 2005; 62:1-14. [PMID: 16341468 DOI: 10.1007/s00239-004-0311-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 11/01/2004] [Accepted: 06/14/2005] [Indexed: 01/11/2023]
Abstract
Genomes of prokaryotes harbor genomic islands (GIs), which are frequently acquired via horizontal gene transfer (HGT). Here I present an analysis of GIs with respect to gene-encoded functions. GIs were identified by statistical analysis of codon usage and clustering. Genes classified as putatively alien (pA) or putatively native (pN) were categorized according to the COG database. Among pA and pN genes, the distributions of COG functions and classes were studied for different groupings of prokaryotes. Groups were formed according to taxonomical relation or habitats. In all groups, genes related to class L (replication, recombination, and repair) were statistically significantly overrepresented in GIs. GIs of bacteria and archaea showed a distinct pattern of preferences. In archeal GIs, genes belonging to COG class M (cell wall/membrane/envelope biogenesis) or Q (secondary metabolites biosynthesis, transport, and catabolism) were more frequent. In bacterial GIs, genes of classes U (intracellular trafficking, secretion, and vesicular transport), N (cell motility), and V (defense mechanisms) were predominant. Underrepresentation was strongest for genes belonging to class J (translation, ribosomal structure, and biogenesis). Among single COG functions overrepresented in GIs were transferases and transporters. In both superkingdoms, HGT enhances genomic content by meeting demands that are independent of the studied habitats. These findings are in agreement with the complexity theory, which predicts the preferential import of operational genes. However, only specific subsets of operational genes were enriched in GIs. Modification of the cell envelope, cell motility, secretion, and protection of cellular DNA are major issues in HGT.
|
73
|
Meinicke P, Tech M, Morgenstern B, Merkl R. Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites. BMC Bioinformatics 2004; 5:169. [PMID: 15511290 PMCID: PMC535353 DOI: 10.1186/1471-2105-5-169] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Received: 08/10/2004] [Accepted: 10/28/2004] [Indexed: 12/05/2022] Open
Abstract
Background: Kernel-based learning algorithms are among the most advanced machine learning methods and have been successfully applied to a variety of sequence classification tasks within the field of bioinformatics. Conventional kernels utilized so far do not provide an easy interpretation of the learnt representations in terms of positional and compositional variability of the underlying biological signals.
Results: We propose a kernel-based approach to datamining on biological sequences. With our method it is possible to model and analyze positional variability of oligomers of any length in a natural way. On the one hand, this is achieved by mapping the sequences to an intuitive but high-dimensional feature space, well-suited for interpretation of the learnt models. On the other hand, by means of the kernel trick we can provide a general learning algorithm for that high-dimensional representation because all required statistics can be computed without performing an explicit feature space mapping of the sequences. By introducing a kernel parameter that controls the degree of position-dependency, our feature space representation can be tailored to the characteristics of the biological problem at hand. A regularized learning scheme enables application even to biological problems for which only small sets of example sequences are available. Our approach includes a visualization method for transparent representation of characteristic sequence features. Thereby importance of features can be measured in terms of discriminative strength with respect to classification of the underlying sequences. To demonstrate and validate our concept on a biochemically well-defined case, we analyze E. coli translation initiation sites in order to show that we can find biologically relevant signals. For that case, our results clearly show that the Shine-Dalgarno sequence is the most important signal upstream of a start codon. The variability in position and composition we found for that signal is in accordance with previous biological knowledge. We also find evidence for signals downstream of the start codon, previously introduced as transcriptional enhancers. These signals are mainly characterized by occurrences of adenine in a region of about 4 nucleotides next to the start codon.
Conclusions: We showed that the oligo kernel can provide a valuable tool for the analysis of relevant signals in biological sequences. In the case of translation initiation sites we could clearly deduce the most discriminative motifs and their positional variation from example sequences. Attractive features of our approach are its flexibility with respect to oligomer length and position conservation. By means of these two parameters oligo kernels can easily be adapted to different biological problems.
|
74
|
Veith B, Herzberg C, Steckel S, Feesche J, Maurer KH, Ehrenreich P, Bäumer S, Henne A, Liesegang H, Merkl R, Ehrenreich A, Gottschalk G. The Complete Genome Sequence of Bacillus licheniformis DSM13, an Organism with Great Industrial Potential. J Mol Microbiol Biotechnol 2004; 7:204-11. [PMID: 15383718 DOI: 10.1159/000079829] [Citation(s) in RCA: 238] [Impact Index Per Article: 11.9] [Indexed: 11/19/2022] Open
Abstract
The genome of Bacillus licheniformis DSM13 consists of a single chromosome that has a size of 4,222,748 base pairs. The average G+C ratio is 46.2%. 4,286 open reading frames, 72 tRNA genes, 7 rRNA operons and 20 transposase genes were identified. The genome shows a marked co-linearity with Bacillus subtilis but contains defined inserted regions that can be identified at the sequence as well as at the functional level. B. licheniformis DSM13 has a well-conserved secretory system, no polyketide biosynthesis, but is able to form the lipopeptide lichenysin. From the further analysis of the genome sequence, we identified conserved regulatory DNA motifs, the occurrence of the glyoxylate bypass and the presence of anaerobic ribonucleotide reductase explaining that B. licheniformis is able to grow on acetate and 2,3-butanediol as well as anaerobically on glucose. Many new genes of potential interest for biotechnological applications were found in B. licheniformis; candidates include proteases, pectate lyases, lipases and various polysaccharide-degrading enzymes.
MESH Headings
- Bacillus/genetics
- Bacillus subtilis/genetics
- Base Composition
- Biological Transport/genetics
- Chromosomes, Bacterial/genetics
- DNA, Bacterial/chemistry
- Endopeptidases/genetics
- Genes, Bacterial/genetics
- Genes, Bacterial/physiology
- Genes, rRNA
- Genome, Bacterial
- Genomics
- Glyoxylates/metabolism
- Lipase/genetics
- Lipoproteins/genetics
- Metabolism/genetics
- Molecular Sequence Data
- Open Reading Frames
- Peptides, Cyclic/genetics
- Polysaccharide-Lyases/genetics
- RNA, Transfer/genetics
- Recombination, Genetic
- Regulatory Sequences, Nucleic Acid
- Ribonucleotide Reductases/genetics
- Sequence Analysis, DNA
- Synteny
- Transposases/genetics
|
75
|
Merkl R. A survey of codon and amino acid frequency bias in microbial genomes focusing on translational efficiency. J Mol Evol 2004; 57:453-66. [PMID: 14708578 DOI: 10.1007/s00239-003-2499-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 10/26/2022]
Abstract
Unequal use of synonymous codons has been found in several prokaryotic and eukaryotic genomes. This bias has been associated with translational efficiency. The prevalence of this bias across lineages is currently unknown. Here, a new method (GCB) to measure codon usage bias is presented. It uses an iterative approach for the determination of codon scores and allows the computation of an index of codon bias suitable for interspecies comparison. A server to calculate GCB-values of individual genes as well as a list of compiled results are available at www.g21.bio.uni-goettingen.de. The method was applied to complete bacterial genomes. The relation of codon usage bias with amino acid composition and the choice of stop codons were determined and discussed.
|