76
Kuravsky ML, Aleshin VV, Frishman D, Muronetz VI. Testis-specific glyceraldehyde-3-phosphate dehydrogenase: origin and evolution. BMC Evol Biol 2011; 11:160. [PMID: 21663662 PMCID: PMC3224139 DOI: 10.1186/1471-2148-11-160] [Received: 11/24/2010] [Accepted: 06/10/2011]
Abstract
Background Glyceraldehyde-3-phosphate dehydrogenase (GAPD) catalyses one of the glycolytic reactions and is also involved in a number of non-glycolytic processes, such as endocytosis, DNA excision repair, and induction of apoptosis. Mammals are known to possess two homologous GAPD isoenzymes: GAPD-1, a well-studied protein found in all somatic cells, and GAPD-2, which is expressed solely in testis. GAPD-2 supplies the energy required for the movement of spermatozoa and is tightly bound to the sperm tail cytoskeleton by an additional N-terminal proline-rich domain that is absent in GAPD-1. In this study we investigate the evolutionary history of GAPD and gain insight into the specialization of GAPD-2 as a testis-specific protein. Results A dataset of GAPD sequences was assembled from public databases and used for phylogeny reconstruction by Bayesian inference. Since resolution in some clades of the obtained tree was too low, syntenic analysis was carried out to define the evolutionary history of GAPD more precisely. The selection tests showed that selective pressure varies across lineages and isoenzymes, as well as across different regions of the same sequences. Conclusions The results suggest that GAPD-1 and GAPD-2 emerged through a duplication during the early evolution of chordates. GAPD-2 was subsequently lost in most lineages except lizards, mammals, and cartilaginous and bony fishes. In reptiles and mammals, GAPD-2 specialized into a testis-specific protein and acquired the novel N-terminal proline-rich domain anchoring the protein in the sperm tail cytoskeleton. This domain is likely to have originated by exonization of a microsatellite genomic region. Recognition of the proline-rich domain by cytoskeletal proteins seems to be unspecific. Besides testis, GAPD-2 of lizards was also found in some regenerating tissues, but there it lacks the proline-rich domain due to tissue-specific alternative splicing.
77
Theis FJ, Latif N, Wong P, Frishman D. Complex principal component and correlation structure of 16 yeast genomic variables. Mol Biol Evol 2011; 28:2501-12. [PMID: 21444651 DOI: 10.1093/molbev/msr077]
Abstract
A rapidly growing number of characteristics reflecting various aspects of gene function and evolution can be either measured experimentally or computed from DNA and protein sequences. The study of pairwise correlations between such quantitative genomic variables, as well as collective analysis of their interrelations by multidimensional methods, has delivered crucial insights into the processes of molecular evolution. Here, we present a principal component analysis (PCA) of 16 genomic variables from Saccharomyces cerevisiae, the largest data set analyzed so far. Because many missing values and potential outliers hinder the direct calculation of principal components, we introduce the application of Bayesian PCA. We confirm some of the previously established correlations, such as evolutionary rate versus protein expression, and reveal new correlations, such as those between translational efficiency, phosphorylation density, and protein age. Whereas the first principal component primarily contrasts genomic change and protein expression, the second component separates variables related to gene existence and expressed protein functions. Enrichment analysis on genes affecting variable correlations unveils classes of influential genes. For example, whereas ribosomal and nuclear transport genes make important contributions to the correlation between protein isoelectric point and molecular weight, protein synthesis and amino acid metabolism genes help cause the lack of significant correlation between propensity for gene loss and protein age. We present the novel Quagmire database (Quantitative Genomics Resource), which allows users to explore relationships between more genomic variables in three model organisms, Escherichia coli, S. cerevisiae, and Homo sapiens (http://webclu.bio.wzw.tum.de:18080/quagmire).
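The abstract notes that missing values stand in the way of computing principal components directly. Bayesian PCA is not reproduced here; as a much simpler illustration of the missing-data problem, the sketch below computes a Pearson correlation over only those genes for which both variables are observed (pairwise-complete deletion). The variable names and values are hypothetical toy data, not taken from the study.

```python
import math
from typing import Optional, Sequence


def pairwise_complete_pearson(x: Sequence[Optional[float]],
                              y: Sequence[Optional[float]],
                              min_pairs: int = 3) -> float:
    """Pearson r over the entries where both x and y are observed (not None).

    Returns NaN if too few complete pairs remain or if either variable is
    constant over those pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < min_pairs:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene values; None marks a missing measurement.
expression = [1.0, 2.0, None, 4.0, 5.0]
evolutionary_rate = [5.0, 4.0, 3.0, 2.0, None]

r = pairwise_complete_pearson(expression, evolutionary_rate)
print(round(r, 6))  # only genes 0, 1 and 3 contribute
```

Pairwise deletion uses a different subset of genes for each pair of variables, so a matrix assembled from such correlations need not be positive semidefinite; this is one reason a model-based treatment such as Bayesian PCA is attractive when many values are missing.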
78
Mewes HW, Ruepp A, Theis F, Rattei T, Walter M, Frishman D, Suhre K, Spannagl M, Mayer KFX, Stümpflen V, Antonov A. MIPS: curated databases and comprehensive secondary data resources in 2010. Nucleic Acids Res 2010; 39:D220-4. [PMID: 21109531 PMCID: PMC3013725 DOI: 10.1093/nar/gkq1157]
Abstract
The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include the Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38 000 000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics that interface with a database of gene lists compiled from the literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as detailed descriptions of our projects, can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).
79
Abstract
Domain Interaction MAp (DIMA, available at http://webclu.bio.wzw.tum.de/dima) is a database of predicted and known interactions between protein domains. It integrates 5807 structurally known interactions imported from the iPfam and 3did databases and 46 900 domain interactions predicted by four computational methods: domain phylogenetic profiling, the domain pair exclusion algorithm, correlated mutations, and domain interaction prediction in a discriminative way. Additionally, predictions are filtered to exclude domain pairs that are reported as non-interacting by the Negatome database. The DIMA Web site allows users to calculate domain interaction networks either for a domain of interest or for entire organisms, and to explore them interactively using the Flash-based Cytoscape Web software.
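Domain phylogenetic profiling, the first of the four methods listed, rests on the idea that domains whose presence or absence is correlated across many genomes are more likely to interact. The sketch below encodes each domain as a presence/absence vector over genomes and scores pairs with the Jaccard index. The scoring function, the 0.8 threshold, and the domain names are assumptions for illustration, not DIMA's actual parameters.

```python
from typing import Dict, List, Set, Tuple


def phylogenetic_profile(domain: str, genomes: List[Set[str]]) -> Tuple[int, ...]:
    """Binary presence (1) / absence (0) vector of a domain across genomes."""
    return tuple(1 if domain in doms else 0 for doms in genomes)


def jaccard(p: Tuple[int, ...], q: Tuple[int, ...]) -> float:
    """Genomes containing both domains / genomes containing at least one."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def predict_pairs(domains: List[str], genomes: List[Set[str]],
                  threshold: float = 0.8) -> Dict[Tuple[str, str], float]:
    """Score every domain pair and keep those at or above the threshold."""
    profiles = {d: phylogenetic_profile(d, genomes) for d in domains}
    hits = {}
    for i, d1 in enumerate(domains):
        for d2 in domains[i + 1:]:
            score = jaccard(profiles[d1], profiles[d2])
            if score >= threshold:
                hits[(d1, d2)] = score
    return hits


# Hypothetical domain content of four genomes.
genomes = [{"DomA", "DomB", "DomC"}, {"DomA", "DomB"}, {"DomC"}, {"DomA", "DomB", "DomC"}]
print(predict_pairs(["DomA", "DomB", "DomC"], genomes))
```

Here DomA and DomB always co-occur and are reported, while DomC shares only half of its genomes with either of them and falls below the threshold.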
80
Gershoni M, Fuchs A, Shani N, Fridman Y, Corral-Debrinski M, Aharoni A, Frishman D, Mishmar D. Coevolution predicts direct interactions between mtDNA-encoded and nDNA-encoded subunits of oxidative phosphorylation complex I. J Mol Biol 2010; 404:158-71. [PMID: 20868692 DOI: 10.1016/j.jmb.2010.09.029] [Received: 10/29/2009] [Revised: 09/05/2010] [Accepted: 09/13/2010]
Abstract
Despite years of research, the structure of the largest mammalian oxidative phosphorylation (OXPHOS) complex, NADH-ubiquinone oxidoreductase (complex I), and the interactions among its 45 subunits are not fully understood. Since complex I harbors subunits encoded by both the mitochondrial DNA (mtDNA) and nuclear DNA (nDNA) genomes, with the former evolving ∼10 times faster than the latter, tight cytonuclear coevolution is expected and indeed observed. Recently, we identified three nDNA-encoded complex I subunits that underwent accelerated amino acid replacement, suggesting their adjustment to the elevated mtDNA rate of change. Hence, they constitute excellent candidates for binding mtDNA-encoded subunits. Here, we further disentangle the network of physical cytonuclear interactions within complex I by analyzing subunit coevolution. First, relying on the bioinformatic analysis of 10 protein complexes with solved structures, we show that signals of coevolution identified physically interacting subunits with nearly 90% accuracy, thus lending support to our approach. Applying this approach to cytonuclear interactions within complex I, we predict that the 'rate-accelerated' nDNA-encoded subunits of complex I, NDUFC2 and NDUFA1, likely interact with the mtDNA-encoded subunits ND5/ND4 and ND5/ND4/ND1, respectively. Furthermore, we predicted interactions among mtDNA-encoded complex I subunits. Using the yeast two-hybrid system, we experimentally confirmed the predicted interactions of human NDUFC2 with ND4 and of human NDUFA1 with ND1 and ND4, as well as the lack of interaction of NDUFC2 with ND3 and NDUFA1, thus providing a proof of concept for our approach. Our study provides, for the first time, evidence for direct interactions between nDNA-encoded and mtDNA-encoded subunits of human OXPHOS complex I and paves the way toward deciphering subunit interactions within complexes lacking three-dimensional structures.
Our subunit-interactions-predicting method, ComplexCorr, is available at http://webclu.bio.wzw.tum.de/complexcorr.
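The abstract does not spell out how ComplexCorr scores coevolution. A widely used way to quantify coevolution between two proteins is the mirrortree approach: build an inter-species distance matrix for each protein over the same set of species and correlate the two matrices. The sketch below assumes such a mirrortree-style score; it is not ComplexCorr's implementation, and the distance values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def mirrortree_score(dist_a: Matrix, dist_b: Matrix) -> float:
    """Correlate two inter-species distance matrices (same species order)."""
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical distances among the same three species for two subunits.
subunit_1 = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.3], [0.2, 0.3, 0.0]]
subunit_2 = [[0.0, 0.2, 0.4], [0.2, 0.0, 0.6], [0.4, 0.6, 0.0]]
print(round(mirrortree_score(subunit_1, subunit_2), 6))
```

Because Pearson correlation is unaffected by uniformly rescaling one matrix, a subunit whose distances are uniformly ten times larger (as might be expected for an mtDNA-encoded subunit) can still correlate perfectly with its nDNA-encoded partner; in practice, rate differences need not be uniform across lineages.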
81
Kowarsch A, Fuchs A, Frishman D, Pagel P. Correlated mutations: a hallmark of phenotypic amino acid substitutions. PLoS Comput Biol 2010; 6. [PMID: 20862353 PMCID: PMC2940720 DOI: 10.1371/journal.pcbi.1000923] [Received: 09/25/2009] [Accepted: 08/09/2010]
Abstract
Point mutations resulting in the substitution of a single amino acid can cause severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypic impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance and thus its vulnerability to mutation. In this work, we put forward the hypothesis that, in addition to conservation, co-evolution of residues in a protein influences the likelihood of a residue being functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of point mutations causing disease in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data are in agreement with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Our analysis results are available at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/.
Point mutations (i.e., changes of a single sequence element) can have a severe impact on protein function. Many diseases are caused by such minute defects. On the other hand, the majority of such mutations do not lead to noticeable effects. Although previous research has revealed important aspects that influence or predict the likelihood that a mutation causes disease, much remains to be learned before we fully understand this complex problem. In our work, we use the observation that certain positions in a protein sometimes mutate in an apparently correlated fashion, and we analyze this correlation with respect to mutation vulnerability. Our results show that positions exhibiting evolutionary correlation are significantly more likely to be vulnerable to mutation than average positions. On the one hand, our data further support the concept (introduced by others) that correlated positions are associated not only with protein contacts but also with functional sites and/or disease positions. On the other hand, this could help to further improve the understanding and prediction of the consequences of mutations. Our work is the first to attempt a large-scale quantitation of this relationship.
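The abstract does not name the correlated-mutation measure that was used. Mutual information (MI) between two columns of a multiple sequence alignment is one common choice and is sketched here as an assumption. The toy alignment also shows why a correlation signal is distinct from conservation: a fully conserved column carries no mutual information with any other column.

```python
import math
from collections import Counter
from typing import List


def column(alignment: List[str], k: int) -> List[str]:
    """Residues at alignment position k across all sequences."""
    return [seq[k] for seq in alignment]


def mutual_information(col_i: List[str], col_j: List[str]) -> float:
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Hypothetical 3-column alignment of four homologous sequences.
alignment = ["AKW", "AKW", "LEW", "LEW"]
print(mutual_information(column(alignment, 0), column(alignment, 1)))  # covarying pair
print(mutual_information(column(alignment, 0), column(alignment, 2)))  # column 2 fully conserved
```

Practical correlated-mutation methods usually correct raw MI for phylogenetic and small-sample effects; that correction is omitted from this sketch.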
82
Neumann S, Fuchs A, Mulkidjanian A, Frishman D. Current status of membrane protein structure classification. Proteins 2010; 78:1760-73. [PMID: 20186977 DOI: 10.1002/prot.22692]
Abstract
For over two decades, continuous efforts to organize the jungle of available protein structures have been underway. Although a number of discrepancies between different classification approaches for soluble proteins have been reported, the classification of membrane proteins has so far not been studied comparatively because of the limited amount of available structural data. Here, we present an analysis of alpha-helical membrane protein classification in the SCOP and CATH databases. In the current set of 63 alpha-helical membrane protein chains having between 1 and 13 transmembrane helices, we observed a number of differently classified proteins, with respect to both their domain and their fold assignment. The majority of all discrepancies affect single transmembrane helix, two-helix hairpin, and four-helix bundle domains, while domains with more than five helices are mostly classified consistently between SCOP and CATH. It thus appears that the structural constraints imposed by the lipid bilayer complicate the classification of membrane proteins with only a few membrane-spanning regions. This problem seems to be specific to membrane proteins, as soluble four-helix bundles, not restrained by the membrane, are more consistently classified by SCOP and CATH. Our findings indicate that the structural space of small membrane helix bundles is highly continuous, such that even minor differences in individual classification procedures may lead to a significantly different classification. Membrane proteins with few helices and limited structural diversity seem to be reasonably classifiable only if the definition of a fold is adapted to include more fine-grained structural features such as helix-helix interactions and reentrant regions.
83
Fuchs A, Frishman D. Structural comparison and classification of alpha-helical transmembrane domains based on helix interaction patterns. Proteins 2010; 78:2587-99. [PMID: 20552684 DOI: 10.1002/prot.22768]
Abstract
Structural classification of membrane proteins is still in its infancy due to the relative paucity of available three-dimensional structures compared with soluble proteins. However, recent technological advances in protein structure determination have led to a significant increase in experimentally known membrane protein folds, warranting exploration of the structural universe of membrane proteins. Here, a new structural classification system designed specifically for membrane proteins is introduced that classifies alpha-helical membrane proteins according to common helix architectures. Each membrane protein is represented by a helix interaction graph depicting transmembrane helices with their pairwise interactions resulting from individual residue contacts. Proteins are then clustered according to similarities among these helix interaction graphs using a newly developed structural similarity score called HISS. Because HISS scores explicitly disregard structural properties of loop regions, they are more suitable than other structural similarity scores for capturing conserved transmembrane helix bundle architectures. Importantly, we show that a classification approach based on helix interaction similarity closely resembles conventional structural classification databases such as SCOP and CATH, implying that helix interactions are one of the major determinants of alpha-helical membrane protein folds. Furthermore, the classification of all currently available membrane protein structures into 20 recurrent helix architectures and 15 singleton proteins demonstrates not only an impressive variability of membrane helix bundles but also the conservation of common helix interaction patterns among proteins with distinctly different sequences.
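The helix interaction graph described above can be sketched as follows: residue contacts are mapped to the transmembrane helices containing them, contacts involving loop residues are dropped, and any helix pair with enough contacts becomes an edge. HISS itself is not reproduced here; as a stand-in, two graphs are compared by the Jaccard overlap of their edge sets, assuming the helices of the two proteins are already matched one-to-one. The residue numbering, helix assignment, and `min_contacts` cutoff are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[int]


def helix_interaction_graph(contacts: Iterable[Tuple[int, int]],
                            helix_of: Dict[int, int],
                            min_contacts: int = 1) -> Set[Edge]:
    """Edges between TM helices that share at least `min_contacts` residue contacts.

    Residues missing from `helix_of` (e.g. loop residues) are ignored, and
    contacts within a single helix do not create edges.
    """
    counts: Counter = Counter()
    for res_i, res_j in contacts:
        h_i, h_j = helix_of.get(res_i), helix_of.get(res_j)
        if h_i is None or h_j is None or h_i == h_j:
            continue
        counts[frozenset((h_i, h_j))] += 1
    return {edge for edge, c in counts.items() if c >= min_contacts}


def edge_overlap(g1: Set[Edge], g2: Set[Edge]) -> float:
    """Jaccard overlap of two graphs whose helices are already matched 1:1."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 1.0


helix_of = {1: 0, 2: 0, 10: 1, 11: 1, 20: 2}  # hypothetical residue -> helix index
contacts = [(1, 10), (2, 11), (11, 20), (1, 2), (5, 10)]  # residue 5 lies in a loop
g1 = helix_interaction_graph(contacts, helix_of)
g2 = {frozenset({0, 1}), frozenset({0, 2})}
print(sorted(sorted(edge) for edge in g1))
print(edge_overlap(g1, g2))
```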
84
Sturm M, Hackenberg M, Langenberger D, Frishman D. TargetSpy: a supervised machine learning approach for microRNA target prediction. BMC Bioinformatics 2010; 11:292. [PMID: 20509939 PMCID: PMC2889937 DOI: 10.1186/1471-2105-11-292] [Received: 10/09/2009] [Accepted: 05/28/2010]
Abstract
Background Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently, however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. Results We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base-pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. To allow an unbiased comparison of TargetSpy with other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset reporting fold changes in protein production for five selected microRNAs, our method shows superior performance in all classes. In Drosophila melanogaster, our class II and III predictions are on par with other algorithms, and, notably, the class I (no-seed) predictions are only marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Conclusion Only a few algorithms can predict target sites without demanding a seed match, and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms.
TargetSpy was trained on mouse and performs well in human and Drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool (http://www.targetspy.org).
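Classes II and III above require a seed match, and TargetSpy obtains those predictions by postfiltering. The abstract does not define the seed precisely; the sketch below assumes the common convention of perfect Watson-Crick complementarity between the target and microRNA nucleotides 2-8 (a "7mer-m8" site). It ignores G:U wobble pairs and conservation (a class III filter would additionally require the site to be conserved across species), and the UTR string is a toy sequence.

```python
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA string (no G:U wobble)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str, first: int = 2, last: int = 8) -> List[int]:
    """0-based start positions in `utr` that pair perfectly with microRNA
    nucleotides first..last (1-based, inclusive; 2-8 by default)."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site = reverse_complement(mirna[first - 1:last])
    k = len(site)
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a, written 5' -> 3'
print(reverse_complement(let7a[1:8]))           # site sequence in the target
print(seed_sites(let7a, "AAACUACCUCAAA"))       # toy 3' UTR
```

A seed-independent predictor such as TargetSpy would score candidate sites whether or not this check succeeds; a filter of this kind only decides which predictions count toward class II.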
85
Abstract
Obtaining well-diffracting crystals remains a major challenge in protein structure research. In this chapter, we review currently available computational methods to estimate the crystallization potential of a protein, to optimize amino acid sequences toward improved crystallization likelihood, and to design optimal crystal screen conditions.
86
Smialowski P, Pagel P, Wong P, Brauner B, Dunger I, Fobo G, Frishman G, Montrone C, Rattei T, Frishman D, Ruepp A. The Negatome database: a reference set of non-interacting protein pairs. Nucleic Acids Res 2010; 38:D540-4. [PMID: 19920129 PMCID: PMC2808923 DOI: 10.1093/nar/gkp1026] [Received: 10/14/2009] [Revised: 10/19/2009] [Accepted: 10/20/2009]
Abstract
The Negatome is a collection of protein and domain pairs that are unlikely to be engaged in direct physical interactions. The database currently contains experimentally supported non-interacting protein pairs derived from two distinct sources: manual curation of the literature and analysis of protein complexes with known 3D structure. More stringent lists of non-interacting pairs were derived from these two datasets by excluding interactions detected by high-throughput approaches. Additionally, non-interacting protein domains have been derived from the stringent manual and structural data, respectively. The Negatome is much less biased toward functionally dissimilar proteins than negative data derived by randomly selecting proteins from different cellular locations. It can be used to evaluate protein and domain interactions from new experiments and to improve the training of interaction prediction algorithms. The Negatome database is available at http://mips.helmholtz-muenchen.de/proj/ppi/negatome.
87
Langosch D, Herrmann J, Fuchs A, Panitz J, Unterreitmeier S, Frishman D. Charge-Charge Interactions Promote Transmembrane Helix-Helix Association Depending on Sequence Context. Biophys J 2010. [DOI: 10.1016/j.bpj.2009.12.3525]
88
Ilyinskii PO, Schmidt T, Lukashev D, Meriin AB, Thoidis G, Frishman D, Shneider AM. Importance of mRNA secondary structural elements for the expression of influenza virus genes. OMICS 2009; 13:421-30. [PMID: 19594376 DOI: 10.1089/omi.2009.0036]
Abstract
Development of novel vaccines and therapeutics often requires efficient expression of recombinant viral proteins. Here we show that mutations in essential functional regions of the conserved influenza proteins NP and NS1 lead to reduced expression of these genes in vitro. According to in silico analysis, these mRNA regions possess distinct secondary structures that are sensitive to mutations. We identified a novel structural feature within a region of NS1 mRNA that encodes amino acids essential for NS1 function. Mutations altering this mRNA element lead to significantly reduced protein expression. Conversely, expression was not affected by mutations resulting in amino acid substitutions when they were designed to preserve this secondary RNA structural element. Furthermore, altering this structure significantly reduced RNA transcription without affecting mRNA stability. Therefore, distinct internal secondary structures of viral mRNA may be important for viral gene expression. If such elements encode amino acids essential for protein function, then early selection against mutations in these regions will be beneficial for the virus. This might point to yet another mechanism of viral evolution, especially for RNA viruses. Finally, introducing mutations into viral genes while preserving their RNA secondary structure suggests a new method for generating efficiently expressed recombinant viral proteins.
89
90
Fuchs A, Kirschner A, Frishman D. Prediction of helix-helix contacts and interacting helices in polytopic membrane proteins using neural networks. Proteins 2009; 74:857-71. [PMID: 18704938 DOI: 10.1002/prot.22194]
Abstract
Despite rapidly increasing numbers of available 3D structures, membrane proteins still account for less than 1% of all structures in the Protein Data Bank. Recent high-resolution structures indicate a considerably broader structural diversity of membrane proteins than initially anticipated, motivating the development of reliable structure prediction methods specifically tailored to this class of molecules. One important prediction target capturing all major aspects of a protein's 3D structure is its contact map. Our analysis shows that computational methods trained to predict residue contacts in globular proteins perform poorly when applied to membrane proteins. We have recently published a method to identify interacting alpha-helices in membrane proteins based on the analysis of coevolving residues in predicted transmembrane regions. Here, we present a substantially improved algorithm for the same problem, which uses a newly developed neural network approach to predict helix-helix contacts. In addition to the input features commonly used for contact prediction of soluble proteins, such as windowed residue profiles and residue distance in the sequence, our network also incorporates features that apply only to membrane proteins, such as residue position within the transmembrane segment and its orientation toward the lipophilic environment. The resulting neural network predicts contacts between residues in transmembrane segments with nearly 26% accuracy. It is therefore the first published contact predictor developed specifically for membrane proteins that performs as accurately as state-of-the-art contact predictors for soluble proteins. The predicted helix-helix contacts were employed in a second step to identify interacting helices. For our dataset of 62 membrane proteins of known structure, we achieved an accuracy of 78.1%.
Because the reliable prediction of helix interaction patterns is an important step in the classification and prediction of membrane protein folds, our method will be a helpful tool in compiling a structural census of membrane proteins.
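Contact-prediction accuracy is conventionally reported as the fraction of top-ranked predicted residue pairs that are true contacts (i.e., precision). The abstract does not state how many predictions per protein were evaluated, so `k` is left as a free parameter in this sketch, and the scores and contact set are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j) with i < j


def top_k_precision(scores: Dict[Pair, float], true_contacts: Set[Pair], k: int) -> float:
    """Fraction of the k highest-scoring residue pairs that are real contacts."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Hypothetical network outputs for four inter-helix residue pairs.
scores = {(1, 10): 0.9, (2, 11): 0.8, (3, 12): 0.4, (4, 20): 0.7}
true_contacts = {(1, 10), (3, 12)}
for k in (1, 2, 4):
    print(k, top_k_precision(scores, true_contacts, k))
```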
91
Loewenstein Y, Raimondo D, Redfern OC, Watson J, Frishman D, Linial M, Orengo C, Thornton J, Tramontano A. Protein function annotation by homology-based inference. Genome Biol 2009; 10:207. [PMID: 19226439 PMCID: PMC2688287 DOI: 10.1186/gb-2009-10-2-207]
Abstract
Where information on homologous proteins is available, progress is being made in automated prediction of protein function from sequence and structure.
With many genomes now sequenced, computational annotation methods to characterize genes and proteins from their sequence are increasingly important. The BioSapiens Network has developed tools to address all stages of this process, and here we review progress in the automated prediction of protein function based on protein sequence and structure.
92
Wong P, Althammer S, Hildebrand A, Kirschner A, Pagel P, Geissler B, Smialowski P, Blöchl F, Oesterheld M, Schmidt T, Strack N, Theis FJ, Ruepp A, Frishman D. An evolutionary and structural characterization of mammalian protein complex organization. BMC Genomics 2008; 9:629. [PMID: 19108706 PMCID: PMC2645396 DOI: 10.1186/1471-2164-9-629] [Received: 07/17/2008] [Accepted: 12/23/2008]
Abstract
Background We have recently released a comprehensive, manually curated database of mammalian protein complexes called CORUM. Combining CORUM with other resources, we assembled a dataset of over 2700 mammalian complexes. The availability of a rich information resource allows us to search for organizational properties of these complexes. Results We observed that, as the complexity of a protein complex (in terms of the number of unique subunits) increases, the number of such complexes and the mean non-synonymous to synonymous substitution ratio of the associated genes tend to decrease. Similarly, as the number of different complexes a given protein participates in increases, the number of such proteins and the substitution ratio of the associated gene also tend to decrease. These observations provide evidence linking natural selection to the organization of mammalian complexes. We also observed greater homogeneity in terms of predicted protein isoelectric points, secondary structure, and substitution ratio in annotated versus randomly generated complexes. A large proportion of the protein content and interactions in the complexes could be predicted from known binary protein-protein and domain-domain interactions. In particular, we found that large proteins interact preferentially with much smaller proteins. Conclusion We observed similar trends in yeast and other data. Our results support the existence of conserved relations associated with mammalian protein complexes.
93
Antranikian G, Ruepp A, Gordon PMK, Ballschmiter M, Zibat A, Stark M, Sensen CW, Frishman D, Liebl W, Klenk HP. Rapid access to genes of biotechnologically useful enzymes by partial genome sequencing: the thermoalkaliphile Anaerobranca gottschalkii. J Mol Microbiol Biotechnol 2008; 16:81-90. [PMID: 18957864 DOI: 10.1159/000142896]
Abstract
Anaerobranca gottschalkii strain LBS3T is an extremophile living at high temperature (up to 65 °C) and in alkaline environments (up to pH 10.5). An assembly of 696 DNA contigs representing about 96% of the 2.26-Mbp genome of A. gottschalkii has been generated with a low-sequence-coverage shotgun-sequencing strategy. The chosen sequencing strategy provided rapid and economical access to genes encoding key enzymes of mono- and polysaccharide metabolism, without spending limited resources on extensive sequencing of genes lacking potential economic value. Five of these amylolytic enzymes of considerable commercial interest for biotechnological applications were expressed and characterized in more detail after their genes had been identified in the partial genome sequence: type I pullulanase, cyclodextrin glycosyltransferase (CGTase), two alpha-amylases (AmyA and AmyB), and an alpha-1,4-glucan-branching enzyme.
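The abstract does not give the sequencing depth that was used. As a rough, idealized illustration of why a low-coverage shotgun run can still reach most of a genome, the Lander-Waterman model (reads placed uniformly at random) predicts that the expected fraction of bases covered at average coverage c is 1 - e^{-c}. Setting this equal to the reported 96% only indicates the order of magnitude such a model would require; it is not the coverage of the actual project.

$$
1 - e^{-c} = 0.96 \quad\Longrightarrow\quad c = -\ln(0.04) \approx 3.2
$$

Under the same idealization, the expected uncovered portion of a 2.26-Mbp genome would be about 0.04 × 2.26 Mbp ≈ 90 kbp.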
94
Herrmann JR, Panitz JC, Unterreitmeier S, Fuchs A, Frishman D, Langosch D. Complex patterns of histidine, hydroxylated amino acids and the GxxxG motif mediate high-affinity transmembrane domain interactions. J Mol Biol 2008; 385:912-23. [PMID: 19007788 DOI: 10.1016/j.jmb.2008.10.058] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 09/05/2008] [Revised: 10/16/2008] [Accepted: 10/20/2008] [Indexed: 10/21/2022]
Abstract
Specific interactions of transmembrane helices play a pivotal role in the folding and oligomerization of integral membrane proteins. The helix-helix interfaces frequently depend on specific amino acid patterns. In this study, a heptad repeat pattern was randomized with all naturally occurring amino acids to uncover novel sequence motifs promoting transmembrane domain interactions. Self-interacting transmembrane domains were selected from the resulting combinatorial library by means of the ToxR/POSSYCCAT system. A comparison of the amino acid composition of high- and low-affinity sequences revealed that high-affinity transmembrane domains exhibit position-specific enrichment of histidine. Further, sequences containing His preferentially display Gly, Ser, and/or Thr residues at flanking positions and frequently contain a C-terminal GxxxG motif. Mutational analysis of selected sequences confirmed the importance of these residues in homotypic interaction. Probing heterotypic interaction indicated that His interacts in trans with hydroxylated residues. Reconstruction of minimal interaction motifs within the context of an oligo-Leu sequence confirmed that His is part of a hydrogen-bonded cluster that is brought into register by the GxxxG motif. Notably, a similar motif contributes to self-interaction of the BNIP3 transmembrane domain.
95
Walter MC, Rattei T, Arnold R, Güldener U, Münsterkötter M, Nenova K, Kastenmüller G, Tischler P, Wölling A, Volz A, Pongratz N, Jost R, Mewes HW, Frishman D. PEDANT covers all complete RefSeq genomes. Nucleic Acids Res 2008; 37:D408-11. [PMID: 18940859 PMCID: PMC2686588 DOI: 10.1093/nar/gkn749] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.1] [Indexed: 11/17/2022]
Abstract
The PEDANT genome database provides exhaustive annotation, by a broad set of bioinformatics algorithms, of nearly 3000 publicly available eukaryotic, eubacterial, archaeal and viral genomes comprising more than 4.5 million proteins. In particular, all completely sequenced genomes from the NCBI's Reference Sequence collection (RefSeq) are covered. The PEDANT processing pipeline has been sped up by an order of magnitude through the use of precalculated similarity information stored in the similarity matrix of proteins (SIMAP) database, making it possible to process newly sequenced genomes as soon as they become available. PEDANT is freely accessible to academic users at http://pedant.gsf.de. For programmatic access, Web Services are available at http://pedant.gsf.de/webservices.jsp.
96
Schmidt T, Frishman D. Assignment of isochores for all completely sequenced vertebrate genomes using a consensus. Genome Biol 2008; 9:R104. [PMID: 18590563 PMCID: PMC2481423 DOI: 10.1186/gb-2008-9-6-r104] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/13/2008] [Revised: 05/22/2008] [Accepted: 06/30/2008] [Indexed: 11/16/2022]
Abstract
A new consensus isochore assignment method and a database of isochore maps for all completely sequenced vertebrate genomes are presented. We show that although the currently available isochore mapping methods agree on the isochore classification of about two-thirds of the human DNA, they produce significantly different results with regard to the location of isochore boundaries and isochore length distribution. We present a new consensus isochore assignment method based on majority voting and provide IsoBase, a comprehensive on-line database of isochore maps for all completely sequenced vertebrate genomes.
97
Kirschner A, Frishman D. Prediction of beta-turns and beta-turn types by a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN). Gene 2008; 422:22-9. [PMID: 18598743 DOI: 10.1016/j.gene.2008.06.008] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
Prediction of beta-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics because of their frequent occurrence and their structural and functional significance. Because various structural features of proteins are intercorrelated, secondary structure information has often been employed as an additional input for machine learning algorithms that predict beta-turns. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs and demonstrate its efficiency in recognizing three aspects of protein structure: beta-turns, beta-turn types, and secondary structure. The advantage of our method compared to other predictors is that it does not require any external input except for sequence profiles, because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset, our method achieves a total prediction accuracy of 77.9% and a Matthews correlation coefficient of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually dependent target classes need to be predicted. Availability: http://webclu.bio.wzw.tum.de/predator-web/.
98
Ishihama Y, Schmidt T, Rappsilber J, Mann M, Hartl FU, Kerner MJ, Frishman D. Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics 2008; 9:102. [PMID: 18304323 PMCID: PMC2292177 DOI: 10.1186/1471-2164-9-102] [Citation(s) in RCA: 353] [Impact Index Per Article: 22.1] [Received: 01/24/2008] [Accepted: 02/27/2008] [Indexed: 11/10/2022]
Abstract
Background Knowledge about the abundance of molecular components is an important prerequisite for building quantitative predictive models of cellular behavior. Proteins are central components of these models, since they carry out most of the fundamental processes in the cell. Thus far, protein concentrations have been difficult to measure on a large scale, but proteomic technologies have now advanced to a stage where this information becomes readily accessible. Results Here, we describe an experimental scheme to maximize the coverage of proteins identified by mass spectrometry of a complex biological sample. Using a combination of LC-MS/MS approaches with protein and peptide fractionation steps, we identified 1103 proteins from the cytosolic fraction of the Escherichia coli strain MC4100. A measure of abundance is presented for each of the identified proteins, based on the recently developed emPAI approach, which takes into account the number of sequenced peptides per protein. Abundance values span a broad range and accurately reflect independently measured copy numbers per cell. As expected, the most abundant proteins were those involved in protein synthesis, most notably ribosomal proteins. Proteins involved in energy metabolism as well as those with binding function were also found in high copy number, while proteins annotated with the terms metabolism, transcription, transport, and cellular organization were rare. The barrel-sandwich fold was found to be the structural fold with the highest abundance. Highly abundant proteins are predicted to be less prone to aggregation based on their length, pI values, and occurrence patterns of hydrophobic stretches. We also find that abundant proteins tend to be essential. Additionally, we observe a significant correlation between protein and mRNA abundance in E. coli cells. Conclusion Abundance measurements for more than 1000 E. coli proteins presented in this work represent the most complete study of protein abundance in a bacterial cell so far. We show significant associations between the abundance of a protein and its properties and functions in the cell. In this way, we provide both data and novel insights into the role of protein concentration in this model organism.
99
Grimm M, Stephan R, Iversen C, Manzardo GGG, Rattei T, Riedel K, Ruepp A, Frishman D, Lehner A. Cellulose as an extracellular matrix component present in Enterobacter sakazakii biofilms. J Food Prot 2008; 71:13-8. [PMID: 18236657 DOI: 10.4315/0362-028x-71.1.13] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
Cellulose was identified and characterized as an extracellular matrix component present in the biofilm of an Enterobacter sakazakii clinical isolate grown in nutrient-deficient (M9) medium. Using a bacterial artificial cloning approach in Escherichia coli and subsequent screening of transformants for fluorescence on calcofluor plates, nine genes organized in two operons were identified as putatively responsible for the biosynthesis of cellulose. In addition to the genes already described for cellulose production, two more genes were identified, putatively transcribed together with the genes from the first operon. Putative cellulose in E. sakazakii ES5 biofilm grown on glass coverslips was visualized by calcofluor staining and confocal fluorescence laser scanning microscopy. For the first time, the presence of cellulose in biofilms produced by E. sakazakii was confirmed by methylation analysis.
100
Wong P, Frishman D. Designability and disease. Methods Mol Biol 2008; 484:491-504. [PMID: 18592197 DOI: 10.1007/978-1-59745-398-1_29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
Structural designability is the number of ways in which a structure can be encoded by sequence. A protein's designability has been equated with the size of the sequence space encoding the protein's structure, a measure that reflects the structure's robustness to mutation. Current evidence suggests that designability is fundamental to our understanding of the evolvability and distribution of structures in nature and is a significant factor associated with human disease. Here, we describe the definitions and principles underlying the concept of designability and discuss its relation to disease.