951
|
Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, Strausberg RL, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 2007; 5:e16. [PMID: 17355171 PMCID: PMC1821046 DOI: 10.1371/journal.pbio.0050016] [Citation(s) in RCA: 534] [Impact Index Per Article: 31.4] [Received: 03/24/2006] [Accepted: 08/15/2006] [Indexed: 02/04/2023] Open
Abstract
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature. 
The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. Given the wide-ranging roles microbes play in many ecosystems, metagenomics studies of microbial communities will reveal insights into protein families and their evolution. Because most microbes will not grow in the laboratory using current cultivation techniques, scientists have turned to cultivation-independent techniques to study microbial diversity. One such technique—shotgun sequencing—allows random sampling of DNA sequences to examine the genomic material present in a microbial community. We used shotgun sequencing to examine microbial communities in water samples collected by the Sorcerer II Global Ocean Sampling (GOS) expedition. Our analysis predicted more than six million proteins in the GOS data—nearly twice the number of proteins present in current databases. These predictions add tremendous diversity to known protein families and cover nearly all known prokaryotic protein families. Some of the predicted proteins had no similarity to any currently known proteins and therefore represent new families. A higher than expected fraction of these novel families is predicted to be of viral origin. We also found that several protein domains that were previously thought to be kingdom specific have GOS examples in other kingdoms. Our analysis opens the door for a multitude of follow-up protein family analyses and indicates that we are a long way from sampling all the protein families that exist in nature. The GOS data identified 6.12 million predicted proteins covering nearly all known prokaryotic protein families, and several new families. This almost doubles the number of known proteins and shows that we are far from identifying all the proteins in nature.
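As a rough illustration of the sequence similarity clustering mentioned in the abstract, the sketch below groups toy sequences greedily by shared 3-mers. The k-mer Jaccard measure, the 0.5 threshold, the function names, and the example sequences are all assumptions made for illustration; they are not the procedure or data used in the GOS study.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(seqs, threshold=0.5, k=3):
    """Assign each sequence to the first cluster whose representative
    is similar enough; otherwise start a new cluster with it."""
    clusters = []  # list of (representative k-mer set, member names)
    for name, seq in seqs.items():
        profile = kmers(seq, k)
        for representative, members in clusters:
            if jaccard(profile, representative) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((profile, [name]))
    return [members for _, members in clusters]


# Hypothetical toy sequences, not GOS data.
toy = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQK",
    "seqC": "GGSWLLPPHH",
}
print(greedy_cluster(toy))
```

A cluster whose members all come from a single source would be the toy analogue of a "GOS-only" cluster; real analyses use alignment-based similarity rather than k-mer overlap.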
Affiliation(s)
- Shibu Yooseph
- J. Craig Venter Institute, Rockville, Maryland, United States of America.
|
952
|
Larsen J, Kuhnert P, Frey J, Christensen H, Bisgaard M, Olsen JE. Analysis of gene order data supports vertical inheritance of the leukotoxin operon and genome rearrangements in the 5' flanking region in genus Mannheimia. BMC Evol Biol 2007; 7:184. [PMID: 17915007 PMCID: PMC2228313 DOI: 10.1186/1471-2148-7-184] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/21/2007] [Accepted: 10/03/2007] [Indexed: 12/30/2022] Open
Abstract
Background The Mannheimia subclades belong to the same bacterial genus, but have taken divergent paths toward their distinct lifestyles. For example, M. haemolytica + M. glucosida are potential pathogens of the respiratory tract in the mammalian suborder Ruminantia, whereas M. ruminalis, the supposed sister group, lives as a commensal in the ovine rumen. We have tested the hypothesis that vertical inheritance of the leukotoxin (lktCABD) operon has occurred from the last common ancestor of genus Mannheimia to any ancestor of the diverging subclades by exploring gene order data. Results We examined the gene order in the 5' flanking region of the leukotoxin operon and found that the 5' flanking gene strings, hslVU-lapB-artJ-lktC and xylAB-lktC, are peculiar to M. haemolytica + M. glucosida and M. granulomatis, respectively, whereas the gene string hslVU-lapB-lktC is present in M. ruminalis, the supposed sister group of M. haemolytica + M. glucosida, and in the most ancient subclade M. varigena. In M. granulomatis, we found remnants of the gene string hslVU-lapB-lktC in the xylB-lktC intergenic region. Conclusion These observations indicate that the gene string hslVU-lapB-lktC is more ancient than the hslVU-lapB-artJ-lktC and xylAB-lktC gene strings. The presence of (remnants of) the ancient gene string hslVU-lapB-lktC among any subclades within genus Mannheimia supports that it has been vertically inherited from the last common ancestor of genus Mannheimia to any ancestor of the diverging subclades, thus reaffirming the hypothesis of vertical inheritance of the leukotoxin operon. The presence of individual 5' flanking regions in M. haemolytica + M. glucosida and M. granulomatis reflects later genome rearrangements within each subclade. The evolution of the novel 5' flanking region in M. haemolytica + M. glucosida resulted in transcriptional coupling between the divergently arranged artJ and lkt promoters. 
We propose that the chimeric promoter has led to high-level expression of the leukotoxin operon, which could explain the increased potential of certain M. haemolytica + M. glucosida strains to cause a particular type of infection.
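The gene-order comparison described in the abstract can be mimicked with a simple element-wise comparison of the 5' flanking gene strings. This is only a sketch: the gene strings are taken from the abstract, but the function name and the plain membership test are assumptions, not the authors' analysis.

```python
# Gene string treated as ancestral in the abstract.
ANCESTRAL = "hslVU-lapB-lktC"

# 5' flanking gene strings reported in the abstract, by subclade.
FLANKS = {
    "M. haemolytica + M. glucosida": "hslVU-lapB-artJ-lktC",
    "M. granulomatis": "xylAB-lktC",
    "M. ruminalis": "hslVU-lapB-lktC",
    "M. varigena": "hslVU-lapB-lktC",
}


def compare_to_ancestral(gene_string, ancestral=ANCESTRAL):
    """List elements gained and missing relative to the ancestral string."""
    genes = gene_string.split("-")
    ancestral_genes = ancestral.split("-")
    return {
        "gained": [g for g in genes if g not in ancestral_genes],
        "missing": [g for g in ancestral_genes if g not in genes],
    }


for subclade, flank in FLANKS.items():
    print(subclade, compare_to_ancestral(flank))
```

In this toy comparison M. granulomatis shows hslVU and lapB as missing from the intact string, which matches the abstract's report that only remnants of them survive in the xylB-lktC intergenic region.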
Affiliation(s)
- Jesper Larsen
- Department of Veterinary Pathobiology, Faculty of Life Sciences, University of Copenhagen, Stigbøjlen 4, DK-1870 Frederiksberg C, Denmark
- Peter Kuhnert
- Institute of Veterinary Bacteriology, University of Berne, Länggass-Strasse 122, CH-3012 Berne, Switzerland
- Joachim Frey
- Institute of Veterinary Bacteriology, University of Berne, Länggass-Strasse 122, CH-3012 Berne, Switzerland
- Henrik Christensen
- Department of Veterinary Pathobiology, Faculty of Life Sciences, University of Copenhagen, Stigbøjlen 4, DK-1870 Frederiksberg C, Denmark
- Magne Bisgaard
- Department of Veterinary Pathobiology, Faculty of Life Sciences, University of Copenhagen, Stigbøjlen 4, DK-1870 Frederiksberg C, Denmark
- John E Olsen
- Department of Veterinary Pathobiology, Faculty of Life Sciences, University of Copenhagen, Stigbøjlen 4, DK-1870 Frederiksberg C, Denmark
|
953
|
Kiemer L, Cesareni G. Comparative interactomics: comparing apples and pears? Trends Biotechnol 2007; 25:448-54. [PMID: 17825444 DOI: 10.1016/j.tibtech.2007.08.002] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Received: 05/08/2007] [Revised: 06/21/2007] [Accepted: 08/22/2007] [Indexed: 11/23/2022]
Abstract
The study of the complex web of interactions that link biological molecules in a cell is the subject of interactomics--currently one of the fastest moving fields in molecular biology. The recent completion of high-throughput studies to investigate systematically all the possible interactions in a variety of model organisms has provided unique opportunities to compare interaction networks and ask questions about their conservation during evolution. It is expected that this approach will yield a scientific return as rich as that obtained in the past decade from comparing genomes and proteomes from different organisms.
Affiliation(s)
- Lars Kiemer
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, Rome, Italy
|
954
|
Arga KY, Onsan ZI, Kirdar B, Ulgen KO, Nielsen J. Understanding signaling in yeast: Insights from network analysis. Biotechnol Bioeng 2007; 97:1246-58. [PMID: 17252576 DOI: 10.1002/bit.21317] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
Reconstruction of protein interaction networks that represent groups of proteins contributing to the same cellular function is a key step towards quantitative studies of signal transduction pathways. Here we present a novel approach to reconstruct a highly correlated protein interaction network and to identify previously unknown components of a signaling pathway through integration of protein-protein interaction data, gene expression data, and Gene Ontology annotations. A novel algorithm is designed to reconstruct a highly correlated protein interaction network which is composed of the candidate proteins for signal transduction mechanisms in yeast Saccharomyces cerevisiae. The high efficiency of the reconstruction process is proved by a Receiver Operating Characteristic curve analysis. Identification and scoring of the possible linear pathways enables reconstruction of specific sub-networks for glucose-induction signaling and high osmolarity MAPK signaling in S. cerevisiae. All of the known components of these pathways are identified together with several new "candidate" proteins, indicating the successful reconstructions of two model pathways involved in S. cerevisiae. The integrated approach is hence shown useful for (i) prediction of new signaling pathways, (ii) identification of unknown members of documented pathways, and (iii) identification of network modules consisting of a group of related components that often incorporate the same functional mechanism.
Affiliation(s)
- K Yalçin Arga
- Department of Chemical Engineering, Boğaziçi University, 34342 Istanbul, Turkey
|
955
|
Fuhrer T, Chen L, Sauer U, Vitkup D. Computational prediction and experimental verification of the gene encoding the NAD+/NADP+-dependent succinate semialdehyde dehydrogenase in Escherichia coli. J Bacteriol 2007; 189:8073-8. [PMID: 17873044 PMCID: PMC2168661 DOI: 10.1128/jb.01027-07] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Indexed: 12/18/2022] Open
Abstract
Although NAD(+)-dependent succinate semialdehyde dehydrogenase activity was first described in Escherichia coli more than 25 years ago, the responsible gene has remained elusive so far. As an experimental proof of concept for a gap-filling algorithm for metabolic networks developed earlier, we demonstrate here that the E. coli gene yneI is responsible for this activity. Our biochemical results demonstrate that the yneI-encoded succinate semialdehyde dehydrogenase can use either NAD(+) or NADP(+) to oxidize succinate semialdehyde to succinate. The gene is induced by succinate semialdehyde, and expression data indicate that yneI plays a unique physiological role in the general nitrogen metabolism of E. coli. In particular, we demonstrate using mutant growth experiments that the yneI gene has an important, but not essential, role during growth on arginine and probably has an essential function during growth on putrescine as the nitrogen source. The NADP(+)-dependent succinate semialdehyde dehydrogenase activity encoded by the functional homolog gabD appears to be important for nitrogen metabolism under N limitation conditions. The yneI-encoded activity, in contrast, functions primarily as a valve to prevent toxic accumulation of succinate semialdehyde. Analysis of available genome sequences demonstrated that orthologs of both yneI and gabD are broadly distributed across phylogenetic space.
Affiliation(s)
- Tobias Fuhrer
- Institute of Molecular Systems Biology, ETH Zurich, CH-8093 Zurich, Switzerland
|
956
|
Iwasaki W, Takagi T. Reconstruction of highly heterogeneous gene-content evolution across the three domains of life. Bioinformatics 2007; 23:i230-9. [PMID: 17646301 DOI: 10.1093/bioinformatics/btm165] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
MOTIVATION Reconstruction of gene-content evolutionary history is fundamental in studying the evolution of genomes and biological systems. To reconstruct plausible evolutionary history, rates of gene gain/loss should be estimated by considering the high level of heterogeneity: e.g. genome duplication and parasitization, respectively, result in high rates of gene gain and loss. Gene-content evolution reconstruction methods that consider this heterogeneity and that are both effective in estimating the rates of gene gain and loss and sufficiently efficient to analyze abundant genomic data had not been developed. RESULTS An effective and efficient method for reconstructing heterogeneous gene-content evolution was developed. This method comprises analytically integrable modeling of gene-content evolution, analytical formulation of expectation-maximization and efficient calculation of marginal likelihood using an inside-outside-like algorithm. Simulation tests on the scale of hundreds of genomes showed that both the gene gain/loss rates and evolutionary history were effectively estimated within a few days of computational time. Subsequently, this algorithm was applied to an actual data set of nearly 200 genomes to reconstruct the heterogeneous gene-content evolution across the three domains of life. The reconstructed history, which contained several features consistent with biological observations, showed that the trends of gene-content evolution were not only drastically different between prokaryotes and eukaryotes, but were highly variable within each form of life. The results suggest that heterogeneity should be considered in studies of the evolution of gene content, genomes and biological systems. AVAILABILITY An R script that implements the algorithm is available upon request.
Affiliation(s)
- Wataru Iwasaki
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan.
|
957
|
Harrington ED, Singh AH, Doerks T, Letunic I, von Mering C, Jensen LJ, Raes J, Bork P. Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proc Natl Acad Sci U S A 2007; 104:13913-8. [PMID: 17717083 PMCID: PMC1955820 DOI: 10.1073/pnas.0702636104] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022] Open
Abstract
To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.
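The proportions quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below assumes exactly 1.4 million ORFs, although that figure is rounded in the abstract, so the derived counts are approximate; the variable names are illustrative.

```python
total_orfs = 1_400_000          # "1.4 million predicted ORFs" (rounded in the abstract)
neighborhood_only = 75_448      # ORFs annotated only by neighborhood methods

# Share of ORFs for which only neighborhood methods assign a function.
neighborhood_share = neighborhood_only / total_orfs

# Approximate counts implied by the 76% and 83% annotation rates.
specific_count = round(0.76 * total_orfs)
with_nonspecific_count = round(0.83 * total_orfs)

print(f"neighborhood-only share: {neighborhood_share:.1%}")
print(f"specific functions: ~{specific_count:,}")
print(f"including nonspecific functions: ~{with_nonspecific_count:,}")
```

The neighborhood-only share comes out slightly above 5%, consistent with the "5% of the total" stated in the abstract once rounding of the 1.4 million figure is taken into account.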
Affiliation(s)
- E. D. Harrington
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- A. H. Singh
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- T. Doerks
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- I. Letunic
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- C. von Mering
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- L. J. Jensen
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- J. Raes
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- P. Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, D-13092 Berlin, Germany
- To whom correspondence should be addressed. E-mail:
|
958
|
Zhou D, He Y, Kwoh CK. Semi-supervised learning of the hidden vector state model for extracting protein-protein interactions. Artif Intell Med 2007; 41:209-22. [PMID: 17702552 DOI: 10.1016/j.artmed.2007.07.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 12/15/2006] [Revised: 06/18/2007] [Accepted: 07/06/2007] [Indexed: 11/23/2022]
Abstract
OBJECTIVE The hidden vector state (HVS) model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. It has been applied successfully to protein-protein interaction extraction. However, the HVS model, being a statistically based approach, requires large-scale annotated corpora in order to reliably estimate model parameters. Such corpora are normally difficult to obtain in practical applications. METHODS AND MATERIALS In this paper, we present two novel semi-supervised learning approaches, one based on classification and the other based on expectation-maximization, to train the HVS model from both annotated and un-annotated corpora. RESULTS AND CONCLUSION Experimental results show improved performance over the baseline system using the HVS model trained solely on the annotated corpus, which supports the feasibility and efficiency of our approaches.
Affiliation(s)
- Deyu Zhou
- School of Computer Engineering, Nanyang Technological University, Block N4, Nanyang Avenue, Singapore 639798, Singapore.
|
959
|
Kotelnikova E, Kalinin A, Yuryev A, Maslov S. Prediction of protein-protein interactions on the basis of evolutionary conservation of protein functions. Evol Bioinform Online 2007; 3:197-206. [PMID: 19461979 PMCID: PMC2684133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION Although a great deal of progress is being made in the development of fast and reliable experimental techniques to extract genome-wide networks of protein-protein and protein-DNA interactions, the sequencing of new genomes proceeds at an even faster rate. That is why there is a considerable need for reliable methods of in-silico prediction of protein interactions based solely on sequence similarity information and known interactions from well-studied organisms. This problem can be solved if a dependency exists between sequence similarity and the conservation of the proteins' functions. RESULTS In this paper, we introduce a novel probabilistic method for prediction of protein-protein interactions using a new empirical probabilistic formula describing the loss of interactions between homologous proteins during the course of evolution. This formula describes an evolutionary process quite similar to the process of the Earth's population growth. In addition, our method favors predictions confirmed by several interacting pairs over predictions coming from a single interacting pair. Our approach is useful in working with "noisy" data such as those coming from high-throughput experiments. We have generated predictions for five "model" organisms: H. sapiens, D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae and evaluated the quality of these predictions.
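One common way to favor predictions supported by several interacting pairs is a noisy-OR combination of per-pair probabilities. The sketch below is a stand-in under that assumption; it is not the empirical formula introduced in the paper, and the function name and probabilities are hypothetical.

```python
from math import prod


def combined_confidence(pair_probs):
    """Noisy-OR: probability that at least one supporting homologous
    pair correctly transfers the interaction, assuming independence."""
    return 1.0 - prod(1.0 - p for p in pair_probs)


# Hypothetical per-pair conservation probabilities.
single_support = combined_confidence([0.3])
triple_support = combined_confidence([0.3, 0.3, 0.3])

print(f"one supporting pair:    {single_support:.3f}")
print(f"three supporting pairs: {triple_support:.3f}")
```

Under the independence assumption, three weak supporting pairs yield a higher combined confidence than any one of them alone, which is the qualitative behavior the abstract describes.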
Affiliation(s)
- Andrey Kalinin
- Ariadne Genomics Inc. 9430 Key West Ave., Suite 113, Rockville, MD 20850, U.S.A.
- Anton Yuryev
- Ariadne Genomics Inc. 9430 Key West Ave., Suite 113, Rockville, MD 20850, U.S.A. Correspondence: Anton Yuryev
- Sergei Maslov
- Department of Physics, Brookhaven National Laboratory, Upton, New York 11973, U.S.A.
|
960
|
de Crécy-Lagard V, El Yacoubi B, de la Garza RD, Noiriel A, Hanson AD. Comparative genomics of bacterial and plant folate synthesis and salvage: predictions and validations. BMC Genomics 2007; 8:245. [PMID: 17645794 PMCID: PMC1971073 DOI: 10.1186/1471-2164-8-245] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.7] [Received: 12/05/2006] [Accepted: 07/23/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Folate synthesis and salvage pathways are relatively well known from classical biochemistry and genetics but they have not been subjected to comparative genomic analysis. The availability of genome sequences from hundreds of diverse bacteria, and from Arabidopsis thaliana, enabled such an analysis using the SEED database and its tools. This study reports the results of the analysis and integrates them with new and existing experimental data. RESULTS Based on sequence similarity and the clustering, fusion, and phylogenetic distribution of genes, several functional predictions emerged from this analysis. For bacteria, these included the existence of novel GTP cyclohydrolase I and folylpolyglutamate synthase gene families, and of a trifunctional p-aminobenzoate synthesis gene. For plants and bacteria, the predictions comprised the identities of a 'missing' folate synthesis gene (folQ) and of a folate transporter, and the absence from plants of a folate salvage enzyme. Genetic and biochemical tests bore out these predictions. CONCLUSION For bacteria, these results demonstrate that much can be learnt from comparative genomics, even for well-explored primary metabolic pathways. For plants, the findings particularly illustrate the potential for rapid functional assignment of unknown genes that have prokaryotic homologs, by analyzing which genes are associated with the latter. More generally, our data indicate how combined genomic analysis of both plants and prokaryotes can be more powerful than isolated examination of either group alone.
Affiliation(s)
- Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Basma El Yacoubi
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Alexandre Noiriel
- Department of Horticultural Sciences, University of Florida, Gainesville, FL 32611, USA
- Andrew D Hanson
- Department of Horticultural Sciences, University of Florida, Gainesville, FL 32611, USA
|
961
|
Affiliation(s)
- Dmitrij Frishman
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
962
|
Abstract
The integration of information on different aspects of the composition and function of mitochondria is defining a more comprehensive mitochondrial interactome and elucidating its role in a multitude of cellular processes and human disease.
Affiliation(s)
- Timothy E Shutt
- Department of Pathology, Yale University School of Medicine, Cedar Street, New Haven, CT 06520-8023, USA
- Gerald S Shadel
- Department of Pathology, Yale University School of Medicine, Cedar Street, New Haven, CT 06520-8023, USA
|
963
|
Green ML, Karp PD. Using genome-context data to identify specific types of functional associations in pathway/genome databases. Bioinformatics 2007; 23:i205-11. [PMID: 17646298 DOI: 10.1093/bioinformatics/btm213] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Hundreds of genes lacking homology to any protein of known function are sequenced every day. Genome-context methods have proved useful in providing clues about functional annotations for many proteins. However, genome-context methods detect many biological types of functional associations, and do not identify which type of functional association they have found. RESULTS We have developed two new genome-context-based algorithms. Algorithm 1 extends our previous algorithm for identifying missing enzymes in predicted metabolic pathways (pathway holes) to use genome-context features. The new algorithm has significantly improved scope because it can now be applied to pathway reactions to which sequence similarity methods cannot be applied due to an absence of known sequences for enzymes catalyzing the reaction in other organisms. The new method identifies at least one known enzyme in the top ten hits for 58% of EcoCyc reactions that lack enzyme sequences in other organisms. Surprisingly, the addition of genome-context features does not improve the accuracy of the algorithm when sequences for the enzyme do exist in other organisms. Algorithm 2 uses genome-context methods to predict three distinct types of functional relationships between pairs of proteins: pairs that occur in the same protein complex, the same pathway, or the same operon. This algorithm performs with varying degrees of accuracy on each type of relationship, and performs best in predicting pathway and protein complex relationships.
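The "at least one known enzyme in the top ten hits" figure is a top-k hit rate. A minimal sketch of how such a metric could be computed is shown below; the function name, reaction identifiers, and gene names are hypothetical, and the ranking itself is assumed to come from some upstream scoring step that is not modeled here.

```python
def top_k_hit_rate(ranked_candidates, known_enzymes, k=10):
    """Fraction of reactions with at least one known enzyme gene
    among the first k ranked candidates."""
    if not ranked_candidates:
        return 0.0
    hits = 0
    for reaction, ranking in ranked_candidates.items():
        if set(ranking[:k]) & known_enzymes.get(reaction, set()):
            hits += 1
    return hits / len(ranked_candidates)


# Hypothetical rankings and ground truth for three reactions.
rankings = {
    "rxn1": ["g1", "g2", "g3"],
    "rxn2": ["g4", "g5"],
    "rxn3": ["g6"],
}
truth = {"rxn1": {"g2"}, "rxn2": {"g9"}, "rxn3": {"g6"}}

print(top_k_hit_rate(rankings, truth, k=10))
print(top_k_hit_rate(rankings, truth, k=1))
```

Lowering k makes the criterion stricter, so the hit rate can only stay the same or fall as k decreases.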
Affiliation(s)
- Michelle L Green
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
|
964
|
Linding R, Jensen LJ, Ostheimer GJ, van Vugt MA, Jørgensen C, Miron IM, Diella F, Colwill K, Taylor L, Elder K, Metalnikov P, Nguyen V, Pasculescu A, Jin J, Park JG, Samson LD, Woodgett JR, Russell RB, Bork P, Yaffe MB, Pawson T. Systematic discovery of in vivo phosphorylation networks. Cell 2007; 129:1415-26. [PMID: 17570479 PMCID: PMC2692296 DOI: 10.1016/j.cell.2007.05.052] [Citation(s) in RCA: 588] [Impact Index Per Article: 34.6] [Received: 10/23/2006] [Revised: 04/24/2007] [Accepted: 05/30/2007] [Indexed: 01/23/2023]
Abstract
Protein kinases control cellular decision processes by phosphorylating specific substrates. Thousands of in vivo phosphorylation sites have been identified, mostly by proteome-wide mapping. However, systematically matching these sites to specific kinases is presently infeasible, due to limited specificity of consensus motifs, and the influence of contextual factors, such as protein scaffolds, localization, and expression, on cellular substrate specificity. We have developed an approach (NetworKIN) that augments motif-based predictions with the network context of kinases and phosphoproteins. The latter provides 60%-80% of the computational capability to assign in vivo substrate specificity. NetworKIN pinpoints kinases responsible for specific phosphorylations and yields a 2.5-fold improvement in the accuracy with which phosphorylation networks can be constructed. Applying this approach to DNA damage signaling, we show that 53BP1 and Rad50 are phosphorylated by CDK1 and ATM, respectively. We describe a scalable strategy to evaluate predictions, which suggests that BCLAF1 is a GSK-3 substrate.
Affiliation(s)
- Rune Linding
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, USA
- Gerard J. Ostheimer
- Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, USA
- Center for Environmental Health Sciences, Massachusetts Institute of Technology, Cambridge, USA
- Marcel A.T.M. van Vugt
- Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, USA
- Department of Cell Biology and Genetics, Erasmus University, Rotterdam, The Netherlands
- Claus Jørgensen
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Ioana M. Miron
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Karen Colwill
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Lorne Taylor
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Kelly Elder
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Pavel Metalnikov
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Vivian Nguyen
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Adrian Pasculescu
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Jing Jin
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Jin Gyoon Park
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Leona D. Samson
- Center for Environmental Health Sciences, Massachusetts Institute of Technology, Cambridge, USA
- James R. Woodgett
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Peer Bork
- European Molecular Biology Laboratory, Heidelberg, Germany
- Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
- Michael B. Yaffe
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Tony Pawson
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
|
965
|
Realm of PD-(D/E)XK nuclease superfamily revisited: detection of novel families with modified transitive meta profile searches. BMC STRUCTURAL BIOLOGY 2007; 7:40. [PMID: 17584917 PMCID: PMC1913061 DOI: 10.1186/1472-6807-7-40] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Received: 03/13/2007] [Accepted: 06/20/2007] [Indexed: 11/30/2022]
Abstract
Background: PD-(D/E)XK nucleases constitute a large and highly diverse superfamily of enzymes that display little sequence similarity despite retaining a common core fold and a few critical active site residues. This makes identification of new PD-(D/E)XK nuclease families a challenging task, as they usually escape detection with standard sequence-based methods. We developed a modified transitive meta profile search approach. To consider the structural diversity of the PD-(D/E)XK nuclease fold more thoroughly, we also analyzed below-threshold Meta-BASIC hits to select potentially correct predictions placed among unreliable or incorrect ones.
Results: Application of modified transitive Meta-BASIC searches to updated PFAM families and PDB structures resulted in the detection of five new PD-(D/E)XK nuclease families encompassing hundreds of so far uncharacterized and poorly annotated proteins. These include four families catalogued in the PFAM database as domains of unknown function (DUF506, DUF524, DUF1626 and DUF1703) and the YhgA-like family of putative transposases. Three of these families represent extremely distant homologs (DUF506, DUF524, and YhgA-like), while two are newly defined in the updated database (DUF1626 and DUF1703). In addition, we confidently identified an extended AAA-ATPase domain in the N-terminal region of DUF1703 family proteins.
Conclusion: The results suggest that detailed analysis of below-threshold Meta-BASIC hits may push the limits of distant homology detection further into the 'midnight zone' of homology. All identified families conserve the core evolutionary fold, secondary structure and hydrophobic patterns common to existing PD-(D/E)XK nucleases and maintain critical active site motifs that contribute to nucleic acid cleavage. Further experimental investigations should address the predicted activity and clarify potential substrates, providing further insight into the detailed biological role of these newly detected nucleases.
|
966
|
Shoemaker BA, Panchenko AR. Deciphering protein-protein interactions. Part I. Experimental techniques and databases. PLoS Comput Biol 2007; 3:e42. [PMID: 17397251 PMCID: PMC1847991 DOI: 10.1371/journal.pcbi.0030042] [Citation(s) in RCA: 235] [Impact Index Per Article: 13.8] [Indexed: 12/20/2022] Open
|
967
|
Giacomelli L, Nicolini C. Gene expression of human T lymphocytes cell cycle: experimental and bioinformatic analysis. J Cell Biochem 2007; 99:1326-33. [PMID: 16795054 DOI: 10.1002/jcb.20991] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022]
Abstract
Human lymphocyte gene expression is monitored before and after PHA stimulation over 72 h, using DNA microarray technology. Results are then compared with our previous bioinformatics predictions, which identified six leader genes of highest importance in the human T lymphocyte cell cycle. Experimental data are strikingly compatible with bioinformatic predictions of the specific role and interaction of PCNA, CDC2, and CCNA2 at all phases of the cell cycle and of CHEK1 in regulating DNA repair and preservation. It does not escape our notice that the conception and use of ad hoc arrays, based on a bioinformatics prediction that identifies the most important genes involved in a particular biological process, can add real value in cell biology and cancer research as an alternative to massive, frequently misleading molecular genomics.
Affiliation(s)
- Luca Giacomelli
  - Nanoworld Institute, University of Genoa, Corso Europa 30, 16132 Genoa, Italy
|
968
|
Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages. BMC Bioinformatics 2007; 8 Suppl 4:S6. [PMID: 17570149 PMCID: PMC1892085 DOI: 10.1186/1471-2105-8-s4-s6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
Background: Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process. Bacteriophage genomes are an example that cannot be easily analyzed by these methods. This work addresses these shortcomings and aims to provide an automated prediction system of gene function.
Results: We have developed a novel system called SynFPS to perform gene function prediction over completed genomes. The prediction system is initialized by clustering a large collection of weakly related genomes into groups based on their resemblance in gene distribution. From each individual group, data are then extracted and used to train a Support Vector Machine that makes gene function predictions. Experiments were conducted with 9 different gene functions over 296 bacteriophage genomes. Cross validation results gave an average prediction accuracy of ~80%, which is comparable to other genomic-context based prediction methods. Functional predictions are also made on 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment. The software is publicly available at http://www.synteny.net/.
Conclusion: The proposed system employs genomic context to predict gene function and detect gene correspondence in whole-genome comparisons. Although our experimental focus is on bacteriophages, the method may be extended to other microbial genomes as they share a number of similar characteristics with phage genomes such as gene order conservation.
|
969
|
Geurts P, Touleimat N, Dutreix M, d'Alché-Buc F. Inferring biological networks with output kernel trees. BMC Bioinformatics 2007; 8 Suppl 2:S4. [PMID: 17493253 PMCID: PMC1892073 DOI: 10.1186/1471-2105-8-s2-s4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 12/02/2022] Open
Abstract
Background: Elucidating biological networks between proteins appears nowadays as one of the most important challenges in systems biology. Computational approaches to this problem are important to complement high-throughput technologies and to help biologists in designing new experiments. In this work, we focus on the completion of a biological network from various sources of experimental data.
Results: We propose a new machine learning approach for the supervised inference of biological networks, which is based on a kernelization of the output space of regression trees. It inherits several features of tree-based algorithms such as interpretability, robustness to irrelevant variables, and input scalability. We applied this method to the inference of two kinds of networks in the yeast S. cerevisiae: a protein-protein interaction network and an enzyme network. In both cases, we obtained results competitive with existing approaches. We also show that our method provides relevant insights on input data regarding their potential relationship with the existence of interactions. Furthermore, we confirm the biological validity of our predictions in the context of an analysis of gene expression data.
Conclusion: Output kernel tree based methods provide an efficient tool for the inference of biological networks from experimental data. Their simplicity and interpretability should make them of great value for biologists.
Affiliation(s)
- Pierre Geurts
  - IBISC FRE CNRS 2873 & Epigenomics project, GENOPOLE, 523, Place des Terrasses, 91 Evry, France
  - Department of Electrical Engineering and Computer Science & GIGA, University of Liège, Institut Montefiore, Sart Tilman B28, 4000 Liège, Belgium
- Nizar Touleimat
  - IBISC FRE CNRS 2873 & Epigenomics project, GENOPOLE, 523, Place des Terrasses, 91 Evry, France
  - UMR 2027 CNRS-IC, Institut Curie, Bâtiment 110, Centre Universitaire, 91405 Orsay, France
- Marie Dutreix
  - UMR 2027 CNRS-IC, Institut Curie, Bâtiment 110, Centre Universitaire, 91405 Orsay, France
- Florence d'Alché-Buc
  - IBISC FRE CNRS 2873 & Epigenomics project, GENOPOLE, 523, Place des Terrasses, 91 Evry, France
|
970
|
Raes J, Korbel JO, Lercher MJ, von Mering C, Bork P. Prediction of effective genome size in metagenomic samples. Genome Biol 2007; 8:R10. [PMID: 17224063 PMCID: PMC1839125 DOI: 10.1186/gb-2007-8-1-r10] [Citation(s) in RCA: 229] [Impact Index Per Article: 13.5] [Received: 08/29/2006] [Revised: 10/31/2006] [Accepted: 01/15/2007] [Indexed: 11/23/2022] Open
Abstract
A novel computational approach shows a link between genome size and habitat from analysis of environmental metagenomic DNA reads. We introduce a novel computational approach to predict effective genome size (EGS; a measure that includes multiple plasmid copies, inserted sequences, and associated phages and viruses) from short sequencing reads of environmental genomics (or metagenomics) projects. We observe considerable EGS differences between environments and link this with ecologic complexity as well as species composition (for instance, the presence of eukaryotes). For example, we estimate EGS in a complex, organism-dense farm soil sample at about 6.3 megabases (Mb) whereas that of the bacteria therein is only 4.7 Mb; for bacteria in a nutrient-poor, organism-sparse ocean surface water sample, EGS is as low as 1.6 Mb. The method also permits evaluation of completion status and assembly bias in single-genome sequencing projects.
Affiliation(s)
- Jeroen Raes
  - European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
- Jan O Korbel
  - European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
  - Molecular Biophysics & Biochemistry Department, Yale University, Whitney Avenue, New Haven, Connecticut, USA
- Martin J Lercher
  - European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
- Christian von Mering
  - European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
  - Institute of Molecular Biology, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Peer Bork
  - European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
|
971
|
Merkl R. Modelling the evolution of the archeal tryptophan synthase. BMC Evol Biol 2007; 7:59. [PMID: 17425797 PMCID: PMC1854888 DOI: 10.1186/1471-2148-7-59] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 02/13/2007] [Accepted: 04/10/2007] [Indexed: 11/16/2022] Open
Abstract
Background: Microorganisms and plants are able to produce tryptophan. Enzymes catalysing the last seven steps of tryptophan biosynthesis are encoded in the canonical trp operon. Among the trp genes are most frequently trpA and trpB, which code for the alpha and beta subunit of tryptophan synthase. In several prokaryotic genomes, two variants of trpB (named trpB1 or trpB2) occur in different combinations. The evolutionary history of these trpB genes is under debate.
Results: In order to study the evolution of trp genes, completely sequenced archeal and bacterial genomes containing trpB were analysed. Phylogenetic trees indicated that TrpB sequences constitute four distinct groups; their composition is in agreement with the location of respective genes. The first group consisted exclusively of trpB1 genes most of which belonged to trp operons. Groups two to four contained trpB2 genes. The largest group (trpB2_o) contained trpB2 genes all located outside of operons. Most of these genes originated from species possessing an operon-based trpB1 in addition. Groups three and four pertain to trpB2 genes of those genomes containing exclusively one or two trpB2 genes, but no trpB1. One group (trpB2_i) consisted of trpB2 genes located inside, the other (trpB2_a) of trpB2 genes located outside the trp operon. TrpA and TrpB form a heterodimer and cooperate biochemically. In order to characterise trpB variants and stages of TrpA/TrpB cooperation in silico, several approaches were combined. Phylogenetic trees were constructed for all trp genes; their structure was assessed via bootstrapping. Alternative models of trpB evolution were evaluated with parsimony arguments. The four groups of trpB variants were correlated with archeal speciation. Several stages of TrpA/TrpB cooperation were identified and trpB variants were characterised. Most plausibly, trpB2 represents the predecessor of the modern trpB gene, and trpB1 evolved in an ancestral bacterium.
Conclusion: In archeal genomes, several stages of trpB evolution, TrpA/TrpB cooperation, and operon formation can be observed. Thus, archeal trp genes may serve as a model system for studying the evolution of protein-protein interactions and operon formation.
Affiliation(s)
- Rainer Merkl
  - Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, Regensburg, Germany.
|
972
|
Pachkov M, Dandekar T, Korbel J, Bork P, Schuster S. Use of pathway analysis and genome context methods for functional genomics of Mycoplasma pneumoniae nucleotide metabolism. Gene 2007; 396:215-25. [PMID: 17467928 DOI: 10.1016/j.gene.2007.02.033] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/08/2005] [Revised: 11/26/2006] [Accepted: 02/21/2007] [Indexed: 11/27/2022]
Abstract
Elementary modes analysis allows one to reveal whether a set of known enzymes is sufficient to sustain functionality of the cell. Moreover, it is helpful in detecting missing reactions and predicting which enzymes could fill these gaps. Here, we perform a comprehensive elementary modes analysis and a genomic context analysis of Mycoplasma pneumoniae nucleotide metabolism, and search for new enzyme activities. The purine and pyrimidine networks are reconstructed by assembling enzymes annotated in the genome or found experimentally. We show that these reaction sets are sufficient for enabling synthesis of DNA and RNA in M. pneumoniae. Special focus is on the key modes for growth. Moreover, we make an educated guess on the nutritional requirements of this micro-organism. For the case that M. pneumoniae does not require adenine as a substrate, we suggest adenylosuccinate synthetase (EC 6.3.4.4), adenylosuccinate lyase (EC 4.3.2.2) and GMP reductase (EC 1.7.1.7) to be operative. GMP reductase activity is putatively assigned to the NRDI_MYCPN gene on the basis of the genomic context analysis. For the pyrimidine network, we suggest CTP synthase (EC 6.3.4.2) to be active. Further experiments on the nutritional requirements are needed to make a decision. Pyrimidine metabolism appears to be more appropriate as a drug target than purine metabolism since it shows lower plasticity.
Affiliation(s)
- Mikhail Pachkov
  - Department of Bioinformatics, Faculty of Biology and Pharmaceutics, Friedrich-Schiller University Jena, Ernst-Abbe-Platz 2, D-07743 Jena, Germany.
|
973
|
Shin JH, Price CW. The SsrA-SmpB ribosome rescue system is important for growth of Bacillus subtilis at low and high temperatures. J Bacteriol 2007; 189:3729-37. [PMID: 17369301 PMCID: PMC1913333 DOI: 10.1128/jb.00062-07] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 11/20/2022] Open
Abstract
Bacillus subtilis has multiple stress response systems whose integrated action promotes growth and survival under unfavorable conditions. Here we address the function and transcriptional organization of a five-gene cluster containing ssrA, previously known to be important for growth at high temperature because of the role of its tmRNA product in rescuing stalled ribosomes. Reverse transcription-PCR experiments detected a single message for the secG-yvaK-rnr-smpB-ssrA cluster, suggesting that it constitutes an operon. However, rapid amplification of cDNA ends-PCR and lacZ fusion experiments indicated that operon transcription is complex, with at least five promoters controlling different segments of the cluster. One sigma(A)-like promoter preceded secG (P(1)), and internal sigma(A)-like promoters were found in both the rnr-smpB (P(2)) and smpB-ssrA intervals (P(3) and P(HS)). Another internal promoter lay in the secG-yvaK intercistronic region, and this activity (P(B)) was dependent on the general stress factor sigma(B). Null mutations in the four genes downstream from P(B) were tested for their effects on growth. Loss of yvaK (carboxylesterase E) or rnr (RNase R) caused no obvious phenotype. By contrast, smpB was required for growth at high temperature (52 °C), as anticipated if its product (a small ribosomal binding protein) is essential for tmRNA (ssrA) function. Notably, smpB and ssrA were also required for growth at low temperature (16 °C), a phenotype not previously associated with tmRNA activity. These results extend the known high-temperature role of ssrA and indicate that the ribosome rescue system is important at both extremes of the B. subtilis temperature range.
Affiliation(s)
- Ji-Hyun Shin
  - Department of Food Science and Technology, University of California, Davis, CA 95616, USA
|
974
|
Castro MAA, Mombach JCM, de Almeida RMC, Moreira JCF. Impaired expression of NER gene network in sporadic solid tumors. Nucleic Acids Res 2007; 35:1859-67. [PMID: 17332015 PMCID: PMC1874609 DOI: 10.1093/nar/gkm061] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 12/17/2022] Open
Abstract
Nucleotide repair genes are not generally altered in sporadic solid tumors. However, point mutations are found scattered throughout the genome of cancer cells, indicating that the repair pathways are dysfunctional. To address this point, in this work we focus on the expression pathways rather than on the DNA structure of repair genes related to either genome stability or essential metabolic functions. We present here a novel statistical analysis comparing ten gene expression pathways in human normal and cancer cells using serial analysis of gene expression (SAGE) data. We find that in cancer cells nucleotide-excision repair (NER) and apoptosis are the most impaired pathways and have a highly altered diversity of gene expression profile when compared to normal cells. We propose that genome point mutations in sporadic tumors can be explained by a structurally conserved NER with a functional disorder generated from its entanglement with the apoptosis gene network.
Affiliation(s)
- Mauro A A Castro
  - Departamento de Bioquímica, Universidade Federal do Rio Grande do Sul, Porto Alegre 90035-003, Brazil.
|
975
|
Thakur KG, Joshi AM, Gopal B. Structural and biophysical studies on two promoter recognition domains of the extra-cytoplasmic function sigma factor sigma(C) from Mycobacterium tuberculosis. J Biol Chem 2007; 282:4711-4718. [PMID: 17145760 PMCID: PMC1890005 DOI: 10.1074/jbc.m606283200] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 01/08/2023] Open
Abstract
Sigma factors are transcriptional regulatory proteins that bind to the RNA polymerase and dictate gene expression. The extracytoplasmic function (ECF) sigma factors govern the environment-dependent regulation of transcription. ECF sigma factors have two domains, sigma(2) and sigma(4), that recognize the -10 and -35 promoter elements. However, unlike the primary sigma factor sigma(A), the ECF sigma factors lack sigma(3), a region that helps in the recognition of the extended -10 element, and sigma(1.1), a domain involved in the autoinhibition of sigma(A) in the absence of core RNA polymerase. Mycobacterium tuberculosis sigma(C) is an ECF sigma factor that is essential for the pathogenesis and virulence of M. tuberculosis in the mouse and guinea pig models of infection. However, unlike other ECF sigma factors, sigma(C) does not appear to have a regulatory anti-sigma factor located in the same operon. We also note that M. tuberculosis sigma(C) differs from the canonical ECF sigma factors as it has an N-terminal domain comprising 126 amino acids that precedes the sigma(C)(2) and sigma(C)(4) domains. In an effort to understand the regulatory mechanism of this protein, the crystal structures of the sigma(C)(2) and sigma(C)(4) domains of sigma(C) were determined. These promoter recognition domains are structurally similar to the corresponding domains of sigma(A) despite the low sequence similarity. Fluorescence experiments using the intrinsic tryptophan residues of sigma(C)(2) as well as surface plasmon resonance measurements reveal that the sigma(C)(2) and sigma(C)(4) domains interact with each other. Mutational analysis suggests that the Pribnow box-binding region of sigma(C)(2) is involved in this interdomain interaction. Interactions between the promoter recognition domains in M. tuberculosis sigma(C) are thus likely to regulate the activity of this protein even in the absence of an anti-sigma factor.
MESH Headings
- Animals
- Bacterial Proteins/chemistry
- Bacterial Proteins/metabolism
- Crystallography, X-Ray
- Cytoplasm/chemistry
- Cytoplasm/genetics
- Cytoplasm/metabolism
- DNA, Bacterial/chemistry
- DNA, Bacterial/genetics
- DNA, Bacterial/metabolism
- DNA-Directed RNA Polymerases/chemistry
- DNA-Directed RNA Polymerases/metabolism
- Disease Models, Animal
- Gene Expression Regulation, Bacterial/physiology
- Guinea Pigs
- Humans
- Mice
- Models, Molecular
- Mutation
- Mycobacterium tuberculosis/chemistry
- Mycobacterium tuberculosis/genetics
- Mycobacterium tuberculosis/metabolism
- Mycobacterium tuberculosis/pathogenicity
- Promoter Regions, Genetic
- Protein Binding/physiology
- Protein Structure, Tertiary/physiology
- Sigma Factor/chemistry
- Sigma Factor/genetics
- Sigma Factor/metabolism
- Surface Plasmon Resonance
- Transcription, Genetic/physiology
- Tuberculosis/genetics
- Tuberculosis/metabolism
Affiliation(s)
- Krishan Gopal Thakur
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
- B Gopal
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India.
|
976
|
Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 2007; 444:1027-31. [PMID: 17183312 DOI: 10.1038/nature05414] [Citation(s) in RCA: 8188] [Impact Index Per Article: 481.6] [Received: 10/08/2006] [Accepted: 11/07/2006] [Indexed: 11/09/2022]
Abstract
The worldwide obesity epidemic is stimulating efforts to identify host and environmental factors that affect energy balance. Comparisons of the distal gut microbiota of genetically obese mice and their lean littermates, as well as those of obese and lean human volunteers have revealed that obesity is associated with changes in the relative abundance of the two dominant bacterial divisions, the Bacteroidetes and the Firmicutes. Here we demonstrate through metagenomic and biochemical analyses that these changes affect the metabolic potential of the mouse gut microbiota. Our results indicate that the obese microbiome has an increased capacity to harvest energy from the diet. Furthermore, this trait is transmissible: colonization of germ-free mice with an 'obese microbiota' results in a significantly greater increase in total body fat than colonization with a 'lean microbiota'. These results identify the gut microbiota as an additional contributing factor to the pathophysiology of obesity.
Affiliation(s)
- Peter J Turnbaugh
  - Center for Genome Sciences, Washington University, St. Louis, Missouri 63108, USA
|
977
|
Gu S, Anderson I, Kunin V, Cipriano M, Minovitsky S, Weber G, Amenta N, Hamann B, Dubchak I. TreeQ-VISTA: an interactive tree visualization tool with functional annotation query capabilities. Bioinformatics 2007; 23:764-6. [PMID: 17234642 DOI: 10.1093/bioinformatics/btl643] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
We describe a general multiplatform exploratory tool called TreeQ-VISTA, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a user-provided relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools is provided to facilitate trait exploration, comparison and analysis.
Availability: The program, detailed tutorial and examples are available online (http://genome.lbl.gov/vista/TreeQVista).
Affiliation(s)
- Shengyin Gu
  - Institute for Data Analysis and Visualization (IDAV), Department of Computer Science, University of California, Davis, One Shields Ave., Davis, CA 95616, USA
|
978
|
Hooper SD, Boué S, Krause R, Jensen LJ, Mason CE, Ghanim M, White KP, Furlong EEM, Bork P. Identification of tightly regulated groups of genes during Drosophila melanogaster embryogenesis. Mol Syst Biol 2007; 3:72. [PMID: 17224916 PMCID: PMC1800352 DOI: 10.1038/msb4100112] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.8] [Received: 12/12/2005] [Accepted: 11/03/2006] [Indexed: 12/03/2022] Open
Abstract
Time-series analysis of whole-genome expression data during Drosophila melanogaster development indicates that up to 86% of its genes change their relative transcript level during embryogenesis. By applying conservative filtering criteria and requiring ‘sharp’ transcript changes, we identified 1534 maternal genes, 792 transient zygotic genes, and 1053 genes whose transcript levels increase during embryogenesis. Each of these three categories is dominated by groups of genes where all transcript levels increase and/or decrease at similar times, suggesting a common mode of regulation. For example, 34% of the transiently expressed genes fall into three groups, with increased transcript levels between 2.5–12, 11–20, and 15–20 h of development, respectively. We highlight common and distinctive functional features of these expression groups and identify a coupling between downregulation of transcript levels and targeted protein degradation. By mapping the groups to the protein network, we also predict and experimentally confirm new functional associations.
Affiliation(s)
- Sean D Hooper
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Stephanie Boué
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Roland Krause
  - Department Vingron, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
  - Department Zychlinsky, Max-Planck-Institute for Infection Biology, Berlin, Germany
- Lars J Jensen
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Christopher E Mason
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Murad Ghanim
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Kevin P White
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Eileen EM Furlong
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
  - Gene Expression Unit, EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany. Tel.: +49 6221 387 8416;
- Peer Bork
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
  - Structural and Computational Biology Unit, EMBL, Meyerhofstrasse 1, Heidelberg 69117, Germany. Tel.: +49 622 1387 8526; Fax: +49 622 1387 8517; E-mail:
|
979
|
Boekhorst J, Wels M, Kleerebezem M, Siezen RJ. The predicted secretome of Lactobacillus plantarum WCFS1 sheds light on interactions with its environment. MICROBIOLOGY-SGM 2007; 152:3175-3183. [PMID: 17074889 DOI: 10.1099/mic.0.29217-0] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.7] [Indexed: 01/26/2023]
Abstract
The predicted extracellular proteins of the bacterium Lactobacillus plantarum were analysed to gain insight into the mechanisms underlying interactions of this bacterium with its environment. Extracellular proteins play important roles in processes ranging from probiotic effects in the gastrointestinal tract to degradation of complex extracellular carbon sources such as those found in plant materials, and they have a primary role in the adaptation of a bacterium to changing environmental conditions. The functional annotation of extracellular proteins was improved using a wide variety of bioinformatics methods, including domain analysis and phylogenetic profiling. At least 12 proteins are predicted to be directly involved in adherence to host components such as collagen and mucin, and about 30 extracellular enzymes, mainly hydrolases and transglycosylases, might play a role in the degradation of substrates by L. plantarum to sustain its growth in different environmental niches. A comprehensive overview of all predicted extracellular proteins, their domain composition and their predicted function is provided through a database at http://www.cmbi.ru.nl/secretome, which could serve as a basis for targeted experimental studies into the function of extracellular proteins.
Affiliation(s)
- Jos Boekhorst
- Center for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, 6500HB Nijmegen, The Netherlands
| | - Michiel Wels
- Wageningen Centre for Food Sciences, Wageningen, The Netherlands
- Center for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, 6500HB Nijmegen, The Netherlands
| | - Michiel Kleerebezem
- NIZO food research, Ede, The Netherlands
- Wageningen Centre for Food Sciences, Wageningen, The Netherlands
| | - Roland J Siezen
- NIZO food research, Ede, The Netherlands
- Wageningen Centre for Food Sciences, Wageningen, The Netherlands
- Center for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, 6500HB Nijmegen, The Netherlands
|
980
|
Lee I, Narayanaswamy R, Marcotte EM. 24 Bioinformatic Prediction of Yeast Gene Function. J Microbiol Methods 2007. [DOI: 10.1016/s0580-9517(06)36024-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
|
981
|
Abstract
Protein-protein interactions (or PPIs) are key elements for the normal functioning of a living cell. A large description of the protein interactomics field is given in this review where different aspects will be discussed. We first give an introduction of the different large scale experimental approaches from yeast two-hybrid to mass spectrometry used to discover PPIs and build protein interaction maps. Single PPI validation techniques such as co-immunoprecipitation or fluorescence methods are then presented as they are more and more integrated in global PPI discovery strategy. Data from different experimental sets are compared and an assessment of the different large scale technologies is presented. Bioinformatics tools can also predict with a good accuracy PPIs in silico, PPIs databases are now numerous and topological analysis has led to interesting insights into the nature of network connection. Finally, PPI, as an association of two proteins, has been structurally characterized for many protein complexes and is largely discussed throughout existing examples. The results obtained so far already provide the biologist with a large set of structured data from which knowledge on pathways and associated protein function can be extracted.
|
982
|
Suen G, Arshinoff BI, Taylor RG, Welch RD. Practical Applications of Bacterial Functional Genomics. Biotechnol Genet Eng Rev 2007; 24:213-42. [DOI: 10.1080/02648725.2007.10648101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]
|
983
|
White RA, Szurmant H, Hoch JA, Hwa T. Features of protein-protein interactions in two-component signaling deduced from genomic libraries. Methods Enzymol 2007; 422:75-101. [PMID: 17628135 DOI: 10.1016/s0076-6879(06)22004-4] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 01/11/2023]
Abstract
As more and more sequence data become available, new approaches for extracting information from these data become feasible. This chapter reports on one such method that has been applied to elucidate protein-protein interactions in bacterial two-component signaling pathways. The method identifies residues involved in the interaction through an analysis of over 2500 functionally coupled proteins and a precise determination of the substitutional constraints placed on one protein by its signaling mate. Once identified, a simple log-likelihood scoring procedure is applied to these residues to build a predictive tool for assigning signaling mates. The ability to apply this method is based on a proliferation of related domains within multiple organisms. Paralogous evolution through gene duplication and divergence of two-component systems has commonly resulted in tens of closely related interacting pairs within one organism with a roughly one-to-one correspondence between signal and response. This provides us with roughly an order of magnitude more protein pairs than there are unique, fully sequenced bacterial species. Consequently, this chapter serves as both a detailed exposition of the method that has provided more depth to our knowledge of bacterial signaling and a look ahead to what would be possible on a more widespread scale, that is, to protein-protein interactions that have only one example per genome, as the number of genomes increases by a factor of 10.
Affiliation(s)
- Robert A White
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, USA
|
984
|
Schuster S, von Kamp A, Pachkov M. Understanding the roadmap of metabolism by pathway analysis. Methods Mol Biol 2007; 358:199-226. [PMID: 17035688 DOI: 10.1007/978-1-59745-244-1_12] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 01/26/2023]
Abstract
The theoretical investigation of the structure of metabolic systems has recently attracted increasing interest. In this chapter, the basic concepts of metabolic pathway analysis are described and various applications are outlined. In particular, the concepts of nullspace and elementary flux modes are explained. The presentation is illustrated by a simple example from tyrosine metabolism and a system describing lysine production in Corynebacterium glutamicum. The latter system gives rise to 37 elementary modes, 36 of which produce lysine with different molar yields. The examples illustrate that metabolic pathway analysis is a useful tool for better understanding the complex architecture of intracellular metabolism, for determining the pathways on which the molar conversion yield of a substrate-product pair under study is maximal, and for assigning functions to orphan genes (functional genomics). Moreover, problems emerging in the modeling of large networks are discussed. An outlook on current trends in the field concludes the chapter.
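The nullspace concept mentioned in this abstract can be illustrated with a short sketch. It is not taken from the chapter: the four-reaction toy network, the matrix `S`, and the helper name `nullspace` are assumptions made for illustration. At steady state the flux vector v satisfies S v = 0, so every admissible flux distribution lies in the nullspace of the stoichiometric matrix. Elementary flux modes additionally require non-decomposability and respect for reaction irreversibility, which this sketch does not check.

```python
from fractions import Fraction

def nullspace(matrix):
    """Return a basis of {v : matrix * v = 0} via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        pivot_val = rows[r][c]
        rows[r] = [x / pivot_val for x in rows[r]]
        # Clear column c in every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # Each non-pivot (free) column yields one basis vector.
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            v[p] = -rows[row_idx][f]
        basis.append(v)
    return basis

# Hypothetical toy network (rows: internal metabolites A, B; columns: R1..R4)
# R1: -> A,  R2: A -> B,  R3: B ->,  R4: A ->
S = [
    [1, -1, 0, -1],  # A
    [0, 1, -1, 0],   # B
]

for v in nullspace(S):
    print([str(x) for x in v])
```

For this toy network the two basis vectors correspond to the routes R1-R2-R3 and R1-R4. In general a nullspace basis is not unique and need not coincide with the elementary modes.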
Affiliation(s)
- Stefan Schuster
- Department of Bioinformatics, Friedrich-Schiller University of Jena, Germany
|
985
|
Leach S, Gabow A, Hunter L, Goldberg DS. Assessing and combining reliability of protein interaction sources. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2007:433-44. [PMID: 17990508 PMCID: PMC2517251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Integrating diverse sources of interaction information to create protein networks requires strategies sensitive to differences in accuracy and coverage of each source. Previous integration approaches calculate reliabilities of protein interaction information sources based on congruity to a designated 'gold standard.' In this paper, we provide a comparison of the two most popular existing approaches and propose a novel alternative for assessing reliabilities which does not require a gold standard. We identify a new method for combining the resultant reliabilities and compare it against an existing method. Further, we propose an extrinsic approach to evaluation of reliability estimates, considering their influence on the downstream tasks of inferring protein function and learning regulatory networks from expression data. Results using this evaluation method show 1) our method for reliability estimation is an attractive alternative to those requiring a gold standard and 2) the new method for combining reliabilities is less sensitive to noise in reliability assignments than the similar existing technique.
Affiliation(s)
- Sonia Leach
- University of Colorado at Denver, Health Sciences Center, Aurora, CO 80045, USA.
|
986
|
Tamames J, Moya A, Valencia A. Modular organization in the reductive evolution of protein-protein interaction networks. Genome Biol 2007; 8:R94. [PMID: 17532860 PMCID: PMC1929161 DOI: 10.1186/gb-2007-8-5-r94] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 07/28/2006] [Revised: 01/30/2007] [Accepted: 05/28/2007] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND: The variation in the sizes of the genomes of distinct life forms remains somewhat puzzling. The organization of proteins into domains and the different mechanisms that regulate gene expression are two factors that potentially increase the capacity of genomes to create more complex systems. High-throughput protein interaction data now make it possible to examine the additional complexity generated by the way that protein interactions are organized. RESULTS: We have studied the reduction in genome size of Buchnera compared to its close relative Escherichia coli. In this well defined evolutionary scenario, we found that among all the properties of the protein interaction networks, it is the organization of networks into modules that seems to be directly related to the evolutionary process of genome reduction. CONCLUSION: In Buchnera, the apparently non-random reduction of the modular structure of the networks and the retention of essential characteristics of the interaction network indicate that the roles of proteins within the interaction network are important in the reductive process.
Affiliation(s)
- Javier Tamames
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Universitat de València, 46071 Valencia, Spain
| | - Andrés Moya
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Universitat de València, 46071 Valencia, Spain
| | - Alfonso Valencia
- Structural and Computational Biology Programme, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
|
987
|
Perocchi F, Jensen LJ, Gagneur J, Ahting U, von Mering C, Bork P, Prokisch H, Steinmetz LM. Assessing systems properties of yeast mitochondria through an interaction map of the organelle. PLoS Genet 2006; 2:e170. [PMID: 17054397 PMCID: PMC1617129 DOI: 10.1371/journal.pgen.0020170] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Received: 06/22/2006] [Accepted: 08/28/2006] [Indexed: 10/25/2022] Open
Abstract
Mitochondria carry out specialized functions; compartmentalized, yet integrated into the metabolic and signaling processes of the cell. Although many mitochondrial proteins have been identified, understanding their functional interrelationships has been a challenge. Here we construct a comprehensive network of the mitochondrial system. We integrated genome-wide datasets to generate an accurate and inclusive mitochondrial parts list. Together with benchmarked measures of protein interactions, a network of mitochondria was constructed in their cellular context, including extra-mitochondrial proteins. This network also integrates data from different organisms to expand the known mitochondrial biology beyond the information in the existing databases. Our network brings together annotated and predicted functions into a single framework. This enabled, for the entire system, a survey of mutant phenotypes, gene regulation, evolution, and disease susceptibility. Furthermore, we experimentally validated the localization of several candidate proteins and derived novel functional contexts for hundreds of uncharacterized proteins. Our network thus advances the understanding of the mitochondrial system in yeast and identifies properties of genes underlying human mitochondrial disorders.
Affiliation(s)
| | - Lars J Jensen
- European Molecular Biology Laboratory, Heidelberg, Germany
| | - Julien Gagneur
- European Molecular Biology Laboratory, Heidelberg, Germany
| | - Uwe Ahting
- Institute of Human Genetics, Technical University, Munich, and GSF National Research Center for Environment and Health, Neuherberg, Germany
| | | | - Peer Bork
- European Molecular Biology Laboratory, Heidelberg, Germany
| | - Holger Prokisch
- Institute of Human Genetics, Technical University, Munich, and GSF National Research Center for Environment and Health, Neuherberg, Germany
| | - Lars M Steinmetz
- European Molecular Biology Laboratory, Heidelberg, Germany
- * To whom correspondence should be addressed. E-mail:
|
988
|
von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Krüger B, Snel B, Bork P. STRING 7--recent developments in the integration and prediction of protein interactions. Nucleic Acids Res 2006; 35:D358-62. [PMID: 17098935 PMCID: PMC1669762 DOI: 10.1093/nar/gkl825] [Citation(s) in RCA: 468] [Impact Index Per Article: 26.0] [Indexed: 12/01/2022] Open
Abstract
Information on protein–protein interactions is still mostly limited to a small number of model organisms, and originates from a wide variety of experimental and computational techniques. The database and online resource STRING generalizes access to protein interaction data, by integrating known and predicted interactions from a variety of sources. The underlying infrastructure includes a consistent body of completely sequenced genomes and exhaustive orthology classifications, based on which interaction evidence is transferred between organisms. Although primarily developed for protein interaction analysis, the resource has also been successfully applied to comparative genomics, phylogenetics and network studies, which are all facilitated by programmatic access to the database backend and the availability of compact download files. As of release 7, STRING has almost doubled to 373 distinct organisms, and contains more than 1.5 million proteins for which associations have been pre-computed. Novel features include AJAX-based web-navigation, inclusion of additional resources such as BioGRID, and detailed protein domain annotation. STRING is available at
Affiliation(s)
- Christian von Mering
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
|
989
|
Chich JF, Schaeffer B, Bouin AP, Mouthon F, Labas V, Larramendy C, Deslys JP, Grosclaude J. Prion infection-impaired functional blocks identified by proteomics enlighten the targets and the curing pathways of an anti-prion drug. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2006; 1774:154-67. [PMID: 17174161 DOI: 10.1016/j.bbapap.2006.10.016] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 08/24/2006] [Revised: 10/30/2006] [Accepted: 10/31/2006] [Indexed: 02/06/2023]
Abstract
Prion-induced neurodegeneration results from multiple cellular alterations among which the accumulation of a modified form of the host protein PrP is but a hallmark. Drug treatments need understanding of underlying mechanisms. Proteomics allows getting a comprehensive view of perturbations leading to neuronal death. Heparan sulfate mimetics has proved to be efficient to clear scrapie protein in cultured cells and in animals. To investigate the mechanisms of drug attack, protein profiles of the neuronal cell line GT1 and its chronically Chandler strain infected counterpart were compared, either in steady state cultures or after a 4-day drug treatment. Differentially expressed proteins were associated into functional blocks relevant to neurodegenerative diseases. Protein structure repair and modification, proteolysis, cell shape and energy/oxidation players were affected by infection, in agreement with prion biology. Unexpectedly, novel affected blocks related to translation, nucleus structure and DNA replication were unravelled displaying commonalities with proliferative processes. The drug had a double action in infected cells by reversing protein levels back to normal in some blocks and by heightening survival functions in others. This study emphasizes the interest of a proteomic approach to unravel novel networks involved in prion infection and curing.
Affiliation(s)
- J-F Chich
- Biologie Physico-Chimique des Prions, Virologie et Immunologie Moléculaires, INRA, 78352 Jouy-en-Josas Cedex, France.
|
990
|
Scholten JC, Culley DE, Brockman FJ, Wu G, Zhang W. Evolution of the syntrophic interaction between Desulfovibrio vulgaris and Methanosarcina barkeri: Involvement of an ancient horizontal gene transfer. Biochem Biophys Res Commun 2006; 352:48-54. [PMID: 17107661 DOI: 10.1016/j.bbrc.2006.10.164] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Received: 10/24/2006] [Accepted: 10/25/2006] [Indexed: 11/29/2022]
Abstract
The sulfate reducing bacteria Desulfovibrio vulgaris and the methanogenic archaea Methanosarcina barkeri can grow syntrophically on lactate. In this study, a set of three closely located genes, DVU2103, DVU2104, and DVU2108 of D. vulgaris, was found to be up-regulated 2- to 4-fold following the lifestyle shift from syntroph to sulfate reducer; moreover, none of the genes in this gene set were differentially regulated when comparing gene expression from various D. vulgaris pure culture experiments. Although exact function of this gene set is unknown, the results suggest that it may play roles related to the lifestyle change of D. vulgaris from syntroph to sulfate reducer. This hypothesis is further supported by phylogenomic analyses showing that homologies of this gene set were only narrowly present in several groups of bacteria, most of which are restricted to a syntrophic lifestyle, such as Pelobacter carbinolicus, Syntrophobacter fumaroxidans, Syntrophomonas wolfei, and Syntrophus aciditrophicus. Phylogenetic analysis showed that all three individual genes in the gene set tended to be clustered with their homologies from archaeal genera, and they were rooted on archaeal species in the phylogenetic trees, suggesting that they were horizontally transferred from archaeal methanogens. In addition, no significant bias in codon and amino acid usages was detected between these genes and the rest of the D. vulgaris genome, suggesting the gene transfer may have occurred early in the evolutionary history so that sufficient time has elapsed to allow an adaptation to the codon and amino acid usages of D. vulgaris. This report provides novel insights into the origin and evolution of bacterial genes linked to the lifestyle change of D. vulgaris from a syntrophic to a sulfate-reducing lifestyle.
Affiliation(s)
- Johannes C Scholten
- Microbiology Department, Pacific Northwest National Laboratory, Richland, WA 99352, USA.
|
991
|
Abstract
Human tissue-specific genes were reported to be longer than housekeeping genes (both in coding and intronic parts). The competing neutralist and adaptationist models were proposed to explain this observation. Here I show that in human genome the longest are genes with the intermediate expression pattern. From the standpoint of information theory, the regulation of such genes should be most complex. In the genomewide context, they are found here to have the higher informational load on all available levels: from participation in protein interaction networks, pathways and modules reflected in Gene Ontology categories through transcription factor regulatory sets and protein functional domains to amino acid tuples (words) in encoded proteins and nucleotide tuples in introns and promoter regions. Thus, the intermediately expressed genes have the higher functional and regulatory complexity that is reflected in their greater length (which is consistent with the 'genome design' model). The dichotomy of housekeeping versus tissue-specific entities is more pronounced on the modular level than on the molecular level. There are much lesser intermediate-specific modules (modules overrepresented in the intermediately expressed genes) than housekeeping or tissue-specific modules (normalized to gene number). The dichotomy of housekeeping versus tissue-specific genes and modules in multicellular organisms is probably caused by the burden of regulatory complexity acted on the intermediately expressed genes.
|
992
|
Ng A, Bursteinas B, Gao Q, Mollison E, Zvelebil M. Resources for integrative systems biology: from data through databases to networks and dynamic system models. Brief Bioinform 2006; 7:318-30. [PMID: 17040977 DOI: 10.1093/bib/bbl036] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022] Open
Abstract
In systems biology, biologically relevant quantitative modelling of physiological processes requires the integration of experimental data from diverse sources. Recent developments in high-throughput methodologies enable the analysis of the transcriptome, proteome, interactome, metabolome and phenome on a previously unprecedented scale, thus contributing to the deluge of experimental data held in numerous public databases. In this review, we describe some of the databases and simulation tools that are relevant to systems biology and discuss a number of key issues affecting data integration and the challenges these pose to systems-level research.
Affiliation(s)
- Aylwin Ng
- Bioinformatics and Systems Biology Group, Ludwig Institute for Cancer Research, University College London Branch, 91 Riding House Street, London W1W 7BS, UK
|
993
|
Campinho MA, Silva N, Sweeney GE, Power DM. Molecular, cellular and histological changes in skin from a larval to an adult phenotype during bony fish metamorphosis. Cell Tissue Res 2006; 327:267-84. [PMID: 17028894 DOI: 10.1007/s00441-006-0262-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 03/10/2006] [Accepted: 05/31/2006] [Indexed: 12/12/2022]
Abstract
Developmental models for skin exist in terrestrial and amphibious vertebrates but there is a lack of information in aquatic vertebrates. We have analysed skin epidermal development of a bony fish (teleost), the most successful group of extant vertebrates. A specific epidermal type I keratin cDNA (hhKer1), which may be a bony-fish-specific adaptation associated with the divergence of skin development (scale formation) compared with other vertebrates, has been cloned and characterized. The expression of hhKer1 and collagen 1alpha1 in skin taken together with the presence or absence of keratin bundle-like structures have made it possible to distinguish between larval and adult epidermal cells during skin development. The use of a flatfish with a well-defined larval to juvenile transition as a model of skin development has revealed that epidermal larval basal cells differentiate directly to epidermal adult basal cells at the climax of metamorphosis. Moreover, hhKer1 expression is downregulated at the climax of metamorphosis and is inversely correlated with increasing thyroxin levels. We suggest that, whereas early mechanisms of skin development between aquatic and terrestrial vertebrates are conserved, later mechanisms diverge.
Affiliation(s)
- Marco A Campinho
- Comparative Molecular Endocrinology Group, Marine Science Centre, Universidade do Algarve, 8005-139, Faro, Portugal
|
994
|
Zhang W, Culley DE, Gritsenko MA, Moore RJ, Nie L, Scholten JCM, Petritis K, Strittmatter EF, Camp DG, Smith RD, Brockman FJ. LC-MS/MS based proteomic analysis and functional inference of hypothetical proteins in Desulfovibrio vulgaris. Biochem Biophys Res Commun 2006; 349:1412-9. [PMID: 16982031 DOI: 10.1016/j.bbrc.2006.09.019] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 09/01/2006] [Accepted: 09/07/2006] [Indexed: 11/26/2022]
Abstract
High efficiency capillary liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to examine the proteins extracted from Desulfovibrio vulgaris cells across six treatment conditions. While our previous study provided a proteomic overview of the cellular metabolism based on proteins with known functions [W. Zhang, M.A. Gritsenko, R.J. Moore, D.E. Culley, L. Nie, K. Petritis, E.F. Strittmatter, D.G. Camp II, R.D. Smith, F.J. Brockman, A proteomic view of the metabolism in Desulfovibrio vulgaris determined by liquid chromatography coupled with tandem mass spectrometry, Proteomics 6 (2006) 4286-4299], this study describes the global detection and functional inference for hypothetical D. vulgaris proteins. Using criteria that a given peptide of a protein is identified from at least two out of three independent LC-MS/MS measurements and that for any protein at least two different peptides are identified among the three measurements, 129 open reading frames (ORFs) originally annotated as hypothetical proteins were found to encode expressed proteins. Functional inference for the conserved hypothetical proteins was performed by a combination of several non-homology based methods: genomic context analysis, phylogenomic profiling, and analysis of a combination of experimental information, including peptide detection in cells grown under specific culture conditions and cellular location of the proteins. Using this approach we were able to assign possible functions to 20 conserved hypothetical proteins. This study demonstrated that a combination of proteomics and bioinformatics methodologies can provide verification of the expression of hypothetical proteins and improve genome annotation.
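The two detection criteria stated in this abstract can be sketched as a small filter. This is an illustration, not the authors' pipeline: the input layout (a list of `(protein, peptide, run)` tuples), the function name `expressed_proteins`, and the reading that each of the two required peptides must itself pass the two-of-three rule are all assumptions.

```python
from collections import defaultdict

def expressed_proteins(observations, min_runs=2, min_peptides=2):
    """Return proteins supported by enough reproducibly detected peptides.

    observations: iterable of (protein_id, peptide_sequence, run_id) tuples,
    a hypothetical layout chosen for this sketch.
    """
    # Collect the set of runs in which each (protein, peptide) pair was seen.
    runs_seen = defaultdict(set)
    for protein, peptide, run in observations:
        runs_seen[(protein, peptide)].add(run)

    # Keep peptides detected in at least `min_runs` independent runs.
    reproducible = defaultdict(set)
    for (protein, peptide), runs in runs_seen.items():
        if len(runs) >= min_runs:
            reproducible[protein].add(peptide)

    # A protein counts as expressed if it has enough distinct such peptides.
    return {p for p, peps in reproducible.items() if len(peps) >= min_peptides}

# Made-up identifiers and peptide strings, for illustration only.
obs = [
    ("ORF_A", "PEPTIDEK", 1), ("ORF_A", "PEPTIDEK", 2),
    ("ORF_A", "SAMPLER", 2), ("ORF_A", "SAMPLER", 3),
    ("ORF_B", "LONELYK", 1), ("ORF_B", "LONELYK", 2),
    ("ORF_C", "ONCEK", 1), ("ORF_C", "TWICER", 3),
]
print(sorted(expressed_proteins(obs)))  # prints ['ORF_A']
```

Here `ORF_B` is rejected because only one peptide supports it, and `ORF_C` because neither of its peptides was seen in two runs.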
Affiliation(s)
- Weiwen Zhang
- Microbiology Group, Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, Richland, WA 99352, USA.
|
995
|
Azizi AA, Gelpi E, Yang JW, Rupp B, Godwin AK, Slater C, Slavc I, Lubec G. Mass spectrometric identification of serine hydrolase OVCA2 in the medulloblastoma cell line DAOY. Cancer Lett 2006; 241:235-49. [PMID: 16368187 DOI: 10.1016/j.canlet.2005.10.023] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 08/17/2005] [Revised: 10/14/2005] [Accepted: 10/17/2005] [Indexed: 11/18/2022]
Abstract
OVCA2 is a putative serine-hydrolase. Performing protein profiling in human tumour cell lines, OVCA2 was detected in DAOY medulloblastoma cells as a high abundance protein. The protein was unambiguously identified by 2D gel-electrophoresis and MALDI-MS and MS/MS, its presence was confirmed by western blotting. Immunohistochemistry revealed expression in medulloblastoma and predominantly in oligodendrocytes. Computational approaches predicted functional motifs and domains, interaction with apoptosis-related protein BAG and 3D structure. In addition to the presence of OVCA2 in medulloblastoma, it was furthermore detectable in three out of 10 human tumour cell-lines as a high abundance protein probably suggesting a role in the tumour biology.
Affiliation(s)
- Amedeo A Azizi
- Department of Pediatrics, Medical University of Vienna, Währinger Gürtel 19-21, A-1090 Vienna, Austria
|
996
|
Iwig JS, Rowe JL, Chivers PT. Nickel homeostasis in Escherichia coli - the rcnR-rcnA efflux pathway and its linkage to NikR function. Mol Microbiol 2006; 62:252-62. [PMID: 16956381 DOI: 10.1111/j.1365-2958.2006.05369.x] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.9] [Indexed: 12/01/2022]
Abstract
The nickel physiology of Escherichia coli is dominated by its Ni-Fe hydrogenase isozymes, which are expressed under anaerobic growth conditions. Hydrogenase activity in E. coli requires the NikABCDE nickel transporter, which is transcriptionally repressed by NikR in the presence of excess nickel. Recently, a nickel and cobalt-efflux protein, RcnA, was identified in E. coli. This study examines the effect of RcnA on nickel homeostasis in E. coli. Under nickel-limiting conditions, deletion of rcnA increased NikR activity in vivo. Nickel and cobalt-dependent regulation of rcnA expression required the newly identified transcriptional repressor RcnR (formerly YohL). Deletion of rcnR results in constitutive rcnA expression and a corresponding decrease in NikR activity. Purified RcnR binds directly to the rcnA promoter DNA fragment and this interaction is inhibited by nickel and cobalt. Nickel accumulation is affected differently among deletion strains with impaired nickel homeostasis. Surprisingly, in low nickel growth conditions rcnA expression is required for nickel import via NikABCDE. The data support a model with two distinct pools of nickel ions in E. coli. NikR bridges these two pools by controlling the levels of the hydrogenase-associated pool based on the nickel levels in the second pool.
Affiliation(s)
- Jeffrey S Iwig
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, MO 63110, USA
|
997
|
Abstract
Dramatic advances in sequencing technology and sophisticated experimental assays that interrogate the cell, combined with the public availability of the resulting data, herald the era of systems biology. However, the biological functions of more than 40% of the genes in sequenced genomes are unknown, posing a fundamental barrier to progress in systems biology. The large scale and diversity of available data requires the development of techniques that can automatically utilize these datasets to make quantified and robust predictions of gene function that can be experimentally verified. We present a service called the VIRtual Gene Ontology (VIRGO) that (i) constructs a functional linkage network (FLN) from gene expression and molecular interaction data, (ii) labels genes in the FLN with their functional annotations in the Gene Ontology and (iii) systematically propagates these labels across the FLN in order to precisely predict the functions of unlabelled genes. VIRGO assigns confidence estimates to predicted functions so that a biologist can prioritize predictions for further experimental study. For each prediction, VIRGO also provides an informative ‘propagation diagram’ that traces the flow of information in the FLN that led to the prediction. VIRGO is available at .
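The idea of propagating annotation labels across a functional linkage network can be sketched with a generic weighted-averaging scheme. This is not VIRGO's actual algorithm, which the abstract does not specify. The function name `propagate_labels`, the edge and label layouts, and the +1/-1 encoding of "has the function" versus "does not" are assumptions made for illustration.

```python
def propagate_labels(edges, labels, n_iter=50):
    """Spread known labels (+1 / -1) over a weighted undirected graph.

    edges: dict mapping (gene_a, gene_b) -> positive weight (hypothetical layout).
    labels: dict mapping annotated genes to +1.0 or -1.0; these stay fixed.
    Returns a score in [-1, 1] for every gene in the graph; the sign is the
    predicted label and the magnitude serves as a rough confidence.
    """
    neighbors = {}
    for (a, b), w in edges.items():
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))

    scores = {g: labels.get(g, 0.0) for g in neighbors}
    for _ in range(n_iter):
        updated = dict(scores)
        for gene, nbrs in neighbors.items():
            if gene in labels:
                continue  # annotated genes act as fixed sources
            total = sum(w for _, w in nbrs)
            # New score is the weighted mean of the neighbors' current scores.
            updated[gene] = sum(w * scores[n] for n, w in nbrs) / total
        scores = updated
    return scores

# Toy chain: g1 (annotated +1) - g2 - g3 - g4 (annotated -1)
edges = {("g1", "g2"): 1.0, ("g2", "g3"): 1.0, ("g3", "g4"): 1.0}
labels = {"g1": 1.0, "g4": -1.0}
scores = propagate_labels(edges, labels)
print({g: round(s, 3) for g, s in sorted(scores.items())})
```

In this toy chain the unlabelled gene nearer the positive source ends with a positive score and the one nearer the negative source with a negative score. Tracing which neighbors contributed most to each score is one simple way to mimic the "propagation diagram" the abstract describes.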
Affiliation(s)
| | | | - T. M. Murali
- To whom correspondence should be addressed. Tel: +1 540 231 8534; Fax: +1 540 231 6075;
|
998
|
Green ML, Karp PD. The outcomes of pathway database computations depend on pathway ontology. Nucleic Acids Res 2006; 34:3687-97. [PMID: 16893953 PMCID: PMC1540720 DOI: 10.1093/nar/gkl438] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.7] [Indexed: 11/14/2022] Open
Abstract
Different biological notions of pathways are used in different pathway databases. Those pathway ontologies significantly impact pathway computations. Computational users of pathway databases will obtain different results depending on the pathway ontology used by the databases they employ, and different pathway ontologies are preferable for different end uses. We explore differences in pathway ontologies by comparing the BioCyc and KEGG ontologies. The BioCyc ontology defines a pathway as a conserved, atomic module of the metabolic network of a single organism, i.e. often regulated as a unit, whose boundaries are defined at high-connectivity stable metabolites. KEGG pathways are on average 4.2 times larger than BioCyc pathways, and combine multiple biological processes from different organisms to produce a substrate-centered reaction mosaic. We compared KEGG and BioCyc pathways using genome context methods, which determine the functional relatedness of pairs of genes. For each method we employed, a pair of genes randomly selected from a BioCyc pathway is more likely to be related by that method than is a pair of genes randomly selected from a KEGG pathway, supporting the conclusion that the BioCyc pathway conceptualization is closer to a single conserved biological process than is that of KEGG.
Affiliation(s)
- M. L. Green
- Correspondence may also be addressed to M. L. Green. Tel: +1 650 859 5669; Fax: +1 650 859 3735;
- P. D. Karp
- To whom correspondence should be addressed. Tel: +1 650 859 4358; Fax: +1 650 859 3735;
999
Falb M, Aivaliotis M, Garcia-Rizo C, Bisle B, Tebbe A, Klein C, Konstantinidis K, Siedler F, Pfeiffer F, Oesterhelt D. Archaeal N-terminal protein maturation commonly involves N-terminal acetylation: a large-scale proteomics survey. J Mol Biol 2006; 362:915-24. [PMID: 16950390 DOI: 10.1016/j.jmb.2006.07.086] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Received: 05/05/2006] [Revised: 07/28/2006] [Accepted: 07/31/2006] [Indexed: 11/16/2022]
Abstract
We present the first large-scale survey of N-terminal protein maturation in archaea based on 873 proteomically identified N-terminal peptides from the two haloarchaea Halobacterium salinarum and Natronomonas pharaonis. The observed protein maturation pattern can be attributed to the combined action of methionine aminopeptidase and N-terminal acetyltransferase and applies to cytosolic proteins as well as to a large fraction of integral membrane proteins. Both N-terminal maturation processes primarily depend on the amino acid in the penultimate position, in which serine and threonine residues are overrepresented. Removal of the initiator methionine occurs in two-thirds of the haloarchaeal proteins and requires a small penultimate residue, indicating that methionine aminopeptidase specificity is conserved across all domains of life. While N-terminal acetylation is rare in bacteria, our proteomic data show that acetylated N termini are common in archaea, affecting about 15% of the proteins and revealing a distinct archaeal N-terminal acetylation pattern. Haloarchaeal N-terminal acetyltransferase reveals narrow substrate specificity, which is limited to cleaved N termini starting with serine or alanine residues. A comparative analysis of 140 ortholog pairs with identified N-terminal peptides showed that acetylatable N-terminal residues are predominantly conserved between the two haloarchaea. Only a few exceptions from the general N-terminal acetylation pattern were observed, which probably represent protein-specific modifications as they were confirmed by ortholog comparison.
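The two maturation rules stated in the abstract (methionine removal requires a small penultimate residue; acetylation is limited to cleaved N termini starting with serine or alanine) can be written as a short sketch. The abstract does not list which residues count as "small", so the set `SMALL_PENULTIMATE` below is an assumption based on the commonly cited methionine aminopeptidase specificity, and the function name `predict_n_terminus` is hypothetical. The sketch flags acetylation candidates only; it does not predict whether a given protein is actually acetylated.

```python
# Assumption: residues treated as "small" enough for methionine aminopeptidase
# cleavage (Gly, Ala, Ser, Thr, Cys, Pro, Val). The abstract does not give this list.
SMALL_PENULTIMATE = set("GASTCPV")

# From the abstract: acetylation is limited to cleaved N termini starting with Ser or Ala.
ACETYLATABLE = set("SA")


def predict_n_terminus(sequence):
    """Apply the two rules to a protein sequence that starts with Met.

    Returns a dict with:
      met_removed           - True if the initiator methionine is predicted to be cleaved
      mature                - the sequence after predicted processing
      acetylation_candidate - True if the cleaved N terminus starts with Ser or Ala
    """
    if len(sequence) < 2 or sequence[0] != "M":
        raise ValueError("expected a sequence of length >= 2 starting with 'M'")

    met_removed = sequence[1] in SMALL_PENULTIMATE
    mature = sequence[1:] if met_removed else sequence
    acetylation_candidate = met_removed and mature[0] in ACETYLATABLE
    return {
        "met_removed": met_removed,
        "mature": mature,
        "acetylation_candidate": acetylation_candidate,
    }


if __name__ == "__main__":
    for seq in ("MSKT", "MKLT", "MTAA"):
        print(seq, predict_n_terminus(seq))
```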
Affiliation(s)
- Michaela Falb
- Department of Membrane Biochemistry, Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
1000
Abulencia CB, Wyborski DL, Garcia JA, Podar M, Chen W, Chang SH, Chang HW, Watson D, Brodie EL, Hazen TC, Keller M. Environmental whole-genome amplification to access microbial populations in contaminated sediments. Appl Environ Microbiol 2006; 72:3291-301. [PMID: 16672469 PMCID: PMC1472342 DOI: 10.1128/aem.72.5.3291-3301.2006] [Citation(s) in RCA: 189] [Impact Index Per Article: 10.5] [Indexed: 11/20/2022] Open
Abstract
Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using phi29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2% genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small-subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9% of the sequences had significant similarities to known proteins, and "clusters of orthologous groups" (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible. The reported SSU rRNA sequences and library clone end sequences are listed with their respective GenBank accession numbers, DQ 404590 to DQ 404652, DQ 404654 to DQ 404938, and DX 385314 to DX 389173.