1
Abstract
Since the large-scale experimental characterization of protein–protein interactions (PPIs) is not possible for all species, several computational PPI prediction methods have been developed that harness existing data from other species. While PPI network prediction has been extensively used in eukaryotes, microbial network inference has lagged behind. However, bacterial interactomes can be built using the same principles and techniques; in fact, several methods are better suited to bacterial genomes. These predicted networks allow systems-level analyses in species that lack experimental interaction data. This review describes the current network inference and analysis techniques and summarizes the use of computationally-predicted microbial interactomes to date.
2
Sahoo A, Pechmann S. Functional network motifs defined through integration of protein-protein and genetic interactions. PeerJ 2022; 10:e13016. [PMID: 35223214 PMCID: PMC8877332 DOI: 10.7717/peerj.13016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/11/2021] [Accepted: 02/06/2022] [Indexed: 01/11/2023]
Abstract
Cells are enticingly complex systems. The identification of feedback regulation is critically important for understanding this complexity. Network motifs, defined as small graphlets that occur more frequently than expected by chance, have revolutionized our understanding of feedback circuits in cellular networks. However, with their definition solely based on statistical over-representation, network motifs often lack biological context, which limits their usefulness. Here, we define functional network motifs (FNMs) through the systematic integration of genetic interaction data that directly inform on functional relationships between genes and encoded proteins. We found that FNMs, which occur two orders of magnitude less frequently than conventional network motifs, are significantly enriched in genes known to be functionally related. Moreover, our comprehensive analyses of FNMs in yeast showed that they are powerful at capturing both known and putative novel regulatory interactions, thus suggesting a promising strategy towards the systematic identification of feedback regulation in biological networks. Many FNMs appeared as excellent candidates for the prioritization of follow-up biochemical characterization, which is a recurring bottleneck in the targeting of complex diseases. More generally, our work highlights a fruitful avenue for integrating and harnessing genomic network data.
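To make the idea of a network motif concrete, the following minimal sketch counts one classic motif, the feed-forward loop (A→B, B→C, A→C), in a small directed graph. The toy edge list and the function name `count_feed_forward_loops` are illustrative assumptions and are not taken from the study; establishing statistical over-representation would additionally require comparison against randomized networks, which is not shown here.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    count = 0
    # Brute-force scan over all ordered node triples; fine for toy graphs only.
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            count += 1
    return count


# Hypothetical toy network: X regulates Y and Z, Y regulates Z, Z regulates W.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
ffl_count = count_feed_forward_loops(toy_edges)
print(ffl_count)
```

The brute-force scan is cubic in the number of nodes, so a genome-scale network would need a neighbor-based enumeration instead.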
Affiliation(s)
- Amruta Sahoo
  - Département de Biochimie, Université de Montréal, Montréal, QC, Canada
3
Harrison BR, Hoffman JM, Samuelson A, Raftery D, Promislow DEL. Modular Evolution of the Drosophila Metabolome. Mol Biol Evol 2022; 39:msab307. [PMID: 34662414 PMCID: PMC8760934 DOI: 10.1093/molbev/msab307] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/04/2023]
Abstract
Comparative phylogenetic studies offer a powerful approach to study the evolution of complex traits. Although much effort has been devoted to the evolution of the genome and to organismal phenotypes, until now relatively little work has been done on the evolution of the metabolome, despite the fact that it is composed of the basic structural and functional building blocks of all organisms. Here we explore variation in metabolite levels across 50 My of evolution in the genus Drosophila, employing a common garden design to measure the metabolome within and among 11 species of Drosophila. We find that both sex and age have dramatic and evolutionarily conserved effects on the metabolome. We also find substantial evidence that many metabolite pairs covary after phylogenetic correction, and that such metabolome coevolution is modular. Some of these modules are enriched for specific biochemical pathways and show different evolutionary trajectories, with some showing signs of stabilizing selection. Both observations suggest that functional relationships may ultimately cause such modularity. These coevolutionary patterns also differ between sexes and are affected by age. We explore the relevance of modular evolution to fitness by associating modules with lifespan variation measured in the same common garden. We find several modules associated with lifespan, particularly in the metabolome of older flies. Oxaloacetate levels in older females appear to coevolve with lifespan, and a lifespan-associated module in older females suggests that metabolic associations could underlie 50 My of lifespan evolution.
Affiliation(s)
- Benjamin R Harrison
  - Department of Lab Medicine & Pathology, University of Washington School of Medicine, Seattle, WA, USA
- Jessica M Hoffman
  - Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Ariana Samuelson
  - Department of Biology, University of Washington, Seattle, WA, USA
- Daniel Raftery
  - Department of Anesthesiology & Pain Medicine, University of Washington School of Medicine, Seattle, WA, USA
- Daniel E L Promislow
  - Department of Lab Medicine & Pathology, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Biology, University of Washington, Seattle, WA, USA
4
OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022]
5
James K, Olson PD. The tapeworm interactome: inferring confidence scored protein-protein interactions from the proteome of Hymenolepis microstoma. BMC Genomics 2020; 21:346. [PMID: 32380953 PMCID: PMC7204028 DOI: 10.1186/s12864-020-6710-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/17/2019] [Accepted: 03/30/2020] [Indexed: 12/14/2022]
Abstract
Background: Reference genome and transcriptome assemblies of helminths have reached a level of completion whereby secondary analyses that rely on accurate gene estimation or syntenic relationships can now be conducted with a high level of confidence. Recent public release of the v.3 assembly of the mouse bile-duct tapeworm, Hymenolepis microstoma, provides chromosome-level characterisation of the genome and a stabilised set of protein coding gene models underpinned by bioinformatic and empirical data. However, interactome data have not been produced. Conserved protein-protein interactions in other organisms, termed interologs, can be used to transfer interactions between species, allowing systems-level analysis in non-model organisms.
Results: Here, we describe a probabilistic, integrated network of interologs for the H. microstoma proteome, based on conserved protein interactions found in eukaryote model species. Almost a third of the 10,139 gene models in the v.3 assembly could be assigned interaction data and assessment of the resulting network indicates that topologically-important proteins are related to essential cellular pathways, and that the network clusters into biologically meaningful components. Moreover, network parameters are similar to those of single-species interaction networks that we constructed in the same way for S. cerevisiae, C. elegans and H. sapiens, demonstrating that information-rich, system-level analyses can be conducted even on species separated by a large phylogenetic distance from the major model organisms from which most protein interaction evidence is derived. Using the interolog network, we then focused on sub-networks of interactions assigned to discrete suites of genes of interest, including signalling components and transcription factors, germline multipotency genes, and genes differentially-expressed between larval and adult worms. Results show not only an expected bias toward highly-conserved proteins, such as components of intracellular signal transduction, but in some cases predicted interactions with transcription factors that aid in identifying their target genes.
Conclusions: With key helminth genomes now complete, systems-level analyses can provide an important predictive framework to guide basic and applied research on helminths and will become increasingly informative as new protein-protein interaction data accumulate.
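The interolog principle described in this abstract can be sketched in a few lines: an interaction known in a source species is transferred to the target species whenever both partners have orthologs there. This is a hedged illustration only; the protein identifiers, the ortholog table, and the function name `transfer_interologs` are hypothetical, self-interactions are skipped as a simplifying assumption, and the confidence scoring used in the published network is omitted.

```python
def transfer_interologs(source_ppis, orthologs):
    """Map source-species interactions onto target proteins via orthologs.

    source_ppis: iterable of (protein_a, protein_b) pairs in the source species.
    orthologs:   dict mapping a source protein to a list of target orthologs.
    Returns a set of unordered target pairs (frozensets).
    """
    predicted = set()
    for a, b in source_ppis:
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                if target_a != target_b:  # assumption: ignore self-interactions
                    predicted.add(frozenset((target_a, target_b)))
    return predicted


# Hypothetical identifiers: "y*" are source proteins, "t*" are target proteins.
source_ppis = [("yA", "yB"), ("yB", "yC")]
orthologs = {"yA": ["t1"], "yB": ["t2"], "yC": []}
predicted = transfer_interologs(source_ppis, orthologs)
print(sorted(sorted(pair) for pair in predicted))
```

Because `yC` has no ortholog in this toy table, only the first interaction is transferred, which mirrors why only part of a proteome can be assigned interaction data.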
Affiliation(s)
- Katherine James
  - Department of Applied Sciences, Northumbria University, Newcastle Upon Tyne, UK
  - Department of Life Sciences, The Natural History Museum, Cromwell Road, London, UK
- Peter D Olson
  - Department of Life Sciences, The Natural History Museum, Cromwell Road, London, UK
6
Kaznadzey A, Shelyakin P, Belousova E, Eremina A, Shvyreva U, Bykova D, Emelianenko V, Korosteleva A, Tutukina M, Gelfand MS. The genes of the sulphoquinovose catabolism in Escherichia coli are also associated with a previously unknown pathway of lactose degradation. Sci Rep 2018; 8:3177. [PMID: 29453395 PMCID: PMC5816610 DOI: 10.1038/s41598-018-21534-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/04/2017] [Accepted: 02/06/2018] [Indexed: 12/29/2022]
Abstract
Comparative genomics analysis of conserved gene cassettes demonstrated resemblance between a recently described cassette of genes involved in sulphoquinovose degradation in Escherichia coli K-12 MG1655 and a Bacilli cassette linked with lactose degradation. Six genes from both cassettes had similar functions related to carbohydrate metabolism, namely, hydrolase, aldolase, kinase, isomerase, transporter, and transcription factor. The Escherichia coli sulphoglycolysis cassette was thus predicted to be associated with lactose degradation. This prediction was confirmed experimentally: expression of genes coding for aldolase (yihT), isomerase (yihS), and kinase (yihV) was dramatically increased during growth on lactose. These genes were previously shown to be activated during growth on sulphoquinovose, so our observation may indicate multi-functional capabilities of the respective proteins. Transcription starts for yihT, yihV and yihW were mapped in silico, in vitro and in vivo. Out of three promoters for yihT, one was active only during growth on lactose. We further showed that switches in yihT transcription are controlled by YihW, a DeoR-family transcription factor in the Escherichia coli cassette. YihW acted as a carbon source-dependent dual regulator involved in sustaining the baseline growth in the absence of lac-operon, with function either complementary, or opposite to a global regulator of carbohydrate metabolism, cAMP-CRP.
Affiliation(s)
- Anna Kaznadzey
  - A. A. Kharkevich Institute for Information Transmission Problems, RAS, Bolshoy Karetny per. 19, Moscow, 127051, Russia
- Pavel Shelyakin
  - A. A. Kharkevich Institute for Information Transmission Problems, RAS, Bolshoy Karetny per. 19, Moscow, 127051, Russia
  - N. I. Vavilov Institute of General Genetics, RAS, ul. Gubkina 3, Moscow, 119991, Russia
- Evgeniya Belousova
  - M. V. Lomonosov Moscow State University, Vorobievy Gory 1-73, Moscow, 119991, Russia
- Aleksandra Eremina
  - The University of Edinburgh, Alexander Crum Brown Rd, Edinburgh, Scotland, EH9 3FF, UK
- Uliana Shvyreva
  - Institute of Cell Biophysics, RAS, Institutskaya 3, Pushchino, 142290, Russia
- Darya Bykova
  - M. V. Lomonosov Moscow State University, Vorobievy Gory 1-73, Moscow, 119991, Russia
- Vera Emelianenko
  - M. V. Lomonosov Moscow State University, Vorobievy Gory 1-73, Moscow, 119991, Russia
- Maria Tutukina
  - Institute of Cell Biophysics, RAS, Institutskaya 3, Pushchino, 142290, Russia
  - Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, 143028, Russia
- Mikhail S Gelfand
  - A. A. Kharkevich Institute for Information Transmission Problems, RAS, Bolshoy Karetny per. 19, Moscow, 127051, Russia
  - M. V. Lomonosov Moscow State University, Vorobievy Gory 1-73, Moscow, 119991, Russia
  - Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, 143028, Russia
  - Faculty of Computer Science, Higher School of Economics, Kochnovsky pr. 3, Moscow, 125319, Russia
7
Kaznadzey A, Shelyakin P, Gelfand MS. Sugar Lego: gene composition of bacterial carbohydrate metabolism genomic loci. Biol Direct 2017; 12:28. [PMID: 29178959 PMCID: PMC5702140 DOI: 10.1186/s13062-017-0200-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/15/2017] [Accepted: 11/20/2017] [Indexed: 11/25/2022]
Abstract
Background: Bacterial carbohydrate metabolism is extremely diverse, since carbohydrates serve as a major energy source and are involved in a variety of cellular processes. Bacterial genes belonging to the same metabolic pathway are often co-localized in the chromosome, but this is not a strict rule. Gene co-localization is linked to co-evolution and co-regulation. This study focuses on a large-scale analysis of bacterial genomic loci related to carbohydrate metabolism.
Results: We demonstrate that only 53% of 148,000 studied genes from over six hundred bacterial genomes are co-localized with other carbohydrate metabolism genes, which points to a significant role of singleton genes. Co-localized genes form cassettes, ranging in size from two to fifteen genes. Two major factors influencing the cassette-forming tendency are gene function and bacterial phylogeny. We have obtained a comprehensive picture of co-localization preferences of genes for nineteen major carbohydrate metabolism functional classes, over two hundred gene orthologous clusters, and thirty bacterial classes, and characterized the cassette variety in size and content among different species, highlighting a significant role of short cassettes. The preference towards co-localization of carbohydrate metabolism genes varies between 40 and 76% for bacterial taxa. Analysis of frequently co-localized genes yielded forty-five significant pairwise links between genes belonging to different functional classes. The number of such links per class ranges from zero to eight, demonstrating varying preferences of respective genes towards a specific chromosomal neighborhood. Genes from eleven functional classes tend to co-localize with genes from the same class, indicating an important role of clustering of genes with similar functions. Moreover, in most cases such co-localization does not originate from local duplication events.
Conclusions: Overall, we describe a complex web formed by evolutionary relationships of bacterial carbohydrate metabolism genes, manifested as co-localization patterns.
Reviewers: This article was reviewed by Daria V. Dibrova (A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia), nominated by Armen Mulkidjanian (University of Osnabrück, Germany), Igor Rogozin (NCBI, NLM, NIH, USA) and Yuri Wolf (NCBI, NLM, NIH, USA).
Affiliation(s)
- Anna Kaznadzey
  - A.A.Kharkevich Institute for Information Transmission Problems, RAS, Bolshoy Karetny per. 19, Moscow, 127051, Russia
- Pavel Shelyakin
  - A.A.Kharkevich Institute for Information Transmission Problems, RAS, Bolshoy Karetny per. 19, Moscow, 127051, Russia
  - Vavilov Institute of General Genetics, Gubkin 3, Moscow, 119991, Russia
- Mikhail S Gelfand
  - A.A.Kharkevich Institute for Information Transmission Problems, RAS, Bolshoy Karetny per. 19, Moscow, 127051, Russia
  - Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, 143028, Russia
  - Faculty of Computer Science, Higher School of Economics, Kochnovsky pr. 3, Moscow, 125319, Russia
  - Faculty of Bioengineering and Bioinformatics, M.V.Lomonosov Moscow State University, Vorobievy Gory 1-73, Moscow, 119991, Russia
8
Chaiboonchoe A, Ghamsari L, Dohai B, Ng P, Khraiwesh B, Jaiswal A, Jijakli K, Koussa J, Nelson DR, Cai H, Yang X, Chang RL, Papin J, Yu H, Balaji S, Salehi-Ashtiani K. Systems level analysis of the Chlamydomonas reinhardtii metabolic network reveals variability in evolutionary co-conservation. MOLECULAR BIOSYSTEMS 2017; 12:2394-407. [PMID: 27357594 DOI: 10.1039/c6mb00237d] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
Abstract
Metabolic networks, which are mathematical representations of organismal metabolism, are reconstructed to provide computational platforms to guide metabolic engineering experiments and explore fundamental questions on metabolism. Systems level analyses, such as interrogation of phylogenetic relationships within the network, can provide further guidance on the modification of metabolic circuitries. Chlamydomonas reinhardtii, a biofuel relevant green alga that has retained key genes with plant, animal, and protist affinities, serves as an ideal model organism to investigate the interplay between gene function and phylogenetic affinities at multiple organizational levels. Here, using detailed topological and functional analyses, coupled with transcriptomics studies on a metabolic network that we have reconstructed for C. reinhardtii, we show that network connectivity has a significant concordance with the co-conservation of genes; however, a distinction between topological and functional relationships is observable within the network. Dynamic and static modes of co-conservation were defined and observed in a subset of gene-pairs across the network topologically. In contrast, genes with predicted synthetic interactions, or genes involved in coupled reactions, show significant enrichment for both shorter and longer phylogenetic distances. Based on our results, we propose that the metabolic network of C. reinhardtii is assembled with an architecture to minimize phylogenetic profile distances topologically, while it includes an expansion of such distances for functionally interacting genes. This arrangement may increase the robustness of C. reinhardtii's network in dealing with varied environmental challenges that the species may face. The defined evolutionary constraints within the network, which identify important pairings of genes in metabolism, may offer guidance on synthetic biology approaches to optimize the production of desirable metabolites.
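A phylogenetic profile distance, as mentioned in this abstract, can be illustrated with presence/absence vectors across a fixed set of genomes. The sketch below uses the Hamming distance, which is an assumption made for illustration; the abstract does not state which distance measure the authors used, and the profiles and the function name `profile_distance` are hypothetical.

```python
def profile_distance(profile_a, profile_b):
    """Hamming distance between two equal-length presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x != y for x, y in zip(profile_a, profile_b))


# Hypothetical profiles: 1 = ortholog present in a genome, 0 = absent.
gene1 = [1, 1, 0, 1, 0]
gene2 = [1, 0, 0, 1, 1]
print(profile_distance(gene1, gene2))
```

A small distance means two genes tend to be gained and lost together across genomes, which is the sense of co-conservation the abstract refers to.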
Affiliation(s)
- Amphun Chaiboonchoe
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE
- Lila Ghamsari
  - Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA, USA
- Bushra Dohai
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE
- Patrick Ng
  - Department of Biological Statistics and Computational Biology and Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Basel Khraiwesh
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE
- Ashish Jaiswal
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE
- Kenan Jijakli
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE
- Joseph Koussa
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE
- David R Nelson
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE
- Hong Cai
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE
- Xinping Yang
  - Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA, USA
- Roger L Chang
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Jason Papin
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
- Haiyuan Yu
  - Department of Biological Statistics and Computational Biology and Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Santhanam Balaji
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE. and Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA, USA and MRC Laboratory of Molecular Biology, Cambridge, UK
- Kourosh Salehi-Ashtiani
  - Laboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi and Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi Institute, Abu Dhabi, UAE. and Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA, USA
9
Poot-Hernandez AC, Rodriguez-Vazquez K, Perez-Rueda E. The alignment of enzymatic steps reveals similar metabolic pathways and probable recruitment events in Gammaproteobacteria. BMC Genomics 2015; 16:957. [PMID: 26578309 PMCID: PMC4647829 DOI: 10.1186/s12864-015-2113-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/03/2015] [Accepted: 10/19/2015] [Indexed: 11/29/2022]
Abstract
Background: It is generally accepted that gene duplication followed by functional divergence is one of the main sources of metabolic diversity. In this regard, there is an increasing interest in the development of methods that allow the systematic identification of these evolutionary events in metabolism. Here, we used a method not based on biomolecular sequence analysis to compare and identify common and variable routes in the metabolism of 40 Gammaproteobacteria species.
Method: The metabolic maps deposited in the KEGG database were transformed into linear Enzymatic Step Sequences (ESS) by using the breadth-first search algorithm. These ESS represent subsequent enzymes linked to each other, where their catalytic activities are encoded in the Enzyme Commission numbers. The ESS were compared in an all-against-all (pairwise comparisons) approach by using a dynamic programming algorithm, leaving only a set of significant pairs.
Results and conclusion: From these comparisons, we identified a set of functionally conserved enzymatic steps in different metabolic maps, in which cell wall components and fatty acid and lysine biosynthesis were included. In addition, we found that pathways associated with biosynthesis share a higher proportion of similar ESS than degradation pathways and secondary metabolism pathways. Also, maps associated with the metabolism of similar compounds contain a high proportion of similar ESS, such as those maps from nucleotide metabolism pathways, in particular the inosine monophosphate pathway. Furthermore, diverse ESS associated with the lower part of the glycolysis pathway were identified as functionally similar to multiple metabolic pathways. In summary, our comparisons may help to identify similar reactions in different metabolic pathways and could reinforce the patchwork model in the evolution of metabolism in Gammaproteobacteria.
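The pairwise comparison of Enzymatic Step Sequences by dynamic programming can be sketched as a global alignment over lists of EC numbers. The abstract does not specify the scoring scheme, so the choices below are assumptions for illustration: a Needleman-Wunsch style recurrence, a match score equal to the number of leading EC levels two enzymes share, and a gap penalty of -1. The function names and the example sequences are hypothetical, and the breadth-first linearization of KEGG maps is not shown.

```python
def ec_similarity(ec_a, ec_b):
    """Number of leading EC levels (0 to 4) shared by two EC numbers."""
    shared = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y:
            break
        shared += 1
    return shared


def align_score(ess_a, ess_b, gap=-1):
    """Global alignment score of two EC-number sequences (Needleman-Wunsch)."""
    rows, cols = len(ess_a) + 1, len(ess_b) + 1
    dp = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            match = dp[i - 1][j - 1] + ec_similarity(ess_a[i - 1], ess_b[j - 1])
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]


# Hypothetical step sequences that differ only in the last level of the first enzyme.
ess_a = ["2.7.1.1", "5.3.1.9", "2.7.1.11"]
ess_b = ["2.7.1.2", "5.3.1.9", "2.7.1.11"]
print(align_score(ess_a, ess_b))
```

Scoring by shared EC levels lets enzymes with the same reaction class but different substrates still count as partial matches, which is one plausible way to capture functional rather than sequence similarity.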
Affiliation(s)
- Augusto Cesar Poot-Hernandez
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, UNAM, Av. Universidad 2001, Cuernavaca, Morelos, CP 62210, México
  - Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, UNAM, Ciudad Universitaria, CP 04510, México D.F., México
- Katya Rodriguez-Vazquez
  - Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, UNAM, Ciudad Universitaria, CP 04510, México D.F., México
- Ernesto Perez-Rueda
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, UNAM, Av. Universidad 2001, Cuernavaca, Morelos, CP 62210, México
10
Induction of the Sugar-Phosphate Stress Response Allows Saccharomyces cerevisiae 2-Methyl-4-Amino-5-Hydroxymethylpyrimidine Phosphate Synthase To Function in Salmonella enterica. J Bacteriol 2015; 197:3554-62. [PMID: 26324451 DOI: 10.1128/jb.00576-15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/14/2015] [Accepted: 08/25/2015] [Indexed: 11/20/2022]
Abstract
Thiamine pyrophosphate is a required cofactor for all forms of life. The pyrimidine moiety of thiamine, 2-methyl-4-amino-5-hydroxymethylpyrimidine phosphate (HMP-P), is synthesized by different mechanisms in bacteria and plants compared to fungi. In this study, Salmonella enterica was used as a host to probe requirements for activity of the yeast HMP-P synthase, Thi5p. Thi5p synthesizes HMP-P from histidine and pyridoxal-5-phosphate and was reported to use a backbone histidine as the substrate, which would mean that it was a single-turnover enzyme. Heterologous expression of Thi5p did not complement an S. enterica HMP-P auxotroph during growth with glucose as the sole carbon source. Genetic analyses described here showed that Thi5p was activated in S. enterica by alleles of sgrR that induced the sugar-phosphate stress response. Deletion of ptsG (encodes enzyme IICB [EIICB] of the phosphotransferase system [PTS]) also allowed function of Thi5p and required sgrR but not sgrS. This result suggested that the role of sgrS in activation of Thi5p was to decrease PtsG activity. In total, the data herein supported the hypothesis that one mechanism to activate Thi5p in S. enterica grown on minimal medium containing glucose (minimal glucose medium) required decreased PtsG activity and an unidentified gene regulated by SgrR.
Importance: This work describes a metabolic link between the sugar-phosphate stress response and the yeast thiamine biosynthetic enzyme Thi5p when heterologously expressed in Salmonella enterica during growth on minimal glucose medium. Suppressor analysis (i) identified a mutant class of the regulator SgrR that activates the sugar-phosphate stress response constitutively and (ii) determined that Thi5p is conditionally active in S. enterica. These results emphasized the power of genetic systems in model organisms to uncover enzyme function and underlying metabolic network structure.
11
Park JM, Niestemski LR, Deem MW. Quasispecies theory for evolution of modularity. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 91:012714. [PMID: 25679649 PMCID: PMC4477872 DOI: 10.1103/physreve.91.012714] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/01/2014] [Indexed: 06/04/2023]
Abstract
Biological systems are modular, and this modularity evolves over time and in different environments. A number of observations have been made of increased modularity in biological systems under increased environmental pressure. We here develop a quasispecies theory for the dynamics of modularity in populations of these systems. We show how the steady-state fitness in a randomly changing environment can be computed. We derive a fluctuation dissipation relation for the rate of change of modularity and use it to derive a relationship between rate of environmental changes and rate of growth of modularity. We also find a principle of least action for the evolved modularity at steady state. Finally, we compare our predictions to simulations of protein evolution and find them to be consistent.
Affiliation(s)
- Jeong-Man Park
  - Departments of Physics & Astronomy and Bioengineering, Rice University, Houston, Texas 77005-1892, USA; Department of Physical and Biological Science, Western New England University, Springfield, Massachusetts 01119, USA; and Department of Physics, The Catholic University of Korea, Bucheon 420-743, Korea
- Liang Ren Niestemski
  - Departments of Physics & Astronomy and Bioengineering, Rice University, Houston, Texas 77005-1892, USA; Department of Physical and Biological Science, Western New England University, Springfield, Massachusetts 01119, USA; and Department of Physics, The Catholic University of Korea, Bucheon 420-743, Korea
- Michael W Deem
  - Departments of Physics & Astronomy and Bioengineering, Rice University, Houston, Texas 77005-1892, USA; Department of Physical and Biological Science, Western New England University, Springfield, Massachusetts 01119, USA; and Department of Physics, The Catholic University of Korea, Bucheon 420-743, Korea
12
Chen J, Ding Y, Xu W. Comparative analysis of metabolic networks in mesophilic and thermophilic archaea methanogens based on modularity. J Biol Syst 2013. [DOI: 10.1142/s0218339013500150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Metabolic networks are useful representations of the metabolic capabilities of cells. A comparison of metabolic networks across species is essential to better understand how evolutionary pressures shape these networks. By comparing the set of reactions that are expected to occur in an organism with the set of reactions in reference metabolic pathways, it is possible to infer the main metabolic functions of an organism. In this paper, the metabolic networks of the mesophilic archaeon Methanosarcina acetivorans and the thermophilic archaeon Methanopyrus kandleri were reconstructed from the KEGG LIGAND database, and four topological statistics of the nodes were computed to compare the two networks. The average degree and characteristic path length are very small, whereas the clustering coefficient is relatively large, which shows that the complete metabolic networks of M. acetivorans and M. kandleri possess "small-world" network properties. We then used the Girvan–Newman algorithm to identify hub modules and compared them with non-hub modules. The results show that the M. kandleri metabolic network has a better modular organization than the M. acetivorans network. The M. acetivorans network comprises 39 modules, of which 25 are independent and 15 are functionally pure. The M. kandleri network comprises 30 modules, of which 20 are independent and 14 are functionally pure. These results further indicate that the present approach yields modules with biologically significant functions. We also identified the hub modules of the metabolic networks and found that they correspond to carbohydrate metabolism and amino acid metabolism. The conclusions obtained from such studies provide a broad overview of the similarities and differences between the organisms' metabolic networks and should be helpful for further research on the thermostability of methanogens.
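The three topological statistics named in this abstract (average degree, clustering coefficient, characteristic path length) can be illustrated with a minimal sketch. The toy graph and node names below are hypothetical and are not taken from the paper; the code uses only the Python standard library and treats the network as undirected and unweighted, which is an assumption.

```python
from collections import deque
from itertools import combinations


def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(average_degree(adj), clustering_coefficient(adj), characteristic_path_length(adj))
```

A network is usually called "small-world" when its clustering coefficient is much larger, and its characteristic path length comparable, relative to a random graph of the same size and density; that comparison is not included in this sketch.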
Affiliation(s)
- Jing Chen
- Department of Computer Science and Technology, School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Ave., Wuxi, Jiangsu 214122, P. R. China
- Yanrui Ding
- Department of Computer Science and Technology, School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Ave., Wuxi, Jiangsu 214122, P. R. China
- The Key Laboratory of Industrial Biotechnology, Ministry of Education, Wuxi 214122, Jiangsu, P. R. China
- Wenbo Xu
- Department of Computer Science and Technology, School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Ave., Wuxi, Jiangsu 214122, P. R. China
|
13
|
Global probabilistic annotation of metabolic networks enables enzyme discovery. Nat Chem Biol 2013; 8:848-54. [PMID: 22960854 PMCID: PMC3696893 DOI: 10.1038/nchembio.1063] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 05/14/2012] [Accepted: 08/07/2012] [Indexed: 11/08/2022]
Abstract
Annotation of organism-specific metabolic networks is one of the main challenges of systems biology. Importantly, due to inherent uncertainty of computational annotations, predictions of biochemical function need to be treated probabilistically. We present a global probabilistic approach to annotate genome-scale metabolic networks that integrates sequence homology and context-based correlations under a single principled framework. The developed method for Global Biochemical reconstruction Using Sampling (GLOBUS) not only provides annotation probabilities for each functional assignment, but also suggests likely alternative functions. GLOBUS is based on statistical Gibbs sampling of probable metabolic annotations and is able to make accurate functional assignments even in cases of remote sequence identity to known enzymes. We apply GLOBUS to genomes of Bacillus subtilis and Staphylococcus aureus, and validate the method predictions by experimentally demonstrating the 6-phosphogluconolactonase activity of ykgB and the role of the sps pathway for rhamnose biosynthesis in B. subtilis.
|
14
|
|
15
|
Muley VY, Ranjan A. Evaluation of physical and functional protein-protein interaction prediction methods for detecting biological pathways. PLoS One 2013; 8:e54325. [PMID: 23349851 PMCID: PMC3547882 DOI: 10.1371/journal.pone.0054325] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/07/2012] [Accepted: 12/11/2012] [Indexed: 11/18/2022]
Abstract
BACKGROUND Cellular activities are governed by the physical and the functional interactions among several proteins involved in various biological pathways. With the availability of sequenced genomes and high-throughput experimental data one can identify genome-wide protein-protein interactions using various computational techniques. Comparative assessments of these techniques in predicting protein interactions have been frequently reported in the literature but not their ability to elucidate a particular biological pathway. METHODS Towards the goal of understanding the prediction capabilities of interactions among the specific biological pathway proteins, we report the analyses of 14 biological pathways of Escherichia coli catalogued in KEGG database using five protein-protein functional linkage prediction methods. These methods are phylogenetic profiling, gene neighborhood, co-presence of orthologous genes in the same gene clusters, a mirrortree variant, and expression similarity. CONCLUSIONS Our results reveal that the prediction of metabolic pathway protein interactions continues to be a challenging task for all methods which possibly reflect flexible/independent evolutionary histories of these proteins. These methods have predicted functional associations of proteins involved in amino acids, nucleotide, glycans and vitamins & co-factors pathways slightly better than the random performance on carbohydrate, lipid and energy metabolism. We also make similar observations for interactions involved among the environmental information processing proteins. On the contrary, genetic information processing or specialized processes such as motility related protein-protein linkages that occur in the subset of organisms are predicted with comparable accuracy. Metabolic pathways are best predicted by using neighborhood of orthologous genes whereas phyletic pattern is good enough to reconstruct central dogma pathway protein interactions. 
We have also shown that the effective use of a particular prediction method depends on the pathway under investigation. If one is not focused on a specific pathway, the gene expression similarity method is the best option.
Affiliation(s)
- Vijaykumar Yogesh Muley
- Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India
- Akash Ranjan
- Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India
|
16
|
Psomopoulos FE, Mitkas PA, Ouzounis CA. Detection of genomic idiosyncrasies using fuzzy phylogenetic profiles. PLoS One 2013; 8:e52854. [PMID: 23341912 PMCID: PMC3544837 DOI: 10.1371/journal.pone.0052854] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 02/14/2012] [Accepted: 11/22/2012] [Indexed: 11/18/2022]
Abstract
Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis with increased performance in speed and quality. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e. sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process where genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, followed by a de-noising step, and finally a comparison step to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration for the majority of the cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.
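The discretization and comparison steps described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the 0.5 threshold, the Jaccard distance, and the gene names and scores are all hypothetical choices, not the similarity metrics or parameters of the published method, and the de-noising step is omitted entirely.

```python
def discretize(fuzzy_profile, threshold=0.5):
    """Turn a fuzzy profile (scores in [0, 1]) into a binary presence/absence vector.

    The threshold value is an assumption made for this sketch.
    """
    return tuple(1 if score >= threshold else 0 for score in fuzzy_profile)


def jaccard_distance(p, q):
    """Jaccard distance between two binary profiles of equal length."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical fuzzy profiles: one score per reference genome (five genomes).
fuzzy = {
    "geneA": [0.9, 0.8, 0.1, 0.0, 0.7],
    "geneB": [0.8, 0.9, 0.2, 0.1, 0.6],
    "geneC": [0.1, 0.0, 0.9, 0.8, 0.2],
}
binary = {gene: discretize(profile) for gene, profile in fuzzy.items()}

for g1, g2 in [("geneA", "geneB"), ("geneA", "geneC")]:
    print(g1, g2, jaccard_distance(binary[g1], binary[g2]))
```

In this toy data, geneA and geneB share the same presence pattern and so have distance 0, while geneC is present only where geneA is absent and has distance 1. Genes whose profiles sit far from the rest of their own genome are the kind of outlier the abstract calls a genomic idiosyncrasy.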
Affiliation(s)
- Fotis E. Psomopoulos
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Pericles A. Mitkas
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Christos A. Ouzounis
- Centre for Bioinformatics, Department of Informatics, School of Natural and Mathematical Sciences, King’s College London, Strand, London, United Kingdom
|
17
|
Muley VY, Ranjan A. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction. PLoS One 2012; 7:e42057. [PMID: 22844541 PMCID: PMC3406042 DOI: 10.1371/journal.pone.0042057] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 12/03/2011] [Accepted: 07/02/2012] [Indexed: 12/20/2022]
Abstract
Background Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. Methods We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Conclusions Higher performance for predicting protein-protein interactions was achievable even with 100–150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. 
Moreover, such a sampling allows for selecting 50–100 genomes for comparable accuracy of predictions when computational resources are limited.
Affiliation(s)
- Vijaykumar Yogesh Muley
- Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India
- Department of Biotechnology, Dr. Babasaheb Ambedkar Marathwada University, Sub-centre, Osmanabad, Maharashtra, India
- Akash Ranjan
- Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India
|
18
|
Doerks T, van Noort V, Minguez P, Bork P. Annotation of the M. tuberculosis hypothetical orfeome: adding functional information to more than half of the uncharacterized proteins. PLoS One 2012; 7:e34302. [PMID: 22485162 PMCID: PMC3317503 DOI: 10.1371/journal.pone.0034302] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 12/06/2011] [Accepted: 02/26/2012] [Indexed: 11/18/2022]
Abstract
The genome of Mycobacterium tuberculosis (H37Rv) contains 4,019 protein coding genes, of which more than a thousand have been categorized as ‘hypothetical’, implying that for these not even weak functional associations could be identified so far. We here predict reliable functional indications for half of this large hypothetical orfeome: 497 genes can be annotated based on orthology, and another 125 can be linked to interacting proteins via integrated genomic context analysis and literature mining. The assignments include newly identified clusters of interacting proteins, hypothetical genes that are associated with well-known pathways and putative disease-relevant targets. Altogether, we have raised the fraction of the proteome with at least some functional annotation to 88%, which should considerably enhance the interpretation of large-scale experiments targeting this medically important organism.
Affiliation(s)
- Tobias Doerks
- European Molecular Biology Laboratory, Heidelberg, Germany.
|
19
|
Chae L, Lee I, Shin J, Rhee SY. Towards understanding how molecular networks evolve in plants. Curr Opin Plant Biol 2012; 15:177-84. [PMID: 22280840 DOI: 10.1016/j.pbi.2012.01.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 10/14/2011] [Revised: 12/20/2011] [Accepted: 01/05/2012] [Indexed: 05/02/2023]
Abstract
Residing beneath the phenotypic landscape of a plant are intricate and dynamic networks of genes and proteins. As evolution operates on phenotypes, we expect its forces to somehow shape these underlying molecular networks. In this review, we discuss progress being made to elucidate the nature of these forces and their impact on the composition and structure of molecular networks. We also outline current limitations and open questions facing the broader field of plant network analysis.
Affiliation(s)
- Lee Chae
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA.
|
20
|
Judson RS, Mortensen HM, Shah I, Knudsen TB, Elloumi F. Using pathway modules as targets for assay development in xenobiotic screening. Mol Biosyst 2012; 8:531-42. [DOI: 10.1039/c1mb05303e] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022]
|
21
|
Wang X, Yue J, Ren X, Wang Y, Tan M, Li B, Liang L. Modularity analysis based on predicted protein-protein interactions provides new insights into pathogenicity and cellular process of Escherichia coli O157:H7. Theor Biol Med Model 2011; 8:47. [PMID: 22188601 PMCID: PMC3275473 DOI: 10.1186/1742-4682-8-47] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/27/2011] [Accepted: 12/22/2011] [Indexed: 12/19/2022]
Abstract
Background With the development of experimental techniques and bioinformatics, the quantity of data available from protein-protein interactions (PPIs) is increasing exponentially. Functional modules can be identified from protein interaction networks. It follows that the investigation of functional modules will generate a better understanding of cellular organization, processes, and functions. However, experimental PPI data are still limited, and no modularity analysis of PPIs in pathogens has been published to date. Results In this study, we predict and analyze the functional modules of E. coli O157:H7 systemically by integrating several bioinformatics methods. After evaluation, most of the predicted modules are found to be biologically significant and functionally homogeneous. Six pathogenicity-related modules were discovered and analyzed, including novel modules. These modules provided new information on the pathogenicity of O157:H7. The modularity of cellular function and cooperativity between modules are also discussed. Moreover, modularity analysis of O157:H7 can provide possible candidates for biological pathway extension and clues for discovering new pathways of cross-talk. Conclusions This article provides the first modularity analysis of a pathogen and sheds new light on the study of pathogens and cellular processes. Our study also provides a strategy for applying modularity analysis to any sequenced organism.
Affiliation(s)
- Xia Wang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing 100071, China
|
22
|
Brouwers L, Iskar M, Zeller G, van Noort V, Bork P. Network neighbors of drug targets contribute to drug side-effect similarity. PLoS One 2011; 6:e22187. [PMID: 21765950 PMCID: PMC3135612 DOI: 10.1371/journal.pone.0022187] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.1] [Received: 01/28/2011] [Accepted: 06/19/2011] [Indexed: 12/31/2022]
Abstract
In pharmacology, it is essential to identify the molecular mechanisms of drug action in order to understand adverse side effects. These adverse side effects have been used to infer whether two drugs share a target protein. However, side-effect similarity of drugs could also be caused by their target proteins being close in a molecular network, which as such could cause similar downstream effects. In this study, we investigated the proportion of side-effect similarities that is due to targets that are close in the network compared to shared drug targets. We found that only a minor fraction of side-effect similarities (5.8%) are caused by drugs targeting proteins close in the network, compared to side-effect similarities caused by overlapping drug targets (64%). Moreover, these targets that cause similar side effects are more often in a linear part of the network, having two or fewer interactions, than drug targets in general. Based on the examples, we gained novel insight into the molecular mechanisms of side effects associated with several drug targets. Looking forward, such analyses will be extremely useful in the process of drug development to better understand adverse side effects.
Affiliation(s)
- Lucas Brouwers
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Murat Iskar
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Georg Zeller
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Vera van Noort
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max-Delbruck-Centre for Molecular Medicine, Berlin-Buch, Germany
|
23
|
Raes J, Letunic I, Yamada T, Jensen LJ, Bork P. Toward molecular trait-based ecology through integration of biogeochemical, geographical and metagenomic data. Mol Syst Biol 2011; 7:473. [PMID: 21407210 PMCID: PMC3094067 DOI: 10.1038/msb.2011.6] [Citation(s) in RCA: 148] [Impact Index Per Article: 11.4] [Received: 05/04/2010] [Accepted: 01/25/2011] [Indexed: 11/10/2022]
Abstract
Using metagenomic ‘parts lists' to study microbial ecology remains a significant challenge. This work proposes a molecular trait-based approach to biogeography by integrating metagenomic data with external metadata and using functional community composition as readout. Climatic factors drive functional and phylogenetic composition of ocean microbial communities. Function dispersal is controlled by environmental conditions. Functional richness has a clear latitudinal gradient and correlates with primary production. Metagenomic data can be used as a predictor for ecosystem processes. To understand the relationship between community composition and environment, functional readouts are the most direct. Metagenomic data enable such trait-based ecology at the molecular level.
Metagenomics (shotgun sequencing of pooled DNA of complete microbial communities) is widely used to investigate ecosystem functioning of environmental and clinical samples. However, the nature of this data (usually a gigantic collection of gene fragments of 1000s of organisms) makes it very hard to infer global patterns on microbial ecology of the environment at hand. To address important ecological questions such as ‘How do microbial communities adapt to the environmental conditions?', ‘What drives the functional variation across the globe and to what extent do genes disperse?' and ‘What drives variation of CO2 uptake across different locations and communities?', we integrated 25 ocean metagenomes from the Global Ocean Sampling project with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the functional and phylogenetic composition of an environment and the main limiting factor on whether functions disperse across the planet. We find a distinct latitudinal gradient in the size and diversity of the functional repertoire of ocean microbial communities, peaking at 20°N, and which correlates with oceanic CO2 uptake. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes can be used as quantitative predictor for molecular trait-based biogeography and ecology.
Using metagenomic ‘parts lists' to infer global patterns on microbial ecology remains a significant challenge. To deduce important ecological indicators such as environmental adaptation, molecular trait dispersal, diversity variation and primary production from the gene pool of an ecosystem, we integrated 25 ocean metagenomes with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the biomolecular repertoire of each sample and the main limiting factor on functional trait dispersal (absence of biogeographic provincialism). Molecular functional richness and diversity show a distinct latitudinal gradient peaking at 20°N and correlate with primary production. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes is an important quantitative readout for molecular trait-based biogeography and ecology.
Affiliation(s)
- Jeroen Raes
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
|
24
|
Konietzny SG, Dietz L, McHardy AC. Inferring functional modules of protein families with probabilistic topic models. BMC Bioinformatics 2011; 12:141. [PMID: 21554720 PMCID: PMC3098182 DOI: 10.1186/1471-2105-12-141] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/24/2010] [Accepted: 05/09/2011] [Indexed: 01/15/2023]
Abstract
BACKGROUND Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. RESULTS We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules. CONCLUSIONS We show that our method identifies protein modules - some of which correspond to well-known biological processes - that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.
Affiliation(s)
- Sebastian Ga Konietzny
- Max Planck Research Group for Computational Genomics and Epidemiology, Max Planck Institute for Informatics, University Campus E1 4, 66123 Saarbrücken, Germany
|
25
|
Kumar M, Balaji PV. Comparative genomics analysis of completely sequenced microbial genomes reveals the ubiquity of N-linked glycosylation in prokaryotes. Mol Biosyst 2011; 7:1629-45. [PMID: 21387023 DOI: 10.1039/c0mb00259c] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Glycosylation of proteins in prokaryotes has been known for the last few decades. Glycan structures and/or the glycosylation pathways have been experimentally characterized in only a small number of prokaryotes. Even this has become possible only during the last decade or so, primarily due to technological and methodological developments. Glycosylated proteins are diverse in their function and localization. Glycosylation has been shown to be associated with a wide range of biological phenomena. Characterization of the various types of glycans and the glycosylation machinery is critical to understand such processes. Such studies can help in the identification of novel targets for designing drugs, diagnostics, and engineering of therapeutic proteins. In view of this, the experimentally characterized pgl system of Campylobacter jejuni, responsible for N-linked glycosylation, has been used in this study to identify glycosylation loci in 865 prokaryotes whose genomes have been completely sequenced. Results from the present study show that only a small number of organisms have homologs for all the pgl enzymes and a few others have homologs for none of the pgl enzymes. Most of the organisms have homologs for only a subset of the pgl enzymes. There is no specific pattern for the presence or absence of pgl homologs vis-à-vis the 16S rRNA sequence-based phylogenetic tree. This may be due to differences in the glycan structures, high sequence divergence, horizontal gene transfer or non-orthologous gene displacement. Overall, the presence of homologs for pgl enzymes in a large number of organisms irrespective of their habitat, pathogenicity, energy generation mechanism, etc., hints towards the ubiquity of N-linked glycosylation in prokaryotes.
Affiliation(s)
- Manjeet Kumar
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400 076, India
|
26
|
|
27
|
The emergence of modularity in biological systems. Phys Life Rev 2011; 8:129-60. [PMID: 21353651 DOI: 10.1016/j.plrev.2011.02.003] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Received: 01/05/2011] [Accepted: 02/09/2011] [Indexed: 11/22/2022]
Abstract
In this review, we discuss modularity and hierarchy in biological systems. We review examples from protein structure, genetics, and biological networks of modular partitioning of the geometry of biological space. We review theories to explain modular organization of biology, with a focus on explaining how biology may spontaneously organize to a structured form. That is, we seek to explain how biology nucleated from among the many possibilities in chemistry. The emergence of modular organization of biological structure will be described as a symmetry-breaking phase transition, with modularity as the order parameter. Experimental support for this description will be reviewed. Examples will be presented from pathogen structure, metabolic networks, gene networks, and protein-protein interaction networks. Additional examples will be presented from ecological food networks, developmental pathways, physiology, and social networks.
|
28
|
Xu G, Bennett L, Papageorgiou LG, Tsoka S. Module detection in complex networks using integer optimisation. Algorithms Mol Biol 2010; 5:36. [PMID: 21073720 PMCID: PMC2993711 DOI: 10.1186/1748-7188-5-36] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 04/27/2010] [Accepted: 11/12/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The detection of modules or community structure is widely used to reveal the underlying properties of complex networks in biology, as well as physical and social sciences. Since the adoption of modularity as a measure of network topological properties, several methodologies for the discovery of community structure based on modularity maximisation have been developed. However, satisfactory partitions of large graphs with modest computational resources are particularly challenging due to the NP-hard nature of the related optimisation problem. Furthermore, it has been suggested that optimising the modularity metric can reach a resolution limit whereby the algorithm fails to detect smaller communities than a specific size in large networks. RESULTS We present a novel solution approach to identify community structure in large complex networks and address resolution limitations in module detection. The proposed algorithm employs modularity to express network community structure and it is based on mixed integer optimisation models. The solution procedure is extended through an iterative procedure to diminish effects that tend to agglomerate smaller modules (resolution limitations). CONCLUSIONS A comprehensive comparative analysis of methodologies for module detection based on modularity maximisation shows that our approach outperforms previously reported methods. Furthermore, in contrast to previous reports, we propose a strategy to handle resolution limitations in modularity maximisation. Overall, we illustrate ways to improve existing methodologies for community structure identification so as to increase its efficiency and applicability.
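The modularity measure that this abstract optimises can be made concrete with a small sketch. The code below only evaluates Newman's modularity Q for a given partition of an undirected, unweighted graph, using Q = Σ_c [ e_c / m − (d_c / 2m)² ], where e_c is the number of edges inside community c, d_c the total degree of its nodes, and m the number of edges. It is not the mixed integer optimisation procedure of the paper, and the toy graph and community labels are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = {}    # e_c: edges with both endpoints in community c
    degree_sum = {}  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}

print(modularity(edges, two_modules))  # 5/14, about 0.357
print(modularity(edges, one_module))   # 0.0
```

Splitting the graph at the bridge scores higher than lumping all nodes together, which is the behaviour a modularity-maximising method exploits. The resolution limit mentioned in the abstract arises because, in large networks, merging small well-separated modules can still increase Q.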
|
29
|
Circulating brain-derived neurotrophic factor and indices of metabolic and cardiovascular health: data from the Baltimore Longitudinal Study of Aging. PLoS One 2010; 5:e10099. [PMID: 20404913 PMCID: PMC2852401 DOI: 10.1371/journal.pone.0010099] [Citation(s) in RCA: 144] [Impact Index Per Article: 10.3] [Received: 09/24/2009] [Accepted: 03/10/2010] [Indexed: 12/31/2022]
Abstract
BACKGROUND Besides its well-established role in nerve cell survival and adaptive plasticity, brain-derived neurotrophic factor (BDNF) is also involved in energy homeostasis and cardiovascular regulation. Although BDNF is present in the systemic circulation, it is unknown whether plasma BDNF correlates with circulating markers of dysregulated metabolism and an adverse cardiovascular profile. METHODOLOGY/PRINCIPAL FINDINGS To determine whether circulating BDNF correlates with indices of metabolic and cardiovascular health, we measured plasma BDNF levels in 496 middle-age and elderly subjects (mean age approximately 70), in the Baltimore Longitudinal Study of Aging. Linear regression analysis revealed that plasma BDNF is associated with risk factors for cardiovascular disease and metabolic syndrome, regardless of age. In females, BDNF was positively correlated with BMI, fat mass, diastolic blood pressure, total cholesterol, and LDL-cholesterol, and inversely correlated with folate. In males, BDNF was positively correlated with diastolic blood pressure, triglycerides, free triiodothyronine (FT3), and bioavailable testosterone, and inversely correlated with sex-hormone binding globulin and adiponectin. CONCLUSION/SIGNIFICANCE Plasma BDNF significantly correlates with multiple risk factors for metabolic syndrome and cardiovascular dysfunction. Whether BDNF contributes to the pathogenesis of these disorders or functions in adaptive responses to cellular stress (as occurs in the brain) remains to be determined.
|
30
|
Kanapin AA, Mulder N, Kuznetsov VA. Projection of gene-protein networks to the functional space of the proteome and its application to analysis of organism complexity. BMC Genomics 2010; 11 Suppl 1:S4. [PMID: 20158875 PMCID: PMC2822532 DOI: 10.1186/1471-2164-11-s1-s4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
We consider the problem of biological complexity via a projection of protein-coding genes of complex organisms onto the functional space of the proteome. The latter can be defined as the set of all functions performed by the proteins of an organism. Alternative splicing (AS) allows an organism to generate diverse mature RNA transcripts from a single mRNA strand, and thus it could be one of the key mechanisms for increasing the functional complexity of the organism's proteome and a driving force of biological evolution. Thus, the projection of transcription units (TU) and alternative splice-variant (SV) forms onto proteome functional space could generate new types of relational networks (e.g. SV-protein function networks, SFN) and lead to discoveries of novel evolutionarily conserved functional modules. Such networks might provide new reliable characteristics of organism complexity and a better understanding of the evolutionary integration and plasticity of the interconnection of genome-transcriptome-proteome functions.
|
31
|
Reid AJ, Ranea JA, Orengo CA. Comparative evolutionary analysis of protein complexes in E. coli and yeast. BMC Genomics 2010; 11:79. [PMID: 20122144 PMCID: PMC2837643 DOI: 10.1186/1471-2164-11-79] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 04/16/2009] [Accepted: 02/01/2010] [Indexed: 11/17/2022]
Abstract
Background Proteins do not act in isolation; they frequently act together in protein complexes to carry out concerted cellular functions. The evolution of complexes is poorly understood, especially in organisms other than yeast, where little experimental data has been available. Results We generated accurate, high coverage datasets of protein complexes for E. coli and yeast in order to study differences in the evolution of complexes between these two species. We show that substantial differences exist in how complexes have evolved between these organisms. A previously proposed model of complex evolution identified complexes with cores of interacting homologues. We support findings of the relative importance of this mode of evolution in yeast, but find that it is much less common in E. coli. Additionally it is shown that those homologues which do cluster in complexes are involved in eukaryote-specific functions. Furthermore we identify correlated pairs of non-homologous domains which occur in multiple protein complexes. These were identified in both yeast and E. coli and we present evidence that these too may represent complex cores in yeast but not those of E. coli. Conclusions Our results suggest that there are differences in the way protein complexes have evolved in E. coli and yeast. Whereas some yeast complexes have evolved by recruiting paralogues, this is not apparent in E. coli. Furthermore, such complexes are involved in eukaryotic-specific functions. This implies that the increase in gene family sizes seen in eukaryotes in part reflects multiple family members being used within complexes. However, in general, in both E. coli and yeast, homologous domains are used in different complexes.
Affiliation(s)
- Adam J Reid
- Research Department of Structural & Molecular Biology, University College London, London, WC1E 6BT, UK.
|
32
|
Abstract
Interactions among cellular constituents play a crucial role in overall cellular function and organization. These interactions can be viewed as being complementary to the usual "parts list" of genes and proteins and, in conjunction with the expression states of these parts, are key to a systems level understanding of the cell. Here, we review computational approaches to the understanding of the functional roles of cellular networks, ranging from "static" models of network topology to dynamical and stochastic simulations.
|
33
|
|
34
|
'Unknown' proteins and 'orphan' enzymes: the missing half of the engineering parts list--and how to find it. Biochem J 2009; 425:1-11. [PMID: 20001958 DOI: 10.1042/bj20091328] [Citation(s) in RCA: 135] [Impact Index Per Article: 9.0] [Indexed: 11/17/2022]
Abstract
Like other forms of engineering, metabolic engineering requires knowledge of the components (the 'parts list') of the target system. Lack of such knowledge impairs both rational engineering design and diagnosis of the reasons for failures; it also poses problems for the related field of metabolic reconstruction, which uses a cell's parts list to recreate its metabolic activities in silico. Despite spectacular progress in genome sequencing, the parts lists for most organisms that we seek to manipulate remain highly incomplete, due to the dual problem of 'unknown' proteins and 'orphan' enzymes. The former are all the proteins deduced from genome sequence that have no known function, and the latter are all the enzymes described in the literature (and often catalogued in the EC database) for which no corresponding gene has been reported. Unknown proteins constitute up to about half of the proteins in prokaryotic genomes, and much more than this in higher plants and animals. Orphan enzymes make up more than a third of the EC database. Attacking the 'missing parts list' problem is accordingly one of the great challenges for post-genomic biology, and a tremendous opportunity to discover new facets of life's machinery. Success will require a co-ordinated community-wide attack, sustained over years. In this attack, comparative genomics is probably the single most effective strategy, for it can reliably predict functions for unknown proteins and genes for orphan enzymes. Furthermore, it is cost-efficient and increasingly straightforward to deploy owing to a proliferation of databases and associated tools.
|
35
|
Evolution of biomolecular networks: lessons from metabolic and protein interactions. Nat Rev Mol Cell Biol 2009; 10:791-803. [PMID: 19851337 DOI: 10.1038/nrm2787] [Citation(s) in RCA: 144] [Impact Index Per Article: 9.6] [Indexed: 12/14/2022]
Abstract
Despite only becoming popular at the beginning of this decade, biomolecular networks are now frameworks that facilitate many discoveries in molecular biology. The nodes of these networks are usually proteins (specifically enzymes in metabolic networks), whereas the links (or edges) are their interactions with other molecules. These networks are made up of protein-protein interactions or enzyme-enzyme interactions through shared metabolites in the case of metabolic networks. Evolutionary analysis has revealed that changes in the nodes and links in protein-protein interaction and metabolic networks are subject to different selection pressures owing to distinct topological features. However, many evolutionary constraints can be uncovered only if temporal and spatial aspects are included in the network analysis.
|
36
|
|
37
|
Song J, Singh M. How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics 2009; 25:3143-50. [PMID: 19770263 PMCID: PMC3167697 DOI: 10.1093/bioinformatics/btp551] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 11/14/2022]
Abstract
Motivation: Clustering of protein–protein interaction networks is one of the most common approaches for predicting functional modules, protein complexes and protein functions. But how well does clustering perform at these tasks? Results: We develop a general framework to assess how well computationally derived clusters in physical interactomes overlap functional modules derived via the Gene Ontology (GO). Using this framework, we evaluate six diverse network clustering algorithms using Saccharomyces cerevisiae and show that (i) the performances of these algorithms can differ substantially when run on the same network and (ii) their relative performances change depending upon the topological characteristics of the network under consideration. For the specific task of function prediction in S. cerevisiae, we demonstrate that, surprisingly, a simple non-clustering guilt-by-association approach outperforms widely used clustering-based approaches that annotate a protein with the overrepresented biological process and cellular component terms in its cluster; this is true over the range of clustering algorithms considered. Further analysis parameterizes performance based on the number of annotated proteins, and suggests when clustering approaches should be used for interactome functional analyses. Overall, our results suggest a re-examination of when and how clustering approaches should be applied to physical interactomes, and establish guidelines by which novel clustering approaches for biological networks should be justified and evaluated with respect to functional analysis. Contact: msingh@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online.
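As an illustration of the guilt-by-association idea mentioned in this abstract, the sketch below assigns a protein the annotation term that is most common among its direct interaction partners. The abstract does not say how the authors score neighbours, so the majority vote, the alphabetical tie-break, the function name `predict_function` and the toy data are all assumptions made for this example.

```python
from collections import Counter


def predict_function(protein, interactions, annotations):
    """Guess a term for `protein` by majority vote over its direct neighbours.

    `interactions` is an iterable of undirected (a, b) pairs and
    `annotations` maps a protein to a set of terms. Both are hypothetical
    inputs for this sketch, not a format taken from the cited paper.
    """
    neighbours = set()
    for a, b in interactions:
        if a == protein:
            neighbours.add(b)
        elif b == protein:
            neighbours.add(a)
    neighbours.discard(protein)  # ignore self-interactions

    votes = Counter()
    for neighbour in neighbours:
        votes.update(annotations.get(neighbour, ()))
    if not votes:
        return None  # no annotated neighbour, so no prediction

    # Break ties alphabetically so the result is deterministic.
    best = max(votes.values())
    return min(term for term, count in votes.items() if count == best)


# Toy network: P1 is unannotated and has three annotated partners.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
annotations = {
    "P2": {"translation"},
    "P3": {"translation", "transport"},
    "P4": {"DNA repair"},
}

print(predict_function("P1", interactions, annotations))
```

A clustering-based predictor would instead pool annotations over every member of the cluster that contains the protein; the neighbour vote above uses only direct interaction partners.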
Affiliation(s)
- Jimin Song
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics Princeton University, Princeton, NJ 08544, USA
|
38
|
Wagner A. Evolutionary constraints permeate large metabolic networks. BMC Evol Biol 2009; 9:231. [PMID: 19747381 PMCID: PMC2753571 DOI: 10.1186/1471-2148-9-231] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 02/17/2009] [Accepted: 09/11/2009] [Indexed: 11/22/2022]
Abstract
BACKGROUND Metabolic networks show great evolutionary plasticity, because they can differ substantially even among closely related prokaryotes. Any one metabolic network can also effectively compensate for the blockage of individual reactions by rerouting metabolic flux through other pathways. These observations, together with the continual discovery of new microbial metabolic pathways and enzymes, raise the possibility that metabolic networks are only weakly constrained in changing their complement of enzymatic reactions. RESULTS To ask whether this is the case, I characterized pairwise and higher-order associations in the co-occurrence of genes encoding metabolic enzymes in more than 200 completely sequenced representatives of prokaryotic genera. The majority of reactions show constrained evolution. Specifically, genes encoding most reactions tend to co-occur with genes encoding other reaction(s). Constrained reaction pairs occur in small sets whose number is substantially greater than expected by chance alone. Most such sets are associated with single biochemical pathways. The respective genes are not always tightly linked, which renders horizontal co-transfer of constrained reaction sets an unlikely sole cause for these patterns of association. CONCLUSION Even a limited number of available genomes suffices to show that metabolic network evolution is highly constrained by reaction combinations that are favored by natural selection. With increasing numbers of completely sequenced genomes, an evolutionary constraint-based approach may enable a detailed characterization of co-evolving metabolic modules.
Affiliation(s)
- Andreas Wagner
- University of Zurich, Dept. of Biochemistry, CH-8057 Zurich, Switzerland.
|
39
|
Rentzsch R, Orengo CA. Protein function prediction--the power of multiplicity. Trends Biotechnol 2009; 27:210-9. [PMID: 19251332 DOI: 10.1016/j.tibtech.2009.01.002] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Received: 11/21/2008] [Revised: 01/21/2009] [Accepted: 01/23/2009] [Indexed: 01/07/2023]
Abstract
Advances in experimental and computational methods have quietly ushered in a new era in protein function annotation. This 'age of multiplicity' is marked by the notion that only the use of multiple tools, multiple evidence and considering the multiple aspects of function can give us the broad picture that 21st century biology will need to link and alter micro- and macroscopic phenotypes. It might also help us to undo past mistakes by removing errors from our databases and prevent us from producing more. On the downside, multiplicity is often confusing. We therefore systematically review methods and resources for automated protein function prediction, looking at individual (biochemical) and contextual (network) functions, respectively.
Affiliation(s)
- Robert Rentzsch
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.
|
40
|
Wang H, Kakaradov B, Collins SR, Karotki L, Fiedler D, Shales M, Shokat KM, Walther TC, Krogan NJ, Koller D. A complex-based reconstruction of the Saccharomyces cerevisiae interactome. Mol Cell Proteomics 2009; 8:1361-81. [PMID: 19176519 PMCID: PMC2690481 DOI: 10.1074/mcp.m800490-mcp200] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Indexed: 11/30/2022]
Abstract
Most cellular processes are performed by proteomic units that interact with each other. These units are often stoichiometrically stable complexes comprised of several proteins. To obtain a faithful view of the protein interactome we must view it in terms of these basic units (complexes and proteins) and the interactions between them. This study makes two contributions toward this goal. First, it provides a new algorithm for reconstruction of stable complexes from a variety of heterogeneous biological assays; our approach combines state-of-the-art machine learning methods with a novel hierarchical clustering algorithm that allows clusters to overlap. We demonstrate that our approach constructs over 40% more known complexes than other recent methods and that the complexes it produces are more biologically coherent even compared with the reference set. We provide experimental support for some of our novel predictions, identifying both a new complex involved in nutrient starvation and a new component of the eisosome complex. Second, we provide a high accuracy algorithm for the novel problem of predicting transient interactions involving complexes. We show that our complex level network, which we call ComplexNet, provides novel insights regarding the protein-protein interaction network. In particular, we reinterpret the finding that “hubs” in the network are enriched for being essential, showing instead that essential proteins tend to be clustered together in essential complexes and that these essential complexes tend to be large.
Affiliation(s)
- Haidong Wang
- Computer Science Department, Stanford University, Stanford, California 94305, USA
|
41
|
Abstract
Progress in the reconstruction of genome-wide metabolic maps has led to the development of network-based computational approaches for linking an organism with its biochemical habitat.
|
42
|
Harrington ED, Jensen LJ, Bork P. Predicting biological networks from genomic data. FEBS Lett 2008; 582:1251-8. [PMID: 18294967 DOI: 10.1016/j.febslet.2008.02.033] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 02/08/2008] [Accepted: 02/13/2008] [Indexed: 12/27/2022]
Abstract
Continuing improvements in DNA sequencing technologies are providing us with vast amounts of genomic data from an ever-widening range of organisms. The resulting challenge for bioinformatics is to interpret this deluge of data and place it back into its biological context. Biological networks provide a conceptual framework with which we can describe part of this context, namely the different interactions that occur between the molecular components of a cell. Here, we review the computational methods available to predict biological networks from genomic sequence data and discuss how they relate to high-throughput experimental methods.
Affiliation(s)
- Eoghan D Harrington
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
|
43
|
Kensche PR, van Noort V, Dutilh BE, Huynen MA. Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution. J R Soc Interface 2008; 5:151-70. [PMID: 17535793 PMCID: PMC2405902 DOI: 10.1098/rsif.2007.1047] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Indexed: 11/12/2022]
Abstract
The gap between the amount of genome information released by genome sequencing projects and our knowledge about the proteins' functions is rapidly increasing. To fill this gap, various 'genomic-context' methods have been proposed that exploit sequenced genomes to predict the functions of the encoded proteins. One class of methods, phylogenetic profiling, predicts protein function by correlating the phylogenetic distribution of genes with that of other genes or phenotypic characteristics. The functions of a number of proteins, including ones of medical relevance, have thus been predicted and subsequently confirmed experimentally. Additionally, various approaches to measure the similarity of phylogenetic profiles and to account for the phylogenetic bias in the data have been proposed. We review the successful applications of phylogenetic profiling and analyse the performance of various profile similarity measures with a set of one microsporidial and 25 fungal genomes. In the fungi, phylogenetic profiling yields high-confidence predictions for the highest and only the highest scoring gene pairs illustrating both the power and the limitations of the approach. Both practical examples and theoretical considerations suggest that in order to get a reliable and specific picture of a protein's function, results from phylogenetic profiling have to be combined with other sources of evidence.
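To make the phylogenetic profiling idea in this abstract concrete, the sketch below represents each gene as a presence/absence vector across genomes and ranks gene pairs by profile similarity. The abstract mentions that several similarity measures were compared without naming them here, so the Jaccard index is used purely as an assumed example; the gene names and profiles are invented toy data.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles of equal length."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def rank_pairs(profiles):
    """Return (score, gene, gene) tuples, most similar profiles first."""
    scored = [
        (jaccard(profiles[g], profiles[h]), g, h)
        for g, h in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda item: (-item[0], item[1], item[2]))


# Toy profiles: one column per genome, 1 = an ortholog is present.
profiles = {
    "geneA": (1, 1, 0, 1, 0, 0),
    "geneB": (1, 1, 0, 1, 0, 0),
    "geneC": (0, 0, 1, 0, 1, 1),
    "geneD": (1, 1, 0, 0, 0, 0),
}

for score, g, h in rank_pairs(profiles):
    print(f"{g} {h} {score:.2f}")
```

Consistent with the abstract's caution that only the highest-scoring pairs are reliable, a practical use of such a ranking would keep just the top of the list and combine it with other evidence. This plain Jaccard score also makes no correction for the phylogenetic bias the abstract mentions.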
Affiliation(s)
- Philip R. Kensche
- Centre for Molecular and Biomolecular Informatics/Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
- Author for correspondence
- Vera van Noort
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Bas E. Dutilh
- Centre for Molecular and Biomolecular Informatics/Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
- Martijn A. Huynen
- Centre for Molecular and Biomolecular Informatics/Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
|
44
|
Abstract
The treatment of bacterial infections is increasingly complicated because microorganisms can develop resistance to antimicrobial agents. This article discusses the information that is required to predict when antibiotic resistance is likely to emerge in a bacterial population. Indeed, the development of the conceptual and methodological tools required for this type of prediction represents an important goal for microbiological research. To this end, we propose the establishment of methodological guidelines that will allow researchers to predict the emergence of resistance to a new antibiotic before its clinical introduction.
Affiliation(s)
- José L Martínez
- Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública and Unidad Asociada al CSIC Resistencia a los Antibióticos y Virulencia Bacteriana, Cantoblanco, 28049-Madrid, Spain.
|
45
|
Parter M, Kashtan N, Alon U. Environmental variability and modularity of bacterial metabolic networks. BMC Evol Biol 2007; 7:169. [PMID: 17888177 PMCID: PMC2151768 DOI: 10.1186/1471-2148-7-169] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.1] [Received: 05/17/2007] [Accepted: 09/23/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Biological systems are often modular: they can be decomposed into nearly-independent structural units that perform specific functions. The evolutionary origin of modularity is a subject of much current interest. Recent theory suggests that modularity can be enhanced when the environment changes over time. However, this theory has not yet been tested using biological data. RESULTS To address this, we studied the relation between environmental variability and modularity in a natural and well-studied system, the metabolic networks of bacteria. We classified 117 bacterial species according to the degree of variability in their natural habitat. We find that metabolic networks of organisms in variable environments are significantly more modular than networks of organisms that evolved under more constant conditions. CONCLUSION This study supports the view that variability in the natural habitat of an organism promotes modularity in its metabolic network and perhaps in other biological systems.
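The abstract compares how modular different metabolic networks are but does not state which modularity measure was used. As one common possibility, and only as an assumption for illustration, the sketch below computes the Newman-Girvan modularity Q of a given partition of an undirected, unweighted network: for each module, the fraction of edges inside it minus the fraction expected from its total degree. The function name and the toy graph are invented for this example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    `edges` is a list of (u, v) pairs without duplicates and
    `community_of` maps each node to a module label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the module
    degree_sum = defaultdict(int)  # summed degree of the module's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy network: two triangles joined by a single bridging edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {node: 0 for node in two_modules}

print(modularity(edges, two_modules))
print(modularity(edges, one_module))
```

Splitting the toy network into its two triangles gives a positive Q, while placing every node in one module gives zero, which is the sense in which a higher score indicates a more modular network.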
Affiliation(s)
- Merav Parter
- Molecular Cell Biology Department, Weizmann Institute of Science, Rehovot 76100, Israel
- Nadav Kashtan
- Molecular Cell Biology Department, Weizmann Institute of Science, Rehovot 76100, Israel
- Uri Alon
- Molecular Cell Biology Department, Weizmann Institute of Science, Rehovot 76100, Israel
|
46
|
Zhao J, Ding GH, Tao L, Yu H, Yu ZH, Luo JH, Cao ZW, Li YX. Modular co-evolution of metabolic networks. BMC Bioinformatics 2007; 8:311. [PMID: 17723146 PMCID: PMC2001200 DOI: 10.1186/1471-2105-8-311] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Received: 03/08/2007] [Accepted: 08/27/2007] [Indexed: 11/25/2022]
Abstract
Background The architecture of biological networks has been reported to exhibit a high level of modularity, and to some extent, topological modules of networks overlap with known functional modules. However, how the modular topology of the molecular network affects the evolution of its member proteins remains unclear. Results In this work, the functional and evolutionary modularity of the Homo sapiens (H. sapiens) metabolic network were investigated from a topological point of view. Network decomposition shows that the metabolic network is organized in a highly modular core-periphery way, in which the core modules are tightly linked together and perform basic metabolic functions, whereas the periphery modules interact with only a few modules and accomplish relatively independent and specialized functions. Moreover, over half of the modules exhibit co-evolutionary features and belong to specific evolutionary ages. Peripheral modules tend to evolve more cohesively and faster than core modules do. Conclusion The correlation between functional, evolutionary and topological modularity suggests that the evolutionary history and functional requirements of metabolic systems have been imprinted in the architecture of metabolic networks. Such systems-level analysis could demonstrate how the evolution of genes may be placed in a genome-scale network context, giving a novel perspective on molecular evolution.
Affiliation(s)
- Jing Zhao
- School of Life Sciences & Technology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Department of Mathematics, Logistical Engineering University, Chongqing 400016, China
- Guo-Hui Ding
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Lin Tao
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Hong Yu
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Zhong-Hao Yu
- School of Life Sciences & Technology, Shanghai Jiao Tong University, Shanghai 200240, China
- Jian-Hua Luo
- School of Life Sciences & Technology, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhi-Wei Cao
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Yi-Xue Li
- School of Life Sciences & Technology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
47
|
Harrington ED, Singh AH, Doerks T, Letunic I, von Mering C, Jensen LJ, Raes J, Bork P. Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proc Natl Acad Sci U S A 2007; 104:13913-8. [PMID: 17717083 PMCID: PMC1955820 DOI: 10.1073/pnas.0702636104] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022]
Abstract
To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.
Affiliation(s)
- E. D. Harrington
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- A. H. Singh
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- T. Doerks
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- I. Letunic
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- C. von Mering
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- L. J. Jensen
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- J. Raes
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- P. Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, D-13092 Berlin, Germany
- To whom correspondence should be addressed.
|
48
|
Tsoka S. Computational methodologies for genome evolution and functional association. Comput Chem Eng 2007. [DOI: 10.1016/j.compchemeng.2006.11.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
49
|
Díaz-Mejía JJ, Pérez-Rueda E, Segovia L. A network perspective on the evolution of metabolism by gene duplication. Genome Biol 2007; 8:R26. [PMID: 17326820 PMCID: PMC1852415 DOI: 10.1186/gb-2007-8-2-r26] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Received: 07/19/2006] [Revised: 10/23/2006] [Accepted: 02/27/2007] [Indexed: 01/16/2023]
Abstract
BACKGROUND Gene duplication followed by divergence is one of the main sources of metabolic versatility. The patchwork and stepwise models of metabolic evolution help us to understand these processes, but their assumptions are relatively simplistic. We used a network-based approach to determine the influence of metabolic constraints on the retention of duplicated genes. RESULTS We detected duplicated genes by looking for enzymes sharing homologous domains and uncovered an increased retention of duplicates for enzymes catalyzing consecutive reactions, as illustrated by the ligases acting in the biosynthesis of peptidoglycan. As a consequence, metabolic networks show a high retention of duplicates within functional modules, and we found a preferential biochemical coupling of reactions that partially explains this bias. A similar situation was found in enzyme-enzyme interaction networks, but not in interaction networks of non-enzymatic proteins or gene transcriptional regulatory networks, suggesting that the retention of duplicates results from the biochemical rules governing substrate-enzyme-product relationships. We confirmed a high retention of duplicates between chemically similar reactions, as illustrated by fatty-acid metabolism. The retention of duplicates between chemically dissimilar reactions is, however, also greater than expected by chance. Finally, we detected a significant retention of duplicates as groups, instead of single pairs. CONCLUSION Our results indicate that in silico modeling of the origin and evolution of metabolism is improved by the inclusion of specific functional constraints, such as the preferential biochemical coupling of reactions. We suggest that the stepwise and patchwork models are not independent of each other: in fact, the network perspective enables us to reconcile and combine these models.
Collapse
Affiliation(s)
- Juan Javier Díaz-Mejía
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México. Av. Universidad 2001, Col. Chamilpa, Cuernavaca, Morelos, CP 62210 México
- Ernesto Pérez-Rueda
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México. Av. Universidad 2001, Col. Chamilpa, Cuernavaca, Morelos, CP 62210 México
- Lorenzo Segovia
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México. Av. Universidad 2001, Col. Chamilpa, Cuernavaca, Morelos, CP 62210 México
|
50
|
Chen L, Vitkup D. Distribution of orphan metabolic activities. Trends Biotechnol 2007; 25:343-8. [PMID: 17580095 DOI: 10.1016/j.tibtech.2007.06.001] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 02/08/2007] [Revised: 04/17/2007] [Accepted: 06/01/2007] [Indexed: 10/23/2022]
Abstract
A significant fraction (30-40%) of known metabolic activities is currently orphan. Although orphan activities have been biochemically characterized, we do not know a single gene responsible for these reactions in any organism. The problem of orphan activities represents one of the major challenges of modern biochemistry. We analyze the distribution of orphans across biochemical space, through years of enzymatic characterization, and by biological organisms. We find that orphan metabolic activities have been accumulating for many decades. They are widely distributed across enzymatic functional space and metabolic network neighborhoods. Although orphans are relatively more abundant in less studied species, over half of orphan reactions have been experimentally characterized in more than one organism. Shrinking the space of orphan activities will likely require a close collaboration between computational and experimental laboratories.
Affiliation(s)
- Lifeng Chen
- Center for Computational Biology and Bioinformatics and Department of Biomedical Informatics, Columbia University, 1130 Nicholas Ave., Irving Cancer Research Center, New York, NY 10032, USA
|