1
Reynolds KA, Rosa-Molinar E, Ward RE, Zhang H, Urbanowicz BR, Settles AM. Accelerating biological insight for understudied genes. Integr Comp Biol 2021; 61:2233-2243. [PMID: 33970251 DOI: 10.1093/icb/icab029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
The rapid expansion of genome sequence data is increasing the discovery of protein-coding genes across all domains of life. Annotating these genes with reliable functional information is necessary to understand evolution, to define the full biochemical space accessed by nature, and to identify target genes for biotechnology improvements. The vast majority of proteins are annotated based on sequence conservation with no specific biological, biochemical, genetic, or cellular function identified. Recent technical advances throughout the biological sciences enable experimental research on these understudied protein-coding genes in a broader collection of species. However, scientists have incentives and biases to continue focusing on well documented genes within their preferred model organism. This perspective suggests a research model that seeks to break historic silos of research bias by enabling interdisciplinary teams to accelerate biological functional annotation. We propose an initiative to develop coordinated projects of collaborating evolutionary biologists, cell biologists, geneticists, and biochemists that will focus on subsets of target genes in multiple model organisms. Concurrent analysis in multiple organisms takes advantage of evolutionary divergence and selection, which causes individual species to be better suited as experimental models for specific genes. Most importantly, multisystem approaches would encourage transdisciplinary critical thinking and hypothesis testing that is inherently slow in current biological research.
Affiliation(s)
- Kimberly A Reynolds
  - The Green Center for Systems Biology and the Department of Biophysics, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Eduardo Rosa-Molinar
  - Department of Pharmacology & Toxicology, The University of Kansas, Lawrence, KS 66047, USA
- Robert E Ward
  - Department of Biology, Case Western Reserve University, Cleveland, OH 44106, USA
- Hongbin Zhang
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Breeanna R Urbanowicz
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602, USA
- A Mark Settles
  - Bioengineering Branch, NASA Ames Research Center, Moffett Field, CA, USA
2
Koonin EV, Makarova KS, Wolf YI. Evolution of Microbial Genomics: Conceptual Shifts over a Quarter Century. Trends Microbiol 2021; 29:582-592. [PMID: 33541841 DOI: 10.1016/j.tim.2021.01.005] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 12/03/2020] [Revised: 01/07/2021] [Accepted: 01/08/2021] [Indexed: 12/20/2022]
Abstract
Prokaryote genomics started in earnest in 1995, with the complete sequences of two small bacterial genomes, those of Haemophilus influenzae and Mycoplasma genitalium. During the next quarter century, the prokaryote genome database has been growing exponentially, with no saturation in sight. For most of these 25 years, genome sequencing remained limited to cultivable microbes. Together with next-generation sequencing methods, advances in metagenomics and single-cell genomics have lifted this limitation, providing for an increasingly unbiased characterization of the global prokaryote diversity. Advances in computational genomics followed the progress of genome sequencing, even if occasionally lagging behind. Several major new branches of bacteria and archaea were discovered, including Asgard archaea, the apparent closest relatives of eukaryotes and expansive groups of bacteria and archaea with small genomes thought to be symbionts of other prokaryotes. Comparative analysis of numerous prokaryote genomes spanning a wide range of evolutionary distances changed the conceptual foundations of microbiology, supplanting the notion of species genomes with fixed gene sets with that of dynamic pangenomes and the notion of a single Tree of Life (ToL) with a statistical tree-like trend among individual gene trees. Strides were also made towards a theory and quantitative laws of prokaryote genome evolution.
Affiliation(s)
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
3
Schober AF, Mathis AD, Ingle C, Park JO, Chen L, Rabinowitz JD, Junier I, Rivoire O, Reynolds KA. A Two-Enzyme Adaptive Unit within Bacterial Folate Metabolism. Cell Rep 2019; 27:3359-3370.e7. [PMID: 31189117 DOI: 10.1016/j.celrep.2019.05.030] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/05/2017] [Revised: 04/05/2019] [Accepted: 05/09/2019] [Indexed: 11/29/2022]
Abstract
Enzyme function and evolution are influenced by the larger context of a metabolic pathway. Deleterious mutations or perturbations in one enzyme can often be compensated by mutations to others. We used comparative genomics and experiments to examine evolutionary interactions with the essential metabolic enzyme dihydrofolate reductase (DHFR). Analyses of synteny and co-occurrence across bacterial species indicate that DHFR is coupled to thymidylate synthase (TYMS) but relatively independent from the rest of folate metabolism. Using quantitative growth rate measurements and forward evolution in Escherichia coli, we demonstrate that the two enzymes adapt as a relatively independent unit in response to antibiotic stress. Metabolomic profiling revealed that TYMS activity must not exceed DHFR activity to prevent the depletion of reduced folates and the accumulation of the intermediate dihydrofolate. Comparative genomics analyses identified >200 gene pairs with similar statistical signatures of modular co-evolution, suggesting that cellular pathways may be decomposable into small adaptive units.
Affiliation(s)
- Andrew F Schober
  - The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Andrew D Mathis
  - The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Christine Ingle
  - The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Junyoung O Park
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Li Chen
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
- Joshua D Rabinowitz
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
- Ivan Junier
  - Centre National de la Recherche Scientifique, Université Grenoble Alpes, TIMC-IMAG, F-38000 Grenoble, France
- Olivier Rivoire
  - Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, PSL Research University, F-75005 Paris, France
- Kimberly A Reynolds
  - The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
4
Patterns of diverse gene functions in genomic neighborhoods predict gene function and phenotype. Sci Rep 2019; 9:19537. [PMID: 31863070 PMCID: PMC6925100 DOI: 10.1038/s41598-019-55984-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/26/2019] [Accepted: 12/02/2019] [Indexed: 01/01/2023]
Abstract
Genes with similar roles in the cell cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes. It is a very common occurrence that pairs of dissimilar gene functions – corresponding to semantically distant Gene Ontology terms – are significantly co-located on chromosomes. These neighborhood associations are often as conserved across genomes as the known associations between similar functions, suggesting selective benefits from clustering of certain diverse functions, which may conceivably play complementary roles in the cell. We propose a simple encoding of chromosomal gene order, the neighborhood function profiles (NFP), which draws on diverse gene clustering patterns to predict gene function and phenotype. NFPs yield a 26–46% increase in predictive power over state-of-the-art approaches that propagate function across neighborhoods, thus providing hundreds of novel, high-confidence gene function inferences per genome. Furthermore, we demonstrate that copy number-neutral structural variation that shapes gene function distribution across chromosomes can predict phenotype of individuals from their genome sequence.
5
Shmakov SA, Faure G, Makarova KS, Wolf YI, Severinov KV, Koonin EV. Systematic prediction of functionally linked genes in bacterial and archaeal genomes. Nat Protoc 2019; 14:3013-3031. [PMID: 31520072 DOI: 10.1038/s41596-019-0211-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 12/06/2018] [Accepted: 06/13/2019] [Indexed: 11/09/2022]
Abstract
Functionally linked genes in bacterial and archaeal genomes are often organized into operons. However, the composition and architecture of operons are highly variable and frequently differ even among closely related genomes. Therefore, to efficiently extract reliable functional predictions for uncharacterized genes from comparative analyses of the rapidly growing genomic databases, dedicated computational approaches are required. We developed a protocol to systematically and automatically identify genes that are likely to be functionally associated with a 'bait' gene or locus by using relevance metrics. Given a set of bait loci and a genomic database defined by the user, this protocol compares the genomic neighborhoods of the baits to identify genes that are likely to be functionally linked to the baits by calculating the abundance of a given gene within and outside the bait neighborhoods and the distance to the bait. We exemplify the performance of the protocol with three test cases, namely, genes linked to CRISPR-Cas systems using the 'CRISPRicity' metric, genes associated with archaeal proviruses and genes linked to Argonaute genes in halobacteria. The protocol can be run by users with basic computational skills. The computational cost depends on the sizes of the genomic dataset and the list of reference loci and can vary from one CPU-hour to hundreds of hours on a supercomputer.
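The idea of comparing a gene's abundance inside and outside bait neighborhoods can be illustrated with a small sketch. This is not the published protocol or the 'CRISPRicity' metric: the function name `neighborhood_fraction`, the gene-family labels, and the fixed positional `window` are hypothetical simplifications, and the score shown is simply the fraction of a family's occurrences that fall near a bait.

```python
from collections import Counter
from typing import Dict, List, Set


def neighborhood_fraction(
    genomes: Dict[str, List[str]], baits: Set[str], window: int = 2
) -> Dict[str, float]:
    """Fraction of each gene family's occurrences lying within `window`
    gene positions of a bait (hypothetical relevance score)."""
    total: Counter = Counter()  # all occurrences of each family
    near: Counter = Counter()   # occurrences inside a bait neighborhood
    for gene_order in genomes.values():
        bait_positions = [i for i, fam in enumerate(gene_order) if fam in baits]
        for i, fam in enumerate(gene_order):
            if fam in baits:
                continue  # baits themselves are not scored
            total[fam] += 1
            if any(abs(i - j) <= window for j in bait_positions):
                near[fam] += 1
    return {fam: near[fam] / total[fam] for fam in total}


# Toy gene orders (assumed data): each list is one genome's gene families in order.
toy_genomes = {
    "g1": ["x", "bait", "y", "z", "w", "x"],
    "g2": ["x", "q", "q", "bait", "y"],
}
scores = neighborhood_fraction(toy_genomes, baits={"bait"}, window=1)
```

In this toy input, family `y` always sits next to the bait and scores highest, whereas `z` never does. A real analysis would also need to weight by distance to the bait and correct for genome redundancy, as the abstract indicates.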
Affiliation(s)
- Sergey A Shmakov
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA; Skolkovo Institute of Science and Technology, Skolkovo, Russia
- Guilhem Faure
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- Konstantin V Severinov
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia; Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA; Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, Russia
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
6
Klobucar K, Brown ED. Use of genetic and chemical synthetic lethality as probes of complexity in bacterial cell systems. FEMS Microbiol Rev 2018; 42:4563584. [PMID: 29069427 DOI: 10.1093/femsre/fux054] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/31/2017] [Accepted: 10/23/2017] [Indexed: 12/22/2022]
Abstract
Different conditions and genomic contexts are known to have an impact on gene essentiality and interactions. Synthetic lethal interactions occur when a combination of perturbations, either genetic or chemical, results in a more profound fitness defect than expected based on the effect of each perturbation alone. Synthetic lethality in bacterial systems has long been studied; however, during the past decade, the emerging fields of genomics and chemical genomics have led to an increase in the scale and throughput of these studies. Here, we review the concepts of genomics and chemical genomics in the context of synthetic lethality and their revolutionary roles in uncovering novel biology such as the characterization of genes of unknown function and in antibacterial drug discovery. We provide an overview of the methodologies, examples and challenges of both genetic and chemical synthetic lethal screening platforms. Finally, we discuss how to apply genetic and chemical synthetic lethal approaches to rationalize the synergies of drugs, screen for new and improved antibacterial therapies and predict drug mechanism of action.
Affiliation(s)
- Kristina Klobucar
  - Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, 1280 Main St West, Hamilton, ON L8N 3Z5, Canada
- Eric D Brown
  - Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, 1280 Main St West, Hamilton, ON L8N 3Z5, Canada
7
Junier I, Rivoire O. Conserved Units of Co-Expression in Bacterial Genomes: An Evolutionary Insight into Transcriptional Regulation. PLoS One 2016; 11:e0155740. [PMID: 27195891 PMCID: PMC4873041 DOI: 10.1371/journal.pone.0155740] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 01/04/2016] [Accepted: 05/03/2016] [Indexed: 12/18/2022]
Abstract
Genome-wide measurements of transcriptional activity in bacteria indicate that the transcription of successive genes is strongly correlated beyond the scale of operons. Here, we analyze hundreds of bacterial genomes to identify supra-operonic segments of genes that are proximal in a large number of genomes. We show that these synteny segments correspond to genomic units of strong transcriptional co-expression. Structurally, the segments contain operons with specific relative orientations (co-directional or divergent) and nucleoid-associated proteins are found to bind at their boundaries. Functionally, operons inside the same segment are highly co-expressed even in the apparent absence of regulatory factors at their promoter regions. Remote operons along DNA can also be co-expressed if their corresponding segments share a transcriptional or sigma factor, without requiring these factors to bind directly to the promoters of the operons. As evidence that these results apply across the bacterial kingdom, we demonstrate them both in the Gram-negative bacterium Escherichia coli and in the Gram-positive bacterium Bacillus subtilis. The underlying process that we propose involves only RNA-polymerases and DNA: it implies that the transcription of an operon mechanically enhances the transcription of adjacent operons. In support of a primary role of this regulation by facilitated co-transcription, we show that the transcription en bloc of successive operons as a result of transcriptional read-through is strongly and specifically enhanced in synteny segments. Finally, our analysis indicates that facilitated co-transcription may be evolutionarily primitive and may apply beyond bacteria.
Affiliation(s)
- Ivan Junier
  - CNRS, TIMC-IMAG, F-38000 Grenoble, France; Univ. Grenoble Alpes, TIMC-IMAG, F-38000 Grenoble, France
- Olivier Rivoire
  - CNRS, LIPhy, F-38000 Grenoble, France; Univ. Grenoble Alpes, LIPhy, F-38000 Grenoble, France
8
OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes. BIOMED RESEARCH INTERNATIONAL 2015; 2015:318217. [PMID: 26543854 PMCID: PMC4620388 DOI: 10.1155/2015/318217] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/26/2014] [Accepted: 03/02/2015] [Indexed: 12/30/2022]
Abstract
Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons—codirectionally organized genes in prokaryotic genomes with the presence of a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. User interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.
9
Abstract
In this mini-review I aim to make the case that operons might be the most powerful source for predicted associations among gene products. Such associations can help identify potential processes where the products of unannotated genes might play a role. The power of the operon for providing insight into functional associations stems from four features: (1) on average, around 60% of the genes in prokaryotes are associated into operons; (2) the functional associations between genes in operons tend to be highly conserved; (3) operons can be predicted with high accuracy by conservation of gene order and by the distances between adjacent genes in the same DNA strand; and (4) operons frequently reorganize, providing further insight into functional associations that would not be evident without these reorganization events.
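Feature (3), predicting operons from the distances between adjacent genes on the same strand, can be sketched as follows. This is an illustrative simplification rather than the method the review describes: the `Gene` record, the `predict_operons` helper, and the 50 bp `max_gap` threshold are assumptions, and conservation of gene order is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    name: str
    start: int   # 1-based start coordinate
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"


def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent same-strand genes whose intergenic gap is at most
    `max_gap` base pairs (assumed threshold)."""
    operons: List[List[str]] = []
    previous: Optional[Gene] = None
    for gene in sorted(genes, key=lambda g: g.start):
        same_unit = (
            previous is not None
            and gene.strand == previous.strand
            and gene.start - previous.end - 1 <= max_gap
        )
        if same_unit:
            operons[-1].append(gene.name)  # extend the current operon
        else:
            operons.append([gene.name])    # start a new transcription unit
        previous = gene
    return operons


# Toy annotation (assumed coordinates).
toy_genes = [
    Gene("a", 1, 100, "+"),
    Gene("b", 120, 300, "+"),
    Gene("c", 800, 900, "+"),
    Gene("d", 905, 1000, "-"),
]
predicted = predict_operons(toy_genes)
```

Here `a` and `b` are grouped because they share a strand and are 19 bp apart; `c` is separated by a long gap and `d` lies on the opposite strand, so each forms its own unit.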
10
The functional landscape bound to the transcription factors of Escherichia coli K-12. Comput Biol Chem 2015; 58:93-103. [PMID: 26094112 DOI: 10.1016/j.compbiolchem.2015.06.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/13/2014] [Revised: 05/31/2015] [Accepted: 06/03/2015] [Indexed: 01/05/2023]
Abstract
Motivated by the experimental evidence accumulated in the last ten years and based on information deposited in RegulonDB, literature searches, and sequence analysis, we analyze the repertoire of 304 DNA-binding transcription factors (TFs) in Escherichia coli K-12. These regulators were grouped into 78 evolutionary families and regulate almost half of the genes in this bacterium. In structural terms, 60% of TFs are composed of two domains, 30% are monodomain, and 10% have three or four structural domains. As previously noted, the most abundant DNA-binding domain is the winged helix-turn-helix, with few alternative DNA-binding structures, resembling the hypothesis of successful protein structures with the emergence of new ones at low scales. In summary, we identified and described the characteristics associated with the DNA-binding TFs in E. coli K-12. We also identified twelve functional modules based on a co-regulated gene matrix. Finally, diverse regulons were predicted based on direct associations between the TFs and potentially regulated genes. This analysis should increase our knowledge of gene regulation in the bacterium E. coli K-12 and provide additional clues for comprehensive modelling of transcriptional regulatory networks in other bacteria.
11
Predicting Functional Interactions Among Genes in Prokaryotes by Genomic Context. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2015; 883:97-106. [PMID: 26621463 DOI: 10.1007/978-3-319-23603-2_5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Abstract
Genomic context methods for finding functions of unannotated genes were implemented very early after the publication of the first few prokaryotic genomes. The ideas behind these methods include gene fusions, conservation of gene adjacency, and the patterns of co-occurrence of genes across available genomes. A later addition was the prediction of features related to functional organization, such as operons, stretches of genes co-transcribed into a single messenger RNA. The ideas behind these methods tend to be easy to understand, while the strategies for transforming those basic ideas into predictions can vary in complexity, mostly because genes whose products are known to functionally interact vary in the way they relate to those basic ideas. We present here a view of genomic context methods for predicting functional interactions, with simple examples of their implementation as compared and evaluated using genes whose products are known to functionally interact.
12
A novel function prediction approach using protein overlap networks. BMC SYSTEMS BIOLOGY 2013; 7:61. [PMID: 23866986 PMCID: PMC3720179 DOI: 10.1186/1752-0509-7-61] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 02/08/2013] [Accepted: 07/12/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Construction of a reliable network remains the bottleneck for network-based protein function prediction. We built an artificial network model called protein overlap network (PON) for the entire genome of yeast, fly, worm, and human, respectively. Each node of the network represents a protein, and two proteins are connected if they share a domain according to InterPro database. RESULTS The function of a protein can be predicted by counting the occurrence frequency of GO (gene ontology) terms associated with domains of direct neighbors. The average success rate and coverage were 34.3% and 43.9%, respectively, for the test genomes, and were increased to 37.9% and 51.3% when a composite PON of the four species was used for the prediction. As a comparison, the success rate was 7.0% in the random control procedure. We also made predictions with GO term annotations of the second layer nodes using the composite network and obtained an impressive success rate (>30%) and coverage (>30%), even for small genomes. Further improvement was achieved by statistical analysis of manually annotated GO terms for each neighboring protein. CONCLUSIONS The PONs are composed of dense modules accompanied by a few long distance connections. Based on the PONs, we developed multiple approaches effective for protein function prediction.
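The network construction and neighbor-counting step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_overlap_network` and `predict_go_terms`, the toy domain and GO labels, and the use of protein-level annotations of neighbors (rather than GO terms attached to neighbor domains) are all hypothetical simplifications.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def build_overlap_network(domains: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Connect two proteins whenever they share at least one domain."""
    network: Dict[str, Set[str]] = {protein: set() for protein in domains}
    for a, b in combinations(domains, 2):
        if domains[a] & domains[b]:
            network[a].add(b)
            network[b].add(a)
    return network


def predict_go_terms(
    query: str,
    network: Dict[str, Set[str]],
    annotations: Dict[str, Set[str]],
    top_n: int = 1,
) -> List[str]:
    """Rank GO terms by how often they occur among the query's direct neighbors."""
    counts: Counter = Counter()
    for neighbor in network.get(query, ()):
        counts.update(annotations.get(neighbor, ()))
    return [term for term, _ in counts.most_common(top_n)]


# Toy data (assumed): domain memberships and known annotations.
toy_domains = {
    "P1": {"IPR_A"},
    "P2": {"IPR_A", "IPR_B"},
    "P3": {"IPR_B"},
    "P4": {"IPR_A"},
}
toy_annotations = {"P2": {"GO:x"}, "P4": {"GO:x", "GO:y"}}
toy_network = build_overlap_network(toy_domains)
prediction = predict_go_terms("P1", toy_network, toy_annotations)
```

In the toy network, `P1` neighbors `P2` and `P4`; the term `GO:x` appears on both neighbors and is therefore ranked first. Extending the vote to second-layer nodes, as the abstract mentions, would amount to also visiting the neighbors of each neighbor.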
13
Muley VY, Ranjan A. Evaluation of physical and functional protein-protein interaction prediction methods for detecting biological pathways. PLoS One 2013; 8:e54325. [PMID: 23349851 PMCID: PMC3547882 DOI: 10.1371/journal.pone.0054325] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/07/2012] [Accepted: 12/11/2012] [Indexed: 11/18/2022]
Abstract
BACKGROUND Cellular activities are governed by the physical and the functional interactions among several proteins involved in various biological pathways. With the availability of sequenced genomes and high-throughput experimental data one can identify genome-wide protein-protein interactions using various computational techniques. Comparative assessments of these techniques in predicting protein interactions have been frequently reported in the literature but not their ability to elucidate a particular biological pathway. METHODS Towards the goal of understanding the prediction capabilities of interactions among the specific biological pathway proteins, we report the analyses of 14 biological pathways of Escherichia coli catalogued in the KEGG database using five protein-protein functional linkage prediction methods. These methods are phylogenetic profiling, gene neighborhood, co-presence of orthologous genes in the same gene clusters, a mirrortree variant, and expression similarity. CONCLUSIONS Our results reveal that the prediction of metabolic pathway protein interactions continues to be a challenging task for all methods, which possibly reflects flexible/independent evolutionary histories of these proteins. These methods have predicted functional associations of proteins involved in amino acids, nucleotide, glycans and vitamins & co-factors pathways slightly better than the random performance on carbohydrate, lipid and energy metabolism. We also make similar observations for interactions involved among the environmental information processing proteins. On the contrary, genetic information processing or specialized processes such as motility related protein-protein linkages that occur in the subset of organisms are predicted with comparable accuracy. Metabolic pathways are best predicted by using neighborhood of orthologous genes whereas phyletic pattern is good enough to reconstruct central dogma pathway protein interactions. We have also shown that the effective use of a particular prediction method depends on the pathway under investigation. If one is not focused on a specific pathway, the gene expression similarity method is the best option.
Affiliation(s)
- Vijaykumar Yogesh Muley
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India
- Akash Ranjan
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India
14
Muley VY, Ranjan A. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction. PLoS One 2012; 7:e42057. [PMID: 22844541 PMCID: PMC3406042 DOI: 10.1371/journal.pone.0042057] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 12/03/2011] [Accepted: 07/02/2012] [Indexed: 12/20/2022]
Abstract
BACKGROUND Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in the literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. METHODS We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. CONCLUSIONS Higher performance for predicting protein-protein interactions was achievable even with 100–150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling allows for selecting 50–100 genomes for comparable accuracy of predictions when computational resources are limited.
Affiliation(s)
- Vijaykumar Yogesh Muley
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India
  - Department of Biotechnology, Dr. Babasaheb Ambedkar Marathwada University, Sub-centre, Osmanabad, Maharashtra, India
- Akash Ranjan
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India
15
Janga SC, Díaz-Mejía JJ, Moreno-Hagelsieb G. Network-based function prediction and interactomics: the case for metabolic enzymes. Metab Eng 2010; 13:1-10. [PMID: 20654726 DOI: 10.1016/j.ymben.2010.07.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 06/02/2010] [Revised: 07/15/2010] [Accepted: 07/16/2010] [Indexed: 12/19/2022]
Abstract
As sequencing technologies increase in power, determining the functions of unknown proteins encoded by the DNA sequences so produced becomes a major challenge. Functional annotation is commonly done on the basis of amino-acid sequence similarity alone. Long after sequence similarity becomes undetectable by pair-wise comparison, profile-based identification of homologs can often succeed due to the conservation of position-specific patterns, important for a protein's three-dimensional folding and function. Nevertheless, prediction of protein function from homology-driven approaches is not without problems. Homologous proteins might evolve different functions, and the power of homology detection has already started to reach its maximum. Computational methods for inferring protein function, which exploit the context of a protein in cellular networks, have come to be built on top of homology-based approaches. These network-based functional inference techniques both provide a first-hand hint about a protein's functional role and offer complementary insights to traditional methods for understanding the function of uncharacterized proteins. Most recent network-based approaches aim to integrate diverse kinds of functional interactions to boost both coverage and confidence level. These techniques not only promise to solve the moonlighting aspect of proteins by annotating proteins with multiple functions, but also increase our understanding of the interplay between different functional classes in a cell. In this article we review the state of the art in network-based function prediction and describe some of the underlying difficulties and successes. Given the volume of high-throughput data that is being reported, the time is ripe to employ these network-based approaches, which can be used to unravel the functions of the uncharacterized proteins accumulating in the genomic databases.
Affiliation(s)
- S C Janga
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom.
|
16
|
Vey G, Moreno-Hagelsieb G. Beyond the bounds of orthology: functional inference from metagenomic context. Mol Biosyst 2010; 6:1247-54. [PMID: 20419183 DOI: 10.1039/b919263h] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/25/2023]
Abstract
The effectiveness of the computational inference of function by genomic context is bounded by the diversity of known microbial genomes. Although metagenomes offer access to previously inaccessible organisms, their fragmentary nature prevents the conventional establishment of orthologous relationships required for reliably predicting functional interactions. We introduce a protocol for the prediction of functional interactions using data sources without information about orthologous relationships. To illustrate this process, we use the Sargasso Sea metagenome to construct a functional interaction network for the Escherichia coli K12 genome. We identify two reliability metrics, target intergenic distance and source interaction count, and apply them to selectively filter the predictions retained to construct the network of functional interactions. The resulting network contains 2297 nodes with 10 072 edges with a positive predictive value of 0.80. The metagenome yielded 8423 functional interactions beyond those found using only the genomic orthologs as a data source. This amounted to a 134% increase in the total number of functional interactions that are predicted by combining the metagenome and the genomic orthologs versus the genomic orthologs alone. In the absence of detectable orthologous relationships it remains feasible to derive a reliable set of predicted functional interactions. This offers a strategy for harnessing other metagenomes and homologs in general. Because metagenomes allow access to previously unreachable microorganisms, this will result in expanding the universe of known functional interactions thus furthering our understanding of functional organization.
Affiliation(s)
- Gregory Vey
- Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo ON, Canada.
|
17
|
Are essential genes really essential? Trends Microbiol 2009; 17:433-8. [DOI: 10.1016/j.tim.2009.08.005] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Received: 01/29/2009] [Revised: 08/03/2009] [Accepted: 08/11/2009] [Indexed: 11/18/2022]
|
18
|
Babu M, Musso G, Díaz-Mejía JJ, Butland G, Greenblatt JF, Emili A. Systems-level approaches for identifying and analyzing genetic interaction networks in Escherichia coli and extensions to other prokaryotes. Mol Biosyst 2009; 5:1439-55. [PMID: 19763343 DOI: 10.1039/b907407d] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022]
Abstract
Molecular interactions define the functional organization of the cell. Epistatic (genetic, or gene-gene) interactions, one of the most informative and commonly encountered forms of functional relationships, are increasingly being used to map process architecture in model eukaryotic organisms. In particular, 'systems-level' screens in yeast and worm aimed at elucidating genetic interaction networks have led to the generation of models describing the global modular organization of gene products and protein complexes within a cell. However, comparable data for prokaryotic organisms have not been available. Given its ease of growth and genetic manipulation, the Gram-negative bacterium Escherichia coli appears to be an ideal model system for performing comprehensive genome-scale examinations of genetic redundancy in bacteria. In this review, we highlight emerging experimental and computational techniques that have been developed recently to examine functional relationships and redundancy in E. coli at a systems-level, and their potential application to prokaryotes in general. Additionally, we have scanned PubMed abstracts and full-text published articles to manually curate a list of approximately 200 previously reported synthetic sick or lethal genetic interactions in E. coli derived from small-scale experimental studies.
Affiliation(s)
- Mohan Babu
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1
|
19
|
Hu P, Janga SC, Babu M, Díaz-Mejía JJ, Butland G, Yang W, Pogoutse O, Guo X, Phanse S, Wong P, Chandran S, Christopoulos C, Nazarians-Armavil A, Nasseri NK, Musso G, Ali M, Nazemof N, Eroukova V, Golshani A, Paccanaro A, Greenblatt JF, Moreno-Hagelsieb G, Emili A. Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol 2009; 7:e96. [PMID: 19402753 PMCID: PMC2672614 DOI: 10.1371/journal.pbio.1000096] [Citation(s) in RCA: 268] [Impact Index Per Article: 17.9] [Received: 10/21/2008] [Accepted: 03/16/2009] [Indexed: 12/28/2022] Open
Abstract
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a "systems-wide" functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
Affiliation(s)
- Pingzhao Hu
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Sarath Chandra Janga
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Medical Research Council Laboratory of Molecular Biology, Cambridge, United Kingdom
- Mohan Babu
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- J. Javier Díaz-Mejía
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Department of Biology, Wilfrid Laurier University, Waterloo, Ontario, Canada
- Gareth Butland
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Wenhong Yang
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Oxana Pogoutse
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Xinghua Guo
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Sadhna Phanse
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Peter Wong
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Shamanta Chandran
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Constantine Christopoulos
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Anaies Nazarians-Armavil
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Negin Karimi Nasseri
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Gabriel Musso
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Mehrab Ali
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Nazila Nazemof
- Department of Biology and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Canada
- Veronika Eroukova
- Department of Biology and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Canada
- Ashkan Golshani
- Department of Biology and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Canada
- Alberto Paccanaro
- Department of Computer Science, Royal Holloway, University of London, Egham, United Kingdom
- Jack F Greenblatt
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Gabriel Moreno-Hagelsieb
- Department of Biology, Wilfrid Laurier University, Waterloo, Ontario, Canada
- * To whom correspondence should be addressed. E-mail: (GM-H); (AE)
- Andrew Emili
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- * To whom correspondence should be addressed. E-mail: (GM-H); (AE)
|
20
|
D'Elia MA, Millar KE, Bhavsar AP, Tomljenovic AM, Hutter B, Schaab C, Moreno-Hagelsieb G, Brown ED. Probing Teichoic Acid Genetics with Bioactive Molecules Reveals New Interactions among Diverse Processes in Bacterial Cell Wall Biogenesis. Chem Biol 2009; 16:548-56. [DOI: 10.1016/j.chembiol.2009.04.009] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Received: 04/06/2008] [Revised: 04/03/2009] [Accepted: 04/06/2009] [Indexed: 10/20/2022]
|
21
|
Baumbach J, Tauch A, Rahmann S. Towards the integrated analysis, visualization and reconstruction of microbial gene regulatory networks. Brief Bioinform 2008; 10:75-83. [PMID: 19074493 DOI: 10.1093/bib/bbn055] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
To handle changing environmental surroundings and to manage unfavorable conditions, microbial organisms have evolved complex transcriptional regulatory networks. To comprehensively analyze these gene regulatory networks, several online available databases and analysis platforms have been implemented and established. In this article, we address the typical cycle of scientific knowledge exploration and integration in the area of procaryotic transcriptional gene regulation. We briefly review five popular, publicly available systems that support (i) the integration of existing knowledge, (ii) visualization capabilities and (iii) computer analysis to predict promising wet lab targets. We exemplify the benefits of such integrated data analysis platforms by means of four application cases exemplarily performed with the corynebacterial reference database CoryneRegNet.
Affiliation(s)
- Jan Baumbach
- International Computer Science Institute, Berkeley, USA.
|
22
|
Díaz-Mejía JJ, Babu M, Emili A. Computational and experimental approaches to chart the Escherichia coli cell-envelope-associated proteome and interactome. FEMS Microbiol Rev 2008; 33:66-97. [PMID: 19054114 PMCID: PMC2704936 DOI: 10.1111/j.1574-6976.2008.00141.x] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 01/13/2023] Open
Abstract
The bacterial cell-envelope consists of a complex arrangement of lipids, proteins and carbohydrates that serves as the interface between a microorganism and its environment or, with pathogens, a human host. Escherichia coli has long been investigated as a leading model system to elucidate the fundamental mechanisms underlying microbial cell-envelope biology. This includes extensive descriptions of the molecular identities, biochemical activities and evolutionary trajectories of integral transmembrane proteins, many of which play critical roles in infectious disease and antibiotic resistance. Strikingly, however, only half of the c. 1200 putative cell-envelope-related proteins of E. coli currently have experimentally attributed functions, indicating an opportunity for discovery. In this review, we summarize the state of the art of computational and proteomic approaches for determining the components of the E. coli cell-envelope proteome, as well as exploring the physical and functional interactions that underlie its biogenesis and functionality. We also provide a comprehensive comparative benchmarking analysis on the performance of different bioinformatic and proteomic methods commonly used to determine the subcellular localization of bacterial proteins.
Affiliation(s)
- Juan Javier Díaz-Mejía
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
|
23
|
|
24
|
Yellaboina S, Dudekula DB, Ko MS. Prediction of evolutionarily conserved interologs in Mus musculus. BMC Genomics 2008; 9:465. [PMID: 18842131 PMCID: PMC2571111 DOI: 10.1186/1471-2164-9-465] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 05/05/2008] [Accepted: 10/08/2008] [Indexed: 12/03/2022] Open
Abstract
Background Identification of protein-protein interactions is an important first step to understand living systems. High-throughput experimental approaches have accumulated a large amount of information on protein-protein interactions in human and other model organisms. Such interaction information has been successfully transferred to other species in which the experimental data are limited. However, the annotation transfer method could yield false positive interologs due to the lack of conservation of interactions when applied to phylogenetically distant organisms. Results To address this issue, we used the phylogenetic profile method to filter false positives in interologs, based on the notion that evolutionarily conserved interactions show similar patterns of occurrence across genomes. The approach was applied to Mus musculus, in which the experimentally identified interactions are limited. We first inferred the protein-protein interactions in Mus musculus by using two approaches: i) identifying mouse orthologs of interacting proteins (interologs) based on the experimental protein-protein interaction data from other organisms; and ii) analyzing the frequency of mouse ortholog co-occurrence in predicted operons of bacteria. We then filtered possible false positives in the predicted interactions using the phylogenetic profiles. We found that this filtering method significantly increased the frequency of interacting protein pairs coexpressed in the same cells/tissues in the Gene Expression Omnibus (GEO) database, as well as the frequency of interacting protein pairs that share similar Gene Ontology (GO) terms for biological processes and cellular localizations. The data support the notion that phylogenetic profiles help to reduce the number of false positives in interologs. Conclusion We have developed a protein-protein interaction database for mouse, which contains 41109 interologs. We have also developed a web interface to facilitate the use of the database.
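The two-step idea in this abstract (transfer interactions through orthologs, then filter by phylogenetic profile) can be sketched as follows. This is a minimal illustration under stated assumptions: the protein identifiers, the ortholog map, the toy profiles, the agreement score and the 0.75 cutoff are all hypothetical and are not taken from the paper.

```python
def predict_interologs(source_interactions, orthologs):
    """Map each source-species interaction onto target-species ortholog pairs."""
    predicted = set()
    for a, b in source_interactions:
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                if target_a != target_b:
                    # Store pairs in sorted order so (x, y) and (y, x) coincide.
                    predicted.add(tuple(sorted((target_a, target_b))))
    return predicted


def profile_agreement(profile_a, profile_b):
    """Fraction of reference genomes where both proteins are present or both absent."""
    matches = sum(1 for a, b in zip(profile_a, profile_b) if a == b)
    return matches / len(profile_a)


def filter_by_profile(pairs, profiles, min_agreement):
    """Keep only predicted pairs whose phylogenetic profiles agree well enough."""
    return {
        (a, b)
        for a, b in pairs
        if profile_agreement(profiles[a], profiles[b]) >= min_agreement
    }


# Hypothetical toy data: interactions observed in a source species,
# a source-to-target ortholog map, and presence/absence profiles.
source_interactions = [("yA", "yB"), ("yC", "yD")]
orthologs = {"yA": ["mA"], "yB": ["mB"], "yC": ["mC"], "yD": ["mD"]}
profiles = {
    "mA": [1, 1, 0, 1],
    "mB": [1, 1, 0, 1],
    "mC": [1, 0, 0, 1],
    "mD": [0, 1, 1, 0],
}

predicted = predict_interologs(source_interactions, orthologs)
kept = filter_by_profile(predicted, profiles, min_agreement=0.75)
print(sorted(predicted))
print(sorted(kept))
```

Here the pair with identical profiles survives the filter, while the pair whose profiles never agree is discarded as a likely false positive.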
Affiliation(s)
- Sailu Yellaboina
- Developmental Genomics and Aging Section, Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA.
|
25
|
Karimpour-Fard A, Leach SM, Gill RT, Hunter LE. Predicting protein linkages in bacteria: which method is best depends on task. BMC Bioinformatics 2008; 9:397. [PMID: 18816389 PMCID: PMC2570368 DOI: 10.1186/1471-2105-9-397] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 02/14/2008] [Accepted: 09/24/2008] [Indexed: 01/06/2023] Open
Abstract
Background Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations. Results Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability versus using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. 
Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction. Conclusion A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated the dramatic differences in reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, as well as provided guidelines as to which method is most appropriate for a given prediction task.
Affiliation(s)
- Anis Karimpour-Fard
- Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado 80045, USA.
|
26
|
Martinez-Guerrero CE, Ciria R, Abreu-Goodger C, Moreno-Hagelsieb G, Merino E. GeConT 2: gene context analysis for orthologous proteins, conserved domains and metabolic pathways. Nucleic Acids Res 2008; 36:W176-80. [PMID: 18511460 PMCID: PMC2447741 DOI: 10.1093/nar/gkn330] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022] Open
Abstract
The Gene Context Tool (GeConT) allows users to visualize the genomic context of a gene or a group of genes and their orthologous relationships within fully sequenced bacterial genomes. The new version of the server incorporates information from the COG, Pfam and KEGG databases, allowing users to have an integrated graphical representation of the function of genes at multiple levels, their phylogenetic distribution and their genomic context. The sequence of any of the genes can be easily retrieved, as well as the 5′ or 3′ regulatory regions, greatly facilitating further types of analysis. GeConT 2 is available at: http://bioinfo.ibt.unam.mx/gecont.
Affiliation(s)
- C E Martinez-Guerrero
- Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos 62210, México
|
27
|
González Pérez AD, González González E, Espinosa Angarica V, Vasconcelos ATR, Collado-Vides J. Impact of Transcription Units rearrangement on the evolution of the regulatory network of gamma-proteobacteria. BMC Genomics 2008; 9:128. [PMID: 18366643 PMCID: PMC2329645 DOI: 10.1186/1471-2164-9-128] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 10/26/2007] [Accepted: 03/17/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the past years, several studies have begun to unravel the structure, dynamical properties, and evolution of transcriptional regulatory networks. However, even those comparative studies that focus on a group of closely related organisms are limited by the rather scarce knowledge of regulatory interactions outside a few model organisms, such as E. coli among the prokaryotes. RESULTS In this paper we used the information annotated in Tractor_DB (a database of regulatory networks in gamma-proteobacteria) to calculate a normalized Site Orthology Score (SOS) that quantifies the conservation of a regulatory link across thirty genomes of this subclass. Then we used this SOS to assess how regulatory connections have evolved in this group, and how the variation of basic regulatory connections is reflected in the structure of the chromosome. We found that individual regulatory interactions shift between different organisms, a process that may be described as rewiring the network. At this evolutionary scale (the gamma-proteobacteria subclass) this rewiring process may be an important source of variation of regulatory incoming interactions for individual networks. We also noticed that the regulatory links that form feed-forward motifs are conserved in a better correlated manner than triads of random regulatory interactions or pairs of co-regulated genes. Furthermore, the rewiring process that takes place at the most basic level of the regulatory network may be linked to rearrangements of genetic material within bacterial chromosomes, which change the structure of Transcription Units and therefore the regulatory connections between Transcription Factors and structural genes. CONCLUSION The rearrangements that occur in bacterial chromosomes (mostly inversion or horizontal gene transfer events) are important sources of variation of gene regulation at this evolutionary scale.
Affiliation(s)
- Abel D González Pérez
- Centro Nacional de Bioinformática. Industria y San José, Capitolio Nacional, CP. 10200, Habana Vieja, Ciudad de la Habana, Cuba.
|
28
|
Wu H, Mao F, Olman V, Xu Y. On application of directons to functional classification of genes in prokaryotes. Comput Biol Chem 2008; 32:176-84. [PMID: 18440870 DOI: 10.1016/j.compbiolchem.2008.02.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/24/2007] [Accepted: 02/15/2008] [Indexed: 11/30/2022]
Abstract
Functional classification of genes represents one of the most basic problems in genome analysis and annotation. Our analysis of some of the popular methods for functional classification of genes shows that these methods are not always consistent with each other and may not be specific enough for high-resolution gene functional annotations. We have developed a method to integrate genomic neighborhood information of genes with their sequence similarity information for the functional classification of prokaryotic genes. The application of our method to 93 proteobacterial genomes has shown that (i) the genomic neighborhoods are much more conserved across prokaryotic genomes than expected by chance, and such conservation can be utilized to improve functional classification of genes; (ii) while our method is consistent with the existing popular schemes as much as they are among themselves, it does provide functional classification at higher resolution and hence allows functional assignments of (new) genes at a more specific level; and (iii) our method is fairly stable when being applied to different genomes.
Affiliation(s)
- Hongwei Wu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Savannah, GA 31407, USA
|
29
|
Yellaboina S, Goyal K, Mande SC. Inferring genome-wide functional linkages in E. coli by combining improved genome context methods: comparison with high-throughput experimental data. Genome Res 2007; 17:527-35. [PMID: 17339371 PMCID: PMC1832100 DOI: 10.1101/gr.5900607] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Indexed: 11/24/2022]
Abstract
Cellular functions are determined by interactions among proteins in the cells. Recognition of these interactions forms an important step in understanding biology at the systems level. Here, we report an interaction network of Escherichia coli, obtained by training a Support Vector Machine on the high-quality interactions in the EcoCyc database, and with the assumption that the periplasmic and cytoplasmic proteins may not interact with each other. The data features included the correlation coefficient between bit score phylogenetic profiles, the frequency of their co-occurrence in predicted operons, and a new measure: the distance between translational start sites of the genes. The combined genome context methods show a high accuracy of prediction on the test data and predict a total of 78,122 binary interactions. The majority of the interactions identified by high-throughput experimental methods correspond to indirect interactions (interactions through neighbors) in the predicted network. Correlation of the predicted network with the gene essentiality data shows that the essential genes in E. coli exhibit a high linking number, whereas the nonessential genes exhibit a low linking number. Furthermore, our predicted protein-protein interaction network shows that the proteins involved in replication, DNA repair, transcription, translation, and cell wall synthesis are highly connected. We therefore believe that our predicted network will serve as a useful resource in understanding prokaryotic biology.
Affiliation(s)
- Sailu Yellaboina
- Centre for DNA Fingerprinting and Diagnostics, Hyderabad 500076, India
- Kshama Goyal
- Centre for DNA Fingerprinting and Diagnostics, Hyderabad 500076, India
- Shekhar C. Mande
- Centre for DNA Fingerprinting and Diagnostics, Hyderabad 500076, India
- Corresponding author. E-mail ; fax 91-40-27155610
|
30
|
Suen G, Arshinoff BI, Taylor RG, Welch RD. Practical Applications of Bacterial Functional Genomics. Biotechnol Genet Eng Rev 2007; 24:213-42. [DOI: 10.1080/02648725.2007.10648101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]
|
31
|
De Keersmaecker SCJ, Thijs IMV, Vanderleyden J, Marchal K. Integration of omics data: how well does it work for bacteria? Mol Microbiol 2006; 62:1239-50. [PMID: 17040488 DOI: 10.1111/j.1365-2958.2006.05453.x] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Indexed: 12/21/2022]
Abstract
In the current omics era, innovative high-throughput technologies allow measuring temporal and conditional changes at various cellular levels. Although individual analysis of each of these omics data undoubtedly results in interesting findings, only by integrating them can a global insight into cellular behaviour be gained. A systems approach thus is predicated on data integration. However, because of the complexity of biological systems and the specificities of the data-generating technologies (noisiness, heterogeneity, etc.), integrating omics data in an attempt to reconstruct signalling networks is not trivial. Developing methodologies for this integration constitutes a major research challenge. Besides their intrinsic value for health care, environment and industry, prokaryotes are ideal model systems to further develop these methods because of their lower regulatory complexity compared with eukaryotes, and the ease with which they can be manipulated. Several successful examples outlined in this review already show the potential of the systems approach for both fundamental and industrial applications, which would be time-consuming or impossible to develop solely through traditional reductionist approaches.
Affiliation(s)
- Sigrid C J De Keersmaecker
- Centre of Microbial and Plant Genetics (CMPG) Katholieke Universiteit Leuven, Kasteelpark Arenberg 20, Belgium
|
32
|
Binnewies TT, Motro Y, Hallin PF, Lund O, Dunn D, La T, Hampson DJ, Bellgard M, Wassenaar TM, Ussery DW. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries. Funct Integr Genomics 2006; 6:165-85. [PMID: 16773396 DOI: 10.1007/s10142-006-0027-2] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.7] [Received: 01/20/2006] [Revised: 02/24/2006] [Accepted: 03/07/2006] [Indexed: 10/24/2022]
Abstract
It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity, even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism is important for interpretation of the genomic data. The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this information.
Affiliation(s)
- Tim T Binnewies
- Center for Biological Sequence Analysis, Technical University of Denmark, 2800, Lyngby, Denmark
|
33
|
Abstract
We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database: , the first of its kind.
Affiliation(s)
- Dongsheng Che
- Department of Computer Science, University of Georgia, USA
- Guojun Li
- Department of Biochemistry and Molecular Biology, University of Georgia, USA
- School of Mathematics and System Sciences, Shandong University, China
- Fenglou Mao
- Department of Biochemistry and Molecular Biology, University of Georgia, USA
- Hongwei Wu
- Department of Biochemistry and Molecular Biology, University of Georgia, USA
- Ying Xu
- Department of Biochemistry and Molecular Biology, University of Georgia, USA
- Department of Computer Science, University of Georgia, USA
- To whom correspondence should be addressed. Tel: 1 706 542 9779; Fax: 1 706 542 9751; Ying Xu
|
34
|
Salgado H, Gama-Castro S, Peralta-Gil M, Díaz-Peredo E, Sánchez-Solano F, Santos-Zavaleta A, Martínez-Flores I, Jiménez-Jacinto V, Bonavides-Martínez C, Segura-Salazar J, Martínez-Antonio A, Collado-Vides J. RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res 2006; 34:D394-7. [PMID: 16381895 PMCID: PMC1347518 DOI: 10.1093/nar/gkj156] [Citation(s) in RCA: 276] [Impact Index Per Article: 15.3] [Indexed: 12/05/2022] Open
Abstract
RegulonDB is the internationally recognized reference database of Escherichia coli K-12 offering curated knowledge of the regulatory network and operon organization. It is currently the largest electronically-encoded database of the regulatory network of any free-living organism. We present here the recently launched RegulonDB version 5.0 radically different in content, interface design and capabilities. Continuous curation of original scientific literature provides the evidence behind every single object and feature. This knowledge is complemented with comprehensive computational predictions across the complete genome. Literature-based and predicted data are clearly distinguished in the database. Starting with this version, RegulonDB public releases are synchronized with those of EcoCyc since our curation supports both databases. The complex biology of regulation is simplified in a navigation scheme based on three major streams: genes, operons and regulons. Regulatory knowledge is directly available in every navigation step. Displays combine graphic and textual information and are organized allowing different levels of detail and biological context. This knowledge is the backbone of an integrated system for the graphic display of the network, graphic and tabular microarray comparisons with curated and predicted objects, as well as predictions across bacterial genomes, and predicted networks of functionally related gene products. Access RegulonDB at .
Affiliation(s)
- Julio Collado-Vides
- To whom correspondence should be addressed. Tel +527 77 313 9877; Fax +527 77 317 5581;
|
35
|
González V, Santamaría RI, Bustos P, Hernández-González I, Medrano-Soto A, Moreno-Hagelsieb G, Janga SC, Ramírez MA, Jiménez-Jacinto V, Collado-Vides J, Dávila G. The partitioned Rhizobium etli genome: genetic and metabolic redundancy in seven interacting replicons. Proc Natl Acad Sci U S A 2006; 103:3834-9. [PMID: 16505379 PMCID: PMC1383491 DOI: 10.1073/pnas.0508502103] [Citation(s) in RCA: 253] [Impact Index Per Article: 14.1] [Indexed: 11/18/2022] Open
Abstract
We report the complete 6,530,228-bp genome sequence of the symbiotic nitrogen fixing bacterium Rhizobium etli. Six large plasmids comprise one-third of the total genome size. The chromosome encodes most functions necessary for cell growth, whereas few essential genes or complete metabolic pathways are located in plasmids. Chromosomal synteny is disrupted by genes related to insertion sequences, phages, plasmids, and cell-surface components. Plasmids do not show synteny, and their orthologs are mostly shared by accessory replicons of species with multipartite genomes. Some nodulation genes are predicted to be functionally related with chromosomal loci encoding for the external envelope of the bacterium. Several pieces of evidence suggest an exogenous origin for the symbiotic plasmid (p42d) and p42a. Additional putative horizontal gene transfer events might have contributed to expand the adaptive repertoire of R. etli, because they include genes involved in small molecule metabolism, transport, and transcriptional regulation. Twenty-three putative sigma factors, numerous isozymes, and paralogous families attest to the metabolic redundancy and the genomic plasticity necessary to sustain the lifestyle of R. etli in symbiosis and in the soil.
Affiliation(s)
- Víctor González
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- *To whom correspondence may be addressed. E-mail:
or
- Rosa I. Santamaría
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- Patricia Bustos
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- Ismael Hernández-González
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- Arturo Medrano-Soto
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- Gabriel Moreno-Hagelsieb
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- Sarath Chandra Janga
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- Miguel A. Ramírez
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- Verónica Jiménez-Jacinto
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- Julio Collado-Vides
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- Guillermo Dávila
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, AP565-A Cuernavaca, Morelos, 62210, México
- *To whom correspondence may be addressed. E-mail:
or
|
36
|
Vallenet D, Labarre L, Rouy Z, Barbe V, Bocs S, Cruveiller S, Lajus A, Pascal G, Scarpelli C, Médigue C. MaGe: a microbial genome annotation system supported by synteny results. Nucleic Acids Res 2006; 34:53-65. [PMID: 16407324 PMCID: PMC1326237 DOI: 10.1093/nar/gkj406] [Citation(s) in RCA: 323] [Impact Index Per Article: 17.9] [Indexed: 11/14/2022] Open
Abstract
Magnifying Genomes (MaGe) is a microbial genome annotation system based on a relational database containing information on bacterial genomes, as well as a web interface to achieve genome annotation projects. Our system allows one to initiate the annotation of a genome at the early stage of the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes enhanced by a gene coding re-annotation process using accurate gene models, (ii) integration of results obtained with a wide range of bioinformatics methods, among which exploration of gene context by searching for conserved synteny and reconstruction of metabolic pathways, (iii) an advanced web interface allowing multiple users to refine the automatic assignment of gene product functions. MaGe is also linked to numerous well-known biological databases and systems. Our system has been thoroughly tested during the annotation of complete bacterial genomes (Acinetobacter baylyi ADP1, Pseudoalteromonas haloplanktis, Frankia alni) and is currently used in the context of several new microbial genome annotation projects. In addition, MaGe allows for annotation curation and exploration of already published genomes from various genera (e.g. Yersinia, Bacillus and Neisseria). MaGe can be accessed at .
Affiliation(s)
- David Vallenet
- Atelier de Génomique Comparative, CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry, Cedex, France.
|