1
Chung HC, Friedberg I, Bromberg Y. Assembling bacterial puzzles: piecing together functions into microbial pathways. NAR Genom Bioinform 2024; 6:lqae109. [PMID: 39184378 PMCID: PMC11344244 DOI: 10.1093/nargab/lqae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 07/24/2024] [Accepted: 08/07/2024] [Indexed: 08/27/2024] Open
Abstract
Functional metagenomics enables the study of unexplored bacterial diversity, gene families, and pathways essential to microbial communities. However, discovering biological insights with these data is impeded by the scarcity of quality annotations. Here, we use a co-occurrence-based analysis of predicted microbial protein functions to uncover pathways in genomic and metagenomic biological systems. Our approach, based on phylogenetic profiles, improves the identification of functional relationships, or participation in the same biochemical pathway, between enzymes over a comparable homology-based approach. We optimized the design of our profiles to identify potential pathways using minimal data, clustered functionally related enzyme pairs into multi-enzymatic pathways, and evaluated our predictions against reference pathways in the KEGG database. We then demonstrated a novel extension of this approach to predict inter-bacterial protein interactions amongst members of a marine microbiome. Most significantly, we show our method predicts emergent biochemical pathways between known and unknown functions. Thus, our work establishes a basis for identifying the potential functional capacities of the entire metagenome, capturing previously unknown and abstract functions into discrete putative pathways.
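The co-occurrence idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a similarity measure or a clustering method, so the Jaccard index, the 0.7 threshold, and the connected-component grouping below are assumptions, and the function identifiers and genome labels are made-up toy data.

```python
from itertools import combinations

# Toy presence/absence profiles (hypothetical data):
# each function maps to the set of genomes in which it was predicted.
profiles = {
    "EC:1.1.1.1": {"g1", "g2", "g3"},
    "EC:2.2.2.2": {"g1", "g2", "g3", "g4"},
    "EC:3.3.3.3": {"g5", "g6"},
    "EC:4.4.4.4": {"g5", "g6"},
    "EC:5.5.5.5": {"g2", "g7"},
}

def jaccard(a, b):
    """Jaccard similarity of two genome sets (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_pairs(profiles, threshold=0.7):
    """Return function pairs whose profiles co-occur at or above the threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(profiles), 2)
        if jaccard(profiles[x], profiles[y]) >= threshold
    ]

def cluster(pairs):
    """Group related pairs into putative pathways via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(g) for g in groups.values())

pairs = related_pairs(profiles)
pathways = cluster(pairs)
```

On the toy data, two functions that share three of four genomes are paired, as are two functions with identical profiles, and each pair forms its own putative pathway. Real profiles would need a threshold tuned against a reference such as KEGG.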
Affiliation(s)
- Henri C Chung
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Iddo Friedberg
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Yana Bromberg
  - Department of Computer Science, Emory University, Atlanta, GA 30307, USA
  - Department of Biology, Emory University, Atlanta, GA 30322, USA
2
Tan MF, Zou G, Wei Y, Liu WQ, Li HQ, Hu Q, Zhang LS, Zhou R. Protein-protein interaction network and potential drug target candidates of Streptococcus suis. J Appl Microbiol 2021; 131:658-670. [PMID: 33249680 DOI: 10.1111/jam.14950] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/14/2020] [Revised: 11/15/2020] [Accepted: 11/25/2020] [Indexed: 02/06/2023]
Abstract
AIMS: This study aimed to explore potential drug targets of Streptococcus suis at the system level. METHODS AND RESULTS: A homologous protein mapping method was used to construct a protein-protein interaction (PPI) network of S. suis, which comprised 1147 non-redundant interaction pairs among 286 proteins. The network parameters were calculated and showed scale-free properties. In all, 41 possibly essential proteins identified from 47 highly connected proteins were selected as potential drug target candidates. Of these proteins, 30 were already regarded as drug targets in other bacterial species. Six transporters with high connections to other functional proteins were identified as probably not essential but functionally important proteins. Afterward, the subnetwork centred on the cell division protein FtsZ was used to confirm the PPI network through bacterial two-hybrid analysis. CONCLUSIONS: The predicted PPI network covers 13.04% of the proteome in S. suis. The selected 41 potential drug target candidates are conserved between S. suis and several model bacteria. SIGNIFICANCE AND IMPACT OF THE STUDY: The predictions included proteins known to be drug targets, and a verifying experiment confirmed the reliability of predicted interactions. This work is the first to present systematic computational PPI data for S. suis and provides potential drug targets, which are valuable in exploring novel anti-streptococcus drugs.
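As a rough illustration of homology-based interaction mapping followed by hub selection, the sketch below transfers interactions from a reference species through an ortholog table and ranks proteins by degree. It is an assumption-laden toy, not the study's pipeline: the protein names, the ortholog map, and the degree cutoff are all hypothetical.

```python
from collections import Counter

# Hypothetical interactions known in a reference species.
reference_ppi = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "D"), ("A", "E")]

# Hypothetical ortholog map: reference protein -> target-species protein.
orthologs = {"A": "a", "B": "b", "C": "c", "D": "d"}

def transfer_interactions(reference_ppi, orthologs):
    """Map reference interactions onto the target species, keeping only
    pairs where both partners have an ortholog, and removing duplicates."""
    pairs = set()
    for u, v in reference_ppi:
        if u in orthologs and v in orthologs:
            a, b = orthologs[u], orthologs[v]
            if a != b:  # skip self-interactions
                pairs.add(tuple(sorted((a, b))))
    return pairs

def degrees(pairs):
    """Count the number of interaction partners per protein."""
    deg = Counter()
    for a, b in pairs:
        deg[a] += 1
        deg[b] += 1
    return deg

def hubs(pairs, min_degree):
    """Proteins with at least min_degree partners (candidate hubs)."""
    return sorted(p for p, d in degrees(pairs).items() if d >= min_degree)

network = transfer_interactions(reference_ppi, orthologs)
candidates = hubs(network, min_degree=2)
```

Storing each pair as a sorted tuple makes the network undirected and non-redundant, which mirrors the "non-redundant interaction pairs" wording of the abstract; essentiality filtering would be a separate step.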
Affiliation(s)
- M-F Tan
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University College of Veterinary Medicine, Wuhan, China
  - Institute of Animal Husbandry and Veterinary Medicine, Jiangxi Academy of Agricultural Sciences, Nanchang, China
- G Zou
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University College of Veterinary Medicine, Wuhan, China
- Y Wei
  - Institute of Animal Husbandry and Veterinary Medicine, Jiangxi Academy of Agricultural Sciences, Nanchang, China
- W-Q Liu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University College of Veterinary Medicine, Wuhan, China
- H-Q Li
  - Institute of Animal Husbandry and Veterinary Medicine, Jiangxi Academy of Agricultural Sciences, Nanchang, China
- Q Hu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University College of Veterinary Medicine, Wuhan, China
- L-S Zhang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University College of Veterinary Medicine, Wuhan, China
- R Zhou
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University College of Veterinary Medicine, Wuhan, China
  - International Research Center for Animal Disease (Ministry of Science & Technology of China), Wuhan, China
  - Cooperative Innovation Center of Sustainable Pig Production, Wuhan, China
3
Adesioye FA, Makhalanyane TP, Biely P, Cowan DA. Phylogeny, classification and metagenomic bioprospecting of microbial acetyl xylan esterases. Enzyme Microb Technol 2016; 93-94:79-91. [DOI: 10.1016/j.enzmictec.2016.07.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 02/16/2016] [Revised: 06/18/2016] [Accepted: 07/01/2016] [Indexed: 02/06/2023]
4
Lobb B, Doxey AC. Novel function discovery through sequence and structural data mining. Curr Opin Struct Biol 2016; 38:53-61. [DOI: 10.1016/j.sbi.2016.05.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/19/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 01/30/2023]
5
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones D, Kim PM, Kriwacki R, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright P, Babu MM. Classification of intrinsically disordered regions and proteins. Chem Rev 2014; 114:6589-631. [PMID: 24773235 PMCID: PMC4095912 DOI: 10.1021/cr400525m] [Citation(s) in RCA: 1440] [Impact Index Per Article: 144.0] [Received: 09/23/2013] [Indexed: 12/11/2022]
Affiliation(s)
- Robin van der Lee
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
  - Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, 6500 HB Nijmegen, The Netherlands
- Marija Buljan
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Benjamin Lang
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Robert J. Weatheritt
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Gary W. Daughdrill
  - Department of Cell Biology, Microbiology, and Molecular Biology, University of South Florida, 3720 Spectrum Boulevard, Suite 321, Tampa, Florida 33612, United States
- A. Keith Dunker
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Monika Fuxreiter
  - MTA-DE Momentum Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, H-4032 Debrecen, Nagyerdei krt 98, Hungary
- Julian Gough
  - Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, United Kingdom
- Joerg Gsponer
  - Department of Biochemistry and Molecular Biology, Centre for High-Throughput Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- David T. Jones
  - Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
- Philip M. Kim
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Richard W. Kriwacki
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, United States
- Christopher J. Oldfield
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Rohit V. Pappu
  - Department of Biomedical Engineering and Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Peter Tompa
  - VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Vladimir N. Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33612, United States
  - Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Peter E. Wright
  - Department of Integrative Structural and Computational Biology and Skaggs Institute of Chemical Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- M. Madan Babu
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
6
Guilloux A, Caudron B, Jestin JL. A method to predict edge strands in beta-sheets from protein sequences. Comput Struct Biotechnol J 2013; 7:e201305001. [PMID: 24688737 PMCID: PMC3962219 DOI: 10.5936/csbj.201305001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/22/2013] [Revised: 05/27/2013] [Accepted: 05/30/2013] [Indexed: 12/15/2022] Open
Abstract
There is a need for rules allowing three-dimensional structure information to be derived from protein sequences. In this work, consideration of an elementary protein folding step allows protein sub-sequences which optimize folding to be derived for any given protein sequence. Classical mechanics applied to this system and the energy conservation law during the elementary folding step yield an equation whose solutions are taken over the field of rational numbers. This formalism is applied to beta-sheets containing two edge strands and at least two central strands. In particular, the number of protein sub-sequences optimized for folding per amino acid in beta-strands is shown to predict edge strands from protein sequences. Topological information on beta-strands and the loops connecting them is derived for protein sequences with a prediction accuracy of 75%. The statistical significance of the finding is given. Applications in protein structure prediction are envisioned, such as the quality assessment of protein structure models.
Affiliation(s)
- Antonin Guilloux
  - Analyse algébrique, Institut de Mathématiques de Jussieu, Université Pierre et Marie Curie, Paris VI, France
- Bernard Caudron
  - Centre d'Informatique pour la Biologie, Institut Pasteur, Paris, France
7
Proteome-wide protein interaction measurements of bacterial proteins of unknown function. Proc Natl Acad Sci U S A 2012; 110:477-82. [PMID: 23267104 DOI: 10.1073/pnas.1210634110] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022] Open
Abstract
Despite the enormous proliferation of bacterial genome data, surprisingly persistent collections of bacterial proteins have resisted functional annotation. In a typical genome, roughly 30% of genes have no assigned function. Many of these proteins are conserved across a large number of bacterial genomes. To assign a putative function to these conserved proteins of unknown function, we created a physical interaction map by measuring the biophysical interactions of these proteins. Binary protein-protein interactions in the model organism Streptococcus pneumoniae (TIGR4) were measured with a microfluidic high-throughput assay technology. In some cases, informatic analysis was used to restrict the space of potential binding partners. In other cases, we performed in vitro proteome-wide interaction screens. We were able to assign putative functions to 50 conserved proteins of unknown function that we studied with this approach.
8
Prakash T, Taylor TD. Functional assignment of metagenomic data: challenges and applications. Brief Bioinform 2012; 13:711-27. [PMID: 22772835 PMCID: PMC3504928 DOI: 10.1093/bib/bbs033] [Citation(s) in RCA: 101] [Impact Index Per Article: 8.4] [Received: 03/05/2012] [Accepted: 05/26/2012] [Indexed: 12/14/2022] Open
Abstract
Metagenomic sequencing provides a unique opportunity to explore Earth's limitless environments harboring scores of yet unknown and mostly unculturable microbes and other organisms. Functional analysis of the metagenomic data plays a central role in projects aiming to explore the most essential questions in microbiology, namely 'In a given environment, among the microbes present, what are they doing, and how are they doing it?' Toward this goal, several large-scale metagenomic projects have recently been conducted or are currently underway. Functional analysis of metagenomic data mainly suffers from the vast amount of data generated in these projects. The sheer amount of data requires much computational time and storage space. These problems are compounded by other factors potentially affecting the functional analysis, including sample preparation, sequencing method and average genome size of the metagenomic samples. In addition, the read lengths generated during sequencing influence sequence assembly, gene prediction and subsequently the functional analysis. The level of confidence for functional predictions increases with increasing read length. Usually, the most reliable functional annotations for metagenomic sequences are achieved using homology-based approaches against publicly available reference sequence databases. Here, we present an overview of the current state of functional analysis of metagenomic sequence data, bottlenecks frequently encountered and possible solutions in light of currently available resources and tools. Finally, we provide some examples of applications from recent metagenomic studies which have been successfully conducted in spite of the known difficulties.
9
Kankainen M, Ojala T, Holm L. BLANNOTATOR: enhanced homology-based function prediction of bacterial proteins. BMC Bioinformatics 2012; 13:33. [PMID: 22335941 PMCID: PMC3386020 DOI: 10.1186/1471-2105-13-33] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 07/11/2011] [Accepted: 02/15/2012] [Indexed: 11/10/2022] Open
Abstract
Background: Automated function prediction has played a central role in determining the biological functions of bacterial proteins. Typically, protein function annotation relies on homology, and function is inferred from other proteins with similar sequences. This approach has become popular in bacterial genomics because it is one of the few methods that is practical for large datasets and because it does not require additional functional genomics experiments. However, the existing solutions produce erroneous predictions in many cases, especially when query sequences have low levels of identity with the annotated source protein. This problem has created a pressing need for improvements in homology-based annotation. Results: We present an automated method for the functional annotation of bacterial protein sequences. Based on sequence similarity searches, BLANNOTATOR accurately annotates query sequences with one-line summary descriptions of protein function. It groups sequences identified by BLAST into subsets according to their annotation and bases its prediction on a set of sequences with consistent functional information. We show the results of BLANNOTATOR's performance in sets of bacterial proteins with known functions. We simulated the annotation process for 3090 SWISS-PROT proteins using a database in its state preceding the functional characterisation of the query protein. For this dataset, our method outperformed the five others that we tested, and the improved performance was maintained even in the absence of highly related sequence hits. We further demonstrate the value of our tool by analysing the putative proteome of Lactobacillus crispatus strain ST1. Conclusions: BLANNOTATOR is an accurate method for bacterial protein function prediction. It is practical for genome-scale data and does not require pre-existing sequence clustering; thus, this method suits the needs of bacterial genome and metagenome researchers.
The method and a web-server are available at http://ekhidna.biocenter.helsinki.fi/poxo/blannotator/.
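The grouping step mentioned in this abstract (sorting similarity-search hits into subsets by annotation and predicting from a consistent subset) can be sketched as follows. This is only a toy illustration and not BLANNOTATOR's actual algorithm: the hit list, the text normalization, and the choice to rank groups by summed bit score are all assumptions.

```python
from collections import defaultdict

# Hypothetical similarity-search hits: (description line, bit score).
hits = [
    ("Alcohol dehydrogenase", 310.0),
    ("alcohol  dehydrogenase ", 295.0),
    ("Hypothetical protein", 320.0),
    ("Zinc-binding dehydrogenase", 150.0),
]

def normalize(description):
    """Lower-case and collapse whitespace so equivalent descriptions group together."""
    return " ".join(description.lower().split())

def annotate(hits):
    """Group hits by normalized description and return the description of the
    group with the highest summed bit score (assumed ranking), or None."""
    groups = defaultdict(float)
    for description, score in hits:
        groups[normalize(description)] += score
    if not groups:
        return None
    return max(groups.items(), key=lambda item: item[1])[0]

prediction = annotate(hits)
```

Here the two consistent "alcohol dehydrogenase" hits together outweigh the single top-scoring but uninformative hit, which is the intuition behind preferring a consistent subset over the best individual match.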
Affiliation(s)
- Matti Kankainen
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
10
Brown SD, Babbitt PC. Inference of functional properties from large-scale analysis of enzyme superfamilies. J Biol Chem 2011; 287:35-42. [PMID: 22069325 DOI: 10.1074/jbc.r111.283408] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 01/08/2023] Open
Abstract
As increasingly large amounts of data from genome and other sequencing projects become available, new approaches are needed to determine the functions of the proteins these genes encode. We show how large-scale computational analysis can help to address this challenge by linking functional information to sequence and structural similarities using protein similarity networks. Network analyses using three functionally diverse enzyme superfamilies illustrate the use of these approaches for facile updating and comparison of available structures for a large superfamily, for creation of functional hypotheses for metagenomic sequences, and to summarize the limits of our functional knowledge about even well studied superfamilies.
Affiliation(s)
- Shoshana D Brown
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California 94158-2330
- Patricia C Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California 94158-2330
  - Pharmaceutical Chemistry, School of Pharmacy
  - California Institute for Quantitative Biosciences, University of California, San Francisco, California 94158-2330
11
Tarrío R, Ayala FJ, Rodríguez-Trelles F. The Vein Patterning 1 (VEP1) gene family laterally spread through an ecological network. PLoS One 2011; 6:e22279. [PMID: 21818306 PMCID: PMC3144213 DOI: 10.1371/journal.pone.0022279] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 03/04/2011] [Accepted: 06/18/2011] [Indexed: 11/23/2022] Open
Abstract
Lateral gene transfer (LGT) is a major evolutionary mechanism in prokaryotes. Knowledge about LGT in eukaryotes, particularly multicellular ones, has only recently started to accumulate. A widespread assumption sees the gene as the unit of LGT, largely because little is yet known about how LGT chances are affected by structural/functional features at the subgenic level. Here we trace the evolutionary trajectory of VEin Patterning 1, a novel gene family known to be essential for plant development and defense. At the subgenic level VEP1 encodes a dinucleotide-binding Rossmann-fold domain, in common with members of the short-chain dehydrogenase/reductase (SDR) protein family. We found: i) VEP1 likely originated in an aerobic, mesophilic and chemoorganotrophic α-proteobacterium, and was laterally propagated through nets of ecological interactions, including multiple LGTs between phylogenetically distant green plant/fungi-associated bacteria, and five independent LGTs to eukaryotes. Of these latest five transfers, three are ancient LGTs, implicating an ancestral fungus, the last common ancestor of land plants and an ancestral trebouxiophyte green alga, and two are recent LGTs to modern embryophytes. ii) VEP1's rampant LGT behavior was enabled by the robustness and broad utility of the dinucleotide-binding Rossmann-fold, which provided a platform for the evolution of two unprecedented departures from the canonical SDR catalytic triad. iii) The fate of VEP1 in eukaryotes has been different in different lineages, being ubiquitous and highly conserved in land plants, whereas fungi underwent multiple losses. iv) VEP1-harboring bacteria include non-phytopathogenic and phytopathogenic symbionts which are non-randomly distributed with respect to the type of harbored VEP1 gene.
Our findings suggest that VEP1 may have been instrumental for the evolutionary transition of green plants to land, and point to an LGT-mediated ‘Trojan Horse’ mechanism for the evolution of bacterial pathogenesis against plants. VEP1 may serve as a tool for revealing microbial interactions in plant/fungi-associated environments.
Affiliation(s)
- Rosa Tarrío
  - Universidad de Santiago de Compostela, CIBERER, Genome Medicine Group, Santiago de Compostela, Spain
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Francisco J. Ayala
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Francisco Rodríguez-Trelles
  - Grup de Biologia Evolutiva, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
12
Larsen PE, Trivedi G, Sreedasyam A, Lu V, Podila GK, Collart FR. Using deep RNA sequencing for the structural annotation of the Laccaria bicolor mycorrhizal transcriptome. PLoS One 2010; 5:e9780. [PMID: 20625404 PMCID: PMC2897884 DOI: 10.1371/journal.pone.0009780] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 01/14/2010] [Accepted: 02/26/2010] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. METHODOLOGY: We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free-living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high-confidence gene models that can be reliably used for experimental characterization of protein function. CONCLUSIONS: 69% of expressed mycorrhizal JGI "best" gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes.
The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.
Affiliation(s)
- Peter E. Larsen
  - Biosciences Division, Argonne National Laboratory, Lemont, Illinois, United States of America
- Geetika Trivedi
  - Department of Biological Sciences, University of Alabama in Huntsville, Huntsville, Alabama, United States of America
- Avinash Sreedasyam
  - Department of Biological Sciences, University of Alabama in Huntsville, Huntsville, Alabama, United States of America
- Vincent Lu
  - Biosciences Division, Argonne National Laboratory, Lemont, Illinois, United States of America
- Gopi K. Podila
  - Department of Biological Sciences, University of Alabama in Huntsville, Huntsville, Alabama, United States of America
- Frank R. Collart
  - Biosciences Division, Argonne National Laboratory, Lemont, Illinois, United States of America
13
Schnoes AM, Brown SD, Dodevski I, Babbitt PC. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol 2009; 5:e1000605. [PMID: 20011109 PMCID: PMC2781113 DOI: 10.1371/journal.pcbi.1000605] [Citation(s) in RCA: 469] [Impact Index Per Article: 31.3] [Received: 05/12/2009] [Accepted: 11/09/2009] [Indexed: 12/13/2022] Open
Abstract
Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%-63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.
Affiliation(s)
- Alexandra M. Schnoes
  - Graduate Group in Biophysics, University of California San Francisco, San Francisco, California, United States of America
- Shoshana D. Brown
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
- Igor Dodevski
  - Department of Biochemistry, University of Zürich, Zürich, Switzerland
- Patricia C. Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America
  - California Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California, United States of America
14
Giuliani SE, Frank AM, Collart FR. Functional assignment of solute-binding proteins of ABC transporters using a fluorescence-based thermal shift assay. Biochemistry 2009; 47:13974-84. [PMID: 19063603 DOI: 10.1021/bi801648r] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
Abstract
We have used a fluorescence-based thermal shift (FTS) assay to identify amino acids that bind to solute-binding proteins in the bacterial ABC transporter family. The assay was validated with a set of six proteins with known binding specificity and was consistently able to map proteins to their known binding ligands. The assay also identified additional candidate binding ligands for several of the amino acid-binding proteins in the validation set. We extended this approach to additional targets and demonstrated the ability of the FTS assay to unambiguously identify preferential binding for several homologues of amino acid-binding proteins with known specificity and to functionally annotate proteins of unknown binding specificity. The assay is implemented in a microwell plate format and provides a rapid approach to validate an anticipated function or to screen proteins of unknown function. The ABC-type transporter family is ubiquitous and transports a variety of biological compounds, but the current annotation of the ligand-binding proteins is limited to mostly generic descriptions of function. The results illustrate the feasibility of using the FTS assay to improve the functional annotation of binding proteins associated with ABC-type transporters and suggest that this approach can also be extended to other protein families.
Collapse
Affiliation(s)
- Sarah E Giuliani
- Biosciences Division, Argonne National Laboratory, Lemont, Illinois 60439, USA
|
15
|
Protein Sequence Databases. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
|
16
|
Discovering functional novelty in metagenomes: examples from light-mediated processes. J Bacteriol 2008; 191:32-41. [PMID: 18849420 DOI: 10.1128/jb.01084-08] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022] Open
Abstract
The emerging coverage of diverse habitats by metagenomic shotgun data opens new avenues of discovering functional novelty using computational tools. Here, we apply three different concepts for predicting novel functions within light-mediated microbial pathways in five diverse environments. Using phylogenetic approaches, we discovered two novel deep-branching subfamilies of photolyases (involved in light-mediated repair) distributed abundantly in high-UV environments. Using neighborhood approaches, we were able to assign seven novel functional partners in luciferase synthesis, nitrogen metabolism, and quorum sensing to BLUF domain-containing proteins (involved in light sensing). Finally, by domain analysis, for RcaE proteins (involved in chromatic adaptation), we predict 16 novel domain architectures that indicate novel functionalities in habitats with little or no light. Quantification of protein abundance in the various environments supports our findings that bacteria utilize light for sensing, repair, and adaptation far more widely than previously thought. While the discoveries illustrate the opportunities in function discovery, we also discuss the immense conceptual and practical challenges that come along with this new type of data.
|
17
|
Dryden DTF, Thomson AR, White JH. How much of protein sequence space has been explored by life on Earth? J R Soc Interface 2008; 5:953-6. [PMID: 18426772 PMCID: PMC2459213 DOI: 10.1098/rsif.2008.0085] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.8] [Indexed: 11/16/2022] Open
Abstract
We suggest that the vastness of protein sequence space is actually completely explorable during the populating of the Earth by life by considering upper and lower limits for the number of organisms, genome size, mutation rate and the number of functionally distinct classes of amino acids. We conclude that rather than life having explored only an infinitesimally small part of sequence space in the last 4 Gyr, it is instead quite plausible for all of functional protein sequence space to have been explored and that furthermore, at the molecular level, there is no role for contingency.
Affiliation(s)
- David T F Dryden
- School of Chemistry, University of Edinburgh, The King's Buildings, Edinburgh EH9 3JJ, UK.
|
18
|
Molecular eco-systems biology: towards an understanding of community function. Nat Rev Microbiol 2008; 6:693-9. [DOI: 10.1038/nrmicro1935] [Citation(s) in RCA: 293] [Impact Index Per Article: 18.3] [Indexed: 11/09/2022]
|
19
|
Towards completion of the Earth's proteome. EMBO Rep 2008; 8:1135-41. [PMID: 18059312 DOI: 10.1038/sj.embor.7401117] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 07/09/2007] [Accepted: 10/15/2007] [Indexed: 11/08/2022] Open
Abstract
New protein sequences are deposited in databases at an accelerating pace; however, many of these are homologous to known proteins and could be considered redundant. If all historical releases of the protein database are analysed using the original sequence-clustering procedure described here, the fraction of newly sequenced proteins that are redundant is increasing. We interpret this as an indication that the sequencing of the Earth's proteome--the complete set of proteins on Earth--is approaching completion. We estimate the approximate size of the Earth's proteome to be 5 million sequences, most of which will be identified during the next 5 years. As the Earth's proteome nears completion, cluster analysis of the protein database will become essential to identify under-explored taxa to which future sequencing efforts should be directed and to focus research on protein families without experimental characterization.
|
20
|
Christen R. Global Sequencing: A Review of Current Molecular Data and New Methods Available to Assess Microbial Diversity. Microbes Environ 2008; 23:253-68. [DOI: 10.1264/jsme2.me08525] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Indexed: 11/12/2022] Open
Affiliation(s)
- Richard Christen
- Université de Nice et CNRS UMR 6543, Laboratoire de Biologie Virtuelle, Centre de Biochimie, Parc Valrose, Faculté des Sciences
|
21
|
Morett E, Saab-Rincón G, Olvera L, Olvera M, Flores H, Grande R. Sensitive genome-wide screen for low secondary enzymatic activities: the YjbQ family shows thiamin phosphate synthase activity. J Mol Biol 2007; 376:839-53. [PMID: 18178222 DOI: 10.1016/j.jmb.2007.12.017] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 10/04/2007] [Revised: 12/06/2007] [Accepted: 12/07/2007] [Indexed: 11/28/2022]
Abstract
Contemporary enzymes are highly efficient and selective catalysts. However, due to the intrinsically very reactive nature of active sites, gratuitous secondary reactions are practically unavoidable. Consequently, even the smallest cell, with its limited enzymatic repertoire, has the potential to carry out numerous additional, very likely inefficient, secondary reactions. If selectively advantageous, secondary reactions could be the basis for the evolution of new fully functional enzymes. Here, we investigated whether Escherichia coli has cryptic enzymatic activities related to thiamin biosynthesis. We selected this pathway because this vitamin is essential, but the cell's requirements are very small. Therefore, enzymes with very low activity could complement the auxotrophy of strains deleted of some bona fide thiamin biosynthetic genes. By overexpressing the E. coli protein repertoire, we selected yjbQ, a gene that complemented a strain deleted of the thiamin phosphate synthase (TPS)-coding gene thiE. In vitro studies confirmed TPS activity, and by directed evolution experiments, this activity was enhanced. Structurally oriented mutagenesis allowed us to identify the putative active site. Remote orthologs of YjbQ from Thermotoga, Sulfolobus, and Pyrococcus were cloned and also showed thiamin auxotrophy complementation, indicating that the cryptic TPS activity is a property of this protein family. Interestingly, the thiE- and yjbQ-coded TPSs are analogous enzymes with no structural similarity, reflecting distinct evolutionary origins. These results support the hypothesis that the enzymatic repertoire of a cell such as E. coli has the potential to perform a vast number of alternative reactions, which could be exploited to evolve novel or more efficient catalysts.
Affiliation(s)
- Enrique Morett
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, AP 510-3, CP 62250, Cuernavaca, Morelos, México.
|
22
|
Annotation, comparison and databases for hundreds of bacterial genomes. Res Microbiol 2007; 158:724-36. [DOI: 10.1016/j.resmic.2007.09.009] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 07/29/2007] [Revised: 09/21/2007] [Accepted: 09/26/2007] [Indexed: 11/20/2022]
|
23
|
Raes J, Foerstner KU, Bork P. Get the most out of your metagenome: computational analysis of environmental sequence data. Curr Opin Microbiol 2007; 10:490-8. [DOI: 10.1016/j.mib.2007.09.001] [Citation(s) in RCA: 130] [Impact Index Per Article: 7.6] [Received: 07/02/2007] [Revised: 08/27/2007] [Accepted: 09/03/2007] [Indexed: 11/28/2022]
|
24
|
Harrington ED, Singh AH, Doerks T, Letunic I, von Mering C, Jensen LJ, Raes J, Bork P. Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proc Natl Acad Sci U S A 2007; 104:13913-8. [PMID: 17717083 PMCID: PMC1955820 DOI: 10.1073/pnas.0702636104] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022] Open
Abstract
To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.
Affiliation(s)
- E. D. Harrington
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- A. H. Singh
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- T. Doerks
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- I. Letunic
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- C. von Mering
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- L. J. Jensen
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- J. Raes
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- P. Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, D-13092 Berlin, Germany
- To whom correspondence should be addressed. E-mail:
|