1. Herbst K, Wang T, Forchielli EJ, Thommes M, Paschalidis IC, Segrè D. Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations. Commun Biol 2024; 7:407. [PMID: 38570615 PMCID: PMC10991586 DOI: 10.1038/s42003-024-06093-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 03/22/2024] [Indexed: 04/05/2024] Open
Abstract
The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm which separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. The generality of the algorithm allows addressing subset selection problems in areas beyond biology.
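The predictor/response split described in this abstract can be illustrated with a small sketch. It is not the authors' MASS implementation: MASS uses mixed integer linear programming, whereas this sketch swaps in an exhaustive search over column subsets with ordinary least squares (no intercept), which is only feasible for tiny matrices. The function names, the error measure and the toy matrix are assumptions made for illustration.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_ss(data, predictors, response):
    """Squared error of one response column fitted as a linear
    combination of the predictor columns (normal equations)."""
    x = [[row[j] for j in predictors] for row in data]
    y = [row[response] for row in data]
    k = len(predictors)
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    coef = solve_linear(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, yi in zip(x, y))


def select_predictors(data, k):
    """Return (best predictor columns, total squared error) for subsets
    of size k. Brute force stands in for the MILP search here."""
    n_cols = len(data[0])
    best = None
    for subset in combinations(range(n_cols), k):
        responses = [j for j in range(n_cols) if j not in subset]
        try:
            err = sum(residual_ss(data, subset, j) for j in responses)
        except ValueError:
            continue  # skip collinear predictor sets
        if best is None or err < best[1]:
            best = (subset, err)
    return best


# Hypothetical toy matrix: rows are strains, columns are conditions.
# Columns 2 and 3 are exact linear combinations of columns 0 and 1.
toy = [
    [1, 0, 1, 2],
    [0, 1, 1, -1],
    [1, 1, 2, 1],
    [2, 1, 3, 3],
]
```

Calling `select_predictors(toy, 2)` returns a pair of predictor columns together with the total squared error over the remaining response columns. The search cost grows combinatorially with the number of conditions, which is one reason an optimization formulation is attractive for real datasets.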
Affiliation(s)
- Konrad Herbst
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
- Taiyao Wang
  - Division of Systems Engineering, Boston University, Boston, MA, USA
- Elena J Forchielli
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
- Meghan Thommes
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Ioannis Ch Paschalidis
  - Division of Systems Engineering, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA
- Daniel Segrè
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA

2. Cohen SE, Hashmi SM, Jones AAD, Lykourinou V, Ondrechen MJ, Sridhar S, van de Ven AL, Waters LS, Beuning PJ. Adapting Undergraduate Research to Remote Work to Increase Engagement. Biophysicist (Rockville, Md.) 2021; 2:28-32. [PMID: 36909739 PMCID: PMC10003819 DOI: 10.35459/tbp.2021.000199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/03/2022]
Abstract
Demand for undergraduate research experiences typically outstrips the available laboratory positions, which could have been exacerbated during the remote work conditions imposed by the SARS-CoV-2/COVID-19 pandemic. This report presents a collection of examples of how undergraduates have been engaged in research under pandemic work restrictions. Examples include a range of projects related to fluid dynamics, cancer biology, nanomedicine, circadian clocks, metabolic disease, catalysis, and environmental remediation. Adaptations were made that included partnerships between remote and in-person research students and students taking on more data analysis and literature surveys, as well as data mining, computational, and informatics projects. In many cases, these projects engaged students who otherwise would have worked in traditional bench research, as some previously had. Several examples of beneficial experiences are reported, such as the additional time spent studying the literature, which gave students a heightened sense of project ownership, and more opportunities to integrate feedback into writing and research. Additionally, the more intentional and regular communication necessitated by remote work proved beneficial for all team members. Finally, online seminars and conferences have made participation possible for many more students, especially those at predominantly undergraduate institutions. Participants aim to adopt these beneficial practices in our research groups even after pandemic restrictions end.
Affiliation(s)
- Susan E Cohen
  - Department of Biological Sciences, California State University Los Angeles, Los Angeles, CA 90032, USA
- Sara M Hashmi
  - Department of Chemical Engineering, Northeastern University, Boston, MA 02115, USA
  - Department of Mechanical & Industrial Engineering, Northeastern University, Boston, MA 02115, USA
- A-Andrew D Jones
  - Department of Chemical Engineering, Northeastern University, Boston, MA 02115, USA
  - School of Public Policy and Urban Affairs, Northeastern University, Boston, MA 02115, USA
- Vasiliki Lykourinou
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
- Mary Jo Ondrechen
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
  - Center for Interdisciplinary Research on Complex Systems, Northeastern University, Boston, MA 02115, USA
- Srinivas Sridhar
  - Department of Chemical Engineering, Northeastern University, Boston, MA 02115, USA
  - Department of Physics, Northeastern University, Boston, MA 02115, USA
- Anne L van de Ven
  - Department of Physics, Northeastern University, Boston, MA 02115, USA
- Lauren S Waters
  - Department of Chemistry, University of Wisconsin Oshkosh, Oshkosh, WI 54901, USA
- Penny J Beuning
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
  - Center for Interdisciplinary Research on Complex Systems, Northeastern University, Boston, MA 02115, USA

3. Reynolds KA, Rosa-Molinar E, Ward RE, Zhang H, Urbanowicz BR, Settles AM. Accelerating biological insight for understudied genes. Integr Comp Biol 2021; 61:2233-2243. [PMID: 33970251 DOI: 10.1093/icb/icab029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
The rapid expansion of genome sequence data is increasing the discovery of protein-coding genes across all domains of life. Annotating these genes with reliable functional information is necessary to understand evolution, to define the full biochemical space accessed by nature, and to identify target genes for biotechnology improvements. The vast majority of proteins are annotated based on sequence conservation with no specific biological, biochemical, genetic, or cellular function identified. Recent technical advances throughout the biological sciences enable experimental research on these understudied protein-coding genes in a broader collection of species. However, scientists have incentives and biases to continue focusing on well documented genes within their preferred model organism. This perspective suggests a research model that seeks to break historic silos of research bias by enabling interdisciplinary teams to accelerate biological functional annotation. We propose an initiative to develop coordinated projects of collaborating evolutionary biologists, cell biologists, geneticists, and biochemists that will focus on subsets of target genes in multiple model organisms. Concurrent analysis in multiple organisms takes advantage of evolutionary divergence and selection, which causes individual species to be better suited as experimental models for specific genes. Most importantly, multisystem approaches would encourage transdisciplinary critical thinking and hypothesis testing that is inherently slow in current biological research.
Affiliation(s)
- Kimberly A Reynolds
  - The Green Center for Systems Biology and the Department of Biophysics, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Eduardo Rosa-Molinar
  - Department of Pharmacology & Toxicology, The University of Kansas, Lawrence, KS 66047, USA
- Robert E Ward
  - Department of Biology, Case Western Reserve University, Cleveland, OH 44106, USA
- Hongbin Zhang
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Breeanna R Urbanowicz
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602, USA
- A Mark Settles
  - Bioengineering Branch, NASA Ames Research Center, Moffett Field, CA USA

4. Frederick J, Hennessy F, Horn U, de la Torre Cortés P, van den Broek M, Strych U, Willson R, Hefer CA, Daran JMG, Sewell T, Otten LG, Brady D. The complete genome sequence of the nitrile biocatalyst Rhodococcus rhodochrous ATCC BAA-870. BMC Genomics 2020; 21:3. [PMID: 31898479 PMCID: PMC6941271 DOI: 10.1186/s12864-019-6405-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/16/2019] [Accepted: 12/16/2019] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Rhodococci are industrially important soil-dwelling Gram-positive bacteria that are well known for both nitrile hydrolysis and oxidative metabolism of aromatics. Rhodococcus rhodochrous ATCC BAA-870 is capable of metabolising a wide range of aliphatic and aromatic nitriles and amides. The genome of the organism was sequenced and analysed in order to better understand this whole cell biocatalyst. RESULTS: The genome of R. rhodochrous ATCC BAA-870 is the first Rhodococcus genome fully sequenced using Nanopore sequencing. The circular genome contains 5.9 megabase pairs (Mbp) and includes a 0.53 Mbp linear plasmid, that together encode 7548 predicted protein sequences according to BASys annotation, and 5535 predicted protein sequences according to RAST annotation. The genome contains numerous oxidoreductases, 15 identified antibiotic and secondary metabolite gene clusters, several terpene and nonribosomal peptide synthetase clusters, as well as 6 putative clusters of unknown type. The 0.53 Mbp plasmid encodes 677 predicted genes and contains the nitrile converting gene cluster, including a nitrilase, a low molecular weight nitrile hydratase, and an enantioselective amidase. Although there are fewer biotechnologically relevant enzymes compared to those found in rhodococci with larger genomes, such as the well-known Rhodococcus jostii RHA1, the abundance of transporters in combination with the myriad of enzymes found in strain BAA-870 might make it more suitable for use in industrially relevant processes than other rhodococci. CONCLUSIONS: The sequence and comprehensive description of the R. rhodochrous ATCC BAA-870 genome will facilitate the additional exploitation of rhodococci for biotechnological applications, as well as enable further characterisation of this model organism.
The genome encodes a wide range of enzymes, many with unknown substrate specificities supporting potential applications in biotechnology, including nitrilases, nitrile hydratase, monooxygenases, cytochrome P450s, reductases, proteases, lipases, and transaminases.
Affiliation(s)
- Joni Frederick
  - Protein Technologies, CSIR Biosciences, Meiring Naude Road, Brummeria, Pretoria, South Africa
  - Electron Microscope Unit, University of Cape Town, Rondebosch, 7701 South Africa
  - Present Address: LadHyx, UMR CNRS 7646, École Polytechnique, 91128 Palaiseau, France
- Fritha Hennessy
  - Protein Technologies, CSIR Biosciences, Meiring Naude Road, Brummeria, Pretoria, South Africa
- Uli Horn
  - Meraka, CSIR, Meiring Naude Road, Brummeria, 0091 South Africa
- Pilar de la Torre Cortés
  - Industrial Microbiology, Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Marcel van den Broek
  - Industrial Microbiology, Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Ulrich Strych
  - Biology and Biochemistry, University of Houston, 4800 Calhoun Road, Houston, TX 77204 USA
  - Present Address: Department of Pediatrics, Section of Tropical Medicine, Baylor College of Medicine, 1102 Bates Avenue, Houston, TX 77030 USA
- Richard Willson
  - Biology and Biochemistry, University of Houston, 4800 Calhoun Road, Houston, TX 77204 USA
  - Chemical and Biomolecular Engineering, University of Houston, 4800 Calhoun Road, Houston, TX 77204 USA
- Charles A. Hefer
  - Bioinformatics and Computational Biology Unit, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, 0002 South Africa
  - Present Address: AgResearch Limited, Lincoln Research Centre, Private Bag 4749, Christchurch, 8140 New Zealand
- Jean-Marc G. Daran
  - Industrial Microbiology, Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Trevor Sewell
  - Electron Microscope Unit, University of Cape Town, Rondebosch, 7701 South Africa
- Linda G. Otten
  - Biocatalysis, Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Dean Brady
  - Protein Technologies, CSIR Biosciences, Meiring Naude Road, Brummeria, Pretoria, South Africa
  - Molecular Sciences Institute, School of Chemistry, University of the Witwatersrand, PO, Wits, 2050 South Africa

5. Romero S, Nastasa A, Chapman A, Kwong WK, Foster LJ. The honey bee gut microbiota: strategies for study and characterization. Insect Molecular Biology 2019; 28:455-472. [PMID: 30652367 DOI: 10.1111/imb.12567] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 06/09/2023]
Abstract
Gut microbiota research is an emerging field that improves our understanding of the ecological and functional dynamics of gut environments. The honey bee gut microbiota is a highly rewarding community to study, as honey bees are critical pollinators of many crops for human consumption and produce valuable commodities such as honey and wax. Most significantly, unique characteristics of the Apis mellifera gut habitat make it a valuable model system. This review discusses methods and pipelines used in the study of the gut microbiota of Ap. mellifera and closely related species for four main purposes: identifying microbiota taxonomy, characterizing microbiota genomes (microbiome), characterizing microbiota-microbiota interactions and identifying functions of the microbial community in the gut. The purpose of this contribution is to increase understanding of honey bee gut microbiota, to facilitate bee microbiota and microbiome research in general and to aid design of future experiments in this growing field.
Affiliation(s)
- S Romero
  - Michael Smith Laboratories and Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
- A Nastasa
  - Michael Smith Laboratories and Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
- A Chapman
  - Michael Smith Laboratories and Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
- W K Kwong
  - Biodiversity Research Centre, Department of Botany, University of British Columbia, Vancouver, BC, Canada
- L J Foster
  - Michael Smith Laboratories and Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada

6. Mohamed SB, Hassan MM, Munir KA, Abdalla NI, Adlan TA, Babiker AK. Re-annotation for hypothetical protein CA803_03125 of Methicillin-Resistant Staphylococcus aureus strain SO-1977 isolated from Sudan. Bioinformation 2019; 15:160-164. [PMID: 31354190 PMCID: PMC6637404 DOI: 10.6026/97320630015160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2018] [Revised: 12/26/2018] [Accepted: 12/27/2018] [Indexed: 11/23/2022] Open
Abstract
This study aims to describe the global detection and functional inference of hypothetical protein CA803_03125 from Staphylococcus aureus SO-1977. Computational methods were used to study this protein based on sequence similarity and the presence of known protein domains. The BLASTp result revealed a significant similarity between the hypothetical protein (CA803_03125) and the ADP-ribose hydrolase protein from four S. aureus strains (MW2, MRSA252, COL, and N315). The evolutionary tree revealed a close relationship between the hypothetical protein and the proteins of the MW2 and COL strains. The physicochemical characterization revealed that all proteins were stable, soluble, hydrophilic and acidic in nature. The Macro domain was found in all proteins. Moreover, the proteins were highly similar in primary, secondary and tertiary organization. The protein CA803_03125 (SO-1977) is already known and well characterized as ADP-ribose hydrolase; therefore, we recommend that its NCBI record be updated to carry the name ADP-ribose hydrolase.
Affiliation(s)
- Sofia B Mohamed
  - Bioinformatics and Biostatistics Department, National University Research Institute, National University, Sudan
- Mohamed M Hassan
  - Bioinformatics and Biostatistics Department, National University Research Institute, National University, Sudan
- Ka Abdalla Munir
  - Bioinformatics and Biostatistics Department, National University Research Institute, National University, Sudan
- Nusiba I Abdalla
  - Bioinformatics and Biostatistics Department, National University Research Institute, National University, Sudan
- Talal A Adlan
  - Bioinformatics and Biostatistics Department, National University Research Institute, National University, Sudan
- Aisha K Babiker
  - Bioinformatics and Biostatistics Department, National University Research Institute, National University, Sudan

7. HashGO: hashing gene ontology for protein function prediction. Comput Biol Chem 2017; 71:264-273. [DOI: 10.1016/j.compbiolchem.2017.09.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/07/2017] [Accepted: 09/25/2017] [Indexed: 10/18/2022]

8. Barbour AG, Adeolu M, Gupta RS. Division of the genus Borrelia into two genera (corresponding to Lyme disease and relapsing fever groups) reflects their genetic and phenotypic distinctiveness and will lead to a better understanding of these two groups of microbes (Margos et al. (2016) There is inadequate evidence to support the division of the genus Borrelia. Int. J. Syst. Evol. Microbiol. doi: 10.1099/ijsem.0.001717). Int J Syst Evol Microbiol 2017; 67:2058-2067. [PMID: 28141502 DOI: 10.1099/ijsem.0.001815] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 12/13/2022] Open
Affiliation(s)
- Alan G Barbour
  - Departments of Medicine, Microbiology & Molecular Genetics, and Ecology & Evolutionary Biology, University of California, Irvine, California, USA
- Mobolaji Adeolu
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Radhey S Gupta
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada

9. Zou X, Wang G, Yu G. Protein Function Prediction Using Deep Restricted Boltzmann Machines. BioMed Research International 2017; 2017:1729301. [PMID: 28744460 PMCID: PMC5506480 DOI: 10.1155/2017/1729301] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/30/2017] [Accepted: 05/30/2017] [Indexed: 11/17/2022]
Abstract
Accurately annotating the biological functions of proteins is one of the key tasks in the postgenome era. Many machine learning based methods have been applied to predict functional annotations of proteins, but this task is rarely solved by deep learning techniques. Deep learning techniques have recently been applied successfully to a wide range of problems, such as video, image, and natural language processing. Inspired by these successful applications, we investigate deep restricted Boltzmann machines (DRBM), a representative deep learning technique, to predict the missing functional annotations of partially annotated proteins. Experimental results on Homo sapiens, Saccharomyces cerevisiae, Mus musculus, and Drosophila show that DRBM achieves better performance than other related methods across different evaluation metrics, and it also runs faster than the compared methods.
Affiliation(s)
- Xianchun Zou
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Guijun Wang
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Guoxian Yu
  - College of Computer and Information Science, Southwest University, Chongqing, China

10.
Abstract
BACKGROUND: Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary (terms) to describe the molecular functions, biological roles and cellular locations of gene products in a hierarchical ontology. GO also provides annotations that associate genes with GO terms. Members of the GO consortium independently and collaboratively annotate gene products with terms, mainly for the model organisms (species) they are interested in. Owing to experimental ethics, the research interests of biologists and resource limitations, homologous genes from different species are currently annotated with different terms. These differences can be attributed more to incomplete annotation of the genes than to functional differences between them. RESULTS: Semantic similarity between genes is derived from the GO hierarchy and the annotations of genes. It is positively correlated with the similarity derived from various types of biological data and has been applied to predict gene function. In this paper, we investigate whether it is possible to replenish the annotations of incompletely annotated genes by using semantic similarity between genes from two homologous species. For this investigation, we use three representative semantic similarity metrics to compute the similarity between genes from two species. Next, we determine the k nearest neighbor genes from the two species based on the chosen metric and then use the terms annotated to the k neighbors of a gene to replenish the annotations of that gene. We perform experiments on archived (Jan-2014 to Jan-2016) GO annotations of four species (Human, Mouse, Danio rerio and Arabidopsis thaliana) to assess the contribution of semantic similarity between genes from different species.
The experimental results demonstrate that: (1) semantic similarity between genes from homologous species contributes much more to the improved accuracy (by 53.22%) than genes from a single species alone or genes from two species with low homology; (2) GO annotations of genes from homologous species are complementary to each other. CONCLUSIONS: Our study shows that semantic-similarity-based interspecies gene function annotation from homologous species is more effective than traditional intraspecies approaches. This work can promote more research on semantic similarity based function prediction across species.
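The neighbor-based replenishment step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: Jaccard overlap of term sets stands in for the three GO semantic similarity metrics used in the paper, the vote threshold `min_votes` is an invented parameter, and the gene names are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap of two term sets; a stand-in for a GO semantic
    similarity metric (assumption, not the paper's metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def replenish(target_terms, candidates, k=2, min_votes=2):
    """Suggest new terms for one gene from its k most similar genes.

    candidates maps gene name -> set of GO term IDs and may mix genes
    from both species. A term is suggested when at least min_votes of
    the k neighbors carry it and the target gene does not.
    """
    ranked = sorted(
        candidates,
        key=lambda g: (-jaccard(target_terms, candidates[g]), g),
    )
    neighbors = ranked[:k]
    votes = Counter(t for g in neighbors for t in candidates[g])
    return {t for t, n in votes.items()
            if n >= min_votes and t not in target_terms}


# Hypothetical example data: an incompletely annotated gene and three
# candidate genes from other species.
target = {"GO:0003677", "GO:0006355"}
candidates = {
    "mouse_geneA": {"GO:0003677", "GO:0006355", "GO:0005634"},
    "zebrafish_geneB": {"GO:0003677", "GO:0005634"},
    "plant_geneC": {"GO:0016020"},
}
suggested = replenish(target, candidates)
```

Here the two nearest neighbors both carry a term that the target gene lacks, so that term becomes a candidate annotation. A real pipeline would replace `jaccard` with a hierarchy-aware semantic similarity measure and would validate the suggestions against later annotation releases.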
Affiliation(s)
- Guoxian Yu
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Wei Luo
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Guangyuan Fu
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Jun Wang
  - College of Computer and Information Sciences, Southwest University, Chongqing, China

11. Gupta RS. Impact of genomics on the understanding of microbial evolution and classification: the importance of Darwin's views on classification. FEMS Microbiol Rev 2016; 40:520-53. [PMID: 27279642 DOI: 10.1093/femsre/fuw011] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Accepted: 05/14/2016] [Indexed: 12/24/2022] Open
Abstract
Analyses of genome sequences, by some approaches, suggest that the widespread occurrence of horizontal gene transfers (HGTs) in prokaryotes disguises their evolutionary relationships and has led to questioning of the Darwinian model of evolution for prokaryotes. These inferences are critically examined in the light of comparative genome analysis, characteristic synapomorphies, phylogenetic trees and Darwin's views on examining evolutionary relationships. Genome sequences are enabling the discovery of numerous molecular markers (synapomorphies) such as conserved signature indels (CSIs) and conserved signature proteins (CSPs), which are distinctive characteristics of different prokaryotic taxa. Based on these molecular markers, which exhibit a high degree of specificity and predictive ability, numerous prokaryotic taxa of different ranks, currently identified based on 16S rRNA gene trees, can now be reliably demarcated in molecular terms. Within all studied groups, multiple CSIs and CSPs have been identified for successive nested clades, providing reliable information regarding their hierarchical relationships, and these inferences are not affected by HGTs. These results strongly support Darwin's views on evolution and classification and supplement the current phylogenetic framework based on 16S rRNA in important respects. The identified molecular markers provide important means for developing novel diagnostics and therapeutics and for functional studies providing important insights regarding prokaryotic taxa.
Affiliation(s)
- Radhey S Gupta
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada

12. Neuhaus K, Landstorfer R, Fellner L, Simon S, Schafferhans A, Goldberg T, Marx H, Ozoline ON, Rost B, Kuster B, Keim DA, Scherer S. Translatomics combined with transcriptomics and proteomics reveals novel functional, recently evolved orphan genes in Escherichia coli O157:H7 (EHEC). BMC Genomics 2016; 17:133. [PMID: 26911138 PMCID: PMC4765031 DOI: 10.1186/s12864-016-2456-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 05/27/2015] [Accepted: 02/09/2016] [Indexed: 12/30/2022] Open
Abstract
Background: Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). Results: Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. Conclusions: These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.
Affiliation(s)
- Klaus Neuhaus
  - Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Richard Landstorfer
  - Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Lea Fellner
  - Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Svenja Simon
  - Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Konstanz, Germany
- Andrea Schafferhans
  - Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany
- Tatyana Goldberg
  - Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany
- Harald Marx
  - Chair of Proteomics and Bioanalytics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354, Freising, Germany
- Olga N Ozoline
  - Institute of Cell Biophysics, Russian Academy of Sciences, Moscow Region, 142290, Pushchino, Russia
- Burkhard Rost
  - Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354, Freising, Germany
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technische Universität München, Gregor-Mendel-Str. 4, 85354, Freising, Germany
- Daniel A Keim
  - Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Konstanz, Germany
- Siegfried Scherer
  - Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
|
13
|
Islam MS, Shahik SM, Sohel M, Patwary NIA, Hasan MA. In Silico Structural and Functional Annotation of Hypothetical Proteins of Vibrio cholerae O139. Genomics Inform 2015; 13:53-9. [PMID: 26175663 PMCID: PMC4500799 DOI: 10.5808/gi.2015.13.2.53] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 02/14/2015] [Revised: 05/26/2015] [Accepted: 05/27/2015] [Indexed: 11/20/2022] Open
Abstract
In developing countries, the threat of cholera is a significant health concern whenever water purification and sewage disposal systems are inadequate. Vibrio cholerae is the causative agent of cholera. The complete genome sequence of V. cholerae reveals the presence of various genes and hypothetical proteins whose functions are not yet understood. Hence, analyzing and annotating the structure and function of hypothetical proteins is important for understanding V. cholerae. V. cholerae O139 is the most common and pathogenic bacterial strain among various V. cholerae strains. In this study, the sequences of six hypothetical proteins of V. cholerae O139 retrieved from NCBI have been annotated. Various computational tools and databases have been used to determine domain family, protein-protein interaction, solubility of protein, ligand binding sites, etc. The three-dimensional structures of two proteins were modeled and their ligand binding sites were identified. We found domains and families for only one protein. The analysis revealed that these proteins might have antibiotic resistance activity, DNA breaking-rejoining activity, integrase enzyme activity, restriction endonuclease activity, etc. Structural prediction of these proteins and detection of binding sites in this study indicate potential targets that could aid docking studies for therapeutic design against cholera.
Affiliation(s)
- Md Saiful Islam
- Department of Genetic Engineering & Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chittagong 4331, Bangladesh
| | - Shah Md Shahik
- Department of Genetic Engineering & Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chittagong 4331, Bangladesh
| | - Md Sohel
- Department of Genetic Engineering & Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chittagong 4331, Bangladesh
| | - Noman I A Patwary
- Department of Genetic Engineering & Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chittagong 4331, Bangladesh
| | - Md Anayet Hasan
- Department of Genetic Engineering & Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chittagong 4331, Bangladesh
|
14
|
Oany AR, Ahmad SAI, Kibria KK, Hossain MU, Jyoti TP. A Hypothetical Protein of Alteromonas macleodii AltDE1 (amad1_06475) Predicted to be a Cold-Shock Protein with RNA Chaperone Activity. Gene Regulation and Systems Biology 2015; 8:141-7. [PMID: 25574135 PMCID: PMC4271719 DOI: 10.4137/grsb.s20802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/15/2014] [Revised: 11/16/2014] [Accepted: 11/24/2014] [Indexed: 11/05/2022]
Abstract
Alteromonas macleodii AltDE1 is a deep-sea proteobacterium that is distinct from the surface isolates of the same species. This study was designed to elucidate the biological function of amad1_06475, a hypothetical protein of A. macleodii AltDE1. The 70-residue protein sequence showed considerable homology with cold-shock proteins (CSPs) and RNA chaperones from different organisms. Multiple sequence alignment further supported the presence of a conserved CSP domain in the protein sequence. The three-dimensional structure of the protein was also determined, and verified by the PROCHECK, Verify3D, and QMEAN programs. The predicted structure contained five anti-parallel β-strands and RNA-binding motifs, which are characteristic features of prokaryotic CSPs. Finally, the binding of a thymidine-rich oligonucleotide and a single uracil molecule in the active site of the protein further strengthens our prediction that amad1_06475 functions as a CSP and thereby acts as an RNA chaperone. The binding was performed with molecular docking tools and was compared with similar binding of 3PF5 (PDB) and 2HAX (PDB), major CSPs of Bacillus subtilis and Bacillus caldolyticus, respectively.
Affiliation(s)
- Arafat Rahman Oany
- Department of Biotechnology and Genetic Engineering, Faculty of Life Science, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Shah Adil Ishtiyaq Ahmad
- Department of Biotechnology and Genetic Engineering, Faculty of Life Science, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Km Kaderi Kibria
- Department of Biotechnology and Genetic Engineering, Faculty of Life Science, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Mohammad Uzzal Hossain
- Department of Biotechnology and Genetic Engineering, Faculty of Life Science, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Tahmina Pervin Jyoti
- Biotechnology and Genetic Engineering Discipline, Life Science School, Khulna University, Khulna, Bangladesh
|
15
|
Reshma SV, Sathyanarayanan N, Nagendra HG. Characterization of hypothetical protein VNG0128C from Halobacterium NRC-1 reveals GALE like activity and its involvement in Leloir pathway of galactose metabolism. J Biomol Struct Dyn 2014; 33:1743-55. [PMID: 25397923 DOI: 10.1080/07391102.2014.969313] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
Abstract
VNG0128C, a hypothetical protein from Halobacterium NRC-1, was chosen for detailed in silico and experimental investigations. Computational exercises revealed that VNG0128C functions as an NAD(+)-binding protein. The phylogenetic analysis with the homolog sequences of VNG0128C suggested that it could act as UDP-galactose 4-epimerase. Hence, the VNG0128C sequence was modeled using a suitable template and docking studies were performed with NAD and UDP-galactose as ligands. The binding interactions strongly indicate that VNG0128C could plausibly act as UDP-galactose 4-epimerase. In order to validate these in silico results, VNG0128C was cloned in pUC57, subcloned in pET22b(+), expressed in BL21 cells and purified using nickel affinity chromatography. An assay using blue dextran was performed to confirm the presence of an NAD-binding domain. To corroborate the epimerase-like enzymatic role of the hypothetical protein, i.e. the ability of the enzyme to convert UDP-galactose to UDP-glucose, the conversion of NAD to NADH was measured. The experimental assay significantly correlated with the in silico predictions, indicating that VNG0128C has an NAD(+)-binding domain with epimerase activity. Its key role in nucleotide-sugar metabolism was thus established. Additionally, the work highlights the need for a methodical characterization of hypothetical proteins (a less-studied class of biopolymers) to exploit them for relevant applications in the field of biology.
Affiliation(s)
- S V Reshma
- Department of Biotechnology, PES Institute of Technology, Bangalore, India
|
16
|
Martín-Galiano AJ, Yuste J, Cercenado MI, de la Campa AG. Inspecting the potential physiological and biomedical value of 44 conserved uncharacterised proteins of Streptococcus pneumoniae. BMC Genomics 2014; 15:652. [PMID: 25096389 PMCID: PMC4143570 DOI: 10.1186/1471-2164-15-652] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/23/2013] [Accepted: 07/21/2014] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND The major Gram-positive coccoid pathogens cause similar invasive diseases and show high rates of antimicrobial resistance. Uncharacterised proteins shared by these organisms may be involved in virulence or be targets for antimicrobial therapy. RESULTS Forty-four uncharacterised proteins from Streptococcus pneumoniae with homologues in Enterococcus faecalis and/or Staphylococcus aureus were selected for analysis. These proteins showed differences in terms of sequence conservation and number of interacting partners. Twenty-eight of these proteins were monodomain proteins and 16 were modular, involving domain combinations and, in many cases, predicted unstructured regions. The genes coding for four of these 44 proteins were essential. Genomic and structural studies showed one of the four essential genes to code for a promising antibacterial target. The strongest impact of gene removal was on monodomain proteins showing high sequence conservation and/or interactions with many other proteins. Eleven out of 40 knockouts (one for each gene) showed growth delay and 10 knockouts presented a chaining phenotype. Five of these chaining mutants showed a lack of putative DNA-binding proteins. This suggests that this phenotype results from a loss of overall transcription regulation. Five knockouts showed defective autolysis in response to penicillin and vancomycin, and attenuated virulence in an animal model of sepsis. CONCLUSIONS Uncharacterised proteins make up a reservoir of polypeptides of different physiological importance and biomedical potential. A promising antibacterial target was identified. Five of the 44 examined proteins seemed to be virulence factors.
Affiliation(s)
- Antonio J Martín-Galiano
- Centro Nacional de Microbiología and CIBERES (CIBER de Enfermedades Respiratorias), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
| | - José Yuste
- Centro Nacional de Microbiología and CIBERES (CIBER de Enfermedades Respiratorias), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
| | - María I Cercenado
- Centro Nacional de Microbiología and CIBERES (CIBER de Enfermedades Respiratorias), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
| | - Adela G de la Campa
- Centro Nacional de Microbiología and CIBERES (CIBER de Enfermedades Respiratorias), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
- Presidencia, Consejo Superior de Investigaciones Científicas, Madrid, Spain
|
17
|
Sorokina M, Stam M, Médigue C, Lespinet O, Vallenet D. Profiling the orphan enzymes. Biol Direct 2014; 9:10. [PMID: 24906382 PMCID: PMC4084501 DOI: 10.1186/1745-6150-9-10] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 03/27/2014] [Accepted: 05/29/2014] [Indexed: 11/10/2022] Open
Abstract
The emergence of Next Generation Sequencing generates an incredible amount of sequence and great potential for new enzyme discovery. Despite this huge amount of data and the profusion of bioinformatic methods for function prediction, a large part of known enzyme activities is still lacking an associated protein sequence. These particular activities are called "orphan enzymes". The present review proposes an update of previous surveys on orphan enzymes by mining the current content of public databases. While the percentage of orphan enzyme activities has decreased from 38% to 22% in ten years, there are still more than 1,000 orphans among the 5,000 entries of the Enzyme Commission (EC) classification. Taking into account all the reactions present in metabolic databases, this proportion dramatically increases to reach nearly 50% of orphans, and many of them are not associated with a known pathway. We extended our survey to "local orphan enzymes", which are activities that have no representative sequence in a given clade but have at least one in organisms belonging to other clades. We observe an important bias in Archaea and find that in general more than 30% of the EC activities have incomplete sequence information in at least one superkingdom. To estimate whether candidate proteins for local orphans could be retrieved by homology search, we applied a simple strategy based on the PRIAM software and noticed that candidates may be proposed for an important fraction of local orphan enzymes. Finally, by studying the relation between protein domains and catalyzed activities, it appears that newly discovered enzymes are mostly associated with already known enzyme domains. Thus, the exploration of the promiscuity and the multifunctional aspect of known enzyme families may solve part of the orphan enzyme issue. We conclude this review with a presentation of recent initiatives in finding proteins for orphan enzymes and in extending the enzyme world by the discovery of new activities.
Affiliation(s)
- Maria Sorokina
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, 91057 Evry, France.
|
18
|
Tan SH, Normi YM, Leow ATC, Salleh AB, Karjiban RA, Murad AMA, Mahadi NM, Rahman MBA. A Sco protein among the hypothetical proteins of Bacillus lehensis G1: Its 3D macromolecular structure and association with Cytochrome C Oxidase. BMC Structural Biology 2014; 14:11. [PMID: 24641837 PMCID: PMC3994876 DOI: 10.1186/1472-6807-14-11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/13/2013] [Accepted: 03/14/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND At least a quarter of any complete genome encodes hypothetical proteins (HPs) which are largely non-similar to other known, well-characterized proteins. Predicting and solving their structures and functions is imperative to aid understanding of any given organism as a complete biological system. The present study highlights the primary effort to classify and cluster 1202 HPs of the alkaliphile Bacillus lehensis G1 to serve as a platform to mine and select specific HP(s) to be studied further in greater detail. RESULTS All HPs of B. lehensis G1 were grouped according to their predicted functions based on the presence of functional domains in their sequences. From the metal-binding group of HPs of the cluster, an HP termed Bleg1_2507 was discovered to contain a thioredoxin (Trx) domain and highly conserved metal-binding ligands represented by Cys69, Cys73 and His159, similar to all prokaryotic and eukaryotic Sco proteins. The built 3D structure of Bleg1_2507 showed that it shared the βαβαββ core structure of Trx-like proteins as well as three flanking β-sheets, a 3₁₀-helix at the N-terminus and a hairpin structure unique to Sco proteins. Docking simulations provided an interesting view of Bleg1_2507 in association with its putative cytochrome c oxidase subunit II (COXII) redox partner, Bleg1_2337, where the latter can be seen to hold its partner in an embrace, facilitated by hydrophobic and ionic interactions between the proteins. Although Bleg1_2507 shares relatively low sequence identity (47%) with BsSco, interestingly, the predicted metal-binding residues of Bleg1_2507, i.e. Cys-69, Cys-73 and His-159, were located at flexible active loops similar to other Sco proteins across biological taxa. This highlights structural conservation of Sco despite their various functions in prokaryotes and eukaryotes.
CONCLUSIONS We propose that HP Bleg1_2507 is a Sco protein which is able to interact with COXII, its redox partner and therefore, may possess metallochaperone and redox functions similar to other documented bacterial Sco proteins. It is hoped that this scientific effort will help to spur the search for other physiologically relevant proteins among the so-called "orphan" proteins of any given organism.
Affiliation(s)
- Soo Huei Tan
- Center for Enzyme and Microbial Biotechnology (EMTECH), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, Serdang, Selangor 43400, Malaysia
| | - Yahaya M Normi
- Center for Enzyme and Microbial Biotechnology (EMTECH), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, Serdang, Selangor 43400, Malaysia
| | - Adam Thean Chor Leow
- Center for Enzyme and Microbial Biotechnology (EMTECH), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, Serdang, Selangor 43400, Malaysia
| | - Abu Bakar Salleh
- Center for Enzyme and Microbial Biotechnology (EMTECH), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, Serdang, Selangor 43400, Malaysia
| | - Roghayeh Abedi Karjiban
- Center for Enzyme and Microbial Biotechnology (EMTECH), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, Serdang, Selangor 43400, Malaysia
- Department of Chemistry, Faculty of Science, Universiti Putra Malaysia, Serdang, Selangor 43400, Malaysia
| | - Abdul Munir Abdul Murad
- School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor, Malaysia
| | - Nor Muhammad Mahadi
- Malaysia Genome Institute, Ministry of Science, Technology and Innovation, Jalan Bangi, Kajang, Selangor 43000, Malaysia
| | - Mohd Basyaruddin Abdul Rahman
- Center for Enzyme and Microbial Biotechnology (EMTECH), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, Serdang, Selangor 43400, Malaysia
- Department of Chemistry, Faculty of Science, Universiti Putra Malaysia, Serdang, Selangor 43400, Malaysia
- Malaysia Genome Institute, Ministry of Science, Technology and Innovation, Jalan Bangi, Kajang, Selangor 43000, Malaysia
|
19
|
Carnitine metabolism to trimethylamine by an unusual Rieske-type oxygenase from human microbiota. Proc Natl Acad Sci U S A 2014; 111:4268-73. [PMID: 24591617 DOI: 10.1073/pnas.1316569111] [Citation(s) in RCA: 230] [Impact Index Per Article: 23.0] [Indexed: 12/17/2022] Open
Abstract
Dietary intake of L-carnitine can promote cardiovascular diseases in humans through microbial production of trimethylamine (TMA) and its subsequent oxidation to trimethylamine N-oxide by hepatic flavin-containing monooxygenases. Although our microbiota are responsible for TMA formation from carnitine, the underpinning molecular and biochemical mechanisms remain unclear. In this study, using bioinformatics approaches, we first identified a two-component Rieske-type oxygenase/reductase (CntAB) and associated gene cluster proposed to be involved in carnitine metabolism in representative genomes of the human microbiota. CntA belongs to a group of previously uncharacterized Rieske-type proteins and has an unusual "bridging" glutamate but not the aspartate residue, which is believed to facilitate intersubunit electron transfer between the Rieske center and the catalytic mononuclear iron center. Using Acinetobacter baumannii as the model, we then demonstrate that cntAB is essential in carnitine degradation to TMA. Heterologous overexpression of cntAB enables Escherichia coli to produce TMA, confirming that these genes are sufficient in TMA formation. Site-directed mutagenesis experiments have confirmed that this unusual "bridging glutamate" residue in CntA is essential in catalysis and neither mutant (E205D, E205A) is able to produce TMA. Taken together, the data in our study reveal the molecular and biochemical mechanisms underpinning carnitine metabolism to TMA in human microbiota and assign the role of this novel group of Rieske-type proteins in microbial carnitine metabolism.
|
20
|
Abstract
The genomic revolution promises great advances in the search for useful biocatalysts. Function-based metagenomic approaches have identified several enzymes with properties that make them useful candidates for a variety of bioprocesses. As DNA sequencing costs continue to decline, the volume of genomic data, along with their corresponding predicted protein sequences, will continue to increase dramatically, necessitating new approaches to leverage this information for gene-based bioprospecting efforts. Additionally, as new functions are discovered and correlated with this sequence information, the knowledge of the often complex relationship between a protein's sequence and function will improve. This in turn will lead to better gene-based bioprospecting approaches and facilitate the tailoring of desired properties through protein engineering projects. In this chapter, we discuss a number of recent advances in bioprospecting within the context of the genomic age.
Affiliation(s)
- Michael A Hicks
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Kristala L J Prather
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Synthetic Biology Engineering Research Center (SynBERC), Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
|
21
|
Abstract
Our ability to sequence genomes has provided us with near-complete lists of the proteins that compose cells, tissues, and organisms, but this is only the beginning of the process to discover the functions of cellular components. In the future, it's going to be crucial to develop computational analyses that can predict the biological functions of uncharacterised proteins. At the same time, we must not forget those fundamental experimental skills needed to confirm the predictions or send the analysts back to the drawing board to devise new ones.
Affiliation(s)
- William C. Earnshaw
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, ICB, Edinburgh, Scotland, United Kingdom
|
22
|
Hwang WC, Bakolitsa C, Punta M, Coggill PC, Bateman A, Axelrod HL, Rawlings ND, Sedova M, Peterson SN, Eberhardt RY, Aravind L, Pascual J, Godzik A. LUD, a new protein domain associated with lactate utilization. BMC Bioinformatics 2013; 14:341. [PMID: 24274019 PMCID: PMC3924224 DOI: 10.1186/1471-2105-14-341] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/03/2013] [Accepted: 11/19/2013] [Indexed: 11/24/2022] Open
Abstract
Background A novel highly conserved protein domain, DUF162 [Pfam: PF02589], can be mapped to two proteins: LutB and LutC. Both proteins are encoded by a highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence, structure, and recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family. Results JCSG solved the first crystal structure [PDB:2G40] from the LUD domain family: LutC protein, encoded by ORF DR_1909, of Deinococcus radiodurans. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has an increased abundance in the human gut microbiome. Conclusions We propose a model for the substrate and cofactor binding and regulation in the LUD domain. The significance of LUD-containing proteins in the human gut microbiome and the implication of lactate metabolism in the radiation resistance of Deinococcus radiodurans are discussed.
Affiliation(s)
- William C Hwang
- Joint Center for Structural Genomics, La Jolla, CA 92037, USA.
|
23
|
Revealing the hidden functional diversity of an enzyme family. Nat Chem Biol 2013; 10:42-9. [DOI: 10.1038/nchembio.1387] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 07/12/2013] [Accepted: 10/02/2013] [Indexed: 11/08/2022]
|
24
|
Benso A, Di Carlo S, Ur Rehman H, Politano G, Savino A, Suravajhala P. A combined approach for genome wide protein function annotation/prediction. Proteome Sci 2013; 11:S1. [PMID: 24564915 PMCID: PMC3909112 DOI: 10.1186/1477-5956-11-s1-s1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/24/2022] Open
Abstract
Background Today, large-scale genome sequencing technologies are uncovering an increasing number of new genes and proteins, which remain uncharacterized. Experimental procedures for protein function prediction are low throughput by nature and thus can't be used to keep up with the rate at which new proteins are discovered. On the other hand, proteins are the prominent stakeholders in almost all biological processes, and therefore the need to precisely know their functions for a better understanding of the underlying biological mechanism is inevitable. The challenge of annotating uncharacterized proteins in functional genomics and biology in general motivates the use of well-orchestrated computational techniques to accurately predict their functions. Methods We propose a computational flow for the functional annotation of a protein that is able to assign the most probable functions to a protein by aggregating heterogeneous information. The information considered includes protein motifs, protein sequence similarity, and protein homology data gathered from interacting proteins, combined with data from highly similar non-interacting proteins (hereinafter called Similactors). Moreover, to increase the predictive power of our model we also compute and integrate term-specific relationships among functional terms based on Gene Ontology (GO). Results We tested our method on Saccharomyces cerevisiae and Homo sapiens proteins. The aggregation of different structural and functional evidence with GO relationships outperforms the other methods reported in the literature in terms of precision and accuracy of prediction. The predicted precision and accuracy are 100% for more than half of the input set for both species; overall, we obtained 85.38% precision and 81.95% accuracy for Homo sapiens and 79.73% precision and 80.06% accuracy for Saccharomyces cerevisiae proteins.
|
25
|
Tchigvintsev A, Tchigvintsev D, Flick R, Popovic A, Dong A, Xu X, Brown G, Lu W, Wu H, Cui H, Dombrowski L, Joo JC, Beloglazova N, Min J, Savchenko A, Caudy AA, Rabinowitz JD, Murzin AG, Yakunin AF. Biochemical and structural studies of conserved Maf proteins revealed nucleotide pyrophosphatases with a preference for modified nucleotides. Chem Biol 2013; 20:1386-98. [PMID: 24210219 PMCID: PMC3899018 DOI: 10.1016/j.chembiol.2013.09.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/24/2013] [Revised: 09/06/2013] [Accepted: 09/13/2013] [Indexed: 11/17/2022]
Abstract
Maf (for multicopy associated filamentation) proteins represent a large family of conserved proteins implicated in cell division arrest but whose biochemical activity remains unknown. Here, we show that the prokaryotic and eukaryotic Maf proteins exhibit nucleotide pyrophosphatase activity against 5-methyl-UTP, pseudo-UTP, 5-methyl-CTP, and 7-methyl-GTP, which represent the most abundant modified bases in all organisms, as well as against canonical nucleotides dTTP, UTP, and CTP. Overexpression of the Maf protein YhdE in E. coli cells increased intracellular levels of dTMP and UMP, confirming that dTTP and UTP are the in vivo substrates of this protein. Crystal structures and site-directed mutagenesis of Maf proteins revealed the determinants of their activity and substrate specificity. Thus, pyrophosphatase activity of Maf proteins toward canonical and modified nucleotides might provide the molecular mechanism for a dual role of these proteins in cell division arrest and house cleaning. Maf proteins represent a family of nucleoside triphosphate pyrophosphatases. Maf proteins hydrolyze the canonical nucleotides dTTP, UTP, and CTP. Maf proteins are also active against m5UTP, m5CTP, pseudo-UTP, and m7GTP. Maf structures reveal the molecular mechanisms of their substrate selectivity.
Affiliation(s)
- Anatoli Tchigvintsev
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON M5S 3E5, Canada
|
26
|
Anton BP, Chang YC, Brown P, Choi HP, Faller LL, Guleria J, Hu Z, Klitgord N, Levy-Moonshine A, Maksad A, Mazumdar V, McGettrick M, Osmani L, Pokrzywa R, Rachlin J, Swaminathan R, Allen B, Housman G, Monahan C, Rochussen K, Tao K, Bhagwat AS, Brenner SE, Columbus L, de Crécy-Lagard V, Ferguson D, Fomenkov A, Gadda G, Morgan RD, Osterman AL, Rodionov DA, Rodionova IA, Rudd KE, Söll D, Spain J, Xu SY, Bateman A, Blumenthal RM, Bollinger JM, Chang WS, Ferrer M, Friedberg I, Galperin MY, Gobeill J, Haft D, Hunt J, Karp P, Klimke W, Krebs C, Macelis D, Madupu R, Martin MJ, Miller JH, O'Donovan C, Palsson B, Ruch P, Setterdahl A, Sutton G, Tate J, Yakunin A, Tchigvintsev D, Plata G, Hu J, Greiner R, Horn D, Sjölander K, Salzberg SL, Vitkup D, Letovsky S, Segrè D, DeLisi C, Roberts RJ, Steffen M, Kasif S. The COMBREX project: design, methodology, and initial results. PLoS Biol 2013; 11:e1001638. [PMID: 24013487 PMCID: PMC3754883 DOI: 10.1371/journal.pbio.1001638] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022] Open
Affiliation(s)
- Brian P. Anton
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Yi-Chien Chang
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Peter Brown
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Han-Pil Choi
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Lina L. Faller
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Jyotsna Guleria
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Zhenjun Hu
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Niels Klitgord
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Ami Levy-Moonshine
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Almaz Maksad
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Varun Mazumdar
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Mark McGettrick
- Diatom Software LLC, Holliston, Massachusetts, United States of America
| | - Lais Osmani
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Revonda Pokrzywa
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - John Rachlin
- Diatom Software LLC, Holliston, Massachusetts, United States of America
| | - Rajeswari Swaminathan
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Benjamin Allen
- Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Mathematics, Emmanuel College, Boston, Massachusetts, United States of America
| | - Genevieve Housman
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Caitlin Monahan
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Krista Rochussen
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Kevin Tao
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Ashok S. Bhagwat
- Department of Chemistry, Wayne State University, Detroit, Michigan, United States of America
| | - Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, California, United States of America
| | - Linda Columbus
- Department of Chemistry, University of Virginia, Charlottesville, Virginia, United States of America
| | - Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida, United States of America
| | - Donald Ferguson
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
| | - Alexey Fomenkov
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Giovanni Gadda
- Department of Chemistry, Georgia State University, Atlanta, Georgia, United States of America
| | - Richard D. Morgan
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Andrei L. Osterman
- Bioinformatics and Systems Biology, Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Dmitry A. Rodionov
- Bioinformatics and Systems Biology, Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Irina A. Rodionova
- Bioinformatics and Systems Biology, Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Kenneth E. Rudd
- Department of Biochemistry and Molecular Biology, University of Miami, Miami, Florida, United States of America
| | - Dieter Söll
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
| | - James Spain
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
| | - Shuang-yong Xu
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Alex Bateman
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Robert M. Blumenthal
- Department of Medical Microbiology and Immunology, and Program in Bioinformatics, University of Toledo, Toledo, Ohio, United States of America
| | - J. Martin Bollinger
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Woo-Suk Chang
- Department of Biology, University of Texas-Arlington, Arlington, Texas, United States of America
| | - Manuel Ferrer
- Spanish National Research Council (CSIC), Institute of Catalysis, Madrid, Spain
| | - Iddo Friedberg
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
| | - Michael Y. Galperin
- National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Julien Gobeill
- Department of Library and Information Sciences, University of Applied Sciences Western Switzerland, Geneva, Switzerland
- Bibliomics and Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Daniel Haft
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - John Hunt
- Biological Sciences, Columbia University, New York, New York, United States of America
| | - Peter Karp
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, California, United States of America
| | - William Klimke
- National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Carsten Krebs
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Dana Macelis
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Ramana Madupu
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Maria J. Martin
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Jeffrey H. Miller
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
| | - Claire O'Donovan
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Bernhard Palsson
- Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
| | - Patrick Ruch
- Department of Library and Information Sciences, University of Applied Sciences Western Switzerland, Geneva, Switzerland
- Bibliomics and Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Aaron Setterdahl
- Department of Chemistry, Indiana University Southeast, New Albany, Indiana, United States of America
| | - Granger Sutton
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - John Tate
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Alexander Yakunin
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Dmitri Tchigvintsev
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Germán Plata
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Integrated Program in Cellular, Molecular, Structural, and Genetic Studies, Columbia University, New York, New York, United States of America
| | - Jie Hu
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
| | - Russell Greiner
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
| | - David Horn
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
| | - Kimmen Sjölander
- Berkeley Phylogenomics Group, University of California, Berkeley, California, United States of America
| | - Steven L. Salzberg
- Departments of Medicine and Biostatistics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Dennis Vitkup
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
| | - Stanley Letovsky
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Daniel Segrè
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Charles DeLisi
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Richard J. Roberts
- New England Biolabs, Ipswich, Massachusetts, United States of America
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Martin Steffen
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Simon Kasif
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- * E-mail: (BPA); (SK)
|
27
|
McCarthy FM, Lyons E. From data to function: functional modeling of poultry genomics data. Poult Sci 2013; 92:2519-29. [PMID: 23960137 DOI: 10.3382/ps.2012-02808] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022] Open
Abstract
One of the challenges of functional genomics is to create a better understanding of the biological system being studied so that the data produced are leveraged to provide gains for agriculture, human health, and the environment. Functional modeling enables researchers to make sense of these data as it reframes a long list of genes or gene products (mRNA, ncRNA, and proteins) by grouping based upon function, be it individual molecular functions or interactions between these molecules or broader biological processes, including metabolic and signaling pathways. However, poultry researchers have been hampered by a lack of functional annotation data, tools, and training to use these data and tools. Moreover, this lack is becoming more critical as new sequencing technologies enable us to generate data not only for an increasingly diverse range of species but also individual genomes and populations of individuals. We discuss the impact of these new sequencing technologies on poultry research, with a specific focus on what functional modeling resources are available for poultry researchers. We also describe key strategies for researchers who wish to functionally model their own data, providing background information about functional modeling approaches, the data and tools to support these approaches, and the strengths and limitations of each. Specifically, we describe methods for functional analysis using Gene Ontology (GO) functional summaries, functional enrichment analysis, and pathways and network modeling. 
As annotation efforts begin to provide the fundamental data that underpin poultry functional modeling (such as improved gene identification, standardized gene nomenclature, temporal and spatial expression data and gene product function), tool developers are incorporating these data into new and existing tools that are used for functional modeling, and cyberinfrastructure is being developed to provide the necessary extendibility and scalability for storing and analyzing these data. This process will support the efforts of poultry researchers to make sense of their functional genomics data sets, and we provide here a starting point for researchers who wish to take advantage of these tools.
Affiliation(s)
- F M McCarthy
- Department of Veterinary Science and Microbiology, University of Arizona, Tucson, AZ 85721, USA.
|
28
|
Buttigieg PL, Hankeln W, Kostadinov I, Kottmann R, Yilmaz P, Duhaime MB, Glöckner FO. Ecogenomic perspectives on domains of unknown function: correlation-based exploration of marine metagenomes. PLoS One 2013; 8:e50869. [PMID: 23516388 PMCID: PMC3597751 DOI: 10.1371/journal.pone.0050869] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/22/2012] [Accepted: 10/24/2012] [Indexed: 11/19/2022] Open
Abstract
Background: The proportion of conserved DNA sequences with no clear function is steadily growing in bioinformatics databases. Studies of sequence and structural homology have indicated that many uncharacterized protein domain sequences are variants of functionally described domains. If these variants promote an organism's ecological fitness, they are likely to be conserved in the genome of its progeny and the population at large. The genetic composition of microbial communities in their native ecosystems is accessible through metagenomics. We hypothesize the co-variation of protein domain sequences across metagenomes from similar ecosystems will provide insights into their potential roles and aid further investigation. Methodology/Principal findings: We calculated the correlation of Pfam protein domain sequences across the Global Ocean Sampling metagenome collection, employing conservative detection and correlation thresholds to limit results to well-supported hits and associations. We then examined intercorrelations between domains of unknown function (DUFs) and domains involved in known metabolic pathways using network visualization and cluster-detection tools. We used a cautious “guilty-by-association” approach, referencing knowledge-level resources to identify and discuss associations that offer insight into DUF function. We observed numerous DUFs associated to photobiologically active domains and prevalent in the Cyanobacteria. Other clusters included DUFs associated with DNA maintenance and repair, inorganic nutrient metabolism, and sodium-translocating transport domains. We also observed a number of clusters reflecting known metabolic associations and cases that predicted functional reclassification of DUFs. Conclusion/Significance: Critically examining domain covariation across metagenomic datasets can grant new perspectives on the roles and associations of DUFs in an ecological setting. 
Targeted attempts at DUF characterization in the laboratory or in silico may draw from these insights and opportunities to discover new associations and corroborate existing ones will arise as more large-scale metagenomic datasets emerge.
Affiliation(s)
- Pier Luigi Buttigieg
- Microbial Genomics and Bioinformatics Group, Max Planck Institute for Marine Microbiology, Bremen, Germany.
|
29
|
Ornelas A, Korczynska M, Ragumani S, Kumaran D, Narindoshvili T, Shoichet BK, Swaminathan S, Raushel FM. Functional annotation and three-dimensional structure of an incorrectly annotated dihydroorotase from cog3964 in the amidohydrolase superfamily. Biochemistry 2012; 52:228-38. [PMID: 23214420 DOI: 10.1021/bi301483z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
The substrate specificities of two incorrectly annotated enzymes belonging to cog3964 from the amidohydrolase superfamily were determined. This group of enzymes are currently misannotated as either dihydroorotases or adenine deaminases. Atu3266 from Agrobacterium tumefaciens C58 and Oant2987 from Ochrobactrum anthropi ATCC 49188 were found to catalyze the hydrolysis of acetyl-(R)-mandelate and similar esters with values of kcat/Km that exceed 10⁵ M⁻¹ s⁻¹. These enzymes do not catalyze the deamination of adenine or the hydrolysis of dihydroorotate. Atu3266 was crystallized and the structure determined to a resolution of 2.62 Å. The protein folds as a distorted (β/α)₈ barrel and binds two zincs in the active site. The substrate profile was determined via a combination of computational docking to the three-dimensional structure of Atu3266 and screening of a highly focused library of potential substrates. The initial weak hit was the hydrolysis of N-acetyl-D-serine (kcat/Km = 4 M⁻¹ s⁻¹). This was followed by the progressive identification of acetyl-(R)-glycerate (kcat/Km = 4 × 10² M⁻¹ s⁻¹), acetyl glycolate (kcat/Km = 1.3 × 10⁴ M⁻¹ s⁻¹), and ultimately acetyl-(R)-mandelate (kcat/Km = 2.8 × 10⁵ M⁻¹ s⁻¹).
Affiliation(s)
- Argentina Ornelas
- Department of Chemistry, P.O. Box 30012, Texas A&M University, College Station, TX 77842-3012, USA
|
30
|
Abstract
The human genome has been referred to as the blueprint of human biology. In this review we consider an essential but largely ignored overlay to that blueprint, the human microbiome, which is composed of those microbes that live in and on our bodies. The human microbiome is a source of genetic diversity, a modifier of disease, an essential component of immunity, and a functional entity that influences metabolism and modulates drug interactions. Characterization and analysis of the human microbiome have been greatly catalyzed by advances in genomic technologies. We discuss how these technologies have shaped this emerging field of study and advanced our understanding of the human microbiome. We also identify future challenges, many of which are common to human genetic studies, and predict that in the future, analyzing genetic variation and risk of human disease will sometimes necessitate the integration of human and microbial genomic data sets.
Affiliation(s)
- Elizabeth A Grice
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
|
31
|
McNeil MB, Clulow JS, Wilf NM, Salmond GPC, Fineran PC. SdhE is a conserved protein required for flavinylation of succinate dehydrogenase in bacteria. J Biol Chem 2012; 287:18418-28. [PMID: 22474332 DOI: 10.1074/jbc.m111.293803] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Indexed: 11/06/2022] Open
Abstract
Conserved uncharacterized genes account for ~30% of genes in both eukaryotic and bacterial genomes and are predicted to encode what are often termed "conserved hypothetical proteins." Many of these proteins have a wide phylogenetic distribution and might play important roles in conserved cellular pathways. Using the bacterium Serratia as a model system, we have investigated two conserved uncharacterized proteins, YgfY (a DUF339 protein, renamed SdhE; succinate dehydrogenase protein E) and YgfX (a DUF1434 protein). SdhE was required for growth on succinate as a sole carbon source and for the function, but not stability, of succinate dehydrogenase, an important component of the electron transport chain and the tricarboxylic acid cycle. SdhE interacted with the flavoprotein SdhA, directly bound the flavin adenine dinucleotide co-factor, and was required for the flavinylation of SdhA. This is the first demonstration of a protein required for FAD incorporation in bacteria. Furthermore, the loss of SdhE was highly pleiotropic, suggesting that SdhE might flavinylate other flavoproteins. Our findings are of wide importance to central metabolism because SdhE homologues are present in α-, β-, and γ-proteobacteria and multiple eukaryotes, including humans and yeast.
Affiliation(s)
- Matthew B McNeil
- Department of Microbiology and Immunology, University of Otago, Dunedin 9054, New Zealand
|
32
|
Doerks T, van Noort V, Minguez P, Bork P. Annotation of the M. tuberculosis hypothetical orfeome: adding functional information to more than half of the uncharacterized proteins. PLoS One 2012; 7:e34302. [PMID: 22485162 PMCID: PMC3317503 DOI: 10.1371/journal.pone.0034302] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 12/06/2011] [Accepted: 02/26/2012] [Indexed: 11/18/2022] Open
Abstract
The genome of Mycobacterium tuberculosis (H37Rv) contains 4,019 protein coding genes, of which more than a thousand have been categorized as ‘hypothetical’, implying that for these not even weak functional associations could be identified so far. We here predict reliable functional indications for half of this large hypothetical orfeome: 497 genes can be annotated based on orthology, and another 125 can be linked to interacting proteins via integrated genomic context analysis and literature mining. The assignments include newly identified clusters of interacting proteins, hypothetical genes that are associated with well-known pathways, and putative disease-relevant targets. Altogether, we have raised the fraction of the proteome with at least some functional annotation to 88%, which should considerably enhance the interpretation of large-scale experiments targeting this medically important organism.
Affiliation(s)
- Tobias Doerks
- European Molecular Biology Laboratory, Heidelberg, Germany.
|
33
|
Gao B, Gupta RS. Phylogenetic framework and molecular signatures for the main clades of the phylum Actinobacteria. Microbiol Mol Biol Rev 2012; 76:66-112. [PMID: 22390973 PMCID: PMC3294427 DOI: 10.1128/mmbr.05011-11] [Citation(s) in RCA: 167] [Impact Index Per Article: 13.9] [Indexed: 01/15/2023] Open
Abstract
The phylum Actinobacteria harbors many important human pathogens and also provides one of the richest sources of natural products, including numerous antibiotics and other compounds of biotechnological interest. Thus, a reliable phylogeny of this large phylum and the means to accurately identify its different constituent groups are of much interest. Detailed phylogenetic and comparative analyses of >150 actinobacterial genomes reported here form the basis for achieving these objectives. In phylogenetic trees based upon 35 conserved proteins, most of the main groups of Actinobacteria as well as a number of their suprageneric clades are resolved. We also describe large numbers of molecular markers consisting of conserved signature indels in protein sequences and whole proteins that are specific for either all Actinobacteria or their different clades (viz., orders, families, genera, and subgenera) at various taxonomic levels. These signatures independently support the existence of different phylogenetic clades, and based upon them, it is now possible to delimit the phylum Actinobacteria (excluding Coriobacteriia) and most of its major groups in clear molecular terms. The species distribution patterns of these markers also provide important information regarding the interrelationships among different main orders of Actinobacteria. The identified molecular markers, in addition to enabling the development of a stable and reliable phylogenetic framework for this phylum, also provide novel and powerful means for the identification of different groups of Actinobacteria in diverse environments. Genetic and biochemical studies on these Actinobacteria-specific markers should lead to the discovery of novel biochemical and/or other properties that are unique to different groups of Actinobacteria.
Affiliation(s)
- Beile Gao
- Department of Biochemistry and Biomedical Science, McMaster University, Hamilton, Ontario, Canada
|
34
|
Sael L, Kihara D. Detecting local ligand-binding site similarity in nonhomologous proteins by surface patch comparison. Proteins 2012; 80:1177-95. [PMID: 22275074 DOI: 10.1002/prot.24018] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 09/06/2011] [Revised: 11/27/2011] [Accepted: 12/13/2011] [Indexed: 11/06/2022]
Abstract
Functional elucidation of proteins is one of the essential tasks in biology. Function of a protein, specifically, small ligand molecules that bind to a protein, can be predicted by finding similar local surface regions in binding sites of known proteins. Here, we developed an alignment free local surface comparison method for predicting a ligand molecule which binds to a query protein. The algorithm, named Patch-Surfer, represents a binding pocket as a combination of segmented surface patches, each of which is characterized by its geometrical shape, the electrostatic potential, the hydrophobicity, and the concaveness. Representing a pocket by a set of patches is effective to absorb difference of global pocket shape while capturing local similarity of pockets. The shape and the physicochemical properties of surface patches are represented using the 3D Zernike descriptor, which is a series expansion of mathematical 3D function. Two pockets are compared using a modified weighted bipartite matching algorithm, which matches similar patches from the two pockets. Patch-Surfer was benchmarked on three datasets, which consist in total of 390 proteins that bind to one of 21 ligands. Patch-Surfer showed superior performance to existing methods including a global pocket comparison method, Pocket-Surfer, which we have previously introduced. Particularly, as intended, the accuracy showed large improvement for flexible ligand molecules, which bind to pockets in different conformations.
Affiliation(s)
- Lee Sael
- Department of Computer Science, Purdue University, West Lafayette, Indiana 47907, USA
|
35
|
Gorbacheva MA, Yarosh AG, Dorovatovskii PV, Rakitina TV, Boiko KM, Korzhenevskii DA, Lipkin AV, Popov VO, Shumilin IA. A novel approach to studying the structural and functional properties of proteins with unknown functions. Russian Journal of Bioorganic Chemistry 2012; 38:99-105. [DOI: 10.1134/s1068162012010098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
36
|
Genetic diversity of the human pathogen Vibrio vulnificus: a new phylogroup. Int J Food Microbiol 2011; 153:436-43. [PMID: 22227412 DOI: 10.1016/j.ijfoodmicro.2011.12.011] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 07/05/2011] [Revised: 12/01/2011] [Accepted: 12/07/2011] [Indexed: 11/21/2022]
Abstract
The biotype 3 group of the human pathogen Vibrio vulnificus emerged in Israel probably as a result of genome hybridization of two bacterial populations. We performed a genomic and phylogenetic study of V. vulnificus strains isolated from the environmental niche from which this group emerged - fish aquaculture in Israel. The genetic relationships and evolutionary aspects of 188 environmental and clinical isolates of the bacterium were studied by genomic typing. Genetic relations were determined based on variation at 12 variable number tandem repeat (VNTR, also termed SSR) loci. Analysis revealed a new cluster, in addition to the main groups of biotypes 1 & 2 and biotype 3. Similar grouping results were obtained with three different statistical approaches. Isolates forming this new cluster presented an unclear biochemical profile; nevertheless, they were not identified as biotype 1 or biotype 3. Further examination of representative strains by multilocus sequence typing (MLST) of 10 housekeeping genes and 5 conserved hypothetical genes supported the identification of this as yet undiscovered phylogroup (phenotypically diverse), termed clade A herein. This new clonal subgroup includes environmental as well as clinical isolates. The results highlight the fish aquaculture environment, and possibly man-made ecological niches as a whole, as a source for the emergence of new pathogenic strains.
|
37
|
Venter E, Smith RD, Payne SH. Proteogenomic analysis of bacteria and archaea: a 46 organism case study. PLoS One 2011; 6:e27587. [PMID: 22114679 PMCID: PMC3219674 DOI: 10.1371/journal.pone.0027587] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Received: 06/08/2011] [Accepted: 10/20/2011] [Indexed: 11/19/2022] Open
Abstract
Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.
Affiliation(s)
- Eli Venter
- Department of Informatics, J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Richard D. Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
| | - Samuel H. Payne
- Department of Informatics, J. Craig Venter Institute, Rockville, Maryland, United States of America
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
|
38
|
Abstract
COMBREX (computational bridges to experimentation) is a project to engage the biological community in providing better functional annotation of genomes. In essence, the project involves the generation by computational biologists of a database of predicted functions for genes in bacterial genomes. Those genes for which no functional assignments have been proven experimentally are then open for bids by biochemists to test the predicted functions. High-priority genes are those for which no previous functional assignment has been made as well as those where uncharacterized examples are present in many genomes. A pilot project is running that focuses on bacterial and archaeal genomes.
|
39
|
PigS and PigP regulate prodigiosin biosynthesis in Serratia via differential control of divergent operons, which include predicted transporters of sulfur-containing molecules. J Bacteriol 2010; 193:1076-85. [PMID: 21183667 DOI: 10.1128/jb.00352-10] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 11/20/2022] Open
Abstract
Serratia sp. strain ATCC 39006 produces the red-pigmented antibiotic prodigiosin. Regulation of prodigiosin biosynthesis involves a complex hierarchy, with PigP a master transcriptional regulator of multiple genes involved in prodigiosin production. The focus of this study was a member of the PigP regulon, pigS, which encodes an ArsR/SmtB family transcriptional repressor. Mutations in pigS reduced production of prodigiosin by decreasing the transcription of the biosynthetic operon. The pigS gene is the first in a four-gene operon, which also encodes three membrane proteins (pmpABC) of the COG2391 (DUF395; YedE/YeeE) and COG0730 (DUF81; TauE/SafE) families that we propose constitute transport components for sulfur-containing compounds. We provide the first experimental evidence confirming the membrane localization of a COG2391 protein, that of PmpB. Divergently transcribed from pigS-pmpABC is a bicistronic operon (blhA-orfY), which encodes a metallo-β-lactamase and a coenzyme A-disulfide reductase containing a rhodanese homology domain, both of which may participate in reactions with sulfur-containing compounds. The overproduction of the BlhA and OrfY enzymes and the PmpABC membrane proteins differentially affected pigmentation. We have dissected the contributions of these various proteins and determined their importance in the control of prodigiosin production. PigS-mediated control of prodigiosin occurred via binding directly to a short inverted repeat sequence in the intergenic region overlapping the predicted -10 regions of both pigS and blhA promoters and repressing transcription. PigP was required for the activation of these promoters, but only in the absence of PigS-mediated repression.
|
40
|
Brylinski M, Skolnick J. FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level. Proteins 2010; 79:735-51. [PMID: 21287609 DOI: 10.1002/prot.22913] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Received: 08/04/2010] [Revised: 09/27/2010] [Accepted: 10/07/2010] [Indexed: 12/13/2022]
Abstract
The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure-based approaches showing considerable promise. In this article, we present FINDSITE-metal, a new threading-based method designed specifically to detect metal-binding sites in modeled protein structures. Comprehensive benchmarks using different quality protein structures show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE-metal. Combining structure/evolutionary information with machine learning results in highly accurate metal-binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal-binding sites are detected with the best predicted binding site at rank 1 and within the top two ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium, and magnesium ions, the binding metal can be predicted with high, typically 70% to 90%, accuracy. FINDSITE-metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome-wide application of FINDSITE-metal that quantifies the metal-binding complement of the human proteome. FINDSITE-metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/.
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
|
41
|
|
42
|
Roberts RJ, Chang YC, Hu Z, Rachlin JN, Anton BP, Pokrzywa RM, Choi HP, Faller LL, Guleria J, Housman G, Klitgord N, Mazumdar V, McGettrick MG, Osmani L, Swaminathan R, Tao KR, Letovsky S, Vitkup D, Segrè D, Salzberg SL, Delisi C, Steffen M, Kasif S. COMBREX: a project to accelerate the functional annotation of prokaryotic genomes. Nucleic Acids Res 2010; 39:D11-4. [PMID: 21097892 PMCID: PMC3013729 DOI: 10.1093/nar/gkq1168] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 11/25/2022] Open
Abstract
COMBREX (http://combrex.bu.edu) is a project to increase the speed of the functional annotation of new bacterial and archaeal genomes. It consists of a database of functional predictions produced by computational biologists and a mechanism for experimental biochemists to bid for the validation of those predictions. Small grants are available to support successful bids.
|
43
|
Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop. Viruses 2010; 2:2258-2268. [PMID: 21994619 PMCID: PMC3185566 DOI: 10.3390/v2102258] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 08/26/2010] [Revised: 09/18/2010] [Accepted: 09/20/2010] [Indexed: 11/29/2022] Open
Abstract
Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.
|
44
|
Bateman A, Coggill P, Finn RD. DUFs: families in search of function. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1148-52. [PMID: 20944204 PMCID: PMC2954198 DOI: 10.1107/s1744309110001685] [Citation(s) in RCA: 172] [Impact Index Per Article: 12.3] [Received: 10/09/2009] [Accepted: 01/13/2010] [Indexed: 11/30/2022]
Abstract
Domains of unknown function (DUFs) are a large set of uncharacterized protein families that are found in the Pfam database. Here, the scale and growth of functionally uncharacterized families in biological databases are surveyed and the prospects for discovering their function are examined. In particular, the important role that structural genomics can play in identifying potential function is evaluated.
Affiliation(s)
- Alex Bateman
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, England.
|
45
|
Beard BC, Trobridge GD, Ironside C, McCune JS, Adair JE, Kiem HP. Efficient and stable MGMT-mediated selection of long-term repopulating stem cells in nonhuman primates. J Clin Invest 2010; 120:2345-54. [PMID: 20551514 PMCID: PMC2898586 DOI: 10.1172/jci40767] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.6] [Received: 08/07/2009] [Accepted: 04/21/2010] [Indexed: 12/23/2022] Open
Abstract
HSC transplantation using genetically modified autologous cells is a promising therapeutic strategy for various genetic diseases, cancer, and HIV. However, for many of these conditions, the current efficiency of gene transfer to HSCs is not sufficient for clinical use. The ability to increase the percentage of gene-modified cells following transplantation is critical to overcoming this obstacle. In vivo selection with mutant methylguanine methyltransferase (MGMT(P140K)) has been proposed to overcome low gene transfer efficiency to HSCs. Previous studies have shown efficient in vivo selection in mice and dogs but only transient selection in primates. Here, we report efficient and stable MGMT(P140K)-mediated multilineage selection in both macaque and baboon nonhuman primate models. Treatment consisting of both O6-benzylguanine (O6BG) and N,N'-bis(2-chloroethyl)-N-nitroso-urea (BCNU) stably increased the percentage of transgene-expressing cells from a range of initial levels of engrafted genetically modified cells, with the longest follow-up after drug treatment occurring over 2.2 years. Drug treatment was well tolerated, and selection occurred in myeloid, lymphoid, and erythroid cells as well as platelets. Retrovirus integration site analysis before and after drug treatments confirmed the presence of multiple clones. These nonhuman primate studies closely model a clinical setting and should have broad applications for HSC gene therapy targeting human diseases of malignant, genetic, and infectious nature, including HIV.
Affiliation(s)
- Brian C. Beard, Grant D. Trobridge, Christina Ironside, Jeannine S. McCune, Jennifer E. Adair, Hans-Peter Kiem
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Department of Medicine, Division of Hematology, and Department of Pharmacy, University of Washington, Seattle, Washington, USA
|
46
|
Warren AS, Archuleta J, Feng WC, Setubal JC. Missing genes in the annotation of prokaryotic genomes. BMC Bioinformatics 2010; 11:131. [PMID: 20230630 PMCID: PMC3098052 DOI: 10.1186/1471-2105-11-131] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 09/01/2009] [Accepted: 03/15/2010] [Indexed: 12/04/2022] Open
Abstract
Background: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However, there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore, the question arises as to whether current genome annotations are systematically missing small genes. Results: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. Conclusions: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
Affiliation(s)
- Andrew S Warren
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA.
|
47
|
Gupta RS, Mathews DW. Signature proteins for the major clades of Cyanobacteria. BMC Evol Biol 2010; 10:24. [PMID: 20100331 PMCID: PMC2823733 DOI: 10.1186/1471-2148-10-24] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 04/27/2009] [Accepted: 01/25/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The phylogeny and taxonomy of cyanobacteria are currently poorly understood due to a paucity of reliable markers for identification and circumscription of its major clades. RESULTS: A combination of phylogenomic and protein signature based approaches was used to characterize the major clades of cyanobacteria. Phylogenetic trees were constructed for 44 cyanobacteria based on 44 conserved proteins. In parallel, Blastp searches were carried out on each ORF in the genomes of Synechococcus WH8102, Synechocystis PCC6803, Nostoc PCC7120, Synechococcus JA-3-3Ab, Prochlorococcus MIT9215 and Prochlorococcus marinus subsp. marinus CCMP1375 to identify proteins that are specific for various main clades of cyanobacteria. These studies have identified 39 proteins that are specific for all (or most) cyanobacteria and large numbers of proteins for other cyanobacterial clades. The identified signature proteins include: (i) 14 proteins for a deep branching clade (Clade A) of Gloeobacter violaceus and two diazotrophic Synechococcus strains (JA-3-3Ab and JA2-3-B'a); (ii) 5 proteins that are present in all other cyanobacteria except those from Clade A; (iii) 60 proteins that are specific for a clade (Clade C) consisting of various marine unicellular cyanobacteria (viz. Synechococcus and Prochlorococcus); (iv) 14 and 19 signature proteins that are specific for the Clade C Synechococcus and Prochlorococcus strains, respectively; (v) 67 proteins that are specific for the Low B/A ecotype Prochlorococcus strains, containing lower ratio of chl b/a2 and adapted to growth at high light intensities; (vi) 65 and 8 proteins that are specific for the Nostocales and Chroococcales orders, respectively; and (vii) 22 and 9 proteins that are uniquely shared by various Nostocales and Oscillatoriales orders, or by these two orders and the Chroococcales, respectively. We also describe 3 conserved indels in flavoprotein, heme oxygenase and protochlorophyllide oxidoreductase proteins that are specific for either Clade C cyanobacteria or for various subclades of Prochlorococcus. Many other conserved indels for cyanobacterial clades have been described recently. CONCLUSIONS: These signature proteins and indels provide novel means for circumscription of various cyanobacterial clades in clear molecular terms. Their functional studies should lead to discovery of novel properties that are unique to these groups of cyanobacteria.
Affiliation(s)
- Radhey S Gupta
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada.
|
48
|
|
49
|
'Unknown' proteins and 'orphan' enzymes: the missing half of the engineering parts list--and how to find it. Biochem J 2009; 425:1-11. [PMID: 20001958 DOI: 10.1042/bj20091328] [Citation(s) in RCA: 135] [Impact Index Per Article: 9.0] [Indexed: 11/17/2022]
Abstract
Like other forms of engineering, metabolic engineering requires knowledge of the components (the 'parts list') of the target system. Lack of such knowledge impairs both rational engineering design and diagnosis of the reasons for failures; it also poses problems for the related field of metabolic reconstruction, which uses a cell's parts list to recreate its metabolic activities in silico. Despite spectacular progress in genome sequencing, the parts lists for most organisms that we seek to manipulate remain highly incomplete, due to the dual problem of 'unknown' proteins and 'orphan' enzymes. The former are all the proteins deduced from genome sequence that have no known function, and the latter are all the enzymes described in the literature (and often catalogued in the EC database) for which no corresponding gene has been reported. Unknown proteins constitute up to about half of the proteins in prokaryotic genomes, and much more than this in higher plants and animals. Orphan enzymes make up more than a third of the EC database. Attacking the 'missing parts list' problem is accordingly one of the great challenges for post-genomic biology, and a tremendous opportunity to discover new facets of life's machinery. Success will require a co-ordinated community-wide attack, sustained over years. In this attack, comparative genomics is probably the single most effective strategy, for it can reliably predict functions for unknown proteins and genes for orphan enzymes. Furthermore, it is cost-efficient and increasingly straightforward to deploy owing to a proliferation of databases and associated tools.
|
50
|
Louie B, Higdon R, Kolker E. A statistical model of protein sequence similarity and function similarity reveals overly-specific function predictions. PLoS One 2009; 4:e7546. [PMID: 19844580 PMCID: PMC2760442 DOI: 10.1371/journal.pone.0007546] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 09/01/2009] [Accepted: 09/13/2009] [Indexed: 12/02/2022] Open
Abstract
Background: Predicting protein function from primary sequence is an important open problem in modern biology. Not only are there many thousands of proteins of unknown function, but current approaches for predicting function must also be improved upon. One problem in particular is overly-specific function predictions, which we address here with a new statistical model of the relationship between protein sequence similarity and protein function similarity. Methodology: Our statistical model is based on sets of proteins with experimentally validated functions and numeric measures of function specificity and function similarity derived from the Gene Ontology. The model predicts the similarity of function between two proteins given their amino acid sequence similarity measured by statistics from the BLAST sequence alignment algorithm. A novel aspect of our model is that it predicts the degree of function similarity shared between two proteins over a continuous range of sequence similarity, facilitating prediction of function with an appropriate level of specificity. Significance: Our model shows nearly exact function similarity for proteins with high sequence similarity (bit score >244.7, e-value >1e−62, non-redundant NCBI protein database (NRDB)) and only small likelihood of specific function match for proteins with low sequence similarity (bit score <54.6, e-value <1e−05, NRDB). For sequence similarity ranges in between, our annotation model shows an increasing relationship between function similarity and sequence similarity, but with considerable variability. We applied the model to a large set of proteins of unknown function, and predicted functions for thousands of these proteins ranging from general to very specific. We also applied the model to a data set of proteins with previously assigned, specific functions that were electronically based. We show that, on average, these prior function predictions are more specific (quite possibly overly-specific) compared to predictions from our model that is based on proteins with experimentally determined function.
Affiliation(s)
- Brenton Louie
- Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington, United States of America
- Predictive Analytics, Seattle Children's Hospital, University of Washington School of Medicine, Seattle, Washington, United States of America
- Roger Higdon
- Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington, United States of America
- Predictive Analytics, Seattle Children's Hospital, University of Washington School of Medicine, Seattle, Washington, United States of America
- Eugene Kolker
- Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington, United States of America
- Predictive Analytics, Seattle Children's Hospital, University of Washington School of Medicine, Seattle, Washington, United States of America
- Biomedical and Health Informatics Division, Department of Medical Education and Biomedical Informatics, University of Washington School of Medicine, Seattle, Washington, United States of America
|