Reference Citation Analysis

For: Roberts RJ. Identifying protein function--a call for community action. PLoS Biol 2004;2:E42. [PMID: 15024411 PMCID: PMC368155 DOI: 10.1371/journal.pbio.0020042] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.4] [Indexed: 11/24/2022]
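Every entry on this page pairs a "Citation(s) in RCA" count with an "Impact Index Per Article" value. The page does not define the latter, but every pair listed here is consistent with the citation count divided by the number of years from publication to 2024, rounded half-up to one decimal place. For example, 107 citations over 2004 to 2024 gives 5.35, listed as 5.4. The minimal Python sketch below reproduces that inference. The 2024 reference year, the rounding rule, the handling of same-year articles, and the function name `impact_index_per_article` are all assumptions, not RCA's documented formula.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: reference year inferred from the latest "Indexed" date on this
# page (04/05/2024); RCA does not state it here.
REFERENCE_YEAR = 2024


def impact_index_per_article(citations: int, pub_year: int,
                             reference_year: int = REFERENCE_YEAR) -> Decimal:
    """Hypothetical reconstruction: citations per year since publication,
    rounded half-up to one decimal place."""
    # Guess: articles from the reference year are divided by 1 so the
    # function never divides by zero.
    years = max(reference_year - pub_year, 1)
    per_year = Decimal(citations) / Decimal(years)
    return per_year.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # (label, RCA citations, publication year) taken from entries on this page
    for label, cites, year in [
        ("Roberts 2004", 107, 2004),   # listed as 5.4
        ("Neuhaus 2016", 26, 2016),    # listed as 3.3
        ("Frederick 2020", 7, 2020),   # listed as 1.8
        ("Herbst 2024", 0, 2024),      # listed as 0
    ]:
        print(label, impact_index_per_article(cites, year))
```

Half-up rounding matters in this reconstruction: Python's built-in `round()` rounds exact ties to even, so `round(26 / 8, 1)` gives 3.2, whereas the Neuhaus et al. entry lists 3.3. Exact `Decimal` arithmetic also avoids binary floating-point surprises such as 107 / 20 not being exactly 5.35.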



Cited by Other Article(s)

Herbst K, Wang T, Forchielli EJ, Thommes M, Paschalidis IC, Segrè D. Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations. Commun Biol 2024;7:407. [PMID: 38570615 PMCID: PMC10991586 DOI: 10.1038/s42003-024-06093-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 03/22/2024] [Indexed: 04/05/2024]

Cohen SE, Hashmi SM, Jones AAD, Lykourinou V, Ondrechen MJ, Sridhar S, van de Ven AL, Waters LS, Beuning PJ. Adapting Undergraduate Research to Remote Work to Increase Engagement. Biophysicist (Rockville, Md.) 2021;2:28-32. [PMID: 36909739 PMCID: PMC10003819 DOI: 10.35459/tbp.2021.000199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/03/2022]

Reynolds KA, Rosa-Molinar E, Ward RE, Zhang H, Urbanowicz BR, Settles AM. Accelerating biological insight for understudied genes. Integr Comp Biol 2021;61:2233-2243. [PMID: 33970251 DOI: 10.1093/icb/icab029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]

Frederick J, Hennessy F, Horn U, de la Torre Cortés P, van den Broek M, Strych U, Willson R, Hefer CA, Daran JMG, Sewell T, Otten LG, Brady D. The complete genome sequence of the nitrile biocatalyst Rhodococcus rhodochrous ATCC BAA-870. BMC Genomics 2020;21:3. [PMID: 31898479 PMCID: PMC6941271 DOI: 10.1186/s12864-019-6405-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/16/2019] [Accepted: 12/16/2019] [Indexed: 12/21/2022]

Abstract

BACKGROUND

Rhodococci are industrially important soil-dwelling Gram-positive bacteria that are well known for both nitrile hydrolysis and oxidative metabolism of aromatics. Rhodococcus rhodochrous ATCC BAA-870 is capable of metabolising a wide range of aliphatic and aromatic nitriles and amides. The genome of the organism was sequenced and analysed in order to better understand this whole cell biocatalyst.

RESULTS

The genome of R. rhodochrous ATCC BAA-870 is the first Rhodococcus genome fully sequenced using Nanopore sequencing. The circular genome contains 5.9 megabase pairs (Mbp) and includes a 0.53 Mbp linear plasmid; together they encode 7548 predicted protein sequences according to BASys annotation, and 5535 according to RAST annotation. The genome contains numerous oxidoreductases, 15 identified antibiotic and secondary metabolite gene clusters, several terpene and nonribosomal peptide synthetase clusters, and six putative clusters of unknown type. The 0.53 Mbp plasmid encodes 677 predicted genes and contains the nitrile-converting gene cluster, including a nitrilase, a low-molecular-weight nitrile hydratase, and an enantioselective amidase. Although strain BAA-870 has fewer biotechnologically relevant enzymes than rhodococci with larger genomes, such as the well-known Rhodococcus jostii RHA1, its abundance of transporters combined with its many enzymes might make it more suitable for industrially relevant processes than other rhodococci.

CONCLUSIONS

The sequence and comprehensive description of the R. rhodochrous ATCC BAA-870 genome will facilitate additional exploitation of rhodococci for biotechnological applications and enable further characterisation of this model organism. The genome encodes a wide range of enzymes, many with unknown substrate specificities, that support potential applications in biotechnology, including nitrilases, nitrile hydratase, monooxygenases, cytochrome P450s, reductases, proteases, lipases, and transaminases.


Affiliation(s)

Joni Frederick Protein Technologies, CSIR Biosciences, Meiring Naude Road, Brummeria, Pretoria, South Africa; Electron Microscope Unit, University of Cape Town, Rondebosch, 7701 South Africa; Present Address: LadHyx, UMR CNRS 7646, École Polytechnique, 91128 Palaiseau, France
Fritha Hennessy Protein Technologies, CSIR Biosciences, Meiring Naude Road, Brummeria, Pretoria, South Africa
Uli Horn Meraka, CSIR, Meiring Naude Road, Brummeria, 0091 South Africa
Pilar de la Torre Cortés Industrial Microbiology, Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
Marcel van den Broek Industrial Microbiology, Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
Ulrich Strych Biology and Biochemistry, University of Houston, 4800 Calhoun Road, Houston, TX 77204 USA; Present Address: Department of Pediatrics, Section of Tropical Medicine, Baylor College of Medicine, 1102 Bates Avenue, Houston, TX 77030 USA
Richard Willson Biology and Biochemistry, University of Houston, 4800 Calhoun Road, Houston, TX 77204 USA; Chemical and Biomolecular Engineering, University of Houston, 4800 Calhoun Road, Houston, TX 77204 USA
Charles A. Hefer Bioinformatics and Computational Biology Unit, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, 0002 South Africa; Present Address: AgResearch Limited, Lincoln Research Centre, Private Bag 4749, Christchurch, 8140 New Zealand
Jean-Marc G. Daran Industrial Microbiology, Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
Trevor Sewell Electron Microscope Unit, University of Cape Town, Rondebosch, 7701 South Africa
Linda G. Otten Biocatalysis, Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
Dean Brady Protein Technologies, CSIR Biosciences, Meiring Naude Road, Brummeria, Pretoria, South Africa; Molecular Sciences Institute, School of Chemistry, University of the Witwatersrand, PO, Wits, 2050 South Africa


Romero S, Nastasa A, Chapman A, Kwong WK, Foster LJ. The honey bee gut microbiota: strategies for study and characterization. Insect Mol Biol 2019;28:455-472. [PMID: 30652367 DOI: 10.1111/imb.12567] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 06/09/2023]

Mohamed SB, Hassan MM, Munir KA, Abdalla NI, Adlan TA, Babiker AK. Re-annotation for hypothetical protein CA803_03125 of Methicillin-Resistant Staphylococcus aureus strain SO-1977 isolated from Sudan. Bioinformation 2019;15:160-164. [PMID: 31354190 PMCID: PMC6637404 DOI: 10.6026/97320630015160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2018] [Revised: 12/26/2018] [Accepted: 12/27/2018] [Indexed: 11/23/2022]

HashGO: hashing gene ontology for protein function prediction. Comput Biol Chem 2017;71:264-273. [DOI: 10.1016/j.compbiolchem.2017.09.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/07/2017] [Accepted: 09/25/2017] [Indexed: 10/18/2022]

Barbour AG, Adeolu M, Gupta RS. Division of the genus Borrelia into two genera (corresponding to Lyme disease and relapsing fever groups) reflects their genetic and phenotypic distinctiveness and will lead to a better understanding of these two groups of microbes (Margos et al. (2016) There is inadequate evidence to support the division of the genus Borrelia. Int. J. Syst. Evol. Microbiol. doi: 10.1099/ijsem.0.001717). Int J Syst Evol Microbiol 2017;67:2058-2067. [PMID: 28141502 DOI: 10.1099/ijsem.0.001815] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 12/13/2022]

Zou X, Wang G, Yu G. Protein Function Prediction Using Deep Restricted Boltzmann Machines. Biomed Res Int 2017;2017:1729301. [PMID: 28744460 PMCID: PMC5506480 DOI: 10.1155/2017/1729301] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/30/2017] [Accepted: 05/30/2017] [Indexed: 11/17/2022]

Yu G, Luo W, Fu G, Wang J. Interspecies gene function prediction using semantic similarity. BMC Syst Biol 2016;10:121. [PMID: 28155711 PMCID: PMC5260010 DOI: 10.1186/s12918-016-0361-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 01/09/2023]

Abstract

BACKGROUND

Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary (terms) to describe the molecular functions, biological roles and cellular locations of gene products in a hierarchical ontology. GO also provides GO annotations that associate genes with GO terms. GO consortium members independently and collaboratively annotate gene products with terms, mainly for the model organisms (species) they are interested in. Because of experimental ethics, biologists' research interests and resource limitations, homologous genes from different species are currently annotated with different terms. These differences are attributable more to incomplete annotation of the genes than to functional differences between them.

RESULTS

Semantic similarity between genes is derived from the GO hierarchy and the genes' annotations. It correlates positively with similarity derived from various types of biological data and has been applied to predict gene function. In this paper, we investigate whether it is possible to replenish the annotations of incompletely annotated genes by using semantic similarity between genes from two homologous species. For this investigation, we use three representative semantic similarity metrics to compute similarity between genes from two species. Next, we determine the k nearest neighbor genes from the two species based on the chosen metric and use the terms annotated to the k neighbors of a gene to replenish that gene's annotations. We perform experiments on archived (Jan-2014 to Jan-2016) GO annotations of four species (Human, Mouse, Danio rerio and Arabidopsis thaliana) to assess the contribution of semantic similarity between genes from different species. The experimental results demonstrate that: (1) semantic similarity between genes from homologous species contributes much more to improved accuracy (by 53.22%) than genes from a single species alone or genes from two species with low homology; (2) GO annotations of genes from homologous species are complementary to each other.

CONCLUSIONS

Our study shows that semantic-similarity-based interspecies gene function annotation using homologous species is more effective than traditional intraspecies approaches. This work may encourage further research on semantic-similarity-based function prediction across species.
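The neighbor-based replenishment step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes gene-to-gene semantic similarity scores have already been computed by some metric, and it uses a simple vote threshold, `min_votes`, to decide which neighbor terms to transfer. The function name `replenish_annotations`, the threshold, and all gene IDs and term labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical inputs: gene IDs, scores and term labels below are placeholders.
Similarity = Dict[str, Dict[str, float]]   # gene -> {other-species gene: score}
Annotations = Dict[str, Set[str]]          # gene -> set of GO term IDs


def replenish_annotations(gene: str, similarity: Similarity,
                          annotations: Annotations, k: int = 3,
                          min_votes: int = 2) -> Set[str]:
    """Propose new GO terms for `gene` from its k most similar genes in the
    other species, keeping terms annotated to at least `min_votes` of them."""
    # Rank candidate genes from the other species by semantic similarity.
    ranked = sorted(similarity.get(gene, {}).items(),
                    key=lambda item: item[1], reverse=True)
    neighbors = [other for other, _score in ranked[:k]]

    # Count how many of the k neighbors carry each term.
    votes = Counter()
    for other in neighbors:
        votes.update(annotations.get(other, set()))

    # Only propose terms the gene does not already have.
    already_known = annotations.get(gene, set())
    return {term for term, count in votes.items()
            if count >= min_votes and term not in already_known}


# Toy example: one "human" gene and four "mouse" genes (all hypothetical).
similarity = {"hs_g1": {"mm_g1": 0.9, "mm_g2": 0.8, "mm_g3": 0.4, "mm_g4": 0.1}}
annotations = {
    "hs_g1": {"T1"},
    "mm_g1": {"T1", "T2", "T3"},
    "mm_g2": {"T2", "T3"},
    "mm_g3": {"T4"},
    "mm_g4": {"T5"},
}
print(sorted(replenish_annotations("hs_g1", similarity, annotations)))
```

With the defaults (k = 3, `min_votes` = 2), only terms shared by at least two of the three nearest neighbors are proposed. Lowering `min_votes` to 1 transfers every term seen among the neighbors, which would likely raise recall at the cost of precision.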


Gupta RS. Impact of genomics on the understanding of microbial evolution and classification: the importance of Darwin's views on classification. FEMS Microbiol Rev 2016;40:520-53. [PMID: 27279642 DOI: 10.1093/femsre/fuw011] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Accepted: 05/14/2016] [Indexed: 12/24/2022]

Neuhaus K, Landstorfer R, Fellner L, Simon S, Schafferhans A, Goldberg T, Marx H, Ozoline ON, Rost B, Kuster B, Keim DA, Scherer S. Translatomics combined with transcriptomics and proteomics reveals novel functional, recently evolved orphan genes in Escherichia coli O157:H7 (EHEC). BMC Genomics 2016;17:133. [PMID: 26911138 PMCID: PMC4765031 DOI: 10.1186/s12864-016-2456-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 05/27/2015] [Accepted: 02/09/2016] [Indexed: 12/30/2022]

Abstract

Background

Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome).

Results

Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of them showed signals in the ribosomal footprinting assay, indicating mRNA translation, and seven were verified by mass spectrometry. Fifty-seven genes are annotated in other Enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between the EHEC-encoded proteins and the same proteins randomly shuffled 100 times. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes are differentially transcribed across eleven diverse growth conditions, suggesting environmental regulation. Three genes had been found in previous studies to confer a phenotype, e.g., decreased cattle colonization.

Conclusions

These findings demonstrate that ribosomal footprinting can be used to detect novel protein-coding genes, adding to the growing body of evidence that hypothetical genes are not annotation artifacts and opening a further route to studying their function. All 72 genes are taxonomically restricted and therefore appear to have evolved de novo relatively recently.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2456-1) contains supplementary material, which is available to authorized users.


Affiliation(s)

Klaus Neuhaus Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
Richard Landstorfer Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
Lea Fellner Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
Svenja Simon Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Konstanz, Germany.
Andrea Schafferhans Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany.
Tatyana Goldberg Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany.
Harald Marx Chair of Proteomics and Bioanalytics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354, Freising, Germany.
Olga N Ozoline Institute of Cell Biophysics, Russian Academy of Sciences, Moscow Region, 142290, Pushchino, Russia.
Burkhard Rost Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany.
Bernhard Kuster Chair of Proteomics and Bioanalytics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354, Freising, Germany; Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technische Universität München, Gregor-Mendel-Str. 4, 85354, Freising, Germany.
Daniel A Keim Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Konstanz, Germany.
Siegfried Scherer Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.


Islam MS, Shahik SM, Sohel M, Patwary NIA, Hasan MA. In Silico Structural and Functional Annotation of Hypothetical Proteins of Vibrio cholerae O139. Genomics Inform 2015;13:53-9. [PMID: 26175663 PMCID: PMC4500799 DOI: 10.5808/gi.2015.13.2.53] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 02/14/2015] [Revised: 05/26/2015] [Accepted: 05/27/2015] [Indexed: 11/20/2022]

Oany AR, Ahmad SAI, Kibria KK, Hossain MU, Jyoti TP. A Hypothetical Protein of Alteromonas macleodii AltDE1 (amad1_06475) Predicted to be a Cold-Shock Protein with RNA Chaperone Activity. Gene Regul Syst Bio 2015;8:141-7. [PMID: 25574135 PMCID: PMC4271719 DOI: 10.4137/grsb.s20802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/15/2014] [Revised: 11/16/2014] [Accepted: 11/24/2014] [Indexed: 11/05/2022]

Reshma SV, Sathyanarayanan N, Nagendra HG. Characterization of hypothetical protein VNG0128C from Halobacterium NRC-1 reveals GALE like activity and its involvement in Leloir pathway of galactose metabolism. J Biomol Struct Dyn 2014;33:1743-55. [PMID: 25397923 DOI: 10.1080/07391102.2014.969313] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]

Martín-Galiano AJ, Yuste J, Cercenado MI, de la Campa AG. Inspecting the potential physiological and biomedical value of 44 conserved uncharacterised proteins of Streptococcus pneumoniae. BMC Genomics 2014;15:652. [PMID: 25096389 PMCID: PMC4143570 DOI: 10.1186/1471-2164-15-652] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/23/2013] [Accepted: 07/21/2014] [Indexed: 12/15/2022]

Sorokina M, Stam M, Médigue C, Lespinet O, Vallenet D. Profiling the orphan enzymes. Biol Direct 2014;9:10. [PMID: 24906382 PMCID: PMC4084501 DOI: 10.1186/1745-6150-9-10] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 03/27/2014] [Accepted: 05/29/2014] [Indexed: 11/10/2022]

Abstract

The emergence of Next Generation Sequencing generates an enormous amount of sequence data and great potential for new enzyme discovery. Despite this huge amount of data and the profusion of bioinformatic methods for function prediction, many known enzyme activities still lack an associated protein sequence. These activities are called "orphan enzymes". The present review updates previous surveys on orphan enzymes by mining the current content of public databases. Although the percentage of orphan enzyme activities has decreased from 38% to 22% in ten years, there are still more than 1,000 orphans among the 5,000 entries of the Enzyme Commission (EC) classification. Taking into account all the reactions present in metabolic databases, this proportion rises dramatically to nearly 50%, and many of these orphans are not associated with a known pathway. We extended our survey to "local orphan enzymes": activities that have no representative sequence in a given clade but have at least one in organisms belonging to other clades. We observe a strong bias in Archaea and find that, in general, more than 30% of EC activities have incomplete sequence information in at least one superkingdom. To estimate whether candidate proteins for local orphans could be retrieved by homology search, we applied a simple strategy based on the PRIAM software and found that candidates may be proposed for a substantial fraction of local orphan enzymes. Finally, studying the relation between protein domains and catalyzed activities shows that newly discovered enzymes are mostly associated with already known enzyme domains. Thus, exploring the promiscuity and multifunctionality of known enzyme families may solve part of the orphan enzyme issue. We conclude this review with a presentation of recent initiatives for finding proteins for orphan enzymes and for extending the enzyme world through the discovery of new activities.


Tan SH, Normi YM, Leow ATC, Salleh AB, Karjiban RA, Murad AMA, Mahadi NM, Rahman MBA. A Sco protein among the hypothetical proteins of Bacillus lehensis G1: Its 3D macromolecular structure and association with Cytochrome C Oxidase. BMC Struct Biol 2014;14:11. [PMID: 24641837 PMCID: PMC3994876 DOI: 10.1186/1472-6807-14-11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/13/2013] [Accepted: 03/14/2014] [Indexed: 11/10/2022]

Abstract

BACKGROUND

At least a quarter of any complete genome encodes hypothetical proteins (HPs), which are largely dissimilar to other known, well-characterized proteins. Predicting and solving their structures and functions is imperative for understanding any given organism as a complete biological system. The present study describes an initial effort to classify and cluster the 1202 HPs of the alkaliphile Bacillus lehensis G1, to serve as a platform for selecting specific HPs for more detailed study.

RESULTS

All HPs of B. lehensis G1 were grouped according to their predicted functions, based on the functional domains present in their sequences. From the metal-binding group of HPs, an HP termed Bleg1_2507 was found to contain a thioredoxin (Trx) domain and highly conserved metal-binding ligands, Cys69, Cys73 and His159, similar to all prokaryotic and eukaryotic Sco proteins. The modelled 3D structure of Bleg1_2507 shares the βαβαββ core structure of Trx-like proteins, as well as three flanking β-sheets, a 3₁₀-helix at the N-terminus and a hairpin structure unique to Sco proteins. Docking simulations provided a view of Bleg1_2507 in association with its putative cytochrome c oxidase subunit II (COXII) redox partner, Bleg1_2337, in which the latter appears to hold its partner in an embrace, facilitated by hydrophobic and ionic interactions between the proteins. Although Bleg1_2507 shares relatively low sequence identity (47%) with BsSco, its predicted metal-binding residues (Cys69, Cys73 and His159) are located at flexible active loops, as in other Sco proteins across biological taxa. This highlights the structural conservation of Sco proteins despite their various functions in prokaryotes and eukaryotes.

CONCLUSIONS

We propose that HP Bleg1_2507 is a Sco protein that is able to interact with its redox partner COXII and may therefore possess metallochaperone and redox functions similar to those of other documented bacterial Sco proteins. It is hoped that this effort will help to spur the search for other physiologically relevant proteins among the so-called "orphan" proteins of any given organism.


Carnitine metabolism to trimethylamine by an unusual Rieske-type oxygenase from human microbiota. Proc Natl Acad Sci U S A 2014;111:4268-73. [PMID: 24591617 DOI: 10.1073/pnas.1316569111] [Citation(s) in RCA: 230] [Impact Index Per Article: 23.0] [Indexed: 12/17/2022]

Hicks MA, Prather KLJ. Bioprospecting in the genomic age. Adv Appl Microbiol 2014;87:111-46. [PMID: 24581390 DOI: 10.1016/b978-0-12-800261-2.00003-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022]

Earnshaw WC. Deducing protein function by forensic integrative cell biology. PLoS Biol 2013;11:e1001742. [PMID: 24358025 PMCID: PMC3866084 DOI: 10.1371/journal.pbio.1001742] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]

Hwang WC, Bakolitsa C, Punta M, Coggill PC, Bateman A, Axelrod HL, Rawlings ND, Sedova M, Peterson SN, Eberhardt RY, Aravind L, Pascual J, Godzik A. LUD, a new protein domain associated with lactate utilization. BMC Bioinformatics 2013;14:341. [PMID: 24274019 PMCID: PMC3924224 DOI: 10.1186/1471-2105-14-341] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/03/2013] [Accepted: 11/19/2013] [Indexed: 11/24/2022]

Revealing the hidden functional diversity of an enzyme family. Nat Chem Biol 2013;10:42-9. [DOI: 10.1038/nchembio.1387] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 07/12/2013] [Accepted: 10/02/2013] [Indexed: 11/08/2022]

Benso A, Di Carlo S, Ur Rehman H, Politano G, Savino A, Suravajhala P. A combined approach for genome wide protein function annotation/prediction. Proteome Sci 2013;11:S1. [PMID: 24564915 PMCID: PMC3909112 DOI: 10.1186/1477-5956-11-s1-s1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/24/2022]

Abstract

Background

Today, large-scale genome sequencing technologies are uncovering an increasing number of new genes and proteins, which remain uncharacterized. Experimental procedures for determining protein function are low-throughput by nature and thus cannot keep up with the rate at which new proteins are discovered. At the same time, proteins are key players in almost all biological processes, so knowing their functions precisely is essential for a better understanding of the underlying biological mechanisms. The challenge of annotating uncharacterized proteins in functional genomics, and in biology in general, motivates the use of well-orchestrated computational techniques to predict their functions accurately.

Methods

We propose a computational flow for functional annotation that assigns the most probable functions to a protein by aggregating heterogeneous information. The information considered includes protein motifs, protein sequence similarity, and protein homology data gathered from interacting proteins, combined with data from highly similar non-interacting proteins (hereinafter called Similactors). Moreover, to increase the predictive power of our model, we also compute and integrate term-specific relationships among functional terms based on Gene Ontology (GO).

Results

We tested our method on Saccharomyces cerevisiae and Homo sapiens proteins. Aggregating different structural and functional evidence with GO relationships outperforms the other methods reported in the literature in terms of prediction precision and accuracy. Precision and accuracy are 100% for more than half of the input set for both species; overall, we obtained 85.38% precision and 81.95% accuracy for Homo sapiens and 79.73% precision and 80.06% accuracy for Saccharomyces cerevisiae proteins.


Tchigvintsev A, Tchigvintsev D, Flick R, Popovic A, Dong A, Xu X, Brown G, Lu W, Wu H, Cui H, Dombrowski L, Joo JC, Beloglazova N, Min J, Savchenko A, Caudy AA, Rabinowitz JD, Murzin AG, Yakunin AF. Biochemical and structural studies of conserved Maf proteins revealed nucleotide pyrophosphatases with a preference for modified nucleotides. Chem Biol 2013;20:1386-98. [PMID: 24210219 PMCID: PMC3899018 DOI: 10.1016/j.chembiol.2013.09.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/24/2013] [Revised: 09/06/2013] [Accepted: 09/13/2013] [Indexed: 11/17/2022]

Anton BP, Chang YC, Brown P, Choi HP, Faller LL, Guleria J, Hu Z, Klitgord N, Levy-Moonshine A, Maksad A, Mazumdar V, McGettrick M, Osmani L, Pokrzywa R, Rachlin J, Swaminathan R, Allen B, Housman G, Monahan C, Rochussen K, Tao K, Bhagwat AS, Brenner SE, Columbus L, de Crécy-Lagard V, Ferguson D, Fomenkov A, Gadda G, Morgan RD, Osterman AL, Rodionov DA, Rodionova IA, Rudd KE, Söll D, Spain J, Xu SY, Bateman A, Blumenthal RM, Bollinger JM, Chang WS, Ferrer M, Friedberg I, Galperin MY, Gobeill J, Haft D, Hunt J, Karp P, Klimke W, Krebs C, Macelis D, Madupu R, Martin MJ, Miller JH, O'Donovan C, Palsson B, Ruch P, Setterdahl A, Sutton G, Tate J, Yakunin A, Tchigvintsev D, Plata G, Hu J, Greiner R, Horn D, Sjölander K, Salzberg SL, Vitkup D, Letovsky S, Segrè D, DeLisi C, Roberts RJ, Steffen M, Kasif S. The COMBREX project: design, methodology, and initial results. PLoS Biol 2013;11:e1001638. [PMID: 24013487 PMCID: PMC3754883 DOI: 10.1371/journal.pbio.1001638] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022]

Affiliation(s)

Brian P. Anton New England Biolabs, Ipswich, Massachusetts, United States of America
Yi-Chien Chang Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
Peter Brown Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Han-Pil Choi Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Lina L. Faller Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
Jyotsna Guleria Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Zhenjun Hu Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
Niels Klitgord Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
Ami Levy-Moonshine Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Almaz Maksad Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Varun Mazumdar Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
Mark McGettrick Diatom Software LLC, Holliston, Massachusetts, United States of America
Lais Osmani Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Revonda Pokrzywa Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
John Rachlin Diatom Software LLC, Holliston, Massachusetts, United States of America
Rajeswari Swaminathan Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Benjamin Allen Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts, United States of America; Department of Mathematics, Emmanuel College, Boston, Massachusetts, United States of America
Genevieve Housman Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Caitlin Monahan Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Krista Rochussen Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Kevin Tao Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Ashok S. Bhagwat Department of Chemistry, Wayne State University, Detroit, Michigan, United States of America
Steven E. Brenner Department of Plant and Microbial Biology, University of California, Berkeley, California, United States of America
Linda Columbus Department of Chemistry, University of Virginia, Charlottesville, Virginia, United States of America
Valérie de Crécy-Lagard Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida, United States of America
Donald Ferguson Department of Microbiology, Miami University, Oxford, Ohio, United States of America
Alexey Fomenkov New England Biolabs, Ipswich, Massachusetts, United States of America
Giovanni Gadda Department of Chemistry, Georgia State University, Atlanta, Georgia, United States of America
Richard D. Morgan New England Biolabs, Ipswich, Massachusetts, United States of America
Andrei L. Osterman Bioinformatics and Systems Biology, Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
Dmitry A. Rodionov Bioinformatics and Systems Biology, Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
Irina A. Rodionova Bioinformatics and Systems Biology, Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
Kenneth E. Rudd Department of Biochemistry and Molecular Biology, University of Miami, Miami, Florida, United States of America
Dieter Söll Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
James Spain School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
Shuang-yong Xu New England Biolabs, Ipswich, Massachusetts, United States of America
Alex Bateman European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
Robert M. Blumenthal Department of Medical Microbiology and Immunology, and Program in Bioinformatics, University of Toledo, Toledo, Ohio, United States of America
J. Martin Bollinger Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
Woo-Suk Chang Department of Biology, University of Texas-Arlington, Arlington, Texas, United States of America
Manuel Ferrer Spanish National Research Council (CSIC), Institute of Catalysis, Madrid, Spain
Iddo Friedberg Department of Microbiology, Miami University, Oxford, Ohio, United States of America
Michael Y. Galperin National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
Julien Gobeill Department of Library and Information Sciences, University of Applied Sciences Western Switzerland, Geneva, Switzerland; Bibliomics and Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
Daniel Haft J. Craig Venter Institute, Rockville, Maryland, United States of America
John Hunt Biological Sciences, Columbia University, New York, New York, United States of America
Peter Karp Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, California, United States of America
William Klimke National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
Carsten Krebs Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
Dana Macelis New England Biolabs, Ipswich, Massachusetts, United States of America
Ramana Madupu J. Craig Venter Institute, Rockville, Maryland, United States of America
Maria J. Martin European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
Jeffrey H. Miller Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
Claire O'Donovan European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
Bernhard Palsson Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
Patrick Ruch Department of Library and Information Sciences, University of Applied Sciences Western Switzerland, Geneva, Switzerland; Bibliomics and Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
Aaron Setterdahl Department of Chemistry, Indiana University Southeast, New Albany, Indiana, United States of America
Granger Sutton J. Craig Venter Institute, Rockville, Maryland, United States of America
John Tate Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
Alexander Yakunin Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
Dmitri Tchigvintsev Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
Germán Plata Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America; Integrated Program in Cellular, Molecular, Structural, and Genetic Studies, Columbia University, New York, New York, United States of America
Jie Hu Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
Russell Greiner Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
David Horn School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
Kimmen Sjölander Berkeley Phylogenomics Group, University of California, Berkeley, California, United States of America
Steven L. Salzberg Departments of Medicine and Biostatistics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
Dennis Vitkup Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
Stanley Letovsky Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
Daniel Segrè Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
Charles DeLisi Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
Richard J. Roberts New England Biolabs, Ipswich, Massachusetts, United States of America; Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
Martin Steffen Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
Simon Kasif Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America; Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America * E-mail: (BPA); (SK)

McCarthy FM, Lyons E. From data to function: functional modeling of poultry genomics data. Poult Sci 2013;92:2519-29. [PMID: 23960137 DOI: 10.3382/ps.2012-02808] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]

Abstract

One of the challenges of functional genomics is to create a better understanding of the biological system being studied so that the data produced can be leveraged to provide gains for agriculture, human health, and the environment. Functional modeling enables researchers to make sense of these data by reframing a long list of genes or gene products (mRNA, ncRNA, and proteins) into groups based on function, whether individual molecular functions, interactions between these molecules, or broader biological processes, including metabolic and signaling pathways. However, poultry researchers have been hampered by a lack of functional annotation data, tools, and training to use these data and tools. Moreover, this lack is becoming more critical as new sequencing technologies enable us to generate data not only for an increasingly diverse range of species but also for individual genomes and populations of individuals. We discuss the impact of these new sequencing technologies on poultry research, with a specific focus on what functional modeling resources are available for poultry researchers. We also describe key strategies for researchers who wish to functionally model their own data, providing background information about functional modeling approaches, the data and tools to support these approaches, and the strengths and limitations of each. Specifically, we describe methods for functional analysis using Gene Ontology (GO) functional summaries, functional enrichment analysis, and pathways and network modeling.
As annotation efforts begin to provide the fundamental data that underpin poultry functional modeling (such as improved gene identification, standardized gene nomenclature, temporal and spatial expression data, and gene product function), tool developers are incorporating these data into new and existing tools that are used for functional modeling, and cyberinfrastructure is being developed to provide the necessary extensibility and scalability for storing and analyzing these data. This process will support the efforts of poultry researchers to make sense of their functional genomics data sets, and we provide here a starting point for researchers who wish to take advantage of these tools.
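Of the approaches the abstract lists, functional enrichment analysis is the most directly computable. A common formulation, assumed here and not necessarily the one the authors describe, asks whether a study gene list contains more genes annotated to a GO term than expected by chance, using a one-sided hypergeometric test. The function name and gene identifiers below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(study: set, population: set, annotated: set) -> float:
    """One-sided hypergeometric p-value for over-representation of a GO term.

    Returns P(X >= k): the chance that a random gene list of the same size
    contains at least as many term-annotated genes as the study list does.
    """
    study = study & population          # ignore genes outside the background
    annotated = annotated & population
    N = len(population)                 # background size
    K = len(annotated)                  # background genes annotated to the term
    n = len(study)                      # genes in the study list
    k = len(study & annotated)          # annotated genes in the study list
    # Sum the hypergeometric probability mass from k up to its maximum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical 20-gene background; g0..g4 carry the term of interest.
    background = {f"g{i}" for i in range(20)}
    term_genes = {f"g{i}" for i in range(5)}
    hits = {"g0", "g1", "g2", "g3", "g10"}
    print(hypergeom_enrichment_p(hits, background, term_genes))
```

In practice the test is repeated for every GO term, so the resulting p-values need multiple-testing correction (for example, Benjamini-Hochberg) before they are interpreted.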

Buttigieg PL, Hankeln W, Kostadinov I, Kottmann R, Yilmaz P, Duhaime MB, Glöckner FO. Ecogenomic perspectives on domains of unknown function: correlation-based exploration of marine metagenomes. PLoS One 2013;8:e50869. [PMID: 23516388 PMCID: PMC3597751 DOI: 10.1371/journal.pone.0050869] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/22/2012] [Accepted: 10/24/2012] [Indexed: 11/19/2022]

Abstract

Background

The proportion of conserved DNA sequences with no clear function is steadily growing in bioinformatics databases. Studies of sequence and structural homology have indicated that many uncharacterized protein domain sequences are variants of functionally described domains. If these variants promote an organism's ecological fitness, they are likely to be conserved in the genome of its progeny and the population at large. The genetic composition of microbial communities in their native ecosystems is accessible through metagenomics. We hypothesize that the co-variation of protein domain sequences across metagenomes from similar ecosystems will provide insights into their potential roles and aid further investigation.

Methodology/Principal findings

We calculated the correlation of Pfam protein domain sequences across the Global Ocean Sampling metagenome collection, employing conservative detection and correlation thresholds to limit results to well-supported hits and associations. We then examined intercorrelations between domains of unknown function (DUFs) and domains involved in known metabolic pathways using network visualization and cluster-detection tools. We used a cautious “guilt-by-association” approach, referencing knowledge-level resources to identify and discuss associations that offer insight into DUF function. We observed numerous DUFs associated with photobiologically active domains and prevalent in the Cyanobacteria. Other clusters included DUFs associated with DNA maintenance and repair, inorganic nutrient metabolism, and sodium-translocating transport domains. We also observed a number of clusters reflecting known metabolic associations and cases that predicted functional reclassification of DUFs.

Conclusion/Significance

Critically examining domain covariation across metagenomic datasets can grant new perspectives on the roles and associations of DUFs in an ecological setting. Targeted attempts at DUF characterization in the laboratory or in silico may draw on these insights, and opportunities to discover new associations and corroborate existing ones will arise as more large-scale metagenomic datasets emerge.
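The abstract describes correlating Pfam domain abundances across metagenomes and linking DUFs to characterized domains by association. A minimal sketch of that idea, assuming Pearson correlation on per-sample abundance profiles and a single positive cutoff (the study's exact correlation measure and thresholds are not given here), builds an association network and reads off each DUF's neighbors. Domain names and counts are invented for illustration; real abundance profiles would also need normalization for sequencing depth.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return sxy / sqrt(sxx * syy)


def correlation_network(profiles, threshold):
    """Link every pair of domains whose profiles correlate at or above threshold.

    profiles maps a domain name to its abundance in each sample (same order).
    Returns an adjacency dict: domain -> set of associated domains.
    """
    graph = {name: set() for name in profiles}
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    # Hypothetical counts across five samples.
    profiles = {
        "DUF_A": [1, 2, 3, 4, 5],
        "PhotoDomain": [2, 4, 6, 8, 10],
        "RepairDomain": [5, 3, 4, 1, 2],
    }
    network = correlation_network(profiles, threshold=0.9)
    # Guilt by association: a DUF's neighbors hint at its possible role.
    print(sorted(network["DUF_A"]))  # ['PhotoDomain']
```

A high correlation only suggests a shared ecological context; as the authors stress, such associations are leads for laboratory or in silico characterization, not functional assignments.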

Ornelas A, Korczynska M, Ragumani S, Kumaran D, Narindoshvili T, Shoichet BK, Swaminathan S, Raushel FM. Functional annotation and three-dimensional structure of an incorrectly annotated dihydroorotase from cog3964 in the amidohydrolase superfamily. Biochemistry 2012;52:228-38. [PMID: 23214420 DOI: 10.1021/bi301483z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]

Grice EA, Segre JA. The human microbiome: our second genome. Annu Rev Genomics Hum Genet 2012;13:151-70. [PMID: 22703178 DOI: 10.1146/annurev-genom-090711-163814] [Citation(s) in RCA: 379] [Impact Index Per Article: 31.6] [Indexed: 02/06/2023]

McNeil MB, Clulow JS, Wilf NM, Salmond GPC, Fineran PC. SdhE is a conserved protein required for flavinylation of succinate dehydrogenase in bacteria. J Biol Chem 2012;287:18418-28. [PMID: 22474332 DOI: 10.1074/jbc.m111.293803] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Indexed: 11/06/2022]

Doerks T, van Noort V, Minguez P, Bork P. Annotation of the M. tuberculosis hypothetical orfeome: adding functional information to more than half of the uncharacterized proteins. PLoS One 2012;7:e34302. [PMID: 22485162 PMCID: PMC3317503 DOI: 10.1371/journal.pone.0034302] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 12/06/2011] [Accepted: 02/26/2012] [Indexed: 11/18/2022]

Gao B, Gupta RS. Phylogenetic framework and molecular signatures for the main clades of the phylum Actinobacteria. Microbiol Mol Biol Rev 2012;76:66-112. [PMID: 22390973 PMCID: PMC3294427 DOI: 10.1128/mmbr.05011-11] [Citation(s) in RCA: 167] [Impact Index Per Article: 13.9] [Indexed: 01/15/2023]

Sael L, Kihara D. Detecting local ligand-binding site similarity in nonhomologous proteins by surface patch comparison. Proteins 2012;80:1177-95. [PMID: 22275074 DOI: 10.1002/prot.24018] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 09/06/2011] [Revised: 11/27/2011] [Accepted: 12/13/2011] [Indexed: 11/06/2022]

Gorbacheva MA, Yarosh AG, Dorovatovskii PV, Rakitina TV, Boiko KM, Korzhenevskii DA, Lipkin AV, Popov VO, Shumilin IA. A novel approach to studying the structural and functional properties of proteins with unknown functions. Russ J Bioorg Chem 2012;38:99-105. [DOI: 10.1134/s1068162012010098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

Genetic diversity of the human pathogen Vibrio vulnificus: a new phylogroup. Int J Food Microbiol 2011;153:436-43. [PMID: 22227412 DOI: 10.1016/j.ijfoodmicro.2011.12.011] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 07/05/2011] [Revised: 12/01/2011] [Accepted: 12/07/2011] [Indexed: 11/21/2022]

Venter E, Smith RD, Payne SH. Proteogenomic analysis of bacteria and archaea: a 46 organism case study. PLoS One 2011;6:e27587. [PMID: 22114679 PMCID: PMC3219674 DOI: 10.1371/journal.pone.0027587] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Received: 06/08/2011] [Accepted: 10/20/2011] [Indexed: 11/19/2022]

COMBREX: COMputational BRidge to EXperiments. Biochem Soc Trans 2011;39:581-3. [PMID: 21428943 PMCID: PMC3064401 DOI: 10.1042/bst0390581] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]

PigS and PigP regulate prodigiosin biosynthesis in Serratia via differential control of divergent operons, which include predicted transporters of sulfur-containing molecules. J Bacteriol 2010;193:1076-85. [PMID: 21183667 DOI: 10.1128/jb.00352-10] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 11/20/2022]

Brylinski M, Skolnick J. FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level. Proteins 2010;79:735-51. [PMID: 21287609 DOI: 10.1002/prot.22913] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Received: 08/04/2010] [Revised: 09/27/2010] [Accepted: 10/07/2010] [Indexed: 12/13/2022]

The future of microbial metagenomics (or is ignorance bliss?). ISME J 2010;5:777-9. [PMID: 21107444 DOI: 10.1038/ismej.2010.178] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 11/08/2022]

Roberts RJ, Chang YC, Hu Z, Rachlin JN, Anton BP, Pokrzywa RM, Choi HP, Faller LL, Guleria J, Housman G, Klitgord N, Mazumdar V, McGettrick MG, Osmani L, Swaminathan R, Tao KR, Letovsky S, Vitkup D, Segrè D, Salzberg SL, Delisi C, Steffen M, Kasif S. COMBREX: a project to accelerate the functional annotation of prokaryotic genomes. Nucleic Acids Res 2010;39:D11-4. [PMID: 21097892 PMCID: PMC3013729 DOI: 10.1093/nar/gkq1168] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 11/25/2022]

Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop. Viruses 2010;2:2258-2268. [PMID: 21994619 PMCID: PMC3185566 DOI: 10.3390/v2102258] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 08/26/2010] [Revised: 09/18/2010] [Accepted: 09/20/2010] [Indexed: 11/29/2022]

Bateman A, Coggill P, Finn RD. DUFs: families in search of function. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010;66:1148-52. [PMID: 20944204 PMCID: PMC2954198 DOI: 10.1107/s1744309110001685] [Citation(s) in RCA: 172] [Impact Index Per Article: 12.3] [Received: 10/09/2009] [Accepted: 01/13/2010] [Indexed: 11/30/2022]

Beard BC, Trobridge GD, Ironside C, McCune JS, Adair JE, Kiem HP. Efficient and stable MGMT-mediated selection of long-term repopulating stem cells in nonhuman primates. J Clin Invest 2010;120:2345-54. [PMID: 20551514 PMCID: PMC2898586 DOI: 10.1172/jci40767] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.6] [Received: 08/07/2009] [Accepted: 04/21/2010] [Indexed: 12/23/2022]

Warren AS, Archuleta J, Feng WC, Setubal JC. Missing genes in the annotation of prokaryotic genomes. BMC Bioinformatics 2010;11:131. [PMID: 20230630 PMCID: PMC3098052 DOI: 10.1186/1471-2105-11-131] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 09/01/2009] [Accepted: 03/15/2010] [Indexed: 12/04/2022]

Gupta RS, Mathews DW. Signature proteins for the major clades of Cyanobacteria. BMC Evol Biol 2010;10:24. [PMID: 20100331 PMCID: PMC2823733 DOI: 10.1186/1471-2148-10-24] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 04/27/2009] [Accepted: 01/25/2010] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The phylogeny and taxonomy of cyanobacteria are currently poorly understood owing to a paucity of reliable markers for identification and circumscription of their major clades.

RESULTS

A combination of phylogenomic and protein signature-based approaches was used to characterize the major clades of cyanobacteria. Phylogenetic trees were constructed for 44 cyanobacteria based on 44 conserved proteins. In parallel, Blastp searches were carried out on each ORF in the genomes of Synechococcus WH8102, Synechocystis PCC6803, Nostoc PCC7120, Synechococcus JA-3-3Ab, Prochlorococcus MIT9215 and Prochlorococcus marinus subsp. marinus CCMP1375 to identify proteins that are specific for various main clades of cyanobacteria. These studies have identified 39 proteins that are specific for all (or most) cyanobacteria and large numbers of proteins for other cyanobacterial clades. The identified signature proteins include: (i) 14 proteins for a deep-branching clade (Clade A) of Gloeobacter violaceus and two diazotrophic Synechococcus strains (JA-3-3Ab and JA2-3-B'a); (ii) 5 proteins that are present in all other cyanobacteria except those from Clade A; (iii) 60 proteins that are specific for a clade (Clade C) consisting of various marine unicellular cyanobacteria (viz. Synechococcus and Prochlorococcus); (iv) 14 and 19 signature proteins that are specific for the Clade C Synechococcus and Prochlorococcus strains, respectively; (v) 67 proteins that are specific for the Low B/A ecotype Prochlorococcus strains, which contain a lower ratio of chl b/a2 and are adapted to growth at high light intensities; (vi) 65 and 8 proteins that are specific for the Nostocales and Chroococcales orders, respectively; and (vii) 22 and 9 proteins that are uniquely shared by the Nostocales and Oscillatoriales orders, or by these two orders and the Chroococcales, respectively. We also describe 3 conserved indels in flavoprotein, heme oxygenase and protochlorophyllide oxidoreductase proteins that are specific for either Clade C cyanobacteria or for various subclades of Prochlorococcus. Many other conserved indels for cyanobacterial clades have been described recently.

CONCLUSIONS

These signature proteins and indels provide novel means for circumscribing various cyanobacterial clades in clear molecular terms. Functional studies of these proteins should lead to the discovery of novel properties that are unique to these groups of cyanobacteria.
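At its core, the signature-protein criterion is a presence/absence filter: keep proteins found in all (or most) genomes of a clade and in no genome outside it. The sketch below assumes presence has already been decided for each genome (the study used Blastp searches for that step) and uses hypothetical genome and protein labels; `min_in_clade` is an illustrative parameter for the "all (or most)" relaxation, not a value from the paper.

```python
def signature_proteins(presence, clade, min_in_clade=1.0):
    """Proteins found in >= min_in_clade of clade genomes and in no other genome.

    presence maps genome name -> set of protein (family) identifiers detected in it.
    clade is the set of genome names forming the group of interest.
    """
    if not clade:
        return set()
    inside = [presence[g] for g in clade]
    outside = [proteins for g, proteins in presence.items() if g not in clade]
    result = set()
    for protein in set().union(*inside):
        fraction = sum(protein in s for s in inside) / len(inside)
        absent_elsewhere = not any(protein in s for s in outside)
        if fraction >= min_in_clade and absent_elsewhere:
            result.add(protein)
    return result


if __name__ == "__main__":
    presence = {
        "G1": {"p1", "p2", "p3"},
        "G2": {"p1", "p2"},
        "G3": {"p2", "p4"},
    }
    print(sorted(signature_proteins(presence, {"G1", "G2"})))  # ['p1']
    print(sorted(signature_proteins(presence, {"G1", "G2"}, min_in_clade=0.5)))  # ['p1', 'p3']
```

Requiring strict absence outside the clade keeps the markers unambiguous; relaxing `min_in_clade` tolerates gene loss or incomplete genomes within the clade at the cost of weaker markers.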

Biochemical networks: the evolution of gene annotation. Nat Chem Biol 2010;6:4-5. [PMID: 20016491 DOI: 10.1038/nchembio.288] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]

'Unknown' proteins and 'orphan' enzymes: the missing half of the engineering parts list--and how to find it. Biochem J 2009;425:1-11. [PMID: 20001958 DOI: 10.1042/bj20091328] [Citation(s) in RCA: 135] [Impact Index Per Article: 9.0] [Indexed: 11/17/2022]

Louie B, Higdon R, Kolker E. A statistical model of protein sequence similarity and function similarity reveals overly-specific function predictions. PLoS One 2009;4:e7546. [PMID: 19844580 PMCID: PMC2760442 DOI: 10.1371/journal.pone.0007546] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 09/01/2009] [Accepted: 09/13/2009] [Indexed: 12/02/2022]

Abstract

Background

Predicting protein function from primary sequence is an important open problem in modern biology. Not only are there many thousands of proteins of unknown function, but current approaches for predicting function must also be improved. One problem in particular is overly specific function predictions, which we address here with a new statistical model of the relationship between protein sequence similarity and protein function similarity.

Methodology

Our statistical model is based on sets of proteins with experimentally validated functions and numeric measures of function specificity and function similarity derived from the Gene Ontology. The model predicts the similarity of function between two proteins given their amino acid sequence similarity measured by statistics from the BLAST sequence alignment algorithm. A novel aspect of our model is that it predicts the degree of function similarity shared between two proteins over a continuous range of sequence similarity, facilitating prediction of function with an appropriate level of specificity.

Significance

Our model shows nearly exact function similarity for proteins with high sequence similarity (bit score >244.7, e-value <1e-62, non-redundant NCBI protein database (NRDB)) and only a small likelihood of a specific function match for proteins with low sequence similarity (bit score <54.6, e-value >1e-05, NRDB). For sequence similarity ranges in between, our annotation model shows an increasing relationship between function similarity and sequence similarity, but with considerable variability. We applied the model to a large set of proteins of unknown function and predicted functions for thousands of these proteins, ranging from general to very specific. We also applied the model to a data set of proteins with previously assigned, specific functions derived from electronic annotation. We show that, on average, these prior function predictions are more specific (quite possibly overly specific) than predictions from our model, which is based on proteins with experimentally determined function.
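The two bit-score cutoffs reported above can be used to bin a BLAST hit before deciding how specific a transferred annotation should be. The sketch hardcodes those values; the mapping from regime to annotation policy is a hypothetical illustration, not the authors' model, which predicts a continuous degree of function similarity. Bit scores, unlike e-values, do not depend on database size, but the cutoffs were reported for searches against NRDB.

```python
# Cutoffs reported in the abstract (NRDB searches).
HIGH_BIT_SCORE = 244.7  # above: nearly exact function similarity
LOW_BIT_SCORE = 54.6    # below: only a small chance of a specific function match

# Hypothetical policy mapping each regime to an annotation specificity.
POLICY = {
    "high": "transfer specific function",
    "intermediate": "transfer with caution; prefer more general terms",
    "low": "avoid specific function transfer",
}


def similarity_regime(bit_score: float) -> str:
    """Bin a BLAST bit score into the three regimes described in the abstract."""
    if bit_score > HIGH_BIT_SCORE:
        return "high"
    if bit_score < LOW_BIT_SCORE:
        return "low"
    return "intermediate"


def annotation_policy(bit_score: float) -> str:
    """Look up the (hypothetical) annotation policy for a hit's bit score."""
    return POLICY[similarity_regime(bit_score)]


if __name__ == "__main__":
    for score in (300.0, 120.0, 40.0):
        print(score, similarity_regime(score), "->", annotation_policy(score))
```

Within the intermediate band the abstract reports considerable variability, which is why a fixed rule like this can only be a coarse first pass compared with a model that scores function similarity continuously.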
