Reference Citation Analysis

For: Škunca N, Altenhoff A, Dessimoz C. Quality of computationally inferred gene ontology annotations. PLoS Comput Biol 2012;8:e1002533. [PMID: 22693439 PMCID: PMC3364937 DOI: 10.1371/journal.pcbi.1002533] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.1] [Received: 10/17/2011] [Accepted: 04/01/2012] [Indexed: 01/10/2023]
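The Impact Index Per Article values in this listing appear to be consistent with dividing the RCA citation count by the number of years between publication and 2024, rounded half-up to one decimal (for example, 97 citations since 2012 gives 97 / 12 ≈ 8.1). This is an inference from the numbers shown here, not RCA's documented formula; the function name `impact_index` and the 2024 reference year are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

def impact_index(citations: int, pub_year: int, ref_year: int = 2024) -> Decimal:
    """Citations per year since publication, rounded half-up to one decimal.

    ASSUMPTION: the formula and the 2024 reference year are inferred from this
    listing's numbers; they are not RCA's documented definition.
    """
    years = ref_year - pub_year
    if years <= 0:
        # Guard against division by zero; the 2024 entry in this listing shows 0.
        return Decimal("0")
    # Decimal avoids float rounding surprises such as round(1.25, 1) == 1.2.
    value = Decimal(citations) / Decimal(years)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(impact_index(97, 2012))  # Škunca et al. 2012, 97 citations; prints 8.1
```

Half-up rounding matters for entries such as 5 citations over 4 years (1.25), which the listing shows as 1.3.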

Cited by Other Article(s)

Chen J, Goudey B, Geard N, Verspoor K. Integration of background knowledge for automatic detection of inconsistencies in gene ontology annotation. Bioinformatics 2024;40:i390-i400. [PMID: 38940182 PMCID: PMC11256942 DOI: 10.1093/bioinformatics/btae246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]

Abstract

MOTIVATION

Biological background knowledge plays an important role in the manual quality assurance (QA) of biological database records. One such QA task is the detection of inconsistencies in literature-based Gene Ontology Annotation (GOA). This manual verification ensures the accuracy of the GO annotations based on a comprehensive review of the literature used as evidence, Gene Ontology (GO) terms, and annotated genes in GOA records. While automatic approaches for the detection of semantic inconsistencies in GOA have been developed, they operate within predetermined contexts, lacking the ability to leverage broader evidence, especially relevant domain-specific background knowledge. This paper investigates various types of background knowledge that could improve the detection of prevalent inconsistencies in GOA. In addition, the paper proposes several approaches to integrate background knowledge into the automatic GOA inconsistency detection process.

RESULTS

We have extended a previously developed GOA inconsistency dataset with several kinds of GOA-related background knowledge, including GeneRIF statements, biological concepts mentioned within evidence texts, GO hierarchy and existing GO annotations of the specific gene. We have proposed several effective approaches to integrate background knowledge as part of the automatic GOA inconsistency detection process. The proposed approaches can improve automatic detection of self-consistency and several of the most prevalent types of inconsistencies.

This is the first study to explore the advantages of utilizing background knowledge and to propose a practical approach to incorporate knowledge in automatic GOA inconsistency detection. We establish a new benchmark for performance on this task. Our methods may be applicable to various tasks that involve incorporating biological background knowledge.
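The GO hierarchy and a gene's existing annotations, two of the background-knowledge sources listed above, can be turned into simple relational features. The sketch below is a hypothetical illustration, not the authors' method (their code is at the repository listed under AVAILABILITY AND IMPLEMENTATION): it walks a toy is_a graph with placeholder term IDs and reports whether a candidate term is more general or more specific than terms already annotated to the gene.

```python
from collections import deque

# Toy is_a graph (child -> parents). IDs are placeholders, not real GO terms.
TOY_IS_A = {
    "T:leaf_a": ["T:mid"],
    "T:leaf_b": ["T:mid"],
    "T:mid": ["T:root"],
    "T:root": [],
}

def ancestors(term: str, is_a: dict) -> set:
    """All ancestors of `term`, found by breadth-first traversal of is_a edges."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen

def hierarchy_features(candidate: str, existing: set, is_a: dict) -> dict:
    """Hypothetical features relating a candidate term to a gene's existing annotations."""
    candidate_ancestors = ancestors(candidate, is_a)
    return {
        "already_annotated": candidate in existing,
        # Candidate is an ancestor of an existing term (possible over-generalization).
        "more_general_than_existing": any(candidate in ancestors(t, is_a) for t in existing),
        # An existing term is an ancestor of the candidate (candidate refines it).
        "more_specific_than_existing": any(t in candidate_ancestors for t in existing),
    }

print(hierarchy_features("T:root", {"T:leaf_a"}, TOY_IS_A))
```

Features of this kind could be fed to a classifier alongside text-derived evidence; how the authors actually combine them is described in their paper and code.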

AVAILABILITY AND IMPLEMENTATION

https://github.com/jiyuc/de-inconsistency.

Veatch OJ, Mazzotti DR, Schultz RT, Abel T, Michaelson JJ, Brodkin ES, Tunc B, Assouline SG, Nickl-Jockschat T, Malow BA, Sutcliffe JS, Pack AI. Calculating genetic risk for dysfunction in pleiotropic biological processes using whole exome sequencing data. J Neurodev Disord 2022;14:39. [PMID: 35751013 PMCID: PMC9233372 DOI: 10.1186/s11689-022-09448-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Accepted: 06/08/2022] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Numerous genes are implicated in autism spectrum disorder (ASD). ASD encompasses a wide range of symptoms of varying severity, as well as co-occurring conditions; however, how genetic variation contributes to phenotypic differences is unclear. This creates a challenge for translating genetic evidence into clinically useful knowledge. Sleep disturbances are particularly prevalent co-occurring conditions in ASD, and genetics may inform treatment. Identifying convergent mechanisms with evidence for dysfunction that connect ASD and sleep biology could help identify better treatments for sleep disturbances in these individuals.

METHODS

To identify mechanisms that influence risk for ASD and co-occurring sleep disturbances, we analyzed whole exome sequence data from individuals in the Simons Simplex Collection (n = 2380). We predicted protein-damaging variants (PDVs) in genes currently implicated in either ASD or sleep duration in typically developing children. We predicted a network of ASD-related proteins with direct evidence for interaction with sleep duration-related proteins encoded by genes with PDVs. Overrepresentation analyses of Gene Ontology-defined biological processes were conducted on the resulting gene set. We calculated the likelihood of dysfunction in the top overrepresented biological process. We then tested whether scores reflecting genetic dysfunction in the process were associated with parent-reported sleep duration.
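Overrepresentation of a GO biological process in a gene set is commonly assessed with a one-sided hypergeometric test. The abstract does not state which test or software the authors used, so the following is a generic sketch: `M` is the number of background genes, `n` the number annotated to the term, `N` the size of the tested gene set, and `k` the number of set genes carrying the term. In practice, p-values across many terms also need multiple-testing correction, which is omitted here.

```python
from math import comb

def hypergeom_enrichment_p(k: int, M: int, n: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: background genes; n: background genes annotated to the GO term;
    N: genes in the tested set; k: genes in the set annotated to the term.
    """
    upper = min(n, N)
    # Exact integer arithmetic; Python ints do not overflow.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / comb(M, N)

# Illustrative numbers only, not taken from the study.
print(hypergeom_enrichment_p(k=8, M=20000, n=150, N=108))
```

Summing exact binomial products avoids the precision loss of subtracting a cumulative probability from 1 when the tail is very small.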

RESULTS

There were 29 genes with PDVs in the ASD dataset for which variation has been reported in the literature to be associated with both ASD and sleep duration. A network of 108 proteins encoded by ASD and sleep duration candidate genes with PDVs was identified. The mechanism overrepresented in PDV-containing genes that encode proteins in the interaction network with the most evidence for dysfunction was cerebral cortex development (GO:0021987). Scores reflecting dysfunction in this process were associated with sleep duration; the largest effects were observed in adolescents (p = 4.65 × 10^-3).

CONCLUSIONS

Our bioinformatics-driven approach detected a biological process enriched for genes encoding a protein-protein interaction network that links ASD gene products with sleep duration gene products; in individuals with ASD, accumulation of potentially damaging variants in this process was associated with parent-reported sleep duration. Specifically, genetic dysfunction affecting development of the cerebral cortex may affect sleep by disrupting sleep homeostasis, which evidence suggests is regulated by this brain region. Future functional assessments and objective measurements of sleep in adolescents with ASD could provide the basis for more informed treatment of sleep problems in these individuals.

Chen J, Goudey B, Zobel J, Geard N, Verspoor K. Exploring automatic inconsistency detection for literature-based gene ontology annotation. Bioinformatics 2022;38:i273-i281. [PMID: 35758780 PMCID: PMC9235499 DOI: 10.1093/bioinformatics/btac230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/08/2022] [Indexed: 11/12/2022]

Chen J, Geard N, Zobel J, Verspoor K. Automatic consistency assurance for literature-based gene ontology annotation. BMC Bioinformatics 2021;22:565. [PMID: 34823464 PMCID: PMC8620237 DOI: 10.1186/s12859-021-04479-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 11/15/2021] [Indexed: 12/21/2022]

Ramsey J, McIntosh B, Renfro D, Aleksander SA, LaBonte S, Ross C, Zweifel AE, Liles N, Farrar S, Gill JJ, Erill I, Ades S, Berardini TZ, Bennett JA, Brady S, Britton R, Carbon S, Caruso SM, Clements D, Dalia R, Defelice M, Doyle EL, Friedberg I, Gurney SMR, Hughes L, Johnson A, Kowalski JM, Li D, Lovering RC, Mans TL, McCarthy F, Moore SD, Murphy R, Paustian TD, Perdue S, Peterson CN, Prüß BM, Saha MS, Sheehy RR, Tansey JT, Temple L, Thorman AW, Trevino S, Vollmer AC, Walbot V, Willey J, Siegele DA, Hu JC. Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO). PLoS Comput Biol 2021;17:e1009463. [PMID: 34710081 PMCID: PMC8553046 DOI: 10.1371/journal.pcbi.1009463] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]

Affiliation(s)

Jolene Ramsey Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America; Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
Brenley McIntosh Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Daniel Renfro Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Suzanne A. Aleksander Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Sandra LaBonte Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Curtis Ross Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America; Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
Adrienne E. Zweifel Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Nathan Liles Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Shabnam Farrar Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Jason J. Gill Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America; Department of Animal Science, Texas A&M University, College Station, Texas, United States of America
Ivan Erill Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States of America; Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
Sarah Ades Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
Tanya Z. Berardini The Arabidopsis Information Resource, Phoenix Bioinformatics, Newark, California, United States of America
Jennifer A. Bennett Department of Biology and Earth Science, Otterbein University, Westerville, Ohio, United States of America
Siobhan Brady Department of Plant Biology and Genome Center, University of California Davis, Davis, California, United States of America
Robert Britton Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
Seth Carbon Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Steven M. Caruso Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
Dave Clements Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
Ritu Dalia Department of Biology, Drexel University, Philadelphia, Pennsylvania, United States of America
Meredith Defelice Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
Erin L. Doyle Biology Department, Doane University, Crete, Nebraska, United States of America
Iddo Friedberg Department of Microbiology, Miami University, Oxford, Ohio, United States of America
Susan M. R. Gurney Department of Biology, Drexel University, Philadelphia, Pennsylvania, United States of America
Lee Hughes Department of Biological Sciences, University of North Texas, Denton, Texas, United States of America
Allison Johnson Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
Jason M. Kowalski Biological Sciences Department, University of Wisconsin-Parkside, Kenosha, Wisconsin, United States of America
Donghui Li The Arabidopsis Information Resource, Phoenix Bioinformatics, Newark, California, United States of America
Ruth C. Lovering Institute of Cardiovascular Science, University College London, London, United Kingdom
Tamara L. Mans Department of Biochemistry and Biotechnology, Minnesota State University Moorhead, Brooklyn Park, Minnesota, United States of America
Fiona McCarthy Department of Basic Science, College of Veterinary Medicine, Mississippi State University, Starkville, Mississippi, United States of America
Sean D. Moore Burnett School of Biomedical Sciences, University of Central Florida, Orlando, Florida, United States of America
Rebecca Murphy Department of Biology, Centenary College of Louisiana, Shreveport, Louisiana, United States of America
Timothy D. Paustian Department of Bacteriology, University of Wisconsin, Madison, Wisconsin, United States of America
Sarah Perdue Biological Sciences Department, University of Wisconsin-Parkside, Kenosha, Wisconsin, United States of America
Celeste N. Peterson Biology Department, Suffolk University, Boston, Massachusetts, United States of America
Birgit M. Prüß Microbiological Sciences Department, North Dakota State University, Fargo, North Dakota, United States of America
Margaret S. Saha Department of Biology, College of William & Mary, Williamsburg, Virginia, United States of America
Robert R. Sheehy Biology Department, Radford University, Radford, Virginia, United States of America
John T. Tansey Department of Biochemistry and Molecular Biology, Otterbein University, Westerville, Ohio, United States of America
Louise Temple School of Integrated Sciences, James Madison University, Harrisonburg, Virginia, United States of America
Alexander William Thorman Department of Environmental and Public Health Sciences, University of Cincinnati, Cincinnati, Ohio, United States of America
Saul Trevino Department of Chemistry, Math, and Physics, Houston Baptist University, Houston, Texas, United States of America
Amy Cheng Vollmer Department of Biology, Swarthmore College, Swarthmore, Pennsylvania, United States of America
Virginia Walbot Department of Biology, Stanford University, Stanford, California, United States of America
Joanne Willey Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
Deborah A. Siegele Department of Biology, Texas A&M University, College Station, Texas, United States of America
James C. Hu Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America; Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America

Harris BD, Crow M, Fischer S, Gillis J. Single-cell co-expression analysis reveals that transcriptional modules are shared across cell types in the brain. Cell Syst 2021;12:748-756.e3. [PMID: 34015329 PMCID: PMC8298279 DOI: 10.1016/j.cels.2021.04.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/30/2020] [Revised: 02/11/2021] [Accepted: 04/23/2021] [Indexed: 12/27/2022]

PhotoModPlus: A web server for photosynthetic protein prediction from genome neighborhood features. PLoS One 2021;16:e0248682. [PMID: 33730083 PMCID: PMC7968678 DOI: 10.1371/journal.pone.0248682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2020] [Accepted: 03/03/2021] [Indexed: 11/20/2022]

Wei X, Zhang C, Freddolino PL, Zhang Y. Detecting Gene Ontology misannotations using taxon-specific rate ratio comparisons. Bioinformatics 2021;36:4383-4388. [PMID: 32470107 DOI: 10.1093/bioinformatics/btaa548] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/03/2019] [Revised: 03/24/2020] [Accepted: 05/26/2020] [Indexed: 02/05/2023]

Makrodimitris S, van Ham RCHJ, Reinders MJT. Automatic Gene Function Prediction in the 2020's. Genes (Basel) 2020;11:E1264. [PMID: 33120976 PMCID: PMC7692357 DOI: 10.3390/genes11111264] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/28/2020] [Revised: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 02/06/2023]

Wood V, Carbon S, Harris MA, Lock A, Engel SR, Hill DP, Van Auken K, Attrill H, Feuermann M, Gaudet P, Lovering RC, Poux S, Rutherford KM, Mungall CJ. Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns. Open Biol 2020;10:200149. [PMID: 32875947 PMCID: PMC7536087 DOI: 10.1098/rsob.200149] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/22/2020] [Accepted: 08/06/2020] [Indexed: 12/11/2022]

Affiliation(s)

Valerie Wood Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK; Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
Seth Carbon Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Midori A. Harris Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK; Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
Antonia Lock Department of Genetics, Evolution and Environment, University College London, London WC1E 6B, UK
Stacia R. Engel Department of Genetics, Stanford University, Palo Alto, CA 94304-5477, USA
David P. Hill Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
Kimberly Van Auken Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, USA
Helen Attrill Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
Marc Feuermann Swiss Institute of Bioinformatics, 1 Michel-Servet, 1204 Geneva, Switzerland
Pascale Gaudet Swiss Institute of Bioinformatics, 1 Michel-Servet, 1204 Geneva, Switzerland
Ruth C. Lovering Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London, London WC1E 6JF, UK
Sylvain Poux Swiss Institute of Bioinformatics, 1 Michel-Servet, 1204 Geneva, Switzerland
Kim M. Rutherford Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK; Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
Christopher J. Mungall Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

Moi D, Kilchoer L, Aguilar PS, Dessimoz C. Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes. PLoS Comput Biol 2020;16:e1007553. [PMID: 32697802 PMCID: PMC7423146 DOI: 10.1371/journal.pcbi.1007553] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/20/2019] [Revised: 08/12/2020] [Accepted: 05/18/2020] [Indexed: 01/09/2023]

Abstract

Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been more widely used with prokaryotes than eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require at least quadratic time as a function of the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic species that will become available in the coming few years. HogProf is available at https://github.com/DessimozLab/HogProf.

Genes that are involved in the same biological process tend to co-evolve. This property is exploited by the technique of phylogenetic profiling, which identifies co-evolving (and therefore likely functionally related) genes through patterns of correlated gene retention and loss in evolution and across species. However, conventional methods for computing and clustering these correlated genes do not scale with increasing numbers of genomes. HogProf is a novel phylogenetic profiling tool built on probabilistic data structures. It allows the user to construct searchable databases containing the evolutionary history of hundreds of thousands of protein families. Such fast detection of coevolution takes advantage of the rapidly increasing amount of publicly available genomic data, and can uncover unknown biological networks and guide in vivo research and experimentation. We have applied our tool to describe the biological networks underpinning sexual reproduction in eukaryotes.
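HogProf's scalability rests on MinHash, which estimates the Jaccard similarity between two sets from short fixed-length signatures instead of comparing the full sets. The sketch below illustrates only that estimation step, on profiles simplified to sets of taxon labels in which a family is present; it does not reproduce HogProf's HOG-derived profiles or its locality-sensitive hashing index, and the function names and taxon labels are hypothetical.

```python
import hashlib

def _seeded_hash(item: str, seed: int) -> int:
    """Deterministic 64-bit hash of `item`; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")

def minhash_signature(profile: set, num_hashes: int = 128) -> list:
    """Minimum hash value of a non-empty profile under each hash function."""
    if not profile:
        raise ValueError("profile must be non-empty")
    return [min(_seeded_hash(x, s) for x in profile) for s in range(num_hashes)]

def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of signature positions that agree, an estimate of the Jaccard index."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have equal length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical presence profiles: taxa in which each protein family is retained.
family_a = {f"taxon_{i}" for i in range(0, 10)}
family_b = {f"taxon_{i}" for i in range(5, 15)}
est = estimate_jaccard(minhash_signature(family_a), minhash_signature(family_b))
exact = len(family_a & family_b) / len(family_a | family_b)
print(f"estimated {est:.2f}, exact {exact:.2f}")
```

Signatures have a fixed length regardless of profile size, which is what makes indexing them with locality-sensitive hashing practical for very large numbers of families.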

Warwick Vesztrocy A, Dessimoz C. Benchmarking gene ontology function predictions using negative annotations. Bioinformatics 2020;36:i210-i218. [PMID: 32657372 PMCID: PMC7355306 DOI: 10.1093/bioinformatics/btaa466] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]

Stamboulian M, Guerrero RF, Hahn MW, Radivojac P. The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction. Bioinformatics 2020;36:i219-i226. [PMID: 32657391 PMCID: PMC7355290 DOI: 10.1093/bioinformatics/btaa468] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]

Sangphukieo A, Laomettachit T, Ruengjitchatchawalya M. Photosynthetic protein classification using genome neighborhood-based machine learning feature. Sci Rep 2020;10:7108. [PMID: 32346070 PMCID: PMC7189237 DOI: 10.1038/s41598-020-64053-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/17/2019] [Accepted: 04/07/2020] [Indexed: 11/08/2022]

Functionally Enigmatic Genes in Cancer: Using TCGA Data to Map the Limitations of Annotations. Sci Rep 2020;10:4106. [PMID: 32139709 PMCID: PMC7057977 DOI: 10.1038/s41598-020-60456-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/14/2019] [Accepted: 02/10/2020] [Indexed: 12/14/2022]

Wilson J, Staley JM, Wyckoff GJ. Extinction of chromosomes due to specialization is a universal occurrence. Sci Rep 2020;10:2170. [PMID: 32034231 PMCID: PMC7005762 DOI: 10.1038/s41598-020-58997-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/14/2019] [Accepted: 01/20/2020] [Indexed: 11/09/2022]

Noh S, Christopher L, Strassmann JE, Queller DC. Wild Dictyostelium discoideum social amoebae show plastic responses to the presence of nonrelatives during multicellular development. Ecol Evol 2020;10:1119-1134. [PMID: 32076502 PMCID: PMC7029077 DOI: 10.1002/ece3.5924] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/29/2019] [Revised: 10/30/2019] [Accepted: 11/18/2019] [Indexed: 11/11/2022]

Handling Noise in Protein Interaction Networks. Biomed Res Int 2019;2019:8984248. [PMID: 31828144 PMCID: PMC6885184 DOI: 10.1155/2019/8984248] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/17/2019] [Accepted: 09/23/2019] [Indexed: 12/22/2022]

Abstract

Protein-protein interactions (PPIs) can be conveniently represented as networks, allowing the use of graph theory for their study. Network topology studies may reveal patterns associated with specific organisms. Here, we propose a new methodology, the organization measurement (OM) method, to denoise PPI networks and predict missing links based solely on network topology. The OM methodology was applied to denoise the PPI networks of two Saccharomyces cerevisiae datasets (Yeast and CS2007) and one Homo sapiens dataset (Human). To evaluate the denoising capabilities of the OM methodology, two strategies were applied. The first strategy compared its application in random networks and in the reference set networks, while the second strategy perturbed the networks by gradually adding and removing edges at random. Applied to the Yeast and Human reference sets, the OM methodology achieved AUCs of 0.95 and 0.87, respectively. Randomly removing 80% of the Yeast and Human reference set interactions resulted in AUCs of 0.71 and 0.62, whereas randomly adding 80% more interactions resulted in AUCs of 0.75 and 0.72, respectively. Applying the OM methodology to the CS2007 dataset yielded an AUC of 0.99. We also perturbed the CS2007 network by randomly inserting and removing edges in the same proportions described above. The proportion of false positives identified and removed from the network ranged from 97%, when 20% more edges were inserted, to 89%, when 80% more edges were inserted. The proportion of true positives identified and inserted into the network ranged from 95%, when 20% of the edges were removed, to 40%, after the random deletion of 80% of the edges. The OM methodology is sensitive to the topological structure of biological networks. The results suggest that the present approach can be used efficiently to denoise PPI networks.
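The second evaluation strategy, random addition and removal of edges, can be sketched as follows. This is an illustration under stated assumptions, not the OM method itself (the abstract does not describe its scoring): fractions are taken relative to the original edge count, added edges are drawn from node pairs not connected in the original network, and the hypothetical `perturb_edges` helper returns the removed edges (true links a denoiser should recover) and the added edges (false links it should flag). Enumerating all node pairs is quadratic, so this suits small examples only.

```python
import random
from itertools import combinations

def perturb_edges(nodes, edges, add_frac=0.0, remove_frac=0.0, seed=0):
    """Randomly remove and add undirected edges (hypothetical evaluation helper).

    Fractions are relative to the original edge count (an assumption).
    Returns (perturbed_edges, removed, added) as sets of frozensets.
    """
    rng = random.Random(seed)
    original = {frozenset(e) for e in edges}
    # Sort for reproducibility; set iteration order is not guaranteed.
    ordered = sorted(original, key=sorted)
    removed = set(rng.sample(ordered, round(remove_frac * len(original))))
    # Candidate false links: node pairs absent from the original network.
    non_edges = [frozenset(p) for p in combinations(sorted(nodes), 2)
                 if frozenset(p) not in original]
    n_add = min(round(add_frac * len(original)), len(non_edges))
    added = set(rng.sample(non_edges, n_add))
    return (original - removed) | added, removed, added

nodes = range(6)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
perturbed, removed, added = perturb_edges(nodes, edges, add_frac=0.4, remove_frac=0.4, seed=1)
print(len(perturbed), len(removed), len(added))  # prints 5 2 2
```

A denoising method could then be scored on how many `added` edges it discards and how many `removed` edges it predicts as missing links.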

Sedlar K, Kolek J, Gruber M, Jureckova K, Branska B, Csaba G, Vasylkivska M, Zimmer R, Patakova P, Provaznik I. A transcriptional response of Clostridium beijerinckii NRRL B-598 to a butanol shock. Biotechnol Biofuels 2019;12:243. [PMID: 31636702 PMCID: PMC6790243 DOI: 10.1186/s13068-019-1584-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/10/2019] [Accepted: 10/04/2019] [Indexed: 06/10/2023]

Abstract

BACKGROUND

One of the main obstacles preventing solventogenic clostridia from achieving higher yields in biofuel production is the toxicity of the solvents they produce. Unfortunately, regulatory mechanisms responsible for the shock response are poorly described at the transcriptomic level. Although the strain Clostridium beijerinckii NRRL B-598, a promising butanol producer, has been studied under different conditions in the past, its transcriptional response to a shock caused by butanol in the cultivation medium remains unknown.

RESULTS

In this paper, we use RNA-Seq to characterize the transcriptional response of the strain to a butanol challenge, induced by adding butanol to the cultivation medium at the very end of the acidogenic phase. We resequenced and reassembled the genome of the strain and prepared a novel genome and gene ontology annotation to provide the most accurate results. Compared with samples under standard cultivation conditions, samples gathered during butanol shock formed a well-distinguished group. Using reference samples gathered directly before the addition of butanol, we identified genes that were differentially expressed in butanol challenge samples. We determined clusters of 293 down-regulated and 301 up-regulated genes whose expression was affected by the cultivation conditions. The term "RNA binding", enriched among down-regulated genes, corresponded to the downturn of translation, and this cluster contained a group of small acid-soluble spore proteins. This explained the phenotype of the culture, which had not sporulated. In contrast, up-regulated genes were characterized by the term "protein binding", which corresponded to activation of heat-shock proteins identified within this cluster.

CONCLUSIONS

We characterized the overall transcriptional response of C. beijerinckii NRRL B-598 to butanol shock, supplemented by auxiliary techniques, including high-pressure liquid chromatography and flow cytometry, to capture the corresponding phenotypic response. We identified genes whose regulation was affected by the addition of butanol to the cultivation medium and inferred the related molecular functions that were significantly influenced. Additionally, using a high-quality genome assembly and custom-made gene ontology annotation, we demonstrated that this established terminology, widely used for the analysis of model organisms, can also be applied to non-model organisms and to research in the field of biofuels.

Cruz F, Lagoa D, Mendes J, Rocha I, Ferreira EC, Rocha M, Dias O. SamPler - a novel method for selecting parameters for gene functional annotation routines. BMC Bioinformatics 2019;20:454. [PMID: 31488049 PMCID: PMC6727554 DOI: 10.1186/s12859-019-3038-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/26/2018] [Accepted: 08/21/2019] [Indexed: 11/17/2022]

Abstract

BACKGROUND

As genome sequencing projects grow rapidly, the diversity of organisms with recently assembled genome sequences is reaching an unprecedented scale, highlighting the need to make gene functional annotation fast and efficient. However, the (high) quality of such annotations must be guaranteed, as they are the first indicator of the genomic potential of every organism. Automatic procedures help accelerate the annotation process, though at the cost of the confidence and reliability of the outcomes. Manually curating a genome-wide annotation of the functions of genes, enzymes and transporter proteins is a highly time-consuming, tedious and impractical task, even for the most proficient curator. Hence, a semi-automated procedure that balances the two approaches will increase the reliability of the annotation while speeding up the process. In fact, a prior analysis of the annotation algorithm may improve its performance through tuning of its parameters, hastening downstream processing and the manual curation of functions assigned to protein-coding genes.

RESULTS

Here we present SamPler, a novel strategy to select parameters for gene functional annotation routines. This semi-automated method is based on the manual curation of a randomly selected sample of genes/proteins. This sample is then used, in a multi-dimensional array, to assess the automatic annotations for all possible combinations of the algorithm's parameters. These assessments yield an array of confusion matrices, from which several metrics (accuracy, precision and negative predictive value) are calculated and used to select optimal parameter values.
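To make the SamPler procedure concrete, the following Python sketch evaluates every combination in a small parameter grid against a manually curated sample, builds one confusion matrix per combination and reports accuracy, precision and negative predictive value. It is not merlin's or SamPler's code: the two parameters (`threshold`, `alpha`), the toy scoring rule in `predict`, the `CuratedGene` fields and the tie-breaking order (accuracy, then precision, then NPV) are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class CuratedGene:
    """One manually curated gene from the random sample (hypothetical fields)."""
    score: float      # e.g. a similarity score reported by the automatic annotator
    frequency: float  # e.g. fraction of homologues supporting the annotation
    correct: bool     # curator's verdict on the automatic annotation

def predict(gene, threshold, alpha):
    """Toy annotation rule (assumption): accept if the weighted score reaches the threshold."""
    return alpha * gene.score + (1 - alpha) * gene.frequency >= threshold

def confusion(sample, threshold, alpha):
    """Confusion matrix (TP, FP, TN, FN) for one parameter combination."""
    tp = fp = tn = fn = 0
    for gene in sample:
        accepted = predict(gene, threshold, alpha)
        if accepted and gene.correct:
            tp += 1
        elif accepted:
            fp += 1
        elif not gene.correct:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Accuracy, precision and negative predictive value (0.0 when undefined)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    return accuracy, precision, npv

def grid_search(sample, thresholds, alphas):
    """Score every combination; rank by accuracy, then precision, then NPV."""
    table = {params: metrics(*confusion(sample, *params))
             for params in product(thresholds, alphas)}
    best = max(table, key=table.get)  # first combination with the best metric tuple
    return best, table[best], table

sample = [
    CuratedGene(0.9, 0.8, True),
    CuratedGene(0.7, 0.6, True),
    CuratedGene(0.4, 0.9, False),
    CuratedGene(0.2, 0.1, False),
]
best, best_metrics, table = grid_search(sample, [0.3, 0.5, 0.7], [0.25, 0.5, 0.75, 1.0])
print(best, best_metrics)  # (0.5, 1.0) (1.0, 1.0, 1.0)
```

The number of confusion matrices grows multiplicatively with each added parameter, so the sketch keeps the grid small; only the curated sample, not the whole genome, is needed to fill it.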

CONCLUSIONS

The potential of this methodology is demonstrated with four genome functional annotations performed in merlin, an in-house, user-friendly computational framework for genome-scale metabolic annotation and model reconstruction. To this end, SamPler was implemented as a new plugin for merlin.

Armean IM, Lilley KS, Trotter MWB, Pilkington NCV, Holden SB. Co-complex protein membership evaluation using Maximum Entropy on GO ontology and InterPro annotation. Bioinformatics 2019;34:1884-1892. [PMID: 29390084 PMCID: PMC5972588 DOI: 10.1093/bioinformatics/btx803] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/02/2017] [Accepted: 01/29/2018] [Indexed: 12/11/2022]

Rajakovich LJ, Pandelia ME, Mitchell AJ, Chang WC, Zhang B, Boal AK, Krebs C, Bollinger JM. A New Microbial Pathway for Organophosphonate Degradation Catalyzed by Two Previously Misannotated Non-Heme-Iron Oxygenases. Biochemistry 2019;58:1627-1647. [PMID: 30789718 PMCID: PMC6503667 DOI: 10.1021/acs.biochem.9b00044] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 01/28/2023]

Abstract

The assignment of biochemical functions to hypothetical proteins is challenged by functional diversification within many protein structural superfamilies. This diversification, which is particularly common for metalloenzymes, renders functional annotations that are founded solely on sequence and domain similarities unreliable and often erroneous. Definitive biochemical characterization to delineate functional subgroups within these superfamilies will aid in improving bioinformatic approaches for functional annotation. We describe here the structural and functional characterization of two non-heme-iron oxygenases, TmpA and TmpB, which are encoded by a genomically clustered pair of genes found in more than 350 species of bacteria. TmpA and TmpB are functional homologues of a pair of enzymes (PhnY and PhnZ) that degrade 2-aminoethylphosphonate but instead act on its naturally occurring, quaternary ammonium analogue, 2-(trimethylammonio)ethylphosphonate (TMAEP). TmpA, an iron(II)- and 2-(oxo)glutarate-dependent oxygenase misannotated as a γ-butyrobetaine (γbb) hydroxylase, shows no activity toward γbb but efficiently hydroxylates TMAEP. The product, (R)-1-hydroxy-2-(trimethylammonio)ethylphosphonate [(R)-OH-TMAEP], then serves as the substrate for the second enzyme, TmpB. In contrast to its purported phosphohydrolytic activity, TmpB is an HD-domain oxygenase that uses a mixed-valent diiron cofactor to enact oxidative cleavage of the C-P bond of its substrate, yielding glycine betaine and phosphate. The high specificities of TmpA and TmpB for their N-trimethylated substrates suggest that they have evolved specifically to degrade TMAEP, which was not previously known to be subject to microbial catabolism. This study thus adds to the growing list of known pathways through which microbes break down organophosphonates to harvest phosphorus, carbon, and nitrogen in nutrient-limited niches.

Affiliation(s)

Lauren J. Rajakovich Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
Maria-Eirini Pandelia Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: Department of Biochemistry, Brandeis University, Waltham, Massachusetts 02453, United States
Andrew J. Mitchell Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142
Wei-chen Chang Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States
Bo Zhang Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: REG Life Sciences, LLC, South San Francisco, California 94080
Amie K. Boal Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
Carsten Krebs Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
J. Martin Bollinger Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States

Reimand J, Isserlin R, Voisin V, Kucera M, Tannus-Lopes C, Rostamianfar A, Wadi L, Meyer M, Wong J, Xu C, Merico D, Bader GD. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nat Protoc 2019;14:482-517. [PMID: 30664679 PMCID: PMC6607905 DOI: 10.1038/s41596-018-0103-9] [Citation(s) in RCA: 964] [Impact Index Per Article: 192.8] [Indexed: 02/07/2023]

Hadarovich A, Anishchenko I, Tuzikov AV, Kundrotas PJ, Vakser IA. Gene ontology improves template selection in comparative protein docking. Proteins 2018;87:245-253. [PMID: 30520123 DOI: 10.1002/prot.25645] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/26/2018] [Revised: 10/21/2018] [Accepted: 11/29/2018] [Indexed: 02/06/2023]

Warwick Vesztrocy A, Dessimoz C, Redestig H. Prioritising candidate genes causing QTL using hierarchical orthologous groups. Bioinformatics 2018;34:i612-i619. [PMID: 30423067 PMCID: PMC6129274 DOI: 10.1093/bioinformatics/bty615] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]

Vidulin V, Šmuc T, Džeroski S, Supek F. The evolutionary signal in metagenome phyletic profiles predicts many gene functions. Microbiome 2018;6:129. [PMID: 29991352 PMCID: PMC6040064 DOI: 10.1186/s40168-018-0506-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/07/2017] [Accepted: 06/19/2018] [Indexed: 06/08/2023]

Abstract

BACKGROUND

The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner.

RESULTS

We evaluated whether the evolutionary signal contained in metagenome phyletic profiles (MPPs) is predictive of a broad array of gene functions. The MPPs are an encoding of environmental DNA sequencing data consisting of the relative abundances of gene families across metagenomes. We find that such MPPs can accurately predict 826 Gene Ontology functional categories, drawing on human gut microbiomes, ocean metagenomes, and DNA sequences from various other engineered and natural environments. Overall, in this task, the MPPs are highly accurate, and moreover they provide coverage for a set of Gene Ontology terms largely complementary to standard phylogenetic profiles derived from fully sequenced genomes. We also find that metagenomes approximated from taxon relative abundances obtained via 16S rRNA gene sequencing may provide surprisingly useful predictive models. Crucially, MPPs derived from different types of environments can infer distinct, non-overlapping sets of gene functions and therefore complement each other. Consistently, simulations on > 5000 metagenomes indicate that the amount of data is not in itself critical for maximizing predictive accuracy, whereas the diversity of sampled environments appears to be the critical factor for obtaining robust models.

CONCLUSIONS

In past work, metagenomics has provided invaluable insight into the ecology of various habitats, into the diversity of microbial life and also into human health and disease mechanisms. We propose that environmental DNA sequencing additionally constitutes a useful tool to predict the biological roles of genes, yielding inferences out of reach for existing comparative genomics approaches.

Hörtenhuber M, Toledo EM, Smedler E, Arenas E, Malmersjö S, Louhivuori L, Uhlén P. Mapping genes for calcium signaling and their associated human genetic disorders. Bioinformatics 2018;33:2547-2554. [PMID: 28430858 PMCID: PMC5870714 DOI: 10.1093/bioinformatics/btx225] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/02/2016] [Accepted: 04/18/2017] [Indexed: 01/21/2023]

Peng J, Li Q, Shang X. Investigations on factors influencing HPO-based semantic similarity calculation. J Biomed Semantics 2017;8:34. [PMID: 29297376 PMCID: PMC5763495 DOI: 10.1186/s13326-017-0144-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]

Yu G, Lu C, Wang J. NoGOA: predicting noisy GO annotations using evidences and sparse representation. BMC Bioinformatics 2017;18:350. [PMID: 28732468 PMCID: PMC5521088 DOI: 10.1186/s12859-017-1764-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/02/2017] [Accepted: 07/14/2017] [Indexed: 01/11/2023]

Abstract

BACKGROUND

Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to limited resources, only a small portion of annotations are manually checked by curators; the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that a considerable number of noisy (or incorrect) annotations remain. Given the wide application of annotations, identifying noisy annotations is an important but seldom studied open problem.

RESULTS

We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation to the gene-term association matrix to reduce the impact of noisy annotations, and uses the sparse representation coefficients to measure the semantic similarity between genes. Second, it preliminarily predicts noisy annotations of a gene based on aggregated votes from the gene's semantic neighbours. Next, NoGOA estimates the ratio of noisy annotations for each evidence code from direct annotations in GOA files archived at different times, then weights entries of the association matrix by the estimated ratios and propagates the weights to the ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and the aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction.
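The neighbourhood-vote step of NoGOA can be illustrated with a simplified Python sketch. As an assumption, it replaces NoGOA's sparse-representation coefficients with plain cosine similarity between genes' sets of GO terms and omits the evidence-code weighting entirely; the gene names, the `GO:A`-style term placeholders, `k` and `threshold` are hypothetical. Each annotation receives the similarity-weighted fraction of the gene's `k` most similar genes that carry the same term, and weakly supported annotations are flagged as candidate noise.

```python
import math

# Hypothetical toy data: gene -> set of GO term IDs (placeholders, not real terms).
annotations = {
    "g1": {"GO:A", "GO:B"},
    "g2": {"GO:A", "GO:B"},
    "g3": {"GO:A", "GO:B", "GO:X"},
    "g4": {"GO:C", "GO:D"},
}

def cosine(a, b):
    """Cosine similarity of two binary GO-term vectors stored as sets.
    Stand-in (an assumption) for NoGOA's sparse-representation similarity."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def neighbour_votes(annotations, k=2):
    """Similarity-weighted support for each (gene, term) from the k most similar genes."""
    votes = {}
    for gene, terms in annotations.items():
        others = [(cosine(terms, other_terms), other)
                  for other, other_terms in annotations.items() if other != gene]
        # Most similar first; ties broken by gene name for determinism.
        others.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbours = [(sim, other) for sim, other in others[:k] if sim > 0]
        total = sum(sim for sim, _ in neighbours)
        if total == 0:
            continue  # no similar genes: no evidence either way, so no vote
        for term in terms:
            support = sum(sim for sim, other in neighbours if term in annotations[other])
            votes[(gene, term)] = support / total
    return votes

def flag_noisy(annotations, k=2, threshold=0.5):
    """Return (gene, term) pairs whose neighbourhood support is below threshold."""
    votes = neighbour_votes(annotations, k)
    return sorted(pair for pair, vote in votes.items() if vote < threshold)

print(flag_noisy(annotations))  # [('g3', 'GO:X')]
```

Here `GO:X` on `g3` is flagged because neither of its two nearest neighbours carries it, while `g4` receives no vote at all since it shares no terms with any other gene.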

CONCLUSIONS

The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA .

Jun SR, Nookaew I, Hauser L, Gorin A. Assessment of genome annotation using gene function similarity within the gene neighborhood. BMC Bioinformatics 2017;18:345. [PMID: 28724412 PMCID: PMC5517811 DOI: 10.1186/s12859-017-1761-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/24/2017] [Accepted: 07/13/2017] [Indexed: 11/29/2022]

Zhang T, Coates BS, Wang Y, Wang Y, Bai S, Wang Z, He K. Down-regulation of aminopeptidase N and ABC transporter subfamily G transcripts in Cry1Ab and Cry1Ac resistant Asian corn borer, Ostrinia furnacalis (Lepidoptera: Crambidae). Int J Biol Sci 2017;13:835-851. [PMID: 28808417 PMCID: PMC5555102 DOI: 10.7150/ijbs.18868] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 12/22/2016] [Accepted: 03/16/2017] [Indexed: 12/20/2022]

Abstract

The Asian corn borer (ACB), Ostrinia furnacalis (Lepidoptera: Crambidae), is a highly destructive pest of cultivated maize throughout East Asia. Bacillus thuringiensis (Bt) crystalline protein (Cry) toxins cause mortality by a mechanism involving pore formation or signal transduction following toxin binding to receptors along the midgut lumen of susceptible insects, but this mechanism and the mutations therein that lead to resistance are not fully understood. In the current study, quantitative comparisons were made among midgut-expressed transcripts from a susceptible O. furnacalis strain (ACB-BtS) and laboratory-selected strains resistant to Cry1Ab (ACB-AbR) and Cry1Ac (ACB-AcR) toxins when feeding on a non-Bt diet. From a combined de novo transcriptome assembly of 83,370 transcripts, ORFs of ≥ 100 amino acids were predicted and annotated for 28,940 unique isoforms derived from 12,288 transcripts. Transcriptome-wide expression estimated from RNA-seq read depths predicted significant down-regulation of transcripts for previously known Bt resistance genes, aminopeptidase N1 (apn1) and apn3, as well as a putative ATP-binding cassette transporter group G (abcg) gene in both ACB-AbR and -AcR (log2[fold-change] ≥ 1.36; P < 0.0001). The transcripts most highly differentially regulated in both ACB-AbR and -AcR compared to ACB-BtS (log2[fold-change] ≥ 2.0; P < 0.0001) included up- and down-regulation of serine proteases, storage proteins and cytochrome P450 monooxygenases, as well as up-regulation of genes with predicted transport function. This study predicted the significant down-regulation of transcripts for previously known Bt resistance genes, aminopeptidase N1 (apn1) and apn3, as well as the abcg gene in both ACB-AbR and -AcR. These data are important for understanding systemic differences between Bt-resistant and susceptible genotypes.

Koç I, Caetano-Anollés G. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data. PLoS One 2017;12:e0176129. [PMID: 28467492 PMCID: PMC5414959 DOI: 10.1371/journal.pone.0176129] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/02/2016] [Accepted: 04/05/2017] [Indexed: 11/18/2022]

Abstract

The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate that catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses revealed an hourglass pattern of enzyme recruitment in the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in each of the three mappings, producing a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme-coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissecting the emergence of biocomplexity and life.

Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework. Sci Rep 2017;7:381. [PMID: 28336965 PMCID: PMC5428484 DOI: 10.1038/s41598-017-00465-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/08/2016] [Accepted: 02/28/2017] [Indexed: 11/21/2022]

Škunca N, Roberts RJ, Steffen M. Evaluating Computational Gene Ontology Annotations. Methods Mol Biol 2017;1446:97-109. [PMID: 27812938 DOI: 10.1007/978-1-4939-3743-1_8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 06/06/2023]

Garibay-Hernández A, Barkla BJ, Vera-Estrella R, Martinez A, Pantoja O. Membrane Proteomic Insights into the Physiology and Taxonomy of an Oleaginous Green Microalga. Plant Physiol 2017;173:390-416. [PMID: 27837088 PMCID: PMC5210721 DOI: 10.1104/pp.16.01240] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/08/2016] [Accepted: 11/03/2016] [Indexed: 05/22/2023]

Gaudet P, Škunca N, Hu JC, Dessimoz C. Primer on the Gene Ontology. Methods Mol Biol 2017;1446:25-37. [PMID: 27812933 DOI: 10.1007/978-1-4939-3743-1_3] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Indexed: 12/03/2022]

Holliday GL, Davidson R, Akiva E, Babbitt PC. Evaluating Functional Annotations of Enzymes Using the Gene Ontology. Methods Mol Biol 2017;1446:111-132. [PMID: 27812939 PMCID: PMC5837055 DOI: 10.1007/978-1-4939-3743-1_9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 03/17/2023]

Gaudet P, Dessimoz C. Gene Ontology: Pitfalls, Biases, and Remedies. Methods Mol Biol 2017;1446:189-205. [PMID: 27812944 DOI: 10.1007/978-1-4939-3743-1_14] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Indexed: 12/12/2022]

Pesquita C. Semantic Similarity in the Gene Ontology. Methods Mol Biol 2017;1446:161-173. [PMID: 27812942 DOI: 10.1007/978-1-4939-3743-1_12] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 12/03/2022]

Beissinger TM, Morota G. Medical Subject Heading (MeSH) annotations illuminate maize genetics and evolution. Plant Methods 2017;13:8. [PMID: 28250803 PMCID: PMC5324291 DOI: 10.1186/s13007-017-0159-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/02/2016] [Accepted: 02/18/2017] [Indexed: 05/22/2023]

Schwarz E, Izmailov R, Liò P, Meyer-Lindenberg A. Protein Interaction Networks Link Schizophrenia Risk Loci to Synaptic Function. Schizophr Bull 2016;42:1334-1342. [PMID: 27056717 PMCID: PMC5049524 DOI: 10.1093/schbul/sbw035] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/22/2022]

Lu C, Wang J, Zhang Z, Yang P, Yu G. NoisyGOA: Noisy GO annotations prediction using taxonomic and semantic similarity. Comput Biol Chem 2016;65:203-211. [PMID: 27670689 DOI: 10.1016/j.compbiolchem.2016.09.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/02/2016] [Accepted: 09/07/2016] [Indexed: 10/21/2022]

Falda M, Lavezzo E, Fontana P, Bianco L, Berselli M, Formentin E, Toppo S. Eliciting the Functional Taxonomy from protein annotations and taxa. Sci Rep 2016;6:31971. [PMID: 27534507 PMCID: PMC4989186 DOI: 10.1038/srep31971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/12/2016] [Accepted: 08/01/2016] [Indexed: 11/30/2022]

Vidulin V, Šmuc T, Supek F. Extensive complementarity between gene function prediction methods. Bioinformatics 2016;32:3645-3653. [PMID: 27522084 DOI: 10.1093/bioinformatics/btw532] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/26/2016] [Revised: 07/11/2016] [Accepted: 08/09/2016] [Indexed: 12/22/2022]

Abstract

MOTIVATION

The number of sequenced genomes rises steadily, but we still lack knowledge of the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions.

RESULTS

Our pipeline amalgamates 5 133 543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. Thus, inferences made by diverse genomic AFP methods display a striking complementarity, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on the single most confident prediction per gene/function, rather than enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known Escherichia coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods or by 11 additional bits/gene upon integration, thereby providing a highly ranked predictor on the Critical Assessment of Function Annotation 2 community benchmark. The availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit of integrating them.
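The integration strategy described in this abstract (keep the single most confident prediction per gene/function instead of requiring agreement) can be sketched in Python as follows. The method names, gene identifiers, `GO:A`-style term placeholders and scores are hypothetical, and the consensus function is included only as a contrast; neither function is the authors' pipeline.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]           # (gene, GO term)
Predictions = Dict[Pair, float]  # confidence score per pair

def integrate_max(method_predictions: Dict[str, Predictions]) -> Dict[Pair, Tuple[float, str]]:
    """Keep, for every (gene, term) pair, the single most confident prediction and its source."""
    best: Dict[Pair, Tuple[float, str]] = {}
    for method, preds in sorted(method_predictions.items()):
        for pair, score in preds.items():
            if pair not in best or score > best[pair][0]:
                best[pair] = (score, method)
    return best

def integrate_consensus(method_predictions: Dict[str, Predictions], min_methods: int = 2) -> Set[Pair]:
    """Contrast only: keep pairs predicted by at least `min_methods` methods."""
    counts = Counter(pair for preds in method_predictions.values() for pair in preds)
    return {pair for pair, n in counts.items() if n >= min_methods}

# Hypothetical scores from two hypothetical methods.
preds = {
    "phyletic_profiles": {("gene1", "GO:A"): 0.9, ("gene2", "GO:B"): 0.3},
    "gene_neighbourhood": {("gene2", "GO:B"): 0.7, ("gene3", "GO:C"): 0.6},
}
print(integrate_max(preds))
print(integrate_consensus(preds))  # {('gene2', 'GO:B')}
```

With these toy inputs, the max-confidence rule retains all three gene/term pairs, whereas a two-method consensus keeps only the one pair both methods predicted, which mirrors the abstract's point that complementary methods rarely overlap.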

AVAILABILITY AND IMPLEMENTATION

The individual and integrated GO predictions for the complete set of genes are available from http://gorbi.irb.hr/

CONTACT

fran.supek@irb.hr

SUPPLEMENTARY INFORMATION

Supplementary materials are available at Bioinformatics online.

Fu G, Wang J, Yang B, Yu G. NegGOA: negative GO annotations selection using ontology structure. Bioinformatics 2016;32:2996-3004. [PMID: 27318205 DOI: 10.1093/bioinformatics/btw366] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 02/25/2016] [Accepted: 06/01/2016] [Indexed: 11/14/2022]

Abstract

MOTIVATION

Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples, that is, proteins known not to carry out a particular function. However, Gene Ontology (GO) almost always provides only the knowledge that proteins carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, negative examples of a protein can greatly help distinguish its true positive examples within such a large candidate GO space.

RESULTS

In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, the available annotations and the potential for additional annotations of a protein to choose negative examples for that protein. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives than they do. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in yeast, human, mouse and fly. NegGOA also achieves higher accuracy than the compared algorithms across various evaluation metrics. In addition, NegGOA is less affected by incomplete protein annotations than the compared methods.

AVAILABILITY AND IMPLEMENTATION

The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa

CONTACT

gxyu@swu.edu.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Culminskaya I, Kulminski AM, Yashin AI. Coordinated Action of Biological Processes during Embryogenesis Can Cause Genome-Wide Linkage Disequilibrium in the Human Genome and Influence Age-Related Phenotypes. Ann Gerontol Geriatr Res 2016;3:1035. [PMID: 28357417 PMCID: PMC5367637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]

Sangrador-Vegas A, Mitchell AL, Chang HY, Yong SY, Finn RD. GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database (Oxford) 2016;2016:baw027. [PMID: 26994912 PMCID: PMC4799721 DOI: 10.1093/database/baw027] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/06/2015] [Accepted: 02/19/2016] [Indexed: 11/17/2022]

Peng J, Wang T, Wang J, Wang Y, Chen J. Extending gene ontology with gene association networks. Bioinformatics 2015;32:1185-94. [PMID: 26644414 DOI: 10.1093/bioinformatics/btv712] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 06/17/2015] [Accepted: 11/26/2015] [Indexed: 01/01/2023]

Luo X, Ming Z, You Z, Li S, Xia Y, Leung H. Improving network topology-based protein interactome mapping via collaborative filtering. Knowl Based Syst 2015. [DOI: 10.1016/j.knosys.2015.10.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Indexed: 01/11/2023]

Di Lena P, Domeniconi G, Margara L, Moro G. GOTA: GO term annotation of biomedical literature. BMC Bioinformatics 2015;16:346. [PMID: 26511083 PMCID: PMC4625458 DOI: 10.1186/s12859-015-0777-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/16/2015] [Accepted: 10/13/2015] [Indexed: 12/12/2022]