Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Massjouni N, Rivera CG, Murali TM. VIRGO: computational prediction of gene functions. Nucleic Acids Res 2006;34:W340-4. [PMID: 16845022 PMCID: PMC1538839 DOI: 10.1093/nar/gkl225] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

For:	Massjouni N, Rivera CG, Murali TM. VIRGO: computational prediction of gene functions. Nucleic Acids Res 2006;34:W340-4. [PMID: 16845022 PMCID: PMC1538839 DOI: 10.1093/nar/gkl225] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

Number

Cited by Other Article(s)

Teso S, Masera L, Diligenti M, Passerini A. Combining learning and constraints for genome-wide protein annotation. BMC Bioinformatics 2019;20:338. [PMID: 31208327 PMCID: PMC6580517 DOI: 10.1186/s12859-019-2875-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2018] [Accepted: 05/03/2019] [Indexed: 11/28/2022] Open

Ashu EE, Xu J, Yuan ZC. Bacteria in Cancer Therapeutics: A Framework for Effective Therapeutic Bacterial Screening and Identification. J Cancer 2019;10:1781-1793. [PMID: 31205534 PMCID: PMC6547982 DOI: 10.7150/jca.31699] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2018] [Accepted: 02/21/2019] [Indexed: 12/11/2022] Open

Havugimana PC, Hu P, Emili A. Protein complexes, big data, machine learning and integrative proteomics: lessons learned over a decade of systematic analysis of protein interaction networks. Expert Rev Proteomics 2017;14:845-855. [PMID: 28918672 DOI: 10.1080/14789450.2017.1374179] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]

Zhang SB, Lai JH. Semantic similarity measurement between gene ontology terms based on exclusively inherited shared information. Gene 2015;558:108-17. [DOI: 10.1016/j.gene.2014.12.062] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2014] [Revised: 12/15/2014] [Accepted: 12/24/2014] [Indexed: 11/25/2022]

Hu P, Jiang H, Emili A. Incorporating Correlations among Gene Ontology Terms into Predicting Protein Functions. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022] Open

Wang Z, Cao R, Cheng J. Three-level prediction of protein function by combining profile-sequence search, profile-profile search, and domain co-occurrence networks. BMC Bioinformatics 2013;14 Suppl 3:S3. [PMID: 23514381 PMCID: PMC3584933 DOI: 10.1186/1471-2105-14-s3-s3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open

Abstract

Predicting protein function from sequence is useful for biochemical experiment design, mutagenesis analysis, protein engineering, protein design, biological pathway analysis, drug design, disease diagnosis, and genome annotation as a vast number of protein sequences with unknown function are routinely being generated by DNA, RNA and protein sequencing in the genomic era. However, despite significant progresses in the last several years, the accuracy of protein function prediction still needs to be improved in order to be used effectively in practice, particularly when little or no homology exists between a target protein and proteins with annotated function. Here, we developed a method that integrated profile-sequence alignment, profile-profile alignment, and Domain Co-Occurrence Networks (DCN) to predict protein function at different levels of complexity, ranging from obvious homology, to remote homology, to no homology. We tested the method blindingly in the 2011 Critical Assessment of Function Annotation (CAFA). Our experiments demonstrated that our three-level prediction method effectively increased the recall of function prediction while maintaining a reasonable precision. Particularly, our method can predict function terms defined by the Gene Ontology more accurately than three standard baseline methods in most situations, handle multi-domain proteins naturally, and make ab initio function prediction when no homology exists. These results show that our approach can combine complementary strengths of most widely used BLAST-based function prediction methods, rarely used in function prediction but more sensitive profile-profile comparison-based homology detection methods, and non-homology-based domain co-occurrence networks, to effectively extend the power of function prediction from high homology, to low homology, to no homology (ab initio cases).

Collapse

Greene CS, Troyanskaya OG. Accurate evaluation and analysis of functional genomics data and methods. Ann N Y Acad Sci 2012;1260:95-100. [PMID: 22268703 DOI: 10.1111/j.1749-6632.2011.06383.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Hallinan JS, James K, Wipat A. Network approaches to the functional analysis of microbial proteins. Adv Microb Physiol 2011;59:101-33. [PMID: 22114841 DOI: 10.1016/b978-0-12-387661-4.00005-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Janga SC, Díaz-Mejía JJ, Moreno-Hagelsieb G. Network-based function prediction and interactomics: the case for metabolic enzymes. Metab Eng 2010;13:1-10. [PMID: 20654726 DOI: 10.1016/j.ymben.2010.07.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2010] [Revised: 07/15/2010] [Accepted: 07/16/2010] [Indexed: 12/19/2022]

Abstract

As sequencing technologies increase in power, determining the functions of unknown proteins encoded by the DNA sequences so produced becomes a major challenge. Functional annotation is commonly done on the basis of amino-acid sequence similarity alone. Long after sequence similarity becomes undetectable by pair-wise comparison, profile-based identification of homologs can often succeed due to the conservation of position-specific patterns, important for a protein's three dimensional folding and function. Nevertheless, prediction of protein function from homology-driven approaches is not without problems. Homologous proteins might evolve different functions and the power of homology detection has already started to reach its maximum. Computational methods for inferring protein function, which exploit the context of a protein in cellular networks, have come to be built on top of homology-based approaches. These network-based functional inference techniques provide both a first hand hint into a proteins' functional role and offer complementary insights to traditional methods for understanding the function of uncharacterized proteins. Most recent network-based approaches aim to integrate diverse kinds of functional interactions to boost both coverage and confidence level. These techniques not only promise to solve the moonlighting aspect of proteins by annotating proteins with multiple functions, but also increase our understanding on the interplay between different functional classes in a cell. In this article we review the state of the art in network-based function prediction and describe some of the underlying difficulties and successes. Given the volume of high-throughput data that is being reported the time is ripe to employ these network-based approaches, which can be used to unravel the functions of the uncharacterized proteins accumulating in the genomic databases.

Collapse

Beaver JE, Tasan M, Gibbons FD, Tian W, Hughes TR, Roth FP. FuncBase: a resource for quantitative gene function annotation. Bioinformatics 2010;26:1806-7. [PMID: 20495000 PMCID: PMC2894510 DOI: 10.1093/bioinformatics/btq265] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2010] [Revised: 04/17/2010] [Accepted: 05/16/2010] [Indexed: 11/14/2022] Open

Ko S, Lee H. Integrative approaches to the prediction of protein functions based on the feature selection. BMC Bioinformatics 2009;10:455. [PMID: 20043848 PMCID: PMC2813249 DOI: 10.1186/1471-2105-10-455] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2009] [Accepted: 12/31/2009] [Indexed: 01/30/2023] Open

Abstract

Background

Protein function prediction has been one of the most important issues in functional genomics. With the current availability of various genomic data sets, many researchers have attempted to develop integration models that combine all available genomic data for protein function prediction. These efforts have resulted in the improvement of prediction quality and the extension of prediction coverage. However, it has also been observed that integrating more data sources does not always increase the prediction quality. Therefore, selecting data sources that highly contribute to the protein function prediction has become an important issue.

Results

We present systematic feature selection methods that assess the contribution of genome-wide data sets to predict protein functions and then investigate the relationship between genomic data sources and protein functions. In this study, we use ten different genomic data sources in Mus musculus, including: protein-domains, protein-protein interactions, gene expressions, phenotype ontology, phylogenetic profiles and disease data sources to predict protein functions that are labelled with Gene Ontology (GO) terms. We then apply two approaches to feature selection: exhaustive search feature selection using a kernel based logistic regression (KLR), and a kernel based L1-norm regularized logistic regression (KL1LR). In the first approach, we exhaustively measure the contribution of each data set for each function based on its prediction quality. In the second approach, we use the estimated coefficients of features as measures of contribution of data sources. Our results show that the proposed methods improve the prediction quality compared to the full integration of all data sources and other filter-based feature selection methods. We also show that contributing data sources can differ depending on the protein function. Furthermore, we observe that highly contributing data sets can be similar among a group of protein functions that have the same parent in the GO hierarchy.

Conclusions

In contrast to previous integration methods, our approaches not only increase the prediction quality but also gather information about highly contributing data sources for each protein function. This information can help researchers collect relevant data sources for annotating protein functions.

Collapse

Malone BM, Perkins AD, Bridges SM. Integrating phenotype and gene expression data for predicting gene function. BMC Bioinformatics 2009;10 Suppl 11:S20. [PMID: 19811686 PMCID: PMC3226192 DOI: 10.1186/1471-2105-10-s11-s20] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/24/2023] Open

Li X, Chen H, Li J, Zhang Z. Gene function prediction with gene interaction networks: a context graph kernel approach. ACTA ACUST UNITED AC 2009;14:119-28. [PMID: 19789115 DOI: 10.1109/titb.2009.2033116] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Rentzsch R, Orengo CA. Protein function prediction--the power of multiplicity. Trends Biotechnol 2009;27:210-9. [PMID: 19251332 DOI: 10.1016/j.tibtech.2009.01.002] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2008] [Revised: 01/21/2009] [Accepted: 01/23/2009] [Indexed: 01/07/2023]

Forslund K, Sonnhammer ELL. Predicting protein function from domain content. ACTA ACUST UNITED AC 2008;24:1681-7. [PMID: 18591194 DOI: 10.1093/bioinformatics/btn312] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G, Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D, Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D, Hughes TR, Roth FP. A critical assessment of Mus musculus gene function prediction using integrated genomic evidence. Genome Biol 2008;9 Suppl 1:S2. [PMID: 18613946 PMCID: PMC2447536 DOI: 10.1186/gb-2008-9-s1-s2] [Citation(s) in RCA: 197] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open

Zhao XM, Chen L, Aihara K. Protein function prediction with high-throughput data. Amino Acids 2008;35:517-30. [PMID: 18427717 DOI: 10.1007/s00726-008-0077-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2008] [Accepted: 03/13/2008] [Indexed: 12/12/2022]

Linghu B, Snitkin ES, Holloway DT, Gustafson AM, Xia Y, DeLisi C. High-precision high-coverage functional inference from integrated data sources. BMC Bioinformatics 2008;9:119. [PMID: 18298847 PMCID: PMC2292694 DOI: 10.1186/1471-2105-9-119] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2007] [Accepted: 02/25/2008] [Indexed: 11/15/2022] Open

Abstract

Background

Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation (1) construction of a high-coverage and reliable FLN via machine learning techniques (2) development of a decision rule for the constructed FLN to optimize functional annotation.

Results

We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods, Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network, all combine heterogeneous data to produce reliable and high-coverage FLNs, in which the linkage weight more accurately estimates functional coupling of linked proteins than use individual data sources alone. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverages. In particular at low coverage all rules evaluated perform comparably. At coverage above approximately 50%, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas for other methods, precision ranges from a high of slightly more than 30%, down to 3%. In addition, a scoring scheme to estimate the precisions of individual predictions is also provided. Finally, tests of the robustness of the framework indicate that our framework can be successfully applied to less studied organisms.

Conclusion

We provide a general two-step function-annotation framework, and show that high coverage, high precision annotations can be achieved by constructing a high-coverage and reliable FLN via data integration followed by applying a maximum weight decision rule.

Collapse

Li Y, Guo Z, Ma W, Yang D, Wang D, Zhang M, Zhu J, Zhong G, Li Y, Yao C, Wang J. Finding finer functions for partially characterized proteins by protein-protein interaction networks. CHINESE SCIENCE BULLETIN-CHINESE 2007. [DOI: 10.1007/s11434-008-0016-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Frishman D. Protein annotation at genomic scale: the current status. Chem Rev 2007;107:3448-66. [PMID: 17658902 DOI: 10.1021/cr068303k] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

Murali TM, Wu CJ, Kasif S. The art of gene function prediction. Nat Biotechnol 2007;24:1474-5; author reply 1475-6. [PMID: 17160037 DOI: 10.1038/nbt1206-1474] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]