Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Deng M, Chen T, Sun F. An integrated probabilistic model for functional prediction of proteins. J Comput Biol 2004;11:463-75. [PMID: 15285902 DOI: 10.1089/1066527041410346] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.6] [Indexed: 11/12/2022] Open

Cited by Other Article(s)

Koo HJ, Pan W. Are trait-associated genes clustered together in a gene network? Genet Epidemiol 2024. [PMID: 38472164 DOI: 10.1002/gepi.22557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 01/25/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024]

Devkota P, Mohanty SD, Manda P. A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature. BioData Min 2022;15:22. [PMID: 36171616 PMCID: PMC9516808 DOI: 10.1186/s13040-022-00310-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Accepted: 09/17/2022] [Indexed: 11/27/2022] Open

James K, Alsobhe A, Cockell SJ, Wipat A, Pocock M. Integration of probabilistic functional networks without an external Gold Standard. BMC Bioinformatics 2022;23:302. [PMID: 35879662 PMCID: PMC9316706 DOI: 10.1186/s12859-022-04834-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 07/11/2022] [Indexed: 11/10/2022] Open

Mancuso CA, Bills PS, Krum D, Newsted J, Liu R, Krishnan A. GenePlexus: a web-server for gene discovery using network-based machine learning. Nucleic Acids Res 2022;50:W358-W366. [PMID: 35580053 PMCID: PMC9252732 DOI: 10.1093/nar/gkac335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/08/2022] [Revised: 04/13/2022] [Accepted: 04/30/2022] [Indexed: 11/28/2022] Open

Law JN, Akers K, Tasnina N, Santina CMD, Deutsch S, Kshirsagar M, Klein-Seetharaman J, Crovella M, Rajagopalan P, Kasif S, Murali TM. Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2. Gigascience 2021;10:giab082. [PMID: 34966926 PMCID: PMC8716363 DOI: 10.1093/gigascience/giab082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/30/2021] [Revised: 09/21/2021] [Accepted: 11/28/2021] [Indexed: 01/02/2023] Open

Vu TTD, Jung J. Protein function prediction with gene ontology: from traditional to deep learning models. PeerJ 2021;9:e12019. [PMID: 34513334 PMCID: PMC8395570 DOI: 10.7717/peerj.12019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/13/2021] [Accepted: 07/29/2021] [Indexed: 11/25/2022] Open

Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020;36:3457-3465. [PMID: 32129827 PMCID: PMC7267831 DOI: 10.1093/bioinformatics/btaa150] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 12/22/2022] Open

Abstract

Background

Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem.

Results

In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows.

Availability and implementation

The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.

Contact

arjun@msu.edu

Supplementary information

Supplementary data are available at Bioinformatics online.
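
As a rough illustration of the label propagation baseline named in the abstract above, the following sketch clamps the scores of genes with known labels and repeatedly sets every other gene's score to the mean of its neighbors' scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `propagate_labels`, the toy network and the gene names g1 to g5 are all hypothetical.

```python
def propagate_labels(adjacency, known_labels, iterations=100):
    """Spread known labels over an undirected network (illustrative only).

    adjacency: dict mapping each gene to the list of its neighbors.
    known_labels: dict mapping labeled genes to 1.0 (positive) or 0.0 (negative).
    Returns a dict of scores in [0, 1] for every gene.
    """
    # Unlabeled genes start at the neutral value 0.5.
    scores = {gene: known_labels.get(gene, 0.5) for gene in adjacency}
    for _ in range(iterations):
        updated = {}
        for gene, neighbors in adjacency.items():
            if gene in known_labels:
                # Known labels are clamped and never change.
                updated[gene] = known_labels[gene]
            elif neighbors:
                # An unlabeled gene takes the mean score of its neighbors.
                updated[gene] = sum(scores[n] for n in neighbors) / len(neighbors)
            else:
                # Isolated genes keep their current score.
                updated[gene] = scores[gene]
        scores = updated
    return scores


# Hypothetical toy network: a chain g1 - g2 - g3 - g4 - g5.
toy_network = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3", "g5"],
    "g5": ["g4"],
}
toy_scores = propagate_labels(toy_network, {"g1": 1.0, "g5": 0.0})
# Genes closer to the positive gene g1 end up with higher scores:
# g2 -> 0.75, g3 -> 0.5, g4 -> 0.25
```

The supervised alternative discussed in the abstract would instead treat each gene's row of network connections as a feature vector for a classifier; that variant is not shown here.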

Vascon S, Frasca M, Tripodi R, Valentini G, Pelillo M. Protein function prediction as a graph-transduction game. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2018.04.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/17/2022]

Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. BIOINFORMATICS (OXFORD, ENGLAND) 2020;36:3457-3465. [PMID: 32129827 DOI: 10.1101/721423] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 05/26/2023]

Frasca M, Bianchi NC. Multitask Protein Function Prediction through Task Dissimilarity. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019;16:1550-1560. [PMID: 28328509 DOI: 10.1109/tcbb.2017.2684127] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/06/2023]

Yunes JM, Babbitt PC. Effusion: prediction of protein function from sequence similarity networks. Bioinformatics 2019;35:442-451. [PMID: 30084920 PMCID: PMC6361244 DOI: 10.1093/bioinformatics/bty672] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/25/2018] [Revised: 07/24/2018] [Accepted: 07/30/2018] [Indexed: 12/26/2022] Open

Peng W, Li M, Chen L, Wang L. Predicting Protein Functions by Using Unbalanced Random Walk Algorithm on Three Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017;14:360-369. [PMID: 28368814 DOI: 10.1109/tcbb.2015.2394314] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 06/07/2023]

Meng J, Wekesa JS, Shi GL, Luan YS. Protein function prediction based on data fusion and functional interrelationship. Math Biosci 2016;274:25-32. [DOI: 10.1016/j.mbs.2016.02.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/22/2015] [Revised: 01/08/2016] [Accepted: 02/01/2016] [Indexed: 10/22/2022]

Ma C, Chen Y, Wilkins D, Chen X, Zhang J. An unsupervised learning approach to find ovarian cancer genes through integration of biological data. BMC Genomics 2015;16 Suppl 9:S3. [PMID: 26328548 PMCID: PMC4547402 DOI: 10.1186/1471-2164-16-s9-s3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open

Frasca M. Automated gene function prediction through gene multifunctionality in biological networks. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.04.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]

Frasca M, Bassis S, Valentini G. Learning node labels with multi-category Hopfield networks. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-1965-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/02/2023]

The clustering of functionally related genes contributes to CNV-mediated disease. Genome Res 2015;25:802-13. [PMID: 25887030 PMCID: PMC4448677 DOI: 10.1101/gr.184325.114] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 09/12/2014] [Accepted: 04/13/2015] [Indexed: 12/20/2022]

Tiwari AK, Srivastava R. A survey of computational intelligence techniques in protein function prediction. INTERNATIONAL JOURNAL OF PROTEOMICS 2014;2014:845479. [PMID: 25574395 PMCID: PMC4276698 DOI: 10.1155/2014/845479] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/10/2014] [Revised: 10/31/2014] [Accepted: 11/07/2014] [Indexed: 02/08/2023]

Chen B, Li M, Wang J, Wu FX. Disease gene identification by using graph kernels and Markov random fields. SCIENCE CHINA. LIFE SCIENCES 2014;57:1054-1063. [PMID: 25326067 DOI: 10.1007/s11427-014-4745-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 05/21/2014] [Accepted: 07/14/2014] [Indexed: 01/05/2023]

Chen B, Wang J, Li M, Wu FX. Identifying disease genes by integrating multiple data sources. BMC Med Genomics 2014;7 Suppl 2:S2. [PMID: 25350511 PMCID: PMC4243092 DOI: 10.1186/1755-8794-7-s2-s2] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/20/2022] Open

Mulder NJ, Akinola RO, Mazandu GK, Rapanoel H. Using biological networks to improve our understanding of infectious diseases. Comput Struct Biotechnol J 2014;11:1-10. [PMID: 25379138 PMCID: PMC4212278 DOI: 10.1016/j.csbj.2014.08.006] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/17/2022] Open

Dellinger AE, Nixon AB, Pang H. Integrative Pathway Analysis Using Graph-Based Learning with Applications to TCGA Colon and Ovarian Data. Cancer Inform 2014;13:1-9. [PMID: 25125969 PMCID: PMC4125381 DOI: 10.4137/cin.s13634] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/12/2013] [Revised: 03/17/2014] [Accepted: 03/18/2014] [Indexed: 12/15/2022] Open

Valentini G, Paccanaro A, Caniza H, Romero AE, Re M. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artif Intell Med 2014;61:63-78. [PMID: 24726035 PMCID: PMC4070077 DOI: 10.1016/j.artmed.2014.03.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 09/11/2013] [Revised: 03/05/2014] [Accepted: 03/10/2014] [Indexed: 02/07/2023]

Abstract

OBJECTIVE

In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization.

MATERIALS AND METHODS

We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions.

RESULTS

The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation.

CONCLUSIONS

Network integration is necessary to boost the performances of gene prioritization methods. Moreover the methods based on kernelized score functions can further enhance disease gene ranking results, by adopting both local and global learning strategies, able to exploit the overall topology of the network.
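
One of the prioritization algorithms named in the abstract above, random walk with restart, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' code: the function name `random_walk_with_restart`, the restart probability of 0.3, the toy network and the node names are hypothetical, and every node is assumed to have at least one neighbor.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Rank genes by their steady-state visiting probability (illustrative only).

    adjacency: dict mapping each gene to the list of its neighbors (undirected).
    seeds: set of known disease genes where the walker restarts.
    restart: probability of jumping back to a seed at each step.
    """
    nodes = list(adjacency)
    # The walker starts, and restarts, uniformly over the seed genes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                continue  # assumption: no isolated nodes, otherwise mass is lost
            # The walker moves to each neighbor with equal probability.
            share = (1.0 - restart) * prob[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical toy network: a chain s - a - b - c with a single seed gene s.
rwr_network = {"s": ["a"], "a": ["s", "b"], "b": ["a", "c"], "c": ["b"]}
rwr_scores = random_walk_with_restart(rwr_network, {"s"})
# Candidate genes are ranked by score; here a ranks above b, which ranks above c.
```

Setting `restart` to 0 would give the plain random walk, whose scores no longer depend on the seed genes once it has converged, which is why the restart term matters for prioritization.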

Valentini G. Hierarchical ensemble methods for protein function prediction. ISRN BIOINFORMATICS 2014;2014:901419. [PMID: 25937954 PMCID: PMC4393075 DOI: 10.1155/2014/901419] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 02/02/2014] [Accepted: 02/25/2014] [Indexed: 12/11/2022]

Kuppuswamy U, Ananthasubramanian S, Wang Y, Balakrishnan N, Ganapathiraju MK. Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions. Algorithms Mol Biol 2014;9:10. [PMID: 24708602 PMCID: PMC4124845 DOI: 10.1186/1748-7188-9-10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/06/2013] [Accepted: 03/11/2014] [Indexed: 01/30/2023] Open

Abstract

Background

The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of the action of these genes. However, GWAS studies often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown.

Results

We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is designed based on the fact that two proteins that interact biophysically would be in physical proximity of each other, would possess complementary molecular function, and play a role in related biological processes. Predicted GO terms were ranked according to their relative association scores and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% for localization and functions respectively of proteins were determined at a threshold of ~30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k-nearest neighbor classifier confirmed that our results compared favorably.

Conclusions

This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in GWAS studies to be associated with diseases, which are of translational interest.
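
The evaluation described in the abstract above (precision, recall and their harmonic mean at a rank threshold) can be illustrated as follows. This is a minimal sketch, not the authors' evaluation code: the function name `precision_recall_f_at_k` and the placeholder term identifiers T1 to T5 are hypothetical.

```python
def precision_recall_f_at_k(ranked_terms, true_terms, k):
    """Score the top-k predicted terms against the known annotations.

    ranked_terms: predicted terms, best first.
    true_terms: set of terms the protein is actually annotated with.
    Returns (precision, recall, F-score) at threshold k.
    """
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in true_terms)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    # F-score is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example: four ranked predictions, three true annotations.
example_ranking = ["T1", "T2", "T3", "T4"]
example_truth = {"T1", "T3", "T5"}
example_p, example_r, example_f = precision_recall_f_at_k(example_ranking, example_truth, k=2)
# One of the top two predictions is correct:
# precision = 1/2, recall = 1/3, F-score = 0.4
```

Sweeping `k` over a range of thresholds and plotting the resulting values would give curves of the kind the abstract describes.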

Peng W, Wang J, Cai J, Chen L, Li M, Wu FX. Improving protein function prediction using domain and protein complexes in PPI networks. BMC SYSTEMS BIOLOGY 2014;8:35. [PMID: 24655481 PMCID: PMC3994332 DOI: 10.1186/1752-0509-8-35] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 09/06/2012] [Accepted: 03/14/2014] [Indexed: 01/25/2023]

Musso G, Tasan M, Mosimann C, Beaver JE, Plovie E, Carr LA, Chua HN, Dunham J, Zuberi K, Rodriguez H, Morris Q, Zon L, Roth FP, MacRae CA. Novel cardiovascular gene functions revealed via systematic phenotype prediction in zebrafish. Development 2014;141:224-35. [PMID: 24346703 DOI: 10.1242/dev.099796] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]

Hayashida M, Kamada M, Song J, Akutsu T. Prediction of protein-RNA residue-base contacts using two-dimensional conditional random field with the lasso. BMC SYSTEMS BIOLOGY 2013;7 Suppl 2:S15. [PMID: 24564966 PMCID: PMC3866258 DOI: 10.1186/1752-0509-7-s2-s15] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]

Abstract

Background

To uncover molecular functions and networks in biological cellular systems, it is important to dissect interactions between proteins and RNAs. Many studies have been performed to investigate and analyze interactions between protein amino acid residues and RNA bases. In terms of interactions between residues in proteins, it is generally accepted that an amino acid residue at interacting sites has coevolved together with the partner residue in order to keep the interaction between residues in proteins. Based on this hypothesis, in our previous study to identify residue-residue contact pairs in interacting proteins, we made calculations of mutual information (MI) between amino acid residues from some multiple sequence alignment of homologous proteins, and combined it with a discriminative random field (DRF) approach, which is a special type of conditional random fields (CRFs) and has been proved useful for the purpose of extracting distinguishing areas from a photograph in the image processing field. Recently, the evolutionary correlation of interactions between residues and DNA bases has also been found in certain transcription factors and the DNA-binding sites.

Results

In this paper, we employ more generic two-dimensional CRFs than such DRFs to predict interactions between protein amino acid residues and RNA bases. In addition, we introduce labels representing kinds of amino acids and bases as local features of a CRF. Furthermore, we examine the utility of L₁-norm regularization (lasso) for the CRF. For evaluation of our method, we use residue-base interactions between several Pfam domains and Rfam entries, conduct cross-validation, and calculate the average AUC (Area under ROC Curve) score. The results suggest that our CRF-based method using mutual information and labels with the lasso is useful for further improving the performance, especially provided that the features of CRF are successfully reduced by the lasso approach.

Conclusions

We propose simple and generic two-dimensional CRF models using labels and mutual information with the lasso. Use of the CRF-based method in combination with the lasso is particularly useful for predicting the residue-base contacts in protein-RNA interactions.
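
The mutual information feature mentioned in the abstract above measures how strongly two alignment columns covary. The following is a minimal sketch of that quantity for a pair of columns, assuming plain frequency estimates with no pseudocounts or gap handling; the function name `mutual_information` and the toy columns are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(column_x, column_y):
    """Mutual information (in bits) between two aligned columns.

    column_x, column_y: equal-length sequences of symbols, one per aligned
    sequence (for example, a residue column and a base column).
    """
    n = len(column_x)
    count_x = Counter(column_x)
    count_y = Counter(column_y)
    count_xy = Counter(zip(column_x, column_y))
    mi = 0.0
    for (x, y), joint_count in count_xy.items():
        p_xy = joint_count / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        # Each observed pair contributes p(x, y) * log2(p(x, y) / (p(x) p(y))).
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


# Hypothetical columns: the first pair covaries perfectly, the second not at all.
mi_covarying = mutual_information("KKRR", "AAGG")    # 1 bit
mi_independent = mutual_information("KKRR", "AGAG")  # 0 bits
```

A high value suggests the two positions may have coevolved, which is the signal the abstract proposes to feed into the CRF as a feature.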

Frasca M, Bertoni A, Re M, Valentini G. A neural network algorithm for semi-supervised node label learning from unbalanced data. Neural Netw 2013;43:84-98. [DOI: 10.1016/j.neunet.2013.01.021] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 03/23/2012] [Revised: 01/28/2013] [Accepted: 01/29/2013] [Indexed: 01/03/2023]

Lee J, Lee J. Hidden information revealed by optimal community structure from a protein-complex bipartite network improves protein function prediction. PLoS One 2013;8:e60372. [PMID: 23577106 PMCID: PMC3618231 DOI: 10.1371/journal.pone.0060372] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 11/14/2012] [Accepted: 02/25/2013] [Indexed: 11/18/2022] Open

Abstract

The task of extracting the maximal amount of information from a biological network has drawn much attention from researchers, for example, predicting the function of a protein from a protein-protein interaction (PPI) network. It is well known that biological networks consist of modules/communities, a set of nodes that are more densely inter-connected among themselves than with the rest of the network. However, practical applications of utilizing the community information have been rather limited. For protein function prediction on a network, it has been shown that none of the existing community-based protein function prediction methods outperform a simple neighbor-based method. Recently, we have shown that proper utilization of a highly optimal modularity community structure for protein function prediction can outperform neighbor-assisted methods. In this study, we propose two function prediction approaches on bipartite networks that consider the community structure information as well as the neighbor information from the network: 1) a simple screening method and 2) a random forest based method. We demonstrate that our community-assisted methods outperform neighbor-assisted methods and the random forest method yields the best performance. In addition, we show that using the optimal community structure information is essential for more accurate function prediction for the protein-complex bipartite network of Saccharomyces cerevisiae. Community detection can be carried out either using a modified modularity for dealing with the original bipartite network or first projecting the network into a single-mode network (i.e., PPI network) and then applying community detection to the reduced network. We find that the projection leads to the loss of information in a significant way. Since our prediction methods rely only on the network topology, they can be applied to various fields where an efficient network-based analysis is required.
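
To make the notion of a modularity-based community structure concrete, the sketch below computes the standard (single-mode) Newman modularity of a given partition. Note the swap: the abstract above uses a modified modularity for bipartite networks, which is not reproduced here. The function name `modularity`, the toy network and the community labels are hypothetical.

```python
def modularity(edges, community_of):
    """Standard modularity Q of a partition of an undirected, unweighted network.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping each node to its community label.
    Q = sum over communities of (intra_edges / m) - (degree_sum / (2 m)) ** 2,
    where m is the total number of edges.
    """
    m = len(edges)
    intra_edges = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Every edge adds one to the degree of each endpoint's community.
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra_edges[cu] = intra_edges.get(cu, 0) + 1
    return sum(
        intra_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy network: two triangles joined by a single edge.
toy_edges = [
    ("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
    ("p4", "p5"), ("p5", "p6"), ("p4", "p6"),
    ("p3", "p4"),
]
two_groups = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B", "p6": "B"}
one_group = {node: "A" for node in two_groups}
q_two = modularity(toy_edges, two_groups)  # 5/14, about 0.357
q_one = modularity(toy_edges, one_group)   # 0.0
```

A community detection method searches for the partition that maximizes this score; the split into the two triangles scores higher than lumping all nodes together.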

Lichtenstein I, Charleston MA, Caetano TS, Gamble JR, Vadas MA. Active subnetwork recovery with a mechanism-dependent scoring function; with application to angiogenesis and organogenesis studies. BMC Bioinformatics 2013;14:59. [PMID: 23432934 PMCID: PMC3663784 DOI: 10.1186/1471-2105-14-59] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/28/2012] [Accepted: 01/21/2013] [Indexed: 11/10/2022] Open

Abstract

Background

The learning active subnetworks problem involves finding subnetworks of a bio-molecular network that are active in a particular condition. Many approaches integrate observation data (e.g., gene expression) with the network topology to find candidate subnetworks. Increasingly, pathway databases contain additional annotation information that can be mined to improve prediction accuracy, e.g., interaction mechanism (e.g., transcription, microRNA, cleavage) annotations. We introduce a mechanism-based approach to active subnetwork recovery which exploits such annotations. We suggest that neighboring interactions in a network tend to be co-activated in a way that depends on the “correlation” of their mechanism annotations; for example, neighboring phosphorylation and de-phosphorylation interactions may be more likely to be co-activated than neighboring phosphorylation and covalent bonding interactions.

Results

Our method iteratively learns the mechanism correlations and finds the most likely active subnetwork. We use a probabilistic graphical model with a Markov Random Field component that creates dependencies between the states (active or non-active) of neighboring interactions and incorporates a mechanism-based component into the function. We apply a heuristic EM-based algorithm suitable for the problem. We validated our method’s performance using simulated data in networks downloaded from GeneGO against the same approach without the mechanism-based component, and two other existing methods. We validated our method’s performance in correctly recovering (1) the true interaction states, and (2) global network properties of the original network against these other methods. We applied our method to networks generated from time-course gene expression studies in angiogenesis and lung organogenesis and validated the findings from a biological perspective against current literature.

Conclusions

The advantage of our mechanism-based approach is best seen in networks composed of connected regions with a large number of interactions annotated with a subset of mechanisms, e.g., a regulatory region of transcription interactions, or a cleavage cascade region. When applied to real datasets, our method recovered novel and biologically meaningful putative interactions, e.g., interactions from an integrin signaling pathway using the angiogenesis dataset, and a group of regulatory microRNA interactions in an organogenesis network.

Saini A, Hou J. Progressive Clustering Based Method for Protein Function Prediction. Bull Math Biol 2013;75:331-50. [DOI: 10.1007/s11538-013-9809-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 03/12/2012] [Accepted: 01/07/2013] [Indexed: 12/26/2022]

Re M, Mesiti M, Valentini G. A fast ranking algorithm for predicting gene functions in biomolecular networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012;9:1812-1818. [PMID: 23221088 DOI: 10.1109/tcbb.2012.114] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 06/01/2023]

Chua HN, Wong L. Predicting Protein Functions from Protein Interaction Networks. International Journal of Knowledge Discovery in Bioinformatics 2012. [DOI: 10.4018/ijkdb.2012100104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/14/2023]

A Resource of Quantitative Functional Annotation for Homo sapiens Genes. G3-GENES GENOMES GENETICS 2012;2:223-33. [PMID: 22384401 PMCID: PMC3284330 DOI: 10.1534/g3.111.000828] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/04/2011] [Accepted: 11/23/2011] [Indexed: 01/31/2023]

Hallinan J. Data mining for microbiologists. J Microbiol Methods 2012. [DOI: 10.1016/b978-0-08-099387-4.00002-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/12/2023]

Wei P, Pan W. Bayesian Joint Modeling of Multiple Gene Networks and Diverse Genomic Data to Identify Target Genes of a Transcription Factor. Ann Appl Stat 2012;6:334-355. [PMID: 22408712 PMCID: PMC3298193 DOI: 10.1214/11-aoas502] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]

Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference. Mach Learn 2011. [DOI: 10.1007/s10994-011-5271-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 10/14/2022]

Hallinan JS, James K, Wipat A. Network approaches to the functional analysis of microbial proteins. Adv Microb Physiol 2011;59:101-33. [PMID: 22114841 DOI: 10.1016/b978-0-12-387661-4.00005-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]

Hawkins T, Kihara D. Function prediction of uncharacterized proteins. J Bioinform Comput Biol 2007;5:1-30. [PMID: 17477489 DOI: 10.1142/s0219720007002503] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.8] [Received: 06/20/2006] [Revised: 09/23/2006] [Accepted: 10/10/2006] [Indexed: 11/18/2022]

Mazandu GK, Mulder NJ. Using the underlying biological organization of the Mycobacterium tuberculosis functional network for protein function prediction. Infect Genet Evol 2011;12:922-32. [PMID: 22085822 DOI: 10.1016/j.meegid.2011.10.027] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 07/06/2011] [Revised: 10/25/2011] [Accepted: 10/28/2011] [Indexed: 10/15/2022]

Abstract

Despite ever-increasing amounts of sequence and functional genomics data, there is still a deficiency of functional annotation for many newly sequenced proteins. For Mycobacterium tuberculosis (MTB), more than half of its genome is still uncharacterized, which hampers the search for new drug targets within the bacterial pathogen and limits our understanding of its pathogenicity. As for many other genomes, the annotations of proteins in the MTB proteome were generally inferred from sequence homology, which is effective but has limited applicability. We have carried out large-scale biological data integration to produce an MTB protein functional interaction network. Protein functional relationships were extracted from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, and additional functional interactions from microarray, sequence and protein signature data. The confidence level of protein relationships in the additional functional interaction data was evaluated using a dynamic data-driven scoring system. This functional network has been used to predict functions of uncharacterized proteins using Gene Ontology (GO) terms, and the semantic similarity between these terms was measured using a state-of-the-art GO similarity metric. To achieve a better trade-off between improvement of quality, genomic coverage and scalability, this prediction is done by observing the key principles driving the biological organization of the functional network. This study yields a new functionally characterized MTB strain CDC1551 proteome, consisting of 3804 and 3698 proteins out of 4195 with annotations in terms of the biological process and molecular function ontologies, respectively. These data can contribute to research into the development of effective anti-tubercular drugs with novel biological mechanisms of action.

Ebrahimi N, Yang Y. An Integrated Probabilistic Model for Assessing a Nanocomponent's Reliability. J Appl Probab 2011. [DOI: 10.1239/jap/1316796918] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]

Hayashida M, Kamada M, Song J, Akutsu T. Conditional random field approach to prediction of protein-protein interactions using domain information. BMC Syst Biol 2011;5 Suppl 1:S8. [PMID: 21689483 PMCID: PMC3121124 DOI: 10.1186/1752-0509-5-s1-s8] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]

Nguyen CD, Gardiner KJ, Cios KJ. Protein annotation from protein interaction networks and Gene Ontology. J Biomed Inform 2011;44:824-9. [PMID: 21571095 DOI: 10.1016/j.jbi.2011.04.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 08/09/2010] [Revised: 04/17/2011] [Accepted: 04/26/2011] [Indexed: 01/12/2023]

Mazandu GK, Mulder NJ. Scoring protein relationships in functional interaction networks predicted from sequence data. PLoS One 2011;6:e18607. [PMID: 21526183 PMCID: PMC3079720 DOI: 10.1371/journal.pone.0018607] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 12/02/2010] [Accepted: 03/07/2011] [Indexed: 11/21/2022] Open

Erdin S, Lisewski AM, Lichtarge O. Protein function prediction: towards integration of similarity metrics. Curr Opin Struct Biol 2011;21:180-8. [PMID: 21353529 DOI: 10.1016/j.sbi.2011.02.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 02/03/2011] [Accepted: 02/03/2011] [Indexed: 11/16/2022]

Mostafavi S, Goldenberg A, Morris Q. Predicting node characteristics from molecular networks. Methods Mol Biol 2011;781:399-414. [PMID: 21877293 DOI: 10.1007/978-1-61779-276-2_20] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/08/2022]

Bertoni A, Frasca M, Valentini G. COSNet: A Cost Sensitive Neural Network for Semi-supervised Learning in Graphs. Machine Learning and Knowledge Discovery in Databases 2011. [DOI: 10.1007/978-3-642-23780-5_24] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 01/15/2023]

Kourmpetis YA, van Dijk AD, van Ham RC, ter Braak CJ. Genome-wide computational function prediction of Arabidopsis proteins by integration of multiple data sources. Plant Physiol 2011;155:271-81. [PMID: 21098674 PMCID: PMC3075770 DOI: 10.1104/pp.110.162164] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/03/2023]

Jiang X, Gold D, Kolaczyk ED. Network-based auto-probit modeling for protein function prediction. Biometrics 2010;67:958-66. [PMID: 21133881 DOI: 10.1111/j.1541-0420.2010.01519.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/26/2022]