Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Couto FM, Silva MJ. Disjunctive shared information between ontology concepts: application to Gene Ontology. J Biomed Semantics 2011;2:5. [PMID: 21884591 DOI: 10.1186/2041-1480-2-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2011] [Accepted: 08/31/2011] [Indexed: 01/12/2023] Open

For:	Couto FM, Silva MJ. Disjunctive shared information between ontology concepts: application to Gene Ontology. J Biomed Semantics 2011;2:5. [PMID: 21884591 DOI: 10.1186/2041-1480-2-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2011] [Accepted: 08/31/2011] [Indexed: 01/12/2023] Open

Number

Cited by Other Article(s)

Bandyopadhyay SS, Halder AK, Saha S, Chatterjee P, Nasipuri M, Basu S. Assessment of GO-Based Protein Interaction Affinities in the Large-Scale Human–Coronavirus Family Interactome. Vaccines (Basel) 2023;11:vaccines11030549. [PMID: 36992133 DOI: 10.3390/vaccines11030549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2023] [Revised: 02/19/2023] [Accepted: 02/23/2023] [Indexed: 03/03/2023] Open

Pesaranghader A, Matwin S, Sokolova M, Grenier JC, Beiko RG, Hussin J. OUP accepted manuscript. Bioinformatics 2022;38:3051-3061. [PMID: 35536192 PMCID: PMC9154256 DOI: 10.1093/bioinformatics/btac304] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2021] [Revised: 02/12/2022] [Indexed: 11/24/2022] Open

Kim J, Kim D, Sohn KA. HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball. Bioinformatics 2021;37:2971-2980. [PMID: 33760022 DOI: 10.1093/bioinformatics/btab193] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2020] [Revised: 03/14/2021] [Accepted: 03/23/2021] [Indexed: 02/02/2023] Open

Ikram N, Qadir MA, Afzal MT. SimExact – An Efficient Method to Compute Function Similarity Between Proteins Using Gene Ontology. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017092842] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Wang X, Zhu X, Ye M, Wang Y, Li CD, Xiong Y, Wei DQ. STS-NLSP: A Network-Based Label Space Partition Method for Predicting the Specificity of Membrane Transporter Substrates Using a Hybrid Feature of Structural and Semantic Similarity. Front Bioeng Biotechnol 2019;7:306. [PMID: 31781551 PMCID: PMC6851049 DOI: 10.3389/fbioe.2019.00306] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2019] [Accepted: 10/17/2019] [Indexed: 12/11/2022] Open

Abstract

Membrane transport proteins play crucial roles in the pharmacokinetics of substrate drugs, the drug resistance in cancer and are vital to the process of drug discovery, development and anti-cancer therapeutics. However, experimental methods to profile a substrate drug against a panel of transporters to determine its specificity are labor intensive and time consuming. In this article, we aim to develop an in silico multi-label classification approach to predict whether a substrate can specifically recognize one of the 13 categories of drug transporters ranging from ATP-binding cassette to solute carrier families using both structural fingerprints and chemical ontologies information of substrates. The data-driven network-based label space partition (NLSP) method was utilized to construct the model based on a hybrid of similarity-based feature by the integration of 2D fingerprint and semantic similarity. This method builds predictors for each label cluster (possibly intersecting) detected by community detection algorithms and takes union of label sets for a compound as final prediction. NLSP lies into the ensembles of multi-label classifier category in multi-label learning field. We utilized Cramér's V statistics to quantify the label correlations and depicted them via a heatmap. The jackknife tests and iterative stratification based cross-validation method were adopted on a benchmark dataset to evaluate the prediction performance of the proposed models both in multi-label and label-wise manner. Compared with other powerful multi-label methods, ML-kNN, MTSVM, and RAkELd, our multi-label classification model of NLPS-RF (random forest-based NLSP) has proven to be a feasible and effective model, and performed satisfactorily in the predictive task of transporter-substrate specificity. The idea behind NLSP method is intriguing and the power of NLSP remains to be explored for the multi-label learning problems in bioinformatics. The benchmark dataset, intermediate results and python code which can fully reproduce our experiments and results are available at https://github.com/dqwei-lab/STS.

Collapse

Halder AK, Dutta P, Kundu M, Basu S, Nasipuri M. Review of computational methods for virus-host protein interaction prediction: a case study on novel Ebola-human interactions. Brief Funct Genomics 2019;17:381-391. [PMID: 29028879 PMCID: PMC7109800 DOI: 10.1093/bfgp/elx026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures. BIOMED RESEARCH INTERNATIONAL 2019;2019:6750296. [PMID: 30809545 PMCID: PMC6369486 DOI: 10.1155/2019/6750296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/12/2018] [Accepted: 01/13/2019] [Indexed: 11/30/2022]

Abstract

In the field of biology, researchers need to compare genes or gene products using semantic similarity measures (SSM). Continuous data growth and diversity in data characteristics comprise what is called big data; current biological SSMs cannot handle big data. Therefore, these measures need the ability to control the size of big data. We used parallel and distributed processing by splitting data into multiple partitions and applied SSM measures to each partition; this approach helped manage big data scalability and computational problems. Our solution involves three steps: split gene ontology (GO), data clustering, and semantic similarity calculation. To test this method, split GO and data clustering algorithms were defined and assessed for performance in the first two steps. Three of the best SSMs in biology [Resnik, Shortest Semantic Differentiation Distance (SSDD), and SORA] are enhanced by introducing threaded parallel processing, which is used in the third step. Our results demonstrate that introducing threads in SSMs reduced the time of calculating semantic similarity between gene pairs and improved performance of the three SSMs. Average time was reduced by 24.51% for Resnik, 22.93%, for SSDD, and 33.68% for SORA. Total time was reduced by 8.88% for Resnik, 23.14% for SSDD, and 39.27% for SORA. Using these threaded measures in the distributed system, combined with using split GO and data clustering algorithms to split input data based on their similarity, reduced the average time more than did the approach of equally dividing input data. Time reduction increased with increasing number of splits. Time reduction percentage was 24.1%, 39.2%, and 66.6% for Threaded SSDD; 33.0%, 78.2%, and 93.1% for Threaded SORA in the case of 2, 3, and 4 slaves, respectively; and 92.04% for Threaded Resnik in the case of four slaves.

Collapse

Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology. PLoS One 2018;13:e0208626. [PMID: 30532199 PMCID: PMC6287949 DOI: 10.1371/journal.pone.0208626] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2018] [Accepted: 11/20/2018] [Indexed: 12/14/2022] Open

Abstract

Identifying disease genes from a vast amount of genetic data is one of the most challenging tasks in the post-genomic era. Also, complex diseases present highly heterogeneous genotype, which difficult biological marker identification. Machine learning methods are widely used to identify these markers, but their performance is highly dependent upon the size and quality of available data. In this study, we demonstrated that machine learning classifiers trained on gene functional similarities, using Gene Ontology (GO), can improve the identification of genes involved in complex diseases. For this purpose, we developed a supervised machine learning methodology to predict complex disease genes. The proposed pipeline was assessed using Autism Spectrum Disorder (ASD) candidate genes. A quantitative measure of gene functional similarities was obtained by employing different semantic similarity measures. To infer the hidden functional similarities between ASD genes, various types of machine learning classifiers were built on quantitative semantic similarity matrices of ASD and non-ASD genes. The classifiers trained and tested on ASD and non-ASD gene functional similarities outperformed previously reported ASD classifiers. For example, a Random Forest (RF) classifier achieved an AUC of 0. 80 for predicting new ASD genes, which was higher than the reported classifier (0.73). Additionally, this classifier was able to predict 73 novel ASD candidate genes that were enriched for core ASD phenotypes, such as autism and obsessive-compulsive behavior. In addition, predicted genes were also enriched for ASD co-occurring conditions, including Attention Deficit Hyperactivity Disorder (ADHD). We also developed a KNIME workflow with the proposed methodology which allows users to configure and execute it without requiring machine learning and programming skills. Machine learning is an effective and reliable technique to decipher ASD mechanism by identifying novel disease genes, but this study further demonstrated that their performance can be improved by incorporating a quantitative measure of gene functional similarities. Source code and the workflow of the proposed methodology are available at https://github.com/Muh-Asif/ASD-genes-prediction.

Collapse

GOGO: An improved algorithm to measure the semantic similarity between gene ontology terms. Sci Rep 2018;8:15107. [PMID: 30305653 PMCID: PMC6180005 DOI: 10.1038/s41598-018-33219-y] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2017] [Accepted: 09/24/2018] [Indexed: 01/29/2023] Open

Liu W, Liu J, Rajapakse JC. Gene Ontology Enrichment Improves Performances of Functional Similarity of Genes. Sci Rep 2018;8:12100. [PMID: 30108262 PMCID: PMC6092333 DOI: 10.1038/s41598-018-30455-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2017] [Accepted: 07/25/2018] [Indexed: 12/23/2022] Open

Ikram N, Qadir MA, Afzal MT. Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;15:905-912. [PMID: 28436885 DOI: 10.1109/tcbb.2017.2695542] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

Dutta P, Basu S, Kundu M. Assessment of Semantic Similarity between Proteins Using Information Content and Topological Properties of the Gene Ontology Graph. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;15:839-849. [PMID: 28371781 DOI: 10.1109/tcbb.2017.2689762] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

Chu Y, Wang Z, Wang R, Zhang N, Li J, Hu Y, Teng M, Wang Y. WDNfinder: A method for minimum driver node set detection and analysis in directed and weighted biological network. J Bioinform Comput Biol 2017;15:1750021. [DOI: 10.1142/s0219720017500214] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Chen Q, Wan Y, Zhang X, Lei Y, Zobel J, Verspoor K. Comparative Analysis of Sequence Clustering Methods for Deduplication of Biological Databases. ACM JOURNAL OF DATA AND INFORMATION QUALITY 2017. [DOI: 10.1145/3131611] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

A Novel Measure for Semantic Similarity Computation of Gene Ontology Terms Using Weighted Aggregation of Information Contents. HEPATITIS MONTHLY 2017. [DOI: 10.5812/zjrms.12041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/13/2023]

Dutta P, Halder AK, Basu S, Kundu M. A survey on Ebola genome and current trends in computational research on the Ebola virus. Brief Funct Genomics 2017;17:374-380. [DOI: 10.1093/bfgp/elx020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open

Bible PW, Sun HW, Morasso MI, Loganantharaj R, Wei L. The effects of shared information on semantic calculations in the gene ontology. Comput Struct Biotechnol J 2017;15:195-211. [PMID: 28217262 PMCID: PMC5299144 DOI: 10.1016/j.csbj.2017.01.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2016] [Revised: 01/25/2017] [Accepted: 01/25/2017] [Indexed: 01/01/2023] Open

Abstract

The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk).

Collapse

Shui Y, Cho YR. Alignment of PPI Networks Using Semantic Similarity for Conserved Protein Complex Prediction. IEEE Trans Nanobioscience 2017;15:380-389. [PMID: 28113907 DOI: 10.1109/tnb.2016.2555802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Pesquita C. Semantic Similarity in the Gene Ontology. Methods Mol Biol 2017;1446:161-173. [PMID: 27812942 DOI: 10.1007/978-1-4939-3743-1_12] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]

Ehsani R, Drabløs F. TopoICSim: a new semantic similarity measure based on gene ontology. BMC Bioinformatics 2016;17:296. [PMID: 27473391 PMCID: PMC4966780 DOI: 10.1186/s12859-016-1160-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2016] [Accepted: 07/21/2016] [Indexed: 01/14/2023] Open

Abstract

Background

The Gene Ontology (GO) is a dynamic, controlled vocabulary that describes the cellular function of genes and proteins according to tree major categories: biological process, molecular function and cellular component. It has become widely used in many bioinformatics applications for annotating genes and measuring their semantic similarity, rather than their sequence similarity. Generally speaking, semantic similarity measures involve the GO tree topology, information content of GO terms, or a combination of both.

Results

Here we present a new semantic similarity measure called TopoICSim (Topological Information Content Similarity) which uses information on the specific paths between GO terms based on the topology of the GO tree, and the distribution of information content along these paths. The TopoICSim algorithm was evaluated on two human benchmark datasets based on KEGG pathways and Pfam domains grouped as clans, using GO terms from either the biological process or molecular function. The performance of the TopoICSim measure compared favorably to five existing methods. Furthermore, the TopoICSim similarity was also tested on gene/protein sets defined by correlated gene expression, using three human datasets, and showed improved performance compared to two previously published similarity measures. Finally we used an online benchmarking resource which evaluates any similarity measure against a set of 11 similarity measures in three tests, using gene/protein sets based on sequence similarity, Pfam domains, and enzyme classifications. The results for TopoICSim showed improved performance relative to most of the measures included in the benchmarking, and in particular a very robust performance throughout the different tests.

Conclusions

The TopoICSim similarity measure provides a competitive method with robust performance for quantification of semantic similarity between genes and proteins based on GO annotations. An R script for TopoICSim is available at http://bigr.medisin.ntnu.no/tools/TopoICSim.R.

Collapse

Zhang SB, Lai JH. Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures. Gene 2016;586:148-57. [PMID: 27080954 DOI: 10.1016/j.gene.2016.04.024] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2015] [Revised: 03/28/2016] [Accepted: 04/08/2016] [Indexed: 11/19/2022]

Pesaranghader A, Matwin S, Sokolova M, Beiko RG. simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes. Bioinformatics 2015;32:1380-7. [PMID: 26708333 DOI: 10.1093/bioinformatics/btv755] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2015] [Accepted: 12/21/2015] [Indexed: 12/19/2022] Open

Harispe S, Ranwez S, Janaqi S, Montmain J. Semantic Similarity from Natural Language and Ontology Analysis. ACTA ACUST UNITED AC 2015. [DOI: 10.2200/s00639ed1v01y201504hlt027] [Citation(s) in RCA: 113] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]

Zhang SB, Lai JH. Semantic similarity measurement between gene ontology terms based on exclusively inherited shared information. Gene 2015;558:108-17. [DOI: 10.1016/j.gene.2014.12.062] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2014] [Revised: 12/15/2014] [Accepted: 12/24/2014] [Indexed: 11/25/2022]

Palma G, Vidal ME, Haag E, Raschid L, Thor A. Determining similarity of scientific entities in annotation datasets. Database (Oxford) 2015;2015:bau123. [PMID: 25725057 PMCID: PMC4343076 DOI: 10.1093/database/bau123] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2014] [Revised: 12/02/2014] [Accepted: 12/03/2014] [Indexed: 11/22/2022]

Affiliation(s)

Guillermo Palma Departamento de Computación Universidad Simón Bolívar, Caracas, Venezuela, Department of Biology, University of Maryland, College Park, MD, 20742 USA Smith School of Business, Institute of Advanced Computer Studies, and Department of Computer Science. College Park, MD, 20742 USA and University of Applied Sciences for Telecommunications, Leipzig, Germany 04277
Maria-Esther Vidal Departamento de Computación Universidad Simón Bolívar, Caracas, Venezuela, Department of Biology, University of Maryland, College Park, MD, 20742 USA Smith School of Business, Institute of Advanced Computer Studies, and Department of Computer Science. College Park, MD, 20742 USA and University of Applied Sciences for Telecommunications, Leipzig, Germany 04277
Eric Haag Departamento de Computación Universidad Simón Bolívar, Caracas, Venezuela, Department of Biology, University of Maryland, College Park, MD, 20742 USA Smith School of Business, Institute of Advanced Computer Studies, and Department of Computer Science. College Park, MD, 20742 USA and University of Applied Sciences for Telecommunications, Leipzig, Germany 04277
Louiqa Raschid Departamento de Computación Universidad Simón Bolívar, Caracas, Venezuela, Department of Biology, University of Maryland, College Park, MD, 20742 USA Smith School of Business, Institute of Advanced Computer Studies, and Department of Computer Science. College Park, MD, 20742 USA and University of Applied Sciences for Telecommunications, Leipzig, Germany 04277
Andreas Thor Departamento de Computación Universidad Simón Bolívar, Caracas, Venezuela, Department of Biology, University of Maryland, College Park, MD, 20742 USA Smith School of Business, Institute of Advanced Computer Studies, and Department of Computer Science. College Park, MD, 20742 USA and University of Applied Sciences for Telecommunications, Leipzig, Germany 04277

Collapse

Na D, Son H, Gsponer J. Categorizer: a tool to categorize genes into user-defined biological groups based on semantic similarity. BMC Genomics 2014;15:1091. [PMID: 25495442 PMCID: PMC4298957 DOI: 10.1186/1471-2164-15-1091] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2014] [Accepted: 12/04/2014] [Indexed: 11/10/2022] Open

Harispe S, Sánchez D, Ranwez S, Janaqi S, Montmain J. A framework for unifying ontology-based semantic similarity measures: a study in the biomedical domain. J Biomed Inform 2013;48:38-53. [PMID: 24269894 DOI: 10.1016/j.jbi.2013.11.006] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2013] [Revised: 11/06/2013] [Accepted: 11/09/2013] [Indexed: 10/26/2022]

Cho YR, Mina M, Lu Y, Kwon N, Guzzi PH. M-Finder: Uncovering functionally associated proteins from interactome data integrated with GO annotations. Proteome Sci 2013;11:S3. [PMID: 24565382 PMCID: PMC3909039 DOI: 10.1186/1477-5956-11-s1-s3] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open

Abstract

BACKGROUND

Protein-protein interactions (PPIs) play a key role in understanding the mechanisms of cellular processes. The availability of interactome data has catalyzed the development of computational approaches to elucidate functional behaviors of proteins on a system level. Gene Ontology (GO) and its annotations are a significant resource for functional characterization of proteins. Because of wide coverage, GO data have often been adopted as a benchmark for protein function prediction on the genomic scale.

RESULTS

We propose a computational approach, called M-Finder, for functional association pattern mining. This method employs semantic analytics to integrate the genome-wide PPIs with GO data. We also introduce an interactive web application tool that visualizes a functional association network linked to a protein specified by a user. The proposed approach comprises two major components. First, the PPIs that have been generated by high-throughput methods are weighted in terms of their functional consistency using GO and its annotations. We assess two advanced semantic similarity metrics which quantify the functional association level of each interacting protein pair. We demonstrate that these measures outperform the other existing methods by evaluating their agreement to other biological features, such as sequence similarity, the presence of common Pfam domains, and core PPIs. Second, the information flow-based algorithm is employed to discover a set of proteins functionally associated with the protein in a query and their links efficiently. This algorithm reconstructs a functional association network of the query protein. The output network size can be flexibly determined by parameters.

CONCLUSIONS

M-Finder provides a useful framework to investigate functional association patterns with any protein. This software will also allow users to perform further systematic analysis of a set of proteins for any specific function. It is available online at http://bionet.ecs.baylor.edu/mfinder.

Collapse

COUTO FRANCISCOM, PINTO HSOFIA. THE NEXT GENERATION OF SIMILARITY MEASURES THAT FULLY EXPLORE THE SEMANTICS IN BIOMEDICAL ONTOLOGIES. J Bioinform Comput Biol 2013;11:1371001. [DOI: 10.1142/s0219720013710017] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Ferreira JD, Hastings J, Couto FM. Exploiting disjointness axioms to improve semantic similarity measures. Bioinformatics 2013;29:2781-7. [PMID: 24002110 DOI: 10.1093/bioinformatics/btt491] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Teng Z, Guo M, Liu X, Dai Q, Wang C, Xuan P. Measuring gene functional similarity based on group-wise comparison of GO terms. Bioinformatics 2013;29:1424-32. [PMID: 23572412 DOI: 10.1093/bioinformatics/btt160] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Lopes P, Oliveira JL. COEUS: "semantic web in a box" for biomedical applications. J Biomed Semantics 2012;3:11. [PMID: 23244467 PMCID: PMC3554586 DOI: 10.1186/2041-1480-3-11] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2012] [Accepted: 11/05/2012] [Indexed: 11/30/2022] Open

Yang H, Nepusz T, Paccanaro A. Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty. ACTA ACUST UNITED AC 2012;28:1383-9. [PMID: 22522134 DOI: 10.1093/bioinformatics/bts129] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Abstract

MOTIVATION

Several measures have been recently proposed for quantifying the functional similarity between gene products according to well-structured controlled vocabularies where biological terms are organized in a tree or in a directed acyclic graph (DAG) structure. However, existing semantic similarity measures ignore two important facts. First, when calculating the similarity between two terms, they disregard the descendants of these terms. While this makes no difference when the ontology is a tree, we shall show that it has important consequences when the ontology is a DAG-this is the case, for example, with the Gene Ontology (GO). Second, existing similarity measures do not model the inherent uncertainty which comes from the fact that our current knowledge of the gene annotation and of the ontology structure is incomplete. Here, we propose a novel approach based on downward random walks that can be used to improve any of the existing similarity measures to exhibit these two properties. The approach is computationally efficient-random walks do not need to be simulated as we provide formulas to calculate their stationary distributions.

RESULTS

To show that our approach can potentially improve any semantic similarity measure, we test it on six different semantic similarity measures: three commonly used measures by Resnik (1999), Lin (1998), and Jiang and Conrath (1997); and three recently proposed measures: simUI, simGIC by Pesquita et al. (2008); GraSM by Couto et al. (2007); and Couto and Silva (2011). We applied these improved measures to the GO annotations of the yeast Saccharomyces cerevisiae, and tested how they correlate with sequence similarity, mRNA co-expression and protein-protein interaction data. Our results consistently show that the use of downward random walks leads to more reliable similarity measures.

Collapse

Falda M, Toppo S, Pescarolo A, Lavezzo E, Di Camillo B, Facchinetti A, Cilia E, Velasco R, Fontana P. Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms. BMC Bioinformatics 2012;13 Suppl 4:S14. [PMID: 22536960 PMCID: PMC3314586 DOI: 10.1186/1471-2105-13-s4-s14] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Lv S, Li Y, Wang Q, Ning S, Huang T, Wang P, Sun J, Zheng Y, Liu W, Ai J, Li X. A novel method to quantify gene set functional association based on gene ontology. J R Soc Interface 2011;9:1063-72. [PMID: 21998111 DOI: 10.1098/rsif.2011.0551] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023] Open