1
Jiang H, Wang Y, Yin C, Pan H, Chen L, Feng K, Chang Y, Sun H. SLIVER: Unveiling large scale gene regulatory networks of single-cell transcriptomic data through causal structure learning and modules aggregation. Comput Biol Med 2024; 178:108690. [PMID: 38879931 DOI: 10.1016/j.compbiomed.2024.108690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 05/19/2024] [Accepted: 06/01/2024] [Indexed: 06/18/2024]
Abstract
Prevalent Gene Regulatory Network (GRN) construction methods rely on generalized correlation analysis. However, in biological systems, regulation is essentially a causal relationship that cannot be adequately captured through correlation alone. Therefore, it is more reasonable to infer GRNs from a causal perspective. Existing causal discovery algorithms typically rely on Directed Acyclic Graphs (DAGs) to model causal relationships, but this often requires traversing the entire network, so computational demands skyrocket as the number of nodes grows and causal discovery algorithms remain suitable only for small networks of one or two hundred nodes or fewer. In this study, we propose the SLIVER (cauSaL dIscovery Via dimEnsionality Reduction) algorithm, which integrates a causal structural equation model with graph decomposition. SLIVER introduces a set of factor nodes that serve as abstractions of different functional modules and integrate the regulatory relationships between genes based on their respective functions or pathways, thus reducing the GRN to the product of two low-dimensional matrices. Subsequently, we employ the structural causal model (SCM) to learn the GRN within the gene node space, enforce the DAG constraint in the low-dimensional space, and guide each factor to aggregate various functions through cosine similarity. We evaluate the performance of the SLIVER algorithm on 12 real single-cell transcriptomic datasets and demonstrate that it outperforms 12 other widely used methods in both GRN inference performance and computational resource usage. The analysis of the gene information integrated by factor nodes also demonstrates that factor nodes have a biological interpretation in GRNs. We apply SLIVER to scRNA-seq data from Type 2 diabetes mellitus to capture the transcriptional regulatory structural changes of β cells under high insulin demand.
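The low-rank idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that an n-gene network W is written as the product of an n×k gene-to-factor matrix and a k×n factor-to-gene matrix; the names `A`, `B`, `matmul`, and the toy sizes are hypothetical and are not taken from the SLIVER implementation.

```python
# Illustrative sketch only: a gene-by-gene network W expressed as A x B,
# with k factor nodes sitting between the n genes. Not the SLIVER code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

n_genes, n_factors = 6, 2  # hypothetical toy sizes

# A: gene -> factor weights (n x k); B: factor -> gene weights (k x n).
A = [[1.0, 0.0],
     [0.5, 0.0],
     [0.0, 1.0],
     [0.0, 0.0],
     [0.0, 0.0],
     [0.0, 0.0]]
B = [[0.0, 0.0, 2.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 3.0, 0.0, 0.0]]

W = matmul(A, B)  # n x n network implied by the two factor matrices

full_params = n_genes * n_genes            # entries in W
low_rank_params = 2 * n_genes * n_factors  # entries in A plus B
```

With these toy sizes the factorized form stores 24 numbers instead of 36; the saving grows as the number of factors becomes small relative to the number of genes.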
Affiliation(s)
- Hongyang Jiang
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Yuezhu Wang
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Chaoyi Yin
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Hao Pan
  - College of Software, Jilin University, Changchun, 130012, China
- Liqun Chen
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Ke Feng
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Yi Chang
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China; International Center of Future Science, Jilin University, Changchun, China; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
- Huiyan Sun
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China; International Center of Future Science, Jilin University, Changchun, China; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
2
Nguyen QH, Le DH. Similarity Calculation, Enrichment Analysis, and Ontology Visualization of Biomedical Ontologies using UFO. Curr Protoc 2021; 1:e115. [PMID: 33900688 DOI: 10.1002/cpz1.115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
The rapid growth of biomedical ontologies observed in recent years has been reported to be useful in various applications. In this article, we propose two main-function protocols (term-related and entity-related) covering the three most common ontology analyses: similarity calculation, enrichment analysis, and ontology visualization, each of which can be done by separate methods. Many previously developed tools implementing those methods run on different platforms and implement only a limited number of the similarity calculation and enrichment analysis methods for a specific type of biomedical ontology, although the methods can be applied to any type. Moreover, depending on the application, methods have distinct advantages; thus, the more methods a tool offers, the better the decisions users can make. The protocol here implements all the analyses above using an advanced, popular tool called UFO. UFO is a Cytoscape app that unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for biomedical ontologies in OBO format, and it can calculate the similarity between two sets of entities, weigh imported entity networks, and generate functional similarity networks. The complete protocol can be performed in 30 min and is designed for use by biologists with no prior bioinformatics training. © 2021 Wiley Periodicals LLC. Basic Protocol: Running UFO using a list of input Gene Ontology, Disease Ontology, or Human Phenotype Ontology data.
Affiliation(s)
- Quang-Huy Nguyen
  - Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
- Duc-Hau Le
  - Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
  - School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
3
Le DH. UFO: A tool for unifying biomedical ontology-based semantic similarity calculation, enrichment analysis and visualization. PLoS One 2020; 15:e0235670. [PMID: 32645039 PMCID: PMC7347127 DOI: 10.1371/journal.pone.0235670] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/17/2019] [Accepted: 06/22/2020] [Indexed: 02/06/2023] Open
Abstract
Background Biomedical ontologies have been growing quickly and have proven useful in many biomedical applications. Important applications of those data include estimating the functional similarity between ontology terms and between annotated biomedical entities, and analyzing enrichment for a set of biomedical entities. Many semantic similarity calculation and enrichment analysis methods have been proposed for such applications, and a number of tools implementing the methods have been developed on different platforms. However, these tools implement only a small number of the semantic similarity calculation and enrichment analysis methods, and only for a certain type of biomedical ontology, even though the methods can be applied to all types of biomedical ontologies. More importantly, each method can be dominant in different applications; thus, users have more choice when tools implement more methods. More functions would also facilitate their work with ontologies. Results In this study, we developed a Cytoscape app, named UFO, which unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for all types of biomedical ontologies in OBO format. Based on the similarity calculation, UFO can calculate the similarity between two sets of entities, weigh imported entity networks, and generate functional similarity networks. In addition, it can perform enrichment analysis of a set of entities by different methods. Moreover, UFO can visualize structural relationships between ontology terms, annotating relationships between entities and terms, and functional similarity between entities. Finally, we demonstrated the ability of UFO through case studies on finding the best semantic similarity measures for assessing the similarity between human disease phenotypes, constructing biomedical entity functional similarity networks for predicting disease-associated biomarkers, and performing enrichment analysis on a set of similar phenotypes. Conclusions Taken together, UFO is expected to be a tool with which biomedical ontologies can be exploited for various biomedical applications. Availability UFO is distributed as a Cytoscape app and can be downloaded freely from the Cytoscape App Store (http://apps.cytoscape.org/apps/ufo) for non-commercial use.
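The abstract does not say which enrichment methods UFO implements. As a hedged illustration of what an enrichment analysis computes, the sketch below uses the one-sided hypergeometric test, a common choice for over-representation analysis. The function name `hypergeom_enrichment_p` and its parameters are hypothetical and do not come from UFO.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    population: number of entities in the background set
    annotated:  background entities annotated with the ontology term
    sample:     size of the entity set being tested
    hits:       entities in the tested set annotated with the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, x) * comb(population - annotated, sample - x)
               for x in range(hits, upper + 1))
    return tail / total

# Toy numbers: 10 background entities, 3 annotated, 2 drawn, at least 1 hit.
p_value = hypergeom_enrichment_p(10, 3, 2, 1)
```

A small p-value suggests the term is over-represented in the tested set; in practice the p-values for many terms would also need multiple-testing correction.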
Affiliation(s)
- Duc-Hau Le
  - Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
  - School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
4
Xiao X, Chen WJ, Qiu WR. A Novel Prediction of Quaternary Structural Type of Proteins with Gene Ontology. Protein Pept Lett 2019; 27:313-320. [PMID: 31749418 DOI: 10.2174/0929866526666191014144618] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/23/2019] [Revised: 05/20/2019] [Accepted: 06/29/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Information about the quaternary structure attributes of proteins is very important because it is closely related to their biological functions. With the rapid development of new-generation sequencing technology, we face a challenge: how to automatically identify the quaternary (four-level) attributes of new polypeptide chains from their sequence information (i.e., whether they form a monomer, a hetero-oligomer, or a homo-oligomer). OBJECTIVE In this article, our goal is to find a new way to represent protein sequences, thereby improving the prediction rate of protein quaternary structure. METHODS We developed a prediction system for protein quaternary structural type in which a protein sequence is expressed by combining Pfam functional-domain and gene ontology information. The system turns protein features into digital sequences and completes the prediction of quaternary structure through specific machine learning and verification algorithms. RESULTS Our data set contains 5495 protein samples. Using the method provided in this paper, we classify proteins as monomers, hetero-oligomers, or homo-oligomers, and the prediction rate is 74.38%, which is 3.24% higher than that of previous studies. Through this new feature extraction method, we can further classify the four-level structure of proteins, and the results are also correspondingly improved. CONCLUSION Compared with previous results, the new prediction system improves the prediction rate. We have reason to believe that the feature extraction method in this paper is more practical and can be used as a reference for other protein classification problems.
Affiliation(s)
- Xuan Xiao
  - School of Information, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Wei-Jie Chen
  - School of Information, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Wang-Ren Qiu
  - School of Information, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
5
Hayes WB, Mamano N. SANA NetGO: a combinatorial approach to using Gene Ontology (GO) terms to score network alignments. Bioinformatics 2019; 34:1345-1352. [PMID: 29228175 DOI: 10.1093/bioinformatics/btx716] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/06/2017] [Accepted: 12/04/2017] [Indexed: 01/05/2023] Open
Abstract
Motivation Gene Ontology (GO) terms are frequently used to score alignments between protein-protein interaction (PPI) networks. Methods exist to measure GO similarity between proteins in isolation, but proteins in a network alignment are not isolated: each pairing is dependent on every other via the alignment itself. Existing measures fail to take into account the frequency of GO terms across networks, instead imposing arbitrary rules on when to allow GO terms. Results Here we develop NetGO, a new measure that naturally weighs infrequent, informative GO terms more heavily than frequent, less informative GO terms, without arbitrary cutoffs, instead downweighting GO terms according to their frequency in the networks being aligned. This is a global measure applicable only to alignments, independent of pairwise GO measures, in the same sense that the edge-based EC or S3 scores are global measures of topological similarity independent of pairwise topological similarities. We demonstrate the superiority of NetGO in alignments of predetermined quality and show that NetGO correlates with alignment quality better than any existing GO-based alignment measures. We also demonstrate that NetGO provides a measure of taxonomic similarity between species, consistent with existing taxonomic measures, a feature not shared with existing GO-based network alignment measures. Finally, we re-score alignments produced by almost a dozen aligners from a previous study and show that NetGO does a better job at separating good alignments from bad ones. Availability and implementation Available as part of SANA. Contact whayes@uci.edu. Supplementary information Supplementary data are available at Bioinformatics online.
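The idea of downweighting frequent GO terms can be sketched as follows. This is not the NetGO formula, which the abstract does not give. It is a simple inverse-frequency weighting chosen for illustration, and the function name `shared_term_score`, the protein labels, and the term labels are all hypothetical.

```python
from collections import Counter

def shared_term_score(pairs, annotations):
    """Score aligned protein pairs by the GO terms they share.

    Each shared term contributes 1 / (number of proteins annotated with it),
    so rare terms count for more than common ones.
    """
    freq = Counter(t for terms in annotations.values() for t in terms)
    score = 0.0
    for p, q in pairs:
        for term in annotations[p] & annotations[q]:
            score += 1.0 / freq[term]
    return score

# Toy annotations across two networks (labels are placeholders).
annotations = {
    "a1": {"T_common", "T_rare"},
    "a2": {"T_common"},
    "b1": {"T_common", "T_rare"},
    "b2": {"T_common"},
}
alignment = [("a1", "b1"), ("a2", "b2")]
score = shared_term_score(alignment, annotations)
```

Here the rare term shared by `a1` and `b1` contributes 0.5, while each match on the common term contributes only 0.25.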
Affiliation(s)
- Wayne B Hayes
  - Department of Computer Science, University of California, Irvine, CA 92697-3435, USA
- Nil Mamano
  - Department of Computer Science, University of California, Irvine, CA 92697-3435, USA
6
Zhou S, Kang H, Yao B, Gong Y. An automated pipeline for analyzing medication event reports in clinical settings. BMC Med Inform Decis Mak 2018; 18:113. [PMID: 30526590 PMCID: PMC6284273 DOI: 10.1186/s12911-018-0687-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Medication events in clinical settings are significant threats to patient safety. Analyzing and learning from the medication event reports is an important way to prevent the recurrence of these events. Currently, the analysis of medication event reports is ineffective and requires heavy workloads for clinicians. An automated pipeline is proposed to help clinicians deal with the accumulated reports, extract valuable information and generate feedback from the reports. Thus, the strategy of medication event prevention can be further developed based on the lessons learned. METHODS In order to build the automated pipeline, four classic machine learning classifiers (i.e., support vector machine, Naïve Bayes, random forest, and multi-layer perceptron) were compared to identify the event originating stages, event types, and event causes from the medication event reports. The precision, recall and F-1 measure were calculated to assess the performance of the classifiers. Further, a strategy to measure the similarity of medication event reports in our pipeline was established and evaluated by human subjects through a questionnaire. RESULTS We developed three classifiers to identify the medication event originating stages, event types and causes, respectively. For the event originating stages, a support vector machine classifier obtains the best performance with an F-1 measure of 0.792. For the event types, a support vector machine classifier exhibits the best performance with an F-1 measure of 0.758. And for the event causes, a random forest classifier reaches an F-1 measure of 0.925. The questionnaire results show that the similarity measurement is consistent with the domain experts in the task of identifying similar reports. CONCLUSION We developed and evaluated an automated pipeline that could identify three attributes from the medication event reports and calculate the similarity scores between the reports based on the attributes. 
The pipeline is expected to improve the efficiency of analyzing the medication event reports and to learn from the reports in a timely manner.
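The precision, recall, and F-1 measure named in this abstract follow standard definitions, sketched below from raw counts. The function name and the toy counts are illustrative and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F-1 from true/false positive counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Each ratio is defined as 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy counts: 8 correct labels, 2 spurious, 2 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

F-1 is the harmonic mean of precision and recall, so it stays low unless both are high.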
Affiliation(s)
- Sicheng Zhou
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin Street, Suite 600, Houston, 77030, TX, USA
- Hong Kang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin Street, Suite 600, Houston, 77030, TX, USA
- Bin Yao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin Street, Suite 600, Houston, 77030, TX, USA
- Yang Gong
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin Street, Suite 600, Houston, 77030, TX, USA
7
Liu W, Liu J, Rajapakse JC. Gene Ontology Enrichment Improves Performances of Functional Similarity of Genes. Sci Rep 2018; 8:12100. [PMID: 30108262 PMCID: PMC6092333 DOI: 10.1038/s41598-018-30455-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/23/2017] [Accepted: 07/25/2018] [Indexed: 12/23/2022] Open
Abstract
There exists a plethora of measures to evaluate functional similarity (FS) between genes, which is widely used in many bioinformatics applications including detecting molecular pathways, identifying co-expressed genes, predicting protein-protein interactions, and prioritizing disease genes. Measures of FS between genes are mostly derived from the Information Content (IC) of the Gene Ontology (GO) terms annotating the genes. However, existing measures, which evaluate the IC of terms based either on the representation of terms in the annotating corpus or on the knowledge embedded in the GO hierarchy, do not consider the enrichment of GO terms by the querying pair of genes. The enrichment of a GO term by a pair of genes depends on whether the term is annotated by one gene (i.e., partial annotation) or by both genes (i.e., complete annotation) in the pair. In this paper, we propose a method that incorporates enrichment of GO terms by a gene pair in computing their FS and show that GO enrichment improves the performance of 46 existing FS measures in the prediction of sequence homologies, gene expression correlations, protein-protein interactions, and disease-associated genes.
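For readers unfamiliar with IC-based measures, the sketch below shows the corpus-based baseline that such methods build on: IC(t) = -log p(t), with a Resnik-style term similarity taken as the IC of the most informative common ancestor. It does not implement the enrichment weighting proposed in the paper, and the toy hierarchy, counts, and function names are hypothetical.

```python
from math import log

def information_content(term_counts, total):
    """IC(t) = -log(count(t) / total) for every term with a nonzero count."""
    return {t: -log(c / total) for t, c in term_counts.items() if c > 0}

def resnik(t1, t2, ancestors, ic):
    """IC of the most informative ancestor shared by t1 and t2."""
    common = (ancestors[t1] | {t1}) & (ancestors[t2] | {t2})
    return max((ic[t] for t in common), default=0.0)

# Toy hierarchy: root -> A -> A1, and root -> B.
ancestors = {"root": set(), "A": {"root"}, "B": {"root"}, "A1": {"A", "root"}}
# Genes annotated with each term or any of its descendants (8 genes total).
counts = {"root": 8, "A": 4, "B": 4, "A1": 2}

ic = information_content(counts, total=8)
sim_close = resnik("A1", "A", ancestors, ic)  # share the specific term A
sim_far = resnik("A1", "B", ancestors, ic)    # share only the root
```

Terms that meet only at the root get a similarity of zero because the root annotates every gene and so carries no information.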
Affiliation(s)
- Wenting Liu
  - Human Genetics, Genome Institute of Singapore, Singapore, Singapore
- Jianjun Liu
  - Human Genetics, Genome Institute of Singapore, Singapore, Singapore
- Jagath C Rajapakse
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
8
Zhang J, Jia K, Jia J, Qian Y. An improved approach to infer protein-protein interaction based on a hierarchical vector space model. BMC Bioinformatics 2018; 19:161. [PMID: 29699476 PMCID: PMC5921294 DOI: 10.1186/s12859-018-2152-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/15/2017] [Accepted: 04/09/2018] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Comparing and classifying functions of gene products are important in today's biomedical research. The semantic similarity derived from the Gene Ontology (GO) annotation has been regarded as one of the most widely used indicators for protein interaction. Among the various approaches proposed, those based on the vector space model are relatively simple, but their effectiveness is far from satisfying. RESULTS We propose a Hierarchical Vector Space Model (HVSM) for computing semantic similarity between different genes or their products, which enhances the basic vector space model by introducing the relation between GO terms. Besides the directly annotated terms, HVSM also takes their ancestors and descendants related by "is_a" and "part_of" relations into account. Moreover, HVSM introduces the concept of a Certainty Factor to calibrate the semantic similarity based on the number of terms annotated to genes. To assess the performance of our method, we applied HVSM to Homo sapiens and Saccharomyces cerevisiae protein-protein interaction datasets. Compared with TCSS, Resnik, and other classic similarity measures, HVSM achieved significant improvement for distinguishing positive from negative protein interactions. We also tested its correlation with sequence, EC, and Pfam similarity using online tool CESSM. CONCLUSIONS HVSM showed an improvement of up to 4% compared to TCSS, 8% compared to IntelliGO, 12% compared to basic VSM, 6% compared to Resnik, 8% compared to Lin, 11% compared to Jiang, 8% compared to Schlicker, and 11% compared to SimGIC using AUC scores. CESSM test showed HVSM was comparable to SimGIC, and superior to all other similarity measures in CESSM as well as TCSS. Supplementary information and the software are available at https://github.com/kejia1215/HVSM .
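The contrast between a basic vector space model and one that uses the GO hierarchy can be sketched with binary term vectors and cosine similarity. This is a simplified illustration, not HVSM itself: it adds only ancestor terms and omits descendants, relation weights, and the Certainty Factor described in the abstract. The helper names and the toy hierarchy are hypothetical.

```python
from math import sqrt

def term_vector(direct_terms, ancestors):
    """Expand a gene's directly annotated terms with all their ancestors.

    The result is a set, i.e. a binary vector over GO terms.
    """
    vec = set(direct_terms)
    for t in direct_terms:
        vec |= ancestors.get(t, set())
    return vec

def cosine(u, v):
    """Cosine similarity of two binary vectors represented as sets."""
    if not u or not v:
        return 0.0
    return len(u & v) / sqrt(len(u) * len(v))

# Toy hierarchy: t1 and t2 are siblings under "a"; t3 sits under "b".
ancestors = {"t1": {"a", "root"}, "t2": {"a", "root"}, "t3": {"b", "root"}}

basic = cosine({"t1"}, {"t2"})  # basic VSM: no shared direct terms
sibling = cosine(term_vector({"t1"}, ancestors), term_vector({"t2"}, ancestors))
distant = cosine(term_vector({"t1"}, ancestors), term_vector({"t3"}, ancestors))
```

The basic model scores the sibling terms as unrelated, while the ancestor-expanded vectors rank the sibling pair above the distant pair.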
Affiliation(s)
- Jiongmin Zhang
  - Department of Computer Science & Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
- Ke Jia
  - Department of Computer Science & Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
- Jinmeng Jia
  - School of life science, East China Normal University, Dongchuan Road, Shanghai, 200241 China
- Ying Qian
  - Department of Computer Science & Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
9
Yu H, Jiao B, Lu L, Wang P, Chen S, Liang C, Liu W. NetMiner-an ensemble pipeline for building genome-wide and high-quality gene co-expression network using massive-scale RNA-seq samples. PLoS One 2018; 13:e0192613. [PMID: 29425247 PMCID: PMC5806890 DOI: 10.1371/journal.pone.0192613] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/21/2017] [Accepted: 01/27/2018] [Indexed: 01/10/2023] Open
Abstract
Accurately reconstructing gene co-expression networks is of great importance for uncovering the genetic architecture underlying complex and varied phenotypes. The recent availability of high-throughput RNA-seq has made genome-wide detection and quantification of novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression networks have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building genome-scale and high-quality Gene Co-expression Networks (GCNs) by integrating three frequently used inference algorithms. We constructed an RNA-seq-based GCN for rice, a monocot species. The quality of the network obtained by our method was verified and evaluated against curated gene functional association data sets, and it clearly outperformed each single method. In addition, the capability of the network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method for predicting the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provide a valuable and highly reliable data source for selecting key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.
Affiliation(s)
- Hua Yu
  - Nantong Medical College and School of Pharmacy, Nantong University, Nantong, China
  - State Key Laboratory of Plant Genomics, Institute of Genetic and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Bingke Jiao
  - State Key Laboratory of Plant Genomics, Institute of Genetic and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Lu Lu
  - Nantong Polytechnic College, Nantong, China
- Pengfei Wang
  - Nantong Medical College and School of Pharmacy, Nantong University, Nantong, China
- Shuangcheng Chen
  - Nantong Medical College and School of Pharmacy, Nantong University, Nantong, China
- Chengzhi Liang
  - State Key Laboratory of Plant Genomics, Institute of Genetic and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Wei Liu
  - Nantong Medical College and School of Pharmacy, Nantong University, Nantong, China
10
Mazandu GK, Chimusa ER, Mulder NJ. Gene Ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery. Brief Bioinform 2017; 18:886-901. [PMID: 27473066 DOI: 10.1093/bib/bbw067] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 01/27/2016] [Indexed: 01/02/2023] Open
Abstract
Gene Ontology (GO) semantic similarity tools enable retrieval of semantic similarity scores, which incorporate biological knowledge embedded in the GO structure for comparing or classifying different proteins or list of proteins based on their GO annotations. This facilitates a better understanding of biological phenomena underlying the corresponding experiment and enables the identification of processes pertinent to different biological conditions. Currently, about 14 tools are available, which may play an important role in improving protein analyses at the functional level using different GO semantic similarity measures. Here we survey these tools to provide a comprehensive view of the challenges and advances made in this area to avoid redundant effort in developing features that already exist, or implementing ideas already proven to be obsolete in the context of GO. This helps researchers, tool developers, as well as end users, understand the underlying semantic similarity measures implemented through knowledge of pertinent features of, and issues related to, a particular tool. This should empower users to make appropriate choices for their biological applications and ensure effective knowledge discovery based on GO annotations.
11
Kang H, Gong Y. Developing a similarity searching module for patient safety event reporting system using semantic similarity measures. BMC Med Inform Decis Mak 2017; 17:75. [PMID: 28699567 PMCID: PMC5506579 DOI: 10.1186/s12911-017-0467-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022] Open
Abstract
Background The most important knowledge in the field of patient safety concerns the prevention and reduction of patient safety events (PSE) during treatment and care. The similarities and patterns among the events may go unnoticed if they are not properly reported and analyzed. There is an urgent need to develop a PSE reporting system that can dynamically measure the similarities of the events and thus promote event analysis and learning. Methods In this study, three prevailing semantic similarity algorithms were implemented to measure the similarities of 366 PSE annotated by the taxonomy of The Agency for Healthcare Research and Quality (AHRQ). The performance of each algorithm was then evaluated by a group of domain experts based on a 4-point Likert scale. The consistency between the scales of the algorithms and experts was measured and compared with randomly assigned scales. The similarity algorithms and scores, as a self-learning and self-updating module, were then integrated into the system. Results The results show that the similarity scores are more consistent with the experts’ reviews than randomly assigned scores. Moreover, incorporating the algorithms into our reporting system enables a mechanism to learn and update based upon PSE similarity. Conclusion Integrating semantic similarity algorithms into a PSE reporting system can help us learn from previous events and provide timely knowledge support to the reporters. With the knowledge base in the PSE domain, the new generation reporting system holds promise in educating healthcare providers and preventing the recurrence and serious consequences of PSE.
Affiliation(s)
- Hong Kang
  - School of Biomedical Informatics, the University of Texas Health Science Center at Houston, 7000 Fannin St., Houston, TX, 77030, USA
- Yang Gong
  - School of Biomedical Informatics, the University of Texas Health Science Center at Houston, 7000 Fannin St., Houston, TX, 77030, USA
12
Tian Z, Wang C, Guo M, Liu X, Teng Z. An improved method for functional similarity analysis of genes based on Gene Ontology. BMC Syst Biol 2016; 10:119. [PMID: 28155727 PMCID: PMC5259995 DOI: 10.1186/s12918-016-0359-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022]
Abstract
Background Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate gene functional similarity reliably is still a challenging problem. Results We propose WIS, an effective method to measure gene functional similarity. First, WIS computes the IC of a term by employing its depth, the number of its ancestors, and the topology of its descendants in the GO graph. Second, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS fully takes into account the specificity of terms and the weighted inherited semantics between GO terms. Conclusions The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/. Electronic supplementary material The online version of this article (doi:10.1186/s12918-016-0359-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zhen Tian
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Chunyu Wang
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Maozu Guo
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Xiaoyan Liu
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Zhixia Teng
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China; Department of Information Management and Information System, Northeast Forestry University, Harbin, 150001, People's Republic of China
|
13
|
Harispe S, Ranwez S, Janaqi S, Montmain J. Semantic Similarity from Natural Language and Ontology Analysis. Synthesis Lectures on Human Language Technologies 2015. [DOI: 10.2200/s00639ed1v01y201504hlt027] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Indexed: 12/25/2022]
|
14
|
Konopka BM, Golda T, Kotulska M. Evaluating the Significance of Protein Functional Similarity Based on Gene Ontology. J Comput Biol 2014; 21:809-22. [DOI: 10.1089/cmb.2014.0181] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Affiliation(s)
- Bogumil M. Konopka
- Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, Wroclaw, Poland
- Tomasz Golda
- Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, Wroclaw, Poland
- Malgorzata Kotulska
- Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, Wroclaw, Poland
|
15
|
OrthoClust: an orthology-based network framework for clustering data across multiple species. Genome Biol 2014; 15:R100. [PMID: 25249401 PMCID: PMC4289247 DOI: 10.1186/gb-2014-15-8-r100] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 02/08/2014] [Accepted: 06/26/2014] [Indexed: 01/28/2023]
Abstract
Increasingly, high-dimensional genomics data are becoming available for many organisms. Here, we develop OrthoClust for simultaneously clustering data across multiple species. OrthoClust is a computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species. It outputs optimized modules that are fundamentally cross-species, which can either be conserved or species-specific. We demonstrate the application of OrthoClust using the RNA-Seq expression profiles of Caenorhabditis elegans and Drosophila melanogaster from the modENCODE consortium. A potential application of cross-species modules is to infer putative analogous functions of uncharacterized elements like non-coding RNAs based on guilt-by-association.
|
16
|
Lin CC, Chang YM, Pan CT, Chen CC, Ling L, Tsao KC, Yang RB, Li WH. Functional evolution of cardiac microRNAs in heart development and functions. Mol Biol Evol 2014; 31:2722-34. [PMID: 25063441 DOI: 10.1093/molbev/msu217] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 01/22/2023]
Abstract
MicroRNAs (miRNAs) are a class of endogenous small noncoding RNAs that regulate gene expression either by degrading target mRNAs or by suppressing protein translation. miRNAs have been found to be involved in many biological processes, such as development, differentiation, and growth. However, the evolution of miRNA regulatory functions and networks has not been well studied. In this study, we conducted a cross-species analysis to study the evolution of cardiac miRNAs and their regulatory functions and networks. We found that conserved cardiac miRNA target genes have maintained highly conserved cardiac functions. Additionally, most of cardiac miRNA target genes in human with annotations of cardiac functions evolved from the corresponding homologous targets, which are also involved in heart development-related functions. On the basis of these results, we investigated the functional evolution of cardiac miRNAs and presented a functional evolutionary map. From this map, we identified the evolutionary time at which the cardiac miRNAs became involved in heart development or function and found that the biological processes of heart development evolved earlier than those of heart functions, for example, heart contraction/relaxation or cardiac hypertrophy. Our study of the evolution of the cardiac miRNA regulatory networks revealed the emergence of new regulatory functional branches during evolution. Furthermore, we discovered that early evolved cardiac miRNA target genes tend to participate in the early stages of heart development. This study sheds light on the evolution of developmental features of genes regulated by cardiac miRNAs.
Affiliation(s)
- Chen-Ching Lin
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan; Department of Ecology and Evolution, University of Chicago
- Yao-Ming Chang
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan; Department of Ecology and Evolution, University of Chicago
- Cheng-Tsung Pan
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Chien-Chang Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Li Ling
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ku-Chi Tsao
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ruey-Bing Yang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Wen-Hsiung Li
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan; Department of Ecology and Evolution, University of Chicago
|
17
|
Fu HL, Wu DP, Wang XF, Wang JG, Jiao F, Song LL, Xie H, Wen XY, Shan HS, Du YX, Zhao YP. Altered miRNA expression is associated with differentiation, invasion, and metastasis of esophageal squamous cell carcinoma (ESCC) in patients from Huaian, China. Cell Biochem Biophys 2014; 67:657-68. [PMID: 23516093 DOI: 10.1007/s12013-013-9554-3] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Indexed: 12/31/2022]
Abstract
Esophageal squamous cell carcinoma (ESCC) is the leading malignancy in Huaian, China. Recently, emerging studies have suggested that an aberrant microRNA (miRNA) expression signature exists in ESCC. However, there is discordant information available on specific miRNA expression in patients from different regions. In this study, we identified 12 miRNAs that are differentially expressed in patients with ESCC from Huaian, China. Among these miRNAs that displayed unique miRNA expression signatures, miR-1, miR-29c, miR-100, miR-133a, miR-133b, miR-143, miR-145, and miR-195 were downregulated, and miR-7, miR-21, miR-223, and miR-1246 were upregulated in cancerous tissue compared with the adjacent normal tissue. Bioinformatics analyses identified the major biological processes and signaling pathways that are targeted by these differentially expressed miRNAs. Accordingly, miR-29c, miR-100, miR-133a, and miR-133b were found to be involved in invasion and metastasis of ESCC, and miR-7 and miR-21 were found to be related to the differentiation of ESCC. Thus, our data present new evidence for the important roles of miRNAs in ESCC.
Affiliation(s)
- Hai Long Fu
- Department of Laboratory Medicine, the 82nd Hospital of the People's Liberation Army, Huaian, 223001, China
|
18
|
HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins. PLoS One 2014; 9:e89545. [PMID: 24647341 PMCID: PMC3960097 DOI: 10.1371/journal.pone.0089545] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 11/11/2013] [Accepted: 01/23/2014] [Indexed: 12/23/2022]
Abstract
Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms is retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
|
19
|
A simulation to analyze feature selection methods utilizing gene ontology for gene expression classification. J Biomed Inform 2013; 46:1044-59. [PMID: 23892294 DOI: 10.1016/j.jbi.2013.07.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/26/2012] [Revised: 07/05/2013] [Accepted: 07/21/2013] [Indexed: 01/02/2023]
Abstract
Gene expression profile classification is a pivotal research domain assisting in the transformation from traditional to personalized medicine. A major challenge associated with gene expression data classification is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have been experimenting with the use of semantic similarity between genes in Gene Ontology (GO) as a method to improve feature selection. While there are few studies that discuss how to use GO for feature selection, there is no simulation study that addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation, which generates binary class datasets, where the differentially expressed genes between two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes denoted by δ, and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor for determining the efficacy of GO-based feature selection. In particular, as the connectedness of differentially expressed genes increases, the classification accuracy improvement increases. To quantify this notion of connectedness, we defined a measure called Biological Condition Annotation Level BCAL(G), where G is a graph of differentially expressed genes. 
Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ≥ 0.696; (2) it decreases classification accuracy when BCAL(G) ≤ 0.389; (3) it provides marginal accuracy improvement when 0.389<BCAL(G)<0.696 and δ<1; (4) as the number of genes in a biological condition increases beyond 50 and δ ≥ 0.7, the improvement from GO-based feature selection decreases; and (5) we recommend not using GO-based feature selection when a biological condition has less than ten genes. Our results are derived from datasets preprocessed using RMA (Robust Multi-array Average), cases where δ is between 0.3 and 2.5, and training sample sizes between 20 and 200, therefore our conclusions are limited to these specifications. Overall, this simulation is innovative and addresses the question of when SoFoCles-style feature selection should be used for classification instead of statistical-based ranking measures.
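The numbered conclusions above can be read as a coarse decision rule. The Python sketch below encodes conclusions (1), (2), (3) and (5) using the thresholds quoted in the abstract. The function name, the return labels and the order in which the rules are checked are assumptions made for illustration. Conclusion (4), a trend rather than a threshold, is not encoded, and the abstract limits its findings to RMA-preprocessed data, δ between 0.3 and 2.5, and 20 to 200 training samples.

```python
def go_feature_selection_advice(bcal, delta, n_genes):
    """Map the thresholds reported in the abstract to a coarse label.

    The rule ordering is an assumption; the abstract does not state
    which conclusion takes precedence when several apply.
    """
    if n_genes < 10:
        return "not recommended"        # conclusion (5): fewer than ten genes
    if bcal >= 0.696:
        return "likely helps"           # conclusion (1)
    if bcal <= 0.389:
        return "likely hurts"           # conclusion (2)
    if delta < 1:
        return "marginal improvement"   # conclusion (3): 0.389 < BCAL < 0.696, delta < 1
    return "no stated conclusion"       # intermediate BCAL with delta >= 1


# Hypothetical inputs: a well-connected condition with 30 genes.
advice = go_feature_selection_advice(bcal=0.75, delta=0.8, n_genes=30)
```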
|
20
|
A novel insight into Gene Ontology semantic similarity. Genomics 2013; 101:368-75. [DOI: 10.1016/j.ygeno.2013.04.010] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 01/23/2013] [Revised: 04/08/2013] [Accepted: 04/19/2013] [Indexed: 12/28/2022]
|
21
|
Abate F, Acquaviva A, Ficarra E, Piva R, Macii E. Gelsius: a literature-based workflow for determining quantitative associations between genes and biological processes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:619-631. [PMID: 24091396 DOI: 10.1109/tcbb.2013.11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/02/2023]
Abstract
An effective knowledge extraction and quantification methodology from biomedical literature would allow the researcher to organize and analyze the results of high-throughput experiments on microarrays and next-generation sequencing technologies. Despite the large amount of raw information available on the web, a tool able to extract a measure of the correlation between a list of genes and biological processes is not yet available. In this paper, we present Gelsius, a workflow that incorporates biomedical literature to quantify the correlation between genes and terms describing biological processes. To achieve this target, we build different modules focusing on query expansion and document canonicalization. In this way, we were able to improve the measurement of correlation, performed using a latent semantic analysis approach. To the best of our knowledge, this is the first complete tool able to extract a measure of genes-biological processes correlation from literature. We demonstrate the effectiveness of the proposed workflow on six biological processes and a set of genes, by showing that correlation results for known relationships are in accordance with definitions of gene functions provided by NCI Thesaurus. On the other hand, the tool is able to propose new candidate relationships for later experimental validation. The tool is available at http://bioeda1.polito.it:8080/medSearchServlet/.
|
22
|
Teng Z, Guo M, Liu X, Dai Q, Wang C, Xuan P. Measuring gene functional similarity based on group-wise comparison of GO terms. Bioinformatics 2013; 29:1424-32. [PMID: 23572412 DOI: 10.1093/bioinformatics/btt160] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022]
|
23
|
Xu L, Cheng C, George EO, Homayouni R. Literature aided determination of data quality and statistical significance threshold for gene expression studies. BMC Genomics 2012; 13 Suppl 8:S23. [PMID: 23282414 PMCID: PMC3535704 DOI: 10.1186/1471-2164-13-s8-s23] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
Abstract
Background Gene expression data are noisy due to technical and biological variability. Consequently, analysis of gene expression data is complex. Different statistical methods produce distinct sets of genes. In addition, selection of expression p-value (EPv) threshold is somewhat arbitrary. In this study, we aimed to develop novel literature based approaches to integrate functional information in analysis of gene expression data. Methods Functional relationships between genes were derived by Latent Semantic Indexing (LSI) of Medline abstracts and used to calculate the function cohesion of gene sets. In this study, literature cohesion was applied in two ways. First, Literature-Based Functional Significance (LBFS) method was developed to calculate a p-value for the cohesion of differentially expressed genes (DEGs) in order to objectively evaluate the overall biological significance of the gene expression experiments. Second, Literature Aided Statistical Significance Threshold (LASST) was developed to determine the appropriate expression p-value threshold for a given experiment. Results We tested our methods on three different publicly available datasets. LBFS analysis demonstrated that only two experiments were significantly cohesive. For each experiment, we also compared the LBFS values of DEGs generated by four different statistical methods. We found that some statistical tests produced more functionally cohesive gene sets than others. However, no statistical test was consistently better for all experiments. This reemphasizes that a statistical test must be carefully selected for each expression study. Moreover, LASST analysis demonstrated that the expression p-value thresholds for some experiments were considerably lower (p < 0.02 and 0.01), suggesting that the arbitrary p-values and false discovery rate thresholds that are commonly used in expression studies may not be biologically sound. 
Conclusions We have developed robust and objective literature-based methods to evaluate the biological support for gene expression experiments and to determine the appropriate statistical significance threshold. These methods will assist investigators to more efficiently extract biologically meaningful insights from high throughput gene expression experiments.
Affiliation(s)
- Lijing Xu
- Bioinformatics Program, Memphis, TN 38152, USA
|
24
|
Hsu CL, Yang UC. Discovering pathway cross-talks based on functional relations between pathways. BMC Genomics 2012; 13 Suppl 7:S25. [PMID: 23282018 PMCID: PMC3521217 DOI: 10.1186/1471-2164-13-s7-s25] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Background In biological systems, pathways coordinate or interact with one another to achieve a complex biological process. Studying how they influence each other is essential for understanding the intricacies of a biological system. However, current methods rely on statistical tests to determine pathway relations, and may lose numerous biologically significant relations. Results This study proposes a method that identifies the pathway relations by measuring the functional relations between pathways based on the Gene Ontology (GO) annotations. This approach identified 4,661 pathway relations among 166 pathways from Pathway Interaction Database (PID). Using 143 pathway interactions from PID as testing data, the function-based approach (FBA) is able to identify 93% of pathway interactions, better than the existing methods based on the shared components and protein-protein interactions. Many well-known pathway cross-talks are only identified by FBA. In addition, the false positive rate of FBA is significantly lower than others via pathway co-expression analysis. Conclusions This function-based approach appears to be more sensitive and able to infer more biologically significant and explainable pathway relations.
Affiliation(s)
- Chia-Lang Hsu
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
|
25
|
Alvarez MA, Yan C. A graph-based semantic similarity measure for the gene ontology. J Bioinform Comput Biol 2012; 9:681-95. [PMID: 22084008 DOI: 10.1142/s0219720011005641] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 02/27/2011] [Revised: 06/23/2011] [Accepted: 06/24/2011] [Indexed: 11/18/2022]
Abstract
Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases like Gene Ontology Annotation (GOA) that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA), that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with the other methods. SSA provides a reliable measure for semantics similarity independent of external databases of functional-annotation observations.
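Two of the ingredients named in the abstract above, the shortest path between two terms and the depth of their nearest common ancestor, can be computed directly on the ontology graph. The Python sketch below does this on a small made-up DAG. The toy ontology, the function names, the restriction to paths that pass through a common ancestor, and the definition of depth as the shortest distance to the root are all assumptions. The definition-based score and the way SSA combines the three quantities are not given in the abstract and are not reproduced.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its list of parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a"],
    "e": ["c", "b"],
}


def ancestors_with_distance(term):
    """Return {ancestor: fewest edges from term}, including term itself."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:  # BFS: first visit is the shortest
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def depth(term):
    """Shortest distance from term up to the root (an assumed definition)."""
    return ancestors_with_distance(term)["root"]


def path_and_common_ancestor(t1, t2):
    """Shortest path length via a common ancestor, plus the deepest one."""
    d1 = ancestors_with_distance(t1)
    d2 = ancestors_with_distance(t2)
    common = set(d1) & set(d2)  # never empty: every term reaches the root
    path_len = min(d1[c] + d2[c] for c in common)
    deepest = max(common, key=lambda c: (depth(c), c))
    return path_len, deepest, depth(deepest)


result = path_and_common_ancestor("c", "d")
```

Here `c` and `d` are siblings under `a`, so the path has two edges and the nearest common ancestor `a` sits one edge below the root.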
Affiliation(s)
- Marco A Alvarez
- Department of Computer Science, Utah State University, Logan, Utah 84322, USA.
|
26
|
Glass K, Ott E, Losert W, Girvan M. Implications of functional similarity for gene regulatory interactions. J R Soc Interface 2012; 9:1625-36. [PMID: 22298814 DOI: 10.1098/rsif.2011.0585] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
If one gene regulates another, those two genes are likely to be involved in many of the same biological functions. Conversely, shared biological function may be suggestive of the existence and nature of a regulatory interaction. With this in mind, we develop a measure of functional similarity between genes based on annotations made to the Gene Ontology in which the magnitude of their functional relationship is also indicative of a regulatory relationship. In contrast to other measures that have previously been used to quantify the functional similarity between genes, our measure scales the strength of any shared functional annotation by the frequency of that function's appearance across the entire set of annotations. We apply our method to both Escherichia coli and Saccharomyces cerevisiae gene annotations and find that the strength of our scaled similarity measure is more predictive of known regulatory interactions than previously published measures of functional similarity. In addition, we observe that the strength of the scaled similarity measure is correlated with the structural importance of links in the known regulatory network. By contrast, other measures of functional similarity are not indicative of any structural importance in the regulatory network. We therefore conclude that adequately adjusting for the frequency of shared biological functions is important in the construction of a functional similarity measure aimed at elucidating the existence and nature of regulatory interactions. We also compare the performance of the scaled similarity with a high-throughput method for determining regulatory interactions from gene expression data and observe that the ontology-based approach identifies a different subset of regulatory interactions compared with the gene expression approach. 
We show that combining predictions from the scaled similarity with those from the reconstruction algorithm leads to a significant improvement in the accuracy of the reconstructed network.
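The central idea in the abstract above is that a shared annotation should count for less when that function is annotated to many genes. A minimal Python sketch of this idea follows, weighting each shared term by the inverse of the number of genes carrying it. The inverse-frequency weight, the function name and the toy annotations are assumptions chosen for illustration; the abstract does not give the authors' exact scaling.

```python
from collections import Counter


def scaled_shared_annotation_score(gene_a, gene_b, annotations):
    """Sum 1/frequency over the annotations shared by two genes.

    Inverse-frequency weighting is an illustrative stand-in, not the
    published measure.
    """
    # Number of genes annotated with each term across the whole set.
    freq = Counter(term for terms in annotations.values() for term in terms)
    shared = annotations[gene_a] & annotations[gene_b]
    return sum(1.0 / freq[term] for term in shared)


# Hypothetical annotations: "common" is on every gene, "rare" on two.
annotations = {
    "g1": {"common", "rare"},
    "g2": {"common", "rare"},
    "g3": {"common"},
    "g4": {"common"},
}

rare_pair = scaled_shared_annotation_score("g1", "g2", annotations)
common_pair = scaled_shared_annotation_score("g3", "g4", annotations)
```

In this toy set, the pair that shares the rare term scores higher than the pair that shares only the ubiquitous one, which is the behaviour the abstract argues for.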
Affiliation(s)
- Kimberly Glass
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
|
27
|
Wang F, Liu M, Song B, Li D, Pei H, Guo Y, Huang J, Zhang D. Prediction and characterization of protein-protein interaction networks in swine. Proteome Sci 2012; 10:2. [PMID: 22230699 PMCID: PMC3306829 DOI: 10.1186/1477-5956-10-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 09/04/2011] [Accepted: 01/10/2012] [Indexed: 11/13/2022]
Abstract
Background Studying the large-scale protein-protein interaction (PPI) network is important in understanding biological processes. The current research presents the first PPI map of swine, which aims to give new insights into understanding their biological processes. Results We used three methods, Interolog-based prediction of porcine PPI network, domain-motif interactions from structural topology-based prediction of porcine PPI network and motif-motif interactions from structural topology-based prediction of porcine PPI network, to predict porcine protein interactions among 25,767 porcine proteins. We predicted 20,213, 331,484, and 218,705 porcine PPIs respectively, merged the three results into 567,441 PPIs, constructed four PPI networks, and analyzed the topological properties of the porcine PPI networks. Our predictions were validated with Pfam domain annotations and GO annotations. Averages of 70, 10,495, and 863 interactions were related to the Pfam domain-interacting pairs in iPfam database. For comparison, randomized networks were generated, and averages of only 4.24, 66.79, and 44.26 interactions were associated with Pfam domain-interacting pairs in iPfam database. In GO annotations, we found 52.68%, 75.54%, and 27.20% of the predicted PPIs sharing GO terms, respectively. However, none of the 10,000 randomized networks reached 52.68%, 75.54%, or 27.20% of PPI pairs sharing GO terms. Finally, we determined the accuracy and precision of the methods. The methods yielded accuracies of 0.92, 0.53, and 0.50 at precisions of about 0.93, 0.74, and 0.75, respectively. Conclusion The results reveal that the predicted PPI networks are considerably reliable. The present research is an important pioneering work on protein function research. The porcine PPI data set, the confidence score of each interaction and a list of related data are available at (http://pppid.biositemap.com/).
Affiliation(s)
- Fen Wang
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi 712100, China.
|
28
|
Ramírez F, Lawyer G, Albrecht M. Novel search method for the discovery of functional relationships. Bioinformatics 2011; 28:269-76. [PMID: 22180409 PMCID: PMC3259435 DOI: 10.1093/bioinformatics/btr631] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
Abstract
Motivation: Numerous annotations are available that functionally characterize genes and proteins with regard to molecular process, cellular localization, tissue expression, protein domain composition, protein interaction, disease association and other properties. Searching this steadily growing amount of information can lead to the discovery of new biological relationships between genes and proteins. To facilitate the searches, methods are required that measure the annotation similarity of genes and proteins. However, most current similarity methods are focused only on annotations from the Gene Ontology (GO) and do not take other annotation sources into account. Results: We introduce the new method BioSim that incorporates multiple sources of annotations to quantify the functional similarity of genes and proteins. We compared the performance of our method with four other well-known methods adapted to use multiple annotation sources. We evaluated the methods by searching for known functional relationships using annotations based only on GO or on our large data warehouse BioMyn. This warehouse integrates many diverse annotation sources of human genes and proteins. We observed that the search performance improved substantially for almost all methods when multiple annotation sources were included. In particular, our method outperformed the other methods in terms of recall and average precision. Contact: mario.albrecht@mpi-inf.mpg.de Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fidel Ramírez
- Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany
|
29
|
Guzzi PH, Mina M, Guerra C, Cannataro M. Semantic similarity analysis of protein data: assessment with biological features and issues. Brief Bioinform 2011; 13:569-85. [PMID: 22138322 DOI: 10.1093/bib/bbr066] [Citation(s) in RCA: 112] [Impact Index Per Article: 8.0] [Indexed: 01/08/2023]
Abstract
The integration of proteomics data with biological knowledge is a recent trend in bioinformatics. A lot of biological information is available and is spread on different sources and encoded in different ontologies (e.g. Gene Ontology). Annotating existing protein data with biological information may enable the use (and the development) of algorithms that use biological ontologies as framework to mine annotated data. Recently many methodologies and algorithms that use ontologies to extract knowledge from data, as well as to analyse ontologies themselves have been proposed and applied to other fields. Conversely, the use of such annotations for the analysis of protein data is a relatively novel research area that is currently becoming more and more central in research. Existing approaches span from the definition of the similarity among genes and proteins on the basis of the annotating terms, to the definition of novel algorithms that use such similarities for mining protein data on a proteome-wide scale. This work, after the definition of main concept of such analysis, presents a systematic discussion and comparison of main approaches. Finally, remaining challenges, as well as possible future directions of research are presented.
|
30
|
du Plessis L, Skunca N, Dessimoz C. The what, where, how and why of gene ontology--a primer for bioinformaticians. Brief Bioinform 2011; 12:723-35. [PMID: 21330331 PMCID: PMC3220872 DOI: 10.1093/bib/bbr002] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.4] [Indexed: 11/12/2022]
Abstract
With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications.
Affiliation(s)
- Louis du Plessis
- ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
|
31
|
Benabderrahmane S, Smail-Tabbone M, Poch O, Napoli A, Devignes MD. IntelliGO: a new vector-based semantic similarity measure including annotation origin. BMC Bioinformatics 2010; 11:588. [PMID: 21122125 PMCID: PMC3098105 DOI: 10.1186/1471-2105-11-588] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Received: 05/19/2010] [Accepted: 12/01/2010] [Indexed: 12/02/2022]
Abstract
BACKGROUND The Gene Ontology (GO) is a well known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. It has become a widely used knowledge source in bioinformatics for annotating genes and measuring their semantic similarity. These measures generally involve the GO graph structure, the information content of GO aspects, or a combination of both. However, only a few of the semantic similarity measures described so far can handle GO annotations differently according to their origin (i.e. their evidence codes). RESULTS We present here a new semantic similarity measure called IntelliGO which integrates several complementary properties in a novel vector space model. The coefficients associated with each GO term that annotates a given gene or protein include its information content as well as a customized value for each type of GO evidence code. The generalized cosine similarity measure, used for calculating the dot product between two vectors, has been rigorously adapted to the context of the GO graph. The IntelliGO similarity measure is tested on two benchmark datasets consisting of KEGG pathways and Pfam domains grouped as clans, considering the GO biological process and molecular function terms, respectively, for a total of 683 yeast and human genes and involving more than 67,900 pair-wise comparisons. The ability of the IntelliGO similarity measure to express the biological cohesion of sets of genes compares favourably to four existing similarity measures. For inter-set comparison, it consistently discriminates between distinct sets of genes. Furthermore, the IntelliGO similarity measure allows the influence of weights assigned to evidence codes to be checked. Finally, the results obtained with a complementary reference technique give intermediate but correct correlation values with the sequence similarity, Pfam, and Enzyme classifications when compared to previously published measures. 
CONCLUSIONS The IntelliGO similarity measure provides a customizable and comprehensive method for quantifying gene similarity based on GO annotations. It also displays a robust set-discriminating power which suggests it will be useful for functional clustering. AVAILABILITY An on-line version of the IntelliGO similarity measure is available at: http://bioinfo.loria.fr/Members/benabdsi/intelligo_project/
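The vector space idea in this abstract can be illustrated with a much-simplified sketch. It weights each GO term by its information content times an evidence-code weight and compares two genes with a plain cosine. The generalized cosine described in the abstract, which also accounts for the GO graph structure, is deliberately omitted here. The evidence weights, information content values, and function names below are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical evidence-code weights (assumption, not the paper's values).
EVIDENCE_WEIGHT = {"EXP": 1.0, "IDA": 1.0, "TAS": 0.8, "IEA": 0.4}


def gene_vector(annotations, ic):
    """Map (term, evidence_code) pairs to a sparse vector of term coefficients."""
    vec = {}
    for term, code in annotations:
        weight = ic[term] * EVIDENCE_WEIGHT.get(code, 0.0)
        # If a term is annotated several times, keep its strongest coefficient.
        vec[term] = max(vec.get(term, 0.0), weight)
    return vec


def cosine(u, v):
    """Plain cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy information content values (hypothetical).
ic = {"GO:A": 2.0, "GO:B": 3.0, "GO:C": 1.5}
gene1 = gene_vector([("GO:A", "IDA"), ("GO:B", "IEA")], ic)
gene2 = gene_vector([("GO:A", "TAS"), ("GO:C", "EXP")], ic)
similarity = cosine(gene1, gene2)
```

Lowering the weight of an evidence code such as `IEA` shrinks the contribution of electronically inferred terms, which is the kind of influence the abstract says can be checked.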
Affiliation(s)
- Sidahmed Benabderrahmane
- LORIA (CNRS, INRIA, Nancy-Université), Équipe Orpailleur, Bâtiment B, Campus scientifique, 54506 Vandoeuvre-lès-Nancy Cedex, France
- Malika Smail-Tabbone
- LORIA (CNRS, INRIA, Nancy-Université), Équipe Orpailleur, Bâtiment B, Campus scientifique, 54506 Vandoeuvre-lès-Nancy Cedex, France
- Olivier Poch
- L.B.G.I., CNRS UMR7104, IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
- Amedeo Napoli
- LORIA (CNRS, INRIA, Nancy-Université), Équipe Orpailleur, Bâtiment B, Campus scientifique, 54506 Vandoeuvre-lès-Nancy Cedex, France
- Marie-Dominique Devignes
- LORIA (CNRS, INRIA, Nancy-Université), Équipe Orpailleur, Bâtiment B, Campus scientifique, 54506 Vandoeuvre-lès-Nancy Cedex, France
|
32
|
Wang J, Zhou X, Zhu J, Zhou C, Guo Z. Revealing and avoiding bias in semantic similarity scores for protein pairs. BMC Bioinformatics 2010; 11:290. [PMID: 20509916 PMCID: PMC2903568 DOI: 10.1186/1471-2105-11-290] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 02/09/2010] [Accepted: 05/28/2010] [Indexed: 01/16/2023]
Abstract
BACKGROUND Semantic similarity scores for protein pairs are widely applied in functional genomics research for finding functional clusters of proteins, predicting protein functions and protein-protein interactions, and identifying putative disease genes. However, because some proteins, such as those related to diseases, tend to be studied more intensively, annotations are likely to be biased, which may affect applications based on semantic similarity measures. Thus, it is necessary to evaluate the effects of this bias on semantic similarity scores between proteins and then find a method to avoid them. RESULTS First, we evaluated 14 commonly used semantic similarity scores for protein pairs and demonstrated that they correlated significantly with the numbers of annotation terms for the proteins (also known as the protein annotation length). These results suggested that current applications of the semantic similarity scores between proteins might be unreliable. Then, to reduce this annotation bias effect, we proposed normalizing the semantic similarity scores between proteins using a power transformation of the scores. We provide evidence that this improves performance in some applications. CONCLUSIONS Current semantic similarity measures for protein pairs are highly dependent on protein annotation lengths, which are subject to biological research bias. This affects applications that are based on these semantic similarity scores, especially clustering studies that rely on score magnitudes. The normalized scores proposed in this paper can reduce the effects of this bias to some extent.
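The two steps named in this abstract, checking whether scores track annotation length and rescaling scores with a power transformation, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact transformation or how its exponent is chosen, so the exponent `0.5`, the toy data, and the function names are hypothetical, and the sketch does not by itself show that the bias is reduced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def power_transform(scores, exponent):
    """Rescale scores in [0, 1] by raising each to a fixed power."""
    return [s ** exponent for s in scores]


# Toy data (hypothetical): one similarity score per protein pair and the
# number of annotation terms of the pair.
scores = [0.10, 0.25, 0.40, 0.70, 0.90]
annotation_lengths = [2, 5, 9, 20, 41]

# A strongly positive value would indicate the annotation length bias.
bias = pearson(scores, annotation_lengths)

# Hypothetical exponent; the transform changes magnitudes but keeps the ranking.
rescaled = power_transform(scores, 0.5)
```

Because a power transformation with a positive exponent is monotonic, it leaves rankings unchanged and only alters score magnitudes, which is where the abstract locates the problem for clustering studies.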
Affiliation(s)
- Jing Wang
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Xianxiao Zhou
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Jing Zhu
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Chenggui Zhou
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Zheng Guo
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China
|
33
|
Abstract
In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.
Affiliation(s)
- Catia Pesquita
- LaSIGE, Faculty of Sciences, University of Lisboa, Lisboa, Portugal.
|
34
|
Chagoyen M, Carazo JM, Pascual-Montano A. Assessment of protein set coherence using functional annotations. BMC Bioinformatics 2008; 9:444. [PMID: 18937846 PMCID: PMC2588600 DOI: 10.1186/1471-2105-9-444] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 07/08/2008] [Accepted: 10/20/2008] [Indexed: 11/23/2022]
Abstract
Background Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set. Results In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. Conclusion We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at
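The idea of scoring a protein set's coherence and testing it against a reference set can be sketched with a simple resampling test. This is an assumption-laden illustration, not the authors' Matlab method: the Jaccard similarity, the mean pairwise score, the random-set null model, and all names and data below are hypothetical choices made for the example.

```python
import itertools
import random


def jaccard(a, b):
    """Jaccard similarity between two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coherence(proteins, annotations):
    """Mean pairwise annotation similarity of a protein set (needs >= 2 proteins)."""
    pairs = list(itertools.combinations(proteins, 2))
    return sum(jaccard(annotations[p], annotations[q]) for p, q in pairs) / len(pairs)


def coherence_p_value(protein_set, annotations, n_samples=999, seed=0):
    """Empirical p-value: how often a random set of the same size, drawn from
    the reference set, is at least as coherent as the observed set."""
    rng = random.Random(seed)
    reference = sorted(annotations)
    observed = coherence(protein_set, annotations)
    hits = sum(
        coherence(rng.sample(reference, len(protein_set)), annotations) >= observed
        for _ in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)


# Toy reference set (hypothetical): 16 background proteins with unrelated
# annotations and 4 proteins of one complex sharing the same two terms.
annotations = {f"bg{i}": {f"term{i}"} for i in range(16)}
annotations.update({f"cx{i}": {"termA", "termB"} for i in range(4)})

complex_set = ["cx0", "cx1", "cx2", "cx3"]
p_value = coherence_p_value(complex_set, annotations)
```

A small `p_value` suggests the set is more homogeneous than random sets from the same reference, which matches the abstract's use of the test as a first validation step before further interpretation.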
|
35
|
Gene Ontology term overlap as a measure of gene functional similarity. BMC Bioinformatics 2008; 9:327. [PMID: 18680592 PMCID: PMC2518162 DOI: 10.1186/1471-2105-9-327] [Citation(s) in RCA: 120] [Impact Index Per Article: 7.1] [Received: 02/22/2008] [Accepted: 08/04/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND The availability of various high-throughput experimental and computational methods allows biologists to rapidly infer functional relationships between genes. It is often necessary to evaluate these predictions computationally, a task that requires a reference database for functional relatedness. One such reference is the Gene Ontology (GO). A number of groups have suggested that the semantic similarity of the GO annotations of genes can serve as a proxy for functional relatedness. Here we evaluate a simple measure of semantic similarity, term overlap (TO). RESULTS We computed the TO for randomly selected gene pairs from the mouse genome. For comparison, we implemented six previously reported semantic similarity measures that share the feature of using computation of probabilities of terms to infer information content, in addition to three vector based approaches and a normalized version of the TO measure. We find that the overlap measure is highly correlated with the others but differs in detail. TO is at least as good a predictor of sequence similarity as the other measures. We further show that term overlap may avoid some problems that affect the probability-based measures. Term overlap is also much faster to compute than the information content-based measures. CONCLUSION Our experiments suggest that term overlap can serve as a simple and fast alternative to other approaches which use explicit information content estimation or require complex pre-calculations, while also avoiding problems that some other measures may encounter.
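Term overlap as described here is simple enough to sketch directly: count the GO terms two genes share. The normalization shown, dividing by the size of the smaller annotation set, is an assumption for illustration, since the abstract only mentions "a normalized version" without defining it. The GO identifiers and function names are placeholders.

```python
def term_overlap(terms_a, terms_b):
    """Number of annotation terms shared by two genes."""
    return len(set(terms_a) & set(terms_b))


def normalized_term_overlap(terms_a, terms_b):
    """Term overlap divided by the smaller annotation set size (assumed
    normalization; returns 0.0 if either gene has no annotations)."""
    a, b = set(terms_a), set(terms_b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Placeholder annotation sets for two genes.
gene_x = {"GO:0006915", "GO:0008219", "GO:0012501"}
gene_y = {"GO:0006915", "GO:0012501", "GO:0007165", "GO:0023052"}

shared = term_overlap(gene_x, gene_y)
shared_normalized = normalized_term_overlap(gene_x, gene_y)
```

Unlike information content-based measures, this needs no term probabilities estimated from an annotation corpus, which is consistent with the abstract's point that term overlap is much faster to compute.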
|
36
|
Pesquita C, Faria D, Bastos H, Ferreira AEN, Falcão AO, Couto FM. Metrics for GO based protein semantic similarity: a systematic evaluation. BMC Bioinformatics 2008; 9 Suppl 5:S4. [PMID: 18460186 PMCID: PMC2367622 DOI: 10.1186/1471-2105-9-s5-s4] [Citation(s) in RCA: 191] [Impact Index Per Article: 11.2] [Indexed: 11/10/2022]
Abstract
Background Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether or not electronic annotations should be used in semantic similarity calculations. Results We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. Conclusions This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
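The simGIC measure singled out in this abstract is commonly described as a Jaccard index weighted by information content: the summed information content of the terms two gene products share, divided by the summed information content of all terms annotating either. A minimal sketch under that description follows; the term names and information content values are hypothetical, and in practice each term set would include the ancestors of the annotated terms.

```python
def sim_gic(terms_a, terms_b, ic):
    """Information content-weighted Jaccard index of two term sets."""
    a, b = set(terms_a), set(terms_b)
    total = sum(ic[t] for t in a | b)
    if total == 0.0:
        return 0.0
    return sum(ic[t] for t in a & b) / total


# Toy information content values (hypothetical); rarer terms score higher.
ic = {"t1": 1.0, "t2": 2.0, "t3": 3.0, "t4": 4.0}

protein_a = {"t1", "t2", "t3"}
protein_b = {"t1", "t2", "t4"}

# Shared terms t1 and t2 carry 3.0 of the 10.0 total information content.
score = sim_gic(protein_a, protein_b, ic)
```

Because every term of both sets enters the sum exactly once, this groupwise score needs no average or maximum over term pairs, the combination approaches the abstract reports as being influenced by the number of terms being combined.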
Affiliation(s)
- Catia Pesquita
- XLDB, Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Campo Grande-Edifício C6, Lisboa, Portugal.
|
37
|
Burgun A, Bodenreider O. Accessing and integrating data and knowledge for biomedical research. Yearb Med Inform 2008:91-101. [PMID: 18660883 PMCID: PMC2553094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
OBJECTIVES To review the issues that have arisen with the advent of translational research in terms of integration of data and knowledge, and survey current efforts to address these issues. METHODS Using examples from the biomedical literature, we identified new trends in biomedical research and their impact on bioinformatics. We analyzed the requirements for effective knowledge repositories and studied issues in the integration of biomedical knowledge. RESULTS New diagnostic and therapeutic approaches based on gene expression patterns have brought about new issues in the statistical analysis of data, and new workflows are needed to support translational research. Interoperable data repositories based on standard annotations, infrastructures and services are needed to support the pooling and meta-analysis of data, as well as their comparison to earlier experiments. High-quality, integrated ontologies and knowledge bases serve as a source of prior knowledge used in combination with traditional data mining techniques and contribute to the development of more effective data analysis strategies. CONCLUSION As biomedical research evolves from traditional clinical and biological investigations towards omics sciences and translational research, specific needs have emerged, including integrating data collected in research studies with patient clinical data, linking omics knowledge with medical knowledge, modeling the molecular basis of diseases, and developing tools that support in-depth analysis of research data. As such, translational research illustrates the need to bridge the gap between bioinformatics and medical informatics, and opens new avenues for biomedical informatics research.
Affiliation(s)
- A Burgun
- Département d'Information Médicale, CHU Pontchaillou, rue Henri Le Guilloux, F-35033 Rennes Cedex, France.
|
38
|
Bodenreider O. Biomedical ontologies in action: role in knowledge management, data integration and decision support. Yearb Med Inform 2008:67-79. [PMID: 18660879 PMCID: PMC2592252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
OBJECTIVES To provide typical examples of biomedical ontologies in action, emphasizing the role played by biomedical ontologies in knowledge management, data integration and decision support. METHODS Biomedical ontologies selected for their practical impact are examined from a functional perspective. Examples of applications are taken from operational systems and the biomedical literature, with a bias towards recent journal articles. RESULTS The ontologies under investigation in this survey include SNOMED CT, the Logical Observation Identifiers, Names, and Codes (LOINC), the Foundational Model of Anatomy, the Gene Ontology, RxNorm, the National Cancer Institute Thesaurus, the International Classification of Diseases, the Medical Subject Headings (MeSH) and the Unified Medical Language System (UMLS). The roles played by biomedical ontologies are classified into three major categories: knowledge management (indexing and retrieval of data and information, access to information, mapping among ontologies); data integration, exchange and semantic interoperability; and decision support and reasoning (data selection and aggregation, decision support, natural language processing applications, knowledge discovery). CONCLUSIONS Ontologies play an important role in biomedical research through a variety of applications. While ontologies are used primarily as a source of vocabulary for standardization and integration purposes, many applications also use them as a source of computable knowledge. Barriers to the use of ontologies in biomedical applications are discussed.
Affiliation(s)
- O Bodenreider
- National Library of Medicine, 8600 Rockville Pike - MS 3841 (Bldg 38A, Rm B1N28U), Bethesda, MD 20894, USA.
|