51
|
Abstract
BACKGROUND Ontologies in biomedicine facilitate information integration, data exchange, search and query of biomedical data, and other critical knowledge-intensive tasks. The OBO Foundry is a collaborative effort to establish a set of principles for ontology development with the eventual goal of creating a set of interoperable reference ontologies in the domain of biomedicine. One of the key requirements to achieve this goal is to ensure that ontology developers reuse term definitions that others have already created rather than create their own definitions, thereby making the ontologies orthogonal. METHODS We used a simple lexical algorithm to analyze the extent to which the set of OBO Foundry candidate ontologies identified from September 2009 to September 2010 conforms to this vision. Specifically, we analyzed (1) the level of explicit term reuse in this set of ontologies, (2) the level of overlap, where two ontologies define similar terms independently, and (3) how the levels of reuse and overlap changed during the course of this year. RESULTS We found that 30% of the ontologies reuse terms from other Foundry candidates and 96% of the candidate ontologies contain terms that overlap with terms from the other ontologies. We found that while term reuse increased among the ontologies between September 2009 and September 2010, the level of overlap among the ontologies remained relatively constant. Additionally, we analyzed the six ontologies announced as OBO Foundry members on March 5, 2010, and identified that the level of overlap was extremely low, but, notably, so was the level of term reuse. CONCLUSIONS We have created a prototype web application that allows OBO Foundry ontology developers to see which classes from their ontologies overlap with classes from other ontologies in the OBO Foundry (http://obomap.bioontology.org). 
From our analysis, we conclude that while the OBO Foundry has made significant progress toward orthogonality during the period of this study through increased adoption of explicit term reuse, a large amount of overlap remains among these ontologies. Furthermore, the characteristics of the identified overlap, such as the terms it comprises and its distribution among the ontologies, indicate that achieving orthogonality will be exceptionally difficult, if not impossible.
|
52
|
Tirrell R, Evani U, Berman AE, Mooney SD, Musen MA, Shah NH. An ontology-neutral framework for enrichment analysis. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2010; 2010:797-801. [PMID: 21347088 PMCID: PMC3041299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
Advanced statistical methods used to analyze high-throughput data (e.g. gene-expression assays) result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is relevant for and extensible to data analysis with other high-throughput measurement modalities such as proteomics, metabolomics, and tissue-microarray assays. With the availability of tools for automatic ontology-based annotation of datasets with terms from biomedical ontologies besides the GO, we need not restrict enrichment analysis to the GO. We describe RANSUM (Rich Annotation Summarizer), which performs enrichment analysis using any ontology in the National Center for Biomedical Ontology's (NCBO) BioPortal. We outline the methodology of enrichment analysis, describe the associated challenges, and discuss novel analyses enabled by RANSUM.
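As a rough illustration of the over-representation idea described in this abstract, the following minimal Python sketch scores each ontology term with a one-sided hypergeometric test. The abstract does not say which statistic RANSUM uses, so the choice of test and the names `hypergeom_sf`, `enrichment`, and `term_to_genes` are assumptions made for illustration only.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def enrichment(term_to_genes, significant, background):
    """Return {term: p-value} for over-representation of each term.

    term_to_genes: hypothetical mapping from an ontology term to the
    genes annotated with it; significant and background are gene lists.
    """
    background = set(background)
    significant = set(significant) & background
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        hits = len(annotated & significant)
        results[term] = hypergeom_sf(
            hits, len(background), len(annotated), len(significant)
        )
    return results
```

A small p-value suggests that the term is annotated to more of the significant genes than chance would predict. A real analysis would also correct for multiple testing across terms, which this sketch omits.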
|
53
|
Parai GK, Jonquet C, Xu R, Musen MA, Shah NH. The Lexicon Builder Web service: Building Custom Lexicons from two hundred Biomedical Ontologies. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2010; 2010:587-591. [PMID: 21347046 PMCID: PMC3041331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
Domain-specific biomedical lexicons are extensively used by researchers for natural language processing tasks. Currently these lexicons are created manually by expert curators, and there is a pressing need for automated methods to compile such lexicons. The Lexicon Builder Web service addresses this need and reduces the investment of time and effort involved in lexicon maintenance. The service has three components: Inclusion - selects one or several ontologies (or their branches) and includes preferred names and synonym terms; Exclusion - filters terms based on the term's Medline frequency, syntactic type, UMLS semantic type and match with stopwords; Output - aggregates information, handles compression and output formats. Evaluation demonstrates that the service has high accuracy and runtime performance. It is currently being evaluated for several use cases to establish its utility in biomedical information processing tasks. The Lexicon Builder promotes collaboration, sharing and standardization of lexicons amongst researchers by automating the creation, maintenance and cross-referencing of custom lexicons.
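The three components named in this abstract (Inclusion, Exclusion, Output) can be pictured as a small pipeline. The Python sketch below is a toy version under stated assumptions: the input shapes, the `build_lexicon` name, and the `min_freq` threshold are hypothetical and are not the Web service's actual interface. It implements only the frequency and stopword filters, leaving out syntactic type and UMLS semantic type.

```python
def build_lexicon(ontologies, medline_freq, stopwords, min_freq=1):
    """Toy three-stage lexicon pipeline (all names are illustrative).

    ontologies: list of ontologies, each a list of concept dicts with a
    "preferred_name" and an optional "synonyms" list.
    medline_freq: hypothetical mapping from a lowercased term to its
    MEDLINE occurrence count.
    """
    # Inclusion: collect preferred names and synonyms from every ontology.
    terms = set()
    for ontology in ontologies:
        for concept in ontology:
            terms.add(concept["preferred_name"].lower())
            terms.update(s.lower() for s in concept.get("synonyms", []))

    # Exclusion: drop stopwords and terms that are too rare in MEDLINE.
    kept = [
        t for t in terms
        if t not in stopwords and medline_freq.get(t, 0) >= min_freq
    ]

    # Output: return the lexicon in a stable, sorted order.
    return sorted(kept)
```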
|
54
|
Tudorache T, Falconer S, Nyulas C, Storey MA, Ustün TB, Musen MA. Supporting the Collaborative Authoring of ICD-11 with WebProtégé. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2010; 2010:802-806. [PMID: 21347089 PMCID: PMC3041458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
The World Health Organization (WHO) is well under way with the new revision of the International Classification of Diseases (ICD-11). The current revision process is significantly different from past ones: the ICD-11 authoring is now open to a large international community of medical experts, who perform the authoring in a web-based collaborative platform. The classification is also embracing a more formal representation that is suitable for electronic health records. We present the ICD Collaborative Authoring Tool (iCAT), a customization of the WebProtégé editor that supports the community-based authoring of ICD-11 on the Web and provides features such as discussion threads integrated in the authoring process, change tracking, and content reviewing. The WHO editors evaluated the initial version of iCAT and found the tool intuitive and easy to learn. They also identified potential improvements and new requirements for large-scale collaboration support. A demo version of the tool is available at: http://icatdemo.stanford.edu.
|
55
|
Xu R, Musen MA, Shah NH. A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2010; 2010:907-911. [PMID: 21347110 PMCID: PMC3041393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
The Unified Medical Language System (UMLS) Metathesaurus is widely used for biomedical natural language processing (NLP) tasks. In this study, we systematically analyzed UMLS Metathesaurus terms by examining their occurrences in over 18 million MEDLINE abstracts. Our goals were to (1) analyze the frequency and syntactic distribution of Metathesaurus terms in MEDLINE; (2) create a filtered UMLS Metathesaurus based on the MEDLINE analysis; and (3) augment the UMLS Metathesaurus so that each term is associated with metadata on its MEDLINE frequency and syntactic distribution statistics. After MEDLINE frequency-based filtering, the augmented UMLS Metathesaurus contains 518,835 terms and is roughly 13% of its original size. We have shown that the syntactic and frequency information is useful to identify errors in the Metathesaurus. This filtered and augmented UMLS Metathesaurus can potentially be used to improve efficiency and precision of UMLS-based information retrieval and NLP tasks.
|
56
|
Abstract
Background Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. As the use of ontologies for annotation of biomedical datasets has risen, a common challenge is to identify ontologies that are best suited to annotating specific datasets. The number and variety of biomedical ontologies is large, and it is cumbersome for a researcher to figure out which ontology to use. Methods We present the Biomedical Ontology Recommender web service. The system uses textual metadata or a set of keywords describing a domain of interest and suggests appropriate ontologies for annotating or representing the data. The service makes a decision based on three criteria. The first is coverage, which favors the ontologies that provide the most terms covering the input text. The second is connectivity, which favors the ontologies that are most often mapped to by other ontologies. The final criterion is size, or the number of concepts in the ontologies. The service scores the ontologies as a function of scores of the annotations created using the National Center for Biomedical Ontology (NCBO) Annotator web service. We used all the ontologies from the UMLS Metathesaurus and the NCBO BioPortal. Results We compare our Recommender with previously published efforts through an exhaustive functional comparison. We evaluate and discuss the results of several recommendation heuristics in the context of three real-world use cases. The best recommendation heuristics, rated ‘very relevant’ by expert evaluators, are the ones based on coverage and connectivity criteria. The Recommender service (alpha version) is available to the community and is embedded into BioPortal.
|
57
|
O’Connor MJ, Halaschek-Wiener C, Musen MA. Mapping Master: A Flexible Approach for Mapping Spreadsheets to OWL. LECTURE NOTES IN COMPUTER SCIENCE 2010. [DOI: 10.1007/978-3-642-17749-1_13] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 12/23/2022]
|
58
|
Ghazvinian A, Noy NF, Musen MA. Creating mappings for ontologies in biomedicine: simple methods work. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2009; 2009:198-202. [PMID: 20351849 PMCID: PMC2815474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
Creating mappings between concepts in different ontologies is a critical step in facilitating data integration. In recent years, researchers have developed many elaborate algorithms that use graph structure, background knowledge, machine learning and other techniques to generate mappings between ontologies. We compared the performance of these advanced algorithms on creating mappings for biomedical ontologies with the performance of a simple mapping algorithm that relies on lexical matching. Our evaluation has shown that (1) most of the advanced algorithms are either not publicly available or do not scale to the size of biomedical ontologies today, and (2) for many biomedical ontologies, simple lexical matching methods outperform most of the advanced algorithms in both precision and recall. Our results have practical implications for biomedical researchers who need to create alignments for their ontologies.
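To make "simple lexical matching" concrete, here is a minimal Python sketch that proposes a mapping whenever two concepts share a normalized label or synonym. It is not the algorithm evaluated in the paper, whose details the abstract does not give. The normalization rule and the names `normalize` and `lexical_mappings` are assumptions for illustration.

```python
import re
from collections import defaultdict


def normalize(label):
    """Lowercase a label and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def lexical_mappings(source, target):
    """Map concepts whose normalized labels or synonyms are identical.

    source, target: {concept_id: [label, synonym, ...]} (hypothetical shape).
    Returns a sorted list of (source_id, target_id) pairs.
    """
    # Index every normalized target label for constant-time lookup.
    index = defaultdict(set)
    for target_id, labels in target.items():
        for label in labels:
            key = normalize(label)
            if key:
                index[key].add(target_id)

    pairs = set()
    for source_id, labels in source.items():
        for label in labels:
            for target_id in index.get(normalize(label), ()):
                pairs.add((source_id, target_id))
    return sorted(pairs)
```

Because the target labels are indexed once, the cost grows roughly linearly with the total number of labels, which is one plausible reason an approach of this kind can scale to large ontologies.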
|
59
|
Izadi M, Buckeridge D, Okhmatovskaia A, Tu SW, O'Connor MJ, Nyulas C, Musen MA. A Bayesian network model for analysis of detection performance in surveillance systems. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2009; 2009:276-280. [PMID: 20351864 PMCID: PMC2815401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
Worldwide developments concerning infectious diseases and bioterrorism are driving forces for improving aberrancy detection in public health surveillance. The performance of an aberrancy detection algorithm can be measured in terms of sensitivity, specificity and timeliness. However, these metrics are probabilistically dependent variables and there is always a trade-off between them. This situation raises the question of how to quantify this tradeoff. The answer to this question depends on the characteristics of the specific disease under surveillance, the characteristics of data used for surveillance, and the algorithmic properties of detection methods. In practice, the evidence describing the relative performance of different algorithms remains fragmented and mainly qualitative. In this paper, we consider the development and evaluation of a Bayesian network framework for analysis of performance measures of aberrancy detection algorithms. This framework enables principled comparison of algorithms and identification of suitable algorithms for use in specific public health surveillance settings.
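The three performance measures named in this abstract can be computed from a list of alarm days and known outbreak intervals. The Python sketch below uses one common set of definitions, which may differ from those in the paper: sensitivity per outbreak, specificity per non-outbreak day, and timeliness as the mean delay to the first alarm. The `detection_metrics` name and its input format are hypothetical.

```python
def detection_metrics(alarm_days, outbreaks, n_days):
    """Return (sensitivity, specificity, timeliness) for one alarm series.

    alarm_days: day indices on which the algorithm raised an alarm.
    outbreaks: list of (start_day, end_day) intervals, inclusive.
    n_days: length of the surveillance period (days 0 .. n_days - 1).
    """
    alarms = set(alarm_days)
    outbreak_days = set()
    delays = []  # days from outbreak start to first alarm, if detected
    for start, end in outbreaks:
        days = range(start, end + 1)
        outbreak_days.update(days)
        hits = [d for d in days if d in alarms]
        if hits:
            delays.append(hits[0] - start)

    quiet_days = [d for d in range(n_days) if d not in outbreak_days]
    false_alarms = sum(1 for d in quiet_days if d in alarms)

    sensitivity = len(delays) / len(outbreaks) if outbreaks else None
    specificity = 1 - false_alarms / len(quiet_days) if quiet_days else None
    timeliness = sum(delays) / len(delays) if delays else None
    return sensitivity, specificity, timeliness
```

Lowering an alarm threshold would typically raise sensitivity and shorten delays while adding false alarms, which is the trade-off the abstract refers to.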
|
60
|
O'Connor MJ, Nyulas C, Tu S, Buckeridge DL, Okhmatovskaia A, Musen MA. Software-engineering challenges of building and deploying reusable problem solvers. ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN, ANALYSIS AND MANUFACTURING : AI EDAM 2009; 23:339-356. [PMID: 23565031 PMCID: PMC3615443 DOI: 10.1017/s0890060409990047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/02/2023]
Abstract
Problem-solving methods (PSMs) are software components that represent and encode reusable algorithms. They can be combined with representations of domain knowledge to produce intelligent application systems. A goal of research on PSMs is to provide principled methods and tools for composing and reusing algorithms in knowledge-based systems. The ultimate objective is to produce libraries of methods that can be easily adapted for use in these systems. Despite the intuitive appeal of PSMs as conceptual building blocks, in practice, these goals are largely unmet. There are no widely available tools for building applications using PSMs and no public libraries of PSMs available for reuse. This paper analyzes some of the reasons for the lack of widespread adoption of PSM techniques and illustrates our analysis by describing our experiences developing a complex, high-throughput software system based on PSM principles. We conclude that many fundamental principles in PSM research are useful for building knowledge-based systems. In particular, the task-method decomposition process, which provides a means for structuring knowledge-based tasks, is a powerful abstraction for building systems of analytic methods. However, despite the power of PSMs in the conceptual modeling of knowledge-based systems, software engineering challenges have been seriously underestimated. The complexity of integrating control knowledge modeled by developers using PSMs with the domain knowledge that they model using ontologies creates a barrier to widespread use of PSM-based systems. Nevertheless, the surge of recent interest in ontologies has led to the production of comprehensive domain ontologies and of robust ontology-authoring tools. These developments present new opportunities to leverage the PSM approach.
|
61
|
Shah NH, Bhatia N, Jonquet C, Rubin D, Chiang AP, Musen MA. Comparison of concept recognizers for building the Open Biomedical Annotator. BMC Bioinformatics 2009; 10 Suppl 9:S14. [PMID: 19761568 PMCID: PMC2745685 DOI: 10.1186/1471-2105-10-s9-s14] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.7] [Indexed: 11/10/2022] Open
Abstract
The National Center for Biomedical Ontology (NCBO) is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1). The system's indexing workflow processes the text metadata of diverse resources such as datasets from GEO and ArrayExpress to annotate and index them with concepts from appropriate ontologies. This indexing requires the use of a concept-recognition tool to identify ontology concepts in the resource's textual metadata. In this paper, we present a comparison of two concept recognizers – NLM's MetaMap and the University of Michigan's Mgrep. We utilize a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale service-oriented applications. Based on our analysis we also suggest areas of potential improvement for Mgrep. We have subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of their data.
|
62
|
Supekar KS, Musen MA, Menon V. Development of Large-Scale Functional Brain Networks in Children. Neuroimage 2009. [DOI: 10.1016/s1053-8119(09)70984-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022] Open
|
63
|
Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, Griffith N, Jonquet C, Rubin DL, Storey MA, Chute CG, Musen MA. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res 2009; 37:W170-3. [PMID: 19483092 PMCID: PMC2703982 DOI: 10.1093/nar/gkp440] [Citation(s) in RCA: 356] [Impact Index Per Article: 23.7] [Indexed: 11/13/2022] Open
Abstract
Biomedical ontologies provide essential domain knowledge to drive data integration, information retrieval, data annotation, natural-language processing and decision support. BioPortal (http://bioportal.bioontology.org) is an open repository of biomedical ontologies that provides access via Web services and Web browsers to ontologies developed in OWL, RDF, OBO format and Protégé frames. BioPortal functionality includes the ability to browse, search and visualize ontologies. The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. BioPortal also enables integrated search of biomedical data resources such as the Gene Expression Omnibus (GEO), ClinicalTrials.gov, and ArrayExpress, through the annotation and indexing of these resources with ontologies in BioPortal. Thus, BioPortal not only provides investigators, clinicians, and developers 'one-stop shopping' to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources.
|
64
|
Jonquet C, Shah NH, Musen MA. The open biomedical annotator. SUMMIT ON TRANSLATIONAL BIOINFORMATICS 2009; 2009:56-60. [PMID: 21347171 PMCID: PMC3041576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The range of publicly available biomedical data is enormous and is expanding fast. This expansion means that researchers now face a hurdle in extracting the data they need from the large amount of data that is available. Biomedical researchers have turned to ontologies and terminologies to structure and annotate their data with ontology concepts for better search and retrieval. However, this annotation process cannot be easily automated and often requires expert curators. In addition, there is a lack of easy-to-use systems that facilitate the use of ontologies for annotation. This paper presents the Open Biomedical Annotator (OBA), an ontology-based Web service that annotates public datasets with biomedical ontology concepts based on their textual metadata (www.bioontology.org). The biomedical community can use the annotator service to tag datasets automatically with ontology terms (from UMLS and NCBO BioPortal ontologies). Such annotations facilitate translational discoveries by integrating annotated data.
|
65
|
Shah NH, Jonquet C, Chiang AP, Butte AJ, Chen R, Musen MA. Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009; 10 Suppl 2:S1. [PMID: 19208184 PMCID: PMC2646250 DOI: 10.1186/1471-2105-10-s2-s1] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Indexed: 11/10/2022] Open
Abstract
The volume of publicly available genomic scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields describing the pathological state of the studied sample. These annotations are not mapped to concepts in any ontology, making it difficult to integrate these datasets across repositories. We have previously developed methods to map text-annotations of tissue microarrays to concepts in the NCI thesaurus and SNOMED-CT. In this work we generalize our methods to map text annotations of gene expression datasets to concepts in the UMLS. We demonstrate the utility of our methods by processing annotations of datasets in the Gene Expression Omnibus. We demonstrate that we enable ontology-based querying and integration of tissue and gene expression microarray data. We enable identification of datasets on specific diseases across both repositories. Our approach provides the basis for ontology-driven data integration for translational research on gene and protein expression data. Based on this work we have built a prototype system for ontology based annotation and indexing of biomedical data. The system processes the text metadata of diverse resource elements such as gene expression data sets, descriptions of radiology images, clinical-trial reports, and PubMed article abstracts to annotate and index them with concepts from appropriate ontologies. The key functionality of this system is to enable users to locate biomedical data resources related to particular ontology concepts.
|
66
|
Rubin DL, Talos IF, Halle M, Musen MA, Kikinis R. Computational neuroanatomy: ontology-based representation of neural components and connectivity. BMC Bioinformatics 2009; 10 Suppl 2:S3. [PMID: 19208191 PMCID: PMC2646240 DOI: 10.1186/1471-2105-10-s2-s3] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND A critical challenge in neuroscience is organizing, managing, and accessing the explosion in neuroscientific knowledge, particularly anatomic knowledge. We believe that explicit knowledge-based approaches to make neuroscientific knowledge computationally accessible will be helpful in tackling this challenge and will enable a variety of applications exploiting this knowledge, such as surgical planning. RESULTS We developed ontology-based models of neuroanatomy to enable symbolic lookup, logical inference and mathematical modeling of neural systems. We built a prototype model of the motor system that integrates descriptive anatomic and qualitative functional neuroanatomical knowledge. In addition to modeling normal neuroanatomy, our approach provides an explicit representation of abnormal neural connectivity in disease states, such as common movement disorders. The ontology-based representation encodes both structural and functional aspects of neuroanatomy. The ontology-based models can be evaluated computationally, enabling development of automated computer reasoning applications. CONCLUSION Neuroanatomical knowledge can be represented in machine-accessible format using ontologies. Computational neuroanatomical approaches such as described in this work could become a key tool in translational informatics, leading to decision support applications that inform and guide surgical planning and personalized care for neurological disease in the future.
|
67
|
Noy NF, Tudorache T, de Coronado S, Musen MA. Developing biomedical ontologies collaboratively. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2008; 2008:520-524. [PMID: 18998901 PMCID: PMC2656043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2008] [Revised: 07/15/2008] [Indexed: 05/27/2023]
Abstract
The development of ontologies that define entities and relationships among them has become essential for modern work in biomedicine. Ontologies are becoming so large in their coverage that no single centralized group of people can develop them effectively, and ontology development becomes a community-based enterprise. In this paper we present Collaborative Protégé, a prototype tool that supports many aspects of community-based development, such as discussions integrated with the ontology-editing process, chats, and annotation of changes. We have evaluated Collaborative Protégé in the context of the NCI Thesaurus development. Users have found the tool effective for carrying out discussions and recording design rationale.
|
68
|
Musen MA, Shah NH, Noy NF, Dai BY, Dorf M, Griffith N, Buntrok J, Jonquet C, Montegut MJ, Rubin DL. BioPortal: ontologies and data resources with the click of a mouse. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2008:1223-1224. [PMID: 18999306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2008] [Revised: 07/15/2008] [Indexed: 05/27/2023]
|
69
|
Buckeridge DL, Okhmatovskaia A, Tu S, O'Connor M, Nyulas C, Musen MA. Predicting outbreak detection in public health surveillance: quantitative analysis to enable evidence-based method selection. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2008; 2008:76-80. [PMID: 18999264 PMCID: PMC2656053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2008] [Revised: 07/07/2008] [Indexed: 05/27/2023]
Abstract
Public health surveillance is critical for accurate and timely outbreak detection and effective epidemic control. A wide range of statistical algorithms is used for surveillance, and important differences have been noted in the ability of these algorithms to detect outbreaks. The evidence about the relative performance of these algorithms, however, remains limited and mainly qualitative. Using simulated outbreak data, we developed and validated quantitative models for predicting the ability of commonly used surveillance algorithms to detect different types of outbreaks. The developed models accurately predict the ability of different algorithms to detect different types of outbreaks. These models enable evidence-based algorithm selection and can guide research into algorithm development.
|
70
|
Buckeridge DL, Okhmatovskaia A, Tu S, O'Connor M, Nyulas C, Musen MA. Understanding detection performance in public health surveillance: modeling aberrancy-detection algorithms. J Am Med Inform Assoc 2008; 15:760-9. [PMID: 18755992 PMCID: PMC2585528 DOI: 10.1197/jamia.m2799] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 03/19/2008] [Accepted: 07/25/2008] [Indexed: 01/04/2023] Open
Abstract
OBJECTIVE Statistical aberrancy-detection algorithms play a central role in automated public health systems, analyzing large volumes of clinical and administrative data in real-time with the goal of detecting disease outbreaks rapidly and accurately. Not all algorithms perform equally well in terms of sensitivity, specificity, and timeliness in detecting disease outbreaks and the evidence describing the relative performance of different methods is fragmented and mainly qualitative. DESIGN We developed and evaluated a unified model of aberrancy-detection algorithms and a software infrastructure that uses this model to conduct studies to evaluate detection performance. We used a task-analytic methodology to identify the common features and meaningful distinctions among different algorithms and to provide an extensible framework for gathering evidence about the relative performance of these algorithms using a number of evaluation metrics. We implemented our model as part of a modular software infrastructure (Biological Space-Time Outbreak Reasoning Module, or BioSTORM) that allows configuration, deployment, and evaluation of aberrancy-detection algorithms in a systematic manner. MEASUREMENT We assessed the ability of our model to encode the commonly used EARS algorithms and the ability of the BioSTORM software to reproduce an existing evaluation study of these algorithms. RESULTS Using our unified model of aberrancy-detection algorithms, we successfully encoded the EARS algorithms, deployed these algorithms using BioSTORM, and were able to reproduce and extend previously published evaluation results. CONCLUSION The validated model of aberrancy-detection algorithms and its software implementation will enable principled comparison of algorithms, synthesis of results from evaluation studies, and identification of surveillance algorithms for use in specific public health settings.
|
71
|
Balduccini M, Baral C, Brodaric B, Colton S, Fox P, Gutelius D, Hinkelmann K, Horswill I, Huberman B, Hudlicka E, Lerman K, Lisetti C, McGuinness DL, Maher ML, Musen MA, Sahami M, Sleeman D, Thönssen B, Velasquez JD, Ventura D. AAAI 2008 Spring Symposia Reports. AI MAG 2008. [DOI: 10.1609/aimag.v29i3.2148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
The Association for the Advancement of Artificial Intelligence (AAAI) was pleased to present the AAAI 2008 Spring Symposium Series, held Wednesday through Friday, March 26–28, 2008 at Stanford University, California. The titles of the eight symposia were as follows: (1) AI Meets Business Rules and Process Management, (2) Architectures for Intelligent Theory-Based Agents, (3) Creative Intelligent Systems, (4) Emotion, Personality, and Social Behavior, (5) Semantic Scientific Knowledge Integration, (6) Social Information Processing, (7) Symbiotic Relationships between Semantic Web and Knowledge Engineering, (8) Using AI to Motivate Greater Participation in Computer Science. The goal of the AI Meets Business Rules and Process Management AAAI symposium was to investigate the various approaches and standards to represent business rules, business process management and the semantic web with respect to expressiveness and reasoning capabilities. The focus of the Architectures for Intelligent Theory-Based Agents AAAI symposium was the definition of architectures for intelligent theory-based agents, comprising languages, knowledge representation methodologies, reasoning algorithms, and control loops. The Creative Intelligent Systems Symposium included five major discussion sessions and a general poster session (in which all contributing papers were presented). The purpose of this symposium was to explore the synergies between creative cognition and intelligent systems. The goal of the Emotion, Personality, and Social Behavior symposium was to examine fundamental issues in affect and personality in both biological and artificial agents, focusing on the roles of these factors in mediating social behavior. The Semantic Scientific Knowledge Integration symposium aimed to bring together the semantic technologies community with the scientific information technology community in an effort to build the general semantic science information community. 
The Social Information Processing's goal was to investigate computational and analytic approaches that will enable users to harness the efforts of large numbers of other users to solve a variety of information processing problems, from discovering high-quality content to managing common resources. The goal of the Symbiotic Relationships between the Semantic Web and Software Engineering symposium was to explore how the lessons learned by the knowledge-engineering community over the past three decades could be applied to the bold research agenda of current workers in semantic web technologies. The purpose of the Using AI to Motivate Greater Participation in Computer Science symposium was to identify ways that topics in AI may be used to motivate greater student participation in computer science by highlighting fun, engaging, and intellectually challenging developments in AI-related curriculum at a number of educational levels. Technical reports of the symposia were published by AAAI Press.
|
72
|
Noy NF, Griffith N, Musen MA. Collecting Community-Based Mappings in an Ontology Repository. LECTURE NOTES IN COMPUTER SCIENCE 2008. [DOI: 10.1007/978-3-540-88564-1_24] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 12/12/2022]
|
73
|
Noy NF, de Coronado S, Solbrig H, Fragoso G, Hartel FW, Musen MA. Representing the NCI Thesaurus in OWL DL: Modeling tools help modeling languages. APPLIED ONTOLOGY 2008; 3:173-190. [PMID: 19789731 PMCID: PMC2753293 DOI: 10.3233/ao-2008-0051] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 05/28/2023]
Abstract
The National Cancer Institute's (NCI) Thesaurus is a biomedical reference ontology. The NCI Thesaurus is represented using description logic, more specifically Ontylog, a description logic implemented by Apelon, Inc. We are exploring the use of the DL species of the Web Ontology Language (OWL DL), a W3C-recommended standard for ontology representation, instead of Ontylog for representing the NCI Thesaurus. We have studied the requirements for knowledge representation of the NCI Thesaurus and considered how OWL DL (and its implementation in Protégé-OWL) satisfies these requirements. In this paper, we discuss the areas where OWL DL was sufficient for representing required components, where tool support that would hide some of the complexity and extra levels of indirection would be required, and where language expressiveness is not sufficient given the representation requirements. Because many of the knowledge-representation issues that we encountered are very similar to the issues in representing other biomedical terminologies and ontologies in general, we believe that the lessons that we learned and the approaches that we developed will prove useful and informative for other researchers.
|
74
|
Rubin DL, Lewis SE, Mungall CJ, Misra S, Westerfield M, Ashburner M, Sim I, Chute CG, Solbrig H, Storey MA, Smith B, Day-Richter J, Noy NF, Musen MA. National Center for Biomedical Ontology: advancing biomedicine through structured organization of scientific knowledge. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2007; 10:185-98. [PMID: 16901225 DOI: 10.1089/omi.2006.10.185] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Indexed: 11/13/2022]
Abstract
The National Center for Biomedical Ontology is a consortium that comprises leading informaticians, biologists, clinicians, and ontologists, funded by the National Institutes of Health (NIH) Roadmap, to develop innovative technology and methods that allow scientists to record, manage, and disseminate biomedical information and knowledge in machine-processable form. The goals of the Center are (1) to help unify the divergent and isolated efforts in ontology development by promoting high-quality, open-source, standards-based tools to create, manage, and use ontologies, (2) to create new software tools so that scientists can use ontologies to annotate and analyze biomedical data, (3) to provide a national resource for the ongoing evaluation, integration, and evolution of biomedical ontologies and associated tools and theories in the context of driving biomedical projects (DBPs), and (4) to disseminate the tools and resources of the Center and to identify, evaluate, and communicate best practices of ontology development to the biomedical community. Through the research activities within the Center, collaborations with the DBPs, and interactions with the biomedical community, our goal is to help scientists work more effectively in the e-science paradigm, enhancing experiment design, experiment execution, data analysis, information synthesis, hypothesis generation and testing, and the understanding of human disease.
|
75
|
Moreira DA, Shah NH, Musen MA. Interpretation errors related to the GO annotation file format. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2007; 2007:538-542. [PMID: 18693894 PMCID: PMC2655813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2007] [Revised: 07/06/2007] [Accepted: 10/11/2007] [Indexed: 05/26/2023]
Abstract
The Gene Ontology (GO) is the most widely used ontology for creating biomedical annotations. GO annotations are statements associating a biological entity with a GO term. These statements comprise a large dataset of biological knowledge that is used widely in biomedical research. GO annotations are available as "gene association files" from the GO website in a tab-delimited file format (GO Annotation File Format) composed of rows of 15 tab-delimited fields. This simple format lacks the knowledge representation (KR) capabilities needed to represent the semantic relationships between fields unambiguously. This paper demonstrates that this KR shortcoming leads users to interpret the files in ways that can be erroneous. We propose a complementary format to represent GO annotation files as knowledge bases using the W3C-recommended Web Ontology Language (OWL).
|