1
Fang C, Guo F, Zhao X, Zhang Z, Lu J, Pan H, Xu T, Li W, Yang M, Huang Y, Zhao Y, Zhao S. Biological mechanisms of growth performance and meat quality in porcine muscle tissue. Anim Biotechnol 2021; 33:1246-1254. [PMID: 33704018 DOI: 10.1080/10495398.2021.1886939] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/18/2023]
Abstract
Growth performance and meat quality are important traits for pig production. The aim of the present study was to investigate the molecular mechanisms underlying growth performance and meat quality, and to identify novel target molecules for predicting these traits. The differentially expressed genes (DEGs) between Diannan small ears pigs (DSP) and Landrace pigs (LP) were assessed by RNA sequencing. A total of 339 DEGs were obtained between DSP and LP: 146 were upregulated in LP compared with DSP, and 193 were upregulated in DSP compared with LP. The DEGs were significantly enriched in 26 GO terms and 3 KEGG pathways. A protein-protein interaction (PPI) network with 201 nodes and 382 edges was constructed, and 5 modules were extracted from the entire network. The upregulated expression of genes involved in glycolysis, myogenesis, and the extracellular matrix may be associated with fast body and muscle deposition rates in LP. Increased expression of genes involved in the PPAR signaling pathway, fatty acid metabolism, and oxidative phosphate processes could be related to intramuscular fat deposition and meat quality in DSP. The present study may provide an improved understanding of growth performance and meat quality.
Affiliation(s)
- Chen Fang
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
- Fei Guo
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
- Xiaoqi Zhao
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
  - Institute of Herbivorous Livestock, Yunnan Academy of Animal Sciences, Kunming, China
- Zining Zhang
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
- Junlan Lu
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
- Hongbin Pan
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
- Taojie Xu
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
- Weizhen Li
  - College of Veterinary Medicine, Yunnan Agricultural University, Kunming, China
- Minghua Yang
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
- Ying Huang
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
- Yanguang Zhao
  - Research Institute of Pig and Animal Nutrition, Yunnan Academy of Animal Sciences, Kunming, China
- Sumei Zhao
  - Yunnan Key Laboratory of Animal Nutrition and Feed Science, Yunnan Agricultural University, Kunming, China
2
Abstract
Resource Description Framework (RDF) can be seen as a solution in today's landscape of knowledge representation research. An RDF language has symmetrical features because subjects and objects in triples can be used interchangeably. Moreover, the regularity and symmetry of the RDF language allow knowledge representation that is easily processed by machines, and because its structure is similar to natural languages, it is reasonably readable for people. RDF provides some useful features for generalized knowledge representation. Its distributed nature, due to its identifier grounding in IRIs, naturally scales to the size of the Web. However, its use is often hidden from view, and it is therefore one of the less well-known knowledge representation frameworks. We therefore summarise RDF v1.0 and v1.1 to broaden its audience within the knowledge representation community. This article reviews current approaches, tools, and applications for mapping from relational databases to RDF and from XML to RDF. We discuss RDF serializations, including formats with support for multiple graphs, and we analyze RDF compression proposals. Finally, we present a summarized formal definition of RDF 1.1 that provides additional insights into the modeling of reification, blank nodes, and entailments.
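The triple model described in this abstract can be illustrated with a minimal sketch. It uses plain Python tuples rather than an RDF library; the `example.org` IRIs, the helper names `term`, `to_ntriples`, and `match`, and the sample data are all assumptions made for illustration and are not taken from the reviewed article. Blank nodes, datatypes, and language tags are left out.

```python
# Illustrative only: a tiny in-memory set of triples using plain tuples.
# All IRIs below are hypothetical (example.org) and not from the article.

EX = "http://example.org/"

triples = [
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "bob", EX + "knows", EX + "alice"),
    (EX + "alice", EX + "name", "Alice"),
]


def term(value):
    """Render an IRI in angle brackets and anything else as a string literal."""
    if value.startswith(("http://", "https://")):
        return "<" + value + ">"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(graph):
    """Serialize triples one per line in an N-Triples-like form."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in graph)


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(to_ntriples(triples))
print(match(triples, p=EX + "knows"))
```

Because every statement has the same three-part shape, the same `match` helper answers "who does alice know?" and "who knows alice?" simply by moving the wildcard between the subject and object positions.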
3
Kamdar MR, Fernández JD, Polleres A, Tudorache T, Musen MA. Enabling Web-scale data integration in biomedicine through Linked Open Data. NPJ Digit Med 2019; 2:90. [PMID: 31531395 PMCID: PMC6736878 DOI: 10.1038/s41746-019-0162-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/09/2019] [Accepted: 08/06/2019] [Indexed: 01/17/2023]
Abstract
The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the wide-spread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.
Affiliation(s)
- Maulik R. Kamdar
  - Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Javier D. Fernández
  - Vienna University of Economics & Business, Vienna, Austria
  - Complexity Science Hub Vienna, Vienna, Austria
- Axel Polleres
  - Vienna University of Economics & Business, Vienna, Austria
  - Complexity Science Hub Vienna, Vienna, Austria
- Tania Tudorache
  - Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Mark A. Musen
  - Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
4
Gu F, Zhao C, Jiang T, Li X, Mao Y, Zhou C. Association Between Nicotine-dependent Gene Polymorphism and Smoking Cessation in Patients With Lung Cancer. Clin Lung Cancer 2019; 21:171-176. [PMID: 31402126 DOI: 10.1016/j.cllc.2019.07.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/07/2019] [Revised: 05/31/2019] [Accepted: 07/09/2019] [Indexed: 12/15/2022]
Abstract
BACKGROUND Patients with lung cancer continue to smoke owing to complex factors. Failure to quit smoking (defined as nicotine dependence) is significantly associated with genetic status. This study aimed to investigate the relationship between polymorphisms in nicotine dependence genes and smoking status after the diagnosis of lung cancer. PATIENTS AND METHODS A total of 240 patients with lung cancer were included from July 2017 to March 2018. According to the actual smoking condition after lung cancer diagnosis, eligible patients were divided into 3 groups: the never-smoking group, the failure to quit smoking group, and the successful smoking cessation group. Fagerstrom Test for Nicotine Dependence scores were used to evaluate the smoking status of each group. Three nicotine-dependent genes with 6 loci were detected. RESULTS Among the 240 patients, 86 were never-smokers, 51 failed to quit smoking, and 104 successfully quit smoking. The age at smoking initiation in the failure to quit smoking group was significantly younger than that in the successful smoking cessation group (P = .001). There was a significant difference in the GG, AG, and AA genotype distributions of CHRNA3 (rs578776) among the 3 groups (P = .003). There was also a significant difference in the distribution of CHRNA4 (rs2229959) genotypes among the 3 groups (P = .003). However, there was no significant difference in the genotype distribution of CHRNA5 (rs588765) among the 3 groups (P = .277). CONCLUSIONS Gene polymorphisms of CHRNA3 (rs578776) and CHRNA4 (rs1044396 and rs2229959) were associated with the success of smoking cessation after the diagnosis of lung cancer, which should be considered in the management of smoking cessation after patients are diagnosed with lung cancer.
Affiliation(s)
- Fen Gu
  - Department of Medical Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
- Chao Zhao
  - Department of Medical Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
- Tao Jiang
  - Department of Medical Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
- Xuefei Li
  - Department of Medical Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
- Yanjun Mao
  - Nursing Department, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
- Caicun Zhou
  - Department of Medical Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
5
Sahoo SS, Valdez J, Rueschman M. Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:1070-1079. [PMID: 28269904 PMCID: PMC5333253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Scientific reproducibility is key to scientific progress as it allows the research community to build on validated results, protect patients from potentially harmful trial drugs derived from incorrect results, and reduce wastage of valuable resources. The National Institutes of Health (NIH) recently published a systematic guideline titled "Rigor and Reproducibility" for supporting reproducible research studies, which has also been accepted by several scientific journals. These journals will require published articles to conform to these new guidelines. Provenance metadata describes the history or origin of data, and it has long been used in computer science to capture metadata information for ensuring data quality and supporting scientific reproducibility. In this paper, we describe the development of the Provenance for Clinical and healthcare Research (ProvCaRe) framework together with a provenance ontology to support scientific reproducibility by formally modeling a core set of data elements representing details of a research study. We extend the PROV Ontology (PROV-O), which has been recommended as the provenance representation model by the World Wide Web Consortium (W3C), to represent both (a) data provenance and (b) process provenance. We use 124 study variables from 6 clinical research studies from the National Sleep Research Resource (NSRR) to evaluate the coverage of the provenance ontology. NSRR is the largest repository of NIH-funded sleep datasets with 50,000 studies from 36,000 participants. The provenance ontology reuses ontology concepts from existing biomedical ontologies, for example the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), to model the provenance information of research studies. The ProvCaRe framework is being developed as part of the Big Data to Knowledge (BD2K) data provenance project.
Affiliation(s)
- Satya S Sahoo
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH
- Joshua Valdez
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH
- Michael Rueschman
  - Department of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
6
Sahoo SS, Ramesh P, Welter E, Bukach A, Valdez J, Tatsuoka C, Bamps Y, Stoll S, Jobst BC, Sajatovic M. Insight: An ontology-based integrated database and analysis platform for epilepsy self-management research. Int J Med Inform 2016; 94:21-30. [PMID: 27573308 PMCID: PMC5010027 DOI: 10.1016/j.ijmedinf.2016.06.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/04/2016] [Revised: 06/15/2016] [Accepted: 06/18/2016] [Indexed: 11/18/2022]
Abstract
We present Insight as an integrated database and analysis platform for epilepsy self-management research as part of the national Managing Epilepsy Well Network. Insight is the only available informatics platform for accessing and analyzing integrated data from multiple epilepsy self-management research studies with several new data management features and user-friendly functionalities. The features of Insight include, (1) use of Common Data Elements defined by members of the research community and an epilepsy domain ontology for data integration and querying, (2) visualization tools to support real time exploration of data distribution across research studies, and (3) an interactive visual query interface for provenance-enabled research cohort identification. The Insight platform contains data from five completed epilepsy self-management research studies covering various categories of data, including depression, quality of life, seizure frequency, and socioeconomic information. The data represents over 400 participants with 7552 data points. The Insight data exploration and cohort identification query interface has been developed using Ruby on Rails Web technology and open source Web Ontology Language Application Programming Interface to support ontology-based reasoning. We have developed an efficient ontology management module that automatically updates the ontology mappings each time a new version of the Epilepsy and Seizure Ontology is released. The Insight platform features a Role-based Access Control module to authenticate and effectively manage user access to different research studies. User access to Insight is managed by the Managing Epilepsy Well Network database steering committee consisting of representatives of all current collaborating centers of the Managing Epilepsy Well Network. 
New research studies are being continuously added to the Insight database and the size as well as the unique coverage of the dataset allows investigators to conduct aggregate data analysis that will inform the next generation of epilepsy self-management studies.
Affiliation(s)
- Satya S Sahoo
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
  - Electrical Engineering and Computer Science Department, School of Engineering, Case Western Reserve University, Cleveland, OH 44106, United States
- Priya Ramesh
  - Electrical Engineering and Computer Science Department, School of Engineering, Case Western Reserve University, Cleveland, OH 44106, United States
- Elisabeth Welter
  - Neurological Institute, University Hospitals Case Medical Center, Cleveland, OH 44106, United States
- Ashley Bukach
  - Neurological Institute, University Hospitals Case Medical Center, Cleveland, OH 44106, United States
- Joshua Valdez
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
- Curtis Tatsuoka
  - Neurological Institute, University Hospitals Case Medical Center, Cleveland, OH 44106, United States
- Yvan Bamps
  - Rollins School of Public Health, Emory University, Atlanta, GA 30322, United States
- Shelley Stoll
  - Center for Managing Chronic Disease, University of Michigan, Ann Arbor, MI 48109, United States
- Barbara C Jobst
  - Department of Neurology, Geisel School of Medicine, Dartmouth College, Lebanon, NH 03756, United States
- Martha Sajatovic
  - Neurological Institute, University Hospitals Case Medical Center, Cleveland, OH 44106, United States
7
8
Bhat A, Dakna M, Mischak H. Integrating proteomics profiling data sets: a network perspective. Methods Mol Biol 2015; 1243:237-53. [PMID: 25384750 DOI: 10.1007/978-1-4939-1872-0_14] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Abstract
Understanding disease mechanisms often requires complex and accurate integration of cellular pathways and molecular networks. Systems biology offers the possibility of providing a comprehensive map of the cell's intricate wiring network, which can ultimately help to decipher the disease phenotype. Here, we describe what biological pathways are, how they function in normal and abnormal cellular systems, and the limitations databases face in integrating data, and we highlight how network models are emerging as a powerful integrative framework to understand and interpret the roles of proteins and peptides in diseases.
Affiliation(s)
- Akshay Bhat
  - Mosaiques-Diagnostics GmbH, Mellendorfer Straße 7-9, D-30625, Hannover, Germany
9
Prioritizing Genes Related to Nicotine Addiction Via a Multi-source-Based Approach. Mol Neurobiol 2014; 52:442-55. [PMID: 25193020 DOI: 10.1007/s12035-014-8874-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/07/2014] [Accepted: 08/19/2014] [Indexed: 10/24/2022]
Abstract
Nicotine has a broad impact on both the central and peripheral nervous systems. Over the past decades, an increasing number of genes potentially involved in nicotine addiction have been identified by different technical approaches. However, the molecular mechanisms underlying nicotine addiction remain largely unknown. In this situation, prioritizing the candidate genes for further investigation is becoming increasingly important. In this study, we present a multi-source-based gene prioritization approach for nicotine addiction that utilizes the vast amounts of information generated by nicotine addiction studies in past years. In this approach, we first collected and curated genes from studies in four categories, i.e., genetic association analysis, genetic linkage analysis, high-throughput gene/protein expression analysis, and literature search of single gene/protein-based studies. Based on these resources, the genes were scored and a weight value was determined for each category. Finally, the genes were ranked by their combined scores, and 220 genes were selected as the prioritized nicotine addiction-related genes. Evaluation suggested that the prioritized genes were promising targets for further analysis and replication studies.
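The scoring scheme outlined in this abstract (per-category gene scores, one weight per category, ranking by combined score) can be sketched as a weighted sum. The abstract does not state how scores or weights were actually derived, so the weighted-sum rule, the gene labels, the per-category scores, and the weights below are all hypothetical values chosen for illustration.

```python
# Illustrative sketch of multi-source gene prioritization.
# Gene labels, scores, and weights are made up; the abstract does not give them.

evidence = {
    "association": {"GENE_A": 2, "GENE_B": 1},
    "linkage": {"GENE_B": 1},
    "expression": {"GENE_C": 3},
    "literature": {"GENE_A": 1, "GENE_C": 1},
}

# One hypothetical weight per evidence category.
weights = {
    "association": 1.0,
    "linkage": 0.5,
    "expression": 0.25,
    "literature": 0.5,
}


def prioritize(evidence, weights, top_n):
    """Rank genes by the weighted sum of their per-category scores."""
    combined = {}
    for category, gene_scores in evidence.items():
        weight = weights[category]
        for gene, score in gene_scores.items():
            combined[gene] = combined.get(gene, 0.0) + weight * score
    # Highest combined score first; ties broken alphabetically by gene label.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


print(prioritize(evidence, weights, top_n=2))
```

In the study itself the cut-off selected 220 genes; here `top_n` plays that role on a toy scale.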
10
Shin D, Arthur G, Popescu M, Korkin D, Shyu CR. Uncovering influence links in molecular knowledge networks to streamline personalized medicine. J Biomed Inform 2014; 52:394-405. [PMID: 25150201 DOI: 10.1016/j.jbi.2014.08.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/28/2014] [Revised: 08/04/2014] [Accepted: 08/08/2014] [Indexed: 01/10/2023]
Abstract
OBJECTIVES We developed Resource Description Framework (RDF)-induced InfluGrams (RIIG) - an informatics formalism to uncover complex relationships among biomarker proteins and biological pathways using the biomedical knowledge bases. We demonstrate an application of RIIG in morphoproteomics, a theranostic technique aimed at comprehensive analysis of protein circuitries to design effective therapeutic strategies in personalized medicine setting. METHODS RIIG uses an RDF "mashup" knowledge base that integrates publicly available pathway and protein data with ontologies. To mine for RDF-induced Influence Links, RIIG introduces notions of RDF relevancy and RDF collider, which mimic conditional independence and "explaining away" mechanism in probabilistic systems. Using these notions and constraint-based structure learning algorithms, the formalism generates the morphoproteomic diagrams, which we call InfluGrams, for further analysis by experts. RESULTS RIIG was able to recover up to 90% of predefined influence links in a simulated environment using synthetic data and outperformed a naïve Monte Carlo sampling of random links. In clinical cases of Acute Lymphoblastic Leukemia (ALL) and Mesenchymal Chondrosarcoma, a significant level of concordance between the RIIG-generated and expert-built morphoproteomic diagrams was observed. In a clinical case of Squamous Cell Carcinoma, RIIG allowed selection of alternative therapeutic targets, the validity of which was supported by a systematic literature review. We have also illustrated an ability of RIIG to discover novel influence links in the general case of the ALL. CONCLUSIONS Applications of the RIIG formalism demonstrated its potential to uncover patient-specific complex relationships among biological entities to find effective drug targets in a personalized medicine setting. 
We conclude that RIIG provides an effective means not only to streamline morphoproteomic studies, but also to bridge curated biomedical knowledge and causal reasoning with the clinical data in general.
Affiliation(s)
- Dmitriy Shin
  - University of Missouri, School of Medicine, Department of Pathology and Anatomical Sciences, Columbia, MO 65212, United States
  - University of Missouri, Graduate School, MU Informatics Institute, Columbia, MO 65211, United States
- Gerald Arthur
  - University of Missouri, School of Medicine, Department of Pathology and Anatomical Sciences, Columbia, MO 65212, United States
  - University of Missouri, Graduate School, MU Informatics Institute, Columbia, MO 65211, United States
- Mihail Popescu
  - University of Missouri, School of Medicine, Department of Health Management and Informatics, Columbia, MO 65212, United States
  - University of Missouri, Graduate School, MU Informatics Institute, Columbia, MO 65211, United States
  - University of Missouri, College of Engineering, Department of Computer Science, Columbia, MO 65211, United States
- Dmitry Korkin
  - Worcester Polytechnic Institute, Department of Computer Science, Department of Biology and Biotechnology, Department of Applied Math, Worcester, MA 01609, United States
- Chi-Ren Shyu
  - University of Missouri, Graduate School, MU Informatics Institute, Columbia, MO 65211, United States
  - University of Missouri, College of Engineering, Department of Electrical and Computer Engineering, Columbia, MO 65211, United States
11
Dumontier M, Baker CJ, Baran J, Callahan A, Chepelev L, Cruz-Toledo J, Del Rio NR, Duck G, Furlong LI, Keath N, Klassen D, McCusker JP, Queralt-Rosinach N, Samwald M, Villanueva-Rosales N, Wilkinson MD, Hoehndorf R. The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. J Biomed Semantics 2014; 5:14. [PMID: 24602174 PMCID: PMC4015691 DOI: 10.1186/2041-1480-5-14] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Received: 07/02/2013] [Accepted: 02/02/2014] [Indexed: 11/10/2022]
Abstract
The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level comprised of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a creative commons by attribution license. See website for further information: http://sio.semanticscience.org.
Affiliation(s)
- Michel Dumontier
  - Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
12
Sahoo SS, Lhatoo SD, Gupta DK, Cui L, Zhao M, Jayapandian C, Bozorgi A, Zhang GQ. Epilepsy and seizure ontology: towards an epilepsy informatics infrastructure for clinical research and patient care. J Am Med Inform Assoc 2014; 21:82-9. [PMID: 23686934 PMCID: PMC3912711 DOI: 10.1136/amiajnl-2013-001696] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 02/04/2013] [Revised: 04/21/2013] [Accepted: 04/23/2013] [Indexed: 01/08/2023]
Abstract
OBJECTIVE Epilepsy encompasses an extensive array of clinical and research subdomains, many of which emphasize multi-modal physiological measurements such as electroencephalography and neuroimaging. The integration of structured, unstructured, and signal data into a coherent structure for patient care as well as clinical research requires an effective informatics infrastructure that is underpinned by a formal domain ontology. METHODS We have developed an epilepsy and seizure ontology (EpSO) using a four-dimensional epilepsy classification system that integrates the latest International League Against Epilepsy terminology recommendations and National Institute of Neurological Disorders and Stroke (NINDS) common data elements. It imports concepts from existing ontologies, including the Neural ElectroMagnetic Ontologies, and uses formal concept analysis to create a taxonomy of epilepsy syndromes based on their seizure semiology and anatomical location. RESULTS EpSO is used in a suite of informatics tools for (a) patient data entry, (b) epilepsy focused clinical free text processing, and (c) patient cohort identification as part of the multi-center NINDS-funded study on sudden unexpected death in epilepsy. EpSO is available for download at http://prism.case.edu/prism/index.php/EpilepsyOntology. DISCUSSION An epilepsy ontology consortium is being created for community-driven extension, review, and adoption of EpSO. We are in the process of submitting EpSO to the BioPortal repository. CONCLUSIONS EpSO plays a critical role in informatics tools for epilepsy patient care and multi-center clinical research.
Affiliation(s)
- Satya S Sahoo
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Samden D Lhatoo
  - Department of Neurology, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Deepak K Gupta
  - Department of Neurology, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Licong Cui
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Meng Zhao
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Catherine Jayapandian
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Alireza Bozorgi
  - Department of Neurology, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Guo-Qiang Zhang
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
13
Dastgheib S, Mesbah A, Kochut K. mOntage: creating self-populating domain ontologies from Linked Open Data. International Journal of Semantic Computing 2013. [DOI: 10.1142/s1793351x1340014x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Domain-specific ontologies have become integral components of numerous semantic- and knowledge-based applications. However, creating such ontologies and populating them with correct individuals is a difficult and time-consuming process. Recently, a vast amount of knowledge has become available as part of the Linked Open Data (LOD) project, which includes data sets in multiple areas. In this paper, we present mOntage, a novel ontology design and population framework, which allows a domain expert to easily define a domain ontology schema and specify the ontology's classes and properties in terms of subsets of the existing LOD data sources. The classes and properties of the ontology being created can be defined either directly, in terms of existing LOD-available classes and properties, or can be newly constructed by the domain expert. The definitions, called maps, are encoded as part of the ontology itself, effectively converting it into a self-populating ontology. Finally, a dedicated software system automatically populates the ontology with instances obtained from the selected LOD sources by executing suitable SPARQL queries. We illustrate our framework by creating a Cancer Treatment ontology in the area of biomedicine.
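The idea of populating an ontology class from an LOD class through a SPARQL query can be sketched as follows. The abstract does not describe the concrete shape of mOntage maps or the queries its software issues, so the `class_maps` dictionary, the `build_population_query` helper, and the `example.org` IRI are assumptions made for illustration; the sketch only builds the query text and does not contact any endpoint.

```python
# Illustrative only: turn a hypothetical "map" (ontology class -> LOD class IRI)
# into a SPARQL query that would list candidate instances for that class.

class_maps = {
    # Hypothetical IRI; not an actual LOD class.
    "CancerTreatment": "http://example.org/lod/CancerTreatment",
}


def build_population_query(lod_class_iri, limit=100):
    """Build a SPARQL SELECT query listing instances of one LOD class."""
    return (
        "SELECT DISTINCT ?instance WHERE {\n"
        f"  ?instance a <{lod_class_iri}> .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


queries = {name: build_population_query(iri) for name, iri in class_maps.items()}
print(queries["CancerTreatment"])
```

Keeping the mapping as data and generating the query from it mirrors the abstract's point that the definitions are stored with the ontology and executed later by a separate population step.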
Affiliation(s)
- Shima Dastgheib
  - Department of Computer Science, University of Georgia, Athens, Georgia 30602, USA
- Arsham Mesbah
  - Department of Computer Science, University of Georgia, Athens, Georgia 30602, USA
- Krys Kochut
  - Department of Computer Science, University of Georgia, Athens, Georgia 30602, USA
14
Rebholz-Schuhmann D, Grabmüller C, Kavaliauskas S, Croset S, Woollard P, Backofen R, Filsell W, Clark D. A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources. Drug Discov Today 2013; 19:882-9. [PMID: 24201223 DOI: 10.1016/j.drudis.2013.10.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/17/2012] [Revised: 09/24/2013] [Accepted: 10/28/2013] [Indexed: 10/26/2022]
Abstract
In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker.
Affiliation(s)
- Dietrich Rebholz-Schuhmann
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Computerlinguistik, Universität Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland
- Christoph Grabmüller
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Silvestras Kavaliauskas
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Samuel Croset
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter Woollard
  - GlaxoSmithKline, GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, UK
- Rolf Backofen
  - Albert-Ludwigs-University Freiburg, Fahnenbergplatz, D-79085 Freiburg, Germany
- Wendy Filsell
  - Unilever R&D, Colworth Science Park, Sharnbrook MK44 1LQ, UK
- Dominic Clark
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
15
|
Sahoo SS, Zhang GQ, Lhatoo SD. Epilepsy informatics and an ontology-driven infrastructure for large database research and patient care in epilepsy. Epilepsia 2013; 54:1335-41. [PMID: 23647220 PMCID: PMC3774789 DOI: 10.1111/epi.12211] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Accepted: 04/02/2013] [Indexed: 11/28/2022]
Abstract
The epilepsy community increasingly recognizes the need for a modern classification system that can also be easily integrated with effective informatics tools. The 2010 reports by the United States President's Council of Advisors on Science and Technology (PCAST) identified informatics as a critical resource to improve quality of patient care, drive clinical research, and reduce the cost of health services. An effective informatics infrastructure for epilepsy, which is underpinned by a formal knowledge model or ontology, can leverage an ever-increasing amount of multimodal data to (1) improve clinical decision support, (2) improve access to information for patients and their families, (3) ease data sharing, and (4) accelerate secondary use of clinical data. Modeling the recommendations of the International League Against Epilepsy (ILAE) classification system in the form of an epilepsy domain ontology is essential for consistent use of terminology in a variety of applications, including electronic health records systems and clinical applications. In this review, we discuss the data management issues in epilepsy and explore the benefits of an ontology-driven informatics infrastructure and its role in adoption of a "data-driven" paradigm in epilepsy research.
Affiliation(s)
- Satya S. Sahoo
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, U.S.A
- Guo-Qiang Zhang
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, U.S.A
- Samden D. Lhatoo
  - Department of Neurology, School of Medicine, Case Western Reserve University, Cleveland, Ohio, U.S.A
|
16
|
Shotgun proteomic analysis on the diapause and non-diapause eggs of domesticated silkworm Bombyx mori. PLoS One 2013; 8:e60386. [PMID: 23580252 PMCID: PMC3620277 DOI: 10.1371/journal.pone.0060386] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 05/14/2012] [Accepted: 02/27/2013] [Indexed: 12/31/2022] Open
Abstract
To clarify the molecular mechanisms of silkworm diapause, it is necessary to investigate the molecular basis at the protein level. Here, the spectra of peptides digested from silkworm diapause and non-diapause eggs were obtained from liquid chromatography tandem mass spectrometry (LC-MS/MS) and were analyzed by bioinformatics methods. A total of 501 and 562 proteins were identified from the diapause and non-diapause eggs respectively, of which 309 proteins were shared. Among these commonly expressed proteins, three main storage proteins (vitellogenin precursor, egg-specific protein and low molecular lipoprotein 30 K precursor), nine heat shock proteins (HSP19.9, 20.1, 20.4, 20.8, 21.4, 23.7, 70, 90-kDa heat shock protein and heat shock cognate protein), 37 metabolic enzymes, and 22 ribosomal proteins were identified. There were 192 and 253 unique proteins identified in the diapause and non-diapause eggs respectively, of which 24 and 48 had functional annotations. These unique proteins indicated that metabolism, mRNA translation, and protein synthesis were potentially more highly represented in the non-diapause eggs than in the diapause eggs. The relative mRNA levels of four identified proteins in the two kinds of eggs were also compared using quantitative reverse transcription PCR (qRT-PCR) and showed some inconsistencies with protein expression. GO signatures of 486 out of the 502 and 545 out of the 562 proteins identified in the diapause and non-diapause eggs respectively were available. In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that the Metabolism, Translation and Transcription pathways were potentially more active in the non-diapause eggs at this stage.
|
17
|
Remli MA, Deris S. An Approach for Biological Data Integration and Knowledge Retrieval Based on Ontology, Semantic Web Services Composition, and AI Planning. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
This chapter describes an approach to two knowledge management processes in biological fields, namely data integration and knowledge retrieval, based on ontology, Web services, and Artificial Intelligence (AI) planning. For data integration, combining the Semantic Web with ontology offers several promising ways to integrate heterogeneous biological databases. The goal of this work is to construct an integration approach for gram-positive bacteria that combines gene, protein, and pathway data, thus allowing biological questions to be answered. The authors present a new perspective on knowledge retrieval that uses Semantic Web services composition and an AI planning system, Simple Hierarchical Ordered Planner 2 (SHOP2). A Semantic Web service annotated with a domain ontology is used to describe services for biological pathway knowledge retrieval from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The authors investigate the effectiveness of this approach by applying it to a real-world pathway information retrieval scenario in which a biologist needs to discover the pathway description for a specific gene of interest in an organism. Both processes (data integration and knowledge retrieval) rely on ontology as the key means of achieving the biological goals.
|
18
|
Gladun A, Rogushina J, Valencia-García R, Béjar RM. Semantics-driven modelling of user preferences for information retrieval in the biomedical domain. Inform Health Soc Care 2013; 38:150-70. [DOI: 10.3109/17538157.2012.735730] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Affiliation(s)
- Anatoly Gladun
  - International Research and Training Centre of Information Technologies and Systems, National Academy of Sciences and Ministry of Education of Ukraine, Ukraine
|
19
|
Abstract
The modern biomedical research and healthcare delivery domains have seen an unparalleled increase in the rate of innovation and novel technologies over the past several decades. Catalyzed by paradigm-shifting public and private programs focusing upon the formation and delivery of genomic and personalized medicine, the need for high-throughput and integrative approaches to the collection, management, and analysis of heterogeneous data sets has become imperative. This need is particularly pressing in the translational bioinformatics domain, where many fundamental research questions require the integration of large scale, multi-dimensional clinical phenotype and bio-molecular data sets. Modern biomedical informatics theory and practice has demonstrated the distinct benefits associated with the use of knowledge-based systems in such contexts. A knowledge-based system can be defined as an intelligent agent that employs a computationally tractable knowledge base or repository in order to reason upon data in a targeted domain and reproduce expert performance relative to such reasoning operations. The ultimate goal of the design and use of such agents is to increase the reproducibility, scalability, and accessibility of complex reasoning tasks. Examples of the application of knowledge-based systems in biomedicine span a broad spectrum, from the execution of clinical decision support, to epidemiologic surveillance of public data sets for the purposes of detecting emerging infectious diseases, to the discovery of novel hypotheses in large-scale research data sets. 
In this chapter, we will review the basic theoretical frameworks that define core knowledge types and reasoning operations with particular emphasis on the applicability of such conceptual models within the biomedical domain, and then go on to introduce a number of prototypical data integration requirements and patterns relevant to the conduct of translational bioinformatics that can be addressed via the design and use of knowledge-based systems.
Affiliation(s)
- Philip R O Payne
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
|
20
|
Sahoo SS, Zhao M, Luo L, Bozorgi A, Gupta D, Lhatoo SD, Zhang GQ. OPIC: Ontology-driven Patient Information Capturing system for epilepsy. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2012; 2012:799-808. [PMID: 23304354 PMCID: PMC3540561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
The widespread use of paper or document-based forms for capturing patient information in various clinical settings, for example in epilepsy centers, is a critical barrier for large-scale, multi-center research studies that require interoperable, consistent, and error-free data collection. This challenge can be addressed by a web-accessible and flexible patient data capture system that is supported by a common terminological system to facilitate data re-usability, sharing, and integration. We present OPIC, an Ontology-driven Patient Information Capture system that uses a domain-specific epilepsy and seizure ontology (EpSO) to (1) support structured entry of multi-modal epilepsy data, (2) proactively ensure quality of data through use of ontology terms in drop-down menus, and (3) identify and index clinically relevant ontology terms in free-text fields to improve accuracy of subsequent analytical queries (e.g. cohort identification). EpSO, modeled using the Web Ontology Language (OWL), conforms to the recommendations of the International League Against Epilepsy (ILAE) classification and terminological commission. OPIC has been developed using agile software engineering methodology for rapid development cycles in close collaboration with domain experts and end users. We report the results from the initial deployment of OPIC at the University Hospitals Case Medical Center (UH CMC) epilepsy monitoring unit (EMU) as part of the NIH-funded project on Sudden Unexpected Death in Epilepsy (SUDEP). Preliminary user evaluation shows that OPIC has achieved its design objectives to be an intuitive patient information capturing system that also reduces the potential for data entry errors and variability in use of epilepsy terms.
Affiliation(s)
- Satya S Sahoo
  - Division of Medical Informatics, Case Western Reserve University, Cleveland, OH, USA
|
21
|
Teodoro D, Pasche E, Gobeill J, Emonet S, Ruch P, Lovis C. Building a transnational biosurveillance network using semantic web technologies: requirements, design, and preliminary evaluation. J Med Internet Res 2012; 14:e73. [PMID: 22642960 PMCID: PMC3799609 DOI: 10.2196/jmir.2043] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 01/07/2012] [Revised: 03/05/2012] [Accepted: 04/29/2012] [Indexed: 11/13/2022] Open
Abstract
Background Antimicrobial resistance has reached globally alarming levels and is becoming a major public health threat. Lack of efficacious antimicrobial resistance surveillance systems was identified as one of the causes of increasing resistance, due to the lag time between new resistances and alerts to care providers. Several initiatives to track drug resistance evolution have been developed. However, no effective real-time and source-independent antimicrobial resistance monitoring system is available publicly. Objective To design and implement an architecture that can provide real-time and source-independent antimicrobial resistance monitoring to support transnational resistance surveillance. In particular, we investigated the use of a Semantic Web-based model to foster integration and interoperability of interinstitutional and cross-border microbiology laboratory databases. Methods Following the agile software development methodology, we derived the main requirements needed for effective antimicrobial resistance monitoring, from which we proposed a decentralized monitoring architecture based on the Semantic Web stack. The architecture uses an ontology-driven approach to promote the integration of a network of sentinel hospitals or laboratories. Local databases are wrapped into semantic data repositories that automatically expose local computing-formalized laboratory information in the Web. A central source mediator, based on local reasoning, coordinates the access to the semantic end points. On the user side, a user-friendly Web interface provides access and graphical visualization to the integrated views. Results We designed and implemented the online Antimicrobial Resistance Trend Monitoring System (ARTEMIS) in a pilot network of seven European health care institutions sharing 70+ million triples of information about drug resistance and consumption. 
Evaluation of the computing performance of the mediator demonstrated that, on average, query response time was a few seconds (mean 4.3, SD 0.1×10² seconds). Clinical pertinence assessment showed that resistance trends automatically calculated by ARTEMIS had a strong positive correlation with the European Antimicrobial Resistance Surveillance Network (EARS-Net) (ρ = .86, P < .001) and the Sentinel Surveillance of Antibiotic Resistance in Switzerland (SEARCH) (ρ = .84, P < .001) systems. Furthermore, mean resistance rates extracted by ARTEMIS were not significantly different from those of either EARS-Net (∆ = ±0.130; 95% confidence interval –0 to 0.030; P < .001) or SEARCH (∆ = ±0.042; 95% confidence interval –0.004 to 0.028; P = .004). Conclusions We introduce a distributed monitoring architecture that can be used to build transnational antimicrobial resistance surveillance networks. Results indicated that the Semantic Web-based approach provided an efficient and reliable solution for development of eHealth architectures that enable online antimicrobial resistance monitoring from heterogeneous data sources. In future, we expect that more health care institutions can join the ARTEMIS network so that it can provide a large European and wider biosurveillance network that can be used to detect emerging bacterial resistance in a multinational context and support public health actions.
|
22
|
Holford ME, McCusker JP, Cheung KH, Krauthammer M. A semantic web framework to integrate cancer omics data with biological knowledge. BMC Bioinformatics 2012; 13 Suppl 1:S10. [PMID: 22373303 PMCID: PMC3471346 DOI: 10.1186/1471-2105-13-s1-s10] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. RESULTS For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data - one containing Gene Ontology (GO) data, the other, regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. CONCLUSIONS We were able to generate a testable hypothesis to explain how Decitabine fights cancer - namely, that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thus conveying its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily.
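As a rough illustration of the idea in this abstract, the following minimal Python sketch treats each semantic model as a set of (subject, predicate, object) triples, merges a quantitative model with a functional one, and then asks a question that spans both. Every identifier and value here (`sample:1`, `gene:A`, the predicate names, and the `merge` and `match` helpers) is hypothetical and chosen only for illustration; the sketch does not reproduce Corvus, its SPARQL endpoint, or the authors' data.

```python
# Illustrative sketch only: a "semantic model" is a plain set of
# (subject, predicate, object) tuples. All names and values are hypothetical.

quantitative_model = {
    ("sample:1", "measures", "gene:A"),
    ("sample:1", "sensitivity", "sensitive"),
}
functional_model = {
    ("gene:A", "annotated_with", "GO:apoptosis"),
}


def merge(*models):
    """Combine several semantic models into one set of triples."""
    merged = set()
    for model in models:
        merged |= model
    return merged


def match(model, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in model
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def annotations_for_sensitive_samples(model):
    """Join across models: annotations of genes measured in sensitive samples."""
    result = set()
    for sample, _, _ in match(model, p="sensitivity", o="sensitive"):
        for _, _, gene in match(model, s=sample, p="measures"):
            for _, _, term in match(model, s=gene, p="annotated_with"):
                result.add((sample, gene, term))
    return result


combined = merge(quantitative_model, functional_model)
```

Because both models share the identifier `gene:A`, the union alone is enough to connect them; a real triple store would express the same join as a SPARQL graph pattern rather than nested loops.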
|
23
|
|
24
|
Parikh PP, Minning TA, Nguyen V, Lalithsena S, Asiaee AH, Sahoo SS, Doshi P, Tarleton R, Sheth AP. A semantic problem solving environment for integrative parasite research: identification of intervention targets for Trypanosoma cruzi. PLoS Negl Trop Dis 2012; 6:e1458. [PMID: 22272365 PMCID: PMC3260319 DOI: 10.1371/journal.pntd.0001458] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/17/2011] [Accepted: 11/18/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that may provide biologists the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge. METHODOLOGY/PRINCIPAL FINDINGS We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which has the ability to query across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answers to these queries involve looking up multiple sources of data, linking them together and presenting the results.
CONCLUSION/SIGNIFICANCE The SPSE helps parasitologists leverage the growing but disparate parasite data resources by offering an integrative platform that uses Semantic Web techniques while keeping any increase in their workload minimal.
Affiliation(s)
- Priti P. Parikh
  - The Kno.e.sis Center, Department of Computer Science and Engineering, Wright State University, Dayton, Ohio, United States of America
- Todd A. Minning
  - Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, Georgia, United States of America
- Vinh Nguyen
  - The Kno.e.sis Center, Department of Computer Science and Engineering, Wright State University, Dayton, Ohio, United States of America
- Sarasi Lalithsena
  - The Kno.e.sis Center, Department of Computer Science and Engineering, Wright State University, Dayton, Ohio, United States of America
- Amir H. Asiaee
  - THINC Lab, Department of Computer Science, University of Georgia, Athens, Georgia, United States of America
- Satya S. Sahoo
  - The Kno.e.sis Center, Department of Computer Science and Engineering, Wright State University, Dayton, Ohio, United States of America
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, United States of America
- Prashant Doshi
  - THINC Lab, Department of Computer Science, University of Georgia, Athens, Georgia, United States of America
- Rick Tarleton
  - Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, Georgia, United States of America
- Amit P. Sheth
  - The Kno.e.sis Center, Department of Computer Science and Engineering, Wright State University, Dayton, Ohio, United States of America
|
25
|
Harland L, Larminie C, Sansone SA, Popa S, Marshall MS, Braxenthaler M, Cantor M, Filsell W, Forster MJ, Huang E, Matern A, Musen M, Saric J, Slater T, Wilson J, Lynch N, Wise J, Dix I. Empowering industrial research with shared biomedical vocabularies. Drug Discov Today 2011; 16:940-7. [PMID: 21963522 PMCID: PMC7098809 DOI: 10.1016/j.drudis.2011.09.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 04/27/2011] [Revised: 07/29/2011] [Accepted: 09/19/2011] [Indexed: 10/17/2022]
Abstract
The life science industries (including pharmaceuticals, agrochemicals and consumer goods) are exploring new business models for research and development that focus on external partnerships. In parallel, there is a desire to make better use of data obtained from sources such as human clinical samples to inform and support early research programmes. Success in both areas depends upon the successful integration of heterogeneous data from multiple providers and scientific domains, something that is already a major challenge within the industry. This issue is exacerbated by the absence of agreed standards that unambiguously identify the entities, processes and observations within experimental results. In this article we highlight the risks to future productivity that are associated with incomplete biological and chemical vocabularies and suggest a new model to address this long-standing issue.
|
26
|
Sahoo SS, Ogbuji C, Luo L, Dong X, Cui L, Redline SS, Zhang GQ. MiDas: automatic extraction of a common domain of discourse in sleep medicine for multi-center data integration. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2011; 2011:1196-1205. [PMID: 22195180 PMCID: PMC3243207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Clinical studies often use data dictionaries with controlled sets of terms to facilitate data collection, limited interoperability and sharing at a local site. Multi-center retrospective clinical studies require that these data dictionaries, originating from individual participating centers, be harmonized in preparation for the integration of the corresponding clinical research data. Domain ontologies are often used to facilitate multi-center data integration by modeling terms from data dictionaries in a logic-based language, but interoperability among domain ontologies (using automated techniques) is an unresolved issue. Although many upper-level reference ontologies have been proposed to address this challenge, our experience in integrating multi-center sleep medicine data highlights the need for an upper level ontology that models a common set of terms at multiple-levels of abstraction, which is not covered by the existing upper-level ontologies. We introduce a methodology underpinned by a Minimal Domain of Discourse (MiDas) algorithm to automatically extract a minimal common domain of discourse (upper-domain ontology) from an existing domain ontology. Using the Multi-Modality, Multi-Resource Environment for Physiological and Clinical Research (Physio-MIMI) multi-center project in sleep medicine as a use case, we demonstrate the use of MiDas in extracting a minimal domain of discourse for sleep medicine, from Physio-MIMI's Sleep Domain Ontology (SDO). We then extend the resulting domain of discourse with terms from the data dictionary of the Sleep Heart and Health Study (SHHS) to validate MiDas. To illustrate the wider applicability of MiDas, we automatically extract the respective domains of discourse from 6 sample domain ontologies from the National Center for Biomedical Ontologies (NCBO) and the OBO Foundry.
Affiliation(s)
- Satya S Sahoo
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
|
27
|
Jupp S, Klein J, Schanstra J, Stevens R. Developing a kidney and urinary pathway knowledge base. J Biomed Semantics 2011; 2 Suppl 2:S7. [PMID: 21624162 PMCID: PMC3102896 DOI: 10.1186/2041-1480-2-s2-s7] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life sciences is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration. RESULTS We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney. CONCLUSIONS The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate knowledge that is necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain's ontologies, allow both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself. AVAILABILITY The KUPKB may be accessed via http://www.e-lico.eu/kupkb.
Affiliation(s)
- Simon Jupp
  - School of Computer Science, University of Manchester, UK
- Julie Klein
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U858, Toulouse, France
  - Université Toulouse III Paul-Sabatier, I2MR, IFR150, Toulouse, France
- Joost Schanstra
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U858, Toulouse, France
  - Université Toulouse III Paul-Sabatier, I2MR, IFR150, Toulouse, France
- Robert Stevens
  - School of Computer Science, University of Manchester, UK
|
28
|
Galdzicki M, Rodriguez C, Chandran D, Sauro HM, Gennari JH. Standard biological parts knowledgebase. PLoS One 2011; 6:e17005. [PMID: 21390321 PMCID: PMC3044748 DOI: 10.1371/journal.pone.0017005] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Received: 08/31/2010] [Accepted: 01/19/2011] [Indexed: 11/19/2022] Open
Abstract
We have created the Knowledgebase of Standard Biological Parts (SBPkb) as a publicly accessible Semantic Web resource for synthetic biology (sbolstandard.org). The SBPkb allows researchers to query and retrieve standard biological parts for research and use in synthetic biology. Its initial version includes all of the information about parts stored in the Registry of Standard Biological Parts (partsregistry.org). SBPkb transforms this information so that it is computable, using our semantic framework for synthetic biology parts. This framework, known as SBOL-semantic, was built as part of the Synthetic Biology Open Language (SBOL), a project of the Synthetic Biology Data Exchange Group. SBOL-semantic represents commonly used synthetic biology entities, and its purpose is to improve the distribution and exchange of descriptions of biological parts. In this paper, we describe the data, our methods for transformation to SBPkb, and finally, we demonstrate the value of our knowledgebase with a set of sample queries. We use RDF technology and SPARQL queries to retrieve candidate "promoter" parts that are known to be both negatively and positively regulated. This method provides new web-based data access that enables searches for parts that are not currently possible.
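The sample query described in this abstract, promoters known to be both negatively and positively regulated, amounts to intersecting three sets of parts. The Python sketch below shows that logic over a handful of made-up triples. The part identifiers, the `type` and `regulation` predicates, and both helper functions are assumptions for illustration only; they are not SBOL-semantic terms and not the SPARQL the authors ran.

```python
# Illustrative sketch only: part descriptions as (subject, predicate, object)
# triples. Identifiers and predicate names are hypothetical.

triples = [
    ("part:P1", "type", "promoter"),
    ("part:P1", "regulation", "positive"),
    ("part:P1", "regulation", "negative"),
    ("part:P2", "type", "promoter"),
    ("part:P2", "regulation", "positive"),
    ("part:T1", "type", "terminator"),
    ("part:T1", "regulation", "negative"),
]


def subjects(triples, predicate, obj):
    """Return the set of subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def dual_regulated_promoters(triples):
    """Promoters that carry both a positive and a negative regulation triple."""
    promoters = subjects(triples, "type", "promoter")
    positive = subjects(triples, "regulation", "positive")
    negative = subjects(triples, "regulation", "negative")
    return sorted(promoters & positive & negative)
```

Calling `dual_regulated_promoters(triples)` keeps only `part:P1`: `part:P2` lacks negative regulation and `part:T1` is not a promoter.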
Affiliation(s)
- Michal Galdzicki
  - Biomedical & Health Informatics, University of Washington, Seattle, Washington, United States of America
- Cesar Rodriguez
  - BIOFAB, University of California, Berkeley, California, United States of America
- Deepak Chandran
  - Bioengineering, University of Washington, Seattle, Washington, United States of America
- Herbert M. Sauro
  - Bioengineering, University of Washington, Seattle, Washington, United States of America
- John H. Gennari
  - Biomedical & Health Informatics, University of Washington, Seattle, Washington, United States of America
|
29
|
Lister AL, Lord P, Pocock M, Wipat A. Annotation of SBML models through rule-based semantic integration. J Biomed Semantics 2010; 1 Suppl 1:S3. [PMID: 20626923 PMCID: PMC2903722 DOI: 10.1186/2041-1480-1-s1-s3] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The creation of accurate quantitative Systems Biology Markup Language (SBML) models is a time-intensive, manual process often complicated by the many data sources and formats required to annotate even a small and well-scoped model. Ideally, the retrieval and integration of biological knowledge for model annotation should be performed quickly, precisely, and with a minimum of manual effort. RESULTS: Here we present rule-based mediation, a method of semantic data integration applied to systems biology model annotation. The heterogeneous data sources are first syntactically converted into ontologies, which are then aligned to a small domain ontology by applying a rule base. We demonstrate proof-of-principle of this application of rule-based mediation using off-the-shelf semantic web technology through two use cases for SBML model annotation. Existing tools and technology provide a framework around which the system is built, reducing development time and increasing usability. CONCLUSIONS: Integrating resources in this way accommodates multiple formats with different semantics, and provides richly-modelled biological knowledge suitable for annotation of SBML models. This initial work establishes the feasibility of rule-based mediation as part of an automated SBML model annotation system. AVAILABILITY: Detailed information on the project files as well as further information on and comparisons with similar projects is available from the project page at http://cisban-silico.cs.ncl.ac.uk/RBM/.
Affiliation(s)
- Allyson L Lister
- Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health, Newcastle University, Campus for Ageing and Vitality, Newcastle upon Tyne NE4 5PL, UK.
30
Colombo G, Merico D, Boncoraglio G, De Paoli F, Ellul J, Frisoni G, Nagy Z, van der Lugt A, Vassányi I, Antoniotti M. An ontological modeling approach to cerebrovascular disease studies: the NEUROWEB case. J Biomed Inform 2010; 43:469-84. [PMID: 20074662 DOI: 10.1016/j.jbi.2009.12.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/12/2008] [Revised: 10/29/2009] [Accepted: 12/21/2009] [Indexed: 10/20/2022]
Abstract
The NEUROWEB project supports cerebrovascular researchers' association studies, intended as the search for statistical correlations between a feature (e.g., a genotype) and a phenotype. In this project the phenotype refers to the patients' pathological state, and thus it is formulated on the basis of the clinical data collected during the diagnostic activity. In order to enhance the statistical robustness of the association inquiries, the project involves four European Union clinical institutions. Each institution provides its proprietary repository, storing patients' data. Although all sites comply with common diagnostic guidelines, they also adopt specific protocols, resulting in partially discrepant repository contents. Therefore, in order to effectively exploit NEUROWEB data for association studies, it is necessary to provide a framework for the phenotype formulation, grounded on the clinical repository content which explicitly addresses the inherent integration problem. To that end, we developed an ontological model for cerebrovascular phenotypes, the NEUROWEB Reference Ontology, composed of three layers. The top-layer (Top Phenotypes) is an expert-based cerebrovascular disease taxonomy. The middle-layer deconstructs the Top Phenotypes into more elementary phenotypes (Low Phenotypes) and general-use medical concepts such as anatomical parts and topological concepts. The bottom-layer (Core Data Set, or CDS) comprises the clinical indicators required for cerebrovascular disorder diagnosis. Low Phenotypes are connected to the bottom-layer (CDS) by specifying what combination of CDS values is required for their existence. Finally, CDS elements are mapped to the local repositories of clinical data. The NEUROWEB system exploits the Reference Ontology to query the different repositories and to retrieve patients characterized by a common phenotype.
Affiliation(s)
- Gianluca Colombo
- Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano Bicocca, U14 Viale Sarca 336, I-20126 Milan, Italy
31
Payne PRO, Embi PJ, Sen CK. Translational informatics: enabling high-throughput research paradigms. Physiol Genomics 2009; 39:131-40. [PMID: 19737991 DOI: 10.1152/physiolgenomics.00050.2009] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 10/20/2022] Open
Abstract
A common thread throughout the clinical and translational research domains is the need to collect, manage, integrate, analyze, and disseminate large-scale, heterogeneous biomedical data sets. However, well-established and broadly adopted theoretical and practical frameworks and models intended to address such needs are conspicuously absent in the published literature or other reputable knowledge sources. Instead, the development and execution of multidisciplinary, clinical, or translational studies are significantly limited by the propagation of "silos" of both data and expertise. Motivated by this fundamental challenge, we report upon the current state and evolution of biomedical informatics as it pertains to the conduct of high-throughput clinical and translational research and will present both a conceptual and practical framework for the design and execution of informatics-enabled studies. The objective of presenting such findings and constructs is to provide the clinical and translational research community with a common frame of reference for discussing and expanding upon such models and methodologies.
Affiliation(s)
- Philip R O Payne
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, USA.
32
Antezana E, Kuiper M, Mironov V. Biological knowledge management: the emerging role of the Semantic Web technologies. Brief Bioinform 2009; 10:392-407. [PMID: 19457869 DOI: 10.1093/bib/bbp024] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.5] [Indexed: 11/12/2022] Open
Abstract
New knowledge is produced at a continuously increasing speed, and the list of papers, databases and other knowledge sources that a researcher in the life sciences needs to cope with is actually turning into a problem rather than an asset. The adequate management of knowledge is therefore becoming fundamentally important for life scientists, especially if they work with approaches that thoroughly depend on knowledge integration, such as systems biology. Several initiatives to organize biological knowledge sources into a readily exploitable resourceome are presently being carried out. Ontologies and Semantic Web technologies revolutionize these efforts. Here, we review the benefits, trends, current possibilities, and the potential this holds for the biosciences.
Affiliation(s)
- Erick Antezana
- Department of Biology at the Norwegian University of Science and Technology
33
Manning M, Aggarwal A, Gao K, Tucker-Kellogg G. Scaling the walls of discovery: using semantic metadata for integrative problem solving. Brief Bioinform 2009; 10:164-76. [DOI: 10.1093/bib/bbp007] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/19/2023] Open
34
Holford ME, Rajeevan H, Zhao H, Kidd KK, Cheung KH. Semantic Web-based integration of cancer pathways and allele frequency data. Cancer Inform 2009; 8:19-30. [PMID: 19458791 PMCID: PMC2664696 DOI: 10.4137/cin.s1006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/30/2023] Open
Abstract
We demonstrate the use of Semantic Web technology to integrate the ALFRED allele frequency database and the Starpath pathway resource. The linking of population-specific genotype data with cancer-related pathway data is potentially useful given the growing interest in personalized medicine and the exploitation of pathway knowledge for cancer drug discovery. We model our data using the Web Ontology Language (OWL), drawing upon ideas from existing standard formats BioPAX for pathway data and PML for allele frequency data. We store our data within an Oracle database, using Oracle Semantic Technologies. We then query the data using Oracle’s rule-based inference engine and SPARQL-like RDF query language. The ability to perform queries across the domains of population genetics and pathways offers the potential to answer a number of cancer-related research questions. Among the possibilities is the ability to identify genetic variants which are associated with cancer pathways and whose frequency varies significantly between ethnic groups. This sort of information could be useful for designing clinical studies and for providing background data in personalized medicine. It could also assist with the interpretation of genetic analysis results such as those from genome-wide association studies.
Affiliation(s)
- Matthew E Holford
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, USA
35
The NIFSTD and BIRNLex vocabularies: building comprehensive ontologies for neuroscience. Neuroinformatics 2008; 6:175-94. [PMID: 18975148 DOI: 10.1007/s12021-008-9032-z] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.3] [Received: 08/22/2008] [Accepted: 09/26/2008] [Indexed: 10/21/2022]
Abstract
A critical component of the Neuroscience Information Framework (NIF) project is a consistent, flexible terminology for describing and retrieving neuroscience-relevant resources. Although the original NIF specification called for a loosely structured controlled vocabulary for describing neuroscience resources, as the NIF system evolved, the requirement for a formally structured ontology for neuroscience with sufficient granularity to describe and access a diverse collection of information became obvious. This requirement led to the NIF standardized (NIFSTD) ontology, a comprehensive collection of common neuroscience domain terminologies woven into an ontologically consistent, unified representation of the biomedical domains typically used to describe neuroscience data (e.g., anatomy, cell types, techniques), as well as digital resources (tools, databases) being created throughout the neuroscience community. NIFSTD builds upon a structure established by the BIRNLex, a lexicon of concepts covering clinical neuroimaging research developed by the Biomedical Informatics Research Network (BIRN) project. Each distinct domain module is represented using the Web Ontology Language (OWL). As much as has been practical, NIFSTD reuses existing community ontologies that cover the required biomedical domains, building the more specific concepts required to annotate NIF resources. By following this principle, an extensive vocabulary was assembled in a relatively short period of time for NIF information annotation, organization, and retrieval, in a form that promotes easy extension and modification. We report here on the structure of the NIFSTD, and its predecessor BIRNLex, the principles followed in its construction and provide examples of its use within NIF.
36
Abstract
Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.
Affiliation(s)
- Zhang Zhang
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520, USA
37
Cheung KH, Kashyap V, Luciano JS, Chen H, Wang Y, Stephens S. Semantic mashup of biomedical data. J Biomed Inform 2008; 41:683-6. [PMID: 18703163 PMCID: PMC3742004 DOI: 10.1016/j.jbi.2008.08.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/11/2008] [Revised: 07/30/2008] [Accepted: 08/05/2008] [Indexed: 12/24/2022]
Affiliation(s)
- KH Cheung
- Yale Center for Medical Informatics and Departments of Anesthesiology and Genetics, School of Medicine, Computer Science Department, Yale University, P.O. Box 208009, New Haven, CT 06520, USA
- V Kashyap
- Clinical Informatics R&D, Partners HealthCare System, Wellesley, Massachusetts, USA
- H Chen
- College of Computer Science, Zhejiang University, Hangzhou, China
- Y Wang
- Lilly Singapore Centre for Drug Discovery, Singapore
- S Stephens
- Discovery IT, Eli Lilly, Boston, MA, USA