1
Martinez K, Agirre J, Akune Y, Aoki-Kinoshita KF, Arighi C, Axelsen KB, Bolton E, Bordeleau E, Edwards NJ, Fadda E, Feizi T, Hayes C, Ives CM, Joshi HJ, Krishna Prasad K, Kossida S, Lisacek F, Liu Y, Lütteke T, Ma J, Malik A, Martin M, Mehta AY, Neelamegham S, Panneerselvam K, Ranzinger R, Ricard-Blum S, Sanou G, Shanker V, Thomas PD, Tiemeyer M, Urban J, Vita R, Vora J, Yamamoto Y, Mazumder R. Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy. Database (Oxford) 2024; 2024:baae073. [PMID: 39137905 PMCID: PMC11321244 DOI: 10.1093/database/baae073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/24/2024] [Accepted: 07/10/2024] [Indexed: 08/15/2024]
Abstract
Dynamic changes in protein glycosylation impact human health and disease progression. However, current resources that capture disease and phenotype information focus primarily on the macromolecules within the central dogma of molecular biology (DNA, RNA, proteins). To gain a better understanding of organisms, there is a need to capture the functional impact of glycans and glycosylation on biological processes. A workshop titled "Functional impact of glycans and their curation" was held in conjunction with the 16th Annual International Biocuration Conference to discuss ongoing worldwide activities related to glycan function curation. This workshop brought together subject matter experts, tool developers, and biocurators from over 20 projects and bioinformatics resources. Participants discussed four key topics for each of their resources: (i) how they curate glycan function-related data from publications and other sources, (ii) what type of data they would like to acquire, (iii) what data they currently have, and (iv) what standards they use. Their answers contributed input that provided a comprehensive overview of state-of-the-art glycan function curation and annotations. This report summarizes the outcome of discussions, including potential solutions and areas where curators, data wranglers, and text mining experts can collaborate to address current gaps in glycan and glycosylation annotations, leveraging each other's work to improve their respective resources and encourage impactful data sharing among resources. Database URL: https://wiki.glygen.org/Glycan_Function_Workshop_2023.
Affiliation(s)
- Karina Martinez
  - Department of Biochemistry & Molecular Medicine, The George Washington University School of Medicine and Health Sciences, 2300 I St. NW, Washington, DC 20052, United States
- Jon Agirre
  - York Structural Biology Laboratory, Department of Chemistry, University of York, Wentworth Way, York YO10 5DD, United Kingdom
- Yukie Akune
  - The Glycosciences Laboratory, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom
- Kiyoko F Aoki-Kinoshita
  - Glycan and Life Systems Integration Center (GaLSIC), Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
- Cecilia Arighi
  - Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave, Newark, DE 19716, United States
- Kristian B Axelsen
  - Swiss-Prot Group, Swiss Institute of Bioinformatics (SIB), CMU, 1 rue Michel Servet, Geneva 4 1211, Switzerland
- Evan Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, United States
- Emily Bordeleau
  - Michael Smith Laboratories, The University of British Columbia, 2185 East Mall, Vancouver, British Columbia V6T 1Z4, Canada
- Nathan J Edwards
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, 2115 Wisconsin Ave NW, Washington, DC 20007, United States
- Elisa Fadda
  - Department of Chemistry and Hamilton Institute, Maynooth University, Kilcock Road, Maynooth, Co. Kildare W23 AH3Y, Ireland
- Ten Feizi
  - The Glycosciences Laboratory, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom
- Catherine Hayes
  - Proteome Informatics Group, Swiss Institute of Bioinformatics (SIB), route de Drize 7, Geneva CH-1227, Switzerland
- Callum M Ives
  - Department of Chemistry and Hamilton Institute, Maynooth University, Kilcock Road, Maynooth, Co. Kildare W23 AH3Y, Ireland
- Hiren J Joshi
  - Copenhagen Center for Glycomics, Department of Cellular and Molecular Medicine, Faculty of Health Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen DK-2200, Denmark
- Khakurel Krishna Prasad
  - ELI Beamlines Facility, The Extreme Light Infrastructure ERIC, Za Radnicí 835, Dolní Břežany 25241, Czech Republic
- Sofia Kossida
  - IMGT, The International ImMunoGeneTics Information System, National Center for Scientific Research (CNRS), Institute of Human Genetics (IGH), University of Montpellier (UM), 141 rue de la Cardonille, Montpellier 34 090, France
- Frederique Lisacek
  - Proteome Informatics Group, Swiss Institute of Bioinformatics (SIB), route de Drize 7, Geneva CH-1227, Switzerland
- Yan Liu
  - The Glycosciences Laboratory, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom
- Thomas Lütteke
  - Institute of Veterinary Physiology and Biochemistry, Justus-Liebig-University Gießen, Frankfurter Str. 100, Gießen 35392, Germany
- Junfeng Ma
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, 3900 Reservoir Road NW, Washington, DC 20007, United States
- Adnan Malik
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Maria Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Akul Y Mehta
  - Department of Surgery, Beth Israel Deaconess Medical Center, National Center for Functional Glycomics, Harvard Medical School, 330 Brookline Avenue, Boston, MA 02215, United States
- Sriram Neelamegham
  - Departments of Chemical & Biological Engineering, Biomedical Engineering and Medicine, University at Buffalo, State University of New York, 906 Furnas Hall, Buffalo, NY 14260, United States
- Kalpana Panneerselvam
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- René Ranzinger
  - Complex Carbohydrate Research Center, University of Georgia, 315 Riverbend Rd, Athens, GA 30602, United States
- Sylvie Ricard-Blum
  - Institute of Molecular and Supramolecular Chemistry and Biochemistry (ICBMS), UMR 5246, University Lyon 1, CNRS, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex F-69622, France
- Gaoussou Sanou
  - IMGT, The International ImMunoGeneTics Information System, National Center for Scientific Research (CNRS), Institute of Human Genetics (IGH), University of Montpellier (UM), 141 rue de la Cardonille, Montpellier 34 090, France
- Vijay Shanker
  - Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave, Newark, DE 19716, United States
- Paul D Thomas
  - Department of Population and Public Health Sciences, University of Southern California, 2001 N Soto Street, Los Angeles, CA 90032, United States
- Michael Tiemeyer
  - Complex Carbohydrate Research Center, University of Georgia, 315 Riverbend Rd, Athens, GA 30602, United States
- James Urban
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 7 B, Gothenburg 41390, Sweden
- Randi Vita
  - Immune Epitope Database and Analysis Project, La Jolla Institute for Allergy & Immunology, 9420 Athena Circle, La Jolla, CA 92037, United States
- Jeet Vora
  - Department of Biochemistry & Molecular Medicine, The George Washington University School of Medicine and Health Sciences, 2300 I St. NW, Washington, DC 20052, United States
- Yasunori Yamamoto
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871, Japan
- Raja Mazumder
  - Department of Biochemistry & Molecular Medicine, The George Washington University School of Medicine and Health Sciences, 2300 I St. NW, Washington, DC 20052, United States
2
Müller-Dott S, Tsirvouli E, Vazquez M, Ramirez Flores R, Badia-i-Mompel P, Fallegger R, Türei D, Lægreid A, Saez-Rodriguez J. Expanding the coverage of regulons from high-confidence prior knowledge for accurate estimation of transcription factor activities. Nucleic Acids Res 2023; 51:10934-10949. [PMID: 37843125 PMCID: PMC10639077 DOI: 10.1093/nar/gkad841] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 03/31/2023] [Revised: 08/08/2023] [Accepted: 09/22/2023] [Indexed: 10/17/2023]
Abstract
Gene regulation plays a critical role in the cellular processes that underlie human health and disease. The regulatory relationship between transcription factors (TFs), key regulators of gene expression, and their target genes, the so-called TF regulons, can be coupled with computational algorithms to estimate the activity of TFs. However, to interpret these findings accurately, regulons of high reliability and coverage are needed. In this study, we present and evaluate a collection of regulons created using the CollecTRI meta-resource containing signed TF-gene interactions for 1186 TFs. In this context, we introduce a workflow to integrate information from multiple resources and assign the sign of regulation to TF-gene interactions that could be applied to other comprehensive knowledge bases. We find that the signed CollecTRI-derived regulons outperform other public collections of regulatory interactions in accurately inferring changes in TF activities in perturbation experiments. Furthermore, we showcase the value of the regulons by examining TF activity profiles in three different cancer types and exploring TF activities at the level of single cells. Overall, the CollecTRI-derived TF regulons enable the accurate and comprehensive estimation of TF activities and thereby help to interpret transcriptomics data.
Affiliation(s)
- Sophia Müller-Dott
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Eirini Tsirvouli
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Ricardo O Ramirez Flores
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Pau Badia-i-Mompel
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Robin Fallegger
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Dénes Türei
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Astrid Lægreid
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
3
He Y. Development and Applications of Interoperable Biomedical Ontologies for Integrative Data and Knowledge Representation and Multiscale Modeling in Systems Medicine. Methods Mol Biol 2022; 2486:233-244. [PMID: 35437726 DOI: 10.1007/978-1-0716-2265-0_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/18/2022]
Abstract
The data FAIR Guiding Principles state that all data should be Findable, Accessible, Interoperable, and Reusable. Ontology is critical to data integration, sharing, and analysis. Given thousands of ontologies have been developed in the era of artificial intelligence, it is critical to have interoperable ontologies to support standardized data and knowledge presentation and reasoning. For interoperable ontology development, the eXtensible ontology development (XOD) strategy offers four principles including ontology term reuse, semantic alignment, ontology design pattern usage, and community extensibility. Many software programs are available to help implement these principles. As a demonstration, the XOD strategy is applied to developing the interoperable Coronavirus Infectious Disease Ontology (CIDO). Various applications of interoperable ontologies, such as COVID-19 and kidney precision medicine research, are also introduced in this chapter.
Affiliation(s)
- Yongqun He
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
4
Wang Z, He Y. Precision omics data integration and analysis with interoperable ontologies and their application for COVID-19 research. Brief Funct Genomics 2021; 20:235-248. [PMID: 34159360 PMCID: PMC8287950 DOI: 10.1093/bfgp/elab029] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/25/2021] [Revised: 05/10/2021] [Accepted: 05/24/2021] [Indexed: 12/12/2022]
Abstract
Omics technologies are widely used in biomedical research. Precision medicine focuses on individual-level disease treatment and prevention. Here, we propose the usage of the term 'precision omics' to represent the combinatorial strategy that applies omics to translate large-scale molecular omics data for precision disease understanding and accurate disease diagnosis, treatment and prevention. Given the complexity of both omics and precision medicine, precision omics requires standardized representation and integration of heterogeneous data types. Ontology has emerged as an important artificial intelligence component to become critical for standard data and metadata representation, standardization and integration. To support precision omics, we propose a precision omics ontology hypothesis, which hypothesizes that the effectiveness of precision omics is positively correlated with the interoperability of ontologies used for data and knowledge integration. Therefore, to make effective precision omics studies, interoperable ontologies are required to standardize and incorporate heterogeneous data and knowledge in a human- and computer-interpretable manner. Methods for efficient development and application of interoperable ontologies are proposed and illustrated. With the interoperable omics data and knowledge, omics tools such as OmicsViz can also be evolved to process, integrate, visualize and analyze various omics data, leading to the identification of new knowledge and hypotheses of molecular mechanisms underlying the outcomes of diseases such as COVID-19. Given extensive COVID-19 omics research, we propose the strategy of precision omics supported by interoperable ontologies, accompanied with ontology-based semantic reasoning and machine learning, leading to systematic disease mechanism understanding and rational design of precision treatment and prevention. SHORT ABSTRACT Precision medicine focuses on individual-level disease treatment and prevention. 
Precision omics is a new strategy that applies omics for precision medicine research, which requires standardized representation and integration of individual genetics and phenotypes, experimental conditions, and data analysis settings. Ontology has emerged as an important artificial intelligence component to become critical for standard data and metadata representation, standardization and integration. To support precision omics, interoperable ontologies are required in order to standardize and incorporate heterogeneous data and knowledge in a human- and computer-interpretable manner. With the interoperable omics data and knowledge, omics tools such as OmicsViz can also be evolved to process, integrate, visualize and analyze various omics data, leading to the identification of new knowledge and hypotheses of molecular mechanisms underlying disease outcomes. The precision COVID-19 omics study is provided as the primary use case to illustrate the rationale and implementation of the precision omics strategy.
Affiliation(s)
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI, USA
5
Masci AM, White S, Neely B, Ardini-Polaske M, Hill CB, Misra RS, Aronow B, Gaddis N, Yang L, Wert SE, Palmer SM, Chan C. Ontology-guided segmentation and object identification for developmental mouse lung immunofluorescent images. BMC Bioinformatics 2021; 22:82. [PMID: 33622235 PMCID: PMC7901098 DOI: 10.1186/s12859-021-04008-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/26/2020] [Accepted: 02/08/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND Immunofluorescent confocal microscopy uses labeled antibodies as probes against specific macromolecules to discriminate between multiple cell types. For images of the developmental mouse lung, these cells are themselves organized into densely packed higher-level anatomical structures. These types of images can be challenging to segment automatically for several reasons, including the relevance of biomedical context, dependence on the specific set of probes used, prohibitive cost of generating labeled training data, as well as the complexity and dense packing of anatomical structures in the image. The use of an application ontology helps surmount these challenges by combining image data with its metadata to provide a meaningful biological context, modeled after how a human expert would make use of contextual information to identify histological structures, that constrains and simplifies the process of segmentation and object identification. RESULTS We propose an innovative approach for the semi-supervised analysis of complex and densely packed anatomical structures from immunofluorescent images that utilizes an application ontology to provide a simplified context for image segmentation and object identification. We describe how the logical organization of biological facts in the form of an ontology can provide useful constraints that facilitate automatic processing of complex images. We demonstrate the results of ontology-guided segmentation and object identification in mouse developmental lung images from the Bioinformatics REsource ATlas for the Healthy lung database of the Molecular Atlas of Lung Development (LungMAP1) program. CONCLUSION We describe a novel ontology-guided approach to segmentation and classification of complex immunofluorescence images of the developing mouse lung. The ontology is used to automatically generate constraints for each image based on its biomedical context, which facilitates image segmentation and classification.
Affiliation(s)
- Anna Maria Masci
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Scott White
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Ben Neely
  - Duke Crucible, Duke University, Durham, NC, USA
- Carol B Hill
  - Duke Clinical Research Institute, Duke School of Medicine, Durham, NC, USA
- Ravi S Misra
  - Department of Pediatrics, University of Rochester Medical Center, Rochester, NY, USA
- Bruce Aronow
  - Departments of Biomedical Informatics, Developmental Biology, and Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Lina Yang
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Susan E Wert
  - Department of Pediatrics Perinatal Institute Divisions of Neonatology, Perinatal and Pulmonary Biology Cincinnati Children's Hospital Medical Center/Research Foundation, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Scott M Palmer
  - Vice Chair for Research, Department of Medicine, Director, Respiratory Research, Duke Clinical Research Institute, Duke University Medical Center, Durham, NC, USA
- Cliburn Chan
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
6
Liu Y, Hur J, Chan WKB, Wang Z, Xie J, Sun D, Handelman S, Sexton J, Yu H, He Y. Ontological modeling and analysis of experimentally or clinically verified drugs against coronavirus infection. Sci Data 2021; 8:16. [PMID: 33441564 PMCID: PMC7806933 DOI: 10.1038/s41597-021-00799-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/12/2020] [Accepted: 12/14/2020] [Indexed: 12/25/2022]
Abstract
Our systematic literature collection and annotation identified 106 chemical drugs and 31 antibodies effective against the infection of at least one human coronavirus (including SARS-CoV, SARS-CoV-2, and MERS-CoV) in vitro or in vivo in an experimental or clinical setting. A total of 163 drug protein targets were identified, and 125 biological processes involving the drug targets were significantly enriched based on a Gene Ontology (GO) enrichment analysis. The Coronavirus Infectious Disease Ontology (CIDO) was used as an ontological platform to represent the anti-coronaviral drugs, chemical compounds, drug targets, biological processes, viruses, and the relations among these entities. In addition to new term generation, CIDO also adopted various terms from existing ontologies and developed new relations and axioms to semantically represent our annotated knowledge. The CIDO knowledgebase was systematically analyzed for scientific insights. To support rational drug design, a "Host-coronavirus interaction (HCI) checkpoint cocktail" strategy was proposed to interrupt the important checkpoints in the dynamic HCI network, and ontologies would greatly support the design process with interoperable knowledge representation and reasoning.
Affiliation(s)
- Yingtong Liu
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Junguk Hur
  - University of North Dakota School of Medicine and Health Sciences, Grand Forks, ND, 58202, USA
- Wallace K B Chan
  - Department of Pharmacology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Zhigang Wang
  - Department of Biomedical Engineering, Institute of Basic Medical Sciences and School of Basic Medicine, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, 100005, China
- Jiangan Xie
  - School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Duxin Sun
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, Ann Arbor, MI, 48109, USA
- Samuel Handelman
  - Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - U-M Center for Drug Repurposing, University of Michigan, Ann Arbor, MI, 48109, USA
- Jonathan Sexton
  - Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - U-M Center for Drug Repurposing, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, Ann Arbor, MI, 48109, USA
- Hong Yu
  - Department of Respiratory and Critical Care Medicine, Guizhou Province People's Hospital and NHC Key Laboratory of Immunological Diseases, People's Hospital of Guizhou University, Guiyang, Guizhou, 550002, China
  - Department of Basic Medicine, Guizhou University Medical College, Guiyang, Guizhou, 550025, China
- Yongqun He
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
7
Sarwar DM, Nickerson DP. CellML Model Discovery with the Physiome Model Repository. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11681-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
8
Li X, Lin X, Ren H, Guo J. Ontological Organization and Bioinformatic Analysis of Adverse Drug Reactions From Package Inserts: Development and Usability Study. J Med Internet Res 2020; 22:e20443. [PMID: 32706718 PMCID: PMC7400033 DOI: 10.2196/20443] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/19/2020] [Revised: 06/11/2020] [Accepted: 06/14/2020] [Indexed: 01/27/2023]
Abstract
BACKGROUND Licensed drugs may cause unexpected adverse reactions in patients, resulting in morbidity, risk of mortality, therapy disruptions, and prolonged hospital stays. Officially approved drug package inserts list the adverse reactions identified from randomized controlled clinical trials with high evidence levels and worldwide postmarketing surveillance. Formal representation of the adverse drug reaction (ADR) enclosed in semistructured package inserts will enable deep recognition of side effects and rational drug use, substantially reduce morbidity, and decrease societal costs. OBJECTIVE This paper aims to present an ontological organization of traceable ADR information extracted from licensed package inserts. In addition, it will provide machine-understandable knowledge for bioinformatics analysis, semantic retrieval, and intelligent clinical applications. METHODS Based on the essential content of package inserts, a generic ADR ontology model is proposed from two dimensions (and nine subdimensions), covering the ADR information and medication instructions. This is followed by a customized natural language processing method programmed with Python to retrieve the relevant information enclosed in package inserts. After the biocuration and identification of retrieved data from the package insert, an ADR ontology is automatically built for further bioinformatic analysis. RESULTS We collected 165 package inserts of quinolone drugs from the National Medical Products Administration and other drug databases in China, and built a specialized ADR ontology containing 2879 classes and 15,711 semantic relations. For each quinolone drug, the reported ADR information and medication instructions have been logically represented and formally organized in an ADR ontology. To demonstrate its usage, the source data were further bioinformatically analyzed. For example, the number of drug-ADR triples and major ADRs associated with each active ingredient were recorded. 
The 10 ADRs most frequently observed among quinolones were identified and categorized based on the 18 categories defined in the proposal. The occurrence frequency, severity, and ADR mitigation method explicitly stated in package inserts were also analyzed, as well as the top 5 specific populations with contraindications for quinolone drugs. CONCLUSIONS Ontological representation and organization using officially approved information from drug package inserts enables the identification and bioinformatic analysis of adverse reactions caused by a specific drug with regard to predefined ADR ontology classes and semantic relations. The resulting ontology-based ADR knowledge source classifies drug-specific adverse reactions, and supports a better understanding of ADRs and safer prescription of medications.
Affiliation(s)
- Xiaoying Li
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Xin Lin
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Huiling Ren
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Jinjing Guo
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
9
He Y, Wang H, Zheng J, Beiting DP, Masci AM, Yu H, Liu K, Wu J, Curtis JL, Smith B, Alekseyenko AV, Obeid JS. OHMI: the ontology of host-microbiome interactions. J Biomed Semantics 2019; 10:25. [PMID: 31888755 PMCID: PMC6937947 DOI: 10.1186/s13326-019-0217-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/09/2019] [Accepted: 12/04/2019] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Host-microbiome interactions (HMIs) are critical for the modulation of biological processes and are associated with several diseases. Extensive HMI studies have generated large amounts of data. We propose that the logical representation of the knowledge derived from these data and the standardized representation of experimental variables and processes can foster integration of data and reproducibility of experiments and thereby further HMI knowledge discovery.
METHODS: Through a multi-institutional collaboration, a community-based Ontology of Host-Microbiome Interactions (OHMI) was developed following the Open Biological/Biomedical Ontologies (OBO) Foundry principles. As an OBO library ontology, OHMI leverages established ontologies to create logically structured representations of (1) microbiomes, microbial taxonomy, host species, host anatomical entities, and HMIs under different conditions and (2) associated study protocols and types of data analysis and experimental results.
RESULTS: Aligned with the Basic Formal Ontology, OHMI comprises over 1000 terms, including terms imported from more than 10 existing ontologies together with some 500 OHMI-specific terms. A specific OHMI design pattern was generated to represent typical host-microbiome interaction studies. As one major OHMI use case, drawing on data from over 50 peer-reviewed publications, we identified over 100 bacteria and fungi from the gut, oral cavity, skin, and airway that are associated with six rheumatic diseases including rheumatoid arthritis. Our ontological study identified new high-level microbiota taxonomical structures. Two microbiome-related competency questions were also designed and addressed. We were also able to use OHMI to represent statistically significant results identified from a large existing microbiome database data analysis.
CONCLUSION: OHMI represents entities and relations in the domain of HMIs. It supports shared knowledge representation, data and metadata standardization and integration, and can be used in formulation of advanced queries for purposes of data analysis.
Affiliation(s)
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI 48109 USA
- Haihe Wang
  - University of Michigan Medical School, Ann Arbor, MI 48109 USA
  - Daqing Branch of Harbin Medical University, Daqing, 163319 Heilongjiang China
- Jie Zheng
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 USA
- Daniel P. Beiting
  - University of Pennsylvania School of Veterinary Medicine, Philadelphia, PA 19104 USA
- Anna Maria Masci
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710 USA
- Hong Yu
  - University of Michigan Medical School, Ann Arbor, MI 48109 USA
  - People’s Hospital of Guizhou Province, Guiyang, 550025 Guizhou China
- Kaiyong Liu
  - School of Public Health, Anhui Medical University, No 81 Meishan Road, Hefei, 230032 Anhui China
- Jianmin Wu
  - Center for Cancer Bioinformatics, Peking University Cancer Hospital & Institute, Beijing, 100142 China
- Jeffrey L. Curtis
  - University of Michigan Medical School, Ann Arbor, MI 48109 USA
  - Pulmonary & Critical Care Medicine Section, Medical Service, VA Ann Arbor Healthcare System, Ann Arbor, MI 48105 USA
- Barry Smith
  - University at Buffalo, Buffalo, NY 14260 USA
- Alexander V. Alekseyenko
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425 USA
- Jihad S. Obeid
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425 USA
10
Pyysalo S, Baker S, Ali I, Haselwimmer S, Shah T, Young A, Guo Y, Högberg J, Stenius U, Narita M, Korhonen A. LION LBD: a literature-based discovery system for cancer biology. Bioinformatics 2019; 35:1553-1561. [PMID: 30304355 PMCID: PMC6499247 DOI: 10.1093/bioinformatics/bty845] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/08/2018] [Revised: 09/19/2018] [Accepted: 10/08/2018] [Indexed: 01/01/2023]
Abstract
MOTIVATION: The overwhelming size and rapid growth of the biomedical literature make it impossible for scientists to read all studies related to their work, potentially leading to missed connections and wasted time and resources. Literature-based discovery (LBD) aims to alleviate these issues by identifying implicit links between disjoint parts of the literature. While LBD has been studied in depth since its introduction three decades ago, there has been limited work making use of recent advances in biomedical text processing methods in LBD.
RESULTS: We present LION LBD, a literature-based discovery system that enables researchers to navigate published information and supports hypothesis generation and testing. The system is built with a particular focus on the molecular biology of cancer using state-of-the-art machine learning and natural language processing methods, including named entity recognition and grounding to domain ontologies covering a wide range of entity types and a novel approach to detecting references to the hallmarks of cancer in text. LION LBD implements a broad selection of co-occurrence-based metrics for analyzing the strength of entity associations, and its design allows real-time search to discover indirect associations between entities in a database of tens of millions of publications while preserving the ability of users to explore each mention in its original context in the literature. Evaluations of the system demonstrate its ability to identify undiscovered links and rank relevant concepts highly among potential connections.
AVAILABILITY AND IMPLEMENTATION: The LION LBD system is available via a web-based user interface and a programmable API, and all components of the system are made available under open licenses from the project home page http://lbd.lionproject.net.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sampo Pyysalo
  - Language Technology Lab, Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
- Simon Baker
  - Language Technology Lab, Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
- Imran Ali
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Stefan Haselwimmer
  - Language Technology Lab, Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
- Tejas Shah
  - Language Technology Lab, Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
- Andrew Young
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
- Yufan Guo
  - Language Technology Lab, Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
- Johan Högberg
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Ulla Stenius
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Masashi Narita
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
- Anna Korhonen
  - Language Technology Lab, Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
11
Xu H, Wang Y, Diao L, Wang X, Zhang Y, Zhu J, Liu J, Yao J, Liu Z, Li Y, He F, Wang Z, Liu Y, Li D. UVGD 1.0: a gene-centric database bridging ultraviolet radiation and molecular biology effects in organisms. Int J Radiat Biol 2019; 95:1172-1177. [PMID: 31021279 DOI: 10.1080/09553002.2019.1609127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
Objectives: Exposure to ultraviolet radiation for a certain time triggers significant molecular biological effects in an organism. In the past few decades, various ultraviolet-associated biological effects, as well as their related genes, have been discovered through biologists' efforts. However, information about ultraviolet-related genes is dispersed across thousands of scientific papers, and no study has yet focused on the systematic collection of ultraviolet-related genes.
Methods: We collected ultraviolet-related genes and built the gene-centric database UVGD based on literature mining and manual curation. Literature mining was based on ultraviolet-related abstracts downloaded from PubMed; using a bio-entity recognizer, we obtained sentences in which ultraviolet keywords and genes co-occur at the single-sentence level. Manual curation was then performed to determine whether each gene is related to ultraviolet.
Results: We built the ultraviolet-related knowledge base UVGD 1.0 (URL: http://biokb.ncpsb.org/UVGD/), which contains 663 ultraviolet-related genes, together with 17 associated biological processes, 117 associated phenotypes, and 2628 MeSH terms.
Conclusion: UVGD helps in understanding ultraviolet-related biological processes in organisms, and we believe it will be useful for biologists studying the mechanisms of response to ultraviolet.
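The sentence-level co-occurrence step described in the Methods can be illustrated with a minimal sketch. The abstract does not name the bio-entity recognizer or the keyword list, so everything here is an assumption for illustration: the vocabularies `UV_KEYWORDS` and `GENE_SYMBOLS` are hypothetical placeholders (not taken from UVGD), simple dictionary matching stands in for the recognizer, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical vocabularies for illustration only; the actual keyword list
# and bio-entity recognizer used for UVGD are not given in the abstract.
UV_KEYWORDS = {"ultraviolet", "uv", "uva", "uvb"}
GENE_SYMBOLS = {"TP53", "XPA", "MC1R"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurring_sentences(text):
    """Return (sentence, genes) pairs where a UV keyword and a gene co-occur."""
    hits = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        has_uv = any(tok.lower() in UV_KEYWORDS for tok in tokens)
        # Gene symbols are matched case-sensitively against the dictionary.
        genes = sorted({tok for tok in tokens if tok in GENE_SYMBOLS})
        if has_uv and genes:
            hits.append((sentence, genes))
    return hits


example_abstract = (
    "UVB exposure induced TP53 expression in keratinocytes. "
    "XPA was studied in a separate cohort. "
    "Ultraviolet light altered MC1R and TP53 signalling."
)
for sentence, genes in cooccurring_sentences(example_abstract):
    print(genes, "<-", sentence)
```

In this sketch the second sentence is dropped because it mentions a gene but no ultraviolet keyword. The surviving sentences are candidates only; in the workflow described above they would still go to manual curation.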
Affiliation(s)
- Hao Xu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yan Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Lihong Diao
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Xun Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yi Zhang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Jiarun Zhu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Jinying Liu
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Jingwen Yao
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Zhongyang Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yang Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Fuchu He
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Zhidong Wang
  - Beijing Institute of Radiation Medicine, Beijing, China
- Yuan Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Dong Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
12
Pan H, Bian X, Yang S, He Y, Yang X, Liu Y. The cell line ontology-based representation, integration and analysis of cell lines used in China. BMC Bioinformatics 2019; 20:179. [PMID: 31272367 PMCID: PMC6509802 DOI: 10.1186/s12859-019-2724-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The Chinese National Infrastructure of Cell Line Resource (NICR) stores and distributes cell lines for biomedical research in China. This study aims to represent and integrate the information of NICR cell lines into the community-based Cell Line Ontology (CLO).
RESULTS: We have aligned, represented, and added all identified 2704 cell line cells in NICR to CLO. We also proposed new ontology design patterns to represent the usage of cell line cells as disease models by inducing tumor formation in model organisms, and the relations between cell line cells and their expressed or overexpressed genes or proteins. The resulting CLO-NICR ontology also includes the Chinese representation of the NICR cell line information. CLO-NICR was merged into the general CLO. To serve the cell research community in China, the Chinese version of CLO-NICR was also generated and deposited in the OntoChina ontology repository. The usage of CLO-NICR was demonstrated by DL query and knowledge extraction.
CONCLUSIONS: In summary, all identified cell lines from NICR are represented by the semantics framework of CLO and incorporated into CLO as its most recent update. We also generated CLO-NICR and its Chinese view (CLO-NICR-Cv). The development of CLO-NICR and CLO-NICR-Cv allows the integration of the cell lines from NICR into the community-based CLO ontology and provides an integrative platform to support different applications of CLO in China.
Affiliation(s)
- Hongjie Pan
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Xiaocui Bian
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Sheng Yang
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI 48109 USA
- Xiaolin Yang
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Yuqin Liu
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
13
He Y, Duncan WD, Cooper DJ, Hansen J, Iyengar R, Ong E, Walker K, Tibi O, Smith S, Serra LM, Zheng J, Sarntivijai S, Schürer S, O'Shea KS, Diehl AD. OSCI: standardized stem cell ontology representation and use cases for stem cell investigation. BMC Bioinformatics 2019; 20:180. [PMID: 31272389 PMCID: PMC6509805 DOI: 10.1186/s12859-019-2723-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 04/04/2023]
Abstract
Background: Stem cells and stem cell lines are widely used in biomedical research. The Cell Ontology (CL) and Cell Line Ontology (CLO) are two community-based OBO Foundry ontologies in the domains of in vivo cells and in vitro cell line cells, respectively.
Results: To support standardized stem cell investigations, we have developed an Ontology for Stem Cell Investigations (OSCI). OSCI imports stem cell and cell line terms from CL and CLO, and investigation-related terms from existing ontologies. A novel focus of OSCI is its application in representing metadata types associated with various stem cell investigations. We also applied OSCI to systematically categorize experimental variables in an induced pluripotent stem cell line cell study related to bipolar disorder. In addition, we used a semi-automated literature mining approach to identify over 200 stem cell gene markers. The relations between these genes and stem cells are modeled and represented in OSCI.
Conclusions: OSCI standardizes stem cells found in vivo and in vitro and in various stem cell investigation processes and entities. The presented use cases demonstrate the utility of OSCI in iPSC studies and literature mining related to bipolar disorder.
Affiliation(s)
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Jens Hansen
  - Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - SBCNY, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ravi Iyengar
  - Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - SBCNY, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Edison Ong
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Kendal Walker
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Omar Tibi
  - Johns Hopkins University, Baltimore, MD, USA
- Lucas M Serra
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Jie Zheng
  - University of Pennsylvania, Philadelphia, PA, USA
- K Sue O'Shea
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Alexander D Diehl
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
14
Sarntivijai S, He Y, Diehl AD. Cells in ExperimentaL Life Sciences (CELLS-2018): capturing the knowledge of normal and diseased cells with ontologies. BMC Bioinformatics 2019; 20:183. [PMID: 31272374 PMCID: PMC6509796 DOI: 10.1186/s12859-019-2721-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Cell cultures and cell lines are widely used in life science experiments. In conjunction with the 2018 International Conference on Biomedical Ontology (ICBO-2018), the 2nd International Workshop on Cells in ExperimentaL Life Science (CELLS-2018) focused on two themes of knowledge representation, for newly-discovered cell types and for cells in disease states. This workshop included five oral presentations and a general discussion session. Two new ontologies, including the Cancer Cell Ontology (CCL) and the Ontology for Stem Cell Investigations (OSCI), were reported in the workshop. In another representation, the Cell Line Ontology (CLO) framework was applied and extended to represent cell line cells used in China and their Chinese representation. Other presentations included a report on the application of ontologies to cross-compare cell types and marker patterns used in flow cytometry studies, and a presentation on new experimental findings about novel cell types based on single cell RNA sequencing assay and their corresponding ontological representation. The general discussion session focused on the ontology design patterns in representing newly-discovered cell types and cells in disease states.
Affiliation(s)
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI USA
- Alexander D. Diehl
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
15
Stanford NJ, Scharm M, Dobson PD, Golebiewski M, Hucka M, Kothamachu VB, Nickerson D, Owen S, Pahle J, Wittig U, Waltemath D, Goble C, Mendes P, Snoep J. Data Management in Computational Systems Biology: Exploring Standards, Tools, Databases, and Packaging Best Practices. Methods Mol Biol 2019; 2049:285-314. [PMID: 31602618 DOI: 10.1007/978-1-4939-9736-7_17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
Abstract
Computational systems biology involves integrating heterogeneous datasets in order to generate models. These models can assist with understanding and prediction of biological phenomena. Generating datasets and integrating them into models involves a wide range of scientific expertise. As a result, these datasets are often collected by one set of researchers and exchanged with other researchers for constructing the models. For this process to run smoothly, the data and models must be FAIR (findable, accessible, interoperable, and reusable). In order for data and models to be FAIR, they must be structured in consistent and predictable ways, and described sufficiently for other researchers to understand them. Furthermore, these data and models must be shared with other researchers, with appropriately controlled sharing permissions, before and after publication. In this chapter we explore the different data and model standards that assist with structuring, describing, and sharing. We also highlight the popular standards and sharing databases within computational systems biology.
Affiliation(s)
- Martin Scharm
  - Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Paul D Dobson
  - School of Computer Science, University of Manchester, Manchester, UK
- Martin Golebiewski
  - Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Michael Hucka
  - Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- David Nickerson
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Stuart Owen
  - School of Computer Science, University of Manchester, Manchester, UK
- Jürgen Pahle
  - BIOMS/BioQuant, Heidelberg University, Heidelberg, Germany
- Ulrike Wittig
  - Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Dagmar Waltemath
  - Medical Informatics, University Medicine Greifswald, Greifswald, Germany
- Carole Goble
  - School of Computer Science, University of Manchester, Manchester, UK
- Pedro Mendes
  - Centre for Quantitative Medicine, University of Connecticut, Farmington, CT, USA
- Jacky Snoep
  - School of Computer Science, University of Manchester, Manchester, UK
  - Biochemistry, Stellenbosch University, Stellenbosch, South Africa
16
Ding R, Qu Y, Wu CH, Vijay-Shanker K. Automatic gene annotation using GO terms from cellular component domain. BMC Med Inform Decis Mak 2018; 18:119. [PMID: 30526566 PMCID: PMC6284271 DOI: 10.1186/s12911-018-0694-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/06/2023]
Abstract
Background: The Gene Ontology (GO) is a resource that supplies information about gene product function using ontologies to represent biological knowledge. These ontologies cover three domains: Cellular Component (CC), Molecular Function (MF), and Biological Process (BP). GO annotation is a process which assigns gene functional information using GO terms to relevant genes in the literature. It is a common task among the Model Organism Database (MOD) groups. Manual GO annotation relies on human curators assigning gene functional information using GO terms by reading the biomedical literature. This process is very time-consuming and labor-intensive. As a result, many MODs can afford to curate only a fraction of relevant articles.
Methods: GO terms from the CC domain can be essentially divided into two sub-hierarchies: subcellular location terms and protein complex terms. We cast the task of gene annotation using GO terms from the CC domain as relation extraction between genes and other entities: (1) extract cases where a protein is found to be in a subcellular location, and (2) extract cases where a protein is a subunit of a protein complex. For each relation extraction task, we use an approach based on triggers and syntactic dependencies to extract the desired relations among entities.
Results: We tested our approach on the BC4GO test set, a publicly available corpus for GO annotation. Our approach obtains an F1-score of 71%, a precision of 91% and a recall of 58% for predicting GO terms from the CC domain for given genes.
Conclusions: We have described a novel approach of treating gene annotation with GO terms from the CC domain as two relation extraction subtasks. Evaluation results show that our approach achieves an F1-score of 71% for predicting GO terms for given genes. Our approach can therefore be used to accelerate the process of GO annotation for bio-annotators.
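As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below assumes that standard definition (the abstract does not state its averaging scheme) and recomputes F1 from the rounded precision and recall values given in the Results; the function name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract: precision 91%, recall 58%.
f1 = f1_score(0.91, 0.58)
print(f"F1 = {f1:.3f}")
```

The result, about 0.708, rounds to the reported 71%. It also shows how the lower recall pulls the score well below the 91% precision.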
Affiliation(s)
- Ruoyao Ding
  - School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
- Yingying Qu
  - School of Business, Guangdong University of Foreign Studies, Guangzhou, China
- Cathy H Wu
  - Department of Computer and Information Science, University of Delaware, Newark, DE, 19716, USA
- K Vijay-Shanker
  - Department of Computer and Information Science, University of Delaware, Newark, DE, 19716, USA
17
Ławrynowicz A, Potoniec J, Robaczyk M, Tudorache T. Discovery of Emerging Design Patterns in Ontologies Using Tree Mining. Semantic Web 2018; 9:517-544. [PMID: 30505251 PMCID: PMC6261490 DOI: 10.3233/sw-170280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The research goal of this work is to investigate modeling patterns that recur in ontologies. Such patterns may originate from certain design solutions, and they may indicate emerging ontology design patterns. We describe our tree-mining method for identifying the emerging design patterns. The method works in two steps: (1) we transform the ontology axioms into a tree shape in order to find axiom patterns; and then, (2) we use association analysis to mine co-occurring axiom patterns in order to extract emerging design patterns. We conduct an experimental study on a set of 331 ontologies from the BioPortal repository. We show that recurring axiom patterns appear across all individual ontologies, as well as across the whole set. In individual ontologies, we find frequent and non-trivial patterns with and without variables. Some of the former patterns have more than 300,000 occurrences. The longest pattern without a variable discovered from the whole ontology set has size 12, and it appears in 14 ontologies. To the best of our knowledge, this is the first method for automatic discovery of emerging design patterns in ontologies. Finally, we demonstrate that we are able to automatically detect patterns, for which we have manually confirmed that they are fragments of ontology design patterns described in the literature. Since our method is not specific to particular ontologies, we conclude that we should be able to discover new, emerging design patterns for arbitrary ontology sets.
Affiliation(s)
- Agnieszka Ławrynowicz
  - Faculty of Computing, Poznan University of Technology, ul. Piotrowo 3, 60-965 Poznan, Poland
- Jedrzej Potoniec
  - Faculty of Computing, Poznan University of Technology, ul. Piotrowo 3, 60-965 Poznan, Poland
- Michał Robaczyk
  - Faculty of Computing, Poznan University of Technology, ul. Piotrowo 3, 60-965 Poznan, Poland
- Tania Tudorache
  - Stanford Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, USA
18
ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data 2018; 5:180015. [PMID: 29485622 PMCID: PMC5827693 DOI: 10.1038/sdata.2018.15] [Citation(s) in RCA: 471] [Impact Index Per Article: 78.5] [Received: 10/03/2017] [Accepted: 12/29/2017] [Indexed: 12/21/2022]
Abstract
Immunology researchers are beginning to explore the possibilities of reproducibility, reuse and secondary analyses of immunology data. Open-access datasets are being applied in the validation of the methods used in the original studies, leveraging studies for meta-analysis, or generating new hypotheses. To promote these goals, the ImmPort data repository was created for the broader research community to explore the wide spectrum of clinical and basic research data and associated findings. The ImmPort ecosystem consists of four components (Private Data, Shared Data, Data Analysis, and Resources) for data archiving, dissemination, analyses, and reuse. To date, more than 300 studies have been made freely available through the Shared Data portal (www.immport.org/immport-open), which allows research data to be repurposed to accelerate the translation of new insights into discoveries.
19
Vita R, Overton JA, Peters B. Identification of errors in the IEDB using ontologies. Database (Oxford) 2018; 2018:4904119. [PMID: 29688357 PMCID: PMC5824775 DOI: 10.1093/database/bay005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/31/2017] [Revised: 12/11/2017] [Accepted: 01/04/2018] [Indexed: 12/02/2022]
Abstract
The Immune Epitope Database (IEDB) is a free online resource that has manually curated over 18 500 references from the scientific literature. Our database presents experimental data relating to the recognition of immune epitopes by the adaptive immune system in a structured, searchable manner. In order to be consistent and accurate in our data representation across many different journals, authors and curators, we have implemented several quality control measures, such as curation rules, controlled vocabularies and links to external ontologies and other resources. Ontologies and other resources have greatly benefited the IEDB through improved search interfaces, easier curation practices, interoperability between the IEDB and other databases and the identification of errors within our dataset. Here, we will elaborate on how ontology mapping and usage can be used to find and correct errors in a manually curated database. Database URL: www.iedb.org.
Affiliation(s)
- Randi Vita
  - Center for Infectious Disease, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- James A Overton
  - Center for Infectious Disease, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Bjoern Peters
  - Center for Infectious Disease, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
20
Liu Y, He M, Wang D, Diao L, Liu J, Tang L, Guo S, He F, Li D. HisgAtlas 1.0: a human immunosuppression gene database. Database (Oxford) 2017; 2017:4748971. [PMID: 31725860 PMCID: PMC7243927 DOI: 10.1093/database/bax094] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 07/29/2017] [Revised: 11/02/2017] [Accepted: 11/21/2017] [Indexed: 01/06/2023]
Abstract
Immunosuppression is a state of the body in which the activation or efficacy of the immune system is weakened. It is associated with a wide spectrum of human diseases. In the last two decades, tremendous efforts have been made to elucidate the mechanisms of hundreds of immunosuppression genes. Immunosuppression genes could be valuable drug targets or biomarkers for the immunotherapeutic treatment of different diseases. However, information on previously identified immunosuppression genes is dispersed across thousands of publications. Here, we provide the HisgAtlas database, which collects 995 previously identified human immunosuppression genes using text mining and manual curation. We believe HisgAtlas will be a valuable resource for searching human immunosuppression genes and for investigating their functions in further research. Database URL: http://biokb.ncpsb.org/HisgAtlas/.
Affiliation(s)
- Yuan Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
| | - Mengqi He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
| | - Dan Wang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
| | - Lihong Diao
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
| | - Jinying Liu
- School of Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100029, China
| | - Li Tang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
| | - Shuzhen Guo
- School of Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100029, China
| | - Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
| | - Dong Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
| |
|
21
|
Ives C, Campia I, Wang RL, Wittwehr C, Edwards S. Creating a Structured AOP Knowledgebase via Ontology-Based Annotations. Appl In Vitro Toxicol 2017; 3:298-311. [PMID: 30057931 DOI: 10.1089/aivt.2017.0017] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 12/21/2022]
Abstract
Introduction The Adverse Outcome Pathway framework is increasingly used to integrate data generated based on traditional and emerging toxicity testing paradigms. As the number of AOP descriptions has increased, so has the need to define the AOP in computable terms. Materials and Methods Herein, we present a comprehensive annotation of 172 AOPs housed in the AOP-Wiki as of December 4, 2016 using terms from existing biological ontologies. Results AOP Key Events (KEs) were assigned ontology terms using a concept called the Event Component, which consists of a Process, an Object, and an Action term, with each term originating from ontologies and other controlled vocabularies. Annotation of KEs with ontology classes from fourteen ontologies and controlled vocabularies resulted in a total of 685 KEs being annotated with a total of 809 Event Components. A set of seven conventions resulted, defining the annotation of KEs via Event Components. Discussion This expanded annotation of AOPs allows computational reasoners to aid in both AOP development and applications. In addition, the incorporation of explicit biological objects will reduce the time required for converting a qualitative AOP description into a conceptual model that can support computational modeling. As high throughput genomics becomes a more important part of the high throughput toxicity testing landscape, the new approaches described here for annotating key events will also promote the visualization and analysis of genomics data in an AOP context.
Affiliation(s)
- Cataia Ives
- Integrated Systems Toxicology Division, NHEERL, U.S. Environmental Protection Agency, RTP, NC, USA
| | - Ivana Campia
- European Commission's Joint Research Centre, NERL, U.S. Environmental Protection Agency, Cincinnati, OH, USA
| | - Rong-Lin Wang
- Exposure Methods and Measurements Division, NERL, U.S. Environmental Protection Agency, Cincinnati, OH, USA
| | - Clemens Wittwehr
- European Commission's Joint Research Centre, NERL, U.S. Environmental Protection Agency, Cincinnati, OH, USA
| | - Stephen Edwards
- Integrated Systems Toxicology Division, NHEERL, U.S. Environmental Protection Agency, RTP, NC, USA
| |
|
22
|
Lin Y, Mehta S, Küçük-McGinty H, Turner JP, Vidovic D, Forlin M, Koleti A, Nguyen DT, Jensen LJ, Guha R, Mathias SL, Ursu O, Stathias V, Duan J, Nabizadeh N, Chung C, Mader C, Visser U, Yang JJ, Bologa CG, Oprea TI, Schürer SC. Drug target ontology to classify and integrate drug discovery data. J Biomed Semantics 2017; 8:50. [PMID: 29122012 PMCID: PMC5679337 DOI: 10.1186/s13326-017-0161-x] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 03/23/2017] [Accepted: 10/17/2017] [Indexed: 11/12/2022] Open
Abstract
Background One of the most successful approaches to develop new small molecule therapeutics has been to start from a validated druggable protein target. However, only a small subset of potentially druggable targets has attracted significant research and development resources. The Illuminating the Druggable Genome (IDG) project develops resources to catalyze the development of likely targetable, yet currently understudied prospective drug targets. A central component of the IDG program is a comprehensive knowledge resource of the druggable genome. Results As part of that effort, we have developed a framework to integrate, navigate, and analyze drug discovery data based on formalized and standardized classifications and annotations of druggable protein targets, the Drug Target Ontology (DTO). DTO was constructed by extensive curation and consolidation of various resources. DTO classifies the four major drug target protein families, GPCRs, kinases, ion channels and nuclear receptors, based on phylogeny, function, target development level, disease association, tissue expression, chemical ligand and substrate characteristics, and target-family specific characteristics. The formal ontology was built using a new software tool to auto-generate most axioms from a database while supporting manual knowledge acquisition. A modular, hierarchical implementation facilitates ontology development and maintenance and makes use of various external ontologies, thus integrating the DTO into the ecosystem of biomedical ontologies. As a formal OWL-DL ontology, DTO contains asserted and inferred axioms. Modeling data from the Library of Integrated Network-based Cellular Signatures (LINCS) program illustrates the potential of DTO for contextual data integration and nuanced definition of important drug target characteristics. DTO has been implemented in the IDG user interface Portal, Pharos and the TIN-X explorer of protein target disease relationships.
Conclusions DTO was built based on the need for a formal semantic model for druggable targets including various related information such as protein, gene, protein domain, protein structure, binding site, small molecule drug, mechanism of action, protein tissue localization, disease association, and many other types of information. DTO will further facilitate the otherwise challenging integration and formal linking to biological assays, phenotypes, disease models, drug poly-pharmacology, binding kinetics and many other processes, functions and qualities that are at the core of drug discovery. The first version of DTO is publicly available via the website http://drugtargetontology.org/, GitHub (http://github.com/DrugTargetOntology/DTO), and the NCBO Bioportal (http://bioportal.bioontology.org/ontologies/DTO). The long-term goal of DTO is to provide such an integrative framework and to populate the ontology with this information as a community resource. Electronic supplementary material The online version of this article (10.1186/s13326-017-0161-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yu Lin
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
| | - Saurabh Mehta
- Center for Computational Science, University of Miami, Coral Gables, FL, USA.,Department of Applied Chemistry, Delhi Technological University, Delhi, India
| | - Hande Küçük-McGinty
- Center for Computational Science, University of Miami, Coral Gables, FL, USA.,Department of Computer Science, University of Miami, Coral Gables, FL, USA
| | - John Paul Turner
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
| | - Dusica Vidovic
- Center for Computational Science, University of Miami, Coral Gables, FL, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
| | - Michele Forlin
- Center for Computational Science, University of Miami, Coral Gables, FL, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
| | - Amar Koleti
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Science, Rockville, MD, USA
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Rajarshi Guha
- National Center for Advancing Translational Science, Rockville, MD, USA
| | - Stephen L Mathias
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
| | - Oleg Ursu
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
| | - Vasileios Stathias
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
| | - Jianbin Duan
- Center for Computational Science, University of Miami, Coral Gables, FL, USA.,Department of Computer Science, University of Miami, Coral Gables, FL, USA
| | - Nooshin Nabizadeh
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
| | - Caty Chung
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
| | - Christopher Mader
- Center for Computational Science, University of Miami, Coral Gables, FL, USA
| | - Ubbo Visser
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
| | - Jeremy J Yang
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
| | - Cristian G Bologa
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
| | - Tudor I Oprea
- Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA.
| | - Stephan C Schürer
- Center for Computational Science, University of Miami, Coral Gables, FL, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
| |
|
23
|
Sun D, Wang M, Li A. MPTM: A tool for mining protein post-translational modifications from literature. J Bioinform Comput Biol 2017; 15:1740005. [PMID: 28982288 DOI: 10.1142/s0219720017400054] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Due to the importance of post-translational modifications (PTMs) in human health and diseases, PTMs are regularly reported in the biomedical literature. However, the continuing and rapid pace of expansion of this literature brings a huge challenge for researchers and database curators. Therefore, there is a pressing need to aid them in identifying relevant PTM information more efficiently by using a text mining system. So far, only a few web servers are available for mining information on a very limited number of PTMs, and these are based on simple pattern matching or pre-defined rules. In our work, in order to help researchers and database curators easily find and retrieve PTM information from available text, we have developed a text mining tool called MPTM, which extracts and organizes valuable knowledge about 11 common PTMs from abstracts in PubMed by using relations extracted from dependency parse trees and a heuristic algorithm. It is the first web server that provides a literature mining service for hydroxylation, myristoylation and GPI-anchor. The tool is also used to find new publications on PTMs from PubMed and uncovers potential PTM information by large-scale text analysis. MPTM analyzes text sentences to identify protein names including substrates and protein-interacting enzymes, and automatically associates them with the UniProtKB protein entry. To facilitate further investigation, it also retrieves PTM-related information, such as human diseases, Gene Ontology terms and organisms from the input text and related databases. In addition, an online database (MPTMDB) with extracted PTM information and a local MPTM Lite package are provided on the MPTM website. MPTM is freely available online at http://bioinformatics.ustc.edu.cn/mptm/ and the source code is hosted on GitHub: https://github.com/USTC-HILAB/MPTM.
Affiliation(s)
- Dongdong Sun
- 1 School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P. R. China
| | - Minghui Wang
- 1 School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P. R. China
| | - Ao Li
- 1 School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P. R. China
| |
|
24
|
Patwardhan A, Brandt R, Butcher SJ, Collinson L, Gault D, Grünewald K, Hecksel C, Huiskonen JT, Iudin A, Jones ML, Korir PK, Koster AJ, Lagerstedt I, Lawson CL, Mastronarde D, McCormick M, Parkinson H, Rosenthal PB, Saalfeld S, Saibil HR, Sarntivijai S, Solanes Valero I, Subramaniam S, Swedlow JR, Tudose I, Winn M, Kleywegt GJ. Building bridges between cellular and molecular structural biology. eLife 2017; 6:e25835. [PMID: 28682240 PMCID: PMC5524535 DOI: 10.7554/elife.25835] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/08/2017] [Accepted: 06/30/2017] [Indexed: 11/13/2022] Open
Abstract
The integration of cellular and molecular structural data is key to understanding the function of macromolecular assemblies and complexes in their in vivo context. Here we report on the outcomes of a workshop that discussed how to integrate structural data from a range of public archives. The workshop identified two main priorities: the development of tools and file formats to support segmentation (that is, the decomposition of a three-dimensional volume into regions that can be associated with defined objects), and the development of tools to support the annotation of biological structures.
Affiliation(s)
- Ardan Patwardhan
- Cellular Structure and 3D Bioimaging, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | | | - Sarah J Butcher
- Institute of Biotechnology and the Department of Biosciences, University of Helsinki, Helsinki, Finland
| | - Lucy Collinson
- Electron Microscopy Science Technology Platform, Francis Crick Institute, London, United Kingdom
| | - David Gault
- Centre for Gene Regulation and Expression, University of Dundee, Dundee, United Kingdom
| | - Kay Grünewald
- Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Corey Hecksel
- National Center for Macromolecular Imaging, Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, United States
| | - Juha T Huiskonen
- Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Andrii Iudin
- Cellular Structure and 3D Bioimaging, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Martin L Jones
- Electron Microscopy Science Technology Platform, Francis Crick Institute, London, United Kingdom
| | - Paul K Korir
- Cellular Structure and 3D Bioimaging, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Abraham J Koster
- Department of Molecular Cell Biology, Leiden University Medical Center, Leiden, The Netherlands
| | - Ingvar Lagerstedt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Catherine L Lawson
- Center for Integrative Proteomics Research and the Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Piscataway, United States
| | - David Mastronarde
- Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, United States
| | | | - Helen Parkinson
- Molecular Archival Resources, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Peter B Rosenthal
- Structural Biology of Cells and Viruses, Francis Crick Institute, London, United Kingdom
| | - Stephan Saalfeld
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
| | - Helen R Saibil
- Institute of Structural and Molecular Biology, Department of Crystallography, Birkbeck College, London, United Kingdom
| | - Sirarat Sarntivijai
- Molecular Archival Resources, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Irene Solanes Valero
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Sriram Subramaniam
- Laboratory for Cell Biology, Center for Cancer Research, National Cancer Institute, Bethesda, United States
| | - Jason R Swedlow
- Centre for Gene Regulation and Expression and the Division of Computational Biology, University of Dundee, Dundee, United Kingdom
| | - Ilinca Tudose
- Molecular Archival Resources, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Martyn Winn
- Scientific Computing Department, Science and Technology Facilities Council, Research Complex at Harwell, Didcot, United Kingdom
| | - Gerard J Kleywegt
- Molecular and Cellular Structure Cluster, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| |
|
25
|
Santana da Silva F, Jansen L, Freitas F, Schulz S. Ontological interpretation of biomedical database content. J Biomed Semantics 2017. [PMID: 28651575 PMCID: PMC5485580 DOI: 10.1186/s13326-017-0127-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023] Open
Abstract
Background Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontological framework. Methods By using a typical extract from the databases UniProt and Ensembl, annotated with content from GO, PR, ChEBI and NCBI Taxonomy, we created four ontological models (in OWL), which generate explicit, distinct interpretations under the BioTopLite2 (BTL2) upper-level ontology. The first three models interpret database entries as individuals (IND), defined classes (SUBC), and classes with dispositions (DISP), respectively; the fourth model (HYBR) is a combination of SUBC and DISP. For the evaluation of these four models, we consider (i) database content retrieval, using ontologies as query vocabulary; (ii) information completeness; and, (iii) DL complexity and decidability. The models were tested under these criteria against four competency questions (CQs). Results IND does not raise any ontological claim, besides asserting the existence of sample individuals and relations among them. Modelling patterns have to be created for each type of annotation referent. SUBC is interpreted regarding maximally fine-grained defined subclasses under the classes referred to by the data. DISP attempts to extract truly ontological statements from the database records, claiming the existence of dispositions. HYBR is a hybrid of SUBC and DISP and is more parsimonious regarding expressiveness and query answering complexity. For each of the four models, the four CQs were submitted as DL queries. This shows the ability to retrieve individuals with IND, and classes in SUBC and HYBR. DISP does not retrieve anything because the axioms with disposition are embedded in General Class Inclusion (GCI) statements. 
Conclusion Ambiguity of biological database content is addressed by a method that identifies implicit knowledge behind semantic annotations in biological databases and grounds it in an expressive upper-level ontology. The result is a seamless representation of database structure, content and annotations as OWL models. Electronic supplementary material The online version of this article (doi:10.1186/s13326-017-0127-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Filipe Santana da Silva
- Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Anibal Fernandes, 50.740-560, Recife, Brazil.,Núcleo de Telessaúde, Universidade Federal de Pernambuco, Av. Prof. Moraes Rego, 50670-420, Recife, Brazil
| | - Ludger Jansen
- Institut für Philosophie, Universität Rostock, D-18051, Rostock, Germany
| | - Fred Freitas
- Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Anibal Fernandes, 50.740-560, Recife, Brazil
| | - Stefan Schulz
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerplatz 2/V, Graz, 8036, Austria.
| |
|
26
|
Zaman S, Sarntivijai S, Abernethy DR. Use of Biomedical Ontologies for Integration of Biological Knowledge for Learning and Prediction of Adverse Drug Reactions. GENE REGULATION AND SYSTEMS BIOLOGY 2017; 11:1177625017696075. [PMID: 28469412 PMCID: PMC5398297 DOI: 10.1177/1177625017696075] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/29/2016] [Accepted: 02/04/2017] [Indexed: 12/26/2022]
Abstract
Drug-induced toxicity is a major public health concern that leads to patient morbidity and mortality. To address this problem, the Food and Drug Administration is working on the PredicTox initiative, a pilot research program on tyrosine kinase inhibitors, to build mechanistic and predictive models for drug-induced toxicity. This program involves integrating data acquired during preclinical studies and clinical trials within pharmaceutical company development programs that they have agreed to put in the public domain and in publicly available biological, pharmacological, and chemical databases. The integration process is accommodated by biomedical ontologies, a set of standardized vocabularies that define terms and logical relationships between them in each vocabulary. We describe a few programs that have used ontologies to address biomedical questions. The PredicTox effort is leveraging the experience gathered from these early initiatives to develop an infrastructure that allows evaluation of the hypothesis that having a mechanistic understanding underlying adverse drug reactions will improve the capacity to understand drug-induced clinical adverse drug reactions.
Affiliation(s)
- Shadia Zaman
- Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Sirarat Sarntivijai
- European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
| | - Darrell R Abernethy
- Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| |
|
27
|
Hogan WR, Hanna J, Hicks A, Amirova S, Bramblett B, Diller M, Enderez R, Modzelewski T, Vasconcelos M, Delcher C. Therapeutic indications and other use-case-driven updates in the drug ontology: anti-malarials, anti-hypertensives, opioid analgesics, and a large term request. J Biomed Semantics 2017; 8:10. [PMID: 28253937 PMCID: PMC5335794 DOI: 10.1186/s13326-017-0121-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/11/2016] [Accepted: 02/24/2017] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND The Drug Ontology (DrOn) is an OWL2-based representation of drug products and their ingredients, mechanisms of action, strengths, and dose forms. We originally created DrOn for use cases in comparative effectiveness research, primarily to identify historically complete sets of United States National Drug Codes (NDCs) that represent packaged drug products, by the ingredient(s), mechanism(s) of action, and so on contained in those products. Although we had designed DrOn from the outset to carefully distinguish those entities that have a therapeutic indication from those entities that have a molecular mechanism of action, we had not previously represented in DrOn any particular therapeutic indication. RESULTS In this work, we add therapeutic indications for three research use cases: resistant hypertension, malaria, and opioid abuse research. We also added mechanisms of action for opioid analgesics and added 108 classes representing drug products in response to a large term request from the Program for Resistance, Immunology, Surveillance and Modeling of Malaria in Uganda (PRISM) project. The net result is a new version of DrOn, current to May 2016, that represents three major therapeutic classes of drugs and six new mechanisms of action. CONCLUSIONS A therapeutic indication of a drug product is represented as a therapeutic function in DrOn. Adverse effects of drug products, as well as other therapeutic uses for which the drug product was not designed are dispositions. Our work provides a framework for representing additional therapeutic indications, adverse effects, and uses of drug products beyond their design. Our work also validated our past modeling decisions for specific types of mechanisms of action, namely effects mediated via receptor and/or enzyme binding. DrOn is available at: http://purl.obolibrary.org/obo/dron.owl . A smaller version without NDCs is available at: http://purl.obolibrary.org/obo/dron/dron-lite.owl.
Affiliation(s)
- William R. Hogan
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| | - Josh Hanna
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| | - Amanda Hicks
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| | - Samira Amirova
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| | - Baxter Bramblett
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| | - Matthew Diller
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| | - Rodel Enderez
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| | - Timothy Modzelewski
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| | - Mirela Vasconcelos
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| | - Chris Delcher
- Department of Health Outcomes and Policy, University of Florida, Clinical and Translational Research Building, 2004 Mowry Road, P.O. Box 100219, Gainesville, FL 32610 USA
| |
|
28
|
Natale DA, Arighi CN, Blake JA, Bona J, Chen C, Chen SC, Christie KR, Cowart J, D'Eustachio P, Diehl AD, Drabkin HJ, Duncan WD, Huang H, Ren J, Ross K, Ruttenberg A, Shamovsky V, Smith B, Wang Q, Zhang J, El-Sayed A, Wu CH. Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res 2017; 45:D339-D346. [PMID: 27899649 PMCID: PMC5210558 DOI: 10.1093/nar/gkw1075] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 09/30/2016] [Revised: 10/21/2016] [Accepted: 10/25/2016] [Indexed: 12/04/2022] Open
Abstract
The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely-related terms, including for example an interactive multiple sequence alignment. Finally, we describe recent improvement in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities.
Affiliation(s)
- Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Cecilia N Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
| | | | - Jonathan Bona
- Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14214, USA
| | - Chuming Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
| | - Sheng-Chih Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
| | | | - Julie Cowart
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
| | - Peter D'Eustachio
- Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
| | - Alexander D Diehl
- Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY 14203, USA
| | | | - William D Duncan
- Roswell Park Cancer Institute, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY 14203, USA
| | - Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
| | - Jia Ren
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
| | - Karen Ross
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Alan Ruttenberg
- Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14214, USA
| | - Veronica Shamovsky
- Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
| | - Barry Smith
- National Center for Ontological Research, University at Buffalo, Buffalo, NY 14214, USA
| | - Qinghua Wang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
| | - Jian Zhang
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Abdelrahman El-Sayed
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Cathy H Wu
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
|
29
|
Wang Q, Ross KE, Huang H, Ren J, Li G, Vijay-Shanker K, Wu CH, Arighi CN. Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature. Methods Mol Biol 2017; 1558:213-232. [PMID: 28150240 PMCID: PMC5446092 DOI: 10.1007/978-1-4939-6783-4_10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 04/12/2023]
Abstract
Post-translational modifications (PTMs) are one of the main contributors to the diversity of proteoforms in the proteomic landscape. In particular, protein phosphorylation represents an essential regulatory mechanism that plays a role in many biological processes. Protein kinases, the enzymes catalyzing this reaction, are key participants in metabolic and signaling pathways. Their activation or inactivation dictates downstream events: what substrates are modified and their subsequent impact (e.g., activation state, localization, protein-protein interactions (PPIs)). The biomedical literature continues to be the main source of evidence for experimental information about protein phosphorylation. Automatic methods to bring together phosphorylation events and phosphorylation-dependent PPIs can help to summarize the current knowledge and to expose hidden connections. In this chapter, we demonstrate two text mining tools, RLIMS-P and eFIP, for the retrieval and extraction of kinase-substrate-site data and phosphorylation-dependent PPIs from the literature. These tools offer several advantages over a literature search in PubMed as their results are specific for phosphorylation. RLIMS-P and eFIP results can be sorted, organized, and viewed in multiple ways to answer relevant biological questions, and the protein mentions are linked to UniProt identifiers.
Affiliation(s)
- Qinghua Wang
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
| | - Karen E Ross
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, 20057, USA
| | - Hongzhan Huang
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
| | - Jia Ren
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA
| | - Gang Li
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
| | - K Vijay-Shanker
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
| | - Cathy H Wu
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, 20057, USA
| | - Cecilia N Arighi
- Center for Bioinformatics and Computational Biology, Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE, 19711, USA.
- Department of Computer & Information Sciences, University of Delaware, Newark, DE, 19711, USA.
|
30
|
Abstract
The Protein Ontology (PRO) is the reference ontology for proteins in the Open Biomedical Ontologies (OBO) foundry and consists of three sub-ontologies representing protein classes of homologous genes, proteoforms (e.g., splice isoforms, sequence variants, and post-translationally modified forms), and protein complexes. PRO defines classes of proteins and protein complexes, both species-specific and species-nonspecific, and indicates their relationships in a hierarchical framework, supporting accurate protein annotation at the appropriate level of granularity, analyses of protein conservation across species, and semantic reasoning. In the first section of this chapter, we describe the PRO framework including categories of PRO terms and the relationship of PRO to other ontologies and protein resources. Next, we provide a tutorial about the PRO website (proconsortium.org) where users can browse and search the PRO hierarchy, view reports on individual PRO terms, and visualize relationships among PRO terms in a hierarchical table view, a multiple sequence alignment view, and a Cytoscape network view. Finally, we describe several examples illustrating the unique and rich information available in PRO.
|
31
|
Abstract
Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation, and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era.
Affiliation(s)
- Chuming Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA.
| | - Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
| | - Cathy H Wu
- Center for Bioinformatics and Computational Biology, Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Protein Information Resource, Department of Biochemistry and Molecular and Cellular Biology, Georgetown University Medical Center, Washington, DC, 20007, USA
|
32
|
Abstract
Protein post-translational modification (PTM) is an essential cellular regulatory mechanism, and disruptions in PTM have been implicated in disease. PTMs are an active area of study in many fields, leading to a wealth of PTM information in the scientific literature. There is a need for user-friendly bioinformatics resources that capture PTM information from the literature and support analyses of PTMs and their functional consequences. This chapter describes the use of iPTMnet (http://proteininformationresource.org/iPTMnet/), a resource that integrates PTM information from text mining, curated databases, and ontologies and provides visualization tools for exploring PTM networks, PTM crosstalk, and PTM conservation across species. We present several PTM-related queries and demonstrate how they can be addressed using iPTMnet.
|
33
|
Wang D, Yang L, Zhang P, LaBaer J, Hermjakob H, Li D, Yu X. AAgAtlas 1.0: a human autoantigen database. Nucleic Acids Res 2016; 45:D769-D776. [PMID: 27924021 PMCID: PMC5210642 DOI: 10.1093/nar/gkw946] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 08/21/2016] [Revised: 09/22/2016] [Accepted: 10/11/2016] [Indexed: 12/25/2022] Open
Abstract
Autoantibodies are antibodies that target self-antigens; they can play pivotal roles in maintaining homeostasis, distinguishing normal from tumor tissue and triggering autoimmune diseases. In the last three decades, tremendous efforts have been devoted to elucidating the generation, evolution and functions of autoantibodies, as well as their target autoantigens. However, reports of the many previously identified autoantigens are scattered throughout the literature. Here, we constructed an AAgAtlas database 1.0 using text-mining and manual curation. We extracted 45 830 autoantigen-related abstracts and 94 313 sentences from PubMed using the keywords of either ‘autoantigen’ or ‘autoantibody’ or their lexical variants, which were further refined to 25 520 abstracts, 43 253 sentences and 3984 candidates by our bio-entity recognizer based on the Protein Ontology. Finally, we identified 1126 genes as human autoantigens and 1071 related human diseases, with which we constructed a human autoantigen database (AAgAtlas database 1.0). The database provides a user-friendly interface to conveniently browse, retrieve and download human autoantigens as well as their associated diseases. The database is freely accessible at http://biokb.ncpsb.org/aagatlas/. We believe this database will be a valuable resource to track and understand human autoantigens as well as to investigate their functions in basic and translational research.
Affiliation(s)
- Dan Wang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
| | - Liuhui Yang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
| | - Ping Zhang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
| | - Joshua LaBaer
- The Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
| | - Henning Hermjakob
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dong Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
| | - Xiaobo Yu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
|
34
|
Zheng J, Harris MR, Masci AM, Lin Y, Hero A, Smith B, He Y. The Ontology of Biological and Clinical Statistics (OBCS) for standardized and reproducible statistical analysis. J Biomed Semantics 2016; 7:53. [PMID: 27627881 PMCID: PMC5024438 DOI: 10.1186/s13326-016-0100-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/07/2016] [Accepted: 09/06/2016] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. RESULTS The terms in OBCS, including ‘data collection’, ‘data transformation in statistics’, ‘data visualization’, ‘statistical data analysis’, and ‘drawing a conclusion based on data’, cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 research communities. Currently, OBCS comprises 878 terms, representing 20 BFO classes, 403 OBI classes, 229 OBCS specific classes, and 122 classes imported from ten other OBO ontologies. We discuss two examples illustrating how the ontology is being applied. In the first (biological) use case, we describe how OBCS was applied to represent the high throughput microarray data analysis of immunological transcriptional profiles in human subjects vaccinated with an influenza vaccine. In the second (clinical outcomes) use case, we applied OBCS to represent the processing of electronic health care data to determine the associations between hospital staffing levels and patient mortality. Our case studies were designed to show how OBCS can be used for the consistent representation of statistical analysis pipelines under two different research paradigms. Other ongoing projects using OBCS for statistical data processing are also discussed. The OBCS source code and documentation are available at: https://github.com/obcs/obcs. CONCLUSIONS The Ontology of Biological and Clinical Statistics (OBCS) is a community-based open source ontology in the domain of biological and clinical statistics. OBCS is a timely ontology that represents statistics-related terms and their relations in a rigorous fashion, facilitates standard data analysis and integration, and supports reproducible biological and clinical research.
Affiliation(s)
- Jie Zheng
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA.
| | - Marcelline R Harris
- Division of Systems Leadership and Effectiveness Science, University of Michigan School of Nursing, Ann Arbor, MI, 48109, USA
| | - Anna Maria Masci
- Department of Biostatistics and Bioinformatics, Duke Medical Center, Duke University, Durham, NC, 27710, USA
| | - Yu Lin
- Department of Microbiology and Immunology, Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Alfred Hero
- Department of Electrical Engineering and Computer Science, Department of Biomedical Engineering, and Department of Statistics, Michigan Institute of Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Barry Smith
- Department of Philosophy and National Center for Ontological Research, University at Buffalo, Buffalo, NY, 14203, USA
| | - Yongqun He
- Department of Microbiology and Immunology, Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
|
35
|
Ross KE, Natale DA, Arighi C, Chen SC, Huang H, Li G, Ren J, Wang M, Vijay-Shanker K, Wu CH. Scalable Text Mining Assisted Curation of Post-Translationally Modified Proteoforms in the Protein Ontology. CEUR Workshop Proceedings 2016; 1747:http://ceur-ws.org/Vol-1747/BIT103_ICBO2016.pdf. [PMID: 28706471 PMCID: PMC5504912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
The Protein Ontology (PRO) defines protein classes and their interrelationships from the family to the protein form (proteoform) level within and across species. One of the unique contributions of PRO is its representation of post-translationally modified (PTM) proteoforms. However, progress in adding PTM proteoform classes to PRO has been relatively slow due to the extensive manual curation effort required. Here we report an automated pipeline for creation of PTM proteoform classes that leverages two phosphorylation-focused text mining tools (RLIMS-P, which detects mentions of kinases, substrates, and phosphorylation sites, and eFIP, which detects phosphorylation-dependent protein-protein interactions (PPIs)) and our integrated PTM database, iPTMnet. By applying this pipeline, we obtained a set of ~820 substrate-site pairs that are suitable for automated PRO term generation with literature-based evidence attribution. Inclusion of these terms in PRO will increase PRO coverage of species-specific PTM proteoforms by 50%. Many of these new proteoforms also have associated kinase and/or PPI information. Finally, we show a phosphorylation network for the human and mouse peptidyl-prolyl cis-trans isomerase (PIN1/Pin1) derived from our dataset that demonstrates the biological complexity of the information we have extracted. Our approach addresses scalability in PRO curation and will be further expanded to advance PRO representation of phosphorylated proteoforms.
Affiliation(s)
- Karen E Ross
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
| | - Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
| | - Cecilia Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
| | - Sheng-Chih Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
| | - Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
| | - Gang Li
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
| | - Jia Ren
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
| | - Michael Wang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
| | - K Vijay-Shanker
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
| | - Cathy H Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
|
36
|
Bai T, Gong L, Wang Y, Wang Y, Kulikowski CA, Huang L. A method for exploring implicit concept relatedness in biomedical knowledge network. BMC Bioinformatics 2016; 17 Suppl 9:265. [PMID: 27454167 PMCID: PMC4959351 DOI: 10.1186/s12859-016-1131-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biomedical information and knowledge, structural and non-structural, stored in different repositories can be semantically connected to form a hybrid knowledge network. How to compute relatedness between concepts and discover valuable but implicit information or knowledge from it effectively and efficiently is of paramount importance for precision medicine, and a major challenge facing the biomedical research community. RESULTS In this study, a hybrid biomedical knowledge network is constructed by linking concepts across multiple biomedical ontologies as well as non-structural biomedical knowledge sources. To discover implicit relatedness between concepts in ontologies for which potentially valuable relationships (implicit knowledge) may exist, we developed a Multi-Ontology Relatedness Model (MORM) within the knowledge network, for which a relatedness network (RN) is defined and computed across multiple ontologies using a formal inference mechanism of set-theoretic operations. Semantic constraints are designed and implemented to prune the search space of the relatedness network. CONCLUSIONS Experiments to test examples of several biomedical applications have been carried out, and the evaluation of the results showed an encouraging potential of the proposed approach to biomedical knowledge discovery.
Affiliation(s)
- Tian Bai
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 2699 Qianjin St, Changchun, China
| | - Leiguang Gong
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
- Yantai Intelligent Information Technologies Ltd., 2699 Qianjin St, Yantai, China
| | - Ye Wang
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
| | - Yan Wang
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 2699 Qianjin St, Changchun, China
| | - Casimir A. Kulikowski
- Department of Computer Science, Rutgers, The State University of New Jersey, 2699 Qianjin St, Piscataway, NJ USA
| | - Lan Huang
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 2699 Qianjin St, Changchun, China
|
37
|
Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, He Y, Osumi-Sutherland D, Ruttenberg A, Sarntivijai S, Van Slyke CE, Vasilevsky NA, Haendel MA, Blake JA, Mungall CJ. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability. J Biomed Semantics 2016; 7:44. [PMID: 27377652 PMCID: PMC4932724 DOI: 10.1186/s13326-016-0088-7] [Citation(s) in RCA: 145] [Impact Index Per Article: 18.1] [Received: 04/15/2016] [Accepted: 06/23/2016] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies. CONSTRUCTION AND CONTENT Recent work on the CL has focused on extending the representation of various cell types, and developing new modules in the CL itself, and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class 'cell in vitro' have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning. UTILITY AND DISCUSSION The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell type specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies; for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs. CONCLUSIONS The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the CL both among developers and within the user community.
Affiliation(s)
- Alexander D. Diehl
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203 USA
| | - Terrence F. Meehan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
| | - Yvonne M. Bradford
- ZFIN, the Zebrafish Model Organism Database, 5291 University of Oregon, Eugene, OR 97403 USA
| | - Matthew H. Brush
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
| | - Wasila M. Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD 57069 USA
- National Evolutionary Synthesis Center, Durham, NC 27705 USA
| | - David S. Dougall
- Southwestern Medical Center, University of Texas, Dallas, TX 75235 USA
| | - Yongqun He
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109 USA
| | - David Osumi-Sutherland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
| | - Alan Ruttenberg
- Oral Diagnostics Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14210 USA
| | - Sirarat Sarntivijai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
| | - Ceri E. Van Slyke
- ZFIN, the Zebrafish Model Organism Database, 5291 University of Oregon, Eugene, OR 97403 USA
| | - Nicole A. Vasilevsky
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
| | - Melissa A. Haendel
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
|
38
|
Nickerson D, Atalag K, de Bono B, Geiger J, Goble C, Hollmann S, Lonien J, Müller W, Regierer B, Stanford NJ, Golebiewski M, Hunter P. The Human Physiome: how standards, software and innovative service infrastructures are providing the building blocks to make it achievable. Interface Focus 2016; 6:20150103. [PMID: 27051515 PMCID: PMC4759754 DOI: 10.1098/rsfs.2015.0103] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 01/02/2023] Open
Abstract
Reconstructing and understanding the Human Physiome virtually is a complex mathematical problem, and a highly demanding computational challenge. Mathematical models spanning from the molecular level through to whole populations of individuals must be integrated, then personalized. This requires interoperability with multiple disparate and geographically separated data sources, and myriad computational software tools. Extracting and producing knowledge from such sources, even when the databases and software are readily available, is a challenging task. Despite the difficulties, researchers must frequently perform these tasks so that available knowledge can be continually integrated into the common framework required to realize the Human Physiome. Software and infrastructures that support the communities that generate these, together with their underlying standards to format, describe and interlink the corresponding data and computer models, are pivotal to the Human Physiome being realized. They provide the foundations for integrating, exchanging and re-using data and models efficiently, and correctly, while also supporting the dissemination of growing knowledge in these forms. In this paper, we explore the standards, software tooling, repositories and infrastructures that support this work, and detail what makes them vital to realizing the Human Physiome.
Affiliation(s)
- David Nickerson
- Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
| | - Koray Atalag
- Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
- National Institute for Health Innovation (NIHI), The University of Auckland, Auckland, New Zealand
| | - Bernard de Bono
- Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
- Institute of Health Informatics, University College London, London NW1 2DA, UK
| | - Jörg Geiger
- Interdisciplinary Bank of Biomaterials and Data, University Hospital Würzburg, Würzburg, Germany
| | - Carole Goble
- School of Computer Science, University of Manchester, Manchester, UK
| | - Susanne Hollmann
- Research Center Plant Genomics and Systems Biology, Universitat Potsdam, Potsdam, Germany
| | | | - Wolfgang Müller
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Heidelberg, Germany
| | | | | | - Martin Golebiewski
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Heidelberg, Germany
| | - Peter Hunter
- Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK
|
39
|
Fang Y. Compound annotation with real time cellular activity profiles to improve drug discovery. Expert Opin Drug Discov 2016; 11:269-80. [PMID: 26787137 DOI: 10.1517/17460441.2016.1143460] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/22/2023]
Abstract
INTRODUCTION In the past decade, a range of innovative strategies have been developed to improve the productivity of pharmaceutical research and development. In particular, compound annotation, combined with informatics, has provided unprecedented opportunities for drug discovery. AREAS COVERED In this review, a literature search from 2000 to 2015 was conducted to provide an overview of the compound annotation approaches currently used in drug discovery. Based on this, a framework related to a compound annotation approach using real-time cellular activity profiles for probe, drug, and biology discovery is proposed. EXPERT OPINION Compound annotation with chemical structure, drug-like properties, bioactivities, genome-wide effects, clinical phenotypes, and textual abstracts has received significant attention in early drug discovery. However, these annotations are mostly associated with endpoint results. Advances in assay techniques have made it possible to obtain real-time cellular activity profiles of drug molecules under different phenotypes, so it is possible to generate compound annotation with real-time cellular activity profiles. Combining compound annotation with informatics, such as similarity analysis, presents a good opportunity to improve the rate of discovery of novel drugs and probes, and enhance our understanding of the underlying biology.
Affiliation(s)
- Ye Fang
- Biochemical Technologies, Science and Technology Division, Corning Incorporated, Corning, NY, USA
|
40
|
Semantics-Based Composition of Integrated Cardiomyocyte Models Motivated by Real-World Use Cases. PLoS One 2015; 10:e0145621. [PMID: 26716837 PMCID: PMC4696653 DOI: 10.1371/journal.pone.0145621] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 10/23/2015] [Accepted: 11/06/2015] [Indexed: 11/19/2022] Open
Abstract
Semantics-based model composition is an approach for generating complex biosimulation models from existing components that relies on capturing the biological meaning of model elements in a machine-readable fashion. This approach allows the user to work at the biological rather than computational level of abstraction and helps minimize the amount of manual effort required for model composition. To support this compositional approach, we have developed the SemGen software, and here report on SemGen's semantics-based merging capabilities using real-world modeling use cases. We successfully reproduced a large, manually-encoded, multi-model merge: the "Pandit-Hinch-Niederer" (PHN) cardiomyocyte excitation-contraction model, previously developed using CellML. We describe our approach for annotating the three component models used in the PHN composition and for merging them at the biological level of abstraction within SemGen. We demonstrate that we were able to reproduce the original PHN model results in a semi-automated, semantics-based fashion and also rapidly generate a second, novel cardiomyocyte model composed using an alternative, independently-developed tension generation component. We discuss the time-saving features of our compositional approach in the context of these merging exercises, the limitations we encountered, and potential solutions for enhancing the approach.
|
41
|
Ceusters W, Nasri-Heir C, Alnaas D, Cairns BE, Michelotti A, Ohrbach R. Perspectives on next steps in classification of oro-facial pain - Part 3: biomarkers of chronic oro-facial pain - from research to clinic. J Oral Rehabil 2015; 42:956-66. [PMID: 26200973 PMCID: PMC4715524 DOI: 10.1111/joor.12324] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Accepted: 05/31/2015] [Indexed: 11/28/2022]
Abstract
The purpose of this study was to review the current status of biomarkers used in oro-facial pain conditions. Specifically, we critically appraise their relative strengths and weaknesses for assessing mechanisms associated with the oro-facial pain conditions and interpret that information in the light of their current value for use in diagnosis. In the third section, we explore biomarkers through the perspective of ontological realism. We discuss ontological problems of biomarkers as currently widely conceptualised and implemented. This leads to recommendations for research practice aimed at a better understanding of the potential contribution that biomarkers might make to oro-facial pain diagnosis, thereby fulfilling our goal of an expanded multidimensional framework for oro-facial pain conditions that would include a third axis.
Affiliation(s)
- Werner Ceusters
- Department of Biomedical Informatics, University at Buffalo, NY, USA
- Brian E Cairns
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, Canada
- Ambra Michelotti
- Section of Orthodontics, School of Dentistry, University of Naples Federico II, Naples, Italy
- Richard Ohrbach
- Department of Oral Diagnostic Sciences, University at Buffalo, NY, USA
|
42
|
Fert-Bober J, Giles JT, Holewinski RJ, Kirk JA, Uhrigshardt H, Crowgey EL, Andrade F, Bingham CO, Park JK, Halushka MK, Kass DA, Bathon JM, Van Eyk JE. Citrullination of myofilament proteins in heart failure. Cardiovasc Res 2015; 108:232-42. [PMID: 26113265 PMCID: PMC4614685 DOI: 10.1093/cvr/cvv185] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 09/03/2014] [Revised: 06/12/2015] [Accepted: 06/17/2015] [Indexed: 11/12/2022] Open
Abstract
AIMS Citrullination, the post-translational conversion of arginine to citrulline by the enzyme family of peptidylarginine deiminases (PADs), is associated with several diseases, and specific citrullinated proteins have been shown to alter function while others act as auto-antigens. In this study, we identified citrullinated proteins in human myocardial samples, from healthy and heart failure patients, and determined several potential functional consequences. Further, we investigated PAD isoform cell-specific expression in the heart. METHODS AND RESULTS A citrullination-targeted proteomic strategy using a data-independent (SWATH) acquisition method was used to identify the modified cardiac proteins. Citrullinated-induced sarcomeric proteins were validated using two-dimensional gel electrophoresis and investigated using biochemical and functional assays. Myocardial PAD isoforms were confirmed by RT-PCR with PAD2 being the major isoform in myocytes. In total, 304 citrullinated sites were identified that map to 145 proteins among the three study groups: normal, ischaemia, and dilated cardiomyopathy. Citrullination of myosin (using HMM fragment) decreased its intrinsic ATPase activity and inhibited the acto-HMM-ATPase activity. Citrullinated TM resulted in stronger F-actin binding and inhibited the acto-HMM-ATPase activity. Citrullinated TnI did not alter the binding to F-actin or acto-HMM-ATPase activity. Overall, citrullination of sarcomeric proteins caused a decrease in Ca²⁺ sensitivity in skinned cardiomyocytes, with no change in maximal calcium-activated force or Hill coefficient. CONCLUSION Citrullination unique to the cardiac proteome was identified. Our data indicate important structural and functional alterations to the cardiac sarcomere and the contribution of protein citrullination to this process.
Affiliation(s)
- Justyna Fert-Bober
- The Heart Institute and Department of Medicine, Cedars-Sinai Medical Center, Advanced Clinical BioSystems Research Institute, Advanced Health Science Building, 9229, Los Angeles, CA, USA; Bayview Proteomics Center, Division of Cardiology, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
- John T Giles
- Division of Rheumatology, Department of Medicine, Columbia University, New York, NY, USA
- Ronald J Holewinski
- The Heart Institute and Department of Medicine, Cedars-Sinai Medical Center, Advanced Clinical BioSystems Research Institute, Advanced Health Science Building, 9229, Los Angeles, CA, USA
- Jonathan A Kirk
- Division of Cardiology, Department of Medicine, The Johns Hopkins University Medical Institutions, Baltimore, MD, USA
- Helge Uhrigshardt
- Bayview Proteomics Center, Division of Cardiology, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Erin L Crowgey
- The Heart Institute and Department of Medicine, Cedars-Sinai Medical Center, Advanced Clinical BioSystems Research Institute, Advanced Health Science Building, 9229, Los Angeles, CA, USA
- Felipe Andrade
- Division of Cardiology, Department of Medicine, The Johns Hopkins University Medical Institutions, Baltimore, MD, USA
- Clifton O Bingham
- Division of Rheumatology, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA; Division of Rheumatology, Department of Medicine, Seoul National University Hospital, Seoul, Korea
- Jin Kyun Park
- Division of Rheumatology, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA; Division of Rheumatology, Department of Medicine, Seoul National University Hospital, Seoul, Korea
- Marc K Halushka
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- David A Kass
- Division of Cardiology, Department of Medicine, The Johns Hopkins University Medical Institutions, Baltimore, MD, USA
- Joan M Bathon
- Division of Rheumatology, Department of Medicine, Columbia University, New York, NY, USA
- Jennifer E Van Eyk
- The Heart Institute and Department of Medicine, Cedars-Sinai Medical Center, Advanced Clinical BioSystems Research Institute, Advanced Health Science Building, 9229, Los Angeles, CA, USA; Bayview Proteomics Center, Division of Cardiology, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
|
43
|
Bioinformatics Knowledge Map for Analysis of Beta-Catenin Function in Cancer. PLoS One 2015; 10:e0141773. [PMID: 26509276 PMCID: PMC4624812 DOI: 10.1371/journal.pone.0141773] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/24/2015] [Accepted: 10/13/2015] [Indexed: 01/26/2023] Open
Abstract
Given the wealth of bioinformatics resources and the growing complexity of biological information, it is valuable to integrate data from disparate sources to gain insight into the role of genes/proteins in health and disease. We have developed a bioinformatics framework that combines literature mining with information from biomedical ontologies and curated databases to create knowledge "maps" of genes/proteins of interest. We applied this approach to the study of beta-catenin, a cell adhesion molecule and transcriptional regulator implicated in cancer. The knowledge map includes post-translational modifications (PTMs), protein-protein interactions, disease-associated mutations, and transcription factors co-activated by beta-catenin and their targets and captures the major processes in which beta-catenin is known to participate. Using the map, we generated testable hypotheses about beta-catenin biology in normal and cancer cells. By focusing on proteins participating in multiple relation types, we identified proteins that may participate in feedback loops regulating beta-catenin transcriptional activity. By combining multiple network relations with PTM proteoform-specific functional information, we proposed a mechanism to explain the observation that the cyclin dependent kinase CDK5 positively regulates beta-catenin co-activator activity. Finally, by overlaying cancer-associated mutation data with sequence features, we observed mutation patterns in several beta-catenin PTM sites and PTM enzyme binding sites that varied by tissue type, suggesting multiple mechanisms by which beta-catenin mutations can contribute to cancer. The approach described, which captures rich information for molecular species from genes and proteins to PTM proteoforms, is extensible to other proteins and their involvement in disease.
|
44
|
Lin Y, Xiang Z, He Y. Ontology-based representation and analysis of host-Brucella interactions. J Biomed Semantics 2015; 6:37. [PMID: 26445639 PMCID: PMC4594885 DOI: 10.1186/s13326-015-0036-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/31/2012] [Accepted: 09/23/2015] [Indexed: 11/26/2022] Open
Abstract
Background Biomedical ontologies are representations of classes of entities in the biomedical domain and how these classes are related in computer- and human-interpretable formats. Ontologies support data standardization and exchange and provide a basis for computer-assisted automated reasoning. IDOBRU is an ontology in the domain of Brucella and brucellosis. Brucella is a Gram-negative intracellular bacterium that causes brucellosis, the most common zoonotic disease in the world. In this study, IDOBRU is used as a platform to model and analyze how the hosts, especially host macrophages, interact with virulent Brucella strains or live attenuated Brucella vaccine strains. Such a study allows us to better integrate and understand intricate Brucella pathogenesis and host immunity mechanisms. Results Different levels of host-Brucella interactions based on different host cell types and Brucella strains were first defined ontologically. Three important processes of virulent Brucella interacting with host macrophages were represented: Brucella entry into macrophage, intracellular trafficking, and intracellular replication. Two Brucella pathogenesis mechanisms were ontologically represented: Brucella Type IV secretion system that supports intracellular trafficking and replication, and Brucella erythritol metabolism that participates in Brucella intracellular survival and pathogenesis. The host cell death pathway is critical to the outcome of host-Brucella interactions. For better survival and replication, virulent Brucella prevents macrophage cell death. However, live attenuated B. abortus vaccine strain RB51 induces caspase-2-mediated proinflammatory cell death. Brucella-associated cell death processes are represented in IDOBRU. The gene and protein information of 432 manually annotated Brucella virulence factors was represented using the Ontology of Genes and Genomes (OGG) and Protein Ontology (PRO), respectively.
Seven inference rules were defined to capture the knowledge of host-Brucella interactions and implemented in IDOBRU. Current IDOBRU includes 3611 ontology terms. SPARQL queries identified many results that are critical to the host-Brucella interactions. For example, out of 269 protein virulence factors related to macrophage-Brucella interactions, 81 are critical to Brucella intracellular replication inside macrophages. A SPARQL query also identified 11 biological processes important for Brucella virulence. Conclusions To systematically represent and analyze fundamental host-pathogen interaction mechanisms, we provided for the first time comprehensive ontological modeling of host-pathogen interactions using Brucella as the pathogen model. The methods and ontology representations used in our study are generic and can be broadened to study the interactions between hosts and other pathogens. Electronic supplementary material The online version of this article (doi:10.1186/s13326-015-0036-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yu Lin
- Unit of Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, 1150 W. Medical Center Dr, Ann Arbor, MI 48109 USA
- Zuoshuang Xiang
- Unit of Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, 1150 W. Medical Center Dr, Ann Arbor, MI 48109 USA
- Yongqun He
- Unit of Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, 1150 W. Medical Center Dr, Ann Arbor, MI 48109 USA
|
45
|
Ding R, Arighi CN, Lee JY, Wu CH, Vijay-Shanker K. pGenN, a gene normalization tool for plant genes and proteins in scientific literature. PLoS One 2015; 10:e0135305. [PMID: 26258475 PMCID: PMC4530884 DOI: 10.1371/journal.pone.0135305] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/15/2015] [Accepted: 07/20/2015] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Automatically detecting gene/protein names in the literature and connecting them to database records, also known as gene normalization, provides a means to structure the information buried in free-text literature. Gene normalization is critical for improving the coverage of annotation in the databases, and is an essential component of many text mining systems and database curation pipelines. METHODS In this manuscript, we describe a gene normalization system specifically tailored for plant species, called pGenN (pivot-based Gene Normalization). The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra-species normalization. We have developed new heuristics to improve each of these phases. RESULTS We evaluated the performance of pGenN on an in-house expertly annotated corpus consisting of 104 plant-relevant abstracts. Our system achieved an F-value of 88.9% (Precision 90.9% and Recall 87.2%) on this corpus, outperforming state-of-the-art systems presented in BioCreative III. We have processed over 440,000 plant-related Medline abstracts using pGenN. The gene normalization results are stored in a local database for direct query from the pGenN web interface (proteininformationresource.org/pgenn/). The annotated literature corpus is also publicly available through the PIR text mining portal (proteininformationresource.org/iprolink/).
Affiliation(s)
- Ruoyao Ding
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
- Cecilia N. Arighi
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America
- Jung-Youn Lee
- Department of Plant and Soil Sciences, University of Delaware, Newark, Delaware, United States of America
- Cathy H. Wu
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America
- K. Vijay-Shanker
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
|
46
|
Smith B, Arabandi S, Brochhausen M, Calhoun M, Ciccarese P, Doyle S, Gibaud B, Goldberg I, Kahn CE, Overton J, Tomaszewski J, Gurcan M. Biomedical imaging ontologies: A survey and proposal for future work. J Pathol Inform 2015; 6:37. [PMID: 26167381 PMCID: PMC4485195 DOI: 10.4103/2153-3539.159214] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 01/21/2015] [Accepted: 04/30/2015] [Indexed: 12/24/2022] Open
Abstract
Background: Ontology is one strategy for promoting interoperability of heterogeneous data through consistent tagging. An ontology is a controlled structured vocabulary consisting of general terms (such as “cell” or “image” or “tissue” or “microscope”) that form the basis for such tagging. These terms are designed to represent the types of entities in the domain of reality that the ontology has been devised to capture; the terms are provided with logical definitions thereby also supporting reasoning over the tagged data. Aim: This paper provides a survey of the biomedical imaging ontologies that have been developed thus far. It outlines the challenges, particularly faced by ontologies in the fields of histopathological imaging and image analysis, and suggests a strategy for addressing these challenges in the example domain of quantitative histopathology imaging. Results and Conclusions: The ultimate goal is to support the multiscale understanding of disease that comes from using interoperable ontologies to integrate imaging data with clinical and genomics data.
Affiliation(s)
- Barry Smith
- Department of Philosophy, The State University of New York at Buffalo, Buffalo, NY 14260, USA
- Mathias Brochhausen
- Division of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Michael Calhoun
- Department of Health and Human Performance, Elon University, Elon, NC 27244, USA
- Paolo Ciccarese
- Harvard Medical School, Massachusetts General Hospital, PerkinElmer Innovation Labs, Boston, MA 02115, USA
- Scott Doyle
- Department of Pathology and Anatomical Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Bernard Gibaud
- Laboratoire du Traitement du Signal et de l'Image (LTSI), Inserm Unit 1099, University of Rennes 1, Rennes, France
- Ilya Goldberg
- National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA
- Charles E Kahn
- Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104, USA
- John Tomaszewski
- Department of Pathology and Anatomical Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Metin Gurcan
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
|
47
|
Schriml LM, Mitraka E. The Disease Ontology: fostering interoperability between biological and clinical human disease-related data. Mamm Genome 2015; 26:584-9. [PMID: 26093607 PMCID: PMC4602048 DOI: 10.1007/s00335-015-9576-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 02/19/2015] [Accepted: 06/08/2015] [Indexed: 12/15/2022]
Abstract
The Disease Ontology (DO) enables cross-domain data integration through a common standard of human disease terms and their etiological descriptions. Standardized disease descriptors that are integrated across mammalian genomic resources provide a human-readable, machine-interpretable, community-driven disease corpus that unifies the representation of human common and rare diseases. The DO is populated by consensus-driven disease data descriptors that incorporate disease terms utilized by genomic and genetic projects and resources engaged in studies to understand the genetics of human disease through the study of model organisms. The DO project serves multiple roles for the model organism community by providing: (1) a structured "backbone" of disease concepts represented among the model organism databases; (2) authoritative disease curation services to researchers and resource providers; and (3) development of subsets of the DO representative of human diseases annotated to animal models curated within the model organism databases.
Affiliation(s)
- Lynn M Schriml
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Elvira Mitraka
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
|
48
|
Arighi C, Shamovsky V, Masci AM, Ruttenberg A, Smith B, Natale DA, Wu C, D’Eustachio P. Toll-like receptor signaling in vertebrates: testing the integration of protein, complex, and pathway data in the protein ontology framework. PLoS One 2015; 10:e0122978. [PMID: 25894391 PMCID: PMC4404318 DOI: 10.1371/journal.pone.0122978] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/12/2014] [Accepted: 02/26/2015] [Indexed: 11/20/2022] Open
Abstract
The Protein Ontology (PRO) provides terms for and supports annotation of species-specific protein complexes in an ontology framework that relates them both to their components and to species-independent families of complexes. Comprehensive curation of experimentally known forms and annotations thereof is expected to expose discrepancies, differences, and gaps in our knowledge. We have annotated the early events of innate immune signaling mediated by Toll-Like Receptor 3 and 4 complexes in human, mouse, and chicken. The resulting ontology and annotation data set has allowed us to identify species-specific gaps in experimental data and possible functional differences between species, and to employ inferred structural and functional relationships to suggest plausible resolutions of these discrepancies and gaps.
Affiliation(s)
- Cecilia Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America
- Veronica Shamovsky
- Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, New York, United States of America
- Anna Maria Masci
- Department of Immunology, Duke University, Durham, North Carolina, United States of America
- Alan Ruttenberg
- School of Dental Medicine, State University of New York at Buffalo, Buffalo, New York, United States of America
- Barry Smith
- Department of Philosophy and Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, New York, United States of America
- Darren A. Natale
- Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D. C., United States of America
- Cathy Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America
- Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D. C., United States of America
- Peter D’Eustachio
- Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, New York, United States of America
|
49
|
Tudor CO, Ross KE, Li G, Vijay-Shanker K, Wu CH, Arighi CN. Construction of phosphorylation interaction networks by text mining of full-length articles using the eFIP system. Database (Oxford) 2015; 2015:bav020. [PMID: 25833953 PMCID: PMC4381107 DOI: 10.1093/database/bav020] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 11/15/2014] [Revised: 02/17/2015] [Accepted: 02/18/2015] [Indexed: 12/11/2022]
Abstract
Protein phosphorylation is a reversible post-translational modification where a protein kinase adds a phosphate group to a protein, potentially regulating its function, localization and/or activity. Phosphorylation can affect protein-protein interactions (PPIs), abolishing interaction with previous binding partners or enabling new interactions. Extracting phosphorylation information coupled with PPI information from the scientific literature will facilitate the creation of phosphorylation interaction networks of kinases, substrates and interacting partners, toward knowledge discovery of functional outcomes of protein phosphorylation. Increasingly, PPI databases are interested in capturing the phosphorylation state of interacting partners. We have previously developed the eFIP (Extracting Functional Impact of Phosphorylation) text mining system, which identifies phosphorylated proteins and phosphorylation-dependent PPIs. In this work, we present several enhancements for the eFIP system: (i) text mining for full-length articles from the PubMed Central open-access collection; (ii) the integration of the RLIMS-P 2.0 system for the extraction of phosphorylation events with kinase, substrate and site information; (iii) the extension of the PPI module with new trigger words/phrases describing interactions and (iv) the addition of the iSimp tool for sentence simplification to aid in the matching of syntactic patterns. We enhance the website functionality to: (i) support searches based on protein roles (kinases, substrates, interacting partners) or using keywords; (ii) link protein entities to their corresponding UniProt identifiers if mapped and (iii) support visual exploration of phosphorylation interaction networks using Cytoscape. The evaluation of eFIP on full-length articles achieved 92.4% precision, 76.5% recall and 83.7% F-measure on 100 article sections. 
To demonstrate eFIP for knowledge extraction and discovery, we constructed phosphorylation-dependent interaction networks involving 14-3-3 proteins identified from cancer-related versus diabetes-related articles. Comparison of the phosphorylation interaction network of kinases, phosphoproteins and interactants obtained from eFIP searches, along with enrichment analysis of the protein set, revealed several shared interactions, highlighting common pathways discussed in the context of both diseases.
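The precision, recall and F-measure figures quoted in the abstract above are related by the standard harmonic-mean formula. The following is a minimal sketch of that relationship, not code from the eFIP system; the function name `f_measure` and the `beta` parameter are illustrative assumptions.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, written as fractions.
efip_f1 = f_measure(0.924, 0.765)
print(f"{efip_f1:.1%}")  # prints 83.7%
```

With the reported 92.4% precision and 76.5% recall, the balanced F-measure comes out at 83.7%, which agrees with the value stated in the abstract.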
Affiliation(s)
- Catalina O Tudor
- Department of Computer and Information Sciences and Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Karen E Ross
- Department of Computer and Information Sciences and Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Gang Li
- Department of Computer and Information Sciences and Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- K Vijay-Shanker
- Department of Computer and Information Sciences and Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Cathy H Wu
- Department of Computer and Information Sciences and Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Cecilia N Arighi
- Department of Computer and Information Sciences and Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
|
50
|
Fu X, Batista-Navarro R, Rak R, Ananiadou S. Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows. J Biomed Semantics 2015; 6:8. [PMID: 25789153 PMCID: PMC4364458 DOI: 10.1186/s13326-015-0004-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/11/2014] [Accepted: 02/22/2015] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. METHODS A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. RESULTS When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. CONCLUSIONS We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. 
Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
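For readers unfamiliar with the micro-averaged F-score reported above, the sketch below shows how such a score is typically computed: true positives, false positives and false negatives are pooled across all annotation types before precision and recall are taken. This is a generic illustration, not the authors' evaluation code. The type names and counts are invented purely for the example, and the relaxed matching criterion (which decides what counts as a true positive) is not modelled.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one annotation type
Counts = Tuple[int, int, int]


def micro_f1(per_type: Dict[str, Counts]) -> float:
    """Pool counts over all types, then compute a single F1 score."""
    tp = sum(c[0] for c in per_type.values())
    fp = sum(c[1] for c in per_type.values())
    fn = sum(c[2] for c in per_type.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the pooling step.
example = {"type_a": (8, 2, 4), "type_b": (2, 6, 2)}
score = micro_f1(example)  # pooled: tp=10, fp=8, fn=6
```

Because the counts are pooled, frequent annotation types dominate a micro-averaged score, whereas a macro-average would weight each type equally.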
Affiliation(s)
- Xiao Fu
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, UK
- Riza Batista-Navarro
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, UK; Department of Computer Science, University of the Philippines Diliman, Quezon City, 1101 Philippines
- Rafal Rak
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, UK
- Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, UK
|