1
Faria D, Eugénio P, Contreiras Silva M, Balbi L, Bedran G, Kallor AA, Nunes S, Palkowski A, Waleron M, Alfaro JA, Pesquita C. The Immunopeptidomics Ontology (ImPO). Database (Oxford) 2024; 2024:baae014. [PMID: 38857186 PMCID: PMC11164101 DOI: 10.1093/database/baae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 11/30/2023] [Accepted: 02/22/2024] [Indexed: 06/12/2024]
Abstract
The adaptive immune response plays a vital role in eliminating infected and aberrant cells from the body. This process hinges on the presentation of short peptides by major histocompatibility complex Class I molecules on the cell surface. Immunopeptidomics, the study of peptides displayed on cells, delves into the wide variety of these peptides. Understanding the mechanisms behind antigen processing and presentation is crucial for effectively evaluating cancer immunotherapies. As an emerging domain, immunopeptidomics currently lacks standardization: there is neither an established terminology nor formally defined semantics, a critical concern considering the complexity, heterogeneity, and growing volume of data involved in immunopeptidomics studies. Additionally, there is a disconnect between how the proteomics community delivers information about antigen presentation and its uptake by the clinical genomics community. Considering the significant relevance of immunopeptidomics in cancer, this shortcoming must be addressed to bridge the gap between research and clinical practice. In this work, we detail the development of the ImmunoPeptidomics Ontology, ImPO, the first effort at standardizing the terminology and semantics in the domain. ImPO aims to encapsulate and systematize data generated by immunopeptidomics experimental processes and bioinformatics analysis. ImPO establishes cross-references to 24 relevant ontologies, including the National Cancer Institute Thesaurus, Mondo Disease Ontology, Logical Observation Identifier Names and Codes and Experimental Factor Ontology. Although ImPO was developed using expert knowledge to characterize a large and representative data collection, it may be readily used to encode other datasets within the domain. Ultimately, ImPO facilitates data integration and analysis, enabling querying, inference and knowledge generation and, importantly, bridging the gap between the clinical proteomics and genomics communities.
As the field of immunogenomics uses protein-level immunopeptidomics data, we expect ImPO to play a key role in supporting a rich and standardized description of the large-scale data that emerging high-throughput technologies are expected to bring in the near future. Ontology URL: https://zenodo.org/record/10237571 Project GitHub: https://github.com/liseda-lab/ImPO/blob/main/ImPO.owl.
Affiliation(s)
- Daniel Faria
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, Lisboa 1000-029, Portugal
- Patrícia Eugénio
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Marta Contreiras Silva
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Laura Balbi
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Georges Bedran
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Ashwin Adrian Kallor
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Susana Nunes
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Aleksander Palkowski
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Michal Waleron
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Javier A Alfaro
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
  - Department of Biochemistry and Microbiology, University of Victoria, 3800 Finnerty Rd, Victoria, British Columbia, BC V8P 5C2, Canada
  - Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Old College, South Bridge, Edinburgh, EH8 9YL, UK
  - The Canadian Association for Responsible AI in Medicine, Victoria, Canada
- Catia Pesquita
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
2
Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024; 11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024]
Abstract
Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Ignacio J Tripodi
  - Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jonathan C Silverstein
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Charles Tapley Hoyt
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Scott A Malec
  - Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
- Deepak R Unni
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
  - Berlin Institute of Health at Charité-Universitätsmedizin, 10117, Berlin, Germany
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Emanuele Cavalleri
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
- Marco Mesiti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Lucas A Gillenwater
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Brook Santangelo
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tellen D Bennett
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael Bada
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- William A Baumgartner
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
3
Anandakrishnan M, Ross KE, Chen C, Shanker V, Cowart J, Wu CH. KSFinder - a knowledge graph model for link prediction of novel phosphorylated substrates of kinases. PeerJ 2023; 11:e16164. [PMID: 37818330 PMCID: PMC10561642 DOI: 10.7717/peerj.16164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 09/01/2023] [Indexed: 10/12/2023]
Abstract
Background Aberrant protein kinase regulation leading to abnormal substrate phosphorylation is associated with several human diseases. Despite the promise of therapies targeting kinases, many human kinases remain understudied. Most existing computational tools predicting phosphorylation cover fewer than 50% of known human kinases. They utilize local feature selection based on protein sequences, motifs, domains, structures, and/or functions, and do not consider the heterogeneous relationships of the proteins. In this work, we present KSFinder, a tool that predicts kinase-substrate links by capturing the inherent association of proteins in a network comprising 85% of the known human kinases. We also postulate the potential role of two understudied kinases based on their substrate predictions from KSFinder. Methods KSFinder learns the semantic relationships in a phosphoproteome knowledge graph using a knowledge graph embedding algorithm and represents the nodes as low-dimensional vectors. A multilayer perceptron (MLP) classifier is trained to discern kinase-substrate links using the embedded vectors. KSFinder uses a strategic negative generation approach that eliminates biases in entity representation and combines data from experimentally validated non-interacting protein pairs, proteins from different subcellular locations, and random sampling. We assess KSFinder's generalization capability on four different datasets and compare its performance with other state-of-the-art prediction models. We employ KSFinder to predict substrates of 68 "dark" kinases considered understudied by the Illuminating the Druggable Genome program and use our text-mining tool, RLIMS-P, along with manual curation, to search for literature evidence for the predictions. In a case study, we performed functional enrichment analysis for two dark kinases, HIPK3 and CAMKK1, using their predicted substrates.
Results KSFinder shows improved performance over other kinase-substrate prediction models and generalized prediction ability on different datasets. We identified literature evidence for 17 novel predictions involving an understudied kinase. All of these 17 predictions had a probability score ≥0.7 (nine at >0.9, six at 0.8-0.9, and two at 0.7-0.8). The evaluation of 93,593 negative predictions (probability ≤0.3) identified four false negatives. The top enriched biological processes of HIPK3 substrates relate to the regulation of extracellular matrix and epigenetic gene expression, while CAMKK1 substrates include lipid storage regulation and glucose homeostasis. Conclusions KSFinder outperforms the current kinase-substrate prediction tools with higher kinase coverage. The strategically developed negatives provide a superior generalization ability for KSFinder. We predicted substrates of 432 kinases, 68 of which are understudied, and hypothesized the potential functions of two dark kinases using their predicted substrates.
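The negative generation step described in the Methods can be illustrated with a minimal sketch. This is not the KSFinder implementation: the function name `build_negatives`, its parameters, and the toy identifiers are hypothetical, and the sketch only shows one plausible way to pool the three sources the abstract names (validated non-interacting pairs, pairs in different subcellular locations, and random sampling) while excluding known positives.

```python
import random
from itertools import product


def build_negatives(kinases, substrates, positives, non_interacting,
                    location, n_random, seed=0):
    """Pool negative kinase-substrate pairs from three sources (illustrative only)."""
    positives = set(positives)
    negatives = set()

    # Source 1: experimentally validated non-interacting pairs.
    negatives.update(pair for pair in non_interacting if pair not in positives)

    # Source 2: pairs whose annotated subcellular locations differ.
    # Pairs with a missing annotation are skipped rather than guessed.
    for kinase, substrate in product(kinases, substrates):
        if (kinase, substrate) in positives:
            continue
        loc_k, loc_s = location.get(kinase), location.get(substrate)
        if loc_k is not None and loc_s is not None and loc_k != loc_s:
            negatives.add((kinase, substrate))

    # Source 3: random sampling from the remaining unlabeled pairs.
    rng = random.Random(seed)
    pool = sorted(pair for pair in product(kinases, substrates)
                  if pair not in positives and pair not in negatives)
    negatives.update(rng.sample(pool, min(n_random, len(pool))))
    return negatives
```

In a setup like the one the abstract describes, the resulting pairs would be combined with the positive pairs and their embedding vectors to train the classifier; how the real tool balances the three sources is not stated in the abstract.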
Affiliation(s)
- Manju Anandakrishnan
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
- Karen E. Ross
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America
- Chuming Chen
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
- Vijay Shanker
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
- Julie Cowart
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
- Cathy H. Wu
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America
4
Callahan TJ, Stefanski AL, Wyrwa JM, Zeng C, Ostropolets A, Banda JM, Baumgartner WA, Boyce RD, Casiraghi E, Coleman BD, Collins JH, Deakyne Davies SJ, Feinstein JA, Lin AY, Martin B, Matentzoglu NA, Meeker D, Reese J, Sinclair J, Taneja SB, Trinkley KE, Vasilevsky NA, Williams AE, Zhang XA, Denny JC, Ryan PB, Hripcsak G, Bennett TD, Haendel MA, Robinson PN, Hunter LE, Kahn MG. Ontologizing health systems data at scale: making translational discovery a reality. NPJ Digit Med 2023; 6:89. [PMID: 37208468 PMCID: PMC10196319 DOI: 10.1038/s41746-023-00830-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 04/28/2023] [Indexed: 05/21/2023]
Abstract
Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Chenjie Zeng
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Anna Ostropolets
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA
- William A Baumgartner
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15260, USA
- Elena Casiraghi
  - Computer Science, Università degli Studi di Milano, Milan, Italy
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Ben D Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Janine H Collins
  - Department of Haematology, University of Cambridge, Cambridge, UK
- Sara J Deakyne Davies
  - Department of Research Informatics & Data Science, Analytics Resource Center, Children's Hospital Colorado, Aurora, CO, 80045, USA
- James A Feinstein
  - Adult and Child Center for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Asiyah Y Lin
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Blake Martin
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Katy E Trinkley
  - Department of Family Medicine, University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Translational and Integrative Sciences Lab, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Andrew E Williams
  - Tufts Institute for Clinical Research and Health Policy Studies, Tufts University, Boston, MA, 02155, USA
- Xingmin A Zhang
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Tellen D Bennett
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Melissa A Haendel
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
5
Taneja SB, Callahan TJ, Paine MF, Kane-Gill SL, Kilicoglu H, Joachimiak MP, Boyce RD. Developing a Knowledge Graph for Pharmacokinetic Natural Product-Drug Interactions. J Biomed Inform 2023; 140:104341. [PMID: 36933632 PMCID: PMC10150409 DOI: 10.1016/j.jbi.2023.104341] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/24/2022] [Revised: 01/09/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]
Abstract
BACKGROUND Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research. METHODS We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler, were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG. RESULTS The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. 
Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. CONCLUSION NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.
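The KG path searches mentioned in the Methods can be pictured with a small breadth-first search over subject-relation-object triples. This is a sketch under assumptions, not the NP-KG code: the function `shortest_path`, the node names, and the relation labels are hypothetical placeholders, and edges are treated as directed.

```python
from collections import deque


def shortest_path(triples, source, target):
    """Return the shortest directed path from source to target as a list of triples."""
    # Adjacency list: subject -> [(relation, object), ...]
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for rel, obj in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, rel, obj)]))
    return None  # no directed path exists


# Toy triples with placeholder names (not taken from NP-KG).
toy_triples = [
    ("natural_product", "has_constituent", "constituent"),
    ("constituent", "inhibits", "enzyme"),
    ("enzyme", "metabolizes", "drug"),
    ("natural_product", "related_to", "other_node"),
]
print(shortest_path(toy_triples, "natural_product", "drug"))
```

A returned chain of triples is the kind of candidate mechanistic explanation that would then be compared against ground truth data; meta-path discovery additionally groups such paths by their sequence of node and relation types.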
Affiliation(s)
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15206, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Mary F Paine
  - Department of Pharmaceutical Sciences, College of Pharmacy and Pharmaceutical Sciences, Washington State University, Spokane, WA 99202, USA
- Halil Kilicoglu
  - School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
- Marcin P Joachimiak
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
6
Brain Data Standards - A method for building data-driven cell-type ontologies. Sci Data 2023; 10:50. [PMID: 36693887 PMCID: PMC9873614 DOI: 10.1038/s41597-022-01886-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/25/2022] [Accepted: 12/06/2022] [Indexed: 01/25/2023]
Abstract
Large-scale single-cell 'omics profiling is being used to define a complete catalogue of brain cell types, something that traditional methods struggle with due to the diversity and complexity of the brain. But this poses a problem: How do we organise such a catalogue - providing a standard way to refer to the cell types discovered, linking their classification and properties to supporting data? Cell ontologies provide a partial solution to these problems, but no existing ontology schemas support the definition of cell types by direct reference to supporting data, classification of cell types using classifications derived directly from data, or links from cell types to marker sets along with confidence scores. Here we describe a generally applicable schema that solves these problems and its application in a semi-automated pipeline to build a data-linked extension to the Cell Ontology representing cell types in the Primary Motor Cortex of humans, mice and marmosets. The methods and resulting ontology are designed to be scalable and applicable to similar whole-brain atlases currently in preparation.
7
Feng B, Gao J. AnthraxKP: a knowledge graph-based, Anthrax Knowledge Portal mined from biomedical literature. Database (Oxford) 2022; 2022:6598946. [PMID: 35653350 PMCID: PMC9216567 DOI: 10.1093/database/baac037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 04/13/2022] [Accepted: 05/13/2022] [Indexed: 11/15/2022]
Abstract
Anthrax is a zoonotic infectious disease caused by Bacillus anthracis (anthrax bacterium) that affects not only domestic and wild animals worldwide but also human health. As research on the disease has deepened, a large number of related biomedical publications have emerged. Acquiring knowledge from the literature is essential for gaining insight into anthrax etiology, diagnosis, treatment and research. In this study, we used a set of text mining tools to identify nearly 14 000 entities of 29 categories, such as genes, diseases, chemicals, species, vaccines and proteins, from nearly 8000 anthrax-related biomedical publications and extracted 281 categories of association relationships among the entities. We curated the Anthrax-related Entities Dictionary and the Anthrax Ontology. We formed the Anthrax Knowledge Graph (AnthraxKG), containing more than 6000 nodes, 6000 edges and 32 000 properties. An interactive, visual Anthrax Knowledge Portal (AnthraxKP) was also developed on top of AnthraxKG using Web technology. AnthraxKP provides rich and authentic knowledge in many forms, which can help researchers carry out research more efficiently.
Database URL: Users can query and download AnthraxKP data at http://139.224.212.120:18095/.
Affiliation(s)
- Baiyang Feng
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Erdos East Street No. 29, Hohhot 010011, China
  - Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Zhaowuda Road No. 306, Hohhot 010018, China
- Jing Gao
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Erdos East Street No. 29, Hohhot 010011, China
  - Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Zhaowuda Road No. 306, Hohhot 010018, China
  - Inner Mongolia Autonomous Region Big Data Center, Chilechuan Street No. 1, Hohhot 010091, China
8
Rodriguez-Esteban R, Duarte J, Teixeira PC, Richard F, Koltsova S, So WV. Prediction of standard cell types and functional markers from textual descriptions of flow cytometry gating definitions using machine learning. Cytometry. Part B, Clinical Cytometry 2022; 102:220-227. [PMID: 35253974 DOI: 10.1002/cyto.b.22065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Revised: 02/02/2022] [Accepted: 02/28/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND A key step in clinical flow cytometry data analysis is gating, which involves the identification of cell populations. The process of gating produces a set of reportable results, which are typically described by gating definitions. The non-standardized, non-interpreted nature of gating definitions represents a hurdle for data interpretation and data sharing across and within organizations. Interpreting and standardizing gating definitions for subsequent analysis of gating results requires a curation effort from experts. Machine learning approaches have the potential to help in this process by predicting expert annotations associated with gating definitions. METHODS We created a gold-standard dataset by manually annotating thousands of gating definitions with cell type and functional marker annotations. We used this dataset to train and test a machine learning pipeline able to predict standard cell types and functional marker genes associated with gating definitions. RESULTS The machine learning pipeline predicted annotations with high accuracy for both cell types and functional marker genes. Accuracy was lower for gating definitions from assays belonging to laboratories from which limited or no prior data was available in the training. Manual error review ensured that resulting predicted annotations could be reused subsequently as additional gold-standard training data. CONCLUSIONS Machine learning methods are able to consistently predict annotations associated with gating definitions from flow cytometry assays. However, a hybrid automatic and manual annotation workflow would be recommended to achieve optimal results.
Affiliation(s)
- Raul Rodriguez-Esteban
  - Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
- José Duarte
  - Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
- Priscila C Teixeira
  - Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
- Fabien Richard
  - Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
- Svetlana Koltsova
  - Curation Department, Rancho BioSciences LLC, San Diego, California, USA
- W Venus So
  - Roche Pharmaceutical Research and Early Development, Roche Innovation Center New York, New York, USA
|
9
|
Furrer L, Cornelius J, Rinaldi F. Parallel sequence tagging for concept recognition. BMC Bioinformatics 2022; 22:623. [PMID: 35331131 PMCID: PMC8943923 DOI: 10.1186/s12859-021-04511-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2021] [Accepted: 12/01/2021] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are modeled as a sequence-labeling task, operating directly on the source text. We examine different harmonisation strategies for merging the predictions of the two classifiers into a single output sequence. RESULTS We test our approach on the recent Version 4 of the CRAFT corpus. In all 20 annotation sets of the concept-annotation task, our system outperforms the pipeline system reported as a baseline in the CRAFT shared task, a competition of the BioNLP Open Shared Tasks 2019. We further refine the systems from the shared task by optimising the harmonisation strategy separately for each annotation set. CONCLUSIONS Our analysis shows that the strengths of the two classifiers can be combined in a fruitful way. However, prediction harmonisation requires individual calibration on a development set for each annotation set. This allows achieving a good trade-off between established knowledge (training set) and novel information (unseen concepts).
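The idea of merging two parallel token-level predictions into one output sequence can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the two strategies ("spans first" and "IDs first"), the `B`/`I`/`O` tag set, the `NIL` fallback and the concept IDs are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, concept ID)


def spans_from_iob(tags: List[str]) -> List[Tuple[int, int]]:
    """Collect (start, end) token ranges from B/I/O span tags."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:        # a new B closes the running span
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:                # span running to the end of the text
        spans.append((start, len(tags)))
    return spans


def harmonise_spans_first(span_tags: List[str], id_tags: List[str],
                          nil: str = "NIL") -> List[Span]:
    """Hypothetical strategy: trust the span tagger for boundaries and
    take the most frequent ID predicted inside each span."""
    result = []
    for start, end in spans_from_iob(span_tags):
        ids = [t for t in id_tags[start:end] if t != "O"]
        concept = Counter(ids).most_common(1)[0][0] if ids else nil
        result.append((start, end, concept))
    return result


def harmonise_ids_first(id_tags: List[str]) -> List[Span]:
    """Hypothetical strategy: trust the ID tagger; contiguous tokens
    carrying the same ID form one span."""
    result, start = [], None
    for i, tag in enumerate(id_tags + ["O"]):   # sentinel closes the last span
        if start is not None and tag != id_tags[start]:
            result.append((start, i, id_tags[start]))
            start = None
        if start is None and tag != "O":
            start = i
    return result


# Toy predictions for four tokens; "C1" is a made-up concept ID.
span_tags = ["B", "I", "I", "O"]
id_tags = ["O", "C1", "C1", "O"]
print(harmonise_spans_first(span_tags, id_tags))
print(harmonise_ids_first(id_tags))
```

The two strategies disagree on the boundaries in this toy case, which is the kind of difference a per-annotation-set calibration on a development set would have to resolve.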
Affiliation(s)
- Lenz Furrer
  - Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Zurich, Switzerland
- Joseph Cornelius
  - Dalle Molle Institute for Artificial Intelligence Research (IDSIA USI/SUPSI), Lugano, Switzerland
  - Swiss Institute of Bioinformatics, Zurich, Switzerland
- Fabio Rinaldi
  - Dalle Molle Institute for Artificial Intelligence Research (IDSIA USI/SUPSI), Lugano, Switzerland
  - Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Zurich, Switzerland
  - Fondazione Bruno Kessler, Trento, Italy
|
10
|
Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Applied In Vitro Toxicology 2022; 8:2-13. [PMID: 35388368 DOI: 10.26434/chemrxiv.13524191] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/22/2023]
Abstract
INTRODUCTION The AOP-Wiki is the main platform for the development and storage of adverse outcome pathways (AOPs). These AOPs describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content, and to link it with chemical and biological resources. Together, this allows more detailed exploration, which can be automated. MATERIALS AND METHODS We converted the complete AOP-Wiki content into resource description framework (RDF) triples. We used >20 ontologies for the semantic annotation of property-object relations, including the Chemical Information Ontology, Dublin Core, and the AOP Ontology. RESULTS The resulting RDF contains >122,000 triples describing 158 unique properties of >15,000 unique subjects. Furthermore, >3500 link-outs were added to 12 chemical databases, and >7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at https://aopwiki.rdf.bigcat-bioinformatics.org. DISCUSSION SPARQL queries can be used to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power that the use of this new resource provides becomes apparent when combining the content with external databases using federated queries. CONCLUSION Overall, the AOP-Wiki RDF allows new ways to explore the rapidly growing AOP knowledge and makes the integration of this database in automated workflows possible, making the AOP-Wiki more FAIR.
Affiliation(s)
- Marvin Martens
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Chris T Evelo
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
|
11
|
Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Applied In Vitro Toxicology 2022; 8:2-13. [PMID: 35388368 PMCID: PMC8978481 DOI: 10.1089/aivt.2021.0010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022]
Abstract
Introduction: The AOP-Wiki is the main platform for the development and storage of adverse outcome pathways (AOPs). These AOPs describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content, and to link it with chemical and biological resources. Together, this allows more detailed exploration, which can be automated. Materials and Methods: We converted the complete AOP-Wiki content into resource description framework (RDF) triples. We used >20 ontologies for the semantic annotation of property–object relations, including the Chemical Information Ontology, Dublin Core, and the AOP Ontology. Results: The resulting RDF contains >122,000 triples describing 158 unique properties of >15,000 unique subjects. Furthermore, >3500 link-outs were added to 12 chemical databases, and >7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at https://aopwiki.rdf.bigcat-bioinformatics.org. Discussion: SPARQL queries can be used to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power that the use of this new resource provides becomes apparent when combining the content with external databases using federated queries. Conclusion: Overall, the AOP-Wiki RDF allows new ways to explore the rapidly growing AOP knowledge and makes the integration of this database in automated workflows possible, making the AOP-Wiki more FAIR.
Affiliation(s)
- Marvin Martens
  - Department of Bioinformatics—BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Chris T. Evelo
  - Department of Bioinformatics—BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Egon L. Willighagen
  - Department of Bioinformatics—BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
|
12
|
Gavali S, Ross KE, Cowart J, Chen C, Wu CH. iPTMnet RESTful API for Post-translational Modification Network Analysis. Methods Mol Biol 2022; 2499:187-204. [PMID: 35696082 PMCID: PMC10082948 DOI: 10.1007/978-1-0716-2317-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
iPTMnet is a resource that combines rich information about protein post-translational modifications (PTMs) from curated databases as well as text mining tools. Researchers can use the iPTMnet website to query, analyze and download the PTM data. In this chapter we describe the iPTMnet RESTful API, which provides a way to streamline the integration of iPTMnet data into an automated data analysis workflow. In the first section, we give an overview of the architecture of the API. In the second section, we describe the various functions defined by the API and provide detailed examples of using these functions.
|
13
|
Zheng L, Perl Y, He Y, Ochs C, Geller J, Liu H, Keloth VK. Visual comprehension and orientation into the COVID-19 CIDO ontology. J Biomed Inform 2021; 120:103861. [PMID: 34224898 PMCID: PMC8252699 DOI: 10.1016/j.jbi.2021.103861] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/23/2020] [Revised: 05/11/2021] [Accepted: 06/30/2021] [Indexed: 12/12/2022]
Abstract
The current intensive research on potential remedies and vaccinations for COVID-19 would greatly benefit from an ontology of standardized COVID terms. The Coronavirus Infectious Disease Ontology (CIDO) is the largest among several COVID ontologies, and it keeps growing, but it is still a medium sized ontology. Sophisticated CIDO users, who need more than searching for a specific concept, require orientation and comprehension of CIDO. In previous research, we designed a summarization network called "partial-area taxonomy" to support comprehension of ontologies. The partial-area taxonomy for CIDO is of smaller magnitude than CIDO, but is still too large for comprehension. We present here the "weighted aggregate taxonomy" of CIDO, designed to provide compact views at various granularities of our partial-area taxonomy (and the CIDO ontology). Such a compact view provides a "big picture" of the content of an ontology. In previous work, in the visualization patterns used for partial-area taxonomies, the nodes were arranged in levels according to the numbers of relationships of their concepts. Applying this visualization pattern to CIDO's weighted aggregate taxonomy resulted in an overly long and narrow layout that does not support orientation and comprehension since the names of nodes are barely readable. Thus, we introduce in this paper an innovative visualization of the weighted aggregate taxonomy for better orientation and comprehension of CIDO (and other ontologies). A measure for the efficiency of a layout is introduced and is used to demonstrate the advantage of the new layout over the previous one. With this new visualization, the user can "see the forest for the trees" of the ontology. Benefits of this visualization in highlighting insights into CIDO's content are provided. Generality of the new layout is demonstrated.
Affiliation(s)
- Ling Zheng
  - Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, USA
- Yehoshua Perl
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Yongqun He
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- James Geller
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Hao Liu
  - Columbia University Irving Medical Center, New York, NY, USA
- Vipina K Keloth
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
|
14
|
Yamada I, Campbell MP, Edwards N, Castro LJ, Lisacek F, Mariethoz J, Ono T, Ranzinger R, Shinmachi D, Aoki-Kinoshita KF. The glycoconjugate ontology (GlycoCoO) for standardizing the annotation of glycoconjugate data and its application. Glycobiology 2021; 31:741-750. [PMID: 33677548 PMCID: PMC8351504 DOI: 10.1093/glycob/cwab013] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/03/2020] [Revised: 12/31/2020] [Accepted: 01/01/2021] [Indexed: 01/19/2023] Open
Abstract
Recent years have seen great advances in the development of glycoproteomics protocols and methods resulting in a sustainable increase in the reporting of proteins, their attached glycans and glycosylation sites. However, only very few of these reports find their way into databases or data repositories. One of the major reasons is the absence of a digital standard to represent glycoproteins and the challenging annotations with glycans. Depending on the experimental method, such a standard must be able to represent glycans as complete structures or as compositions, store not just single glycans but also represent glycoforms on a specific glycosylation site, deal with partially missing site information if no site mapping was performed, and store abundances or ratios of glycans within a glycoform of a specific site. To support the above, we have developed the GlycoConjugate Ontology (GlycoCoO) as a standard semantic framework to describe and represent glycoproteomics data. GlycoCoO can be used to represent glycoproteomics data in triplestores and can serve as a basis for data exchange formats. The ontology, database providers and supporting documentation are available online (https://github.com/glycoinfo/GlycoCoO).
Affiliation(s)
- Issaku Yamada
  - Research Department, The Noguchi Institute, 1-9-7 Kaga, Itabashi, Tokyo 173-0003, Japan
- Matthew P Campbell
  - Institute for Glycomics, Griffith University at Gold Coast, Southport, QLD 4215, Australia
- Nathan Edwards
  - Department of Biochemistry, Molecular and Cellular Biology, Georgetown University Medical Center, Washington, D.C. 20007, USA
- Leyla Jael Castro
  - ZB MED Information Centre for Life Sciences, Gleueler Str. 60, 50931 Cologne, Germany
- Frederique Lisacek
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Computer Science Department, University of Geneva, route de Drize 7, CH - 1227 Geneva Switzerland, and also Section of Biology, University of Geneva, Geneva, Switzerland
- Julien Mariethoz
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 7 Route de Drize, 1227 Geneva, Switzerland
- Tamiko Ono
  - Faculty of Science and Engineering, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
- Rene Ranzinger
  - Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Rd, Athens, Georgia 30602, USA
- Daisuke Shinmachi
  - R&D Department, SparqLite LLC., 1615-22 Ishikawamachi, Hachioji, Tokyo 192-0032, Japan
- Kiyoko F Aoki-Kinoshita
  - Glycan & Life Science Integration Center (GaLSIC), Faculty of Science and Engineering, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
|
15
|
Kanza S, Graham Frey J. Semantic Technologies in Drug Discovery. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022] Open
|
16
|
Smaili FZ, Gao X, Hoehndorf R. Formal axioms in biomedical ontologies improve analysis and interpretation of associated data. Bioinformatics 2020; 36:2229-2236. [PMID: 31821406 PMCID: PMC7141863 DOI: 10.1093/bioinformatics/btz920] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/15/2019] [Revised: 10/16/2019] [Accepted: 12/06/2019] [Indexed: 12/30/2022] Open
Abstract
Motivation: Over the past years, significant resources have been invested into formalizing biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns and encode domain background knowledge. The domain knowledge of biomedical ontologies may also have the potential to provide background knowledge for machine learning and predictive modelling. Results: We use ontology-based machine learning methods to evaluate the contribution of formal axioms and ontology meta-data to the prediction of protein–protein interactions and gene–disease associations. We find that the background knowledge provided by the Gene Ontology and other ontologies significantly improves the performance of ontology-based prediction models through provision of domain-specific background knowledge. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute to improving data analysis in a context-specific manner. Our results have implications for the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are more frequently being applied. Our findings motivate the need for further development, and the systematic, application-driven evaluation and improvement, of formal axioms in ontologies. Availability and implementation: https://github.com/bio-ontology-research-group/tsoe. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fatima Zohra Smaili
  - Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Xin Gao
  - Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Robert Hoehndorf
  - Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
|
17
|
Karaman B, Sippl W. Computational Drug Repurposing: Current Trends. Curr Med Chem 2019; 26:5389-5409. [DOI: 10.2174/0929867325666180530100332] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 02/02/2018] [Revised: 05/06/2018] [Accepted: 05/14/2018] [Indexed: 01/31/2023]
Abstract
Biomedical discovery has been reshaped by the explosive digitization of data, which can be retrieved from a number of sources, ranging from clinical pharmacology to cheminformatics-driven databases. Now, supercomputing platforms and publicly available resources such as biological, physicochemical, and clinical data can all be integrated to construct a detailed map of signaling pathways and drug mechanisms of action in relation to drug candidates. Recent advancements in computer-aided data mining have facilitated analyses of ‘big data’ approaches, and the discovery of new indications for pre-existing drugs has been accelerated. Linking gene-phenotype associations to predict novel drug-disease signatures, or incorporating molecular structure information of drugs and protein targets with other kinds of data derived from systems biology, provides great potential to accelerate drug discovery and improve the success of drug repurposing attempts. In this review, we highlight commonly used computational drug repurposing strategies, including bioinformatics and cheminformatics tools, to integrate large-scale data emerging from systems biology, and consider both the challenges and opportunities of using this approach. Moreover, we provide successful examples and case studies that combined various in silico drug-repurposing strategies to predict potential novel uses for known therapeutics.
Affiliation(s)
- Berin Karaman
  - Biruni University - Department of Pharmaceutical Chemistry, Istanbul, Turkey
- Wolfgang Sippl
  - Martin-Luther University of Halle-Wittenberg - Institute of Pharmacy, Halle (Saale), Germany
|
18
|
Marín de Evsikova C, Raplee ID, Lockhart J, Jaimes G, Evsikov AV. The Transcriptomic Toolbox: Resources for Interpreting Large Gene Expression Data within a Precision Medicine Context for Metabolic Disease Atherosclerosis. J Pers Med 2019; 9:E21. [PMID: 31032818 PMCID: PMC6617151 DOI: 10.3390/jpm9020021] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/30/2019] [Revised: 04/20/2019] [Accepted: 04/25/2019] [Indexed: 11/16/2022] Open
Abstract
As one of the most widespread metabolic diseases, atherosclerosis affects nearly everyone as they age; arteries gradually narrow from plaque accumulation over time, reducing oxygenated blood flow to central and peripheral tissues and causing heart disease, stroke, kidney problems, and even pulmonary disease. Personalized medicine promises to bring treatments based on individual genome sequencing that precisely target the molecular pathways underlying atherosclerosis and its symptoms, but to date only a few genotypes have been identified. A promising alternative to this genetic approach is the identification of pathways altered in atherosclerosis by transcriptome analysis of atherosclerotic tissues to target specific aspects of disease. Transcriptomics is a potentially useful tool for both diagnostics and discovery science, exposing novel cellular and molecular mechanisms in clinical and translational models and, depending on experimental design, identifying and testing novel therapeutics. The cost and time required for transcriptome analysis have been greatly reduced by the development of next generation sequencing. The goal of this resource article is to provide background and a guide to appropriate technologies and downstream analyses in transcriptomics experiments generating ever-increasing amounts of gene expression data.
Affiliation(s)
- Caralina Marín de Evsikova
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Epigenetics & Functional Genomics Laboratories, Department of Research and Development, Bay Pines Veteran Administration Healthcare System, Bay Pines, FL 33744, USA
- Isaac D Raplee
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- John Lockhart
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Gilberto Jaimes
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Alexei V Evsikov
  - Epigenetics & Functional Genomics Laboratories, Department of Research and Development, Bay Pines Veteran Administration Healthcare System, Bay Pines, FL 33744, USA
|
19
|
Xu H, Wang Y, Diao L, Wang X, Zhang Y, Zhu J, Liu J, Yao J, Liu Z, Li Y, He F, Wang Z, Liu Y, Li D. UVGD 1.0: a gene-centric database bridging ultraviolet radiation and molecular biology effects in organisms. Int J Radiat Biol 2019; 95:1172-1177. [PMID: 31021279 DOI: 10.1080/09553002.2019.1609127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
Objectives: Exposure to ultraviolet radiation for a certain time triggers significant molecular biology effects in an organism. In the past few decades, varied ultraviolet-associated biological effects, as well as their related genes, have been discovered through biologists' efforts. However, information about ultraviolet-related genes is dispersed across thousands of scientific papers, and there is still no study focusing on the systematic collection of ultraviolet-related genes. Methods: We collected ultraviolet-related genes and built the gene-centric database UVGD based on literature mining and manual curation. Literature mining was based on the ultraviolet-related abstracts downloaded from PubMed, and we obtained sentences in which ultraviolet keywords and genes co-occur at the single-sentence level by using a bio-entity recognizer. After that, manual curation was carried out to identify whether the genes are related to ultraviolet or not. Results: We built the ultraviolet-related knowledge base UVGD 1.0 (URL: http://biokb.ncpsb.org/UVGD/), which contains 663 ultraviolet-related genes, together with 17 associated biological processes, 117 associated phenotypes, and 2628 MeSH terms. Conclusion: UVGD helps in understanding the ultraviolet-related biological processes in organisms, and we believe it will be useful for biologists studying the mechanisms of response to ultraviolet.
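The sentence-level co-occurrence step described in the Methods can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the UVGD pipeline: the keyword list, the placeholder gene symbols (`GENEA`, `GENEB`, `GENEC`), the set lookup standing in for a bio-entity recognizer, and the regex sentence splitter are all hypothetical, and the sample text is made up.

```python
import re
from typing import Dict, List

# Assumed ultraviolet keyword list (illustrative only).
UV_KEYWORDS = {"ultraviolet", "uv", "uva", "uvb", "uvc"}
# Placeholder gene symbols; a real pipeline would call a bio-entity recognizer.
GENE_LEXICON = {"GENEA", "GENEB", "GENEC"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurring_sentences(abstract: str) -> Dict[str, List[str]]:
    """Map each gene to the sentences in which it co-occurs with a UV keyword."""
    hits: Dict[str, List[str]] = {}
    for sentence in split_sentences(abstract):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        if not any(t.lower() in UV_KEYWORDS for t in tokens):
            continue  # no ultraviolet keyword in this sentence
        for gene in sorted(set(tokens) & GENE_LEXICON):
            hits.setdefault(gene, []).append(sentence)
    return hits


# Made-up text, used only to exercise the functions.
toy_abstract = (
    "UVB exposure changed GENEA expression. "
    "GENEB was measured separately. "
    "Ultraviolet light affected GENEC and GENEA."
)
for gene, sentences in cooccurring_sentences(toy_abstract).items():
    print(gene, len(sentences))
```

Candidate pairs found this way would still need the manual curation step the abstract describes, since co-occurrence in a sentence does not by itself establish that a gene is ultraviolet-related.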
Affiliation(s)
- Hao Xu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yan Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Lihong Diao
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Xun Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yi Zhang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Jiarun Zhu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Jinying Liu
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Jingwen Yao
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Zhongyang Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yang Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Fuchu He
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Zhidong Wang
  - Beijing Institute of Radiation Medicine, Beijing, China
- Yuan Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Dong Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
|
20
|
Wang X, Diao L, Sun D, Wang D, Zhu J, He Y, Liu Y, Xu H, Zhang Y, Liu J, Wang Y, He F, Li Y, Li D. OsteoporosAtlas: a human osteoporosis-related gene database. PeerJ 2019; 7:e6778. [PMID: 31086734 PMCID: PMC6487800 DOI: 10.7717/peerj.6778] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/29/2018] [Accepted: 03/13/2019] [Indexed: 01/12/2023] Open
Abstract
Background: Osteoporosis is a common, complex disease of bone with a strong heritable component, characterized by low bone mineral density, microarchitectural deterioration of bone tissue and an increased risk of fracture. Due to limited drug selection for osteoporosis and the increasing morbidity and mortality of osteoporotic fractures, osteoporosis has become a major health burden in aging societies. Current research to identify specific loci or genes involved in osteoporosis contributes to a greater understanding of the pathogenesis of osteoporosis and the development of better diagnosis, prevention and treatment strategies. However, little is known about how most causal genes work and interact to influence osteoporosis. Therefore, it is important to collect and analyze the studies of osteoporosis-related genes. Unfortunately, the information about these osteoporosis-related genes is scattered across an extensive literature. Currently, there is no specialized database for easily accessing relevant information about osteoporosis-related genes and miRNAs. Methods: We extracted data from literature abstracts in PubMed by text-mining and manual curation. Moreover, a local MySQL database containing all the data was developed with PHP on a Windows server. Results: OsteoporosAtlas (http://biokb.ncpsb.org/osteoporosis/), the first specialized database for easily accessing relevant information such as osteoporosis-related genes and miRNAs, was constructed and made available to researchers. OsteoporosAtlas enables users to retrieve, browse and download osteoporosis-related genes and miRNAs. Gene ontology and pathway analyses were integrated into OsteoporosAtlas. It currently includes 617 human encoding genes, 131 human non-coding miRNAs, and 128 functional roles. We think that OsteoporosAtlas will be an important bioinformatics resource to facilitate a better understanding of the pathogenesis of osteoporosis and the development of better diagnosis, prevention and treatment strategies.
Affiliation(s)
- Xun Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
- Lihong Diao
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
- Dezhi Sun
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
- Dan Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
- Jiarun Zhu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China; College of Life Sciences, Hebei University, Baoding, China
- Yangzhige He
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China; Central Research Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Yuan Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
- Hao Xu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
- Yi Zhang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China; College of Life Sciences, Hebei University, Baoding, China
- Jinying Liu
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Yan Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
- Fuchu He
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
- Yang Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
- Dong Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing, China
21
Serra LM, Duncan WD, Diehl AD. An ontology for representing hematologic malignancies: the cancer cell ontology. BMC Bioinformatics 2019; 20:181. [PMID: 31272372 PMCID: PMC6509834 DOI: 10.1186/s12859-019-2722-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Within the cancer domain, ontologies play an important role in the integration and annotation of data in order to support numerous biomedical tools and applications. This work seeks to leverage existing standards in immunophenotyping cell types found in hematologic malignancies to provide an ontological representation of them to aid in data annotation and analysis for patient data. RESULTS We have developed the Cancer Cell Ontology according to OBO Foundry principles as an extension of the Cell Ontology. We define classes in Cancer Cell Ontology by using a genus-differentia approach using logical axioms capturing the expression of cellular surface markers in order to represent types of hematologic malignancies. By adopting conventions used in the Cell Ontology, we have created human and computer-readable definitions for 300 classes of blood cancers, based on the EGIL classification system for leukemias, and relying upon additional classification approaches for multiple myelomas and other hematologic malignancies. CONCLUSION We have demonstrated a proof of concept for leveraging the built-in logical axioms of the ontology in order to classify patient surface marker data into appropriate diagnostic categories. We plan to integrate our ontology into existing tools for flow cytometry data analysis to facilitate the automated diagnosis of hematologic malignancies.
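The marker-based classification described in this abstract can be illustrated with a minimal sketch. It assumes each class is defined by a parent class (the genus) plus sets of surface markers that must be expressed or absent (the differentia). The class names and marker combinations below are hypothetical placeholders, not definitions from the Cancer Cell Ontology and not diagnostic criteria, and plain set matching stands in for the OWL reasoning the authors describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerClass:
    """A class defined by its parent (genus) and marker pattern (differentia)."""
    name: str
    parent: str
    positive: frozenset  # markers that must be expressed
    negative: frozenset  # markers that must be absent


# Hypothetical definitions, for illustration only.
CLASSES = [
    MarkerClass("example_class_A", "example_parent",
                frozenset({"CD19", "CD10"}), frozenset({"CD3"})),
    MarkerClass("example_class_B", "example_parent",
                frozenset({"CD3"}), frozenset({"CD19"})),
]


def classify(expressed, absent, classes=CLASSES):
    """Return names of all classes whose marker pattern the sample satisfies."""
    expressed, absent = set(expressed), set(absent)
    return [
        c.name
        for c in classes
        if c.positive <= expressed and c.negative <= absent
    ]


if __name__ == "__main__":
    # A sample that expresses CD19, CD10 and CD34 and lacks CD3.
    print(classify({"CD19", "CD10", "CD34"}, {"CD3"}))
```

Unlike an ontology reasoner, this sketch treats a marker that is not reported as unknown rather than absent, so a class with negative markers only matches when their absence is stated explicitly.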
Affiliation(s)
- Lucas M Serra
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- William D Duncan
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Alexander D Diehl
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
22
Ali NM, Khan HA, Then AYH, Ving Ching C, Gaur M, Dhillon SK. Fish Ontology framework for taxonomy-based fish recognition. PeerJ 2017; 5:e3811. [PMID: 28929028 PMCID: PMC5602685 DOI: 10.7717/peerj.3811] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/02/2017] [Accepted: 08/25/2017] [Indexed: 11/20/2022]
Abstract
Life science ontologies play an important role in the Semantic Web. Given the diversity of fish species and the associated wealth of information, it is imperative to develop an ontology capable of linking and integrating this information in an automated fashion. As such, we introduce the Fish Ontology (FO), an automated classification architecture of existing fish taxa which provides taxonomic information on unknown fish based on metadata restrictions. It is designed to support knowledge discovery, semantic annotation of fish and fisheries resources, data integration, and information retrieval. Automated classification of unknown specimens is a unique feature that currently does not appear to exist in other known ontologies. Examples of automated classification for major groups of fish are demonstrated, showing the information inferred when several restrictions are introduced at the species or specimen level. The current version of FO has 1,830 classes, includes widely used fisheries terminology, and models major aspects of fish taxonomy, grouping, and character. With more than 30,000 known fish species globally, the FO will be an indispensable tool for fish scientists and other interested users.
Affiliation(s)
- Najib M. Ali
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Haris A. Khan
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Amy Y-Hui Then
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Chong Ving Ching
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Manas Gaur
  - Wright State University, Kno.e.sis Center, Dayton, OH, United States of America
- Sarinder Kaur Dhillon
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
23
Li J, Tian W, Song J. Proteomics Applications in Dental Derived Stem Cells. J Cell Physiol 2017; 232:1602-1610. [PMID: 27791269 DOI: 10.1002/jcp.25667] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/26/2016] [Accepted: 10/26/2016] [Indexed: 02/05/2023]
Affiliation(s)
- Jie Li
  - College of Stomatology, Chongqing Medical University, Chongqing, China
  - Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Weidong Tian
  - National Engineering Laboratory for Oral Regenerative Medicine, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Jinlin Song
  - College of Stomatology, Chongqing Medical University, Chongqing, China
  - Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China
  - Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
24
Küçük EE, Yapar K, Küçük D, Küçük D. Ontology-based automatic identification of public health-related Turkish tweets. Comput Biol Med 2017; 83:1-9. [PMID: 28187367 DOI: 10.1016/j.compbiomed.2017.02.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/01/2016] [Revised: 02/01/2017] [Accepted: 02/03/2017] [Indexed: 11/19/2022]
Abstract
Social media analysis, such as the analysis of tweets, is a promising research topic for tracking public health concerns including epidemics. In this paper, we present an ontology-based approach to automatically identify public health-related Turkish tweets. The system is based on a public health ontology that we have constructed through a semi-automated procedure. As the last stage of ontology development, the ontology concepts are expanded through a linguistically motivated relaxation scheme to increase coverage before being integrated into our system. The resulting lexical resource, which includes the terms corresponding to the ontology concepts, is used to filter the Twitter stream so that a plausible subset of mostly public health-related tweets can be obtained. Experiments are carried out on two million genuine tweets and promising precision rates are obtained. We also implemented a Web-based interface that lets public health staff track the results of this identification system. Hence, the current social media analysis study makes both technical and practical contributions to the significant domain of public health.
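The lexicon-based filtering step described in this abstract can be sketched roughly as follows. The lexicon terms are hypothetical English placeholders (the authors' resource is Turkish and is not reproduced here), and simple prefix matching is an assumed stand-in for their linguistically motivated relaxation scheme, whose details the abstract does not give.

```python
import re

# Hypothetical lexicon terms, for illustration only.
LEXICON = {"flu", "vaccine", "fever"}


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def is_health_related(tweet, lexicon=LEXICON, relaxed=True):
    """Return True if any token matches a lexicon term.

    With relaxed=True a token also matches when it starts with a
    lexicon term, a crude approximation of suffixed word forms.
    """
    for tok in tokens(tweet):
        if tok in lexicon:
            return True
        if relaxed and any(tok.startswith(term) for term in lexicon):
            return True
    return False


if __name__ == "__main__":
    stream = ["Got my vaccines today", "Nice weather in Ankara"]
    print([t for t in stream if is_health_related(t)])
```

Relaxed matching raises coverage at the cost of precision: with this toy lexicon the word "fluent" would also match the term "flu", which is why any such scheme needs to be evaluated on real data.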
Affiliation(s)
- Emine Ela Küçük
  - Department of Public Health, Faculty of Health Sciences, Giresun University, Giresun, Turkey
- Kürşad Yapar
  - Department of Medical Pharmacology, Faculty of Medicine, Giresun University, Giresun, Turkey
- Dilek Küçük
  - Electrical Power Technologies Group, TÜBİTAK Energy Institute, Ankara, Turkey
- Doğan Küçük
  - Department of Computer Engineering, Gazi University, Ankara, Turkey
25
Natale DA, Arighi CN, Blake JA, Bona J, Chen C, Chen SC, Christie KR, Cowart J, D'Eustachio P, Diehl AD, Drabkin HJ, Duncan WD, Huang H, Ren J, Ross K, Ruttenberg A, Shamovsky V, Smith B, Wang Q, Zhang J, El-Sayed A, Wu CH. Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res 2017; 45:D339-D346. [PMID: 27899649 PMCID: PMC5210558 DOI: 10.1093/nar/gkw1075] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 09/30/2016] [Revised: 10/21/2016] [Accepted: 10/25/2016] [Indexed: 12/04/2022]
Abstract
The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely-related terms, including for example an interactive multiple sequence alignment. Finally, we describe recent improvement in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities.
Affiliation(s)
- Darren A Natale
  - Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Cecilia N Arighi
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Jonathan Bona
  - Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14214, USA
- Chuming Chen
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Sheng-Chih Chen
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Julie Cowart
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Peter D'Eustachio
  - Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
- Alexander D Diehl
  - Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, USA
  - New York State Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY 14203, USA
- William D Duncan
  - Roswell Park Cancer Institute, Buffalo, NY 14203, USA
  - New York State Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY 14203, USA
- Hongzhan Huang
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Jia Ren
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Karen Ross
  - Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Alan Ruttenberg
  - Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14214, USA
- Veronica Shamovsky
  - Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
- Barry Smith
  - National Center for Ontological Research, University at Buffalo, Buffalo, NY 14214, USA
- Qinghua Wang
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Jian Zhang
  - Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Abdelrahman El-Sayed
  - Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Cathy H Wu
  - Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
26
Huang J, Eilbeck K, Smith B, Blake JA, Dou D, Huang W, Natale DA, Ruttenberg A, Huan J, Zimmermann MT, Jiang G, Lin Y, Wu B, Strachan HJ, de Silva N, Kasukurthi MV, Jha VK, He Y, Zhang S, Wang X, Liu Z, Borchert GM, Tan M. The development of non-coding RNA ontology. Int J Data Min Bioinform 2016; 15:214-232. [PMID: 27990175 DOI: 10.1504/ijdmb.2016.077072] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
Identification of non-coding RNAs (ncRNAs) has been significantly improved over the past decade. On the other hand, semantic annotation of ncRNA data is facing critical challenges due to the lack of a comprehensive ontology to serve as common data elements and data exchange standards in the field. We developed the Non-Coding RNA Ontology (NCRO) to handle this situation. By providing a formally defined ncRNA controlled vocabulary, the NCRO aims to fill a specific and highly needed niche in semantic annotation of large amounts of ncRNA biological and clinical data.
Affiliation(s)
- Jingshan Huang
  - School of Computing, University of South Alabama, Shelby Hall, Room 1123, 150 Jaguar Drive, Mobile, AL 36688, USA
- Karen Eilbeck
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, Utah, USA
- Barry Smith
  - University at Buffalo - SUNY, Buffalo, New York 14260, USA
- Dejing Dou
  - Computer and Information Science Department, University of Oregon, Eugene, Oregon 97403, USA
- Weili Huang
  - Miracle Query, Inc., Eugene, Oregon 97405, USA
- Darren A Natale
  - Georgetown University Medical Center, Washington DC 20007, USA
- Jun Huan
  - Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, Kansas 66045, USA
- Michael T Zimmermann
  - Division of Biomedical Statistics and Informatics, College of Medicine at Mayo Clinic, Rochester, Minnesota 55905, USA
- Guoqian Jiang
  - Division of Biomedical Statistics and Informatics, College of Medicine at Mayo Clinic, Rochester, Minnesota 55905, USA
- Yu Lin
  - Data Coordination and Integration Center, University of Miami, Miami, Florida 33146, USA
- Bin Wu
  - Endocrinology Department, Kunming Medical University, Kunming, Yunnan, 650032 China
- Harrison J Strachan
  - School of Computing, University of South Alabama, Mobile, Alabama 36688, USA
- Nisansa de Silva
  - Computer and Information Science, University of Oregon, Eugene, Oregon 97403, USA
- Vikash Kumar Jha
  - School of Computing, University of South Alabama, Mobile, Alabama 36688, USA
- Yongqun He
  - Lab Animal Medicine, Microbiology, Immunology and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, Florida 32816, USA
- Xiaowei Wang
  - Cancer Biology, Washington University in St. Louis, St. Louis, Missouri 63130, USA
- Zixing Liu
  - Mitchell Cancer Institute, University of South Alabama, Mobile, Alabama 36604, USA
- Glen M Borchert
  - Department of Biology, University of South Alabama, Mobile, Alabama 36688, USA
- Ming Tan
  - Mitchell Cancer Institute, University of South Alabama, Mobile, Alabama 36604, USA
27
McSkimming DI, Dastgheib S, Baffi TR, Byrne DP, Ferries S, Scott ST, Newton AC, Eyers CE, Kochut KJ, Eyers PA, Kannan N. KinView: a visual comparative sequence analysis tool for integrated kinome research. Mol Biosyst 2016; 12:3651-3665. [PMID: 27731453 PMCID: PMC5508867 DOI: 10.1039/c6mb00466k] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 12/29/2022]
Abstract
Multiple sequence alignments (MSAs) are a fundamental analysis tool used throughout biology to investigate relationships between protein sequence, structure, function, evolutionary history, and patterns of disease-associated variants. However, their widespread application in systems biology research is currently hindered by the lack of user-friendly tools to simultaneously visualize, manipulate and query the information conceptualized in large sequence alignments, and the challenges in integrating MSAs with multiple orthogonal data such as cancer variants and post-translational modifications, which are often stored in heterogeneous data sources and formats. Here, we present the Multiple Sequence Alignment Ontology (MSAOnt), which represents a profile or consensus alignment in an ontological format. Subsets of the alignment are easily selected through the SPARQL Protocol and RDF Query Language for downstream statistical analysis or visualization. We have also created the Kinome Viewer (KinView), an interactive integrative visualization that places eukaryotic protein kinase cancer variants in the context of natural sequence variation and experimentally determined post-translational modifications, which play central roles in the regulation of cellular signaling pathways. Using KinView, we identified differential phosphorylation patterns between tyrosine and serine/threonine kinases in the activation segment, a major kinase regulatory region that is often mutated in proliferative diseases. We discuss cancer variants that disrupt phosphorylation sites in the activation segment, and show how KinView can be used as a comparative tool to identify differences and similarities in natural variation, cancer variants and post-translational modifications between kinase groups, families and subfamilies. Based on KinView comparisons, we identify and experimentally characterize a regulatory tyrosine (Y177 of PLK4) in the PLK4 C-terminal activation segment region termed the P+1 loop.
To further demonstrate the application of KinView in hypothesis generation and testing, we formulate and validate a hypothesis explaining a novel predicted loss-of-function variant (D523N in PKCβ) in the regulatory spine of PKCβ, a recently identified tumor suppressor kinase. KinView provides a novel, extensible interface for performing comparative analyses between subsets of kinases and for integrating multiple types of residue-specific annotations in user-friendly formats.
Affiliation(s)
- Shima Dastgheib
  - Department of Computer Science, University of Georgia, Athens, GA 30602, USA
- Timothy R Baffi
  - Department of Pharmacology, University of California at San Diego, La Jolla, CA 92093, USA
- Dominic P Byrne
  - Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Samantha Ferries
  - Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Steven Thomas Scott
  - Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Alexandra C Newton
  - Department of Pharmacology, University of California at San Diego, La Jolla, CA 92093, USA
- Claire E Eyers
  - Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Krzysztof J Kochut
  - Department of Computer Science, University of Georgia, Athens, GA 30602, USA
- Patrick A Eyers
  - Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Natarajan Kannan
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA 30602, USA
28
Mısırlı G, Hallinan J, Pocock M, Lord P, McLaughlin JA, Sauro H, Wipat A. Data Integration and Mining for Synthetic Biology Design. ACS Synth Biol 2016; 5:1086-1097. [PMID: 27110921 DOI: 10.1021/acssynbio.5b00295] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 02/08/2023]
Abstract
One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, the literature and numerous biological databases hold a wealth of existing biological information that can be used to identify and characterize biological parts and their design constraints. This information is spread among these databases in many different formats, so new computational approaches are required to make it available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.
Affiliation(s)
- Göksel Mısırlı
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Jennifer Hallinan
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Matthew Pocock
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
  - Turing Ate My Hamster Ltd, NE27 0RT Newcastle upon Tyne, United Kingdom
- Phillip Lord
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Herbert Sauro
  - Department of Bioengineering, University of Washington, Seattle, Washington 98105, United States
- Anil Wipat
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
29
Wang D, Yang L, Zhang P, LaBaer J, Hermjakob H, Li D, Yu X. AAgAtlas 1.0: a human autoantigen database. Nucleic Acids Res 2016; 45:D769-D776. [PMID: 27924021 PMCID: PMC5210642 DOI: 10.1093/nar/gkw946] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 08/21/2016] [Revised: 09/22/2016] [Accepted: 10/11/2016] [Indexed: 12/25/2022]
Abstract
Autoantibodies are antibodies that target self-antigens; they can play pivotal roles in maintaining homeostasis, distinguishing normal from tumor tissue and triggering autoimmune diseases. In the last three decades, tremendous efforts have been devoted to elucidating the generation, evolution and functions of autoantibodies, as well as their target autoantigens. However, reports of the many previously identified autoantigens are scattered throughout the literature. Here, we constructed the AAgAtlas database 1.0 using text-mining and manual curation. We extracted 45 830 autoantigen-related abstracts and 94 313 sentences from PubMed using the keywords ‘autoantigen’ or ‘autoantibody’ or their lexical variants, which were further refined to 25 520 abstracts, 43 253 sentences and 3984 candidates by our bio-entity recognizer based on the Protein Ontology. Finally, we identified 1126 genes as human autoantigens and 1071 related human diseases, with which we constructed a human autoantigen database (AAgAtlas database 1.0). The database provides a user-friendly interface to conveniently browse, retrieve and download human autoantigens as well as their associated diseases. The database is freely accessible at http://biokb.ncpsb.org/aagatlas/. We believe this database will be a valuable resource to track and understand human autoantigens as well as to investigate their functions in basic and translational research.
Affiliation(s)
- Dan Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
- Liuhui Yang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
- Ping Zhang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
- Joshua LaBaer
  - The Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
- Henning Hermjakob
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dong Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
- Xiaobo Yu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Radiation Medicine, Beijing 102206, China
30
Choi M, Liu H, Baumgartner W, Zobel J, Verspoor K. Coreference resolution improves extraction of Biological Expression Language statements from texts. Database (Oxford) 2016; 2016:baw076. [PMID: 27374122 PMCID: PMC4930833 DOI: 10.1093/database/baw076] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/04/2015] [Accepted: 04/21/2016] [Indexed: 01/07/2023]
Abstract
We describe a system that automatically extracts biological events from biomedical journal articles, and translates those events into Biological Expression Language (BEL) statements. The system incorporates existing text mining components for coreference resolution and biological event extraction, together with a strategy for BEL statement generation that had not previously been formally tested. In addition to addressing the BEL track (Track 4) at BioCreative V (2015), we investigate how incorporating coreference resolution might impact event extraction in the biomedical domain. In this paper, we report that our system achieved the best performance at the full BEL statement level, with F-scores of 20.2 in stage 1 and 35.2 in stage 2, which used the provided gold standard entities. We also report that our results on the training dataset show a benefit from integrating coreference resolution with event extraction.
Affiliation(s)
- Miji Choi
  - Department of Computing and Information Systems, the University of Melbourne National ICT Australia (NICTA) Victoria Research Laboratory, Parkville, Victoria, Australia
- Justin Zobel
  - Department of Computing and Information Systems, the University of Melbourne
- Karin Verspoor
  - Department of Computing and Information Systems, the University of Melbourne
31
Fernández-Breis JT, Chiba H, Legaz-García MDC, Uchiyama I. The Orthology Ontology: development and applications. J Biomed Semantics 2016; 7:34. [PMID: 27259657 PMCID: PMC4893294 DOI: 10.1186/s13326-016-0077-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 12/02/2015] [Accepted: 05/17/2016] [Indexed: 11/16/2022]
Abstract
Background Computational comparative analysis of multiple genomes provides valuable opportunities to biomedical research. In particular, orthology analysis can play a central role in comparative genomics; it guides establishing evolutionary relations among genes of organisms and allows functional inference of gene products. However, the wide variations in current orthology databases necessitate the research toward the shareability of the content that is generated by different tools and stored in different structures. Exchanging the content with other research communities requires making the meaning of the content explicit. Description The need for a common ontology has led to the creation of the Orthology Ontology (ORTH) following the best practices in ontology construction. Here, we describe our model and major entities of the ontology that is implemented in the Web Ontology Language (OWL), followed by the assessment of the quality of the ontology and the application of the ORTH to existing orthology datasets. This shareable ontology enables the possibility to develop Linked Orthology Datasets and a meta-predictor of orthology through standardization for the representation of orthology databases. The ORTH is freely available in OWL format to all users at http://purl.org/net/orth. Conclusions The Orthology Ontology can serve as a framework for the semantic standardization of orthology content and it will contribute to a better exploitation of orthology resources in biomedical research. The results demonstrate the feasibility of developing shareable datasets using this ontology. Further applications will maximize the usefulness of this ontology.
Affiliation(s)
- Hirokazu Chiba
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, 444-8585, Aichi, Japan
- Ikuo Uchiyama
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, 444-8585, Aichi, Japan
|
32
|
Arguello Casteleiro M, Klein J, Stevens R. The Proteasix Ontology. J Biomed Semantics 2016; 7:33. [PMID: 27259807 PMCID: PMC4893253 DOI: 10.1186/s13326-016-0078-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/14/2015] [Accepted: 05/19/2016] [Indexed: 11/10/2022] Open
Abstract
Background The Proteasix Ontology (PxO) is an ontology that supports the Proteasix tool, an open-source peptide-centric tool that can be used to predict automatically and in a large-scale fashion in silico the proteases involved in the generation of proteolytic cleavage fragments (peptides). Methods The PxO re-uses parts of the Protein Ontology, the three Gene Ontology sub-ontologies, the Chemical Entities of Biological Interest Ontology, the Sequence Ontology and bespoke extensions to the PxO in support of a series of roles: 1. To describe the known proteases and their target cleavage sites. 2. To enable the description of proteolytic cleavage fragments as the outputs of observed and predicted proteolysis. 3. To use knowledge about the function, species and cellular location of a protease and protein substrate to support the prioritisation of proteases in observed and predicted proteolysis. Results The PxO is designed to describe the biological underpinnings of the generation of peptides. The peptide-centric PxO seeks to support the Proteasix tool by separating domain knowledge from the operational knowledge used in protease prediction by Proteasix and to support the confirmation of its analyses and results. Availability The Proteasix Ontology may be found at: http://bioportal.bioontology.org/ontologies/PXO. This ontology is free and open for use by everyone. Electronic supplementary material The online version of this article (doi:10.1186/s13326-016-0078-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Julie Klein
- Institut National de la Sante et de la Recherche Medicale (INSERM), U1048, Toulouse, 24105, France
- Robert Stevens
- School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK.
|
33
|
Huang J, Gutierrez F, Strachan HJ, Dou D, Huang W, Smith B, Blake JA, Eilbeck K, Natale DA, Lin Y, Wu B, Silva ND, Wang X, Liu Z, Borchert GM, Tan M, Ruttenberg A. OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction data. J Biomed Semantics 2016; 7:25. [PMID: 27175225 PMCID: PMC4863347 DOI: 10.1186/s13326-016-0064-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/15/2015] [Accepted: 04/12/2016] [Indexed: 01/05/2023] Open
Abstract
As a special class of non-coding RNAs (ncRNAs), microRNAs (miRNAs) perform important roles in numerous biological and pathological processes. The realization of miRNA functions depends largely on how miRNAs regulate specific target genes. It is therefore critical to identify, analyze, and cross-reference miRNA-target interactions to better explore and delineate miRNA functions. Semantic technologies can help in this regard. We previously developed a miRNA domain-specific application ontology, Ontology for MIcroRNA Target (OMIT), whose goal was to serve as a foundation for semantic annotation, data integration, and semantic search in the miRNA field. In this paper we describe our continuing effort to develop the OMIT, and demonstrate its use within a semantic search system, OmniSearch, designed to facilitate knowledge capture of miRNA-target interaction data. Important changes in the current version of OMIT are summarized as: (1) following a modularized ontology design (with 2559 terms imported from the NCRO ontology); (2) encoding all 1884 human miRNAs (vs. 300 in previous versions); and (3) setting up a GitHub project site along with an issue tracker for more effective community collaboration on the ontology development. The OMIT ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/omit.owl. The OmniSearch system is also free and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software.
Affiliation(s)
- Jingshan Huang
- School of Computing, University of South Alabama, Mobile, Alabama, 36688-0002 USA
- Fernando Gutierrez
- Computer and Information Science Department, University of Oregon, Eugene, Oregon, 97403-1202 USA
- Harrison J Strachan
- School of Computing, University of South Alabama, Mobile, Alabama, 36688-0002 USA
- Dejing Dou
- Computer and Information Science Department, University of Oregon, Eugene, Oregon, 97403-1202 USA
- Weili Huang
- Miracle Query, Inc., Eugene, Oregon, 97403-1202 USA
- Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, New York, 14260-4150 USA
- Judith A Blake
- Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine, 04609-1523 USA
- Karen Eilbeck
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, 84112-5775 USA
- Darren A Natale
- Department of Biochemistry and Molecular and Cellular Biology, Georgetown University Medical Center, Washington D.C., 20007-1485 USA
- Yu Lin
- Center for Computational Science, University of Miami, Miami, Florida, 33146-2960 USA
- Bin Wu
- Department of Microbiology and Immunology, First Affiliated Hospital, Kunming Medical University, Kunming, Yunnan, 650032 China
- Nisansa de Silva
- Computer and Information Science Department, University of Oregon, Eugene, Oregon, 97403-1202 USA
- Xiaowei Wang
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, Missouri, 63110-0001 USA
- Zixing Liu
- Mitchell Cancer Institute, University of South Alabama, Mobile, Alabama, 36604-1405 USA
- Glen M Borchert
- Department of Biology, University of South Alabama, Mobile, Alabama, 36688-0002 USA
- Ming Tan
- Mitchell Cancer Institute, University of South Alabama, Mobile, Alabama, 36604-1405 USA
- Alan Ruttenberg
- School of Dental Medicine, University at Buffalo, Buffalo, New York, 14214-8006 USA
|
34
|
The Non-Coding RNA Ontology (NCRO): a comprehensive resource for the unification of non-coding RNA biology. J Biomed Semantics 2016; 7:24. [PMID: 27152146 PMCID: PMC4857245 DOI: 10.1186/s13326-016-0066-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/15/2015] [Accepted: 04/19/2016] [Indexed: 11/17/2022] Open
Abstract
In recent years, sequencing technologies have enabled the identification of a wide range of non-coding RNAs (ncRNAs). Unfortunately, annotation and integration of ncRNA data has lagged behind their identification. Given the large quantity of information being obtained in this area, there emerges an urgent need to integrate what is being discovered by a broad range of relevant communities. To this end, the Non-Coding RNA Ontology (NCRO) is being developed to provide a systematically structured and precisely defined controlled vocabulary for the domain of ncRNAs, thereby facilitating the discovery, curation, analysis, exchange, and reasoning of data about structures of ncRNAs, their molecular and cellular functions, and their impacts upon phenotypes. The goal of NCRO is to serve as a common resource for annotations of diverse research in a way that will significantly enhance integrative and comparative analysis of the myriad resources currently housed in disparate sources. It is our belief that the NCRO ontology can perform an important role in the comprehensive unification of ncRNA biology and, indeed, fill a critical gap in both the Open Biological and Biomedical Ontologies (OBO) Library and the National Center for Biomedical Ontology (NCBO) BioPortal. Our initial focus is on the ontological representation of small regulatory ncRNAs, which we see as the first step in providing a resource for the annotation of data about all forms of ncRNAs. The NCRO ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/ncro.owl.
|
35
|
Misirli G, Cavaliere M, Waites W, Pocock M, Madsen C, Gilfellon O, Honorato-Zimmer R, Zuliani P, Danos V, Wipat A. Annotation of rule-based models with formal semantics to enable creation, analysis, reuse and visualization. Bioinformatics 2016; 32:908-17. [PMID: 26559508 PMCID: PMC4803388 DOI: 10.1093/bioinformatics/btv660] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/16/2015] [Revised: 10/08/2015] [Accepted: 11/03/2015] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION Biological systems are complex and challenging to model and therefore model reuse is highly desirable. To promote model reuse, models should include both information about the specifics of simulations and the underlying biology in the form of metadata. The availability of computationally tractable metadata is especially important for the effective automated interpretation and processing of models. Metadata are typically represented as machine-readable annotations which enhance programmatic access to information about models. Rule-based languages have emerged as a modelling framework to represent the complexity of biological systems. Annotation approaches have been widely used for reaction-based formalisms such as SBML. However, rule-based languages still lack a rich annotation framework to add semantic information, such as machine-readable descriptions, to the components of a model. RESULTS We present an annotation framework and guidelines for annotating rule-based models, encoded in the commonly used Kappa and BioNetGen languages. We adapt widely adopted annotation approaches to rule-based models. We initially propose a syntax to store machine-readable annotations and describe a mapping between rule-based modelling entities, such as agents and rules, and their annotations. We then describe an ontology to both annotate these models and capture the information contained therein, and demonstrate annotating these models using examples. Finally, we present a proof of concept tool for extracting annotations from a model that can be queried and analyzed in a uniform way. The uniform representation of the annotations can be used to facilitate the creation, analysis, reuse and visualization of rule-based models. Although examples are given using specific implementations, the proposed techniques can be applied to rule-based models in general.
AVAILABILITY AND IMPLEMENTATION The annotation ontology for rule-based models can be found at http://purl.org/rbm/rbmo. The krdf tool and associated executable examples are available at http://purl.org/rbm/rbmo/krdf. CONTACT anil.wipat@newcastle.ac.uk or vdanos@inf.ed.ac.uk.
Affiliation(s)
- Goksel Misirli
- Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science and Centre for Synthetic Biology and the Bioeconomy, Newcastle University, Newcastle upon Tyne, UK
- Matteo Cavaliere
- School of Informatics, University of Edinburgh, Edinburgh, UK
- William Waites
- School of Informatics, University of Edinburgh, Edinburgh, UK
- Curtis Madsen
- Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science and Centre for Synthetic Biology and the Bioeconomy, Newcastle University, Newcastle upon Tyne, UK
- Owen Gilfellon
- Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science and Centre for Synthetic Biology and the Bioeconomy, Newcastle University, Newcastle upon Tyne, UK
- Paolo Zuliani
- Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science and Centre for Synthetic Biology and the Bioeconomy, Newcastle University, Newcastle upon Tyne, UK
- Vincent Danos
- School of Informatics, University of Edinburgh, Edinburgh, UK
- Anil Wipat
- Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science and Centre for Synthetic Biology and the Bioeconomy, Newcastle University, Newcastle upon Tyne, UK
|
36
|
Vita R, Overton JA, Seymour E, Sidney J, Kaufman J, Tallmadge RL, Ellis S, Hammond J, Butcher GW, Sette A, Peters B. An ontology for major histocompatibility restriction. J Biomed Semantics 2016; 7:1. [PMID: 26759709 PMCID: PMC4709943 DOI: 10.1186/s13326-016-0045-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/29/2015] [Accepted: 01/03/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND MHC molecules are a highly diverse family of proteins that play a key role in cellular immune recognition. Over time, different techniques and terminologies have been developed to identify the specific type(s) of MHC molecule involved in a specific immune recognition context. No consistent nomenclature exists across different vertebrate species. PURPOSE To correctly represent MHC-related data in The Immune Epitope Database (IEDB), we built upon a previously established MHC ontology and created an ontology to represent MHC molecules as they relate to immunological experiments. DESCRIPTION This ontology models MHC protein chains from 16 species, deals with different approaches used to identify MHC, such as direct sequencing versus serotyping, relates engineered MHC molecules to naturally occurring ones, connects genetic loci, alleles, protein chains and multi-chain proteins, and establishes evidence codes for MHC restriction. Where available, this work is based on existing ontologies from the OBO foundry. CONCLUSIONS Overall, representing MHC molecules provides a challenging and practically important test case for ontology building, and could serve as an example of how to integrate other ontology building efforts into web resources.
Affiliation(s)
- Randi Vita
- La Jolla Institute for Allergy and Immunology, 9420 Athena Circle La Jolla, San Diego, California 92037 USA
- James A Overton
- La Jolla Institute for Allergy and Immunology, 9420 Athena Circle La Jolla, San Diego, California 92037 USA
- Emily Seymour
- La Jolla Institute for Allergy and Immunology, 9420 Athena Circle La Jolla, San Diego, California 92037 USA
- John Sidney
- La Jolla Institute for Allergy and Immunology, 9420 Athena Circle La Jolla, San Diego, California 92037 USA
- Jim Kaufman
- University of Cambridge, Trinity Ln, Cambridge, CB2 1TN UK
- Rebecca L Tallmadge
- Cornell University College of Veterinary Medicine, Ithaca, New York 14853-6401 USA
- Shirley Ellis
- The Pirbright Institute, Ash Rd, Woking, GU24 0NF UK
- John Hammond
- The Pirbright Institute, Ash Rd, Woking, GU24 0NF UK
- Alessandro Sette
- La Jolla Institute for Allergy and Immunology, 9420 Athena Circle La Jolla, San Diego, California 92037 USA
- Bjoern Peters
- La Jolla Institute for Allergy and Immunology, 9420 Athena Circle La Jolla, San Diego, California 92037 USA
|
37
|
Li J, Li H, Tian Y, Yang Y, Chen G, Guo W, Tian W. Cytoskeletal binding proteins distinguish cultured dental follicle cells and periodontal ligament cells. Exp Cell Res 2015; 345:6-16. [PMID: 26708290 DOI: 10.1016/j.yexcr.2015.12.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 08/25/2015] [Revised: 12/15/2015] [Accepted: 12/18/2015] [Indexed: 02/05/2023]
Abstract
Human dental follicle cells (DFCs) and periodontal ligament cells (PDLCs), derived from the ectomesenchymal tissue, have been shown to exhibit stem/progenitor cell properties and the ability to induce tissue regeneration. Stem cells in dental follicle differentiate into cementoblasts, periodontal ligament fibroblasts and osteoblasts; these cells form cementum, periodontal ligament and alveolar bone, respectively. While stem cells in dental follicle are a precursor to periodontal ligament fibroblasts, the molecular changes that distinguish cultured DFCs from PDLCs are still unknown. In this study, we have compared the immunophenotypic features and cell cycle status of the two cell lines. The results suggest that DFCs and PDLCs displayed similar features related to immunophenotype and cell cycle. Then we employed an isobaric tag for relative and absolute quantitation (iTRAQ) proteomics strategy to reveal the molecular differences between the two cell types. A total of 2138 proteins were identified and 39 of these proteins were consistently differentially expressed between DFCs and PDLCs. Gene ontology analyses revealed that the protein subsets expressed higher in PDLCs were related to actin binding, cytoskeletal protein binding, and structural constituent of muscle. Upon validation by real-time PCR, western blotting, and immunofluorescence staining, tropomyosin 1 (TPM1) and caldesmon 1 (CALD1) were expressed higher in PDLCs than in DFCs. Our results suggested that PDLCs display enhanced actin cytoskeletal dynamics relative to DFCs while DFCs may exhibit a more robust antioxidant defense ability relative to PDLCs. This study expands our knowledge of the cultured DFCs and PDLCs proteome and provides new insights into possible mechanisms responsible for the different biological features observed in each cell type.
Affiliation(s)
- Jie Li
- College of Life Science, Sichuan University, Chengdu, China; National Engineering Laboratory for Oral Regenerative Medicine, West China Hospital of Stomatology, Sichuan University, Chengdu, China; State Key Laboratory of Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Hui Li
- National Engineering Laboratory for Oral Regenerative Medicine, West China Hospital of Stomatology, Sichuan University, Chengdu, China; State Key Laboratory of Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Ye Tian
- National Engineering Laboratory for Oral Regenerative Medicine, West China Hospital of Stomatology, Sichuan University, Chengdu, China; State Key Laboratory of Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Yaling Yang
- National Engineering Laboratory for Oral Regenerative Medicine, West China Hospital of Stomatology, Sichuan University, Chengdu, China; State Key Laboratory of Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Guoqing Chen
- National Engineering Laboratory for Oral Regenerative Medicine, West China Hospital of Stomatology, Sichuan University, Chengdu, China; State Key Laboratory of Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Weihua Guo
- National Engineering Laboratory for Oral Regenerative Medicine, West China Hospital of Stomatology, Sichuan University, Chengdu, China; State Key Laboratory of Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China; Department of Pedodontics, West China School of Stomatology, Sichuan University, Chengdu, China.
- Weidong Tian
- National Engineering Laboratory for Oral Regenerative Medicine, West China Hospital of Stomatology, Sichuan University, Chengdu, China; State Key Laboratory of Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China; Department of Oral and Maxillofacial Surgery, West China School of Stomatology, Sichuan University, Chengdu, China.
|
38
|
Thibault JC, Roe DR, Eilbeck K, Cheatham TE, Facelli JC. Development of an informatics infrastructure for data exchange of biomolecular simulations: Architecture, data models and ontology. SAR QSAR Environ Res 2015; 26:577-593. [PMID: 26387907 PMCID: PMC4672732 DOI: 10.1080/1062936x.2015.1076515] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/18/2015] [Accepted: 07/22/2015] [Indexed: 06/05/2023]
Abstract
Biomolecular simulations aim to simulate structure, dynamics, interactions, and energetics of complex biomolecular systems. With the recent advances in hardware, it is now possible not only to use more complex and accurate models but also to reach time scales that are biologically significant. Molecular simulations have become a standard tool for toxicology and pharmacology research, but organizing and sharing data - both within the same organization and among different ones - remains a substantial challenge. In this paper we review our recent work leading to the development of a comprehensive informatics infrastructure to facilitate the organization and exchange of biomolecular simulations data. Our efforts include the design of data models and dictionary tools that allow the standardization of the metadata used to describe the biomedical simulations, the development of a thesaurus and ontology for computational reasoning when searching for biomolecular simulations in distributed environments, and the development of systems based on these models to manage and share the data at a large scale (iBIOMES), and within smaller groups of researchers at laboratory scale (iBIOMES Lite), that take advantage of the standardization of the metadata used to describe biomolecular simulations.
Affiliation(s)
- J. C. Thibault
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, US
- D. R. Roe
- Department of Medicinal Chemistry and Center for High Performance Computing, University of Utah, Salt Lake City, Utah, US
- K. Eilbeck
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, US
- T. E. Cheatham
- Department of Medicinal Chemistry and Center for High Performance Computing, University of Utah, Salt Lake City, Utah, US
- J. C. Facelli
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, US
|
39
|
Hastings J, Jeliazkova N, Owen G, Tsiliki G, Munteanu CR, Steinbeck C, Willighagen E. eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment. J Biomed Semantics 2015; 6:10. [PMID: 25815161 PMCID: PMC4374589 DOI: 10.1186/s13326-015-0005-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 11/03/2014] [Accepted: 02/27/2015] [Indexed: 11/18/2022] Open
Abstract
Engineered nanomaterials (ENMs) are being developed to meet specific application needs in diverse domains across the engineering and biomedical sciences (e.g. drug delivery). However, accompanying the exciting proliferation of novel nanomaterials is a challenging race to understand and predict their possibly detrimental effects on human health and the environment. The eNanoMapper project (www.enanomapper.net) is creating a pan-European computational infrastructure for toxicological data management for ENMs, based on semantic web standards and ontologies. Here, we describe the development of the eNanoMapper ontology based on adopting and extending existing ontologies of relevance for the nanosafety domain. The resulting eNanoMapper ontology is available at http://purl.enanomapper.net/onto/enanomapper.owl. We aim to make the re-use of external ontology content seamless and thus we have developed a library to automate the extraction of subsets of ontology content and the assembly of the subsets into an integrated whole. The library is available (open source) at http://github.com/enanomapper/slimmer/. Finally, we give a comprehensive survey of the domain content and identify gap areas. ENM safety is at the boundary between engineering and the life sciences, and at the boundary between molecular granularity and bulk granularity. This creates challenges for the definition of key entities in the domain, which we also discuss.
Affiliation(s)
- Janna Hastings
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
- Gareth Owen
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
- Georgia Tsiliki
- National Technical University of Athens (NTUA), Athens, Greece
- Cristian R Munteanu
- Computer Science Faculty, University of A Coruña, A Coruña, Spain; Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
- Christoph Steinbeck
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
- Egon Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
|
40
|
Malladi VS, Erickson DT, Podduturi NR, Rowe LD, Chan ET, Davidson JM, Hitz BC, Ho M, Lee BT, Miyasato S, Roe GR, Simison M, Sloan CA, Strattan JS, Tanaka F, Kent WJ, Cherry JM, Hong EL. Ontology application and use at the ENCODE DCC. Database (Oxford) 2015; 2015:bav010. [PMID: 25776021 PMCID: PMC4360730 DOI: 10.1093/database/bav010] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 11/14/2022]
Abstract
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC’s use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects. Database URL: https://www.encodeproject.org/
Affiliation(s)
- Venkat S Malladi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Drew T Erickson
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Nikhil R Podduturi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Laurence D Rowe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Esther T Chan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Jean M Davidson
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Benjamin C Hitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Marcus Ho
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Stuart Miyasato
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Gregory R Roe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matt Simison
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cricket A Sloan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Forrest Tanaka
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- J Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Eurie L Hong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA and Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
|
41
|
Abstract
The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO 2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology.
42
Campos D, Lourenço J, Matos S, Oliveira JL. Egas: a collaborative and interactive document curation platform. Database (Oxford) 2014; 2014:bau048. [PMID: 24923820 PMCID: PMC4207226 DOI: 10.1093/database/bau048] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022]
Abstract
With the overwhelming amount of biomedical textual information being produced, several manual curation efforts have been set up to extract and store concepts and their relationships into structured resources. As manual annotation is a demanding and expensive task, computerized solutions were developed to perform such tasks automatically. However, high-end information extraction techniques are still not widely used by biomedical research communities, mainly because of the lack of standards and limitations in usability. Interactive annotation tools intend to fill this gap, taking advantage of automatic techniques and existing knowledge bases to assist expert curators in their daily tasks. This article presents Egas, a web-based platform for biomedical text mining and assisted curation with highly usable interfaces for manual and automatic in-line annotation of concepts and relations. A comprehensive set of de facto standard knowledge bases are integrated and indexed to provide straightforward concept normalization features. Real-time collaboration and conversation functionalities allow discussing details of the annotation task as well as providing instant feedback of curator's interactions. Egas also provides interfaces for on-demand management of the annotation task settings and guidelines, and supports standard formats and literature services to import and export documents. By taking advantage of Egas, we participated in the BioCreative IV interactive annotation task, targeting the assisted identification of protein-protein interactions described in PubMed abstracts related to neuropathological disorders. When evaluated by expert curators, it obtained positive scores in terms of usability, reliability and performance. These results, together with the provided innovative features, place Egas as a state-of-the-art solution for fast and accurate curation of information, facilitating the task of creating and updating knowledge bases and annotated resources. 
Database URL: http://bioinformatics.ua.pt/egas.
Affiliation(s)
- David Campos
- BMD Software, Lda., Rua Calouste Gulbenkian n. 1, 3810-074 Aveiro, Portugal and IEETA/DETI, Campus Universitário de Santiago, University of Aveiro, 3810-193 Aveiro, Portugal
- Jóni Lourenço
- BMD Software, Lda., Rua Calouste Gulbenkian n. 1, 3810-074 Aveiro, Portugal and IEETA/DETI, Campus Universitário de Santiago, University of Aveiro, 3810-193 Aveiro, Portugal
- Sérgio Matos
- BMD Software, Lda., Rua Calouste Gulbenkian n. 1, 3810-074 Aveiro, Portugal and IEETA/DETI, Campus Universitário de Santiago, University of Aveiro, 3810-193 Aveiro, Portugal
- José Luís Oliveira
- BMD Software, Lda., Rua Calouste Gulbenkian n. 1, 3810-074 Aveiro, Portugal and IEETA/DETI, Campus Universitário de Santiago, University of Aveiro, 3810-193 Aveiro, Portugal
43
Abstract
The use of model organisms as tools for the investigation of human genetic variation has significantly and rapidly advanced our understanding of the aetiologies underlying hereditary traits. However, while equivalences in the DNA sequence of two species may be readily inferred through evolutionary models, the identification of equivalence in the phenotypic consequences resulting from comparable genetic variation is far from straightforward, limiting the value of the modelling paradigm. In this review, we provide an overview of the emerging statistical and computational approaches to objectively identify phenotypic equivalence between human and model organisms with examples from the vertebrate models, mouse and zebrafish. Firstly, we discuss enrichment approaches, which deem the most frequent phenotype among the orthologues of a set of genes associated with a common human phenotype as the orthologous phenotype, or phenolog, in the model species. Secondly, we introduce and discuss computational reasoning approaches to identify phenotypic equivalences made possible through the development of intra- and interspecies ontologies. Finally, we consider the particular challenges involved in modelling neuropsychiatric disorders, which illustrate many of the remaining difficulties in developing comprehensive and unequivocal interspecies phenotype mappings.
Affiliation(s)
- Peter N. Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- * E-mail: (PNR); (CW)
- Caleb Webber
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- * E-mail: (PNR); (CW)
44
Funk C, Baumgartner W, Garcia B, Roeder C, Bada M, Cohen KB, Hunter LE, Verspoor K. Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters. BMC Bioinformatics 2014; 15:59. [PMID: 24571547 PMCID: PMC4015610 DOI: 10.1186/1471-2105-15-59] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 04/22/2013] [Accepted: 01/24/2014] [Indexed: 11/10/2022]
Abstract
Background: Ontological concepts are useful for many different biomedical tasks. Concepts are difficult to recognize in text due to a disconnect between what is captured in an ontology and how the concepts are expressed in text. There are many recognizers for specific ontologies, but a general approach for concept recognition is an open problem.
Results: Three dictionary-based systems (MetaMap, NCBO Annotator, and ConceptMapper) are evaluated on eight biomedical ontologies in the Colorado Richly Annotated Full-Text (CRAFT) Corpus. Over 1,000 parameter combinations are examined, and best-performing parameters for each system-ontology pair are presented.
Conclusions: Baselines for concept recognition by three systems on eight biomedical ontologies are established (F-measures range from 0.14 to 0.83). Out of the three systems we tested, ConceptMapper is generally the best-performing system; it produces the highest F-measure on seven out of eight ontologies. Default parameters are not ideal for most systems on most ontologies; by changing parameters, F-measure can be increased by up to 0.4. Not only are best-performing parameters presented, but suggestions for choosing the best parameters based on ontology characteristics are presented.
Affiliation(s)
- Christopher Funk
- Computational Bioscience Program, U. of Colorado School of Medicine, Aurora, CO 80045, USA.
45
Mayer G, Jones AR, Binz PA, Deutsch EW, Orchard S, Montecchi-Palazzi L, Vizcaíno JA, Hermjakob H, Oveillero D, Julian R, Stephan C, Meyer HE, Eisenacher M. Controlled vocabularies and ontologies in proteomics: overview, principles and practice. Biochim Biophys Acta 2014; 1844:98-107. [PMID: 23429179 PMCID: PMC3898906 DOI: 10.1016/j.bbapap.2013.02.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 11/23/2012] [Revised: 02/05/2013] [Accepted: 02/09/2013] [Indexed: 11/30/2022]
Abstract
This paper focuses on the use of controlled vocabularies (CVs) and ontologies especially in the area of proteomics, primarily related to the work of the Proteomics Standards Initiative (PSI). It describes the relevant proteomics standard formats and the ontologies used within them. Software and tools for working with these ontology files are also discussed. The article also examines the "mapping files" used to ensure correct controlled vocabulary terms that are placed within PSI standards and the fulfillment of the MIAPE (Minimum Information about a Proteomics Experiment) requirements. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Affiliation(s)
- Gerhard Mayer
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
- Andrew R. Jones
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Pierre-Alain Binz
- SIB Swiss Institute of Bioinformatics, Swiss-Prot group, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
- Eric W. Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA
- Sandra Orchard
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Henning Hermjakob
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- David Oveillero
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Christian Stephan
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
- Kairos GmbH, Universitätsstraße 136, D-44799 Bochum, Germany
- Helmut E. Meyer
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
- Martin Eisenacher
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
46
Abstract
The mission of the Universal Protein Resource (UniProt) (http://www.uniprot.org) is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequences and functional annotation. It integrates, interprets and standardizes data from literature and numerous resources to achieve the most comprehensive catalog possible of protein information. The central activities are the biocuration of the UniProt Knowledgebase and the dissemination of these data through our Web site and web services. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is updated and distributed every 4 weeks and can be accessed online for searches or downloads.
Affiliation(s)
- The UniProt Consortium
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland, Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven Street North West, Suite 1200, Washington, DC 20007, USA and Protein Information Resource, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA
47
Przydzial MJ, Bhhatarai B, Koleti A, Vempati U, Schürer SC. GPCR ontology: development and application of a G protein-coupled receptor pharmacology knowledge framework. Bioinformatics 2013; 29:3211-9. [PMID: 24078711 PMCID: PMC3842764 DOI: 10.1093/bioinformatics/btt565] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 04/02/2013] [Revised: 09/20/2013] [Accepted: 09/24/2013] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Novel tools need to be developed to help scientists analyze large amounts of available screening data with the goal to identify entry points for the development of novel chemical probes and drugs. As the largest class of drug targets, G protein-coupled receptors (GPCRs) remain of particular interest and are pursued by numerous academic and industrial research projects.
RESULTS: We report the first GPCR ontology to facilitate integration and aggregation of GPCR-targeting drugs and demonstrate its application to classify and analyze a large subset of the PubChem database. The GPCR ontology, based on previously reported BioAssay Ontology, depicts available pharmacological, biochemical and physiological profiles of GPCRs and their ligands. The novelty of the GPCR ontology lies in the use of diverse experimental datasets linked by a model to formally define these concepts. Using a reasoning system, GPCR ontology offers potential for knowledge-based classification of individuals (such as small molecules) as a function of the data.
AVAILABILITY: The GPCR ontology is available at http://www.bioassayontology.org/bao_gpcr and the National Center for Biomedical Ontologies Web site.
Affiliation(s)
- Magdalena J Przydzial
- Center for Computational Science, University of Miami, Miami, FL 33136, USA and Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
48
Jensen M, Cox AP, Chaudhry N, Ng M, Sule D, Duncan W, Ray P, Weinstock-Guttman B, Smith B, Ruttenberg A, Szigeti K, Diehl AD. The neurological disease ontology. J Biomed Semantics 2013; 4:42. [PMID: 24314207 PMCID: PMC4028878 DOI: 10.1186/2041-1480-4-42] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 06/22/2013] [Accepted: 11/29/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND: We are developing the Neurological Disease Ontology (ND) to provide a framework to enable representation of aspects of neurological diseases that are relevant to their treatment and study. ND is a representational tool that addresses the need for unambiguous annotation, storage, and retrieval of data associated with the treatment and study of neurological diseases. ND is being developed in compliance with the Open Biomedical Ontology Foundry principles and builds upon the paradigm established by the Ontology for General Medical Science (OGMS) for the representation of entities in the domain of disease and medical practice. Initial applications of ND will include the annotation and analysis of large data sets and patient records for Alzheimer's disease, multiple sclerosis, and stroke.
DESCRIPTION: ND is implemented in OWL 2 and currently has more than 450 terms that refer to and describe various aspects of neurological diseases. ND directly imports the development version of OGMS, which uses BFO 2. Term development in ND has primarily extended the OGMS terms 'disease', 'diagnosis', 'disease course', and 'disorder'. We have imported and utilize over 700 classes from related ontology efforts including the Foundational Model of Anatomy, Ontology for Biomedical Investigations, and Protein Ontology. ND terms are annotated with ontology metadata such as a label (term name), term editors, textual definition, definition source, curation status, and alternative terms (synonyms). Many terms have logical definitions in addition to these annotations. Current development has focused on the establishment of the upper-level structure of the ND hierarchy, as well as on the representation of Alzheimer's disease, multiple sclerosis, and stroke. The ontology is available as a version-controlled file at http://code.google.com/p/neurological-disease-ontology along with a discussion list and an issue tracker.
CONCLUSION: ND seeks to provide a formal foundation for the representation of clinical and research data pertaining to neurological diseases. ND will enable its users to connect data in a robust way with related data that is annotated using other terminologies and ontologies in the biomedical domain.
Affiliation(s)
- Mark Jensen
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Alexander P Cox
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Naveed Chaudhry
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Marcus Ng
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Donat Sule
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- William Duncan
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Patrick Ray
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Bianca Weinstock-Guttman
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Barry Smith
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Alan Ruttenberg
- Department of Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, 355 Squire Hall, Buffalo, NY 14214, USA
- Kinga Szigeti
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Alexander D Diehl
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
49
Livingston KM, Bada M, Hunter LE, Verspoor K. Representing annotation compositionality and provenance for the Semantic Web. J Biomed Semantics 2013; 4:38. [PMID: 24268021 PMCID: PMC4129183 DOI: 10.1186/2041-1480-4-38] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/12/2013] [Accepted: 09/20/2013] [Indexed: 12/03/2022]
Abstract
Background: Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts to single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations.
Results: We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease-associations in genome sequences.
Conclusions: With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.
Affiliation(s)
- Kevin M Livingston
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Michael Bada
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Lawrence E Hunter
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Karin Verspoor
- National ICT Australia, Victoria Research Laboratory, Melbourne, VIC, 3010, Australia; Department of Computing and Information Systems, The University of Melbourne, Melbourne 3010 VIC, Australia
50
Natale DA, Arighi CN, Blake JA, Bult CJ, Christie KR, Cowart J, D'Eustachio P, Diehl AD, Drabkin HJ, Helfer O, Huang H, Masci AM, Ren J, Roberts NV, Ross K, Ruttenberg A, Shamovsky V, Smith B, Yerramalla MS, Zhang J, AlJanahi A, Çelen I, Gan C, Lv M, Schuster-Lezell E, Wu CH. Protein Ontology: a controlled structured network of protein entities. Nucleic Acids Res 2013; 42:D415-21. [PMID: 24270789 PMCID: PMC3964965 DOI: 10.1093/nar/gkt1173] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 01/02/2023]
Abstract
The Protein Ontology (PRO; http://proconsortium.org) formally defines protein entities and explicitly represents their major forms and interrelations. Protein entities represented in PRO corresponding to single amino acid chains are categorized by level of specificity into family, gene, sequence and modification metaclasses, and there is a separate metaclass for protein complexes. All metaclasses also have organism-specific derivatives. PRO complements established sequence databases such as UniProtKB, and interoperates with other biomedical and biological ontologies such as the Gene Ontology (GO). PRO relates to UniProtKB in that PRO’s organism-specific classes of proteins encoded by a specific gene correspond to entities documented in UniProtKB entries. PRO relates to the GO in that PRO’s representations of organism-specific protein complexes are subclasses of the organism-agnostic protein complex terms in the GO Cellular Component Ontology. The past few years have seen growth and changes to the PRO, as well as new points of access to the data and new applications of PRO in immunology and proteomics. Here we describe some of these developments.
Affiliation(s)
- Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA, Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA, Department of Bioinformatics and Computational Biology, The Jackson Laboratory, Bar Harbor, ME 04609, USA, Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, NY 10016, USA, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY 14203, USA, Department of Immunology, Duke University Medical Center, Durham, NC 27705, USA and Department of Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14214, USA