1. Vollmar M, Tirunagari S, Harrus D, Armstrong D, Gáborová R, Gupta D, Afonso MQL, Evans G, Velankar S. Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature. Sci Data 2024; 11:1032. [PMID: 39333508 PMCID: PMC11436914 DOI: 10.1038/s41597-024-03841-9] [Received: 03/11/2024] [Accepted: 08/29/2024] [Indexed: 09/29/2024]
Abstract
We present a novel system that leverages curators in the loop to develop a dataset and model for detecting structure features and functional annotations at residue-level from standard publication text. Our approach involves the integration of data from multiple resources, including PDBe, EuropePMC, PubMedCentral, and PubMed, combined with annotation guidelines from UniProt, and LitSuggest and HuggingFace models as tools in the annotation process. A team of seven annotators manually curated ten articles for named entities, which we utilized to train a starting PubmedBert model from HuggingFace. Using a human-in-the-loop annotation system, we iteratively developed the best model with commendable performance metrics of 0.90 for precision, 0.92 for recall, and 0.91 for F1-measure. Our proposed system showcases a successful synergy of machine learning techniques and human expertise in curating a dataset for residue-level functional annotations and protein structure features. The results demonstrate the potential for broader applications in protein research, bridging the gap between advanced machine learning models and the indispensable insights of domain experts.
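The abstract reports precision, recall and F1 together. The sketch below shows how the three figures relate by computing F1 as the harmonic mean of precision and recall. It assumes all three are computed over the same pooled counts (micro-averaging), which the abstract does not state. The function name `f1_score` is illustrative and is not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the best model.
f1 = f1_score(0.90, 0.92)
print(round(f1, 2))  # 0.91
```

Under that assumption, a precision of 0.90 and a recall of 0.92 give an F1 of about 0.91, consistent with the reported value. A macro-averaged F1 need not match this exactly.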
Affiliation(s)
- Melanie Vollmar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Santosh Tirunagari
- Literature Services, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Deborah Harrus
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- David Armstrong
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Romana Gáborová
- CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 5, 62500, Brno, Czech Republic
- Deepti Gupta
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Marcelo Querino Lima Afonso
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Genevieve Evans
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
2. Zheng J, Li X, Masci AM, Kahn H, Huffman A, Asfaw E, Pan Y, Guo J, He V, Song J, Seleznev AI, Lin AY, He Y. Empowering standardization of cancer vaccines through ontology: enhanced modeling and data analysis. J Biomed Semantics 2024; 15:12. [PMID: 38890666 PMCID: PMC11186274 DOI: 10.1186/s13326-024-00312-3] [Received: 04/02/2024] [Accepted: 05/21/2024] [Indexed: 06/20/2024]
Abstract
BACKGROUND The exploration of cancer vaccines has yielded a multitude of studies, resulting in a diverse collection of information. The heterogeneity of cancer vaccine data significantly impedes effective integration and analysis. While CanVaxKB serves as a pioneering database for over 670 manually annotated cancer vaccines, it is important to note that a database, on its own, does not offer the structured relationships and standardized definitions found in an ontology. Recognizing this, we expanded the Vaccine Ontology (VO) to include those cancer vaccines present in CanVaxKB that were not initially covered, enhancing VO's capacity to systematically define and interrelate cancer vaccines. RESULTS An ontology design pattern (ODP) was first developed and applied to semantically represent various cancer vaccines, capturing their associated entities and relations. By applying the ODP, we generated a cancer vaccine template in a tabular format and converted it into the RDF/OWL format to generate cancer vaccine terms in the VO. The '12MP vaccine' was used as an example to demonstrate the application of the ODP. VO also reuses reference ontology terms to represent entities such as cancer diseases and vaccine hosts. Description Logic (DL) and SPARQL query scripts were developed and used to query for cancer vaccines based on different vaccine features and to demonstrate the versatility of the VO representation. Additionally, ontological modeling was applied to illustrate cancer vaccine-related concepts and studies for in-depth cancer vaccine analysis. A cancer vaccine-specific VO view, referred to as "CVO," was generated; it contains 928 classes, including 704 cancer vaccines. The CVO OWL file is publicly available, for sharing and applications, at: http://purl.obolibrary.org/obo/vo/cvo.owl .
CONCLUSION To facilitate the standardization, integration, and analysis of cancer vaccine data, we expanded the Vaccine Ontology (VO) to systematically model and represent cancer vaccines. We also developed a pipeline to automate the inclusion of cancer vaccines and associated terms in the VO. This not only enriches the data's standardization and integration, but also leverages ontological modeling to deepen the analysis of cancer vaccine information, maximizing benefits for researchers and clinicians. AVAILABILITY The VO-cancer GitHub website is: https://github.com/vaccineontology/VO/tree/master/CVO .
Affiliation(s)
- Jie Zheng
- Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
- Xingxian Li
- College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, MI, 48109, USA
- Anna Maria Masci
- Data Impact and Governance, Technology Data and Innovation, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Hayleigh Kahn
- College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, MI, 48109, USA
- Anthony Huffman
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Eliyas Asfaw
- University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Yuanyi Pan
- Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
- Jinjing Guo
- Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
- Virginia He
- The College of Brown University, Brown University, Providence, RI, 02912, USA
- Justin Song
- College of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Andrey I Seleznev
- Dietrich School of Arts and Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Asiyah Yu Lin
- Axle Research and Technology, Rockville, MD, 20852, USA
- Yongqun He
- Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Rogel Cancer Center, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
3. Yamagata Y, Yamada R. Survey on large language model annotation of cellular senescence from figures in review articles. Genomics Inform 2024; 22:7. [PMID: 38907285 DOI: 10.1186/s44342-024-00011-6] [Received: 03/28/2024] [Accepted: 05/03/2024] [Indexed: 06/23/2024]
Abstract
This study evaluated large language models (LLMs), particularly GPT-4 with vision (GPT-4V) and GPT-4 Turbo, for annotating biomedical figures, focusing on cellular senescence. We assessed the ability of LLMs to categorize and annotate complex biomedical images to enhance their accuracy and efficiency. Our experiments employed prompt engineering with figures from review articles, achieving more than 70% accuracy for label extraction and approximately 80% accuracy for node-type classification. Challenges were noted in correctly annotating the relationship between directionality and inhibitory processes, and these challenges were exacerbated as the number of nodes increased. Using figure legends identified sources and targets more precisely than using captions, but sometimes lacked pathway details. This study underscores the potential of LLMs in decoding biological mechanisms from text and outlines avenues for improving inhibitory relationship representations in biomedical informatics.
Affiliation(s)
- Yuki Yamagata
- R-IH, BioResource Research Center RIKEN, Tsukuba, 305-0074, Japan.
- BioResource Research Center RIKEN, Tsukuba, 305-0074, Japan.
4. Yamagata Y, Kushida T, Onami S, Masuya H. Homeostasis imbalance process ontology: a study on COVID-19 infectious processes. BMC Med Inform Decis Mak 2024; 23:301. [PMID: 38778394 PMCID: PMC11110177 DOI: 10.1186/s12911-024-02516-0] [Received: 04/13/2022] [Accepted: 04/15/2024] [Indexed: 05/25/2024]
Abstract
BACKGROUND One significant challenge in addressing the coronavirus disease 2019 (COVID-19) pandemic is to grasp a comprehensive picture of its infectious mechanisms. We urgently need a consistent framework to capture the intricacies of its complicated viral infectious processes and diverse symptoms. RESULTS We systematized COVID-19 infectious processes through an ontological approach and provided a unified description framework of causal relationships from the early infectious stage to severe clinical manifestations based on the homeostasis imbalance process ontology (HoIP). HoIP covers a broad range of processes in the body, ranging from normal to abnormal. Moreover, our imbalance model enabled us to distinguish viral functional demands from immune defense processes, thereby supporting the development of new drugs, and our research demonstrates how ontological reasoning contributes to the identification of patients at severe risk. CONCLUSIONS The HoIP organises knowledge of COVID-19 infectious processes and related entities, such as molecules, drugs, and symptoms, with a consistent descriptive framework. HoIP is expected to harmonise the description of various heterogeneous processes and improve the interoperability of COVID-19 knowledge through the COVID-19 ontology harmonisation working group.
Affiliation(s)
- Yuki Yamagata
- Life Science Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan.
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan.
- Tatsuya Kushida
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Shuichi Onami
- Life Science Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
- Hiroshi Masuya
- Life Science Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
5. Yamagata Y, Fukuyama T, Onami S, Masuya H. Prototyping an Ontological Framework for Cellular Senescence Mechanisms: A Homeostasis Imbalance Perspective. Sci Data 2024; 11:485. [PMID: 38729991 PMCID: PMC11087592 DOI: 10.1038/s41597-024-03331-y] [Received: 07/13/2023] [Accepted: 04/29/2024] [Indexed: 05/12/2024]
Abstract
Although cellular senescence is a key factor in organismal aging, with both positive and negative effects on individuals, its mechanisms remain largely unknown. Thus, integrating knowledge is essential to explain how cellular senescence manifests in tissue damage and age-related diseases. Here, we propose an ontological model that organizes knowledge of cellular senescence in a computer-readable form. We manually annotated and defined cellular senescence processes, molecules, anatomical structures, phenotypes, and other entities based on the Homeostasis Imbalance Process ontology (HOIP). We described the mechanisms as causal relationships of processes and modelled a homeostatic imbalance between stress and stress response in cellular senescence for a unified framework. HOIP was assessed formally, and the relationships between cellular senescence and diseases were inferred for higher-order knowledge processing. We visualized cellular senescence processes to support knowledge utilization. Our study provides a knowledge base to help elucidate mechanisms linking cellular and organismal aging.
Affiliation(s)
- Yuki Yamagata
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan.
- Life Science Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-Minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan.
- Tsubasa Fukuyama
- AXIOHELIX CO. LTD., 8F Kubota Bldg., 1-12-17 Kandaizumicho, Chiyoda-ku, Tokyo, 101-0024, Japan
- Shuichi Onami
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
- Life Science Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-Minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
- Hiroshi Masuya
- Life Science Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-Minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan.
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, Kouyadai 3-1-1 Tsukuba, Ibaraki, 305-0074, Japan.
6. Rutherford KM, Lera-Ramírez M, Wood V. PomBase: a Global Core Biodata Resource-growth, collaboration, and sustainability. Genetics 2024; 227:iyae007. [PMID: 38376816 PMCID: PMC11075564 DOI: 10.1093/genetics/iyae007] [Received: 11/28/2023] [Accepted: 01/13/2024] [Indexed: 02/21/2024]
Abstract
PomBase (https://www.pombase.org), the model organism database (MOD) for fission yeast, was recently awarded Global Core Biodata Resource (GCBR) status by the Global Biodata Coalition (GBC; https://globalbiodata.org/) after a rigorous selection process. In this MOD review, we present PomBase's continuing growth and improvement over the last 2 years. We describe these improvements in the context of the qualitative GCBR indicators related to scientific quality, comprehensivity, accelerating science, user stories, and collaborations with other biodata resources. This review also showcases the depth of existing connections both within the biocuration ecosystem and between PomBase and its user community.
Affiliation(s)
- Kim M Rutherford
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Manuel Lera-Ramírez
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
7. Baldarelli RM, Smith CL, Ringwald M, Richardson JE, Bult CJ. Mouse Genome Informatics: an integrated knowledgebase system for the laboratory mouse. Genetics 2024; 227:iyae031. [PMID: 38531069 PMCID: PMC11075557 DOI: 10.1093/genetics/iyae031] [Received: 12/02/2023] [Accepted: 02/13/2024] [Indexed: 03/28/2024]
Abstract
Mouse Genome Informatics (MGI) is a federation of expertly curated information resources designed to support experimental and computational investigations into genetic and genomic aspects of human biology and disease using the laboratory mouse as a model system. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are core MGI databases that share data and system architecture. MGI serves as the central community resource of integrated information about mouse genome features, variation, expression, gene function, phenotype, and human disease models acquired from peer-reviewed publications, author submissions, and major bioinformatics resources. To facilitate integration and standardization of data, biocuration scientists annotate using terms from controlled metadata vocabularies and biological ontologies (e.g. Mammalian Phenotype Ontology, Mouse Developmental Anatomy, Disease Ontology, Gene Ontology, etc.), and by applying international community standards for gene, allele, and mouse strain nomenclature. MGI serves basic scientists, translational researchers, and data scientists by providing access to FAIR-compliant data in both human-readable and compute-ready formats. The MGI resource is accessible at https://informatics.jax.org. Here, we present an overview of the core data types represented in MGI and highlight recent enhancements to the resource with a focus on new data and functionality for MGD and GXD.
Affiliation(s)
- Carol J Bult
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
8. Ross KE, Bastian FB, Buys M, Cook CE, D’Eustachio P, Harrison M, Hermjakob H, Li D, Lord P, Natale DA, Peters B, Sternberg PW, Su AI, Thakur M, Thomas PD, Bateman A. Perspectives on tracking data reuse across biodata resources. Bioinform Adv 2024; 4:vbae057. [PMID: 38721398 PMCID: PMC11076920 DOI: 10.1093/bioadv/vbae057] [Received: 01/26/2024] [Revised: 03/13/2024] [Accepted: 04/11/2024] [Indexed: 06/14/2024]
Abstract
Motivation Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge. Results The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources. Availability and implementation Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).
Affiliation(s)
- Karen E Ross
- Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, United States
- Frederic B Bastian
- Evolutionary Bioinformatics Group, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Peter D’Eustachio
- Department of Biochemistry & Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10012, United States
- Melissa Harrison
- Literature Services, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Henning Hermjakob
- Molecular Systems, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Donghui Li
- Chan Zuckerberg Initiative, Redwood City, CA 94063, United States
- Phillip Lord
- School of Computing, Newcastle University, Newcastle upon Tyne NE4 5TG, United Kingdom
- Darren A Natale
- Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, United States
- Bjoern Peters
- Center for Vaccine Innovation, La Jolla Institute of Immunology, La Jolla, CA 92037, United States
- Paul W Sternberg
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, United States
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Matthew Thakur
- Data Services, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90089, United States
- Alex Bateman
- MSCB, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
9. Kilicoglu H, Ensan F, McInnes B, Wang LL. Semantics-enabled biomedical literature analytics. J Biomed Inform 2024; 150:104588. [PMID: 38244957 DOI: 10.1016/j.jbi.2024.104588] [Received: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 01/22/2024]
Affiliation(s)
- Halil Kilicoglu
- School of Information Sciences, University of Illinois Urbana Champaign, Champaign, IL, USA.
- Faezeh Ensan
- Department of Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON, Canada.
- Bridget McInnes
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
- Lucy Lu Wang
- Information School, University of Washington, Seattle, WA, USA.
10. Girón JC, Tarasov S, González Montaña LA, Matentzoglu N, Smith AD, Koch M, Boudinot BE, Bouchard P, Burks R, Vogt L, Yoder M, Osumi-Sutherland D, Friedrich F, Beutel RG, Mikó I. Formalizing Invertebrate Morphological Data: A Descriptive Model for Cuticle-Based Skeleto-Muscular Systems, an Ontology for Insect Anatomy, and their Potential Applications in Biodiversity Research and Informatics. Syst Biol 2023; 72:1084-1100. [PMID: 37094905 DOI: 10.1093/sysbio/syad025] [Received: 02/16/2022] [Revised: 04/17/2023] [Accepted: 04/21/2023] [Indexed: 04/26/2023]
Abstract
The spectacular radiation of insects has produced a stunning diversity of phenotypes. During the past 250 years, research on insect systematics has generated hundreds of terms for naming and comparing them. In its current form, this terminological diversity is presented in natural language and lacks formalization, which prevents computer-assisted comparison using semantic web technologies. Here we propose a Model for Describing Cuticular Anatomical Structures (MoDCAS), which incorporates structural properties and positional relationships for standardized, consistent, and reproducible descriptions of arthropod phenotypes. We applied the MoDCAS framework in creating the ontology for the Anatomy of the Insect Skeleto-Muscular system (AISM). The AISM is the first general insect ontology that aims to cover all taxa by providing generalized, fully logical, and queryable definitions for each term. It was built using the Ontology Development Kit (ODK), which maximizes interoperability with Uberon (Uberon multispecies anatomy ontology) and other basic ontologies, enhancing the integration of insect anatomy into the broader biological sciences. A template system for adding new terms to the AISM, extending it, and linking it to additional anatomical, phenotypic, genetic, and chemical ontologies is also introduced.
The AISM is proposed as the backbone for taxon-specific insect ontologies and has potential applications spanning systematic biology and biodiversity informatics, allowing users to: 1) use controlled vocabularies and create semiautomated computer-parsable insect morphological descriptions; 2) integrate insect morphology into broader fields of research, including ontology-informed phylogenetic methods, logical homology hypothesis testing, evo-devo studies, and genotype to phenotype mapping; and 3) automate the extraction of morphological data from the literature, enabling the generation of large-scale phenomic data, by facilitating the production and testing of informatic tools able to extract, link, annotate, and process morphological data. This descriptive model and its ontological applications will allow for clear and semantically interoperable integration of arthropod phenotypes in biodiversity studies.
Affiliation(s)
- Jennifer C Girón
- Department of Entomology, Purdue University, West Lafayette, IN, USA
- Natural Science Research Laboratory, Museum of Texas Tech University, Lubbock, TX, USA
- Sergei Tarasov
- Finnish Museum of Natural History, University of Helsinki, Pohjoinen Rautatiekatu 13, FI-00014 Helsinki, Finland
- Aaron D Smith
- Department of Entomology, Purdue University, West Lafayette, IN, USA
- Markus Koch
- Institute of Evolutionary Biology and Ecology, University of Bonn, An der Immenburg 1, 53121 Bonn, Germany
- Brendon E Boudinot
- Department of Entomology & Nematology, University of California, Davis, One Shields Ave, CA, USA
- Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, Erbertstraße 1, 07743 Jena, Germany
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington DC, USA
- Patrice Bouchard
- Biodiversity and Bioresources, Canadian National Collection of Insects, Arachnids and Nematodes, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, Ontario, K1A 0C6, Canada
- Roger Burks
- Entomology Department, University of California, Riverside, 900 University Ave., Riverside, CA, USA
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167 Hannover, Germany
- Matthew Yoder
- Illinois Natural History Survey, University of Illinois, Champaign, IL, USA
- Frank Friedrich
- Institut für Zell- und Systembiologie der Tiere, Universität Hamburg, Martin-Luther-King-Platz 3, 20146, Hamburg, Germany
- Rolf G Beutel
- Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, Erbertstraße 1, 07743 Jena, Germany
- István Mikó
- Department of Biological Sciences, University of New Hampshire, Durham, NH, USA
11. Gonzalez-Cavazos AC, Tanska A, Mayers M, Carvalho-Silva D, Sridharan B, Rewers PA, Sankarlal U, Jagannathan L, Su AI. DrugMechDB: A Curated Database of Drug Mechanisms. Sci Data 2023; 10:632. [PMID: 37717042 PMCID: PMC10505144 DOI: 10.1038/s41597-023-02534-z] [Received: 05/02/2023] [Accepted: 09/01/2023] [Indexed: 09/18/2023]
Abstract
Computational drug repositioning methods have emerged as an attractive and effective solution to find new candidates for existing therapies, reducing the time and cost of drug development. Repositioning methods based on biomedical knowledge graphs typically offer useful supporting biological evidence. This evidence is based on reasoning chains or subgraphs that connect a drug to a disease prediction. However, there are no databases of drug mechanisms that can be used to train and evaluate such methods. Here, we introduce the Drug Mechanism Database (DrugMechDB), a manually curated database that describes drug mechanisms as paths through a knowledge graph. DrugMechDB integrates a diverse range of authoritative free-text resources to describe 4,583 drug indications with 32,249 relationships, representing 14 major biological scales. DrugMechDB can be employed as a benchmark dataset for assessing computational drug repositioning models or as a valuable resource for training such models.
Affiliation(s)
- Adriana Carolina Gonzalez-Cavazos
- The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Anna Tanska
- The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Michael Mayers
- The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Denise Carvalho-Silva
- The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Brindha Sridharan
- The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Patrick A Rewers
- The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Umasri Sankarlal
- The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Lakshmanan Jagannathan
- The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Andrew I Su
- The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA.
| |
Collapse
|
12
|
Lin AY, Arabandi S, Beale T, Duncan WD, Hicks A, Hogan WR, Jensen M, Koppel R, Martínez-Costa C, Nytrø Ø, Obeid JS, de Oliveira JP, Ruttenberg A, Seppälä S, Smith B, Soergel D, Zheng J, Schulz S. Improving the Quality and Utility of Electronic Health Record Data through Ontologies. STANDARDS (BASEL, SWITZERLAND) 2023; 3:316-340. [PMID: 37873508 PMCID: PMC10591519 DOI: 10.3390/standards3030023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
The translational research community, in general, and the Clinical and Translational Science Awards (CTSA) community, in particular, share the vision of repurposing EHRs for research that will improve the quality of clinical practice. Many members of these communities are also aware that electronic health records (EHRs) suffer from data that are poorly structured, biased, and unusable outside their original context. This creates obstacles to the continuity of care, utility, quality improvement, and translational research. Analogous limitations to sharing objective data in other areas of the natural sciences have been successfully overcome by developing and using common ontologies. This White Paper presents the authors' rationale for using ontologies with computable semantics to improve clinical data quality and EHR usability. It is written for researchers who have a stake in clinical and translational science, who advocate the use of information technology in medicine, and who are at the same time concerned by current major shortfalls. This White Paper outlines pitfalls, opportunities, and solutions and recommends increased investment in research and development of ontologies with computable semantics for a new generation of EHRs.
Affiliation(s)
- Asiyah Yu Lin
- National Institutes of Health, Bethesda, MD 20892, USA
- William D. Duncan
- College of Dentistry, University of Florida, Gainesville, FL 32610, USA
- Amanda Hicks
- The Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, USA
- William R. Hogan
- Data Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Ross Koppel
- Department of Medical Informatics, Jacobs School of Medicine, University at Buffalo, Buffalo, NY 14260, USA
- Department of Medical Informatics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Catalina Martínez-Costa
- Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, 30100 Murcia, Spain
- Øystein Nytrø
- Department of Computer Science, UIT Arctic University of Norway, 9037 Tromsø, Norway
- Department of Computer Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway
- Jihad S. Obeid
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Alan Ruttenberg
- School of Dental Medicine, University at Buffalo, Buffalo, NY 14260, USA
- Selja Seppälä
- Department of Business Information Systems, University College Cork, T12 K8AF Cork, Ireland
- Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, NY 14260, USA
- Dagobert Soergel
- Department of Philosophy, University at Buffalo, Buffalo, NY 14260, USA
- Jie Zheng
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48104, USA
- Stefan Schulz
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, 8036 Graz, Austria
- Averbis GmbH, Salzstrasse 15, 79098 Freiburg im Breisgau, Germany

13
Badenes-Olmedo C, Corcho O. Lessons learned to enable question answering on knowledge graphs extracted from scientific publications: A case study on the coronavirus literature. J Biomed Inform 2023; 142:104382. [PMID: 37156393 PMCID: PMC10163941 DOI: 10.1016/j.jbi.2023.104382] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 04/14/2023] [Accepted: 05/03/2023] [Indexed: 05/10/2023]
Abstract
The article presents a workflow to create a question-answering system whose knowledge base combines knowledge graphs and scientific publications on coronaviruses. It is based on the experience gained in modeling evidence from research articles to provide answers to questions in natural language. The work contains best practices for acquiring scientific publications, tuning language models to identify and normalize relevant entities, creating representational models based on probabilistic topics, and formalizing an ontology that describes the associations between domain concepts supported by the scientific literature. All the resources generated in the domain of coronavirus are available openly as part of the Drugs4COVID initiative, and can be (re)-used independently or as a whole. They can be exploited by scientific communities conducting research related to SARS-CoV-2/COVID-19 and also by therapeutic communities, laboratories, etc., wishing to find and understand relationships between symptoms, drugs, active ingredients and their documentary evidence.
Affiliation(s)
- Oscar Corcho
- Artificial Intelligence Department, Campus de Montegancedo, s/n., Boadilla del Monte, 28660, Madrid, Spain

14
Aleksander SA, Balhoff J, Carbon S, Cherry JM, Drabkin HJ, Ebert D, Feuermann M, Gaudet P, Harris NL, Hill DP, Lee R, Mi H, Moxon S, Mungall CJ, Muruganugan A, Mushayahama T, Sternberg PW, Thomas PD, Van Auken K, Ramsey J, Siegele DA, Chisholm RL, Fey P, Aspromonte MC, Nugnes MV, Quaglia F, Tosatto S, Giglio M, Nadendla S, Antonazzo G, Attrill H, Dos Santos G, Marygold S, Strelets V, Tabone CJ, Thurmond J, Zhou P, Ahmed SH, Asanitthong P, Luna Buitrago D, Erdol MN, Gage MC, Ali Kadhum M, Li KYC, Long M, Michalak A, Pesala A, Pritazahra A, Saverimuttu SCC, Su R, Thurlow KE, Lovering RC, Logie C, Oliferenko S, Blake J, Christie K, Corbani L, Dolan ME, Drabkin HJ, Hill DP, Ni L, Sitnikov D, Smith C, Cuzick A, Seager J, Cooper L, Elser J, Jaiswal P, Gupta P, Jaiswal P, Naithani S, Lera-Ramirez M, Rutherford K, Wood V, De Pons JL, Dwinell MR, Hayman GT, Kaldunski ML, Kwitek AE, Laulederkind SJF, Tutaj MA, Vedi M, Wang SJ, D'Eustachio P, Aimo L, Axelsen K, Bridge A, Hyka-Nouspikel N, Morgat A, Aleksander SA, Cherry JM, Engel SR, Karra K, Miyasato SR, Nash RS, Skrzypek MS, Weng S, Wong ED, Bakker E, Berardini TZ, Reiser L, Auchincloss A, Axelsen K, Argoud-Puy G, Blatter MC, Boutet E, Breuza L, Bridge A, Casals-Casas C, Coudert E, Estreicher A, Livia Famiglietti M, Feuermann M, Gos A, Gruaz-Gumowski N, Hulo C, Hyka-Nouspikel N, Jungo F, Le Mercier P, Lieberherr D, Masson P, Morgat A, Pedruzzi I, Pourcel L, Poux S, Rivoire C, Sundaram S, Bateman A, Bowler-Barnett E, Bye-A-Jee H, Denny P, Ignatchenko A, Ishtiaq R, Lock A, Lussi Y, Magrane M, Martin MJ, Orchard S, Raposo P, Speretta E, Tyagi N, Warner K, Zaru R, Diehl AD, Lee R, Chan J, Diamantakis S, Raciti D, Zarowiecki M, Fisher M, James-Zorn C, Ponferrada V, Zorn A, Ramachandran S, Ruzicka L, Westerfield M. The Gene Ontology knowledgebase in 2023. Genetics 2023; 224:iyad031. 
[PMID: 36866529 PMCID: PMC10158837 DOI: 10.1093/genetics/iyad031] [Citation(s) in RCA: 420] [Impact Index Per Article: 420.0] [Received: 12/13/2022] [Revised: 02/10/2023] [Accepted: 02/11/2023] [Indexed: 03/04/2023]
Abstract
The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO, a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations, evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs), mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.

15
Gonzalez-Cavazos AC, Tanska A, Mayers MD, Carvalho-Silva D, Sridharan B, Rewers PA, Sankarlal U, Jagannathan L, Su AI. DrugMechDB: A Curated Database of Drug Mechanisms. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.01.538993. [PMID: 37205439 PMCID: PMC10187194 DOI: 10.1101/2023.05.01.538993] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/21/2023]
Abstract
Computational drug repositioning methods have emerged as an attractive and effective solution to find new candidates for existing therapies, reducing the time and cost of drug development. Repositioning methods based on biomedical knowledge graphs typically offer useful supporting biological evidence. This evidence is based on reasoning chains or subgraphs that connect a drug to disease predictions. However, there are no databases of drug mechanisms that can be used to train and evaluate such methods. Here, we introduce the Drug Mechanism Database (DrugMechDB), a manually curated database that describes drug mechanisms as paths through a knowledge graph. DrugMechDB integrates a diverse range of authoritative free-text resources to describe 4,583 drug indications with 32,249 relationships, representing 14 major biological scales. DrugMechDB can be employed as a benchmark dataset for assessing computational drug repurposing models or as a valuable resource for training such models.
Affiliation(s)
- Anna Tanska
- The Scripps Research Institute, Department of Integrative and Structural Biology, 10550 N Torrey Pines Rd., La Jolla, CA, 92037, USA
- Michael D. Mayers
- The Scripps Research Institute, Department of Integrative and Structural Biology, 10550 N Torrey Pines Rd., La Jolla, CA, 92037, USA
- Denise Carvalho-Silva
- The Scripps Research Institute, Department of Integrative and Structural Biology, 10550 N Torrey Pines Rd., La Jolla, CA, 92037, USA
- Brindha Sridharan
- The Scripps Research Institute, Department of Integrative and Structural Biology, 10550 N Torrey Pines Rd., La Jolla, CA, 92037, USA
- Patrik A. Rewers
- The Scripps Research Institute, Department of Integrative and Structural Biology, 10550 N Torrey Pines Rd., La Jolla, CA, 92037, USA
- Umasri Sankarlal
- The Scripps Research Institute, Department of Integrative and Structural Biology, 10550 N Torrey Pines Rd., La Jolla, CA, 92037, USA
- Lakshmanan Jagannathan
- The Scripps Research Institute, Department of Integrative and Structural Biology, 10550 N Torrey Pines Rd., La Jolla, CA, 92037, USA
- Andrew I. Su
- The Scripps Research Institute, Department of Integrative and Structural Biology, 10550 N Torrey Pines Rd., La Jolla, CA, 92037, USA

16
Sobral PS, Luz VCC, Almeida JMGCF, Videira PA, Pereira F. Computational Approaches Drive Developments in Immune-Oncology Therapies for PD-1/PD-L1 Immune Checkpoint Inhibitors. Int J Mol Sci 2023; 24:ijms24065908. [PMID: 36982981 PMCID: PMC10054797 DOI: 10.3390/ijms24065908] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Revised: 03/16/2023] [Accepted: 03/19/2023] [Indexed: 03/30/2023]
Abstract
Computational approaches in immune-oncology therapies focus on using data-driven methods to identify potential immune targets and develop novel drug candidates. In particular, the search for PD-1/PD-L1 immune checkpoint inhibitors (ICIs) has enlivened the field, leveraging cheminformatics and bioinformatics tools to analyze large datasets of molecules, gene expression and protein-protein interactions. To date, there is still an unmet clinical need for improved ICIs and reliable predictive biomarkers. In this review, we highlight the computational methodologies applied to discovering and developing PD-1/PD-L1 ICIs for improved cancer immunotherapies, with a greater focus on the last five years. We address the computer-aided drug design methodologies essential for successful drug discovery campaigns targeting antibody, peptide or small-molecule ICIs. These include structure- and ligand-based virtual screening, molecular docking, homology modeling and molecular dynamics simulations. A list of recent databases and web tools used in the context of cancer and immunotherapy has been compiled and made available, covering general-scope, cancer and immunology resources. In summary, computational approaches have become valuable tools for discovering and developing ICIs. Despite significant progress, there is still a need for improved ICIs and biomarkers, and recent databases and web tools have been compiled to aid in this pursuit.
Affiliation(s)
- Patrícia S Sobral
- LAQV and REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- UCIBIO, Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Vanessa C C Luz
- UCIBIO, Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- João M G C F Almeida
- UCIBIO, Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Paula A Videira
- UCIBIO, Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Florbela Pereira
- LAQV and REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal

17
Mahita J, Ha B, Gambiez A, Schendel SL, Li H, Hastie KM, Dennison SM, Li K, Kuzmina N, Periasamy S, Bukreyev A, Munt JE, Osei-Twum M, Atyeo C, Overton JA, Vita R, Guzman-Orozco H, Mendes M, Kojima M, Halfmann PJ, Kawaoka Y, Alter G, Gagnon L, Baric RS, Tomaras GD, Germann T, Bedinger D, Greenbaum JA, Saphire EO, Peters B. Coronavirus Immunotherapeutic Consortium Database. Database (Oxford) 2023; 2023:7034146. [PMID: 36763096 PMCID: PMC9913043 DOI: 10.1093/database/baac112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Revised: 11/30/2022] [Accepted: 12/22/2022] [Indexed: 02/11/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has seen multiple anti-SARS-CoV-2 antibodies being generated globally. It is difficult, however, to assemble a useful compendium of these biological properties if they are derived from experimental measurements performed at different sites under different experimental conditions. The Coronavirus Immunotherapeutic Consortium (COVIC) circumvents these issues by experimentally testing blinded antibodies side by side for several functional activities. To collect these data in a consistent fashion and make it publicly available, we established the COVIC database (COVIC-DB, https://covicdb.lji.org/). This database enables systematic analysis and interpretation of this large-scale dataset by providing a comprehensive view of various features such as affinity, neutralization, in vivo protection and effector functions for each antibody. Interactive graphs enable direct comparisons of antibodies based on select functional properties. We demonstrate how the COVIC-DB can be utilized to examine relationships among antibody features, thereby guiding the design of therapeutic antibody cocktails. Database URL https://covicdb.lji.org/.
Affiliation(s)
- Anais Gambiez
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Sharon L Schendel
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Haoyang Li
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Kathryn M Hastie
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- S Moses Dennison
- Center for Human Systems Immunology, Departments of Surgery, Immunology, and Molecular Genetics and Microbiology and Duke Human Vaccine Institute, Duke University, Durham, NC 27701, USA
- Kan Li
- Center for Human Systems Immunology, Departments of Surgery, Immunology, and Molecular Genetics and Microbiology and Duke Human Vaccine Institute, Duke University, Durham, NC 27701, USA
- Natalia Kuzmina
- Department of Pathology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-0609, USA; Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-1019, USA
- Sivakumar Periasamy
- Department of Pathology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-0609, USA; Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-1019, USA
- Alexander Bukreyev
- Department of Pathology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-0609, USA; Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-1019, USA; Galveston National Laboratory, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77550, USA
- Jennifer E Munt
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, 135 Dauer Drive, 2101 McGavran-Greenberg Hall, CB #7435, Chapel Hill, NC 27599-7435, USA
- Mary Osei-Twum
- Nexelis, a Q2 Solutions Company, 525 Boulevard Cartier Ouest, Laval, Quebec H7V 3S8, Canada
- Caroline Atyeo
- Ragon Institute of MGH, MIT and Harvard, 400 Technology Square, Cambridge, MA 02139-3583, USA
- James A Overton
- Knocean Inc., 107 Quebec Ave., Toronto, Ontario, M6P 2T3, Canada
- Randi Vita
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Hector Guzman-Orozco
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Marcus Mendes
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Mari Kojima
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Peter J Halfmann
- Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, WI 53711, USA
- Yoshihiro Kawaoka
- Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, WI 53711, USA; Division of Virology, Department of Microbiology and Immunology, Institute of Medical Science, University of Tokyo, Tokyo 108-8639, Japan; The Research Center for Global Viral Diseases, National Center for Global Health and Medicine Research Institute, Tokyo 162-8655, Japan
- Galit Alter
- Ragon Institute of MGH, MIT and Harvard, 400 Technology Square, Cambridge, MA 02139-3583, USA
- Luc Gagnon
- Nexelis, a Q2 Solutions Company, 525 Boulevard Cartier Ouest, Laval, Quebec H7V 3S8, Canada
- Ralph S Baric
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, 135 Dauer Drive, 2101 McGavran-Greenberg Hall, CB #7435, Chapel Hill, NC 27599-7435, USA; Department of Microbiology and Immunology, School of Medicine, 125 Marson Farm Road, Chapel Hill, NC 27599-7290, USA
- Georgia D Tomaras
- Center for Human Systems Immunology, Departments of Surgery, Immunology, and Molecular Genetics and Microbiology and Duke Human Vaccine Institute, Duke University, Durham, NC 27701, USA
- Tim Germann
- Carterra Inc., 825 N. 300 W.Ste, C309, Salt Lake City, UT 84103, USA
- Daniel Bedinger
- Carterra Inc., 825 N. 300 W.Ste, C309, Salt Lake City, UT 84103, USA
- Jason A Greenbaum
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Bjoern Peters
- Correspondence may also be addressed to Bjoern Peters. Tel: +1858 752 6914; Fax: +858-752-6987;

18
A Tissue-Specific and Toxicology-Focused Knowledge Graph. INFORMATION 2023. [DOI: 10.3390/info14020091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Molecular biology-focused knowledge graphs (KGs) are directed graphs that integrate information from heterogeneous sources of biological and biomedical data, such as ontologies and public databases. They provide a holistic view of biology, chemistry, and disease, allowing users to draw non-obvious connections between concepts through shared associations. While these massive graphs are constructed using carefully curated ontologies and annotations from public databases, much of the information relating the concepts is context specific. Two important variables that determine the applicability of a given ontology annotation are the species and (especially) the tissue type in which it takes place. Using a data-driven approach and the results from thousands of high-quality gene expression samples, we have constructed tissue-specific KGs (using liver, kidney, and heart as examples) that empirically validate the annotations provided by ontology curators. The resulting human-centered KGs are designed for toxicology applications but are generalizable to other areas of human biology, addressing the issue of tissue specificity that often limits the applicability of other large KGs. These knowledge graphs can serve as valuable tools for generating transparent explanations of experimental results in the form of mechanistic hypotheses that are highly relevant to the studied tissue. Because the data-driven relations are derived from a large collection of human in vitro data, these KGs are particularly well suited for in vitro toxicology applications.
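The abstract describes building tissue-specific KGs by checking ontology annotations against gene expression data. The sketch below uses a deliberately simplified stand-in for that idea: a gene-to-term edge is kept for a tissue only if the gene's expression there meets a threshold. The threshold rule, the data layout and all gene and term names are hypothetical. This is not the authors' data-driven method, which drew on thousands of expression samples.

```python
def tissue_specific_edges(edges, expression, tissue, threshold=1.0):
    """Keep (gene, relation, term) edges whose gene is expressed in `tissue`.

    edges: iterable of (gene, relation, term) triples.
    expression: dict mapping (gene, tissue) to an expression value; the units
        and the default threshold are assumptions for illustration only.
    """
    kept = []
    for gene, relation, term in edges:
        # Genes with no measurement in this tissue are treated as not expressed.
        if expression.get((gene, tissue), 0.0) >= threshold:
            kept.append((gene, relation, term))
    return kept

# Hypothetical toy data.
edges = [
    ("GENE1", "involved_in", "PROCESS_X"),
    ("GENE2", "involved_in", "PROCESS_X"),
]
expression = {("GENE1", "liver"): 12.5, ("GENE2", "liver"): 0.2,
              ("GENE2", "kidney"): 8.0}
print(tissue_specific_edges(edges, expression, "liver"))
# [('GENE1', 'involved_in', 'PROCESS_X')]
print(tissue_specific_edges(edges, expression, "kidney"))
# [('GENE2', 'involved_in', 'PROCESS_X')]
```

Running the same filter once per tissue (for example liver, kidney and heart) produces one graph per tissue from a single shared edge list.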

19
A curated collection of human vaccination response signatures. Sci Data 2022; 9:678. [PMID: 36347894 PMCID: PMC9643367 DOI: 10.1038/s41597-022-01558-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2021] [Accepted: 07/14/2022] [Indexed: 11/09/2022]
Abstract
Recent advances in high-throughput experiments and systems biology approaches have resulted in hundreds of publications identifying “immune signatures”. Unfortunately, these are often described within text, figures, or tables in a format not amenable to computational processing, thus severely hampering our ability to fully exploit this information. Here we present a data model to represent immune signatures, along with the Human Immunology Project Consortium (HIPC) Dashboard (www.hipc-dashboard.org), a web-enabled application to facilitate signature access and querying. The data model captures the biological response components (e.g., genes, proteins, cell types or metabolites) and metadata describing the context under which the signature was identified using standardized terms from established resources (e.g., HGNC, Protein Ontology, Cell Ontology). We have manually curated a collection of >600 immune signatures from >60 published studies profiling human vaccination responses for the current release. The system will aid in building a broader understanding of the human immune response to stimuli by enabling researchers to easily access and interrogate published immune signatures.

20
Gavali S, Ross K, Chen C, Cowart J, Wu CH. A knowledge graph representation learning approach to predict novel kinase-substrate interactions. Mol Omics 2022; 18:853-864. [PMID: 35975455 PMCID: PMC9621340 DOI: 10.1039/d1mo00521a] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/30/2021] [Accepted: 07/22/2022] [Indexed: 10/12/2023]
Abstract
The human proteome contains a vast network of interacting kinases and substrates. Even though some kinases have proven to be immensely useful as therapeutic targets, a majority are still understudied. In this work, we present a novel knowledge graph representation learning approach to predict novel interaction partners for understudied kinases. Our approach uses a phosphoproteomic knowledge graph constructed by integrating data from iPTMnet, the Protein Ontology, the Gene Ontology and BioKG. The representations of kinases and substrates in this knowledge graph are learned by performing directed random walks on triples coupled with a modified SkipGram or CBOW model. These representations are then used as input to a supervised classification model to predict novel interactions for understudied kinases. We also present a post-predictive analysis of the predicted interactions and an ablation study of the phosphoproteomic knowledge graph to gain insight into the biology of the understudied kinases.
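The abstract learns kinase and substrate representations from directed random walks on knowledge graph triples, fed to a modified SkipGram or CBOW model. The sketch below covers only the first half of that pipeline under simple assumptions. It generates walks that follow outgoing edges and emit both entity and relation tokens, then extracts plain SkipGram (center, context) pairs. The authors' modification of SkipGram/CBOW is not reproduced, and the toy entity and relation names are hypothetical.

```python
import random
from collections import defaultdict

def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) pairs."""
    adj = defaultdict(list)
    for head, relation, tail in triples:
        adj[head].append((relation, tail))
    return adj

def directed_walk(adj, start, max_hops, rng):
    """Walk along outgoing edges only, emitting entity and relation tokens."""
    walk = [start]
    node = start
    for _ in range(max_hops):
        choices = adj.get(node)
        if not choices:
            break  # dead end: the current entity has no outgoing edges
        relation, node = rng.choice(choices)
        walk.extend([relation, node])
    return walk

def skipgram_pairs(walk, window=2):
    """Plain (center, context) pairs, as a standard SkipGram model would use."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

# Hypothetical toy graph; entity and relation names are placeholders.
triples = [
    ("KINASE_A", "phosphorylates", "SUBSTRATE_1"),
    ("KINASE_A", "annotated_with", "GO_TERM_T"),
    ("SUBSTRATE_1", "participates_in", "PATHWAY_P"),
]
adj = build_adjacency(triples)
rng = random.Random(0)  # fixed seed so the walks are reproducible
walks = [directed_walk(adj, "KINASE_A", max_hops=3, rng=rng) for _ in range(5)]
pairs = [p for w in walks for p in skipgram_pairs(w)]
```

In practice the walks would typically be passed as "sentences" to an embedding library instead of extracting pairs by hand. One example is gensim's `Word2Vec`, where `sg=1` selects SkipGram and `sg=0` selects CBOW. The resulting entity vectors would then serve as classifier features.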
Affiliation(s)
- Sachin Gavali
- University of Delaware, Newark, DE 590 Avenue 1743, Suite 147, Newark, DE, USA.
- Karen Ross
- Georgetown University Medical Center, Washington DC, USA
- Chuming Chen
- University of Delaware, Newark, DE 590 Avenue 1743, Suite 147, Newark, DE, USA.
- Julie Cowart
- University of Delaware, Newark, DE 590 Avenue 1743, Suite 147, Newark, DE, USA.
- Cathy H Wu
- University of Delaware, Newark, DE 590 Avenue 1743, Suite 147, Newark, DE, USA.

21
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res 2022; 51:D1373-D1380. [PMID: 36305812 PMCID: PMC9825602 DOI: 10.1093/nar/gkac956] [Citation(s) in RCA: 697] [Impact Index Per Article: 348.5] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/13/2022] [Indexed: 01/30/2023]
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources were added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
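The abstract mentions PUG-REST, PubChem's URL-based programmatic interface. The sketch below assembles request URLs using the general PUG-REST path pattern: base, input domain, namespace, identifiers, operation and output format. The helper names are hypothetical. The 'standardize' option and the target-centric downloads from the abstract are not shown here, so check their exact syntax in the official PUG-REST documentation. CID 2244 is aspirin.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def pug_rest_url(domain, namespace, identifiers, operation, output="JSON"):
    """Build base/domain/namespace/identifiers/operation/output (hypothetical helper)."""
    ids = ",".join(str(i) for i in identifiers)
    # Percent-encode identifiers (e.g. names with spaces) but keep the comma separator.
    return "/".join([PUG_REST_BASE, domain, namespace, quote(ids, safe=","),
                     operation, output])

def fetch_json(url, timeout=30):
    """Fetch a JSON response; needs network access and is not called below."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

url = pug_rest_url("compound", "cid", [2244],
                   "property/MolecularFormula,MolecularWeight")
print(url)
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula,MolecularWeight/JSON
```

Keeping URL construction separate from the network call makes the request logic testable offline. PubChem also publishes usage limits for programmatic access, which batch scripts should respect.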
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Evan E Bolton
- To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559;

22
Hill C, Avila-Palencia I, Maxwell AP, Hunter RF, McKnight AJ. Harnessing the Full Potential of Multi-Omic Analyses to Advance the Study and Treatment of Chronic Kidney Disease. Frontiers in Nephrology 2022; 2:923068. [PMID: 37674991 PMCID: PMC10479694 DOI: 10.3389/fneph.2022.923068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 05/30/2022] [Indexed: 09/08/2023]
Abstract
Chronic kidney disease (CKD) was the 12th leading cause of death globally in 2017, with the prevalence of CKD estimated at ~9%. Early detection and intervention for CKD may improve patient outcomes, but standard testing approaches, even in developed countries, do not facilitate identification of patients at high risk of developing CKD, nor of those progressing to end-stage kidney disease (ESKD). Recent advances in CKD research are moving towards a more personalised approach. Heritability for CKD ranges from 30% to 75%, yet identified genetic risk factors account for only a small proportion of the inherited contribution to CKD. More in-depth analysis of genomic sequencing data in large cohorts is revealing new genetic risk factors for common diagnoses of CKD and providing novel diagnoses for rare forms of CKD. Multi-omic approaches are now being harnessed to improve our understanding of CKD and explain some of the so-called 'missing heritability'. The most common omic analyses employed for CKD are genomics, epigenomics, transcriptomics, metabolomics, proteomics and phenomics. While each of these omics has been reviewed individually, integrated multi-omic analysis offers considerable scope to improve our understanding and treatment of CKD. This narrative review summarises current understanding of multi-omic research alongside recent experimental and analytical approaches, discusses current challenges and future perspectives, and offers new insights for CKD.
Affiliation(s)
- Amy Jayne McKnight
- Centre for Public Health, Queen’s University Belfast, Belfast, United Kingdom

23

Mayers M, Tu R, Steinecke D, Li TS, Queralt-Rosinach N, Su AI. Design and application of a knowledge network for automatic prioritization of drug mechanisms. Bioinformatics 2022; 38:2880-2891. [PMID: 35561182 PMCID: PMC9113361 DOI: 10.1093/bioinformatics/btac205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/14/2021] [Revised: 02/17/2022] [Accepted: 04/04/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION Drug repositioning is an attractive alternative to de novo drug discovery due to reduced time and costs to bring drugs to market. Computational repositioning methods, particularly non-black-box methods that can account for and predict a drug's mechanism, may provide great benefit for directing future development. By tuning both data and algorithm to utilize relationships important to drug mechanisms, a computational repositioning algorithm can be trained to both predict and explain mechanistically novel indications. RESULTS In this work, we examined the 123 curated drug mechanism paths found in the drug mechanism database (DrugMechDB) and, after identifying the most important relationships, integrated 18 data sources to produce a heterogeneous knowledge graph, MechRepoNet, capable of capturing the information in these paths. We applied the Rephetio repurposing algorithm to MechRepoNet using only a subset of relationships known to be mechanistic in nature and found adequate predictive ability on an evaluation set, with an AUROC of 0.83. The resulting repurposing model allowed us to prioritize paths in our knowledge graph to produce a predicted treatment mechanism. We found that DrugMechDB paths, when present in the network, were rated highly among predicted mechanisms. We then demonstrated MechRepoNet's ability to use mechanistic insight to identify a drug's mechanistic target, with a mean reciprocal rank of 0.525 on a test set of known drug-target interactions. Finally, we walked through repurposing examples of the anti-cancer drug imatinib for use in the treatment of asthma, and metolazone for use in the treatment of osteoporosis, to demonstrate this method's utility in providing mechanistic insight into the repurposing predictions it makes. AVAILABILITY AND IMPLEMENTATION The Python code to reproduce the entirety of this analysis is available at: https://github.com/SuLab/MechRepoNet (archived at https://doi.org/10.5281/zenodo.6456335).
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
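The abstract reports two standard evaluation metrics: AUROC and mean reciprocal rank (MRR). Below is a minimal sketch of both definitions on toy inputs. It is not MechRepoNet's own evaluation code, and the drug and target labels are hypothetical. MRR averages 1/rank of the known target in each drug's ranked candidate list. AUROC is computed here in its Mann-Whitney form, as the fraction of positive-negative pairs that are ranked correctly.

```python
from typing import Sequence


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         answers: Sequence[str]) -> float:
    """Mean of 1/rank of the known answer in each ranked list (rank is 1-based).

    A query whose answer never appears in its ranking contributes 0.
    """
    total = 0.0
    for ranking, answer in zip(rankings, answers):
        if answer in ranking:
            total += 1.0 / (list(ranking).index(answer) + 1)
    return total / len(rankings)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical drugs: known target ranked 2nd and 1st -> (1/2 + 1/1) / 2.
    print(mean_reciprocal_rank([["T1", "T2", "T3"], ["T4", "T5"]], ["T2", "T4"]))  # 0.75
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```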
Affiliation(s)
- Dylan Steinecke
- Department of Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Tong Shu Li
- Department of Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Núria Queralt-Rosinach
- Department of Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA

24

Kim S, Cheng T, He S, Thiessen PA, Li Q, Gindulyte A, Bolton EE. PubChem Protein, Gene, Pathway, and Taxonomy Data Collections: Bridging Biology and Chemistry through Target-Centric Views of PubChem Data. J Mol Biol 2022; 434:167514. [DOI: 10.1016/j.jmb.2022.167514] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/24/2021] [Revised: 02/17/2022] [Accepted: 02/22/2022] [Indexed: 12/21/2022]

25

Vita R, Mody A, Overton JA, Buus S, Haley ST, Sette A, Mallajosyula V, Davis MM, Long DL, Willis RA, Peters B, Altman JD. Minimal Information about MHC Multimers (MIAMM). J Immunol 2022; 208:531-537. [PMID: 35042788 PMCID: PMC8830768 DOI: 10.4049/jimmunol.2100961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2021] [Accepted: 11/09/2021] [Indexed: 02/03/2023]
Abstract
With the goal of improving the reproducibility and annotatability of MHC multimer reagent data, we present the establishment of a new data standard: Minimal Information about MHC Multimers (https://miamm.lji.org/). Multimers are engineered reagents composed of a ligand and an MHC, which can be represented in a standardized format using ontology terminology. We provide an online Web site to host the details of the standard, as well as a validation tool to assist with the adoption of the standard. We hope that this publication will bring increased awareness of Minimal Information about MHC Multimers and drive acceptance, ultimately improving the quality and documentation of multimer data in the scientific literature.
Affiliation(s)
- Randi Vita
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA
- Apurva Mody
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA
- Soren Buus
- Laboratory of Experimental Immunology, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Alessandro Sette
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA
- Department of Medicine, Division of Infectious Diseases and Global Public Health, University of California, San Diego, La Jolla, CA
- Vamsee Mallajosyula
- Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA
- Mark M Davis
- Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA
- Dale L Long
- Department of Microbiology and Immunology, Emory University School of Medicine, Atlanta, GA
- Richard A Willis
- Department of Microbiology and Immunology, Emory University School of Medicine, Atlanta, GA
- Bjoern Peters
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA
- Department of Medicine, Division of Infectious Diseases and Global Public Health, University of California, San Diego, La Jolla, CA
- John D Altman
- Department of Microbiology and Immunology, Emory University School of Medicine, Atlanta, GA
- Emory Vaccine Center and Yerkes National Primate Research Center, Emory University, Atlanta, GA

26

Hollas MAR, Robey M, Fellers R, LeDuc R, Thomas P, Kelleher N. The Human Proteoform Atlas: a FAIR community resource for experimentally derived proteoforms. Nucleic Acids Res 2022; 50:D526-D533. [PMID: 34986596 PMCID: PMC8728143 DOI: 10.1093/nar/gkab1086] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/15/2021] [Revised: 10/06/2021] [Accepted: 11/14/2021] [Indexed: 01/01/2023]
Abstract
The Human Proteoform Atlas (HPfA) is a web-based repository of experimentally verified human proteoforms, online at http://human-proteoform-atlas.org, and is a direct descendant of the Consortium of Top-Down Proteomics' (CTDP) Proteoform Atlas. Proteoforms are the specific forms of protein molecules expressed by our cells and include the unique combination of post-translational modifications (PTMs), alternative splicing and other sources of variation deriving from a specific gene. The HPfA uses a FAIR system to assign persistent identifiers to proteoforms, which allows for redundancy calling and tracking from prior and future studies in the growing community of proteoform biology and measurement. The HPfA is organized around open ontologies and enables flexible classification of proteoforms. To achieve this, a public registry of experimentally verified proteoforms was also created. Submission of new proteoforms can be processed through email via nrtdphelp@northwestern.edu, and future iterations of these proteoform atlases will help to organize and assign function to proteoforms, their PTMs and their complexes in the years ahead.
Affiliation(s)
- Michael A R Hollas
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Matthew T Robey
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Ryan T Fellers
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Richard D LeDuc
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Paul M Thomas
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Neil L Kelleher
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA

27

Muscolino A, Di Maria A, Rapicavoli RV, Alaimo S, Bellomo L, Billeci F, Borzì S, Ferragina P, Ferro A, Pulvirenti A. NETME: on-the-fly knowledge network construction from biomedical literature. Applied Network Science 2022; 7:1. [PMID: 35013714 PMCID: PMC8733431 DOI: 10.1007/s41109-021-00435-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/31/2021] [Accepted: 09/21/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND The rapidly increasing biological literature is a key resource to automatically extract and gain knowledge concerning biological elements and their relations. Knowledge Networks are helpful tools in the context of biological knowledge discovery and modeling. RESULTS We introduce a novel system called NETME, which, starting from a set of full-texts obtained from PubMed, through an easy-to-use web interface, interactively extracts biological elements from ontological databases and then synthesizes a network inferring relations among such elements. The results clearly show that our tool is capable of inferring comprehensive and reliable biological networks. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s41109-021-00435-x.
Affiliation(s)
- Antonio Di Maria
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Salvatore Alaimo
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Lorenzo Bellomo
- Department of Computer Science, University of Pisa, Pisa, Italy
- Fabrizio Billeci
- Department of Maths and Computer Science, University of Catania, Catania, Italy
- Stefano Borzì
- Department of Maths and Computer Science, University of Catania, Catania, Italy
- Paolo Ferragina
- Department of Computer Science, University of Pisa, Pisa, Italy
- Alfredo Ferro
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy

28

Manoharan S, Iyyappan OR. A Hybrid Protocol for Finding Novel Gene Targets for Various Diseases Using Microarray Expression Data Analysis and Text Mining. Methods Mol Biol 2022; 2496:41-70. [PMID: 35713858 DOI: 10.1007/978-1-0716-2305-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Advances in technology for scientific experiments have made the amount of raw data produced enormous, giving rise to various subsets of biologists working with genomes, proteomes, transcriptomes, expression data, pathways, and so on. This has led to exponential growth in the scientific literature, which is moving beyond the capacity of manual curation and annotation to extract important information. Microarray data are expression data, and their analysis yields lists of up- and downregulated genes that are functionally annotated to ascertain their biological meaning. When these genes, represented as vocabularies and/or Gene Ontology terms, are associated through pathway enrichment analysis, they need a relational and conceptual understanding with respect to a disease. This chapter describes a hybrid approach we designed for identifying novel drug-disease targets. Microarray data for muscular dystrophy are explored here as an example, and text mining approaches are used to identify promising novel drug targets. Our main objective is to give a basic overview from the perspective of a biologist for whom the text mining approaches of data mining and information retrieval are a fairly new concept. The chapter aims to bridge the gap between biologists and computational text miners and to bring them together for more informative research in a fast and time-efficient manner.
Affiliation(s)
- Sharanya Manoharan
- Department of Bioinformatics, Stella Maris College (Autonomous), Chennai, Tamilnadu, India
- Oviya Ramalakshmi Iyyappan
- Department of Sciences, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Chennai, Tamilnadu, India

29

Harris MA, Rutherford KM, Hayles J, Lock A, Bähler J, Oliver SG, Mata J, Wood V. Fission stories: using PomBase to understand Schizosaccharomyces pombe biology. Genetics 2021; 220:6481557. [PMID: 35100366 PMCID: PMC9209812 DOI: 10.1093/genetics/iyab222] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 09/03/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023]
Abstract
PomBase (www.pombase.org), the model organism database (MOD) for the fission yeast Schizosaccharomyces pombe, supports research within and beyond the S. pombe community by integrating and presenting genetic, molecular, and cell biological knowledge into intuitive displays and comprehensive data collections. With new content, novel query capabilities, and biologist-friendly data summaries and visualization, PomBase also drives innovation in the MOD community.
Affiliation(s)
- Midori A Harris
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK. Corresponding author: (M.A.H.); (V.W.)
- Kim M Rutherford
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Jacqueline Hayles
- Cell Cycle Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Antonia Lock
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Jürg Bähler
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Stephen G Oliver
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Juan Mata
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK. Corresponding author: (M.A.H.); (V.W.)

30

Babcock S, Beverley J, Cowell LG, Smith B. The Infectious Disease Ontology in the age of COVID-19. J Biomed Semantics 2021; 12:13. [PMID: 34275487 PMCID: PMC8286442 DOI: 10.1186/s13326-021-00245-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/31/2020] [Accepted: 06/21/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND Effective response to public health emergencies, such as we are now experiencing with COVID-19, requires data sharing across multiple disciplines and data systems. Ontologies offer a powerful data-sharing tool, and this holds especially for those ontologies built on the design principles of the Open Biomedical Ontologies Foundry. These principles are exemplified by the Infectious Disease Ontology (IDO), a suite of interoperable ontology modules aiming to provide coverage of all aspects of the infectious disease domain. At its center is IDO Core, a disease- and pathogen-neutral ontology covering just those types of entities and relations that are relevant to infectious diseases generally. IDO Core is extended by disease- and pathogen-specific ontology modules. RESULTS To assist the integration and analysis of COVID-19 data, and viral infectious disease data more generally, we have recently developed three new IDO extensions: IDO Virus (VIDO); the Coronavirus Infectious Disease Ontology (CIDO); and an extension of CIDO focusing on COVID-19 (IDO-COVID-19). Reflecting the fact that viruses lack cellular parts, we have introduced into IDO Core the term 'acellular structure' to cover viruses and other acellular entities studied by virologists. We now distinguish between infectious agents (organisms with an infectious disposition) and infectious structures (acellular structures with an infectious disposition). This in turn has led to various updates and refinements of IDO Core's content. We believe that our work on VIDO, CIDO, and IDO-COVID-19 can serve as a model for yielding greater conformance with ontology-building best practices. CONCLUSIONS IDO provides a simple recipe for building new pathogen-specific ontologies in a way that allows data about novel diseases to be easily compared, along multiple dimensions, with data represented by existing disease ontologies.
The IDO strategy, moreover, supports ontology coordination, providing a powerful method of data integration and sharing that allows physicians, researchers, and public health organizations to respond rapidly and efficiently to current and future public health crises.
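The distinction introduced into IDO Core can be read as a small decision rule. An entity with an infectious disposition is an infectious agent if it is an organism, and an infectious structure if it is an acellular structure. Below is a toy sketch of that rule. It uses hypothetical boolean attributes in place of IDO's actual OWL classes and axioms, which it does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Toy stand-in for an ontology instance (attributes are hypothetical)."""
    name: str
    is_organism: bool                 # cellular organism, e.g. a bacterium
    is_acellular_structure: bool      # e.g. a virion
    has_infectious_disposition: bool


def ido_category(entity: Entity) -> str:
    """Apply the agent/structure distinction described in the abstract."""
    if not entity.has_infectious_disposition:
        return "not infectious"
    if entity.is_organism:
        return "infectious agent"
    if entity.is_acellular_structure:
        return "infectious structure"
    return "unclassified"


if __name__ == "__main__":
    for e in [
        Entity("bacterium", True, False, True),
        Entity("virion", False, True, True),
        Entity("hypothetical non-infectious organism", True, False, False),
    ]:
        print(e.name, "->", ido_category(e))
```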
Affiliation(s)
- Shane Babcock
- Department of Philosophy, Niagara University, Lewiston, NY, USA
- National Center for Ontological Research, University at Buffalo, Buffalo, NY, USA
- John Beverley
- National Center for Ontological Research, University at Buffalo, Buffalo, NY, USA
- Department of Philosophy, Northwestern University, Evanston, IL, USA
- Lindsay G Cowell
- National Center for Ontological Research, University at Buffalo, Buffalo, NY, USA
- Cowell Lab, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Barry Smith
- National Center for Ontological Research, University at Buffalo, Buffalo, NY, USA
- Department of Philosophy, University at Buffalo, Buffalo, NY, USA

31

Jaiswal S, Jagannadham J, Kumari J, Iquebal MA, Gurjar AKS, Nayan V, Angadi UB, Kumar S, Kumar R, Datta TK, Rai A, Kumar D. Genome Wide Prediction, Mapping and Development of Genomic Resources of Mastitis Associated Genes in Water Buffalo. Front Vet Sci 2021; 8:593871. [PMID: 34222390 PMCID: PMC8253262 DOI: 10.3389/fvets.2021.593871] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/11/2020] [Accepted: 04/30/2021] [Indexed: 12/16/2022]
Abstract
Water buffalo (Bubalus bubalis) are an important animal resource that contributes milk, meat, leather, dairy products, and power for plowing and transport. However, mastitis, a bacterial disease affecting milk production and reproductive efficiency, is most prevalent in populations under intensive selection for higher milk yield, especially where the inbreeding level is also high. Climate change and poor hygiene management practices further complicate the issue. The management of this disease faces major challenges, such as antibiotic resistance, maximum residue levels, horizontal gene transfer, and limited success in resistance breeding. Bovine mastitis genome-wide association studies have had limited success due to breed differences, sample sizes, and minor allele frequency, which lower the power to detect disease-associated SNPs. In this work, we focused on the application of targeted gene panels (TGPs) in screening for candidate gene association analysis, and on how this approach overcomes the limitations of genome-wide association studies. This work will facilitate the targeted sequencing of buffalo genomic regions with the high depth of coverage required to mine the extremely rare variants potentially associated with buffalo mastitis. Although the whole-genome assembly of water buffalo is available, mastitis genes have not been predicted, and no TGP is available as a web-genomic resource for future variant mining and association studies. Out of the 129 mastitis-associated genes of cattle, 101 were completely mapped on the buffalo genome to make the TGP. This further helped in identifying rare variants in water buffalo. Eighty-five genes were validated in the buffalo gene expression atlas using RNA-Seq data from 50 tissues. The functions of 97 genes were predicted, revealing 225 pathways. The mastitis proteins were used for protein-protein interaction network analysis to obtain additional cross-talking proteins.
A total of 1,306 SNPs and 152 indels were identified from 101 genes. Water Buffalo-MSTdb was developed with 3-tier architecture to retrieve mastitis associated genes having genomic coordinates with chromosomal details for TGP sequencing for mining of minor alleles for further association studies. Lastly, a web-genomic resource was made available to mine variants of targeted gene panels in buffalo for mastitis resistance breeding in an endeavor to ensure improved productivity and the reproductive efficiency of water buffalo.
Affiliation(s)
- Sarika Jaiswal
- Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, New Delhi, India
- Jaisri Jagannadham
- Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, New Delhi, India
- Juli Kumari
- Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, New Delhi, India
- Mir Asif Iquebal
- Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anoop Kishor Singh Gurjar
- Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, New Delhi, India
- Varij Nayan
- Indian Council of Agricultural Research (ICAR)-Central Institute for Research on Buffaloes, Hisar, India
- Ulavappa B Angadi
- Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, New Delhi, India
- Sunil Kumar
- Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, New Delhi, India
- Rakesh Kumar
- Animal Biotechnology Centre, Indian Council of Agricultural Research (ICAR)-National Dairy Research Institute, Karnal, India
- Tirtha Kumar Datta
- Animal Biotechnology Centre, Indian Council of Agricultural Research (ICAR)-National Dairy Research Institute, Karnal, India
- Anil Rai
- Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, New Delhi, India
- Dinesh Kumar
- Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, New Delhi, India

32

Gogate N, Lyman D, Bell A, Cauley E, Crandall KA, Joseph A, Kahsay R, Natale DA, Schriml LM, Sen S, Mazumder R. COVID-19 biomarkers and their overlap with comorbidities in a disease biomarker data model. Brief Bioinform 2021; 22:6278606. [PMID: 34015823 PMCID: PMC8195003 DOI: 10.1093/bib/bbab191] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/10/2021] [Revised: 03/29/2021] [Accepted: 04/26/2021] [Indexed: 12/23/2022]
Abstract
In response to the COVID-19 outbreak, scientists and medical researchers are capturing a wide range of host responses, symptoms and lingering post-recovery problems within the human population. These variable clinical manifestations suggest differences in influential factors, such as innate and adaptive host immunity, existing or underlying health conditions, comorbidities, genetics and other factors, compounding the complexity of COVID-19 pathobiology and potential biomarkers associated with the disease, as they become available. The heterogeneous data pose challenges for efficient extrapolation of information into clinical applications. We have curated 145 COVID-19 biomarkers by developing a novel cross-cutting disease biomarker data model that allows integration and evaluation of biomarkers in patients with comorbidities. Most biomarkers are related to the immune (SAA, TNF-α and IP-10) or coagulation (D-dimer, antithrombin and VWF) cascades, suggesting complex vascular pathobiology of the disease. Furthermore, we observe commonality with established cancer biomarkers (ACE2, IL-6, IL-4 and IL-2) as well as biomarkers for metabolic syndrome and diabetes (CRP, NLR and LDL). We explore these trends as we put forth a COVID-19 biomarker resource (https://data.oncomx.org/covid19) that will help researchers and diagnosticians alike.
Affiliation(s)
- Nikhita Gogate
- George Washington University School of Medicine and Health Sciences, Washington, DC 20037, USA
- Daniel Lyman
- George Washington University School of Medicine and Health Sciences, Department of Biochemistry and Molecular Medicine, Washington, DC 20037, USA
- Amanda Bell
- George Washington University School of Medicine and Health Sciences, Washington, DC 20037, USA
- Edmund Cauley
- George Washington University School of Medicine and Health Sciences, Washington, DC 20037, USA
- Keith A Crandall
- Computational Biology Institute at The George Washington University, Washington, DC 20037, USA
- Ashia Joseph
- George Washington University, Washington, DC 20037, USA
- Robel Kahsay
- George Washington University School of Medicine and Health Sciences, Department of Biochemistry and Molecular Medicine, Washington, DC 20037, USA
- Darren A Natale
- Georgetown University Medical Center, Washington, DC 20037, USA
- Lynn M Schriml
- University of Maryland, School of Medicine in Baltimore, MD, USA
- Sabyasach Sen
- George Washington University School of Medicine and Health Sciences, Washington, DC 20037, USA
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, USA

33

Galgonek J, Vondrášek J. IDSM ChemWebRDF: SPARQLing small-molecule datasets. J Cheminform 2021; 13:38. [PMID: 33980298 PMCID: PMC8117646 DOI: 10.1186/s13321-021-00515-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/06/2021] [Accepted: 04/23/2021] [Indexed: 11/12/2022]
Abstract
The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF and support SPARQL querying. In our project, we primarily focused on the PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house SPARQL engine that uses the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluation of SPARQL queries. The endpoint provides not only querying of the datasets but also compound substructure and similarity search supported by our Sachem project.
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
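The abstract describes an engine that keeps RDF in relational tables and translates SPARQL into SQL. As a rough illustration of that idea only, the hypothetical `bgp_to_sql` function below turns a SPARQL basic graph pattern (a list of triple patterns) into one SQL query over an assumed generic `triples(s, p, o)` table: every repeated variable becomes an equi-join and every constant becomes a filter. IDSM uses its own relational schema and its own query optimiser, neither of which this sketch reproduces; the table name and the `ex:` terms are assumptions.

```python
"""Minimal sketch: SPARQL basic graph pattern -> SQL over a generic triples table.

Assumption: RDF is stored in a single table triples(s, p, o). This is NOT the
IDSM schema; it only shows how variables become joins and constants filters.
"""
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def bgp_to_sql(patterns: List[Triple], table: str = "triples") -> str:
    """Translate triple patterns (variables start with '?') into one SQL query."""
    first_binding: Dict[str, str] = {}  # variable -> column that first binds it
    conditions: List[str] = []          # constant filters and join conditions
    from_items: List[str] = []
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        from_items.append(f"{table} AS {alias}")
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                var = term[1:]
                if var in first_binding:
                    # A variable seen before becomes an equi-join condition.
                    conditions.append(f"{ref} = {first_binding[var]}")
                else:
                    first_binding[var] = ref
            else:
                # A constant becomes a filter; quotes are escaped naively for the sketch.
                escaped = term.replace("'", "''")
                conditions.append(f"{ref} = '{escaped}'")
    # With no variables, select a constant so the SQL stays valid (ASK-like).
    select = ", ".join(f"{ref} AS {var}" for var, ref in first_binding.items()) or "1"
    sql = f"SELECT {select} FROM {', '.join(from_items)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    print(bgp_to_sql([("?c", "rdf:type", "ex:Compound"), ("?c", "ex:label", "?name")]))
```

Joining one table alias per triple pattern is the simplest possible translation; a production engine would pick join orders, use indexes and a schema tailored to the data, which is where the optimisations mentioned in the abstract come in.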
Affiliation(s)
- Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
- Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
34
Good BM, Van Auken K, Hill DP, Mi H, Carbon S, Balhoff JP, Albou LP, Thomas PD, Mungall CJ, Blake JA, D'Eustachio P. Reactome and the Gene Ontology: Digital convergence of data resources. Bioinformatics 2021; 37:3343-3348. [PMID: 33964129 PMCID: PMC8504636 DOI: 10.1093/bioinformatics/btab325] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/23/2020] [Revised: 03/18/2021] [Accepted: 04/27/2021] [Indexed: 12/22/2022]
Abstract
Motivation: Gene Ontology Causal Activity Models (GO-CAMs) assemble individual associations of gene products with cellular components, molecular functions and biological processes into causally linked activity flow models. Pathway databases such as the Reactome Knowledgebase create detailed molecular process descriptions of reactions and assemble them into pathway descriptions based on the entities that individual reactions share. Results: To convert the rich content of Reactome into GO-CAMs, we have developed a software tool, Pathways2GO, that converts the entire set of normal human Reactome pathways into GO-CAMs. This conversion yields standard GO annotations from Reactome content and supports enhanced quality control for both Reactome and GO, yielding a nearly seamless conversion between these two resources for the bioinformatics community. Supplementary information: Supplementary data are available at Bioinformatics online.
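The Motivation paragraph notes that pathway databases chain individual reactions into pathways through the entities those reactions share. A minimal sketch of that chaining step, assuming each reaction is reduced to sets of input and output entity identifiers: reaction A is linked upstream of reaction B whenever an output of A is an input of B. The `Reaction` class and `link_reactions` function are hypothetical; this is not the Pathways2GO algorithm, which must additionally map Reactome content onto GO terms and GO-CAM relations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Reaction:
    """Hypothetical reaction record: identifier plus input and output entity IDs."""
    rid: str
    inputs: Set[str] = field(default_factory=set)
    outputs: Set[str] = field(default_factory=set)


def link_reactions(reactions: List[Reaction]) -> List[Tuple[str, str, str]]:
    """Return sorted (upstream, downstream, shared_entity) edges where an output feeds an input."""
    consumers: Dict[str, List[str]] = {}
    for reaction in reactions:
        for entity in reaction.inputs:
            consumers.setdefault(entity, []).append(reaction.rid)
    edges = []
    for reaction in reactions:
        for entity in reaction.outputs:
            for downstream in consumers.get(entity, []):
                if downstream != reaction.rid:  # ignore self-loops
                    edges.append((reaction.rid, downstream, entity))
    return sorted(edges)


if __name__ == "__main__":
    # Toy example: the first three steps of glycolysis, with abbreviated entity names.
    steps = [
        Reaction("R1", {"glucose", "ATP"}, {"G6P", "ADP"}),
        Reaction("R2", {"G6P"}, {"F6P"}),
        Reaction("R3", {"F6P", "ATP"}, {"F16BP", "ADP"}),
    ]
    for edge in link_reactions(steps):
        print(edge)
```

In real pathway data, ubiquitous small molecules such as ATP or water would normally have to be excluded from linking, otherwise they connect reactions that are not otherwise related; in this toy example ATP is only consumed, so it creates no edge.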
Affiliation(s)
- Benjamin M Good
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley CA 94720 USA
- Kimberly Van Auken
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena CA 91125 USA
- Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles CA 90033 USA
- Seth Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley CA 94720 USA
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517 USA
- Laurent-Philippe Albou
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles CA 90033 USA
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles CA 90033 USA
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley CA 94720 USA
- Peter D'Eustachio
- Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York NY 10016 USA
35
Roth YD, Lian Z, Pochiraju S, Shaikh B, Karr JR. Datanator: an integrated database of molecular data for quantitatively modeling cellular behavior. Nucleic Acids Res 2021; 49:D516-D522. [PMID: 33174603 PMCID: PMC7779073 DOI: 10.1093/nar/gkaa1008] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/06/2020] [Revised: 10/12/2020] [Accepted: 10/21/2020] [Indexed: 12/23/2022]
Abstract
Integrative research about multiple biochemical subsystems has significant potential to help advance biology, bioengineering and medicine. However, it is difficult to obtain the diverse data needed for integrative research. To facilitate biochemical research, we developed Datanator (https://datanator.info), an integrated database and set of tools for finding clouds of multiple types of molecular data about specific molecules and reactions in specific organisms and environments, as well as data about chemically-similar molecules and reactions in phylogenetically-similar organisms in similar environments. Currently, Datanator includes metabolite concentrations, RNA modifications and half-lives, protein abundances and modifications, and reaction rate constants about a broad range of organisms. Going forward, we aim to launch a community initiative to curate additional data. Datanator also provides tools for filtering, visualizing and exporting these data clouds. We believe that Datanator can facilitate a wide range of research from integrative mechanistic models, such as whole-cell models, to comparative data-driven analyses of multiple organisms.
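The abstract mentions retrieving data about chemically similar molecules. Datanator's actual similarity measure is not stated here, so the sketch below assumes a common choice, the Tanimoto (Jaccard) coefficient over sets of structural features, and filters a small library by a similarity threshold. The feature strings, molecule names and the 0.5 threshold are illustrative assumptions, not Datanator data or settings.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) coefficient of two feature sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 1.0  # treat two empty fingerprints as identical
    return len(a & b) / len(union)


def similar_molecules(
    query: FrozenSet[str],
    library: Dict[str, FrozenSet[str]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best match first."""
    hits = [(name, tanimoto(query, features)) for name, features in library.items()]
    kept = [hit for hit in hits if hit[1] >= threshold]
    return sorted(kept, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical feature sets standing in for real molecular fingerprints.
    library = {
        "mol_A": frozenset({"C=O", "OH", "ring6"}),
        "mol_B": frozenset({"C=O", "OH"}),
        "mol_C": frozenset({"NH2"}),
    }
    print(similar_molecules(frozenset({"C=O", "OH", "ring6"}), library))
```

In practice the feature sets would come from a cheminformatics toolkit's fingerprints; the set-based version keeps the sketch dependency-free.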
Affiliation(s)
- Yosef D Roth
- Icahn Institute for Data Science and Genomic Technology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1255 5th Avenue, Suite C2, New York, NY 10029, USA
- Zhouyang Lian
- Icahn Institute for Data Science and Genomic Technology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1255 5th Avenue, Suite C2, New York, NY 10029, USA
- Saahith Pochiraju
- Icahn Institute for Data Science and Genomic Technology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1255 5th Avenue, Suite C2, New York, NY 10029, USA
- Bilal Shaikh
- Icahn Institute for Data Science and Genomic Technology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1255 5th Avenue, Suite C2, New York, NY 10029, USA
- Jonathan R Karr
- Icahn Institute for Data Science and Genomic Technology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1255 5th Avenue, Suite C2, New York, NY 10029, USA
36
Henry V, Moszer I, Dameron O, Vila Xicota L, Dubois B, Potier MC, Hofmann-Apitius M, Colliot O. Converting disease maps into heavyweight ontologies: general methodology and application to Alzheimer's disease. Database (Oxford) 2021; 2021:6137817. [PMID: 33590873 DOI: 10.1093/database/baab004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2020] [Revised: 01/17/2021] [Accepted: 01/27/2021] [Indexed: 11/12/2022]
Abstract
Omics technologies offer great promise for improving our understanding of diseases. The integration and interpretation of such data pose major challenges, calling for adequate knowledge models. Disease maps provide curated knowledge about disorders' pathophysiology at the molecular level, adapted to omics measurements. However, the expressiveness of disease maps could be increased to help avoid ambiguities and misinterpretations and to reinforce their interoperability with other knowledge resources. Ontologies are an adequate framework to overcome this limitation, through their axiomatic definitions and logical reasoning properties. We introduce the Disease Map Ontology (DMO), an ontological upper model based on systems biology terms. We then propose to apply DMO to Alzheimer's disease (AD). Specifically, we use it to drive the conversion of AlzPathway, a disease map devoted to AD, into a formal ontology: Alzheimer DMO. We demonstrate that it allows one to deal with issues related to redundancy, naming, consistency, process classification and pathway relationships. Furthermore, we show that it can store and manage multi-omics data. Finally, we expand the model using elements from other resources, such as clinical features contained in the AD Ontology, resulting in an enriched model called ADMO-plus. The current versions of DMO, ADMO and ADMO-plus are freely available at http://bioportal.bioontology.org/ontologies/ADMO.
Affiliation(s)
- Vincent Henry
- Inria Paris, Aramis Project-Team, Paris 75013, France
- Institut du Cerveau et de la Moelle épinière, ICM, Paris 75013, France
- Inserm, U 1127, Paris 75013, France
- CNRS, UMR 7225, Paris 75013, France
- Sorbonne Université, Paris 75013, France
- ICONICS Core Facility, Paris Brain Institute, Paris 75013, France
- Ivan Moszer
- Institut du Cerveau et de la Moelle épinière, ICM, Paris 75013, France
- Inserm, U 1127, Paris 75013, France
- CNRS, UMR 7225, Paris 75013, France
- Sorbonne Université, Paris 75013, France
- ICONICS Core Facility, Paris Brain Institute, Paris 75013, France
- Olivier Dameron
- Univ Rennes, CNRS, Inria, IRISA-UMR 6074, Rennes 35000, France
- Laura Vila Xicota
- Institut du Cerveau et de la Moelle épinière, ICM, Paris 75013, France
- Inserm, U 1127, Paris 75013, France
- CNRS, UMR 7225, Paris 75013, France
- Sorbonne Université, Paris 75013, France
- Alzheimer's and Prion Diseases Team, Paris Brain Institute, Paris 75013, France
- Bruno Dubois
- Institut du Cerveau et de la Moelle épinière, ICM, Paris 75013, France
- Inserm, U 1127, Paris 75013, France
- CNRS, UMR 7225, Paris 75013, France
- Sorbonne Université, Paris 75013, France
- AP-HP, Hôpital de la Pitié-Salpêtrière, Department of Neurology, Institut de la Mémoire et de la Maladie d'Alzheimer (IM2A), Paris 75013, France
- Marie-Claude Potier
- Institut du Cerveau et de la Moelle épinière, ICM, Paris 75013, France
- Inserm, U 1127, Paris 75013, France
- CNRS, UMR 7225, Paris 75013, France
- Sorbonne Université, Paris 75013, France
- Alzheimer's and Prion Diseases Team, Paris Brain Institute, Paris 75013, France
- Olivier Colliot
- Inria Paris, Aramis Project-Team, Paris 75013, France
- Institut du Cerveau et de la Moelle épinière, ICM, Paris 75013, France
- Inserm, U 1127, Paris 75013, France
- CNRS, UMR 7225, Paris 75013, France
- Sorbonne Université, Paris 75013, France
37
Ong E, Wang LL, Schaub J, O'Toole JF, Steck B, Rosenberg AZ, Dowd F, Hansen J, Barisoni L, Jain S, de Boer IH, Valerius MT, Waikar SS, Park C, Crawford DC, Alexandrov T, Anderton CR, Stoeckert C, Weng C, Diehl AD, Mungall CJ, Haendel M, Robinson PN, Himmelfarb J, Iyengar R, Kretzler M, Mooney S, He Y. Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project. Nat Rev Nephrol 2020; 16:686-696. [PMID: 32939051 PMCID: PMC8012202 DOI: 10.1038/s41581-020-00335-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Accepted: 07/24/2020] [Indexed: 12/29/2022]
Abstract
An important need exists to better understand and stratify kidney disease according to its underlying pathophysiology in order to develop more precise and effective therapeutic agents. National collaborative efforts such as the Kidney Precision Medicine Project are working towards this goal through the collection and integration of large, disparate clinical, biological and imaging data from patients with kidney disease. Ontologies are powerful tools that facilitate these efforts by enabling researchers to organize and make sense of different data elements and the relationships between them. Ontologies are critical to support the types of big data analysis necessary for kidney precision medicine, where heterogeneous clinical, imaging and biopsy data from diverse sources must be combined to define a patient's phenotype. The development of two new ontologies - the Kidney Tissue Atlas Ontology and the Ontology of Precision Medicine and Investigation - will support the creation of the Kidney Tissue Atlas, which aims to provide a comprehensive molecular, cellular and anatomical map of the kidney. These ontologies will improve the annotation of kidney-relevant data, and eventually lead to new definitions of kidney disease in support of precision medicine.
Affiliation(s)
- Edison Ong
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Lucy L Wang
- Allen Institute for Artificial Intelligence, Seattle, WA, USA
- Jennifer Schaub
- Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- John F O'Toole
- Department of Nephrology and Hypertension, Glickman Urological and Kidney Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Becky Steck
- Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Avi Z Rosenberg
- Department of Pathology, Johns Hopkins University, Baltimore, MD, USA
- Frederick Dowd
- UW Medicine Research IT, University of Washington, Seattle, WA, USA
- Jens Hansen
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Systems Biomedicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Laura Barisoni
- Division of AI/Computational Pathology, Department of Pathology, and Division of Nephrology, Department of Medicine, Duke University, Durham, NC, USA
- Sanjay Jain
- Division of Nephrology, School of Medicine, Washington University in St. Louis, St Louis, MO, USA
- Ian H de Boer
- Division of Nephrology, Department of Medicine, University of Washington, Seattle, WA, USA
- M Todd Valerius
- Division of Renal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Sushrut S Waikar
- Section of Nephrology, Boston University Medical Center, Boston, MA, USA
- Christopher Park
- Kidney Research Institute, University of Washington, Seattle, WA, USA
- Dana C Crawford
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Cleveland Institute for Computational Biology, Cleveland, OH, USA
- Theodore Alexandrov
- Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Christian Stoeckert
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Alexander D Diehl
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Melissa Haendel
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR, USA
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jonathan Himmelfarb
- Division of Nephrology, Department of Medicine, University of Washington, Seattle, WA, USA
- Kidney Research Institute, University of Washington, Seattle, WA, USA
- Ravi Iyengar
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Systems Biomedicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Matthias Kretzler
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Sean Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Yongqun He
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI, USA
38
Protein ontology on the semantic web for knowledge discovery. Sci Data 2020; 7:337. [PMID: 33046717 PMCID: PMC7550340 DOI: 10.1038/s41597-020-00679-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/20/2020] [Accepted: 09/17/2020] [Indexed: 11/26/2022]
Abstract
The Protein Ontology (PRO) provides an ontological representation of protein-related entities, ranging from protein families to proteoforms to complexes. Protein Ontology Linked Open Data (LOD) exposes, shares, and connects knowledge about protein-related entities on the Semantic Web using the Resource Description Framework (RDF), thus enabling integration with other Linked Open Data for biological knowledge discovery. For example, proteins (or variants thereof) can be retrieved on the basis of specific disease associations. As a community resource, we strive to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles, disseminate regular updates of our data, support multiple methods for accessing, querying and downloading data in various formats, and provide documentation for both scientists and programmers. PRO Linked Open Data can be browsed via a faceted browser interface and queried using SPARQL via YASGUI. RDF data dumps are also available for download. Additionally, we developed RESTful APIs to support programmatic data access. We also provide metadata descriptions of our data that comply with the W3C HCLS specification. The PRO Linked Open Data is available at https://lod.proconsortium.org/.
39
Gogate N, Lyman D, Crandall K, Kahsay R, Natale D, Sen S, Mazumder R. COVID-19 Biomarkers in research: Extension of the OncoMX cancer biomarker data model to capture biomarker data from other diseases. bioRxiv 2020:2020.09.09.196220. [PMID: 32935101 PMCID: PMC7491515 DOI: 10.1101/2020.09.09.196220] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/23/2022]
Abstract
Scientists, medical researchers, and health care workers have mobilized worldwide in response to the outbreak of COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; SCoV2). Preliminary data have captured a wide range of host responses, symptoms, and lingering problems post-recovery within the human population. These variable clinical manifestations suggest differences in influential factors, such as innate and adaptive host immunity, existing or underlying health conditions, co-morbidities, genetics, and other factors. As COVID-19-related data continue to accumulate from disparate groups, the heterogeneous nature of these datasets poses challenges for efficient extrapolation of meaningful observations, hindering translation of information into clinical applications. Attempts to utilize, analyze, or combine biomarker datasets from multiple sources have proven inefficient and complicated without a unifying resource. As such, there is an urgent need within the research community for the rapid development of an integrated and harmonized COVID-19 Biomarker Knowledgebase. By leveraging data collection and integration methods, backed by a robust data model developed to capture cancer biomarker data, we have rapidly crowdsourced the collection and harmonization of COVID-19 biomarkers. Our resource currently has 138 unique biomarkers. During our extensive cross-validation and manual curation, we found multiple instances of the same biomarker substance being suggested as multiple biomarker types. As a result, our Knowledgebase currently has 265 biomarker type combinations. Every biomarker entry is made comprehensive by bringing together ancillary data from multiple sources, such as biomarker accessions (canonical UniProtKB accession, PubChem Compound ID, Cell Ontology ID, Protein Ontology ID, NCI Thesaurus Code, and Disease Ontology ID), BEST biomarker category, and specimen type (Uberon Anatomy Ontology), unified with ontology standards.
Our preliminary observations show distinct trends in the collated biomarkers. Most biomarkers are related to the immune system (SAA, TNF-α, and IP-10) or to coagulopathies (D-dimer, antithrombin, and VWF), and a few have already been established as cancer biomarkers (ACE2, IL-6, IL-4 and IL-2). These trends align with proposed hypotheses of clinical manifestations compounding the complexity of COVID-19 pathobiology. We explore these trends as we put forth a COVID-19 biomarker resource that will help researchers and diagnosticians alike. All biomarker data are freely available from https://data.oncomx.org/covid19.
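The abstract distinguishes unique biomarkers (138) from biomarker type combinations (265), because the same substance can be suggested as more than one biomarker type. The sketch below shows how those two kinds of counts can be derived from curated (biomarker, type) records. The records are toy examples, not entries from the knowledgebase, and the case and whitespace normalisation is an assumption about how duplicate records might be merged.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def summarize_biomarkers(records: Iterable[Tuple[str, str]]) -> Tuple[int, int, List[str]]:
    """Count unique biomarkers, unique (biomarker, type) combinations, and multi-type biomarkers."""
    types_by_marker: Dict[str, Set[str]] = defaultdict(set)
    for marker, biomarker_type in records:
        # Assumed normalisation: ignore case and surrounding whitespace.
        types_by_marker[marker.strip().upper()].add(biomarker_type.strip().lower())
    n_unique = len(types_by_marker)
    n_combinations = sum(len(types) for types in types_by_marker.values())
    multi_type = sorted(m for m, types in types_by_marker.items() if len(types) > 1)
    return n_unique, n_combinations, multi_type


if __name__ == "__main__":
    # Toy records using BEST category names; not taken from the knowledgebase.
    toy_records = [
        ("IL-6", "prognostic"),
        ("IL-6", "diagnostic"),
        ("D-dimer", "prognostic"),
        ("il-6 ", "Prognostic"),  # duplicate of the first record after normalisation
    ]
    print(summarize_biomarkers(toy_records))
```

Keeping a set of types per biomarker makes the distinction explicit: the number of keys is the count of unique biomarkers, while the total size of the sets is the count of biomarker type combinations.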
Affiliation(s)
- N Gogate
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037
- D Lyman
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037
- K.A. Crandall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, D.C., USA
- R Kahsay
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037
- D.A. Natale
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA
- S Sen
- Division of Endocrinology, Department of Medicine, The George Washington University, Washington, DC, USA
- R Mazumder
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037
- The McCormick Genomic and Proteomic Center, The George Washington University, Washington, DC 20037, United States of America
40
Lang PF, Chebaro Y, Zheng X, Sekar JAP, Shaikh B, Natale DA, Karr JR. BpForms and BcForms: a toolkit for concretely describing non-canonical polymers and complexes to facilitate global biochemical networks. Genome Biol 2020; 21:117. [PMID: 32423472 PMCID: PMC7236495 DOI: 10.1186/s13059-020-02025-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/15/2019] [Accepted: 04/16/2020] [Indexed: 12/12/2022]
Abstract
Non-canonical residues, caps, crosslinks, and nicks are important to many functions of DNAs, RNAs, proteins, and complexes. However, we do not fully understand how networks of such non-canonical macromolecules generate behavior. One barrier is our limited formats for describing macromolecules. To overcome this barrier, we develop BpForms and BcForms, a toolkit for representing the primary structure of macromolecules as combinations of residues, caps, crosslinks, and nicks. The toolkit can help omics researchers perform quality control and exchange information about macromolecules, help systems biologists assemble global models of cells that encompass processes such as post-translational modification, and help bioengineers design cells.
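To make the four concepts in this abstract concrete, the sketch below models a linear polymer as a residue sequence with optional caps, crosslinks between residue positions, and nicks in the backbone, then counts how many covalently connected molecules remain once nicks and crosslinks are taken into account. The `Polymer` class is hypothetical; it is not the API or grammar of the BpForms/BcForms toolkit, and the one-letter residue codes are arbitrary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Polymer:
    """Hypothetical model of a linear polymer's primary structure (not the BpForms API)."""
    residues: List[str]
    left_cap: Optional[str] = None   # e.g. a 5' or N-terminal cap; stored but not analysed
    right_cap: Optional[str] = None  # e.g. a 3' or C-terminal cap; stored but not analysed
    crosslinks: List[Tuple[int, int]] = field(default_factory=list)  # 0-based residue pairs
    nicks: List[int] = field(default_factory=list)  # nick k breaks the bond between k-1 and k

    def validate(self) -> None:
        n = len(self.residues)
        for i, j in self.crosslinks:
            if not (0 <= i < n and 0 <= j < n) or i == j:
                raise ValueError(f"invalid crosslink ({i}, {j})")
        for k in self.nicks:
            if not 0 < k < n:
                raise ValueError(f"invalid nick position {k}")

    def segments(self) -> List[List[str]]:
        """Backbone segments left after cutting at every nick."""
        self.validate()
        cuts = [0] + sorted(set(self.nicks)) + [len(self.residues)]
        return [self.residues[a:b] for a, b in zip(cuts, cuts[1:])]

    def n_molecules(self) -> int:
        """Count covalently connected pieces: intact backbone bonds plus crosslinks (union-find)."""
        self.validate()
        parent = list(range(len(self.residues)))

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        def union(a: int, b: int) -> None:
            parent[find(a)] = find(b)

        nicked = set(self.nicks)
        for k in range(1, len(self.residues)):
            if k not in nicked:
                union(k - 1, k)  # intact backbone bond
        for i, j in self.crosslinks:
            union(i, j)
        return len({find(x) for x in range(len(self.residues))})


if __name__ == "__main__":
    chain = Polymer(list("ACDEFG"), nicks=[3])
    print(chain.segments(), chain.n_molecules())  # nick splits the chain in two
    chain.crosslinks.append((1, 4))               # a crosslink bridges the nick
    print(chain.n_molecules())
```

The example shows why an explicit representation helps with quality control: a nick alone splits the sequence into two molecules, but a crosslink spanning the nick keeps them as one covalent entity.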
Affiliation(s)
- Paul F Lang
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK
- Yassmine Chebaro
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Université de Strasbourg, Illkirch, 67404, France
- Xiaoyue Zheng
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- John A P Sekar
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Bilal Shaikh
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, 20007, USA
- Jonathan R Karr
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
41
Xu H, Liu Y, Li Y, Diao L, Xun Z, Zhang Y, Wang Z, Li D. RadAtlas 1.0: a knowledgebase focusing on radiation-associated genes. Int J Radiat Biol 2020; 96:980-987. [PMID: 32338561 DOI: 10.1080/09553002.2020.1761567] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023]
Abstract
Purpose: Ionizing radiation has very complex biological effects, such as inducing damage to DNA and proteins, ionizing water molecules to produce toxic free radicals, and triggering genetic and somatic effects. Understanding the biomolecular response mechanism of radiation is very important for the prevention and treatment of radiation diseases. However, functional information about radiation-associated genes is scattered across numerous scientific papers and databases, making it difficult to understand the response mechanism of ionizing radiation. Materials and methods: We collected radiation-associated genes by literature and database mining, performed on biomedical literature from PubMed and on gene expression datasets from GEO, respectively. Results: We built an ionizing radiation-related knowledgebase, RadAtlas 1.0 (http://biokb.ncpsb.org/radatlas), which contains 598 radiation-associated genes compiled from literature mining and 611 potential radiation-associated genes collected from gene expression datasets by differential gene expression analysis. We also provide a user-friendly web interface that offers multiple search methods. Conclusions: RadAtlas collects a large amount of information about genes, biological processes, and pathways related to ionizing radiation. It is the first attempt to provide a comprehensive catalog of radiation-associated genes with literature evidence and of potential radiation-associated genes with differential expression evidence. We believe that RadAtlas will be a helpful tool for understanding the response mechanism to ionizing radiation.
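The abstract states that potential radiation-associated genes were identified from GEO datasets by differential gene expression analysis, but does not give the criteria. As an assumption-laden sketch, the function below flags genes whose mean expression differs between control and irradiated samples by at least a log2 fold-change threshold and whose Welch t statistic exceeds a cut-off. The thresholds, pseudocount, gene names and values are all illustrative; a real analysis would use a dedicated differential expression method with multiple-testing correction.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_t(control: Sequence[float], treated: Sequence[float]) -> float:
    """Welch's t statistic for treated vs control (needs at least 2 samples per group)."""
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    if se == 0:
        # No within-group spread: infinite evidence if the means differ, none otherwise.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def candidate_genes(
    control: Dict[str, Sequence[float]],
    irradiated: Dict[str, Sequence[float]],
    min_abs_log2fc: float = 1.0,   # assumed threshold
    min_abs_t: float = 3.0,        # assumed threshold
    pseudocount: float = 1.0,      # avoids division by zero for unexpressed genes
) -> List[Tuple[str, float, float]]:
    """Return (gene, log2 fold change, t) for genes passing both thresholds."""
    hits = []
    for gene in sorted(control.keys() & irradiated.keys()):
        c, r = control[gene], irradiated[gene]
        log2fc = math.log2((mean(r) + pseudocount) / (mean(c) + pseudocount))
        t = welch_t(c, r)
        if abs(log2fc) >= min_abs_log2fc and abs(t) >= min_abs_t:
            hits.append((gene, log2fc, t))
    return hits


if __name__ == "__main__":
    # Toy expression values for two hypothetical genes (three replicates each).
    control = {"GENE_A": [10, 12, 11], "GENE_B": [100, 102, 98]}
    irradiated = {"GENE_A": [40, 44, 42], "GENE_B": [101, 99, 103]}
    print(candidate_genes(control, irradiated))
```

Combining an effect-size filter (fold change) with a variability-aware statistic (Welch's t) is a common minimal screen; it is shown here only to make the idea of "differential expression evidence" concrete.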
Affiliation(s)
- Hao Xu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yuan Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yang Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Lihong Diao
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Ziyu Xun
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Yuqi Zhang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
- Zhidong Wang
- Department of Radiobiology, Beijing Key Laboratory for Radiobiology, Beijing Institute of Radiation Medicine, Beijing, China
- Dong Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing, China
42
Tripodi IJ, Callahan TJ, Westfall JT, Meitzer NS, Dowell RD, Hunter LE. Applying knowledge-driven mechanistic inference to toxicogenomics. Toxicol In Vitro 2020; 66:104877. [PMID: 32387679 DOI: 10.1016/j.tiv.2020.104877] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/20/2020] [Revised: 04/13/2020] [Accepted: 04/23/2020] [Indexed: 02/07/2023]
Abstract
When considering toxic chemicals in the environment, a mechanistic, causal explanation of toxicity may be preferred over a statistical or machine learning-based prediction by itself. Elucidating a mechanism of toxicity is, however, a costly and time-consuming process that requires the participation of specialists from a variety of fields, often relying on animal models. We present an innovative mechanistic inference framework (MechSpy), which can be used as a hypothesis generation aid to narrow the scope of mechanistic toxicology analysis. MechSpy generates hypotheses of the most likely mechanisms of toxicity, by combining a semantically-interconnected knowledge representation of human biology, toxicology and biochemistry with gene expression time series on human tissue. Using vector representations of biological entities, MechSpy seeks enrichment in a manually curated list of high-level mechanisms of toxicity, represented as biochemically- and causally-linked ontology concepts. Besides predicting the canonical mechanism of toxicity for many well-studied compounds, we experimentally validated some of our predictions for other chemicals without an established mechanism of toxicity. This mechanistic inference framework is an advantageous tool for predictive toxicology, and the first of its kind to produce a mechanistic explanation for each prediction. MechSpy can be modified to include additional mechanisms of toxicity, and is generalizable to other types of mechanisms of human biology.
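MechSpy looks for enrichment in a curated list of toxicity mechanisms using vector representations of biological entities. As a deliberately simpler stand-in for that step (a different, swapped-in technique, not MechSpy's method), the sketch below ranks hypothetical mechanisms by a one-sided hypergeometric over-representation test of expression-responsive genes among the genes annotated to each mechanism. Gene sets, mechanism names and the universe size are illustrative assumptions.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("invalid hypergeometric parameters")
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def rank_mechanisms(
    responsive: Set[str],
    mechanisms: Dict[str, Set[str]],
    universe_size: int,
) -> List[Tuple[str, int, float]]:
    """Rank mechanisms by over-representation p-value, smallest first."""
    results = []
    for name, genes in mechanisms.items():
        overlap = len(responsive & genes)
        p_value = hypergeom_sf(overlap, universe_size, len(genes), len(responsive))
        results.append((name, overlap, p_value))
    return sorted(results, key=lambda r: (r[2], r[0]))


if __name__ == "__main__":
    # Hypothetical responsive genes and mechanism gene sets in a universe of 10 genes.
    responsive = {"g1", "g2", "g3"}
    mechanisms = {
        "oxidative_stress": {"g1", "g2", "g3"},
        "dna_damage": {"g4", "g5", "g6"},
    }
    for row in rank_mechanisms(responsive, mechanisms, universe_size=10):
        print(row)
```

A set-overlap test like this ignores the semantic links between ontology concepts that the abstract describes; it only shows the general shape of scoring a fixed list of candidate mechanisms against one expression profile.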
Collapse
Affiliation(s)
- Ignacio J Tripodi
- University of Colorado, Computer Science / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA.
- Tiffany J Callahan
- University of Colorado Anschutz Medical Campus, Computational Bioscience, Denver, CO 80045, USA
- Jessica T Westfall
- University of Colorado, Molecular, Cellular and Developmental Biology, Boulder, CO 80309, USA
- Robin D Dowell
- University of Colorado, Molecular, Cellular and Developmental Biology / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA
- Lawrence E Hunter
- University of Colorado Anschutz Medical Campus, Computational Bioscience / Interdisciplinary Quantitative Biology, Denver, CO 80045, USA
43
Smith LM, Thomas PM, Shortreed MR, Schaffer LV, Fellers RT, LeDuc RD, Tucholski T, Ge Y, Agar JN, Anderson LC, Chamot-Rooke J, Gault J, Loo JA, Paša-Tolić L, Robinson CV, Schlüter H, Tsybin YO, Vilaseca M, Vizcaíno JA, Danis PO, Kelleher NL. A five-level classification system for proteoform identifications. Nat Methods 2019; 16:939-940. [PMID: 31451767 DOI: 10.1038/s41592-019-0573-x] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Indexed: 02/06/2023]
Affiliation(s)
- Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA.
- Paul M Thomas
- Department of Chemistry and Molecular Biosciences, Northwestern University, Evanston, IL, USA; National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL, USA
- Leah V Schaffer
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Ryan T Fellers
- Department of Chemistry and Molecular Biosciences, Northwestern University, Evanston, IL, USA; National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL, USA
- Richard D LeDuc
- Department of Chemistry and Molecular Biosciences, Northwestern University, Evanston, IL, USA; National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL, USA
- Trisha Tucholski
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Ying Ge
- Department of Cell and Regenerative Biology and Human Proteomics Program, University of Wisconsin-Madison, Madison, WI, USA
- Jeffrey N Agar
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA, USA
- Lissa C Anderson
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Tallahassee, FL, USA
- Joseph Gault
- Department of Chemistry, University of Oxford, Oxford, UK
- Joseph A Loo
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, USA
- Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory and Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Paul O Danis
- Consortium for Top Down Proteomics, Cambridge, MA, USA
- Neil L Kelleher
- Department of Chemistry and Molecular Biosciences, Northwestern University, Evanston, IL, USA; National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL, USA
44
Gavali S, Cowart J, Chen C, Ross KE, Arighi C, Wu CH. RESTful API for iPTMnet: a resource for protein post-translational modification network discovery. Database (Oxford) 2020; 2020:5829784. [PMID: 32395768 PMCID: PMC7216315 DOI: 10.1093/database/baz157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2019] [Revised: 12/09/2019] [Accepted: 12/23/2019] [Indexed: 11/12/2022]
Abstract
iPTMnet is a bioinformatics resource that integrates protein post-translational modification (PTM) data from text mining and curated databases and ontologies to aid in knowledge discovery and scientific study. The current iPTMnet website can be used for querying and browsing rich PTM information but does not support automated iPTMnet data integration with other tools. Hence, we have developed a RESTful API utilizing the latest developments in cloud technologies to facilitate the integration of iPTMnet into existing tools and pipelines. We have packaged iPTMnet API software in Docker containers and published it on DockerHub for easy redistribution. We have also developed Python and R packages that allow users to integrate iPTMnet for scientific discovery, as demonstrated in a use case that connects PTM sites to kinase signaling pathways.
Affiliation(s)
- Sachin Gavali
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA
- Julie Cowart
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA
- Chuming Chen
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA; Department of Computer and Information Sciences, 101 Smith Hall, 18 Amstel Ave, Newark, DE 19716, USA
- Karen E Ross
- Department of Biochemistry and Molecular & Cellular Biology, 337 Basic Science Building, 3900 Reservoir Road NW, Washington, DC 20057, USA
- Cecilia Arighi
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA; Department of Computer and Information Sciences, 101 Smith Hall, 18 Amstel Ave, Newark, DE 19716, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, 205 Delaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19711, USA; Department of Biochemistry and Molecular & Cellular Biology, 337 Basic Science Building, 3900 Reservoir Road NW, Washington, DC 20057, USA; Department of Computer and Information Sciences, 101 Smith Hall, 18 Amstel Ave, Newark, DE 19716, USA
45
FibroAtlas: A Database for the Exploration of Fibrotic Diseases and Their Genes. Cardiol Res Pract 2019; 2019:4237285. [PMID: 32082621 PMCID: PMC7012261 DOI: 10.1155/2019/4237285] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/03/2019] [Accepted: 12/06/2019] [Indexed: 12/21/2022]
Abstract
Background Fibrosis is a highly dynamic process caused by prolonged injury, deregulation of the normal processes of wound healing, and extensive deposition of extracellular matrix (ECM) proteins. During the fibrotic process, multiple genes interact with environmental factors. Over recent decades, many fibrosis-related genes have been identified that shed light on the particular clinical manifestations of this complex process. However, genetic information about fibrosis is dispersed across an extensive literature. Methods We extracted data from literature abstracts in PubMed by text mining, manually curated the literature, and identified the evidence sentences. Results We present FibroAtlas, which includes 1,439 well-annotated fibrosis-associated genes. FibroAtlas 1.0 is the first attempt to build a nonredundant and comprehensive catalog of fibrosis-related genes with supporting evidence derived from curated published literature, and it provides an overview of human fibrosis-related genes.
46
Sánchez LFH, Burger B, Horro C, Fabregat A, Johansson S, Njølstad PR, Barsnes H, Hermjakob H, Vaudel M. PathwayMatcher: proteoform-centric network construction enables fine-granularity multiomics pathway mapping. Gigascience 2019; 8:giz088. [PMID: 31363752 PMCID: PMC6667378 DOI: 10.1093/gigascience/giz088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/17/2018] [Revised: 06/03/2019] [Accepted: 06/30/2019] [Indexed: 01/07/2023]
Abstract
BACKGROUND Mapping biomedical data to functional knowledge is an essential task in bioinformatics and can be achieved by querying identifiers (e.g., gene sets) in pathway knowledge bases. However, the isoform and posttranslational modification states of proteins are lost when converting input and pathways into gene-centric lists. FINDINGS Based on the Reactome knowledge base, we built a network of protein-protein interactions accounting for the documented isoform and modification statuses of proteins. We then implemented a command line application called PathwayMatcher (github.com/PathwayAnalysisPlatform/PathwayMatcher) to query this network. PathwayMatcher supports multiple types of omics data as input and outputs the possibly affected biochemical reactions, subnetworks, and pathways. CONCLUSIONS PathwayMatcher enables refining the network representation of pathways by including proteoforms defined as protein isoforms with posttranslational modifications. The specificity of pathway analyses is hence adapted to different levels of granularity, and it becomes possible to distinguish interactions between different forms of the same protein.
Affiliation(s)
- Luis Francisco Hernández Sánchez
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Children's Hospital, Haukeland University Hospital, 5021 Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, P.O Box 1400, 5021 Bergen, Norway
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Bram Burger
- Proteomics Unit, Department of Biomedicine, University of Bergen, Postbox 7804, 5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, P.O. Box 7803, 5020 Bergen, Norway
- Carlos Horro
- Proteomics Unit, Department of Biomedicine, University of Bergen, Postbox 7804, 5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, P.O. Box 7803, 5020 Bergen, Norway
- Antonio Fabregat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Stefan Johansson
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Children's Hospital, Haukeland University Hospital, 5021 Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, P.O Box 1400, 5021 Bergen, Norway
- Pål Rasmus Njølstad
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Children's Hospital, Haukeland University Hospital, 5021 Bergen, Norway
- Department of Pediatrics, Haukeland University Hospital, 5021 Bergen, Norway
- Harald Barsnes
- Proteomics Unit, Department of Biomedicine, University of Bergen, Postbox 7804, 5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, P.O. Box 7803, 5020 Bergen, Norway
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, No. 38, Life Science Park Road, Changping District, 102206 Beijing, China
- Marc Vaudel
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Children's Hospital, Haukeland University Hospital, 5021 Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, P.O Box 1400, 5021 Bergen, Norway
47
Binz PA, Shofstahl J, Vizcaíno JA, Barsnes H, Chalkley RJ, Menschaert G, Alpi E, Clauser K, Eng JK, Lane L, Seymour SL, Sánchez LFH, Mayer G, Eisenacher M, Perez-Riverol Y, Kapp EA, Mendoza L, Baker PR, Collins A, Van Den Bossche T, Deutsch EW. Proteomics Standards Initiative Extended FASTA Format. J Proteome Res 2019; 18:2686-2692. [PMID: 31081335 DOI: 10.1021/acs.jproteome.9b00064] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/29/2022]
Abstract
Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
Affiliation(s)
- Pierre-Alain Binz
- CHUV Centre Hospitalier Universitaire Vaudois, CH-1011 Lausanne 14, Switzerland
- Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Harald Barsnes
- Proteomics Unit, Department of Biomedicine, University of Bergen, N-5009 Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
- Robert J Chalkley
- University of California at San Francisco, San Francisco, California 94143, United States
- Gerben Menschaert
- Biobix, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
- Emanuele Alpi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Karl Clauser
- Broad Institute, Cambridge, Massachusetts 02142, United States
- Jimmy K Eng
- University of Washington, Seattle, Washington 98195, United States
- Lydie Lane
- SIB Swiss Institute of Bioinformatics, CH-1211 Geneva 4, Switzerland; Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CH-1211 Geneva 4, Switzerland
- Sean L Seymour
- Seymour Data Science, LLC, San Francisco, California 95000, United States
- Luis Francisco Hernández Sánchez
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5021 Bergen, Norway; Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5021 Bergen, Norway
- Gerhard Mayer
- Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, D-44801 Bochum, Germany
- Martin Eisenacher
- Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, D-44801 Bochum, Germany
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Eugene A Kapp
- Walter & Eliza Hall Institute of Medical Research and the University of Melbourne, Melbourne, VIC 3052, Australia
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Peter R Baker
- University of California at San Francisco, San Francisco, California 94143, United States
- Andrew Collins
- Department of Functional and Comparative Genomics, Institute of Integrated Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
48
Overton JA, Vita R, Dunn P, Burel JG, Bukhari SAC, Cheung KH, Kleinstein SH, Diehl AD, Peters B. Reporting and connecting cell type names and gating definitions through ontologies. BMC Bioinformatics 2019; 20:182. [PMID: 31272390 PMCID: PMC6509839 DOI: 10.1186/s12859-019-2725-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
Background Human immunology studies often rely on the isolation and quantification of cell populations from an input sample based on flow cytometry and related techniques. Such techniques classify cells into populations based on the detection of a pattern of markers. The description of the cell populations targeted in such experiments typically has two complementary components: the description of the cell type targeted (e.g. ‘T cells’), and the description of the marker pattern utilized (e.g. CD14−, CD3+). Results Here, we describe our attempts to use ontologies to cross-compare cell types and marker patterns (also referred to as gating definitions). We used a large set of such gating definitions and corresponding cell types submitted by different investigators into ImmPort, a central database for immunology studies, to examine how well gating definitions can be parsed using terms from the Protein Ontology (PRO), and cell type descriptions using terms from the Cell Ontology (CL). We then used logical axioms from CL to detect discrepancies between the two. Conclusions We suggest adoption of our proposed format for describing gating and cell type definitions to make comparisons easier. We also suggest a number of new terms to describe gating definitions in flow cytometry that are not based on molecular markers captured in PRO, but on forward- and side-scatter of light during data acquisition, which are more appropriately captured in the Ontology for Biomedical Investigations (OBI). Finally, our approach results in suggestions on what logical axioms and new cell types could be considered for addition to the Cell Ontology.
Affiliation(s)
- Randi Vita
- Division for Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA
- Patrick Dunn
- ImmPort Curation Team, NG Health Solutions, Rockville, MD, USA
- Julie G Burel
- Division for Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA
- Kei-Hoi Cheung
- Department of Emergency Medicine and Yale Center for Medical Informatics, Yale School of Medicine, New Haven, CT, USA
- Steven H Kleinstein
- Department of Pathology, Yale School of Medicine, New Haven, Connecticut, USA; Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Alexander D Diehl
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Bjoern Peters
- Division for Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA; Department of Medicine, University of California San Diego, La Jolla, CA, USA
49
Neal ML, König M, Nickerson D, Mısırlı G, Kalbasi R, Dräger A, Atalag K, Chelliah V, Cooling MT, Cook DL, Crook S, de Alba M, Friedman SH, Garny A, Gennari JH, Gleeson P, Golebiewski M, Hucka M, Juty N, Myers C, Olivier BG, Sauro HM, Scharm M, Snoep JL, Touré V, Wipat A, Wolkenhauer O, Waltemath D. Harmonizing semantic annotations for computational models in biology. Brief Bioinform 2019; 20:540-550. [PMID: 30462164 PMCID: PMC6433895 DOI: 10.1093/bib/bby087] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/15/2018] [Revised: 08/08/2018] [Accepted: 08/17/2018] [Indexed: 02/06/2023]
Abstract
Life science researchers use computational models to articulate and test hypotheses about the behavior of biological systems. Semantic annotation is a critical component for enhancing the interoperability and reusability of such models as well as for the integration of the data needed for model parameterization and validation. Encoded as machine-readable links to knowledge resource terms, semantic annotations describe the computational or biological meaning of what models and data represent. These annotations help researchers find and repurpose models, accelerate model composition and enable knowledge integration across model repositories and experimental data stores. However, realizing the potential benefits of semantic annotation requires the development of model annotation standards that adhere to a community-based annotation protocol. Without such standards, tool developers must account for a variety of annotation formats and approaches, a situation that can become prohibitively cumbersome and which can defeat the purpose of linking model elements to controlled knowledge resource terms. Currently, no consensus protocol for semantic annotation exists among the larger biological modeling community. Here, we report on the landscape of current annotation practices among the COmputational Modeling in BIology NEtwork community and provide a set of recommendations for building a consensus approach to semantic annotation.
Affiliation(s)
- Maxwell Lewis Neal
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle, USA
- Matthias König
- Department of Biology, Humboldt-University Berlin, Institute for Theoretical Biology, Berlin, Germany
- David Nickerson
- Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- Göksel Mısırlı
- School of Computing and Mathematics, Keele University, Keele, UK
- Reza Kalbasi
- Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- Andreas Dräger
- Computational Systems Biology of Infection and Antimicrobial-Resistant Pathogens, Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Tübingen, Germany
- Department of Computer Science, University of Tübingen, Tübingen, Germany
- Koray Atalag
- Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- Vijayalakshmi Chelliah
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Michael T Cooling
- Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- Daniel L Cook
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Sharon Crook
- School of Mathematical and Statistical Sciences, Arizona State University, Tempe, USA
- Miguel de Alba
- German Federal Institute for Risk Assessment, Berlin, Germany
- Alan Garny
- Auckland Bioengineering Institute, University of Auckland, Auckland, NZ
- John H Gennari
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Padraig Gleeson
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK
- Martin Golebiewski
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Heidelberg, Germany
- Michael Hucka
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Nick Juty
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Chris Myers
- Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT, USA
- Brett G Olivier
- Systems Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Modelling of Biological Processes, BioQUANT/COS, Heidelberg University, Germany
- Herbert M Sauro
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Martin Scharm
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Jacky L Snoep
- Department of Biochemistry, Stellenbosch University, Matieland, South Africa
- Department of Molecular Cell Physiology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Manchester Institute for Biotechnology, University of Manchester, Manchester, UK
- Vasundra Touré
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Anil Wipat
- School of Computing Science, Newcastle University, Newcastle upon Tyne, UK
- Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Stellenbosch Institute for Advanced Study (STIAS), Stellenbosch, South Africa
- Dagmar Waltemath
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
50
The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res 2019; 47:D330-D338. [PMID: 30395331 PMCID: PMC6323945 DOI: 10.1093/nar/gky1055] [Citation(s) in RCA: 2684] [Impact Index Per Article: 536.8] [Received: 09/22/2018] [Accepted: 10/17/2018] [Indexed: 02/06/2023]
Abstract
The Gene Ontology resource (GO; http://geneontology.org) provides structured, computable knowledge regarding the functions of genes and gene products. Founded in 1998, GO has become widely adopted in the life sciences, and its contents are under continual improvement, both in quantity and in quality. Here, we report the major developments of the GO resource during the past two years. Each monthly release of the GO resource is now packaged and given a unique identifier (DOI), enabling GO-based analyses on a specific release to be reproduced in the future. The molecular function ontology has been refactored to better represent the overall activities of gene products, with a focus on transcription regulator activities. Quality assurance efforts have been ramped up to address potentially out-of-date or inaccurate annotations. New evidence codes for high-throughput experiments now enable users to filter out annotations obtained from these sources. GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models. We also provide the 'GO ribbon' widget for visualizing GO annotations to a gene; the widget can be easily embedded in any web page.