1
Zhang J, Jiang Q, Du Z, Geng Y, Hu Y, Tong Q, Song Y, Zhang HY, Yan X, Feng Z. Knowledge graph-derived feed efficiency analysis via pig gut microbiota. Sci Rep 2024; 14:13939. [PMID: 38886444 PMCID: PMC11182767 DOI: 10.1038/s41598-024-64835-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 06/13/2024] [Indexed: 06/20/2024]
Abstract
Feed efficiency (FE), which is essential for pig production, has been reported to be partially explained by gut microbiota. Despite an extensive body of research literature on this topic, studies regarding the regulation of feed efficiency by gut microbiota remain fragmented and mostly confined to disorganized or semi-structured unrestricted texts. Meanwhile, structured databases for microbiota analysis are available, yet they often lack a comprehensive understanding of the associated biological processes. Therefore, we have devised an approach to construct a comprehensive knowledge graph by combining unstructured textual intelligence with structured database information and applied it to investigate the relationship between pig gut microbes and FE. Firstly, we created the pgmReading knowledge base and the domain ontology of pig gut microbiota by annotating, extracting, and integrating semantic information from 157 scientific publications. Secondly, we created the pgmPubtator by utilizing PubTator to expand the semantic information related to microbiota. Thirdly, we created the pgmDatabase by mapping and combining the ADDAGMA, gutMGene, and KEGG databases based on the ontology. These three knowledge bases were integrated to form the Pig Gut Microbial Knowledge Graph (PGMKG). Additionally, we created five biological query cases to validate the performance of PGMKG. These cases not only allow us to identify microbes with the most significant impact on FE but also provide insights into the metabolites produced by these microbes and the associated metabolic pathways. This study introduces PGMKG, mapping key microbes in pig feed efficiency and guiding microbiota-targeted optimization.
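As a rough illustration of the integration step described in this abstract, the sketch below merges three small triple sets into one graph and then walks from FE-associated microbes to metabolites and pathways. It is not the authors' pipeline: the triple layout, the relation names, the helper functions, and every entity name (`microbe_A`, `metabolite_X`, `pathway_P`) are hypothetical placeholders chosen for the example.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples. All names are hypothetical
# placeholders, not facts taken from PGMKG.
pgm_reading = {("microbe_A", "associated_with", "feed_efficiency")}
pgm_pubtator = {("microbe_A", "produces", "metabolite_X")}
pgm_database = {
    ("metabolite_X", "participates_in", "pathway_P"),
    ("microbe_A", "produces", "metabolite_X"),  # duplicate of a pgm_pubtator triple
}


def merge(*sources):
    """Union the triples of several knowledge bases, dropping duplicates."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def index_by_subject_relation(triples):
    """Map (subject, relation) to the sorted list of objects."""
    index = defaultdict(list)
    for subj, rel, obj in sorted(triples):
        index[(subj, rel)].append(obj)
    return index


def pathways_for_fe_microbes(triples):
    """Follow microbe -> metabolite -> pathway for FE-associated microbes."""
    index = index_by_subject_relation(triples)
    results = []
    for subj, rel, obj in sorted(triples):
        if rel == "associated_with" and obj == "feed_efficiency":
            for metabolite in index[(subj, "produces")]:
                for pathway in index[(metabolite, "participates_in")]:
                    results.append((subj, metabolite, pathway))
    return results


graph = merge(pgm_reading, pgm_pubtator, pgm_database)
print(len(graph))                       # 3 (the duplicate triple is stored once)
print(pathways_for_fe_microbes(graph))  # [('microbe_A', 'metabolite_X', 'pathway_P')]
```

A production graph would more likely live in a graph database and use ontology identifiers rather than plain strings; the set-based version only shows how shared entities let a query cross the three sources.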
Affiliation(s)
- Junmei Zhang
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Qin Jiang
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
  - Yazhouwan National Laboratory (YNL), Sanya, 572025, China
- Zhihong Du
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Yilin Geng
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Yuren Hu
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Qichang Tong
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Yunfeng Song
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Hong-Yu Zhang
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Xianghua Yan
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Zaiwen Feng
  - National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
2
Kurbatova N, Swiers R. Disease ontologies for knowledge graphs. BMC Bioinformatics 2021; 22:377. [PMID: 34289807 PMCID: PMC8296689 DOI: 10.1186/s12859-021-04173-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/04/2021] [Accepted: 05/05/2021] [Indexed: 11/10/2022]
Abstract
Background: Data integration to build a biomedical knowledge graph is a challenging task. There are multiple disease ontologies used in data sources and publications, each having its own hierarchy. A common task is to map between ontologies, find disease clusters and finally build a representation of the chosen disease area. There is a shortage of published resources and tools to facilitate interactive, efficient and flexible cross-referencing and analysis of multiple disease ontologies commonly found in data sources and research. Results: Our results are represented as a knowledge graph solution that uses disease ontology cross-references and facilitates switching between ontology hierarchies for data integration and other tasks. Conclusions: Grakn core with pre-installed "Disease ontologies for knowledge graphs" facilitates the biomedical knowledge graph build and provides an elegant solution for the multiple disease ontologies problem.
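The idea of using cross-references to switch between ontology hierarchies can be sketched in a few lines. This is only an illustration with plain dictionaries, not the Grakn schema from the paper: the two ontologies "A" and "B", their term identifiers, and the helper names are all hypothetical.

```python
# Hypothetical cross-references from ontology A terms to ontology B terms.
xrefs = {"A:asthma": "B:0001", "A:lung_disease": "B:0002"}

# Hypothetical child -> parent links, one table per ontology.
parents_a = {"A:asthma": "A:lung_disease"}
parents_b = {"B:0001": "B:0003", "B:0003": "B:0002"}


def ancestors(term, parents):
    """Walk child -> parent links and return the chain of ancestors."""
    chain = []
    while term in parents:
        term = parents[term]
        chain.append(term)
    return chain


def switch_hierarchy(term_a):
    """Map an ontology A term to ontology B and return its B ancestors."""
    term_b = xrefs.get(term_a)
    if term_b is None:
        return None  # no cross-reference recorded for this term
    return term_b, ancestors(term_b, parents_b)


print(ancestors("A:asthma", parents_a))  # ['A:lung_disease']
print(switch_hierarchy("A:asthma"))      # ('B:0001', ['B:0003', 'B:0002'])
```

The same disease sits one level below its broad category in ontology A and two levels below it in ontology B, which is the kind of difference that makes switching hierarchies useful during data integration.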
Affiliation(s)
- Natalja Kurbatova
  - Data Infrastructure & Tools, Data Science & Artificial Intelligence, R&D, AstraZeneca, Cambridge, UK
- Rowan Swiers
  - Quantitative Biology, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
3
Srinivas Bharadhwaj V, Ali M, Birkenbihl C, Mubeen S, Lehmann J, Hofmann-Apitius M, Tapley Hoyt C, Domingo-Fernández D. CLEP: A Hybrid Data- and Knowledge-Driven Framework for Generating Patient Representations. Bioinformatics 2021; 37:3311-3318. [PMID: 33964127 PMCID: PMC8504642 DOI: 10.1093/bioinformatics/btab340] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/05/2020] [Revised: 03/29/2021] [Accepted: 05/03/2021] [Indexed: 12/29/2022]
Abstract
Summary: As machine learning and artificial intelligence find a growing number of applications in the biomedical domain, their utility depends, at its core, on the data used to train them. Due to the complexity and high dimensionality of biomedical data, there is a need for approaches that combine prior knowledge around known biological interactions with patient data. Here, we present CLinical Embedding of Patients (CLEP), a novel approach that generates new patient representations by leveraging both prior knowledge and patient-level data. First, given a patient-level dataset and a knowledge graph containing relations across features that can be mapped to the dataset, CLEP incorporates patients into the knowledge graph as new nodes connected to their most characteristic features. Next, CLEP employs knowledge graph embedding models to generate new patient representations that can ultimately be used for a variety of downstream tasks, ranging from clustering to classification. We demonstrate how using new patient representations generated by CLEP significantly improves performance in classifying between patients and healthy controls for a variety of machine learning models, as compared to the use of the original transcriptomics data. Furthermore, we also show how incorporating patients into a knowledge graph can foster the interpretation and identification of biological features characteristic of a specific disease or patient subgroup. Finally, we released CLEP as an open source Python package together with examples and documentation. Availability and implementation: CLEP is available to the bioinformatics community as an open source Python package at https://github.com/hybrid-kg/clep under the Apache 2.0 License. Supplementary information: Supplementary data are available at Bioinformatics online.
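The first step in this abstract, adding patients to a knowledge graph as nodes linked to their most characteristic features, might look roughly like the sketch below. It does not use the CLEP package or its API. The z-score rule against healthy controls, the threshold of 2.0, the relation names, and the toy measurements are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def patient_edges(patients, controls, threshold=2.0):
    """Return (patient, relation, feature) edges for strongly deviating features.

    Assumption for this sketch: a feature is "characteristic" of a patient
    when its z-score against the control group reaches the threshold.
    """
    features = sorted(next(iter(controls.values())))
    stats = {}
    for feature in features:
        values = [sample[feature] for sample in controls.values()]
        stats[feature] = (mean(values), pstdev(values))

    edges = []
    for patient_id in sorted(patients):
        for feature in features:
            mu, sd = stats[feature]
            if sd == 0:
                continue  # no variation in controls, so no z-score
            z = (patients[patient_id][feature] - mu) / sd
            if z >= threshold:
                edges.append((patient_id, "positive_deviation", feature))
            elif z <= -threshold:
                edges.append((patient_id, "negative_deviation", feature))
    return edges


# Hypothetical toy measurements (two features, two controls, three patients).
controls = {
    "c1": {"gene_1": 1.0, "gene_2": 5.0},
    "c2": {"gene_1": 3.0, "gene_2": 5.0},
}
patients = {
    "p1": {"gene_1": 5.0, "gene_2": 5.0},
    "p2": {"gene_1": 2.5, "gene_2": 5.0},
    "p3": {"gene_1": -1.0, "gene_2": 5.0},
}

print(patient_edges(patients, controls))
# [('p1', 'positive_deviation', 'gene_1'), ('p3', 'negative_deviation', 'gene_1')]
```

The resulting edges would be appended to the existing knowledge graph triples before a knowledge graph embedding model is trained; the embedding step itself is not shown here.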
Affiliation(s)
- Vinay Srinivas Bharadhwaj
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Mehdi Ali
  - Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53113, Germany
  - Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Dresden and Sankt Augustin, Germany
- Colin Birkenbihl
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Sarah Mubeen
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
  - Fraunhofer Center for Machine Learning, Germany
- Jens Lehmann
  - Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53113, Germany
  - Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Dresden and Sankt Augustin, Germany
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Charles Tapley Hoyt
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53113, Germany
  - Fraunhofer Center for Machine Learning, Germany
- Daniel Domingo-Fernández
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53113, Germany
  - Fraunhofer Center for Machine Learning, Germany
4
Löffler F, Wesp V, König-Ries B, Klan F. Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs? PLoS One 2021; 16:e0246099. [PMID: 33760822 PMCID: PMC7990268 DOI: 10.1371/journal.pone.0246099] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/05/2019] [Accepted: 01/13/2021] [Indexed: 11/19/2022]
Abstract
The increasing amount of publicly available research data provides the opportunity to link and integrate data in order to create and prove novel hypotheses, to repeat experiments or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. We show that existing metadata currently poorly reflect information needs and therefore are the biggest obstacle in retrieving relevant data. Our findings indicate that, for data seekers in the biodiversity domain, the important information categories are environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types. These interests are well covered in metadata elements of domain-specific standards. However, instead of utilizing these standards, large data repositories tend to use metadata standards with domain-independent metadata fields that cover search interests only to some extent. A second problem is the arbitrary keywords utilized in descriptive fields such as title, description or subject. Keywords support scholars in a full text search only if the provided terms syntactically match or their semantic relationship to terms used in a user query is known.
Affiliation(s)
- Felicitas Löffler
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
- Valentin Wesp
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
- Birgitta König-Ries
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
  - Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany
  - German Center for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Friederike Klan
  - Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany
  - Citizen Science Group, DLR-Institute of Data Science, German Aerospace Center, Jena, Germany
5
Irshad O, Ghani Khan MU. Formalization and Semantic Integration of Heterogeneous Omics Annotations for Exploratory Searches. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200127122818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aim:
To help researchers and practitioners uncover the functional aspects of the human cellular system through exploratory searches over semantically integrated, heterogeneous and geographically dispersed omics annotations.
Background:
Improving health and quality of life is one of the motives that continuously drives researchers and practitioners to uncover the unknown aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in a well-structured, integrated and unified form. With the advent of high-throughput and sensor technologies in particular, biological data is growing at an astronomical rate while becoming more heterogeneous and geographically dispersed. Several data integration systems have been deployed to cope with data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and extensible than syntax-based ones but still lack aspect-based data integration, persistence and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of the semantic associations they naturally possess in the human cell.
Objective:
To develop an aspect-oriented formal data integration model for semantically integrating heterogeneous and geographically dispersed omics annotations and for providing exploratory querying over the integrated data.
Method:
We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate with
Result:
To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered, multi-modular software architecture compliant with the proposed model. Results show that our model supports gathering, associating, integrating, persisting and querying each entity with respect to all of its possible aspects within or across the various associated omics layers.
Conclusion:
Formal specifications help address data integration issues by providing formal means for understanding omics data based on meaning instead of syntax.
Affiliation(s)
- Omer Irshad
  - Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
- Muhammad Usman Ghani Khan
  - Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
6
Zhang Y, Sheng M, Zhou R, Wang Y, Han G, Zhang H, Xing C, Dong J. HKGB: An Inclusive, Extensible, Intelligent, Semi-auto-constructed Knowledge Graph Framework for Healthcare with Clinicians’ Expertise Incorporated. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102324] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/21/2022]
7
Sima AC, Mendes de Farias T, Zbinden E, Anisimova M, Gil M, Stockinger H, Stockinger K, Robinson-Rechavi M, Dessimoz C. Enabling semantic queries across federated bioinformatics databases. Database: The Journal of Biological Databases and Curation 2020; 2019:5614223. [PMID: 31697362 PMCID: PMC6836710 DOI: 10.1093/database/baz106] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/28/2019] [Revised: 08/01/2019] [Accepted: 08/02/2019] [Indexed: 11/23/2022]
Abstract
Motivation: Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data available publicly. However, the heterogeneity of the different data sources, both at the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: (i) Bgee, a gene expression relational database; (ii) Orthologous Matrix (OMA), a Hierarchical Data Format 5 orthology DS; and (iii) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialized RDF data of OMA, expressed in terms of the Orthology ontology, is made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These allow performing joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface.
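The "virtual links" mentioned in this abstract are shared identifiers that let one query join records held in separate stores. The sketch below imitates that join with three in-memory collections. It is not SPARQL and does not reflect the real Bgee, OMA or UniProtKB schemas: every record, field name and identifier is a hypothetical placeholder.

```python
# Three toy "stores"; all identifiers and fields are made up for illustration.
expression_store = [  # stands in for a gene expression database
    {"gene": "geneA", "anatomy": "brain"},
    {"gene": "geneB", "anatomy": "liver"},
]
orthology_store = [  # stands in for an orthology resource: (gene, ortholog)
    ("geneA", "geneA_ortholog"),
    ("geneB", "geneB_ortholog"),
]
protein_store = [  # stands in for a protein knowledge base
    {"protein": "protA", "encoded_by": "geneA_ortholog"},
]


def ortholog_proteins_expressed_in(anatomy):
    """Join the three stores through their shared gene identifiers."""
    # Link 1: genes expressed in the requested anatomical entity.
    expressed = {row["gene"] for row in expression_store if row["anatomy"] == anatomy}
    # Link 2: orthologs of those genes.
    orthologs = {target for source, target in orthology_store if source in expressed}
    # Link 3: proteins encoded by those orthologs.
    return sorted(
        row["protein"] for row in protein_store if row["encoded_by"] in orthologs
    )


print(ortholog_proteins_expressed_in("brain"))  # ['protA']
print(ortholog_proteins_expressed_in("liver"))  # []
```

In a federated setup each of the three steps would run against a different endpoint, and the intermediate identifier sets are what travels between them.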
Affiliation(s)
- Ana Claudia Sima
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Tarcisio Mendes de Farias
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Erich Zbinden
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Maria Anisimova
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Manuel Gil
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Heinz Stockinger
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Kurt Stockinger
  - ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland
- Marc Robinson-Rechavi
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Genetics, Evolution, and Environment, University College London, Gower St, London WC1E 6BT, UK
  - Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
8
Tiddi I, Balliet D, ten Teije A. Fostering Scientific Meta-analyses with Knowledge Graphs: A Case-Study. The Semantic Web 2020. [PMCID: PMC7250615 DOI: 10.1007/978-3-030-49461-2_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
A meta-analysis is a Science of Science method widely used in the medical and social sciences to review, aggregate and quantitatively synthesise a body of studies that address the same research question. With the volume of research growing exponentially every year, conducting meta-analyses can be costly and inefficient, as a significant amount of time and human effort needs to be spent in finding studies meeting research criteria, annotating them, and properly performing the statistical analyses to summarise the findings. In this work, we show these issues can be tackled with semantic representations and technologies, using a social science scenario as a case study. We show how the domain-specific content of research outputs can be represented and used to facilitate their search, analysis and synthesis. We present the very first representation of the domain of human cooperation, and the application we built on top of this to help experts in performing meta-analyses semi-automatically. Using a few application scenarios, we show how our approach supports the various phases of meta-analyses and, more generally, contributes towards research replication and automated hypothesis generation.