1
He F, Liu K, Yang Z, Chen Y, Hammer RD, Xu D, Popescu M. pathCLIP: Detection of Genes and Gene Relations From Biological Pathway Figures Through Image-Text Contrastive Learning. IEEE J Biomed Health Inform 2024; 28:5007-5019. [PMID: 38568768 DOI: 10.1109/jbhi.2024.3383610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2024]
Abstract
In biomedical literature, biological pathways are commonly described through a combination of images and text. These pathways contain valuable information, including genes and their relationships, which provide insight into biological mechanisms and precision medicine. Curating pathway information across the literature enables the integration of this information to build a comprehensive knowledge base. While some studies have extracted pathway information from images and text independently, they often overlook the correspondence between the two modalities. In this paper, we present a pathway figure curation system named pathCLIP for identifying genes and gene relations from pathway figures. Our key innovation is the use of an image-text contrastive learning model to learn coordinated embeddings of image snippets and text descriptions of genes and gene relations, thereby improving curation. Our validation results, using pathway figures from PubMed, showed that our multimodal model outperforms models using only a single modality. Additionally, our system effectively curates genes and gene relations from multiple literature sources. Two case studies on extracting pathway information from literature of non-small cell lung cancer and Alzheimer's disease further demonstrate the usefulness of our curated pathway information in enhancing related pathways in the KEGG database.
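The abstract describes an image-text contrastive learning model that learns coordinated embeddings of image snippets and text descriptions. As a rough illustration of that family of objectives, the sketch below computes a generic CLIP-style symmetric contrastive (InfoNCE) loss in plain Python. It is not pathCLIP's implementation: the function names, the default temperature of 0.07, and the toy 2-D embeddings are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a (nonzero) vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("embedding vectors must be nonzero")
    return [x / norm for x in v]


def _log_softmax_at(row: Sequence[float], idx: int) -> float:
    """log(softmax(row)[idx]), computed with the log-sum-exp trick for numerical stability."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_sum_exp


def clip_style_loss(image_emb: Sequence[Sequence[float]],
                    text_emb: Sequence[Sequence[float]],
                    temperature: float = 0.07) -> float:
    """Symmetric image-text contrastive (InfoNCE) loss for a batch of matched pairs.

    Assumes row i of image_emb and row i of text_emb describe the same item
    (hypothetically, a gene-relation image snippet and its text description).
    """
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, scaled by 1/temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: each image should score its own text highest (row-wise softmax).
    loss_i2t = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text -> image: the same matrix read column-wise.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(_log_softmax_at(columns[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

With matched pairs the loss is close to zero, and with swapped captions it is large. In a trained system the embeddings would come from image and text encoders, and this loss would be minimized by gradient descent; the loop-based version only makes the objective explicit.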
2
Elizarraras JM, Liao Y, Shi Z, Zhu Q, Pico A, Zhang B. WebGestalt 2024: faster gene set analysis and new support for metabolomics and multi-omics. Nucleic Acids Res 2024; 52:W415-W421. [PMID: 38808672 PMCID: PMC11223849 DOI: 10.1093/nar/gkae456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 05/07/2024] [Accepted: 05/14/2024] [Indexed: 05/30/2024]
Abstract
Enrichment analysis, crucial for interpreting genomic, transcriptomic, and proteomic data, is expanding into metabolomics. Furthermore, there is a rising demand for integrated enrichment analysis that combines data from different studies and omics platforms, as seen in meta-analysis and multi-omics research. To address these growing needs, we have updated WebGestalt to include enrichment analysis capabilities for both metabolites and multiple input lists of analytes. We have also significantly increased analysis speed, revamped the user interface, and introduced new pathway visualizations to accommodate these updates. Notably, the adoption of a Rust backend reduced gene set enrichment analysis time by 95% from 270.64 to 12.41 s and network topology-based analysis by 89% from 159.59 to 17.31 s in our evaluation. This performance improvement is also accessible in both the R package and a newly introduced Python package. Additionally, we have updated the data in the WebGestalt database to reflect the current status of each source and have expanded our collection of pathways, networks, and gene signatures. The 2024 WebGestalt update represents a significant leap forward, offering new support for metabolomics, streamlined multi-omics analysis capabilities, and remarkable performance enhancements. Discover these updates and more at https://www.webgestalt.org.
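The abstract refers to several enrichment methods without defining them. The simplest, over-representation analysis, scores a gene set by the probability of seeing at least the observed overlap with a query list by chance, which is a hypergeometric upper tail. The sketch below is a generic illustration using only the standard library and is not WebGestalt's Rust backend; the function name and toy gene identifiers are hypothetical.

```python
from math import comb
from typing import Set


def ora_pvalue(query: Set[str], gene_set: Set[str], universe: Set[str]) -> float:
    """One-sided hypergeometric p-value P(X >= k) for the overlap of query and gene_set.

    N = universe size, K = gene-set size within the universe,
    n = query size within the universe, k = observed overlap.
    """
    q = query & universe
    g = gene_set & universe
    N, K, n, k = len(universe), len(g), len(q), len(q & g)
    total = comb(N, n)
    # Sum exact hypergeometric counts for overlaps k, k+1, ..., min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

For large universes a library routine is usually preferable; in SciPy the same tail is `scipy.stats.hypergeom.sf(k - 1, N, K, n)`, where SciPy names the three shape parameters `M`, `n`, `N`.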
Affiliation(s)
- John M Elizarraras
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Yuxing Liao
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Qian Zhu
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Alexander R Pico
  - Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, USA
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
3
Clarke DJB, Marino GB, Deng EZ, Xie Z, Evangelista JE, Ma'ayan A. Rummagene: massive mining of gene sets from supporting materials of biomedical research publications. Commun Biol 2024; 7:482. [PMID: 38643247 PMCID: PMC11032387 DOI: 10.1038/s42003-024-06177-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 04/10/2024] [Indexed: 04/22/2024]
Abstract
Many biomedical research publications contain gene sets in their supporting tables, and these sets are currently not available for search and reuse. By crawling PubMed Central, the Rummagene server provides access to hundreds of thousands of such mammalian gene sets. So far, we scanned 5,448,589 articles to find 121,237 articles that contain 642,389 gene sets. These sets are served for enrichment analysis, free text, and table title search. Investigating statistical patterns within the Rummagene database, we demonstrate that Rummagene can be used for transcription factor and kinase enrichment analyses, and for gene function predictions. By combining gene set similarity with abstract similarity, Rummagene can find surprising relationships between biological processes, concepts, and named entities. Overall, Rummagene brings to the surface the ability to search a massive collection of published biomedical datasets that are currently buried and inaccessible. The Rummagene web application is available at https://rummagene.com.
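The abstract mentions gene set similarity but this summary does not define the measure. The Jaccard index (shared genes divided by all genes in either set) is one common choice and is assumed in the sketch below; it is an illustration, not Rummagene's scoring, and it omits the abstract-similarity component entirely. Function names and the toy library are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_overlap(query: Set[str],
                    library: Dict[str, Set[str]],
                    top: int = 3) -> List[Tuple[str, float]]:
    """Return the `top` library gene sets most similar to `query`, highest Jaccard first."""
    scored = [(name, jaccard(query, genes)) for name, genes in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]
```

Jaccard ignores the size of the background universe, so in practice it is often paired with a significance test such as the hypergeometric tail when ranking matches.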
Affiliation(s)
- Daniel J B Clarke
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Giacomo B Marino
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Eden Z Deng
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Zhuorui Xie
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- John Erol Evangelista
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Avi Ma'ayan
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
4
Agrawal A, Balcı H, Hanspers K, Coort SL, Martens M, Slenter DN, Ehrhart F, Digles D, Waagmeester A, Wassink I, Abbassi-Daloii T, Lopes EN, Iyer A, Acosta J, Willighagen LG, Nishida K, Riutta A, Basaric H, Evelo C, Willighagen EL, Kutmon M, Pico A. WikiPathways 2024: next generation pathway database. Nucleic Acids Res 2024; 52:D679-D689. [PMID: 37941138 PMCID: PMC10767877 DOI: 10.1093/nar/gkad960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/04/2023] [Accepted: 10/13/2023] [Indexed: 11/10/2023]
Abstract
WikiPathways (wikipathways.org) is an open-source biological pathway database. Collaboration and open science are pivotal to the success of WikiPathways. Here we highlight the continuing efforts supporting WikiPathways, content growth and collaboration among pathway researchers. As an evolving database, there is a growing need for WikiPathways to address and overcome technical challenges. In this direction, WikiPathways has undergone major restructuring, enabling a renewed approach for sharing and curating pathway knowledge, thus providing stability for the future of community pathway curation. The website has been redesigned to improve and enhance user experience. This next generation of WikiPathways continues to support existing features while improving maintainability of the database and facilitating community input by providing new functionality and leveraging automation.
Affiliation(s)
- Ayushi Agrawal
  - Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, 94158, USA
- Hasan Balcı
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, The Netherlands
- Kristina Hanspers
  - Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, 94158, USA
- Susan L Coort
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Marvin Martens
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Denise N Slenter
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Friederike Ehrhart
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Daniela Digles
  - Department of Pharmaceutical Sciences, University of Vienna, Austria
- Isabel Wassink
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, The Netherlands
- Tooba Abbassi-Daloii
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Elisson N Lopes
  - Department of Epigenetics, Van Andel Institute, Grand Rapids, MI 49503, USA
- Aishwarya Iyer
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Javier Millán Acosta
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Kozo Nishida
  - Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, Japan
- Anders Riutta
  - Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, 94158, USA
- Helena Basaric
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Chris T Evelo
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, The Netherlands
- Martina Kutmon
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, The Netherlands
- Alexander R Pico
  - Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, 94158, USA
5
Wang L, Sesachalam PV, Chua R, Ghosh S. Interactome Analysis of Visceral Adipose Tissue Elucidates Gene Regulatory Networks and Novel Gene Candidates in Obesity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.21.572734. [PMID: 38187694 PMCID: PMC10769441 DOI: 10.1101/2023.12.21.572734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
OBJECTIVE Visceral adiposity is associated with increased proinflammatory activity, insulin resistance, diabetes risk and mortality rate. Numerous individual genes have been associated with obesity, but studies investigating gene-regulatory networks in human visceral obesity are lacking. METHODS We analyzed gene-regulatory networks in human visceral adipose tissue (VAT) from 48 obese and 11 non-obese Chinese subjects using gene co-expression and network construction with RNA-sequencing data. We also conducted RNA interference-based tests on selected genes for adipocyte differentiation effects. RESULTS A scale-free gene co-expression network was constructed from 360 differentially expressed genes between obese and non-obese VAT (absolute log fold-change >1, FDR<0.05) with edge probability >0.8. Gene regulatory network analysis identified candidate transcription factors associated with differentially expressed genes. Fifteen subnetworks (communities) displayed altered connectivity patterns between obese and non-obese networks. Genes in pro-inflammatory pathways showed increased network connectivity in obese VAT, whereas the oxidative phosphorylation pathway displayed reduced connections (enrichment FDR<0.05). Functional screening via RNA interference identified SOX30 and OSBPL3 as potential network-derived gene candidates influencing adipocyte differentiation. CONCLUSIONS This interactome-based approach highlights the network architecture, identifies novel candidate genes, and leads to new hypotheses regarding network-assisted gene regulation in obese vs. non-obese VAT.
What is already known about this subject?
- Visceral adipose tissue (VAT) is associated with increased levels of proinflammatory activity, insulin resistance, diabetes risk and mortality rate.
- Gene expression studies have identified candidate genes associated with proinflammatory function in VAT.
What are the new findings in your manuscript?
- Using integrative network science, we identified co-expression and gene regulatory networks that are differentially regulated in VAT samples from subjects with and without obesity.
- We used functional testing (adipocyte differentiation) to validate a subset of novel candidate genes with minimal prior reported associations to obesity.
How might your results change the direction of research or the focus of clinical practice?
- Network biology-based investigation provides a new avenue to our understanding of gene function in visceral adiposity.
- A functional validation screen allows for the identification of novel gene candidates that may be targeted for the treatment of adipose tissue dysfunction in obesity.
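The differential-expression cutoff stated in the results (absolute log fold-change >1, FDR<0.05) can be sketched as a simple filter. The abstract does not say how the FDR was computed; this sketch assumes Benjamini-Hochberg adjustment of per-gene p-values, and the function names and input format (gene mapped to a (log fold-change, p-value) pair) are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(stats: Dict[str, Tuple[float, float]],
                    lfc_cutoff: float = 1.0,
                    fdr_cutoff: float = 0.05) -> List[str]:
    """Keep genes with |log fold-change| > lfc_cutoff and adjusted p-value < fdr_cutoff.

    stats maps gene -> (log_fold_change, raw_p_value).
    """
    genes = list(stats)
    fdr = benjamini_hochberg([stats[g][1] for g in genes])
    return [g for g, q in zip(genes, fdr)
            if abs(stats[g][0]) > lfc_cutoff and q < fdr_cutoff]
```

Both criteria must hold: a gene with a tiny p-value but a small fold change is dropped, as is a gene with a large fold change that does not survive the FDR threshold.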
6
Shin MG, Pico AR. Using published pathway figures in enrichment analysis and machine learning. BMC Genomics 2023; 24:713. [PMID: 38007419 PMCID: PMC10676589 DOI: 10.1186/s12864-023-09816-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 11/18/2023] [Indexed: 11/27/2023]
Abstract
Pathway Figure OCR (PFOCR) is a novel kind of pathway database approaching the breadth and depth of Gene Ontology while providing rich, mechanistic diagrams and direct literature support. Here, we highlight the utility of PFOCR in disease research in comparison with popular pathway databases through an assessment of disease coverage and analytical applications. In addition to common pathway analysis use cases, we present two advanced case studies demonstrating unique advantages of PFOCR in terms of cancer subtype and grade prediction analyses.
Affiliation(s)
- Min-Gyoung Shin
  - Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- Alexander R Pico
  - Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
7
He F, Liu K, Yang Z, Chen Y, Hammer RD, Xu D, Popescu M. pathCLIP: Detection of Genes and Gene Relations from Biological Pathway Figures through Image-Text Contrastive Learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.31.564859. [PMID: 37961680 PMCID: PMC10635012 DOI: 10.1101/2023.10.31.564859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
In biomedical literature, biological pathways are commonly described through a combination of images and text. These pathways contain valuable information, including genes and their relationships, which provide insight into biological mechanisms and precision medicine. Curating pathway information across the literature enables the integration of this information to build a comprehensive knowledge base. While some studies have extracted pathway information from images and text independently, they often overlook the correspondence between the two modalities. In this paper, we present a pathway figure curation system named pathCLIP for identifying genes and gene relations from pathway figures. Our key innovation is the use of an image-text contrastive learning model to learn coordinated embeddings of image snippets and text descriptions of genes and gene relations, thereby improving curation. Our validation results, using pathway figures from PubMed, showed that our multimodal model outperforms models using only a single modality. Additionally, our system effectively curates genes and gene relations from multiple literature sources. A case study on extracting pathway information from non-small cell lung cancer literature further demonstrates the usefulness of our curated pathway information in enhancing related pathways in the KEGG database.
Affiliation(s)
- Fei He
  - School of Information Science and Technology, Northeast Normal University, Changchun 130000, China
  - Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Kai Liu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130000, China
- Zhiyuan Yang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130000, China
- Yibo Chen
  - Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Richard D Hammer
  - School of Medicine, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Mihail Popescu
  - School of Medicine, University of Missouri, Columbia, MO 65211, USA
8
Shin MG, Pico A. Using Published Pathway Figures in Enrichment Analysis and Machine Learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.06.548037. [PMID: 37461614 PMCID: PMC10350053 DOI: 10.1101/2023.07.06.548037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Pathway Figure OCR (PFOCR) is a novel kind of pathway database approaching the breadth and depth of Gene Ontology while providing rich, mechanistic diagrams and direct literature support. PFOCR content is extracted from published pathway figures currently emerging at a rate of 1000 new pathways each month. Here, we compare the pathway information contained in PFOCR against popular pathway databases with respect to overall and disease-specific coverage. In addition to common pathway analysis use cases, we present two advanced case studies demonstrating unique advantages of PFOCR in terms of cancer subtype and grade prediction analyses.
9
Evangelista JE, Xie Z, Marino GB, Nguyen N, Clarke DB, Ma’ayan A. Enrichr-KG: bridging enrichment analysis across multiple libraries. Nucleic Acids Res 2023; 51:W168-W179. [PMID: 37166973 PMCID: PMC10320098 DOI: 10.1093/nar/gkad393] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/06/2023] [Revised: 04/23/2023] [Accepted: 05/02/2023] [Indexed: 05/12/2023]
Abstract
Gene and protein set enrichment analysis is a critical step in the analysis of data collected from omics experiments. Enrichr is a popular gene set enrichment analysis web-server search engine that contains hundreds of thousands of annotated gene sets. While Enrichr has been useful in providing enrichment analysis with many gene set libraries from different categories, integrating enrichment results across libraries and domains of knowledge can further hypothesis generation. To this end, Enrichr-KG is a knowledge graph database and a web-server application that combines selected gene set libraries from Enrichr for integrative enrichment analysis and visualization. The enrichment results are presented as subgraphs made of nodes and links that connect genes to their enriched terms. In addition, users of Enrichr-KG can add gene-gene links, as well as predicted genes to the subgraphs. This graphical representation of cross-library results with enriched and predicted genes can illuminate hidden associations between genes and annotated enriched terms from across datasets and resources. Enrichr-KG currently serves 26 gene set libraries from different categories that include transcription, pathways, ontologies, diseases/drugs, and cell types. To demonstrate the utility of Enrichr-KG we provide several case studies. Enrichr-KG is freely available at: https://maayanlab.cloud/enrichr-kg.
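The abstract describes results as subgraphs of nodes and links connecting genes to their enriched terms, optionally with user-added gene-gene links. The sketch below builds such a graph as an adjacency map under stated assumptions: the enriched terms are already filtered for significance, gene-gene links are kept only when both genes already appear in the subgraph, and nodes are tagged as ("gene", ...) or ("term", ...) so a gene symbol can never collide with a term name. The function name and example genes are hypothetical, and this is not Enrichr-KG's code.

```python
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[str, str]  # ("gene", symbol) or ("term", name)


def enrichment_subgraph(query: Set[str],
                        enriched_terms: Dict[str, Set[str]],
                        gene_gene_links: Iterable[Tuple[str, str]] = ()) -> Dict[Node, Set[Node]]:
    """Return an undirected adjacency map linking query genes to enriched terms.

    enriched_terms: term name -> member genes, assumed already filtered for significance.
    gene_gene_links: optional extra edges, kept only if both genes are already in the graph.
    """
    adj: Dict[Node, Set[Node]] = {}

    def link(a: Node, b: Node) -> None:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Gene-to-term edges: one per query gene that belongs to an enriched term.
    for term, members in enriched_terms.items():
        for gene in members & query:
            link(("gene", gene), ("term", term))

    # Optional gene-gene edges, restricted to genes already shown in the subgraph.
    for a, b in gene_gene_links:
        ga, gb = ("gene", a), ("gene", b)
        if ga in adj and gb in adj:
            link(ga, gb)
    return adj
```

Genes that belong to a term but are not in the query (for example, a pathway member the user did not submit) are left out, which keeps the subgraph focused on the input list.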
Affiliation(s)
- John Erol Evangelista
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Zhuorui Xie
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Giacomo B Marino
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nhi Nguyen
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniel J B Clarke
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Avi Ma’ayan
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
10
He F, Liu K, Yang Z, Hannink M, Hammer RD, Popescu M, Xu D. Applications of cutting-edge artificial intelligence technologies in biomedical literature and document mining. MEDICAL REVIEW (2021) 2023; 3:200-204. [PMID: 37789956 PMCID: PMC10542881 DOI: 10.1515/mr-2023-0011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 05/29/2023] [Indexed: 10/05/2023]
Abstract
The biomedical literature is a vast and invaluable resource for biomedical research. Integrating knowledge from the literature with biomedical data can help biological studies and the clinical decision-making process. Efforts have been made to gather information from the biomedical literature and create biomedical knowledge bases, such as KEGG and Reactome. However, manual curation remains the primary method to retrieve accurate biomedical entities and relationships. Manual curation becomes increasingly challenging and costly as the volume of biomedical publications quickly grows. Fortunately, recent advancements in Artificial Intelligence (AI) technologies offer the potential to automate the process of curating, updating, and integrating knowledge from the literature. Herein, we highlight the AI capabilities to aid in mining knowledge and building the knowledge base from the biomedical literature.
Affiliation(s)
- Fei He
  - School of Information Science and Technology, Northeast Normal University, Changchun, Jilin Province, China
  - Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Kai Liu
  - School of Information Science and Technology, Northeast Normal University, Changchun, Jilin Province, China
- Zhiyuan Yang
  - School of Information Science and Technology, Northeast Normal University, Changchun, Jilin Province, China
- Mark Hannink
  - Department of Biochemistry, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Richard D. Hammer
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, USA
- Mihail Popescu
  - Department of Health Management and Informatics, University of Missouri, Columbia, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
11
Slenter DN, Hemel IMGM, Evelo CT, Bierau J, Willighagen EL, Steinbusch LKM. Extending inherited metabolic disorder diagnostics with biomarker interaction visualizations. Orphanet J Rare Dis 2023; 18:95. [PMID: 37101200 PMCID: PMC10131334 DOI: 10.1186/s13023-023-02683-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 04/02/2023] [Indexed: 04/28/2023]
Abstract
BACKGROUND Inherited Metabolic Disorders (IMDs) are rare diseases where one impaired protein leads to a cascade of changes in the adjacent chemical conversions. IMDs often present with non-specific symptoms, a lack of a clear genotype-phenotype correlation, and de novo mutations, complicating diagnosis. Furthermore, products of one metabolic conversion can be the substrate of another pathway, obscuring biomarker identification and causing overlapping biomarkers for different disorders. Visualization of the connections between metabolic biomarkers and the enzymes involved might aid in the diagnostic process. The goal of this study was to provide a proof-of-concept framework for integrating knowledge of metabolic interactions with real-life patient data before scaling up this approach. This framework was tested on two groups of well-studied and related metabolic pathways (the urea cycle and pyrimidine de novo synthesis). The lessons learned from our approach will help to scale up the framework and support the diagnosis of other less-understood IMDs. METHODS Our framework integrates literature and expert knowledge into machine-readable pathway models, including relevant urine biomarkers and their interactions. The clinical data of 16 previously diagnosed patients with various pyrimidine and urea cycle disorders were visualized on the top 3 relevant pathways. Two expert laboratory scientists evaluated the resulting visualizations to derive a diagnosis. RESULTS The proof-of-concept platform resulted in varying numbers of relevant biomarkers (five to 48), pathways, and pathway interactions for each patient. The two experts reached the same conclusions for all samples with our proposed framework as with the current metabolic diagnostic pipeline. For nine patient samples, the diagnosis was made without knowledge about clinical symptoms or sex. For the remaining seven cases, four interpretations pointed in the direction of a subset of disorders, while three cases were found to be undiagnosable with the available data. Diagnosing these patients would require additional testing besides biochemical analysis. CONCLUSION The presented framework shows how metabolic interaction knowledge can be integrated with clinical data in one visualization, which can be relevant for future analysis of difficult patient cases and untargeted metabolomics data. Several challenges were identified during the development of this framework, which should be resolved before this approach can be scaled up and implemented to support the diagnosis of other (less understood) IMDs. The framework could be extended with other omics data (e.g. genomics, transcriptomics), and phenotypic data, as well as linked to other knowledge captured as Linked Open Data.
Affiliation(s)
- Denise N Slenter
  - Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Irene M G M Hemel
  - Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Chris T Evelo
  - Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Jörgen Bierau
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
  - Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Laura K M Steinbusch
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
12
Pillich RT, Chen J, Churas C, Fong D, Gyori BM, Ideker T, Karis K, Liu SN, Ono K, Pico A, Pratt D. NDEx IQuery: a multi-method network gene set analysis leveraging the Network Data Exchange. Bioinformatics 2023; 39:btad118. [PMID: 36882166 PMCID: PMC10023220 DOI: 10.1093/bioinformatics/btad118] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/21/2022] [Revised: 01/16/2023] [Accepted: 02/17/2023] [Indexed: 03/09/2023]
Abstract
MOTIVATION The investigation of sets of genes using biological pathways is a common task for researchers and is supported by a wide variety of software tools. This type of analysis generates hypotheses about the biological processes that are active or modulated in a specific experimental context. RESULTS The Network Data Exchange Integrated Query (NDEx IQuery) is a new tool for network and pathway-based gene set interpretation that complements or extends existing resources. It combines novel sources of pathways, integration with Cytoscape, and the ability to store and share analysis results. The NDEx IQuery web application performs multiple gene set analyses based on diverse pathways and networks stored in NDEx. These include curated pathways from WikiPathways and SIGNOR, published pathway figures from the last 27 years, machine-assembled networks using the INDRA system, and the new NCI-PID v2.0, an updated version of the popular NCI Pathway Interaction Database. NDEx IQuery's integration with MSigDB and cBioPortal now provides pathway analysis in the context of these two resources. AVAILABILITY AND IMPLEMENTATION NDEx IQuery is available at https://www.ndexbio.org/iquery and is implemented in JavaScript and Java.
Affiliation(s)
- Rudolf T Pillich
  - Department of Medicine, UC San Diego, La Jolla, CA 92093, United States
- Jing Chen
  - Department of Medicine, UC San Diego, La Jolla, CA 92093, United States
- Dylan Fong
  - Department of Medicine, UC San Diego, La Jolla, CA 92093, United States
- Benjamin M Gyori
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, United States
- Trey Ideker
  - Department of Medicine, UC San Diego, La Jolla, CA 92093, United States
- Klas Karis
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, United States
- Sophie N Liu
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
- Keiichiro Ono
  - Department of Medicine, UC San Diego, La Jolla, CA 92093, United States
- Alexander Pico
  - Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Dexter Pratt
  - Department of Medicine, UC San Diego, La Jolla, CA 92093, United States
13
Santangelo BE, Gillenwater LA, Salem NM, Hunter LE. Molecular cartooning with knowledge graphs. Front Bioinform 2022; 2:1054578. [PMID: 36568701 PMCID: PMC9772836 DOI: 10.3389/fbinf.2022.1054578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 11/23/2022] [Indexed: 12/13/2022]
Abstract
Molecular "cartoons," such as pathway diagrams, provide a visual summary of biomedical research results and hypotheses. Their ubiquitous appearance within the literature indicates their universal application in mechanistic communication. A recent survey of pathway diagrams identified 64,643 pathway figures published between 1995 and 2019 with 1,112,551 mentions of 13,464 unique human genes participating in a wide variety of biological processes. Researchers generally create these diagrams using generic diagram editing software that does not itself embody any biomedical knowledge. Biomedical knowledge graphs (KGs) integrate and represent knowledge in a semantically consistent way, systematically capturing biomedical knowledge similar to that in molecular cartoons. KGs have the potential to provide context and precise details useful in drawing such figures. However, KGs cannot generally be translated directly into figures. They include substantial material irrelevant to the scientific point of a given figure and are often more detailed than is appropriate. How could KGs be used to facilitate the creation of molecular diagrams? Here we present a new approach towards cartoon image creation that utilizes the semantic structure of knowledge graphs to aid the production of molecular diagrams. We introduce a set of "semantic graphical actions" that select and transform the relational information between heterogeneous entities (e.g., genes, proteins, pathways, diseases) in a KG to produce diagram schematics that meet the scientific communication needs of the user. These semantic actions search, select, filter, transform, group, arrange, connect and extract relevant subgraphs from KGs based on meaning in biological terms, e.g., a protein upstream of a target in a pathway. 
To demonstrate the utility of this approach, we show how semantic graphical actions on KGs could have been used to produce three existing pathway diagrams in diverse biomedical domains: Down Syndrome, COVID-19, and neuroinflammation. Our focus is on recapitulating the semantic content of the figures, not the layout, glyphs, or other aesthetic aspects. Our results suggest that the use of KGs and semantic graphical actions to produce biomedical diagrams will reduce the effort required and improve the quality of this visual form of scientific communication.
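A minimal Python sketch can make one such action concrete: selecting the entities upstream of a target. It assumes a KG stored as (subject, predicate, object) triples, and it treats a hand-picked set of predicates as directed regulatory edges. The triples, predicate names and the `upstream_subgraph` helper are hypothetical and are not taken from the authors' implementation.

```python
from collections import deque

# Hypothetical toy knowledge graph of (subject, predicate, object) triples.
# Entity and predicate names are illustrative, not taken from any real KG.
TRIPLES = [
    ("GeneA", "activates", "GeneB"),
    ("GeneB", "phosphorylates", "GeneC"),
    ("GeneD", "inhibits", "GeneC"),
    ("GeneC", "participates_in", "PathwayX"),
    ("GeneE", "associated_with", "DiseaseY"),
]

# Assumption: only these predicates count as directed regulatory edges.
REGULATORY = {"activates", "inhibits", "phosphorylates"}


def upstream_subgraph(triples, target, max_hops=2):
    """Return regulatory triples upstream of `target` within `max_hops` steps,
    found by breadth-first search over reversed edges."""
    # Index incoming regulatory edges by the entity they point to.
    incoming = {}
    for s, p, o in triples:
        if p in REGULATORY:
            incoming.setdefault(o, []).append((s, p, o))

    selected = []
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for edge in incoming.get(node, []):
            selected.append(edge)
            source = edge[0]
            if source not in seen:
                seen.add(source)
                queue.append((source, depth + 1))
    return selected


if __name__ == "__main__":
    for s, p, o in upstream_subgraph(TRIPLES, "GeneC"):
        print(f"{s} --{p}--> {o}")
```

The sketch walks reversed edges breadth-first, so `max_hops` bounds the selection. This is one simple way to trim a KG down to the level of detail a figure needs. The paper's own semantic actions may be defined differently.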
14
Gable AL, Szklarczyk D, Lyon D, Matias Rodrigues JF, von Mering C. Systematic assessment of pathway databases, based on a diverse collection of user-submitted experiments. Brief Bioinform 2022; 23:6695266. [PMID: 36088548 PMCID: PMC9487593 DOI: 10.1093/bib/bbac355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Revised: 07/13/2022] [Accepted: 07/30/2022] [Indexed: 11/14/2022]
Abstract
A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways, for a given application. Here, we assess and compare nine existing and two conceptually novel functional classification systems, with respect to their discovery power and generality in gene set enrichment testing. We base our assessment on a collection of nearly 2000 functional genomics datasets provided by users of the STRING database. With these real-life and diverse queries, we assess which systems typically provide the most specific and complete enrichment results. We find many structural and performance differences between classification systems. Overall, the well-established, hierarchically organized pathway annotation systems yield the best enrichment performance, despite covering substantial parts of the human genome in general terms only. On the other hand, the more recent unsupervised annotation systems perform strongest in understudied areas and organisms, and in detecting more specific pathways, albeit with less informative labels.
Affiliation(s)
- Annika L Gable
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- David Lyon
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
15
Mubeen S, Tom Kodamullil A, Hofmann-Apitius M, Domingo-Fernández D. On the influence of several factors on pathway enrichment analysis. Brief Bioinform 2022; 23:bbac143. [PMID: 35453140 PMCID: PMC9116215 DOI: 10.1093/bib/bbac143] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/17/2022] [Revised: 03/21/2022] [Accepted: 03/30/2022] [Indexed: 02/01/2023]
Abstract
Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, which may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish set guidelines, meaningful results are still hampered by a lack of consensus or gold standards around how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview of the influence of these factors. Our work covers a broad spectrum of factors, spanning from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges, which originate from the outlined factors.
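For readers unfamiliar with the underlying computation, the following sketch shows one common enrichment method: over-representation analysis with a one-sided hypergeometric test and Benjamini-Hochberg adjustment. It illustrates a general technique under stated assumptions and is not a method prescribed by the review. The function names and the toy gene universe are hypothetical. The `universe` argument makes one influential setting explicit: the same query scored against a different background gene set yields different p-values.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes in universe, K in pathway, n in query)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def ora(query, pathways, universe):
    """Over-representation analysis: a one-sided hypergeometric p-value per
    pathway, then Benjamini-Hochberg adjusted p-values."""
    universe = set(universe)
    query = set(query) & universe  # genes outside the background are ignored
    N, n = len(universe), len(query)

    raw = {}
    for name, genes in pathways.items():
        genes = set(genes) & universe
        k = len(query & genes)
        raw[name] = hypergeom_sf(k, N, len(genes), n)

    # Benjamini-Hochberg: scale p_(i) by m / i, then enforce monotonicity
    # by taking a running minimum from the largest rank downward.
    ranked = sorted(raw.items(), key=lambda item: item[1])
    m = len(ranked)
    adjusted = {}
    running_min = 1.0
    for i in range(m, 0, -1):
        name, p = ranked[i - 1]
        running_min = min(running_min, p * m / i)
        adjusted[name] = running_min
    return raw, adjusted


if __name__ == "__main__":
    universe = [f"G{i}" for i in range(20)]            # toy background of 20 genes
    pathways = {"P1": universe[0:5], "P2": universe[10:15]}
    raw, adj = ora(universe[0:4], pathways, universe)  # query = G0..G3
    for name in pathways:
        print(name, round(raw[name], 5), round(adj[name], 5))
```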
Affiliation(s)
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Fraunhofer Center for Machine Learning, Germany
- Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Fraunhofer Center for Machine Learning, Germany
- Enveda Biosciences, Boulder, CO 80301, USA
16
Ostaszewski M, Niarakis A, Mazein A, Kuperstein I, Phair R, Orta‐Resendiz A, Singh V, Aghamiri SS, Acencio ML, Glaab E, Ruepp A, Fobo G, Montrone C, Brauner B, Frishman G, Monraz Gómez LC, Somers J, Hoch M, Kumar Gupta S, Scheel J, Borlinghaus H, Czauderna T, Schreiber F, Montagud A, Ponce de Leon M, Funahashi A, Hiki Y, Hiroi N, Yamada TG, Dräger A, Renz A, Naveez M, Bocskei Z, Messina F, Börnigen D, Fergusson L, Conti M, Rameil M, Nakonecnij V, Vanhoefer J, Schmiester L, Wang M, Ackerman EE, Shoemaker JE, Zucker J, Oxford K, Teuton J, Kocakaya E, Summak GY, Hanspers K, Kutmon M, Coort S, Eijssen L, Ehrhart F, Rex DAB, Slenter D, Martens M, Pham N, Haw R, Jassal B, Matthews L, Orlic‐Milacic M, Senff Ribeiro A, Rothfels K, Shamovsky V, Stephan R, Sevilla C, Varusai T, Ravel J, Fraser R, Ortseifen V, Marchesi S, Gawron P, Smula E, Heirendt L, Satagopam V, Wu G, Riutta A, Golebiewski M, Owen S, Goble C, Hu X, Overall RW, Maier D, Bauch A, Gyori BM, Bachman JA, Vega C, Grouès V, Vazquez M, Porras P, Licata L, Iannuccelli M, Sacco F, Nesterova A, Yuryev A, de Waard A, Turei D, Luna A, Babur O, Soliman S, Valdeolivas A, Esteban‐Medina M, Peña‐Chilet M, Rian K, Helikar T, Puniya BL, Modos D, Treveil A, Olbei M, De Meulder B, Ballereau S, Dugourd A, Naldi A, Noël V, Calzone L, Sander C, Demir E, Korcsmaros T, Freeman TC, Augé F, Beckmann JS, Hasenauer J, Wolkenhauer O, Wilighagen EL, Pico AR, Evelo CT, Gillespie ME, Stein LD, Hermjakob H, D'Eustachio P, Saez‐Rodriguez J, Dopazo J, Valencia A, Kitano H, Barillot E, Auffray C, Balling R, Schneider R. COVID19 Disease Map, a computational knowledge repository of virus-host interaction mechanisms. Mol Syst Biol 2021; 17:e10387. 
[PMID: 34664389 PMCID: PMC8524328 DOI: 10.15252/msb.202110387] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 04/07/2021] [Revised: 08/25/2021] [Accepted: 08/26/2021] [Indexed: 12/13/2022]
Abstract
We need to effectively combine the knowledge from surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 molecular mechanisms. The COVID-19 Disease Map (C19DMap) is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. Notably, it is a computational resource for graph-based analyses and disease modelling. To this end, we established a framework of tools, platforms and guidelines necessary for a multifaceted community of biocurators, domain experts, bioinformaticians and computational biologists. The diagrams of the C19DMap, curated from the literature, are integrated with relevant interaction and text mining databases. We demonstrate the application of network analysis and modelling approaches by concrete examples to highlight new testable hypotheses. This framework helps to find signatures of SARS-CoV-2 predisposition, treatment response or prioritisation of drug candidates. Such an approach may help deal with new waves of COVID-19 or similar pandemics in the long-term perspective.
Affiliation(s)
- Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Anna Niarakis
- Université Paris-Saclay, Laboratoire Européen de Recherche pour la Polyarthrite rhumatoïde - Genhotel, Univ Evry, Evry, France
- Lifeware Group, Inria Saclay-Ile de France, Palaiseau, France
- Alexander Mazein
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Inna Kuperstein
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, Paris, France
- Robert Phair
- Integrative Bioinformatics, Inc., Mountain View, CA, USA
- Aurelio Orta-Resendiz
- Institut Pasteur, Université de Paris, Unité HIV, Inflammation et Persistance, Paris, France
- Bio Sorbonne Paris Cité, Université de Paris, Paris, France
- Vidisha Singh
- Université Paris-Saclay, Laboratoire Européen de Recherche pour la Polyarthrite rhumatoïde - Genhotel, Univ Evry, Evry, France
- Sara Sadat Aghamiri
- Inserm - Institut national de la santé et de la recherche médicale, Paris, France
- Marcio Luis Acencio
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Enrico Glaab
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Andreas Ruepp
- Institute of Experimental Genetics (IEG), Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Gisela Fobo
- Institute of Experimental Genetics (IEG), Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Corinna Montrone
- Institute of Experimental Genetics (IEG), Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Barbara Brauner
- Institute of Experimental Genetics (IEG), Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Goar Frishman
- Institute of Experimental Genetics (IEG), Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Luis Cristóbal Monraz Gómez
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, Paris, France
- Julia Somers
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Matti Hoch
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Julia Scheel
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Hanna Borlinghaus
- Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
- Tobias Czauderna
- Faculty of Information Technology, Department of Human-Centred Computing, Monash University, Clayton, Vic., Australia
- Falk Schreiber
- Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
- Faculty of Information Technology, Department of Human-Centred Computing, Monash University, Clayton, Vic., Australia
- Akira Funahashi
- Department of Biosciences and Informatics, Keio University, Yokohama, Japan
- Yusuke Hiki
- Department of Biosciences and Informatics, Keio University, Yokohama, Japan
- Noriko Hiroi
- Graduate School of Media and Governance, Research Institute at SFC, Keio University, Kanagawa, Japan
- Takahiro G Yamada
- Department of Biosciences and Informatics, Keio University, Yokohama, Japan
- Andreas Dräger
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Department of Computer Science, University of Tübingen, Tübingen, Germany
- German Center for Infection Research (DZIF), partner site, Tübingen, Germany
- Alina Renz
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Department of Computer Science, University of Tübingen, Tübingen, Germany
- Muhammad Naveez
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Institute of Applied Computer Systems, Riga Technical University, Riga, Latvia
- Zsolt Bocskei
- Sanofi R&D, Translational Sciences, Chilly-Mazarin, France
- Francesco Messina
- Dipartimento di Epidemiologia Ricerca Pre-Clinica e Diagnostica Avanzata, National Institute for Infectious Diseases 'Lazzaro Spallanzani' I.R.C.C.S., Rome, Italy
- COVID-19 INMI Network Medicine for IDs Study Group, National Institute for Infectious Diseases 'Lazzaro Spallanzani' I.R.C.C.S., Rome, Italy
- Daniela Börnigen
- Bioinformatics Core Facility, Universitätsklinikum Hamburg-Eppendorf, Hamburg, Germany
- Liam Fergusson
- Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, UK
- Marta Conti
- Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Marius Rameil
- Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Vanessa Nakonecnij
- Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Jakob Vanhoefer
- Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Leonard Schmiester
- Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Garching, Germany
- Muying Wang
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, PA, USA
- Emily E Ackerman
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, PA, USA
- Jason E Shoemaker
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Kristina Hanspers
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- Martina Kutmon
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Susan Coort
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Lars Eijssen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht University Medical Centre, Maastricht, The Netherlands
- Friederike Ehrhart
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht University Medical Centre, Maastricht, The Netherlands
- Denise Slenter
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Marvin Martens
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Nhung Pham
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Robin Haw
- MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
- Bijay Jassal
- MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
- Andrea Senff Ribeiro
- MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
- Universidade Federal do Paraná, Curitiba, Brazil
- Karen Rothfels
- MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
- Ralf Stephan
- MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
- Cristoffer Sevilla
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK
- Thawfeek Varusai
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK
- Jean-Marie Ravel
- INSERM UMR_S 1256, Nutrition, Genetics, and Environmental Risk Exposure (NGERE), Faculty of Medicine of Nancy, University of Lorraine, Nancy, France
- Laboratoire de génétique médicale, CHRU Nancy, Nancy, France
- Rupsha Fraser
- Queen's Medical Research Institute, The University of Edinburgh, Edinburgh, UK
- Vera Ortseifen
- Senior Research Group in Genome Research of Industrial Microorganisms, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Silvia Marchesi
- Department of Surgical Science, Uppsala University, Uppsala, Sweden
- Piotr Gawron
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Ewa Smula
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Laurent Heirendt
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Venkata Satagopam
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Guanming Wu
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Anders Riutta
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- Stuart Owen
- Department of Computer Science, The University of Manchester, Manchester, UK
- Carole Goble
- Department of Computer Science, The University of Manchester, Manchester, UK
- Xiaoming Hu
- Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Rupert W Overall
- German Center for Neurodegenerative Diseases (DZNE) Dresden, Dresden, Germany
- Center for Regenerative Therapies Dresden (CRTD), Technische Universität Dresden, Dresden, Germany
- Institute for Biology, Humboldt University of Berlin, Berlin, Germany
- Benjamin M Gyori
- Harvard Medical School, Laboratory of Systems Pharmacology, Boston, MA, USA
- John A Bachman
- Harvard Medical School, Laboratory of Systems Pharmacology, Boston, MA, USA
- Carlos Vega
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Valentin Grouès
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Pablo Porras
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Francesca Sacco
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Denes Turei
- Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Augustin Luna
- cBio Center, Divisions of Biostatistics and Computational Biology, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Ozgun Babur
- Computer Science Department, University of Massachusetts Boston, Boston, MA, USA
- Alberto Valdeolivas
- Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Marina Esteban-Medina
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocio, Sevilla, Spain
- Computational Systems Medicine Group, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, Sevilla, Spain
- Maria Peña-Chilet
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocio, Sevilla, Spain
- Computational Systems Medicine Group, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, Sevilla, Spain
- Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, Sevilla, Spain
- Kinza Rian
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocio, Sevilla, Spain
- Computational Systems Medicine Group, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, Sevilla, Spain
- Tomáš Helikar
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, USA
- Dezso Modos
- Quadram Institute Bioscience, Norwich, UK
- Earlham Institute, Norwich, UK
- Agatha Treveil
- Quadram Institute Bioscience, Norwich, UK
- Earlham Institute, Norwich, UK
- Marton Olbei
- Quadram Institute Bioscience, Norwich, UK
- Earlham Institute, Norwich, UK
- Stephane Ballereau
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Aurélien Dugourd
- Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Institute of Experimental Medicine and Systems Biology, Faculty of Medicine, RWTH Aachen University, Aachen, Germany
- Vincent Noël
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, Paris, France
- Laurence Calzone
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, Paris, France
- Chris Sander
- cBio Center, Divisions of Biostatistics and Computational Biology, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Emek Demir
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Tom C Freeman
- The Roslin Institute, University of Edinburgh, Edinburgh, UK
- Franck Augé
- Sanofi R&D, Translational Sciences, Chilly-Mazarin, France
- Jan Hasenauer
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Interdisciplinary Research Unit Mathematics and Life Sciences, University of Bonn, Bonn, Germany
- Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Egon L Wilighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- Chris T Evelo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Marc E Gillespie
- MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
- St. John's University College of Pharmacy and Health Sciences, Queens, NY, USA
- Lincoln D Stein
- MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Henning Hermjakob
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK
- Joaquin Dopazo
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocio, Sevilla, Spain
- Computational Systems Medicine Group, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, Sevilla, Spain
- Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, Sevilla, Spain
- FPS/ELIXIR-es, Hospital Virgen del Rocío, Sevilla, Spain
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Hiroaki Kitano
- Systems Biology Institute, Tokyo, Japan
- Okinawa Institute of Science and Technology Graduate School, Okinawa, Japan
- Emmanuel Barillot
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, Paris, France
- Charles Auffray
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Rudi Balling
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
17
Hanspers K, Kutmon M, Coort SL, Digles D, Dupuis LJ, Ehrhart F, Hu F, Lopes EN, Martens M, Pham N, Shin W, Slenter DN, Waagmeester A, Willighagen EL, Winckers LA, Evelo CT, Pico AR. Ten simple rules for creating reusable pathway models for computational analysis and visualization. PLoS Comput Biol 2021; 17:e1009226. [PMID: 34411100 PMCID: PMC8375987 DOI: 10.1371/journal.pcbi.1009226] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/23/2022]
Affiliation(s)
- Kristina Hanspers
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
- Martina Kutmon
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Susan L. Coort
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Daniela Digles
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Lauren J. Dupuis
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Friederike Ehrhart
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Finterly Hu
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Elisson N. Lopes
- Instituto de Ciencias Biologicas, Departamento de Bioquimica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Marvin Martens
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Nhung Pham
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Woosub Shin
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Denise N. Slenter
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Egon L. Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Laurent A. Winckers
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Chris T. Evelo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Alexander R. Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
18
Figueiredo RQ, Raschka T, Kodamullil AT, Hofmann-Apitius M, Mubeen S, Domingo-Fernández D. Towards a global investigation of transcriptomic signatures through co-expression networks and pathway knowledge for the identification of disease mechanisms. Nucleic Acids Res 2021; 49:7939-7953. [PMID: 34197603 PMCID: PMC8373148 DOI: 10.1093/nar/gkab556] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/18/2021] [Revised: 05/17/2021] [Accepted: 06/11/2021] [Indexed: 12/17/2022]
Abstract
We attempt to address a key question in the joint analysis of transcriptomic data: can we correlate the patterns we observe in transcriptomic datasets to known interactions and pathway knowledge to broaden our understanding of disease pathophysiology? We present a systematic approach that sheds light on the patterns observed in hundreds of transcriptomic datasets from over sixty indications by using pathways and molecular interactions as a template. Our analysis employs transcriptomic datasets to construct dozens of disease-specific co-expression networks, alongside a human protein-protein interactome network. Leveraging the interoperability between these two network templates, we explore patterns both common and particular to these diseases on three different levels. Firstly, at the node-level, we identify the most and least common proteins across diseases and evaluate their consistency against the interactome as a proxy for their prevalence in the scientific literature. Secondly, we overlay both network templates to analyze common correlations and interactions across diseases at the edge-level. Thirdly, we explore the similarity between patterns observed at the disease-level and pathway knowledge to identify signatures associated with specific diseases and indication areas. Finally, we present a case scenario in schizophrenia, where we show how our approach can be used to investigate disease pathophysiology.
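The edge-level overlay described above can be sketched as follows. This minimal Python example assumes that co-expression edges are defined by Pearson correlation with a fixed absolute threshold (0.8), and that a plain edge list stands in for the protein-protein interactome. The thresholding choice, gene names and function names are all hypothetical, and the example does not reproduce the authors' pipeline.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant profiles; treated as uncorrelated here
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.8):
    """Undirected edges between genes whose |r| across samples is >= threshold.
    `expr` maps gene -> list of expression values (one per sample)."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add(frozenset((g1, g2)))
    return edges


def overlay(coexp_edges, interactome):
    """Edge-level overlay: edges found in both templates vs. co-expression only."""
    ppi = {frozenset(edge) for edge in interactome}
    return coexp_edges & ppi, coexp_edges - ppi


if __name__ == "__main__":
    # Hypothetical expression profiles for four genes across four samples.
    expr = {
        "A": [1, 2, 3, 4],
        "B": [2, 4, 6, 8],
        "C": [4, 3, 2, 1],
        "D": [1, 3, 1, 3],
    }
    interactome = [("A", "B"), ("C", "D")]  # hypothetical PPI edges
    shared, coexp_only = overlay(coexpression_edges(expr), interactome)
    print("supported by both:", sorted(sorted(e) for e in shared))
    print("co-expression only:", sorted(sorted(e) for e in coexp_only))
```

Edges are stored as `frozenset` pairs, so the overlay does not depend on the order in which each interaction was listed.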
Affiliation(s)
- Rebeca Queiroz Figueiredo
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany; Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany
- Tamara Raschka
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany; Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany; Fraunhofer Center for Machine Learning, Germany
- Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany; Causality Biomodels, Kinfra Hi-Tech Park, Kalamassery, Cochin, Kerala, India
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany; Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany; Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany; Fraunhofer Center for Machine Learning, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany; Fraunhofer Center for Machine Learning, Germany; Enveda Biosciences, Boulder, CO 80301, USA
19
Martens M, Ammar A, Riutta A, Waagmeester A, Slenter D, Hanspers K, Miller RA, Digles D, Lopes E, Ehrhart F, Dupuis LJ, Winckers LA, Coort S, Willighagen EL, Evelo CT, Pico AR, Kutmon M. WikiPathways: connecting communities. Nucleic Acids Res 2021; 49:D613-D621. [PMID: 33211851 PMCID: PMC7779061 DOI: 10.1093/nar/gkaa1024] [Citation(s) in RCA: 455] [Impact Index Per Article: 151.7] [Received: 09/15/2020] [Revised: 10/13/2020] [Accepted: 10/19/2020] [Indexed: 12/17/2022]
Abstract
WikiPathways (https://www.wikipathways.org) is a biological pathway database known for its collaborative nature and open science approaches. Built on the core idea of the scientific community developing and curating biological knowledge in pathway models, WikiPathways lowers all barriers to accessing and using its content. A growing number of content creators, initiatives, projects and tools have started using WikiPathways. Central to this growth and increased use are the various communities that focus on particular subsets of molecular pathways, such as those for rare diseases and lipid metabolism. Knowledge extracted from published pathway figures using optical character recognition and named entity recognition helps prioritize pathway development. We show the growth of WikiPathways over the last three years, highlight the new communities and collaborations of pathway authors and curators, and describe various technologies for connecting to external resources and initiatives. The road toward a sustainable, community-driven pathway database leads through integration with other resources such as Wikidata and through allowing more use, curation and redistribution of WikiPathways content.
Affiliation(s)
- Marvin Martens
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Ammar Ammar
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Anders Riutta
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, USA
- Denise N Slenter
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Kristina Hanspers
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, USA
- Ryan A Miller
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Daniela Digles
- Department of Pharmaceutical Chemistry/Pharmacoinformatics Research Group, University of Vienna, 1090 Vienna, Austria
- Elisson N Lopes
- Instituto de Ciencias Biologicas, Departamento de Bioquimica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Friederike Ehrhart
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Lauren J Dupuis
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Laurent A Winckers
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Susan L Coort
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Chris T Evelo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 EN Maastricht, the Netherlands
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, USA
- Martina Kutmon
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 EN Maastricht, the Netherlands