1
Author-sourced capture of pathway knowledge in computable form using Biofactoid. eLife 2021; 10:68292. [PMID: 34860157 PMCID: PMC8683078 DOI: 10.7554/elife.68292] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/11/2021] [Accepted: 12/02/2021] [Indexed: 01/04/2023] Open
Abstract
Making the knowledge contained in scientific papers machine-readable and formally computable would allow researchers to take full advantage of this information by enabling integration with other knowledge sources to support data analysis and interpretation. Here we describe Biofactoid, a web-based platform that allows scientists to specify networks of interactions between genes, their products, and chemical compounds, and then translates this information into a representation suitable for computational analysis, search and discovery. We also report the results of a pilot study to encourage the wide adoption of Biofactoid by the scientific community.
2
Pathway Commons 2019 Update: integration, analysis and exploration of pathway data. Nucleic Acids Res 2020; 48:D489-D497. [PMID: 31647099 PMCID: PMC7145667 DOI: 10.1093/nar/gkz946] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 09/15/2019] [Revised: 10/07/2019] [Accepted: 10/10/2019] [Indexed: 12/14/2022] Open
Abstract
Pathway Commons (https://www.pathwaycommons.org) is an integrated resource of publicly available information about biological pathways including biochemical reactions, assembly of biomolecular complexes, transport and catalysis events and physical interactions involving proteins, DNA, RNA, and small molecules (e.g. metabolites and drug compounds). Data is collected from multiple providers in standard formats, including the Biological Pathway Exchange (BioPAX) language and the Proteomics Standards Initiative Molecular Interactions format, and then integrated. Pathway Commons provides biologists with (i) tools to search this comprehensive resource, (ii) a download site offering integrated bulk sets of pathway data (e.g. tables of interactions and gene sets), (iii) reusable software libraries for working with pathway information in several programming languages (Java, R, Python and JavaScript) and (iv) a web service for programmatically querying the entire dataset. Visualization of pathways is supported using the Systems Biology Graphical Notation (SBGN). Pathway Commons currently contains data from 22 databases with 4794 detailed human biochemical processes (i.e. pathways) and ∼2.3 million interactions. To enhance the usability of this large resource for end-users, we develop and maintain interactive web applications and training materials that enable pathway exploration and advanced analysis.
3
Abstract 3451: Interpreting gene lists from -omics experiments. Cancer Res 2019. [DOI: 10.1158/1538-7445.am2019-3451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Understanding the mechanisms responsible for a cellular behavior often begins with observations of genes and gene products. Depending on the type of experiment, the number of resulting genes can be small, but increasingly, researchers are faced with many thousands of measurements, as in the case of transcriptomic or protein-DNA binding observations. Here, we describe ways to pair experimental results consisting of one or more genes with analysis tools, with the overall aim of making results more biologically interpretable. In certain cases, experimental approaches such as screens for essential genes can generate one or a few ‘genes of interest’ and there is a desire to understand their relationship to one another as well as discover links to additional, interesting genes. To this end, ‘GeneMANIA’ is a web tool that accepts gene names and returns a network visualization of related genes based on similarity in expression, localization, protein domains and those involved in physical interactions. Likewise, ‘PCViz’ is a web tool that displays a network of interactions drawn from Pathway Commons, a web resource for pathway and interaction knowledge. In cases where experiments generate a lengthy list of genes, for instance, transcriptomic measurements, there is a desire to understand their relevance to a phenotype of interest. Pathway enrichment analysis methods aim to summarize gene lists as pathways, which have a closer link to cell function. An online ‘Guide’ by Pathway Commons includes workflows that illustrate how to chain together software tools to identify pathways from the corresponding gene-level data, then organize and summarize the pathway-level results in an interactive visualization known as an Enrichment Map. For those wishing to drill down to individual pathways, Pathway Commons offers a set of web apps, including ‘Search’ that enables users to query by keyword and visualize ranked search results.
Ongoing development of web apps aims to enhance the accessibility of pathways and integrate support for analysis and visualization of experimental data. The full complement of data, tools and resources offered by Pathway Commons in support of pathway analysis is described.
Citation Format: Augustin Luna, Jeffrey V. Wong, Emek Demir, Igor Rodchenkov, Özgün Babur, Chris Sander, Gary D. Bader. Interpreting gene lists from -omics experiments [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 3451.
4
Abstract 1284: How can you interpret gene lists from -omics experiments. Cancer Res 2018. [DOI: 10.1158/1538-7445.am2018-1284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Understanding the mechanisms responsible for a cellular behaviour often begins with observations of genes and gene products. Depending on the type of experiment, the number of resulting genes can be small, but increasingly, researchers are faced with many thousands of measurements, as in the case of transcriptomic or protein-DNA binding observations. Here, we describe ways to pair experimental results consisting of one or more genes with analysis tools, with the overall aim of making results more biologically interpretable. In certain cases, experimental approaches such as screens for essential genes can generate one or a few ‘genes of interest’ and there is a desire to understand their relationship to one another as well as discover links to additional, interesting genes. To this end, ‘GeneMANIA’ is a web tool that accepts gene names and returns a network visualization of related genes based on similarity in expression, localization, protein domains and those involved in physical interactions. Likewise, ‘PCViz’ is a web tool that displays a network of interactions drawn from Pathway Commons, a web resource for pathway and interaction knowledge. In cases where experiments generate a lengthy list of genes, for instance, transcriptomic measurements, there is a desire to understand their relevance to a phenotype of interest. Pathway enrichment analysis methods aim to summarize gene lists as pathways, which have a closer link to cell function. An online ‘Guide’ by Pathway Commons includes workflows that illustrate how to chain together software tools to identify pathways from the corresponding gene-level data, then organize and summarize the pathway-level results in an interactive visualization known as an Enrichment Map. For those wishing to drill down to individual pathways, Pathway Commons offers a set of web apps, including ‘Search’ that enables users to query by keyword and visualize ranked search results.
Ongoing development of web apps aims to enhance the accessibility of pathways and integrate support for analysis and visualization of experimental data. The full complement of data, tools and resources offered by Pathway Commons in support of pathway analysis is described.
Citation Format: Jeffrey V. Wong, Augustin Luna, Emek Demir, Igor Rodchenkov, Özgün Babur, Chris Sander, Gary D. Bader. How can you interpret gene lists from -omics experiments [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 1284.
5
Abstract
A rapidly growing corpus of formal, computable pathway information can be used to answer important biological questions including finding non-trivial connections between cellular processes, identifying significantly altered portions of the cellular network in a disease state and building predictive models that can be used for precision medicine. Due to its complexity and fragmented nature, however, working with pathway data is still difficult. We present Paxtools, a Java library that contains algorithms, software components and converters for biological pathways represented in the standard BioPAX language. Paxtools allows scientists to focus on their scientific problem by removing technical barriers to access and analyse pathway information. Paxtools can run on any platform that has a Java Runtime Environment and was tested on most modern operating systems. Paxtools is open source and is available under the Lesser GNU public license (LGPL), which allows users to freely use the code in their software systems with a requirement for attribution. Source code for the current release (4.2.0) can be found in Software S1. A detailed manual for obtaining and using Paxtools can be found in Protocol S1. The latest sources and release bundles can be obtained from biopax.org/paxtools.
6
Abstract
Motivation: BioPAX is a standard language for representing complex cellular processes, including metabolic networks, signal transduction and gene regulation. Owing to the inherent complexity of a BioPAX model, searching for a specific type of subnetwork can be non-trivial and difficult. Results: We developed an open source and extensible framework for defining and searching graph patterns in BioPAX models. We demonstrate its use with a sample pattern that captures directed signaling relations between proteins. We provide search results for the pattern obtained from the Pathway Commons database and compare these results with the current data in signaling databases SPIKE and SignaLink. Results show that a pattern search in public pathway data can identify a substantial amount of signaling relations that do not exist in signaling databases. Availability: BioPAX-pattern software was developed in Java. Source code and documentation are freely available at http://code.google.com/p/biopax-pattern under Lesser GNU Public License. Contact: patternsearch@cbio.mskcc.org Supplementary information: Supplementary data are available at Bioinformatics online.
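The idea of a graph pattern that captures a directed signaling relation can be illustrated with a toy sketch. This is not the BioPAX-pattern API (which is a Java framework); the data layout, the function name `find_controls_state_change` and the specific rule ("a protein controls a reaction that changes the modification state of a different protein") are assumptions made here purely for illustration.

```python
def find_controls_state_change(controls, reactions):
    """Return sorted (controller, target) pairs matching a toy pattern.

    controls:  list of (controller_protein, reaction_id) pairs.
    reactions: dict mapping reaction_id to
               ((input_protein, input_state), (output_protein, output_state)).
    Both structures are hypothetical stand-ins for a real pathway model.
    """
    relations = set()
    for controller, reaction_id in controls:
        (in_protein, in_state), (out_protein, out_state) = reactions[reaction_id]
        same_protein = in_protein == out_protein   # the reaction modifies one protein
        state_changed = in_state != out_state      # its state actually changes
        not_self = controller != in_protein        # ignore self-regulation in this sketch
        if same_protein and state_changed and not_self:
            relations.add((controller, in_protein))
    return sorted(relations)


# Toy model: MAP2K1 controls the phosphorylation of MAPK1.
toy_reactions = {
    "r1": (("MAPK1", "unmodified"), ("MAPK1", "phosphorylated")),
    "r2": (("TP53", "cytoplasm"), ("TP53", "cytoplasm")),  # no state change
}
toy_controls = [("MAP2K1", "r1"), ("MDM2", "r2")]
signaling_relations = find_controls_state_change(toy_controls, toy_reactions)
```

Only the first control matches the pattern, because the second reaction leaves the protein state unchanged.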
7
Abstract
Summary: BioPAX is a community-developed standard language for biological pathway data. A key functionality required for efficient BioPAX data exchange is validation, i.e. detecting errors and inconsistencies in BioPAX documents. The BioPAX Validator is a command-line tool, Java library and online web service for BioPAX that performs >100 classes of consistency checks. Availability and implementation: The validator recognizes common syntactic errors and semantic inconsistencies and reports them in a customizable human readable format. It can also automatically fix some errors and normalize BioPAX data. Since its release, the validator has become a critical tool for the pathway informatics community, detecting thousands of errors and helping substantially increase the conformity and uniformity of BioPAX-formatted data. The BioPAX Validator is open source and released under the LGPL v3 license. All sources, binaries and documentation can be found at sf.net/p/biopax, and the latest stable version of the web application is available at biopax.org/validator. Contact: igor.rodchenkov@utoronto.ca or gary.bader@utoronto.ca
8
PPISURV: a novel bioinformatics tool for uncovering the hidden role of specific genes in cancer survival outcome. Oncogene 2013; 33:1621-8. [PMID: 23686313 DOI: 10.1038/onc.2013.119] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 11/15/2012] [Revised: 01/31/2013] [Accepted: 02/07/2013] [Indexed: 12/31/2022]
Abstract
Multiple clinical studies have correlated gene expression with survival outcome in cancer on a genome-wide scale. However, in many cases, no obvious correlation between expression of well-known tumour-related genes (that is, p53, p73 and p21) and survival rates of patients has been observed. This can be mainly explained by the complex molecular mechanisms involved in cancer, which mask the clinical relevance of a gene with multiple functions if only gene expression status is considered. As we demonstrate here, in many such cases, the expression of the gene interaction partners (gene 'interactome') correlates significantly with cancer survival and is indicative of the role of that gene in cancer. On the basis of this principle, we have implemented a free online datamining tool (http://www.bioprofiling.de/PPISURV). PPISURV automatically correlates expression of an input gene interactome with survival rates on >40 publicly available clinical expression data sets covering various tumours involving about 8000 patients in total. To derive the query gene interactome, PPISURV employs several public databases including protein-protein interactions, regulatory and signalling pathways and protein post-translational modifications.
9
10
Abstract
Pathway Commons (http://www.pathwaycommons.org) is a collection of publicly available pathway data from multiple organisms. Pathway Commons provides a web-based interface that enables biologists to browse and search a comprehensive collection of pathways from multiple sources represented in a common language, a download site that provides integrated bulk sets of pathway information in standard or convenient formats and a web service that software developers can use to conveniently query and access all data. Database providers can share their pathway data via a common repository. Pathways include biochemical reactions, complex assembly, transport and catalysis events and physical interactions involving proteins, DNA, RNA, small molecules and complexes. Pathway Commons aims to collect and integrate all public pathway data available in standard formats. Pathway Commons currently contains data from nine databases with over 1400 pathways and 687 000 interactions and will be continually expanded and updated.
11
Abstract
BioPAX (Biological Pathway Exchange) is a standard language to represent biological pathways at the molecular and cellular level. Its major use is to facilitate the exchange of pathway data (http://www.biopax.org). Pathway data captures our understanding of biological processes, but its rapid growth necessitates development of databases and computational tools to aid interpretation. However, the current fragmentation of pathway information across many databases with incompatible formats presents barriers to its effective use. BioPAX solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. BioPAX was created through a community process. Through BioPAX, millions of interactions organized into thousands of pathways across many organisms, from a growing number of sources, are available. Thus, large amounts of pathway data are available in a computable form to support visualization, analysis and biological discovery.
12
Abstract
CCancer is an automatically collected database of gene lists, which were reported mostly by experimental studies in various biological and clinical contexts. At the moment, the database covers 3369 gene lists extracted from 2644 papers published in ∼80 peer-reviewed journals. As input, CCancer accepts a gene list. An enrichment analysis is implemented to generate, as output, a highly informative survey of recently published studies that report gene lists that significantly intersect with the query gene list. A report on gene pairs from the input list that were frequently reported together by other biological studies is also provided. CCancer is freely available at http://mips.helmholtz-muenchen.de/proj/ccancer.
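The abstract does not say which statistic the enrichment analysis uses. One common way to decide whether a query gene list "significantly intersects" a published list is a one-sided hypergeometric test, sketched below under that assumption. The function names, the toy gene lists and the universe size of 20000 are all illustrative and are not taken from CCancer.

```python
from math import comb


def hypergeom_tail(overlap, query_size, list_size, universe_size):
    """P(X >= overlap) for X ~ Hypergeometric(universe_size, list_size, query_size)."""
    upper = min(query_size, list_size)
    total = comb(universe_size, query_size)
    favourable = sum(
        comb(list_size, k) * comb(universe_size - list_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


def rank_gene_lists(query, published_lists, universe_size):
    """Rank published gene lists by the significance of their overlap with the query."""
    query = set(query)
    results = []
    for name, genes in published_lists.items():
        genes = set(genes)
        shared = query & genes
        p_value = hypergeom_tail(len(shared), len(query), len(genes), universe_size)
        results.append((name, sorted(shared), p_value))
    return sorted(results, key=lambda row: row[2])


# Hypothetical query and published lists.
query_genes = ["TP53", "MDM2", "CDKN1A", "BAX"]
published = {
    "study_A": ["TP53", "MDM2", "CDKN1A", "ATM"],
    "study_B": ["EGFR", "KRAS", "BAX", "BRAF", "MYC"],
}
ranking = rank_gene_lists(query_genes, published, universe_size=20000)
```

In practice the p-values would also need correction for multiple testing, since the query is compared against thousands of lists.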
13
PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks. Proteomics 2009; 9:2740-9. [PMID: 19405022 DOI: 10.1002/pmic.200800612] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Indexed: 11/12/2022]
Abstract
Recent advances in experimental technologies allow for the detection of a complete cell proteome. Proteins that are expressed at a particular cell state or in a particular compartment as well as proteins with differential expression between various cell states are commonly delivered by many proteomics studies. Once a list of proteins is derived, a major challenge is to interpret the identified set of proteins in the biological context. Protein-protein interaction (PPI) data represents abundant information that can be employed for this purpose. However, these data have not yet been fully exploited due to the absence of a methodological framework that can integrate this type of information. Here, we propose to infer a network model from an experimentally identified protein list based on the available information about the topology of the global PPI network. We propose to use a Monte Carlo simulation procedure to compute the statistical significance of the inferred models. The method has been implemented as a freely available web-based tool, PPI spider (http://mips.helmholtz-muenchen.de/proj/ppispider). To support the practical significance of PPI spider, we collected several hundreds of recently published experimental proteomics studies that reported lists of proteins in various biological contexts. We reanalyzed them using PPI spider and demonstrated that in most cases PPI spider could provide statistically significant hypotheses that are helpful for understanding the protein list.
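A minimal sketch of a Monte Carlo significance estimate of this kind follows. It is a simplification and not the PPI spider procedure itself. The test statistic (number of direct interactions among the listed proteins), the uniform random sampling of protein sets and all names are assumptions chosen for illustration.

```python
import random


def count_internal_edges(proteins, edges):
    """Count interactions whose two partners both belong to the protein set."""
    members = set(proteins)
    return sum(1 for a, b in edges if a in members and b in members)


def monte_carlo_p_value(protein_list, edges, n_samples=1000, seed=0):
    """Estimate how often a random protein set is at least as connected as the query."""
    rng = random.Random(seed)
    universe = sorted({p for edge in edges for p in edge} | set(protein_list))
    size = len(set(protein_list))
    observed = count_internal_edges(protein_list, edges)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(universe, size)
        if count_internal_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_samples + 1)


# Toy network: a fully connected group of four proteins plus a sparse chain.
clique = ["A", "B", "C", "D"]
toy_edges = [(a, b) for i, a in enumerate(clique) for b in clique[i + 1:]]
chain = [f"P{i}" for i in range(16)]
toy_edges += list(zip(chain, chain[1:]))

observed_edges, p_value = monte_carlo_p_value(clique, toy_edges)
```

A real analysis would typically use a null model that preserves properties such as node degree, since highly connected proteins inflate connectivity by chance.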
14
Protein sequence-structure compatibility criteria in terms of statistical hypothesis testing. Protein Engineering 1997; 10:635-46. [PMID: 9278276 DOI: 10.1093/protein/10.6.635] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 02/05/2023]
Abstract
The assignment of query protein sequences to probable folds in a threading approach is based on the statistical analysis (learning) of structural properties of amino acids in known protein structures. We formalize the recognition problem in terms of mathematical statistics, namely statistical hypothesis testing. Our general formulation leads to various mathematical forms of a decision rule function for evaluation of the quality of a sequence-structure fit. Three criteria were derived according to a likelihood ratio approach. Two of them have new functional forms while the third happens to coincide with the mean force potential function previously derived under the additional assumption of the Boltzmann law. New decision rule functions employ (i) the Parzen estimator of a probability density and (ii) the newly introduced non-parametric statistic with known asymptotic distribution. We compared criteria efficiency by a 'structure seeks sequence' search for three highly populated template folds through a query library of non-homologous sequences of proteins with known 3D structure using residue accessibility as an environmental variable. Various criteria reflect different underlying statistical propositions and thus often recognize diverse correct sequence-structure matches. On the other hand, if an amino acid sequence is recognized as compatible with a template by each of three decision rules it appears that one can make a more reliable inference of sequence-structure relationship since almost all false positives obtained by the three criteria differ.
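To make the likelihood-ratio idea concrete, the sketch below scores a sequence-structure fit as a sum of log ratios between an amino-acid-specific density of residue accessibility and a background density, with both densities estimated by a Gaussian-kernel Parzen estimator. The abstract does not give the functional forms of the paper's criteria, so the kernel choice, the bandwidth, the epsilon guard and the toy training values are assumptions for illustration only.

```python
import math


def parzen_density(x, samples, bandwidth):
    """Gaussian-kernel Parzen estimate of a one-dimensional density at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def log_likelihood_ratio(sequence, accessibilities, training, background,
                         bandwidth=0.1, eps=1e-300):
    """Sum over positions of log[ p(accessibility | residue) / p(accessibility) ]."""
    score = 0.0
    for residue, acc in zip(sequence, accessibilities):
        p_specific = parzen_density(acc, training[residue], bandwidth)
        p_background = parzen_density(acc, background, bandwidth)
        # eps guards against log(0) if a density underflows.
        score += math.log((p_specific + eps) / (p_background + eps))
    return score


# Hypothetical relative accessibilities observed in known structures.
training = {
    "L": [0.05, 0.10, 0.15],  # leucine: mostly buried
    "K": [0.70, 0.80, 0.90],  # lysine: mostly exposed
}
background = training["L"] + training["K"]

good_fit = log_likelihood_ratio("LK", [0.10, 0.80], training, background)
poor_fit = log_likelihood_ratio("LK", [0.80, 0.10], training, background)
```

A positive score favours the hypothesis that the sequence is compatible with the template environment, and a negative score favours the background hypothesis.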