1
Reprint of “Abstraction for data integration: Fusing mammalian molecular, cellular and phenotype big datasets for better knowledge extraction”. Comput Biol Chem 2015; 59 Pt B:123-38. [DOI: 10.1016/j.compbiolchem.2015.08.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/09/2015] [Revised: 06/04/2015] [Accepted: 06/05/2015] [Indexed: 12/21/2022]
2
Rouillard AD, Wang Z, Ma’ayan A. Publisher’s Note: Abstraction for data integration: Fusing mammalian molecular, cellular and phenotype big datasets for better knowledge extraction. Comput Biol Chem 2015; 58:104-19. [PMID: 26101093 PMCID: PMC4675694 DOI: 10.1016/j.compbiolchem.2015.06.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/09/2015] [Revised: 06/04/2015] [Accepted: 06/05/2015] [Indexed: 12/27/2022]
Abstract
With advances in genomics, transcriptomics, metabolomics and proteomics, more expansive electronic clinical record monitoring, and advances in computation, we have entered the Big Data era in biomedical research. Data gathering is growing rapidly, while only a small fraction of these data is converted into useful knowledge or reused in future studies. An important but often overlooked concept for improving this situation is data abstraction: fusing and reusing biomedical datasets from diverse resources frequently requires it. Here we summarize some of the major Big Data biomedical research resources for genomics, proteomics and phenotype data collected from mammalian cells, tissues and organisms. We then suggest simple data abstraction methods for fusing these diverse but related data. Finally, we demonstrate examples of the potential utility of such data integration efforts, while warning about the inherent biases that exist within such data.
Affiliation(s)
- Andrew D. Rouillard
- Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, NY 10029
- BD2K-LINCS Data Coordination and Integration Center
- Illuminating the Druggable Genome Knowledge Management Center
- Zichen Wang
- Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, NY 10029
- BD2K-LINCS Data Coordination and Integration Center
- Illuminating the Druggable Genome Knowledge Management Center
- Avi Ma’ayan
- Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place Box 1215, New York, NY 10029
- BD2K-LINCS Data Coordination and Integration Center
- Illuminating the Druggable Genome Knowledge Management Center
3
Li W, Espinal-Enríquez J, Simpfendorfer KR, Hernández-Lemus E. A survey of disease connections for CD4+ T cell master genes and their directly linked genes. Comput Biol Chem 2015; 59 Pt B:78-90. [PMID: 26411796 DOI: 10.1016/j.compbiolchem.2015.08.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/09/2015] [Revised: 08/18/2015] [Accepted: 08/21/2015] [Indexed: 02/07/2023]
Abstract
Genome-wide association studies and other genetic analyses have identified a large number of genes and variants implicating a variety of disease etiological mechanisms. It is imperative for the study of human diseases to place these genetic findings into a coherent functional context. Here we use systems biology tools to examine the disease connections of five master genes for CD4+ T cell subtypes (TBX21, GATA3, RORC, BCL6, and FOXP3). We compiled a list of genes that interact functionally with the master genes (through protein-protein interactions or by acting in the same pathway), then surveyed their disease connections, supported either by experimental evidence or by genetic association. Embryonic lethal genes (also known as essential genes) are over-represented among the master genes and their interacting genes (55% versus 40% in other genes). Transcription factors are significantly enriched among genes interacting with the master genes (63% versus 10% in other genes). Predicted haploinsufficiency is a feature of most of these genes. Disease-connected genes are also enriched in this list: 42% of these genes have a disease connection according to Online Mendelian Inheritance in Man (OMIM) (versus 23% in other genes), and 74% are associated with some disease or phenotype in a Genome Wide Association Study (GWAS) (versus 43% in other genes). Not all of the diseases connected to the surveyed genes are immune related, which may indicate pleiotropic functions of the master regulator genes and their associated genes.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA.
- Jesús Espinal-Enríquez
- Computational Genomics Department, National Institute of Genomic Medicine, México, D.F., Mexico; Complexity in Systems Biology, Center for Complexity Sciences, Universidad Nacional Autónoma de México, México, D.F., Mexico
- Kim R Simpfendorfer
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA
- Enrique Hernández-Lemus
- Computational Genomics Department, National Institute of Genomic Medicine, México, D.F., Mexico; Complexity in Systems Biology, Center for Complexity Sciences, Universidad Nacional Autónoma de México, México, D.F., Mexico
4
DYVIPAC: an integrated analysis and visualisation framework to probe multi-dimensional biological networks. Sci Rep 2015. [PMID: 26220783 PMCID: PMC4518224 DOI: 10.1038/srep12569] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/20/2022]
Abstract
Biochemical networks are dynamic, multi-dimensional systems consisting of tens or hundreds of molecular components. Diseases such as cancer commonly arise from changes in the dynamics of signalling and gene regulatory networks caused by genetic alterations. Elucidating network dynamics in health and disease is crucial for understanding disease mechanisms and deriving effective therapeutic strategies. However, current approaches to analysing and visualising system dynamics often provide only low-dimensional projections of the network dynamics, which do not convey the multi-dimensional picture of system behaviour. More efficient and reliable methods for multi-dimensional systems analysis and visualisation are therefore required. To address this issue, here we present an integrated analysis and visualisation framework for high-dimensional network behaviour that exploits the advantages of parallel coordinates graphs. We demonstrate the applicability of the framework, named "Dynamics Visualisation based on Parallel Coordinates" (DYVIPAC), to a variety of signalling networks with differing topological wirings and dynamic properties. The framework proved useful for acquiring an integrated understanding of system behaviour.
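Parallel coordinates, the visual device the DYVIPAC abstract builds on, give each dimension of a multi-dimensional point its own vertical axis and draw the point as a polyline across those axes. The minimal sketch below computes that mapping, with per-axis min-max scaling so that dimensions with different units share one vertical range, for a few hypothetical simulated steady states. The function name `parallel_coordinates` and the data are illustrative assumptions, not part of DYVIPAC.

```python
from typing import List, Tuple


def parallel_coordinates(rows: List[List[float]]) -> List[List[Tuple[float, float]]]:
    """Map each row (one network state) to a polyline over evenly spaced axes.

    Axis j sits at x = j; the y value is the row's j-th component, min-max
    scaled to [0, 1] over all rows (a common convention, assumed here).
    """
    if not rows:
        return []
    n_dims = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_dims)]
    maxs = [max(r[j] for r in rows) for j in range(n_dims)]
    polylines = []
    for r in rows:
        line = []
        for j, value in enumerate(r):
            span = maxs[j] - mins[j]
            # A constant dimension has no spread; place it at mid-axis.
            y = 0.5 if span == 0 else (value - mins[j]) / span
            line.append((float(j), y))
        polylines.append(line)
    return polylines


# Hypothetical steady-state levels of three species under three parameter sets.
states = [[0.1, 5.0, 2.0], [0.4, 1.0, 2.0], [0.2, 3.0, 2.0]]
for line in parallel_coordinates(states):
    print(line)
```

Each inner list is one polyline; handing its x and y values to a plotting library draws the chart. DYVIPAC's own axis scaling and ordering may differ from this simple choice.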
5
Porras P, Duesbury M, Fabregat A, Ueffing M, Orchard S, Gloeckner CJ, Hermjakob H. A visual review of the interactome of LRRK2: Using deep-curated molecular interaction data to represent biology. Proteomics 2015; 15:1390-404. [PMID: 25648416 PMCID: PMC4415485 DOI: 10.1002/pmic.201400390] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 08/14/2014] [Revised: 01/15/2015] [Accepted: 01/29/2015] [Indexed: 02/04/2023]
Abstract
Molecular interaction databases are essential resources that enable access to a wealth of information on associations between proteins and other biomolecules. Network graphs generated from these data provide an understanding of the relationships between different proteins in the cell, and network analysis has become a widespread tool supporting –omics analysis. Meaningfully representing this information remains far from trivial, and different databases strive to provide users with detailed records capturing the experimental details behind each piece of interaction evidence. A targeted curation approach is necessary to transfer published data generated primarily by low-throughput techniques into interaction databases. In this review we present an example highlighting the value of both targeted curation and the subsequent effective visualization of detailed features of manually curated interaction information. We have curated interactions involving LRRK2, a protein of largely unknown function linked to familial forms of Parkinson's disease, and hosted the data in the IntAct database. This LRRK2-specific dataset was then used to produce different visualization examples highlighting different aspects of the data: the level of confidence in the interaction based on orthogonal evidence, those interactions found under close-to-native conditions, and the enzyme–substrate relationships in different in vitro enzymatic assays. Finally, pathway annotation taken from the Reactome database was overlaid on top of interaction networks to bring biological functional context to interaction maps.
Affiliation(s)
- Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
6
Hirst JD, Glowacki DR, Baaden M. Molecular simulations and visualization: introduction and overview. Faraday Discuss 2014; 169:9-22. [DOI: 10.1039/c4fd90024c] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 11/21/2022]
7
Villoutreix BO, Kuenemann MA, Poyet JL, Bruzzoni-Giovanelli H, Labbé C, Lagorce D, Sperandio O, Miteva MA. Drug-Like Protein-Protein Interaction Modulators: Challenges and Opportunities for Drug Discovery and Chemical Biology. Mol Inform 2014; 33:414-437. [PMID: 25254076 PMCID: PMC4160817 DOI: 10.1002/minf.201400040] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Received: 03/24/2014] [Accepted: 04/21/2014] [Indexed: 12/13/2022]
Abstract
Fundamental processes in living cells are largely controlled by macromolecular interactions; among them, protein-protein interactions (PPIs) play a critical role, and their dysregulation can contribute to the pathogenesis of numerous diseases. Although PPIs were recognized as attractive pharmaceutical targets some years ago, they have thus far remained largely unexploited for therapeutic interventions with low molecular weight compounds. Several limiting factors, from technological hurdles to conceptual barriers, are known, which together explain why research in this area has been relatively slow. Over the last decade, however, the scientific community has challenged this dogma and become more enthusiastic about modulating PPIs with small drug-like molecules. Indeed, several success stories have been reported at both the preclinical and clinical stages. In this review article, written for the 2014 International Summer School in Chemoinformatics (Strasbourg, France), we discuss in silico tools (essentially post 2012) and databases that can assist the design of low molecular weight PPI modulators (these tools can be found at www.vls3d.com). We first introduce the field of protein-protein interaction research, discuss key challenges and comment on recently reported in silico packages, protocols and databases dedicated to PPIs. Then, we illustrate how in silico methods can be used and combined with experimental work to identify PPI modulators.
Affiliation(s)
- Bruno O Villoutreix
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Melaine A Kuenemann
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- Jean-Luc Poyet
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- IUH, Hôpital Saint-Louis, Paris, France
- CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Heriberto Bruzzoni-Giovanelli
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- CIC, Clinical Investigation Center, Hôpital Saint-Louis, Paris, France
- Céline Labbé
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- David Lagorce
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- Olivier Sperandio
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Maria A Miteva
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
8
Jayaswal V, Schramm SJ, Mann GJ, Wilkins MR, Yang YH. VAN: an R package for identifying biologically perturbed networks via differential variability analysis. BMC Res Notes 2013; 6:430. [PMID: 24156242 PMCID: PMC4015612 DOI: 10.1186/1756-0500-6-430] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/21/2013] [Accepted: 10/18/2013] [Indexed: 12/26/2022]
Abstract
BACKGROUND Large-scale molecular interaction networks are dynamic in nature and are of special interest in the analysis of complex diseases, which are characterized by network-level perturbations rather than changes in individual genes/proteins. The methods developed for the identification of differentially expressed genes or gene sets are not suitable for network-level analyses. Consequently, bioinformatics approaches that enable a joint analysis of high-throughput transcriptomics datasets and large-scale molecular interaction networks for identifying perturbed networks are gaining popularity. Typically, these approaches require the sequential application of multiple bioinformatics techniques: ID mapping, network analysis, and network visualization. Here, we present the Variability Analysis in Networks (VAN) software package: a collection of R functions to streamline this bioinformatics analysis. FINDINGS VAN determines whether there are network-level perturbations across biological states of interest. It first identifies hubs (densely connected proteins/microRNAs) in a network and then uses them to extract network modules (each comprising a hub and all its interaction partners). The function identifySignificantHubs identifies dysregulated modules (i.e. modules with changes in expression correlation between a hub and its interaction partners) using a single expression and network dataset. The function summarizeHubData identifies dysregulated modules based on a meta-analysis of multiple expression and/or network datasets. VAN also converts protein identifiers present in a MITAB-formatted interaction network to gene identifiers (UniProt identifier to Entrez identifier or gene symbol using the function generatePpiMap) and generates microRNA-gene interaction networks using the TargetScan and Microcosm databases (generateMicroRnaMap).
The function obtainCancerInfo is used to identify hubs (corresponding to significantly perturbed modules) that are already causally associated with cancer(s) in the Cancer Gene Census database. Additionally, VAN supports the visualization of changes to network modules in R and Cytoscape (visualizeNetwork and obtainPairSubset, respectively). We demonstrate the utility of VAN using gene expression data from metastatic melanoma and a protein-protein interaction network from the Human Protein Reference Database. CONCLUSIONS Our package provides a comprehensive and user-friendly platform for the integrative analysis of -omics data to identify disease-associated network modules. This bioinformatics approach, which essentially focuses on explaining phenotype with a 'network type' and, in particular, on how regulation changes among different states of interest, is relevant to many questions, including those related to network perturbations across developmental timelines.
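The hub-and-module idea in the VAN abstract can be illustrated with a minimal sketch: treat nodes whose degree reaches a threshold as hubs, take each hub plus all of its interaction partners as a module, and score the module by how much the hub-partner expression correlations change between two states. The degree threshold and the mean-absolute-change score are assumptions made here for illustration; VAN's identifySignificantHubs applies its own significance testing, which this sketch does not reproduce, and the helper names `pearson`, `hub_modules` and `module_shift` are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined correlation treated as 0 in this sketch
    return sxy / sqrt(sxx * syy)


def hub_modules(edges: Iterable[Tuple[str, str]], min_degree: int) -> Dict[str, Set[str]]:
    """Return {hub: interaction partners} for nodes with degree >= min_degree."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)
    return {node: nbrs for node, nbrs in partners.items() if len(nbrs) >= min_degree}


def module_shift(hub: str, partners: Set[str],
                 expr_a: Dict[str, List[float]], expr_b: Dict[str, List[float]]) -> float:
    """Mean absolute change in hub-partner correlation between two states."""
    changes = [abs(pearson(expr_a[hub], expr_a[p]) - pearson(expr_b[hub], expr_b[p]))
               for p in sorted(partners)]
    return sum(changes) / len(changes)


# Hypothetical toy network and expression profiles (4 samples per state).
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("A", "B")]
expr_state1 = {"H": [1.0, 2.0, 3.0, 4.0], "A": [2.0, 4.0, 6.0, 8.0],
               "B": [4.0, 3.0, 2.0, 1.0], "C": [1.0, 2.0, 3.0, 4.0]}
expr_state2 = {"H": [1.0, 2.0, 3.0, 4.0], "A": [8.0, 6.0, 4.0, 2.0],
               "B": [4.0, 3.0, 2.0, 1.0], "C": [1.0, 2.0, 3.0, 4.0]}

modules = hub_modules(edges, min_degree=3)
for hub, partners in modules.items():
    print(hub, sorted(partners), round(module_shift(hub, partners, expr_state1, expr_state2), 3))
```

Only "H" reaches degree 3, so it is the single hub. Its module scores 0.667 because the H-A correlation flips from +1 to -1 while the H-B and H-C correlations are unchanged.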
Affiliation(s)
- Vivek Jayaswal
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Sarah-Jane Schramm
- Westmead Millennium Institute for Medical Research, Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
- Melanoma Institute Australia, Sydney, NSW, Australia
- Graham J Mann
- Westmead Millennium Institute for Medical Research, Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
- Melanoma Institute Australia, Sydney, NSW, Australia
- Marc R Wilkins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Systems Biology Initiative, University of New South Wales, Sydney, NSW, Australia
- Yee Hwa Yang
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Melanoma Institute Australia, Sydney, NSW, Australia
9
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013; 138:333-408. [PMID: 23384594 PMCID: PMC3647006 DOI: 10.1016/j.pharmthera.2013.01.016] [Citation(s) in RCA: 506] [Impact Index Per Article: 46.0] [Received: 01/22/2013] [Accepted: 01/22/2013] [Indexed: 02/02/2023]
Abstract
Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase in drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. We show how network techniques can help identify single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods that aid hit identification and lead selection, optimize drug efficacy, and minimize side-effects and drug toxicity. Successful network-based drug development strategies are illustrated with examples from infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing more than 1200 references, we suggest an optimized protocol for network-aided drug development and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends that help achieve these hallmarks through a cohesive, global approach.
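The contrast between the two targeting strategies can be made concrete with a small sketch. It uses node degree as a simple stand-in for centrality (the review surveys many other measures), works on nodes only (not edges), and runs on a hypothetical undirected network. `central_hit` returns the k highest-degree nodes; `network_influence` returns their neighbors that are not themselves among those central nodes. Both function names and the choice of degree are assumptions for illustration, not the review's protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def central_hit(adj: Dict[str, Set[str]], k: int) -> List[str]:
    """The k highest-degree nodes (ties broken by name for a stable result)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]


def network_influence(adj: Dict[str, Set[str]], k: int) -> List[str]:
    """Neighbors of the k most central nodes, excluding those central nodes."""
    central = set(central_hit(adj, k))
    return sorted({nbr for c in central for nbr in adj[c]} - central)


# Hypothetical network: two connected hubs with peripheral partners.
edges = [("hub1", "a"), ("hub1", "b"), ("hub1", "c"), ("hub1", "hub2"),
         ("hub2", "d"), ("hub2", "e")]
adj = adjacency(edges)
print(central_hit(adj, 1))        # ['hub1']
print(network_influence(adj, 1))  # ['a', 'b', 'c', 'hub2']
```

With k = 1, the central-hit list is the single most connected node, whereas the network-influence list is its immediate neighborhood, which here includes the second hub.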
Affiliation(s)
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, P.O. Box 260, H-1444 Budapest 8, Hungary.