1
Defoort J, Van de Peer Y, Vermeirssen V. Function, dynamics and evolution of network motif modules in integrated gene regulatory networks of worm and plant. Nucleic Acids Res 2018; 46:6480-6503. [PMID: 29873777 PMCID: PMC6061849 DOI: 10.1093/nar/gky468] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 12/07/2017] [Accepted: 05/14/2018] [Indexed: 12/29/2022] Open
Abstract
Gene regulatory networks (GRNs) consist of different molecular interactions that closely work together to establish proper gene expression in time and space. Especially in higher eukaryotes, many questions remain on how these interactions collectively coordinate gene regulation. We study high quality GRNs consisting of undirected protein–protein, genetic and homologous interactions, and directed protein–DNA, regulatory and miRNA–mRNA interactions in the worm Caenorhabditis elegans and the plant Arabidopsis thaliana. Our data-integration framework integrates interactions in composite network motifs, clusters these in biologically relevant, higher-order topological network motif modules, overlays these with gene expression profiles and discovers novel connections between modules and regulators. Similar modules exist in the integrated GRNs of worm and plant. We show how experimental or computational methodologies underlying a certain data type impact network topology. Through phylogenetic decomposition, we found that proteins of worm and plant tend to functionally interact with proteins of a similar age, while at the regulatory level TFs favor same age, but also older target genes. Despite some influence of the duplication mode difference, we also observe at the motif and module level for both species a preference for age homogeneity for undirected and age heterogeneity for directed interactions. This leads to a model where novel genes are added together to the GRNs in a specific biological functional context, regulated by one or more TFs that also target older genes in the GRNs. Overall, we detected topological, functional and evolutionary properties of GRNs that are potentially universal in all species.
Affiliation(s)
- Jonas Defoort
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium
- Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium; Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa
- Vanessa Vermeirssen
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium
2
Aparicio D, Ribeiro P, Silva F. Extending the Applicability of Graphlets to Directed Networks. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1302-1315. [PMID: 27362986 DOI: 10.1109/tcbb.2016.2586046] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
With recent advances in high-throughput cell biology, the amount of cellular biological data has grown drastically. Such data is often modeled as graphs (also called networks) and studying them can lead to new insights into molecule-level organization. A possible way to understand their structure is by analyzing the smaller components that constitute them, namely network motifs and graphlets. Graphlets are particularly well suited to compare networks and to assess their level of similarity due to the rich topological information that they offer but are almost always used as small undirected graphs of up to five nodes, thus limiting their applicability in directed networks. However, a large set of interesting biological networks such as metabolic, cell signaling, or transcriptional regulatory networks are intrinsically directional, and using metrics that ignore edge direction may gravely hinder information extraction. Our main purpose in this work is to extend the applicability of graphlets to directed networks by considering their edge direction, thus providing a powerful basis for the analysis of directed biological networks. We tested our approach on two network sets, one composed of synthetic graphs and another of real directed biological networks, and verified that they were more accurately grouped using directed graphlets than undirected graphlets. It is also evident that directed graphlets offer substantially more topological information than simple graph metrics such as degree distribution or reciprocity. However, enumerating graphlets in large networks is a computationally demanding task. Our implementation addresses this concern by using a state-of-the-art data structure, the g-trie, which is able to greatly reduce the necessary computation. We compared our tool to other state-of-the-art methods and verified that it is the fastest general tool for graphlet counting.
3
Haase T, Börnigen D, Müller C, Zeller T. Systems Medicine as an Emerging Tool for Cardiovascular Genetics. Front Cardiovasc Med 2016; 3:27. [PMID: 27626034 PMCID: PMC5003874 DOI: 10.3389/fcvm.2016.00027] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/21/2016] [Accepted: 08/16/2016] [Indexed: 01/11/2023] Open
Abstract
Cardiovascular disease (CVD) is a major contributor to morbidity and mortality worldwide. However, the pathogenesis of CVD is complex and remains elusive. In recent years, systems medicine has emerged as a novel tool to study the complex genetic, molecular, and physiological interactions leading to diseases. In this review, we provide an overview of current approaches to systems medicine in CVD. These include bioinformatic and experimental tools such as cell and animal models, omics technologies, and network and pathway analyses. Additionally, we discuss challenges and current literature examples where systems medicine has been successfully applied to the study of CVD.
Affiliation(s)
- Tina Haase
- Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, Germany; Partner Site Hamburg/Kiel/Lübeck, German Center for Cardiovascular Research (DZHK e.V.), Hamburg, Germany
- Daniela Börnigen
- Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, Germany; Partner Site Hamburg/Kiel/Lübeck, German Center for Cardiovascular Research (DZHK e.V.), Hamburg, Germany
- Christian Müller
- Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, Germany; Partner Site Hamburg/Kiel/Lübeck, German Center for Cardiovascular Research (DZHK e.V.), Hamburg, Germany
- Tanja Zeller
- Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, Germany; Partner Site Hamburg/Kiel/Lübeck, German Center for Cardiovascular Research (DZHK e.V.), Hamburg, Germany
4
Börnigen D, Tyekucheva S, Wang X, Rider JR, Lee GS, Mucci LA, Sweeney C, Huttenhower C. Computational Reconstruction of NFκB Pathway Interaction Mechanisms during Prostate Cancer. PLoS Comput Biol 2016; 12:e1004820. [PMID: 27078000 PMCID: PMC4831844 DOI: 10.1371/journal.pcbi.1004820] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/15/2015] [Accepted: 02/19/2016] [Indexed: 12/21/2022] Open
Abstract
Molecular research in cancer is one of the largest areas of bioinformatic investigation, but it remains a challenge to understand biomolecular mechanisms in cancer-related pathways from high-throughput genomic data. This includes the Nuclear-factor-kappa-B (NFκB) pathway, which is central to the inflammatory response and cell proliferation in prostate cancer development and progression. Despite close scrutiny and a deep understanding of many of its members’ biomolecular activities, the current list of pathway members and a systems-level understanding of their interactions remains incomplete. Here, we provide the first steps toward computational reconstruction of interaction mechanisms of the NFκB pathway in prostate cancer. We identified novel roles for ATF3, CXCL2, DUSP5, JUNB, NEDD9, SELE, TRIB1, and ZFP36 in this pathway, in addition to new mechanistic interactions between these genes and 10 known NFκB pathway members. A newly predicted interaction between NEDD9 and ZFP36 in particular was validated by co-immunoprecipitation, as was NEDD9's potential biological role in prostate cancer cell growth regulation. We combined 651 gene expression datasets with 1.4M gene product interactions to predict the inclusion of 40 additional genes in the pathway. Molecular mechanisms of interaction among pathway members were inferred using recent advances in Bayesian data integration to simultaneously provide information specific to biological contexts and individual biomolecular activities, resulting in a total of 112 interactions in the fully reconstructed NFκB pathway: 13 (11%) previously known, 29 (26%) supported by existing literature, and 70 (63%) novel. This method is generalizable to other tissue types, cancers, and organisms, and this new information about the NFκB pathway will allow us to further understand prostate cancer and to develop more effective prevention and treatment strategies. 
In molecular research in cancer it remains challenging to uncover biomolecular mechanisms in cancer-related pathways from high-throughput genomic data, including the Nuclear-factor-kappa-B (NFκB) pathway. Despite close scrutiny and a deep understanding of many of the NFκB pathway members’ biomolecular activities, the current list of pathway members and a systems-level understanding of their interactions remains incomplete. In this study, we provide the first steps toward computational reconstruction of interaction mechanisms of the NFκB pathway in prostate cancer. We identified novel roles for 8 genes in this pathway and new mechanistic interactions between these genes and 10 known pathway members. We combined 651 gene expression datasets with 1.4M interactions to predict the inclusion of 40 additional genes in the pathway. Molecular mechanisms of interaction were inferred using recent advances in Bayesian data integration to simultaneously provide information specific to biological contexts and individual biomolecular activities, resulting in 112 interactions in the fully reconstructed NFκB pathway. This method is generalizable, and this new information about the NFκB pathway will allow us to further understand prostate cancer.
Affiliation(s)
- Daniela Börnigen
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America; The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Svitlana Tyekucheva
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Xiaodong Wang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, United States of America
- Jennifer R Rider
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
- Gwo-Shu Lee
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, United States of America
- Lorelei A Mucci
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
- Christopher Sweeney
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, United States of America
- Curtis Huttenhower
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America; The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
5
Acharya L, Reynolds R, Zhu D. Network inference through synergistic subnetwork evolution. EURASIP J Bioinform Syst Biol 2015; 2015:12. [PMID: 26640480 PMCID: PMC4662719 DOI: 10.1186/s13637-015-0027-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2015] [Accepted: 08/21/2015] [Indexed: 12/02/2022]
Abstract
The study of signaling networks is important for a better understanding of cell behaviors (e.g., growth, differentiation, metabolism, apoptosis) and for gaining deeper insights into the molecular mechanisms of complex diseases. While there have been many successes in developing computational approaches for identifying potential genes and proteins involved in cell signaling, new methods are needed for identifying network structures that depict underlying signal cascading mechanisms. In this paper, we propose a new computational approach for inferring signaling network structures from overlapping gene sets related to the networks. In the proposed approach, a signaling network is represented as a directed graph and is viewed as a union of many active paths representing linear and overlapping chains of signal cascading activities in the network. Gene sets represent the sets of genes participating in active paths without prior knowledge of the order in which genes occur within each path. From a compendium of unordered gene sets, the proposed algorithm reconstructs the underlying network structure through evolution of synergistic active paths. In our context, the extent of edge overlap among active paths is used to define the synergy present in a network. We evaluated the performance of the proposed algorithm in terms of its convergence and its recovery of true active paths by utilizing four gene set compendiums derived from the KEGG database. Evaluation of the results demonstrates the ability of the algorithm to reconstruct the underlying networks with high accuracy and precision.
Affiliation(s)
- Lipi Acharya
- Dow AgroSciences, 9330 Zionsville Road, Indianapolis, IN 46268 USA
- Robert Reynolds
- Department of Computer Science, Wayne State University, 5057 Woodward Avenue, Detroit, MI 48202 USA
- Dongxiao Zhu
- Department of Computer Science, Wayne State University, 5057 Woodward Avenue, Detroit, MI 48202 USA
6
Sequencing and beyond: integrating molecular 'omics' for microbial community profiling. Nat Rev Microbiol 2015; 13:360-72. [PMID: 25915636 DOI: 10.1038/nrmicro3451] [Citation(s) in RCA: 406] [Impact Index Per Article: 45.1] [Indexed: 02/07/2023]
Abstract
High-throughput DNA sequencing has proven invaluable for investigating diverse environmental and host-associated microbial communities. In this Review, we discuss emerging strategies for microbial community analysis that complement and expand traditional metagenomic profiling. These include novel DNA sequencing strategies for identifying strain-level microbial variation and community temporal dynamics; measuring multiple 'omic' data types that better capture community functional activity, such as transcriptomics, proteomics and metabolomics; and combining multiple forms of omic data in an integrated framework. We highlight studies in which the 'multi-omics' approach has led to improved mechanistic models of microbial community structure and function.
7
Zhu F, Shi L, Engel JD, Guan Y. Regulatory network inferred using expression data of small sample size: application and validation in erythroid system. Bioinformatics 2015; 31:2537-44. [PMID: 25840044 DOI: 10.1093/bioinformatics/btv186] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/07/2014] [Accepted: 03/27/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Modeling regulatory networks using expression data observed in a differentiation process may help identify context-specific interactions. The outcome of current algorithms depends heavily on the quality and quantity of a single time-course dataset, and performance may be compromised for datasets with a limited number of samples. RESULTS In this work, we report a multi-layer graphical model that is capable of leveraging many publicly available time-course datasets, as well as a cell lineage-specific dataset with small sample size, to model regulatory networks specific to a differentiation process. First, a collection of network inference methods is used to predict the regulatory relationships in individual public datasets. Then, the inferred directional relationships are weighted and integrated by evaluating them against the cell lineage-specific dataset. To test the accuracy of this algorithm, we collected a time-course RNA-Seq dataset during human erythropoiesis to infer regulatory relationships specific to this differentiation process. The resulting erythroid-specific regulatory network reveals novel regulatory relationships activated in erythropoiesis, which were further validated by genome-wide TR4 binding studies using ChIP-seq. These erythropoiesis-specific regulatory relationships were not identifiable by single dataset-based methods or context-independent integrations. Analysis of the predicted targets reveals that they are all closely associated with hematopoietic lineage differentiation.
Affiliation(s)
- Fan Zhu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Lihong Shi
- State Key Laboratory of Experimental Hematology, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Internal Medicine and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
8
Pelle KG, Oh K, Buchholz K, Narasimhan V, Joice R, Milner DA, Brancucci NM, Ma S, Voss TS, Ketman K, Seydel KB, Taylor TE, Barteneva NS, Huttenhower C, Marti M. Transcriptional profiling defines dynamics of parasite tissue sequestration during malaria infection. Genome Med 2015; 7:19. [PMID: 25722744 PMCID: PMC4342211 DOI: 10.1186/s13073-015-0133-7] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Received: 11/13/2014] [Accepted: 01/15/2015] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND During intra-erythrocytic development, late asexually replicating Plasmodium falciparum parasites sequester from peripheral circulation. This facilitates chronic infection and is linked to severe disease and organ-specific pathology including cerebral and placental malaria. Immature gametocytes - sexual stage precursor cells - likewise disappear from circulation. Recent work has demonstrated that these sexual stage parasites are located in the hematopoietic system of the bone marrow before mature gametocytes are released into the bloodstream to facilitate mosquito transmission. However, as sequestration occurs only in vivo and not during in vitro culture, the mechanisms by which it is regulated and enacted (particularly by the gametocyte stage) remain poorly understood. RESULTS We generated the most comprehensive P. falciparum functional gene network to date by integrating global transcriptional data from a large set of asexual and sexual in vitro samples, patient-derived in vivo samples, and a new set of in vitro samples profiling sexual commitment. We defined more than 250 functional modules (clusters) of genes that are co-expressed primarily during the intra-erythrocytic parasite cycle, including 35 during sexual commitment and gametocyte development. Comparing the in vivo and in vitro datasets allowed us, for the first time, to map the time point of asexual parasite sequestration in patients to 22 hours post-invasion, confirming previous in vitro observations on the dynamics of host cell modification and cytoadherence. Moreover, we were able to define the properties of gametocyte sequestration, demonstrating the presence of two circulating gametocyte populations: gametocyte rings between 0 and approximately 30 hours post-invasion and mature gametocytes after around 7 days post-invasion. 
CONCLUSIONS This study provides a bioinformatics resource for the functional elucidation of parasite life cycle dynamics and specifically demonstrates the presence of the gametocyte ring stages in circulation, adding significantly to our understanding of the dynamics of gametocyte sequestration in vivo.
Affiliation(s)
- Karell G Pelle
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA
- Keunyoung Oh
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 USA
- Kathrin Buchholz
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA
- Vagheesh Narasimhan
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 USA
- Regina Joice
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA
- Danny A Milner
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115 USA
- Nicolas MB Brancucci
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA; Swiss Tropical and Public Health Institute, 4051 Basel, Switzerland
- Siyuan Ma
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 USA
- Till S Voss
- Swiss Tropical and Public Health Institute, 4051 Basel, Switzerland
- Ken Ketman
- Program in Cellular and Molecular Medicine, Children's Hospital, Boston, MA 02115 USA
- Karl B Seydel
- College of Osteopathic Medicine, Michigan State University, East Lansing, MI 48825 USA; Blantyre Malaria Project, University of Malawi College of Medicine, Blantyre, 3 Malawi
- Terrie E Taylor
- College of Osteopathic Medicine, Michigan State University, East Lansing, MI 48825 USA; Blantyre Malaria Project, University of Malawi College of Medicine, Blantyre, 3 Malawi
- Natasha S Barteneva
- Program in Cellular and Molecular Medicine, Children's Hospital, Boston, MA 02115 USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115 USA
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 USA; The Broad Institute of Harvard and MIT, Cambridge, MA 02142 USA
- Matthias Marti
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA
9
Joice R, Yasuda K, Shafquat A, Morgan XC, Huttenhower C. Determining microbial products and identifying molecular targets in the human microbiome. Cell Metab 2014; 20:731-741. [PMID: 25440055 PMCID: PMC4254638 DOI: 10.1016/j.cmet.2014.10.003] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Indexed: 02/08/2023]
Abstract
Human-associated microbes are the source of many bioactive microbial products (proteins and metabolites) that play key functions both in human host pathways and in microbe-microbe interactions. Culture-independent studies now provide an accelerated means of exploring novel bioactives in the human microbiome; however, intriguingly, a substantial fraction of the microbial metagenome cannot be mapped to annotated genes or isolate genomes and is thus of unknown function. Meta'omic approaches, including metagenomic sequencing, metatranscriptomics, metabolomics, and integration of multiple assay types, represent an opportunity to efficiently explore this large pool of potential therapeutics. In combination with appropriate follow-up validation, high-throughput culture-independent assays can be combined with computational approaches to identify and characterize novel and biologically interesting microbial products. Here we briefly review the state of microbial product identification and characterization and discuss possible next steps to catalog and leverage the large uncharted fraction of the microbial metagenome.
Affiliation(s)
- Regina Joice
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Koji Yasuda
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Afrah Shafquat
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Xochitl C Morgan
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
10
Lee YS, Krishnan A, Zhu Q, Troyanskaya OG. Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies. Bioinformatics 2013; 29:3036-44. [PMID: 24037214 PMCID: PMC3834796 DOI: 10.1093/bioinformatics/btt529] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
Motivation: Leveraging gene expression data through large-scale integrative analyses for multicellular organisms is challenging because most samples are not fully annotated to their tissue/cell-type of origin. A computational method to classify samples using their entire gene expression profiles is needed. Such a method must be applicable across thousands of independent studies, hundreds of gene expression technologies and hundreds of diverse human tissues and cell-types. Results: We present Unveiling RNA Sample Annotation (URSA) that leverages the complex tissue/cell-type relationships and simultaneously estimates the probabilities associated with hundreds of tissues/cell-types for any given gene expression profile. URSA provides accurate and intuitive probability values for expression profiles across independent studies and outperforms other methods, irrespective of data preprocessing techniques. Moreover, without re-training, URSA can be used to classify samples from diverse microarray platforms and even from next-generation sequencing technology. Finally, we provide a molecular interpretation for the tissue and cell-type models as the biological basis for URSA’s classifications. Availability and implementation: An interactive web interface for using URSA for gene expression analysis is available at: ursa.princeton.edu. The source code is available at https://bitbucket.org/youngl/ursa_backend. Contact: ogt@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Young-suk Lee
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
11
Rajagopalan P, Kasif S, Murali TM. Systems Biology Characterization of Engineered Tissues. Annu Rev Biomed Eng 2013; 15:55-70. [DOI: 10.1146/annurev-bioeng-071811-150120] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
Affiliation(s)
- Padmavathy Rajagopalan
- Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia 24060
- School of Biomedical Engineering and Sciences, Virginia Tech, Blacksburg, Virginia 24060
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, Virginia 24060
- Simon Kasif
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215
- T.M. Murali
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24060
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, Virginia 24060
12
Van Hemert JL, Dickerson JA. Discriminating response groups in metabolic and regulatory pathway networks. Bioinformatics 2012; 28:947-54. [DOI: 10.1093/bioinformatics/bts039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
13
Marbach D, Roy S, Ay F, Meyer PE, Candeias R, Kahveci T, Bristow CA, Kellis M. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks. Genome Res 2012; 22:1334-49. [PMID: 22456606 PMCID: PMC3396374 DOI: 10.1101/gr.127191.111] [Citation(s) in RCA: 89] [Impact Index Per Article: 7.4] [Indexed: 01/21/2023]
Abstract
Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.
Affiliation(s)
- Daniel Marbach
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
14
|
Acharya LR, Judeh T, Wang G, Zhu D. Optimal structural inference of signaling pathways from unordered and overlapping gene sets. Bioinformatics 2012; 28:546-56. [PMID: 22199386 PMCID: PMC3278757 DOI: 10.1093/bioinformatics/btr696] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/13/2011] [Revised: 11/16/2011] [Accepted: 12/18/2011] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: A plethora of bioinformatics analyses has led to the discovery of numerous gene sets, which can be interpreted as discrete measurements emitted from latent signaling pathways. Their potential to infer signaling pathway structures, however, has not been sufficiently exploited. Existing methods accommodating discrete data do not explicitly consider signal cascading mechanisms that characterize a signaling pathway. Novel computational methods are thus needed to fully utilize gene sets and broaden the scope from focusing only on pairwise interactions to the more general cascading events in the inference of signaling pathway structures. RESULTS: We propose a gene set based simulated annealing (SA) algorithm for the reconstruction of signaling pathway structures. A signaling pathway structure is a directed graph containing up to a few hundred nodes and many overlapping signal cascades, where each cascade represents a chain of molecular interactions from the cell surface to the nucleus. Gene sets in our context refer to discrete sets of genes participating in signal cascades, the basic building blocks of a signaling pathway, with no prior information about gene orderings in the cascades. From a compendium of gene sets related to a pathway, SA aims to search for signal cascades that characterize the optimal signaling pathway structure. In the search process, the extent of overlap among signal cascades is used to measure the optimality of a structure. Throughout, we treat gene sets as random samples from a first-order Markov chain model. We evaluated the performance of SA in three case studies. In the first study, conducted on 83 KEGG pathways, SA demonstrated a significantly better performance than Bayesian network methods. Since both SA and Bayesian network methods accommodate discrete data, use a 'search and score' network learning strategy and output a directed network, they can be compared in terms of performance and computational time. In the second study, we compared SA and Bayesian network methods using four benchmark datasets from DREAM. In our final study, we showcased two context-specific signaling pathways activated in breast cancer. AVAILABILITY: Source codes are available from http://dl.dropbox.com/u/16000775/sa_sc.zip.
Affiliation(s)
- Lipi R Acharya
- Department of Computer Science, University of New Orleans, New Orleans, LA 70148, USA
|
15
|
Sequence- and interactome-based prediction of viral protein hotspots targeting host proteins: a case study for HIV Nef. PLoS One 2011; 6:e20735. [PMID: 21738584 PMCID: PMC3125164 DOI: 10.1371/journal.pone.0020735] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 04/19/2011] [Accepted: 05/08/2011] [Indexed: 01/03/2023] Open
Abstract
Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance, but the approach is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef, highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear to crowd the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with the literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk.
|
16
|
Li B, Cao W, Zhou J, Luo F. Understanding and predicting synthetic lethal genetic interactions in Saccharomyces cerevisiae using domain genetic interactions. BMC Syst Biol 2011; 5:73. [PMID: 21586150 PMCID: PMC3113237 DOI: 10.1186/1752-0509-5-73] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/12/2010] [Accepted: 05/17/2011] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Synthetic lethal genetic interactions among proteins have been widely used to define functional relationships between proteins and pathways. However, the molecular mechanism of synthetic lethal genetic interactions is still unclear. RESULTS: In this study, we demonstrated that yeast synthetic lethal genetic interactions can be explained by the genetic interactions between domains of those proteins. The domain genetic interactions rarely overlap with the domain physical interactions from the iPfam database and provide a complementary view of domain relationships. Moreover, we found that domains in multidomain yeast proteins contribute to their genetic interactions differently. The domain genetic interactions help define more precisely the function related to the synthetic lethal genetic interactions, and in turn help explain how domains contribute to different functionalities of multidomain proteins. Using the probabilities of domain genetic interactions, we were able to predict novel yeast synthetic lethal genetic interactions. Furthermore, we also identified novel compensatory pathways from the predicted synthetic lethal genetic interactions. CONCLUSION: The identification of domain genetic interactions aids understanding of the origin of functional relationships in synthetic lethal genetic interactions (SLGIs) at the domain level. Our study significantly improved the understanding of yeast multidomain proteins, the synthetic lethal genetic interactions and the functional relationships between proteins and pathways.
Affiliation(s)
- Bo Li
- School of Computing, Clemson University, Clemson, SC 29634, USA
|