1
Hassan J, Saeed SM, Deka L, Uddin MJ, Das DB. Applications of Machine Learning (ML) and Mathematical Modeling (MM) in Healthcare with Special Focus on Cancer Prognosis and Anticancer Therapy: Current Status and Challenges. Pharmaceutics 2024; 16:260. [PMID: 38399314 PMCID: PMC10892549 DOI: 10.3390/pharmaceutics16020260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/29/2024] [Accepted: 02/07/2024] [Indexed: 02/25/2024]
Abstract
The use of data-driven high-throughput analytical techniques, which has given rise to computational oncology, is undisputed. The widespread use of machine learning (ML) and mathematical modeling (MM)-based techniques is widely acknowledged. These two approaches have fueled the advancement in cancer research and eventually led to the uptake of telemedicine in cancer care. For diagnostic, prognostic, and treatment purposes concerning different types of cancer research, vast databases of varied information with manifold dimensions are required, and indeed, all this information can only be managed by an automated system developed utilizing ML and MM. In addition, MM is being used to probe the relationship between the pharmacokinetics and pharmacodynamics (PK/PD interactions) of anti-cancer substances to improve cancer treatment, and also to refine the quality of existing treatment models by being incorporated at all steps of research and development related to cancer and in routine patient care. This review will serve as a consolidation of the advancement and benefits of ML and MM techniques with a special focus on the area of cancer prognosis and anticancer therapy, leading to the identification of challenges (data quantity, ethical consideration, and data privacy) which are yet to be fully addressed in current studies.
Affiliation(s)
- Jasmin Hassan
- Drug Delivery & Therapeutics Lab, Dhaka 1212, Bangladesh; (J.H.); (S.M.S.)
- Lipika Deka
- Faculty of Computing, Engineering and Media, De Montfort University, Leicester LE1 9BH, UK
- Md Jasim Uddin
- Department of Pharmaceutical Technology, Faculty of Pharmacy, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Diganta B. Das
- Department of Chemical Engineering, Loughborough University, Loughborough LE11 3TU, UK
2
Greaves RB, Dietmann S, Smith A, Stepney S, Halley JD. A conceptual and computational framework for modelling and understanding the non-equilibrium gene regulatory networks of mouse embryonic stem cells. PLoS Comput Biol 2017; 13:e1005713. [PMID: 28863148 PMCID: PMC5599049 DOI: 10.1371/journal.pcbi.1005713] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/26/2017] [Revised: 09/14/2017] [Accepted: 08/04/2017] [Indexed: 11/20/2022]
Abstract
The capacity of pluripotent embryonic stem cells to differentiate into any cell type in the body makes them invaluable in the field of regenerative medicine. However, because of the complexity of both the core pluripotency network and the process of cell fate computation it is not yet possible to control the fate of stem cells. We present a theoretical model of stem cell fate computation that is based on Halley and Winkler’s Branching Process Theory (BPT) and on Greaves et al.’s agent-based computer simulation derived from that theoretical model. BPT abstracts the complex production and action of a Transcription Factor (TF) into a single critical branching process that may dissipate, maintain, or become supercritical. Here we take the single TF model and extend it to multiple interacting TFs, and build an agent-based simulation of multiple TFs to investigate the dynamics of such coupled systems. We have developed the simulation and the theoretical model together, in an iterative manner, with the aim of obtaining a deeper understanding of stem cell fate computation, in order to influence experimental efforts, which may in turn influence the outcome of cellular differentiation. The model used is an example of self-organization and could be more widely applicable to the modelling of other complex systems. The simulation based on this model, though currently limited in scope in terms of the biology it represents, supports the utility of the Halley and Winkler branching process model in describing the behaviour of stem cell gene regulatory networks. Our simulation demonstrates three key features: (i) the existence of a critical value of the branching process parameter, dependent on the details of the cistrome in question; (ii) the ability of an active cistrome to “ignite” an otherwise fully dissipated cistrome, and drive it to criticality; (iii) how coupling cistromes together can reduce their critical branching parameter values needed to drive them to criticality. 
Pluripotent stem cells possess the capacity both to renew themselves indefinitely and to differentiate into any cell type in the body. Thus the ability to direct stem cell differentiation would have immense potential in regenerative medicine. There is a massive amount of biological data relevant to stem cells; here we exploit data relating to stem cell differentiation to help understand cell behaviour and complexity. These cells contain a dynamic, non-equilibrium network of genes regulated in part by transcription factors expressed by the network itself. Here we take an existing theoretical framework, Transcription Factor Branching Processes, which explains how these genetic networks can have critical behaviour, and can tip between low and full expression. We use this theory as the basis for the design and implementation of a computational simulation platform, which we then use to run a variety of simulation experiments, to gain a better understanding of how these various transcription factors can combine, interact, and influence each other. The simulation parameters are derived from experimental data relating to the core factors in pluripotent stem cell differentiation. The simulation results determine the critical values of branching process parameters, and how these are modulated by the various interacting transcription factors.
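The idea of a branching process that may dissipate, maintain, or become supercritical can be illustrated with a classic Galton-Watson process. The sketch below is not the authors' agent-based simulation; it is a minimal stand-in, and the function name `simulate_branching`, the two-offspring binomial rule, and the population cap are all assumptions made for illustration. The branching parameter `sigma` is the mean number of follow-on events per event, so `sigma < 1` tends to die out and `sigma > 1` tends to grow.

```python
import random


def simulate_branching(sigma, generations=50, start=1, cap=10_000, seed=0):
    """Simulate a Galton-Watson process with mean offspring `sigma`.

    Hypothetical illustration: each active event triggers 0, 1 or 2
    follow-on events, each with probability sigma / 2, so the mean
    number of offspring is sigma (valid for 0 <= sigma <= 2).
    Returns the population size per generation, starting with `start`.
    """
    if not 0.0 <= sigma <= 2.0:
        raise ValueError("sigma must lie in [0, 2] for this offspring rule")
    rng = random.Random(seed)
    p = sigma / 2.0
    population = start
    history = [population]
    for _ in range(generations):
        offspring = 0
        for _ in range(population):
            # Two independent chances to trigger a follow-on event.
            offspring += (rng.random() < p) + (rng.random() < p)
        # The cap only keeps supercritical runs from growing without bound.
        population = min(offspring, cap)
        history.append(population)
        if population == 0:
            break  # the process has dissipated
    return history


if __name__ == "__main__":
    # Fraction of runs still active after 50 generations for three regimes.
    for sigma in (0.8, 1.0, 1.2):
        alive = sum(simulate_branching(sigma, seed=s)[-1] > 0 for s in range(200))
        print(f"sigma={sigma}: {alive / 200:.2f} of runs still active")
```

Coupling several such processes, as the abstract describes for interacting transcription factors, would require letting one process add to another's offspring count; that extension is deliberately left out here.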
Affiliation(s)
- Richard B. Greaves
- York Centre for Complex Systems Analysis, University of York, York, United Kingdom
- Sabine Dietmann
- Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom
- Austin Smith
- Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom
- Susan Stepney
- York Centre for Complex Systems Analysis, University of York, York, United Kingdom
- Julianne D. Halley
- York Centre for Complex Systems Analysis, University of York, York, United Kingdom
3
Kurum E, Benayoun BA, Malhotra A, George J, Ucar D. Computational inference of a genomic pluripotency signature in human and mouse stem cells. Biol Direct 2016; 11:47. [PMID: 27639379 PMCID: PMC5027095 DOI: 10.1186/s13062-016-0148-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/13/2016] [Accepted: 09/03/2016] [Indexed: 12/18/2022]
Abstract
Recent analyses of next-generation sequencing datasets have shown that cell-specific regulatory elements in stem cells are marked with distinguishable patterns of transcription factor (TF) binding and epigenetic marks. For example, we recently demonstrated that promoters of cell-specific genes are covered with expanded trimethylation of histone H3 at lysine 4 (H3K4me3) marks (i.e., broad H3K4me3 domains). Moreover, binding of specific TFs, such as OCT4, NANOG, and SOX2, has been shown to play a critical role in maintaining the pluripotency of stem cells. Despite these observations, a systematic exploration of genomic and epigenomic features of stem-cell-specific gene promoters has not been conducted. Advanced machine-learning models can capture distinguishable genomic and epigenomic characteristics of stem-cell-specific promoters by taking advantage of the wealth of publicly available datasets. Here, we propose a three-step framework to discover novel data characteristics of high-throughput next-generation sequencing datasets that distinguish pluripotency genes in human and mouse embryonic stem cells (ESCs). Our framework involves: i) feature extraction to identify novel features of genomic datasets; ii) feature selection using a logistic regression model combined with the Least Absolute Shrinkage and Selection Operator (LASSO) method to find the most critical datasets and features; and iii) cross validation with the features selected by the LASSO method to assess the predictive power of selected data features in distinguishing pluripotency genes. We show that specific epigenetic marks, and specific features of these marks, are enriched at pluripotency gene promoters. Moreover, we assess both the individual and combined effects of TF binding, epigenetic mark deposition, and gene expression datasets for marking pluripotency genes.
Our findings are consistent with the existence of a conserved, complex and integrative genomic signature in ESCs that can be exploited to flag important candidate pluripotency genes. They also validate our computational framework for fostering a deeper understanding of genomic datasets in stem cells, which, in the future, could be extended to study cell-type-specific genomic landscapes in other cell types. Reviewers: This article was reviewed by Zoltan Gaspari and Piotr Zielenkiewicz.
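Step ii of the framework above, LASSO-penalised logistic regression used for feature selection, can be sketched in a few lines. This is not the authors' code: it is a minimal proximal-gradient (ISTA) implementation in plain Python, and the names `soft_threshold` and `fit_l1_logistic`, the penalty `lam`, the step size `lr`, and the toy data are all assumptions chosen for illustration. The cross-validation of step iii is omitted.

```python
import math


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Fit logistic regression with an L1 (LASSO) penalty on the weights.

    Each iteration takes a gradient step on the mean log-loss and then
    soft-thresholds the weights; the intercept is left unpenalised.
    Weights that end at exactly 0.0 correspond to dropped features.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 tracks the label, feature 1 does not.
    X_demo = [[-1, 1], [-1, -1], [-1, 1], [-1, -1],
              [1, 1], [1, -1], [1, 1], [1, -1]]
    y_demo = [0, 0, 0, 0, 1, 1, 1, 1]
    weights, intercept = fit_l1_logistic(X_demo, y_demo)
    selected = [j for j, wj in enumerate(weights) if wj != 0.0]
    print("weights:", weights, "selected features:", selected)
```

In practice one would more likely reach for an established library implementation and choose the penalty strength by cross validation; the hand-rolled loop is used here only to keep the example dependency-free.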
Affiliation(s)
- Esra Kurum
- Department of Statistics, University of California, Riverside, Riverside, CA, USA
- Ankit Malhotra
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, 06032, USA
- Joshy George
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, 06032, USA
- Duygu Ucar
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, 06032, USA.
4
Lim CY, Wang H, Woodhouse S, Piterman N, Wernisch L, Fisher J, Göttgens B. BTR: training asynchronous Boolean models using single-cell expression data. BMC Bioinformatics 2016; 17:355. [PMID: 27600248 PMCID: PMC5012073 DOI: 10.1186/s12859-016-1235-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 04/21/2016] [Accepted: 09/01/2016] [Indexed: 12/25/2022]
Abstract
Background: Rapid technological innovation for the generation of single-cell genomics data presents new challenges and opportunities for bioinformatics analysis. One such area lies in the development of new ways to train gene regulatory networks. The use of single-cell expression profiling techniques allows the profiling of the expression states of hundreds of cells, but these expression states are typically noisier due to the presence of technical artefacts such as drop-outs. While many algorithms exist to infer a gene regulatory network, very few of them are able to harness the extra expression states present in single-cell expression data without being adversely affected by the substantial technical noise present. Results: Here we introduce BTR, an algorithm for training asynchronous Boolean models with single-cell expression data using a novel Boolean state space scoring function. BTR is capable of refining existing Boolean models and reconstructing new Boolean models by improving the match between model prediction and expression data. We demonstrate that the Boolean scoring function performed favourably against the BIC scoring function for Bayesian networks. In addition, we show that BTR outperforms many other network inference algorithms in both bulk and single-cell synthetic expression data. Lastly, we introduce two case studies, in which we use BTR to improve published Boolean models in order to generate potentially new biological insights. Conclusions: BTR provides a novel way to refine or reconstruct Boolean models using single-cell expression data. Boolean models are particularly useful for network reconstruction using single-cell data because they are more robust to the effect of drop-outs. In addition, because BTR does not assume any relationship in the expression states among cells, it is useful for reconstructing a gene regulatory network with as few assumptions as possible.
Given the simplicity of Boolean models and the rapid adoption of single-cell genomics by biologists, BTR has the potential to make an impact across many fields of biomedical research.
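To make the notion of an asynchronous Boolean model and its state space concrete, the sketch below enumerates the states reachable from an initial state when exactly one gene is updated at a time. It does not reproduce BTR or its scoring function; the helper names and the two-gene mutual-repression network `TOGGLE_RULES` are hypothetical and chosen only for illustration.

```python
def async_successors(state, rules):
    """Return states reachable by updating exactly one gene.

    `state` is a tuple of 0/1 values and `rules[i]` is a function that
    maps the current state to the next value of gene i. Under the
    asynchronous scheme only one gene changes per transition.
    """
    successors = set()
    for i, rule in enumerate(rules):
        new_value = int(bool(rule(state)))
        if new_value != state[i]:
            successors.add(state[:i] + (new_value,) + state[i + 1:])
    return successors


def reachable_states(initial, rules):
    """Explore the asynchronous state space from `initial` (depth-first)."""
    seen = {initial}
    frontier = [initial]
    while frontier:
        current = frontier.pop()
        for nxt in async_successors(current, rules):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def steady_states(states, rules):
    """States with no asynchronous successor, i.e. fixed points."""
    return {s for s in states if not async_successors(s, rules)}


# Hypothetical two-gene network: each gene is on only when the other is off.
TOGGLE_RULES = (
    lambda s: not s[1],  # gene 0 is repressed by gene 1
    lambda s: not s[0],  # gene 1 is repressed by gene 0
)

if __name__ == "__main__":
    space = reachable_states((1, 1), TOGGLE_RULES)
    print("reachable:", sorted(space))
    print("steady:", sorted(steady_states(space, TOGGLE_RULES)))
```

A model-training method of the kind described in the abstract would compare such a predicted state space with observed single-cell expression states; how that comparison is scored is specific to the published method and is not sketched here.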
Affiliation(s)
- Chee Yee Lim
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge, CB2 0XY, UK
- Huange Wang
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge, CB2 0XY, UK
- Steven Woodhouse
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge, CB2 0XY, UK
- Nir Piterman
- Department of Computer Science, University of Leicester, Leicester, UK
- Jasmin Fisher
- Microsoft Research Cambridge, Cambridge, UK; Department of Biochemistry, University of Cambridge, Cambridge, UK
- Berthold Göttgens
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge, CB2 0XY, UK.
5
Pezzarossa A, Guedes AMV, Henrique D, Abranches E. Imaging Pluripotency: Time-Lapse Analysis of Mouse Embryonic Stem Cells. Methods Mol Biol 2016; 1341:87-100. [PMID: 26162772 DOI: 10.1007/7651_2015_255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
The current view of the pluripotent state is that of a transient, dynamic state, maintained by the balance between opposing cues. Understanding how this dynamic state is established in pluripotent cells and how it relates to gene expression is essential to obtain a more detailed description of the pluripotent state. In this chapter, we describe how to study the dynamic expression of a core pluripotency gene regulator, Nanog, by exploiting single-cell time-lapse imaging of a reporter mESC line grown in different cell culture media. We further describe an automated image analysis method and discuss how to extract information from the generated quantitative time-course data.
Affiliation(s)
- Anna Pezzarossa
- Faculdade de Medicina da Universidade de Lisboa, Instituto de Medicina Molecular and Instituto de Histologia e Biologia do Desenvolvimento, Av. Prof. Egas Moniz, 1649-028, Lisbon, Portugal
- Ana M V Guedes
- Faculdade de Medicina da Universidade de Lisboa, Instituto de Medicina Molecular and Instituto de Histologia e Biologia do Desenvolvimento, Av. Prof. Egas Moniz, 1649-028, Lisbon, Portugal
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Avenida Brasilia-Doca de Pedrouços, 1400-038, Lisbon, Portugal
- Domingos Henrique
- Faculdade de Medicina da Universidade de Lisboa, Instituto de Medicina Molecular and Instituto de Histologia e Biologia do Desenvolvimento, Av. Prof. Egas Moniz, 1649-028, Lisbon, Portugal
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Avenida Brasilia-Doca de Pedrouços, 1400-038, Lisbon, Portugal
- Elsa Abranches
- Faculdade de Medicina da Universidade de Lisboa, Instituto de Medicina Molecular and Instituto de Histologia e Biologia do Desenvolvimento, Av. Prof. Egas Moniz, 1649-028, Lisbon, Portugal.
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Avenida Brasilia-Doca de Pedrouços, 1400-038, Lisbon, Portugal.
6
7
Benayoun BA, Pollina EA, Ucar D, Mahmoudi S, Karra K, Wong ED, Devarajan K, Daugherty AC, Kundaje AB, Mancini E, Hitz BC, Gupta R, Rando TA, Baker JC, Snyder MP, Cherry JM, Brunet A. H3K4me3 breadth is linked to cell identity and transcriptional consistency. Cell 2014; 158:673-88. [PMID: 25083876 PMCID: PMC4137894 DOI: 10.1016/j.cell.2014.06.027] [Citation(s) in RCA: 348] [Impact Index Per Article: 34.8] [Received: 06/11/2013] [Revised: 04/03/2014] [Accepted: 06/10/2014] [Indexed: 12/15/2022]
Abstract
Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites of active genes. Here, we show that H3K4me3 domains that spread more broadly over genes in a given cell type preferentially mark genes that are essential for the identity and function of that cell type. Using the broadest H3K4me3 domains as a discovery tool in neural progenitor cells, we identify novel regulators of these cells. Machine learning models reveal that the broadest H3K4me3 domains represent a distinct entity, characterized by increased marks of elongation. The broadest H3K4me3 domains also have more paused polymerase at their promoters, suggesting a unique transcriptional output. Indeed, genes marked by the broadest H3K4me3 domains exhibit enhanced transcriptional consistency and increased transcriptional levels, and perturbation of H3K4me3 breadth leads to changes in transcriptional consistency. Thus, H3K4me3 breadth contains information that could ensure transcriptional precision at key cell identity/function genes.
Affiliation(s)
- Bérénice A Benayoun
- Department of Genetics, Stanford University, Stanford CA 94305, USA; Paul F. Glenn Laboratories for the Biology of Aging, Stanford University, Stanford CA 94305, USA
- Elizabeth A Pollina
- Department of Genetics, Stanford University, Stanford CA 94305, USA; Cancer Biology Program, Stanford University, Stanford CA 94305, USA
- Duygu Ucar
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Salah Mahmoudi
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Kalpana Karra
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Edith D Wong
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Anshul B Kundaje
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Elena Mancini
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Benjamin C Hitz
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Rakhi Gupta
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Thomas A Rando
- Paul F. Glenn Laboratories for the Biology of Aging, Stanford University, Stanford CA 94305, USA; Department of Neurology and Neurological Sciences, Stanford University, Stanford CA 94305, USA; RR&D REAP, VA Palo Alto Health Care Systems, Palo Alto, CA 94304, USA
- Julie C Baker
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Michael P Snyder
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- J Michael Cherry
- Department of Genetics, Stanford University, Stanford CA 94305, USA
- Anne Brunet
- Department of Genetics, Stanford University, Stanford CA 94305, USA; Paul F. Glenn Laboratories for the Biology of Aging, Stanford University, Stanford CA 94305, USA; Cancer Biology Program, Stanford University, Stanford CA 94305, USA.
8
Xu H, Baroukh C, Dannenfelser R, Chen EY, Tan CM, Kou Y, Kim YE, Lemischka IR, Ma'ayan A. ESCAPE: database for integrating high-content published data collected from human and mouse embryonic stem cells. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013; 2013:bat045. [PMID: 23794736 PMCID: PMC3689438 DOI: 10.1093/database/bat045] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 12/29/2022]
Abstract
High content studies that profile mouse and human embryonic stem cells (m/hESCs) using various genome-wide technologies such as transcriptomics and proteomics are constantly being published. However, efforts to integrate such data to obtain a global view of the molecular circuitry in m/hESCs are lagging behind. Here, we present an m/hESC-centered database called Embryonic Stem Cell Atlas from Pluripotency Evidence integrating data from many recent diverse high-throughput studies including chromatin immunoprecipitation followed by deep sequencing, genome-wide inhibitory RNA screens, gene expression microarrays or RNA-seq after knockdown (KD) or overexpression of critical factors, immunoprecipitation followed by mass spectrometry proteomics and phosphoproteomics. The database provides web-based interactive search and visualization tools that can be used to build subnetworks and to identify known and novel regulatory interactions across various regulatory layers. The web-interface also includes tools to predict the effects of combinatorial KDs by additive effects controlled by sliders, or through simulation software implemented in MATLAB. Overall, the Embryonic Stem Cell Atlas from Pluripotency Evidence database is a comprehensive resource for the stem cell systems biology community. Database URL: http://www.maayanlab.net/ESCAPE
Affiliation(s)
- Huilei Xu
- Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
9
Altman RB. Translational bioinformatics: linking the molecular world to the clinical world. Clin Pharmacol Ther 2012; 91:994-1000. [PMID: 22549287 DOI: 10.1038/clpt.2012.49] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 01/10/2023]
Abstract
Translational bioinformatics represents the union of translational medicine and bioinformatics. Translational medicine moves basic biological discoveries from the research bench into the patient-care setting and uses clinical observations to inform basic biology. It focuses on patient care, including the creation of new diagnostics, prognostics, prevention strategies, and therapies based on biological discoveries. Bioinformatics involves algorithms to represent, store, and analyze basic biological data, including DNA sequence, RNA expression, and protein and small-molecule abundance within cells. Translational bioinformatics spans these two fields; it involves the development of algorithms to analyze basic molecular and cellular data with an explicit goal of affecting clinical care.
Affiliation(s)
- R B Altman
- Department of Bioengineering, Stanford University, Stanford, California, USA.
10
Clark NR, Dannenfelser R, Tan CM, Komosinski ME, Ma'ayan A. Sets2Networks: network inference from repeated observations of sets. BMC SYSTEMS BIOLOGY 2012; 6:89. [PMID: 22824380 PMCID: PMC3443648 DOI: 10.1186/1752-0509-6-89] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 02/14/2012] [Accepted: 06/25/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND The skeleton of complex systems can be represented as networks where vertices represent entities, and edges represent the relations between these entities. Often it is impossible, or expensive, to determine the network structure by experimental validation of the binary interactions between every vertex pair. It is usually more practical to infer the network from surrogate observations. Network inference is the process by which an underlying network of relations between entities is determined from indirect evidence. While many algorithms have been developed to infer networks from quantitative data, less attention has been paid to methods which infer networks from repeated co-occurrence of entities in related sets. This type of data is ubiquitous in the field of systems biology and in other areas of complex systems research. Hence, such methods would be of great utility and value. RESULTS Here we present a general method for network inference from repeated observations of sets of related entities. Given experimental observations of such sets, we infer the underlying network connecting these entities by generating an ensemble of networks consistent with the data. The frequency of occurrence of a given link throughout this ensemble is interpreted as the probability that the link is present in the underlying real network conditioned on the data. Exponential random graphs are used to generate and sample the ensemble of consistent networks, and we take an algorithmic approach to numerically execute the inference method. The effectiveness of the method is demonstrated on synthetic data before employing this inference approach to problems in systems biology and systems pharmacology, as well as to construct a co-authorship collaboration network. 
We predict direct protein-protein interactions from high-throughput mass-spectrometry proteomics, integrate data from ChIP-seq and loss-of-function/gain-of-function followed by expression data to infer a network of associations between pluripotency regulators, extract a network that connects 53 cancer drugs to each other and to 34 severe adverse events by mining the FDA's Adverse Events Reporting Systems (AERS), and construct a co-authorship network that connects Mount Sinai School of Medicine investigators. The predicted networks and online software to create networks from entity-set libraries are provided online at http://www.maayanlab.net/S2N. CONCLUSIONS The network inference method presented here can be applied to resolve different types of networks in current systems biology and systems pharmacology as well as in other fields of research.
Affiliation(s)
- Neil R Clark
- Department of Pharmacology and Systems Therapeutics, Systems Biology Center of New York (SBCNY), Mount Sinai School of Medicine, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
11
Karlebach G, Shamir R. Constructing logical models of gene regulatory networks by integrating transcription factor-DNA interactions with expression data: an entropy-based approach. J Comput Biol 2012; 19:30-41. [PMID: 22216865 DOI: 10.1089/cmb.2011.0100] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 12/14/2022]
Abstract
Models of gene regulatory networks (GRNs) attempt to explain the complex processes that determine cells' behavior, such as differentiation, metabolism, and the cell cycle. The advent of high-throughput data generation technologies has allowed researchers to fit theoretical models to experimental data on gene-expression profiles. GRNs are often represented using logical models. These models require that real-valued measurements be converted to discrete levels, such as on/off, but the discretization often introduces inconsistencies into the data. Dimitrova et al. posed the problem of efficiently finding a parsimonious resolution of the introduced inconsistencies. We show that reconstruction of a logical GRN that minimizes the errors is NP-complete, so that an efficient exact algorithm for the problem is not likely to exist. We present a probabilistic formulation of the problem that circumvents discretization of expression data. We phrase the problem of error reduction as a minimum entropy problem, develop a heuristic algorithm for it, and evaluate its performance on mouse embryonic stem cell data. The constructed model displays high consistency with prior biological knowledge. Despite the oversimplification of a discrete model, we show that it is superior to raw experimental measurements and demonstrates a highly significant level of identical regulatory logic among co-regulated genes. Software implementing the method is freely available at: http://acgt.cs.tau.ac.il/modent.
Affiliation(s)
- Guy Karlebach
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
12
Cassar PA, Stanford WL. Integrating post-transcriptional regulation into the embryonic stem cell gene regulatory network. J Cell Physiol 2012; 227:439-49. [PMID: 21503874 DOI: 10.1002/jcp.22787] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022]
Abstract
Stem cell behavior is orchestrated as a multilayered concert of gene regulatory mechanisms collectively referred to as the gene regulatory network (GRN). Via cooperative mechanisms, transcriptional, epigenetic, and post-transcriptional regulators activate and repress gene expression to finely regulate stem cell self-renewal and commitment. Due to their tractability, embryonic stem cells (ESCs) serve as the model stem cell to dissect the complexities of the GRN, and discern its relation to stem cell fate. By way of high-throughput genomic analysis, targets of individual gene regulators have been established in ESCs. The compilation of these discrete networks has revealed convergent, multi-dimensional gene regulatory mechanisms involving transcription factors, epigenetic modifiers, non-coding RNA (ncRNA), and RNA-binding proteins. Here we highlight the seminal genomic studies that have shaped our understanding of the ESC GRN and describe alternate post-transcriptional gene regulatory mechanisms that require in-depth analyses to draft networks that fully model ESC behavior.
Affiliation(s)
- Paul A Cassar
- Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
13
Abstract
A review of 2010 research in translational bioinformatics provides much to marvel at. We have seen notable advances in personal genomics, pharmacogenetics, and sequencing. At the same time, the infrastructure for the field has burgeoned. While acknowledging that, according to researchers, the members of this field tend to be overly optimistic, the authors predict a bright future.
Affiliation(s)
- Russ B Altman
- Department of Bioengineering, Stanford University School of Medicine, Stanford, California 94305-5444, USA.
14
Williamson AJK, Whetton AD. The requirement for proteomics to unravel stem cell regulatory mechanisms. J Cell Physiol 2011; 226:2478-83. [PMID: 21792904 DOI: 10.1002/jcp.22610] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 02/06/2023]
Abstract
Stem cells are defined by their ability to self-renew and to differentiate; the processes whereby these events are achieved are the subject of much investigation. These studies include cancer stem cell populations, where eradication of this specific population is the ultimate goal of treatment. Whilst cellular signalling events and transcription factor complex-mediated changes in gene expression have been analysed in some detail within stem cells, full systematic understanding of the events promoting self-renewal or the commitment process leading to formation of a specific cell type requires a systems biology approach. This in turn demands proteomic analysis of post-translational regulation of protein levels, protein interactions, and protein post-translational modification (e.g. ubiquitination, methylation, acetylation, phosphorylation) to identify networks for stem cell regulation. Furthermore, the phenomenon of induced pluripotency via cellular reprogramming can also be understood optimally using combined molecular biology and proteomics approaches; here we describe current research employing proteomics and mass spectrometry to dissect stem cell regulatory mechanisms.
Affiliation(s)
- Andrew J K Williamson
- Stem Cell and Leukaemia Proteomics Laboratory, School of Cancer and Enabling Sciences, Manchester Academic Health Science Centre, The University of Manchester, Christie's NHS Foundation Trust, Wolfson Molecular Imaging Centre, Withington, Manchester, UK.
|
15
|
Xu H, Lemischka IR, Ma'ayan A. SVM classifier to predict genes important for self-renewal and pluripotency of mouse embryonic stem cells. BMC Syst Biol 2010; 4:173. [PMID: 21176149 PMCID: PMC3019180 DOI: 10.1186/1752-0509-4-173] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 03/14/2010] [Accepted: 12/21/2010] [Indexed: 11/10/2022]
Abstract
Background: Mouse embryonic stem cells (mESCs) are derived from the inner cell mass of a developing blastocyst and can be cultured indefinitely in vitro. Their distinct features are their ability to self-renew and to differentiate to all adult cell types. Genes that maintain mESC self-renewal and pluripotency identity are of interest to stem cell biologists. Although significant steps have been made toward the identification and characterization of such genes, the list is still incomplete and controversial. For example, the overlap among candidate self-renewal and pluripotency genes across different RNAi screens is surprisingly small. Meanwhile, machine learning approaches have been used to analyze multi-dimensional experimental data and integrate results from many studies, yet they have not been applied specifically to the task of predicting and classifying self-renewal and pluripotency gene membership.
Results: For this study we developed a classifier, a supervised machine learning framework for predicting self-renewal and pluripotency mESCs stemness membership genes (MSMG) using support vector machines (SVM). The data used to train the classifier were derived from mESC-related studies using mRNA microarrays, measuring gene expression in various stages of early differentiation, as well as from ChIP-seq studies applied to mESCs, profiling genome-wide binding of key transcription factors, such as Nanog, Oct4, and Sox2, to the regulatory regions of other genes. Comparison to other classification methods using leave-one-out cross-validation was employed to evaluate the accuracy and generality of the classification. Finally, two sets of candidate genes from genome-wide RNA interference screens were used to test the generality and potential application of the classifier.
Conclusions: Our results reveal that an SVM approach can be useful for prioritizing genes for functional validation experiments and can complement the analyses of high-throughput profiling experimental data in stem cell research.
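As an illustration of the evaluation scheme named in this abstract, the following minimal sketch runs leave-one-out cross-validation around a linear SVM. It is not the authors' pipeline: the two features, the eight toy rows, the learning rate, and the plain subgradient-descent trainer are all assumptions chosen so the example stays self-contained, and the study may have used a different kernel and solver.

```python
# Hypothetical toy data: each row is (expression change, TF binding score) for one gene.
# The values are invented for illustration and are not taken from the study.
FEATURES = [
    (2.0, 3.0), (2.5, 2.0), (3.0, 3.0), (2.0, 2.5),          # assumed "stemness" genes
    (-2.0, -3.0), (-2.5, -2.0), (-3.0, -3.0), (-2.0, -2.5),  # assumed "other" genes
]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]


def train_linear_svm(rows, labels, lr=0.1, lam=0.01, epochs=50):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly toward zero.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Hinge-loss step: push the boundary away from a margin violator.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def loocv_accuracy(rows, labels):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = train_linear_svm(train_rows, train_labels)
        if predict(model, rows[i]) == labels[i]:
            correct += 1
    return correct / len(rows)


accuracy = loocv_accuracy(FEATURES, LABELS)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

Leave-one-out is a natural choice when labeled genes are scarce, because every sample is used for testing exactly once while the model is still trained on all the others.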
Affiliation(s)
- Huilei Xu
- Department of Pharmacology and System Therapeutics, Mount Sinai School of Medicine, 1 Gustave L. Levy Place, New York, New York 10029, USA
|
16
|
Som A, Harder C, Greber B, Siatkowski M, Paudel Y, Warsow G, Cap C, Schöler H, Fuellen G. The PluriNetWork: an electronic representation of the network underlying pluripotency in mouse, and its applications. PLoS One 2010; 5:e15165. [PMID: 21179244 PMCID: PMC3003487 DOI: 10.1371/journal.pone.0015165] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Received: 07/29/2010] [Accepted: 10/27/2010] [Indexed: 12/20/2022] Open
Abstract
Background: Analysis of the mechanisms underlying pluripotency and reprogramming would benefit substantially from easy access to an electronic network of genes, proteins and mechanisms. Moreover, interpreting gene expression data needs to move beyond just the identification of the up-/downregulation of key genes and of overrepresented processes and pathways, towards clarifying the essential effects of the experiment in molecular terms.
Methodology/Principal Findings: We have assembled a network of 574 molecular interactions, stimulations and inhibitions, based on a collection of research data from 177 publications until June 2010, involving 274 mouse genes/proteins, all in a standard electronic format, enabling analyses by readily available software such as Cytoscape and its plugins. The network includes the core circuit of Oct4 (Pou5f1), Sox2 and Nanog, its periphery (such as Stat3, Klf4, Esrrb, and c-Myc), connections to upstream signaling pathways (such as Activin, WNT, FGF, BMP, Insulin, Notch and LIF), and epigenetic regulators as well as some other relevant genes/proteins, such as proteins involved in nuclear import/export. We describe the general properties of the network, as well as a Gene Ontology analysis of the genes included. We use several expression data sets to condense the network to a set of network links that are affected in the course of an experiment, yielding hypotheses about the underlying mechanisms.
Conclusions/Significance: We have initiated an electronic data repository that will be useful to understand pluripotency and to facilitate the interpretation of high-throughput data. To keep up with the growth of knowledge on the fundamental processes of pluripotency and reprogramming, we suggest combining Wiki and social networking software into a community curation system that is easy to use, flexible, and tailored to benefit the scientist and to improve communication and exchange of research results.
A PluriNetWork tutorial is available at http://www.ibima.med.uni-rostock.de/IBIMA/PluriNetWork/.
Affiliation(s)
- Anup Som
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany
- Clemens Harder
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany
- Boris Greber
- Department of Cell and Developmental Biology, Max Planck Institute for Molecular Biomedicine, Münster, Germany
- Marcin Siatkowski
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany
- DZNE, German Center for Neurodegenerative Diseases, Rostock, Germany
- Yogesh Paudel
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany
- Gregor Warsow
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany
- Institute for Anatomy and Cell Biology, Ernst Moritz Arndt University Greifswald, Greifswald, Germany
- Department of Mathematics and Informatics, Ernst Moritz Arndt University Greifswald, Greifswald, Germany
- Clemens Cap
- Department of Computer Science, University of Rostock, Rostock, Germany
- Hans Schöler
- Department of Cell and Developmental Biology, Max Planck Institute for Molecular Biomedicine, Münster, Germany
- Medical Faculty, University of Münster, Münster, Germany
- Georg Fuellen
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany
|
17
|
Warsow G, Greber B, Falk SSI, Harder C, Siatkowski M, Schordan S, Som A, Endlich N, Schöler H, Repsilber D, Endlich K, Fuellen G. ExprEssence--revealing the essence of differential experimental data in the context of an interaction/regulation net-work. BMC Syst Biol 2010; 4:164. [PMID: 21118483 PMCID: PMC3012047 DOI: 10.1186/1752-0509-4-164] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Received: 07/05/2010] [Accepted: 11/30/2010] [Indexed: 12/15/2022]
Abstract
Background: Experimentalists are overwhelmed by high-throughput data and there is an urgent need to condense information into simple hypotheses. For example, large amounts of microarray and deep sequencing data are becoming available, describing a variety of experimental conditions such as gene knockout and knockdown, the effect of interventions, and the differences between tissues and cell lines.
Results: To address this challenge, we developed a method, implemented as a Cytoscape plugin called ExprEssence. As input we take a network of interaction, stimulation and/or inhibition links between genes/proteins, and differential data, such as gene expression data, tracking an intervention or development in time. We condense the network, highlighting those links across which the largest changes can be observed. Highlighting is based on a simple formula inspired by the law of mass action. We can interactively modify the threshold for highlighting and instantaneously visualize results. We applied ExprEssence to three scenarios describing kidney podocyte biology, pluripotency and ageing: 1) We identify putative processes involved in podocyte (de-)differentiation and validate one prediction experimentally. 2) We predict and validate the expression level of a transcription factor involved in pluripotency. 3) Finally, we generate plausible hypotheses on the role of apoptosis, cell cycle deregulation and DNA repair in ageing data obtained from the hippocampus.
Conclusion: Reducing the size of gene/protein networks to the few links affected by large changes allows screening for putative mechanistic relationships among the genes/proteins that are involved in adaptation to different experimental conditions, yielding important hypotheses, insights and suggestions for new experiments. We note that we do not focus on the identification of 'active subnetworks'. Instead we focus on the identification of single links (which may or may not form subnetworks), and these single links are much easier to validate experimentally than submodules. ExprEssence is available at http://sourceforge.net/projects/expressence/.
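The idea of condensing a network to its most strongly affected links can be sketched as follows. This is a hedged illustration only: the abstract does not state the formula, so the scoring rule below (sum of the endpoint changes for a stimulation link, difference for an inhibition link) is an assumption, and the gene names, expression levels, and the `condense` helper are hypothetical.

```python
# Hypothetical network: (source, target, link type). Names are placeholders.
LINKS = [
    ("geneA", "geneB", "stimulation"),
    ("geneB", "geneC", "inhibition"),
    ("geneA", "geneC", "stimulation"),
]

# Hypothetical expression levels under two conditions (invented values).
BEFORE = {"geneA": 1.0, "geneB": 1.0, "geneC": 2.0}
AFTER = {"geneA": 3.0, "geneB": 2.5, "geneC": 0.5}


def link_score(source, target, kind, before, after):
    """Assumed scoring rule, not the published ExprEssence formula.

    A stimulation link scores high when both endpoints change in the same
    direction; an inhibition link scores high when they change in opposite
    directions.
    """
    delta_source = after[source] - before[source]
    delta_target = after[target] - before[target]
    if kind == "stimulation":
        return delta_source + delta_target
    return delta_source - delta_target


def condense(links, before, after, threshold):
    """Keep links whose absolute score reaches the threshold, strongest first."""
    scored = [
        (src, tgt, kind, link_score(src, tgt, kind, before, after))
        for src, tgt, kind in links
    ]
    kept = [row for row in scored if abs(row[3]) >= threshold]
    return sorted(kept, key=lambda row: abs(row[3]), reverse=True)


for src, tgt, kind, score in condense(LINKS, BEFORE, AFTER, threshold=2.0):
    print(f"{src} -> {tgt} ({kind}): {score:+.1f}")
```

Raising or lowering `threshold` mimics the interactive filtering described in the abstract: a higher value leaves fewer, more strongly changed links to inspect.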
Affiliation(s)
- Gregor Warsow
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Ernst-Heydemann-Strasse 8, Rostock, Germany
|