1
Laber S, Strobel S, Mercader JM, Dashti H, dos Santos FR, Kubitz P, Jackson M, Ainbinder A, Honecker J, Agrawal S, Garborcauskas G, Stirling DR, Leong A, Figueroa K, Sinnott-Armstrong N, Kost-Alimova M, Deodato G, Harney A, Way GP, Saadat A, Harken S, Reibe-Pal S, Ebert H, Zhang Y, Calabuig-Navarro V, McGonagle E, Stefek A, Dupuis J, Cimini BA, Hauner H, Udler MS, Carpenter AE, Florez JC, Lindgren C, Jacobs SB, Claussnitzer M. Discovering cellular programs of intrinsic and extrinsic drivers of metabolic traits using LipocyteProfiler. Cell Genom 2023; 3:100346. [PMID: 37492099 PMCID: PMC10363917 DOI: 10.1016/j.xgen.2023.100346] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/21/2021] [Revised: 08/22/2022] [Accepted: 05/26/2023] [Indexed: 07/27/2023]
Abstract
A primary obstacle in translating genetic associations with disease into therapeutic strategies is elucidating the cellular programs affected by genetic risk variants and effector genes. Here, we introduce LipocyteProfiler, a cardiometabolic-disease-oriented high-content image-based profiling tool that enables evaluation of thousands of morphological and cellular profiles that can be systematically linked to genes and genetic variants relevant to cardiometabolic disease. We show that LipocyteProfiler allows surveillance of diverse cellular programs by generating rich context- and process-specific cellular profiles across hepatocyte and adipocyte cell-state transitions. We use LipocyteProfiler to identify known and novel cellular mechanisms altered by polygenic risk of metabolic disease, including insulin resistance, fat distribution, and the polygenic contribution to lipodystrophy. LipocyteProfiler paves the way for large-scale forward and reverse deep phenotypic profiling in lipocytes and provides a framework for the unbiased identification of causal relationships between genetic variants and cellular programs relevant to human disease.
Affiliation(s)
- Samantha Laber
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7FZ, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Sophie Strobel
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Institute of Nutritional Medicine, School of Medicine, Technical University of Munich, 85354 Freising-Weihenstephan, Germany
- Josep M. Mercader
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Hesam Dashti
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Felipe R.C. dos Santos
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Phil Kubitz
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Else Kröner-Fresenius-Centre for Nutritional Medicine, School of Life Sciences, Technical University of Munich, 85354 Freising-Weihenstephan, Germany
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Maya Jackson
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alina Ainbinder
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Julius Honecker
- Else Kröner-Fresenius-Centre for Nutritional Medicine, School of Life Sciences, Technical University of Munich, 85354 Freising-Weihenstephan, Germany
- Saaket Agrawal
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Garrett Garborcauskas
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- David R. Stirling
- Imaging Platform, Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Aaron Leong
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Katherine Figueroa
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Nasa Sinnott-Armstrong
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Maria Kost-Alimova
- Imaging Platform, Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Giacomo Deodato
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alycen Harney
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Gregory P. Way
- Imaging Platform, Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alham Saadat
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Sierra Harken
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Saskia Reibe-Pal
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7FZ, UK
- Hannah Ebert
- Institute of Nutritional Science, University Hohenheim, 70599 Stuttgart, Germany
- Yixin Zhang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Virtu Calabuig-Navarro
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Institute of Nutritional Science, University Hohenheim, 70599 Stuttgart, Germany
- Elizabeth McGonagle
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Adam Stefek
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1G1, Canada
- Beth A. Cimini
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Hans Hauner
- Institute of Nutritional Medicine, School of Medicine, Technical University of Munich, 85354 Freising-Weihenstephan, Germany
- Else Kröner-Fresenius-Centre for Nutritional Medicine, School of Life Sciences, Technical University of Munich, 85354 Freising-Weihenstephan, Germany
- German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany
- Miriam S. Udler
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Anne E. Carpenter
- Imaging Platform, Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jose C. Florez
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Cecilia Lindgren
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7FZ, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Suzanne B.R. Jacobs
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Melina Claussnitzer
- Programs in Metabolism and Medical and Population Genetics, Type 2 Diabetes Systems Genomics Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
2
Rams M, Conrad TOF. Dictionary learning allows model-free pseudotime estimation of transcriptomic data. BMC Genomics 2022; 23:56. [PMID: 35033004 PMCID: PMC8760643 DOI: 10.1186/s12864-021-08276-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/04/2020] [Accepted: 12/22/2021] [Indexed: 11/10/2022]
Abstract
Background Pseudotime estimation from dynamic single-cell transcriptomic data enables the characterisation and understanding of underlying processes, for example developmental processes. Various pseudotime estimation methods have been proposed in recent years. Typically, these methods start with a dimension reduction step because the low-dimensional representation is usually easier to analyse. Approaches such as PCA, ICA or t-SNE are among the most widely used methods for dimension reduction in pseudotime estimation. However, these methods usually make assumptions about the derived dimensions, which can result in important dataset properties being missed. In this paper, we propose a new dictionary-learning-based approach, dynDLT, for dimension reduction and pseudotime estimation of dynamic transcriptomic data. Dictionary learning is a matrix factorisation approach that does not restrict the dependence of the derived dimensions. To evaluate its performance, we conduct a large simulation study and analyse 8 real-world datasets.
Results The simulation studies reveal, first, that dynDLT preserves the simulated patterns in the low-dimensional representation and that the pseudotimes can be derived from this representation. Second, the results show that dynDLT is suitable for detecting genes that exhibit the simulated dynamic patterns, thereby facilitating interpretation of the compressed representation and thus of the dynamic processes. For the real-world data analysis, we select datasets whose samples were taken at different time points throughout an experiment. The pseudotimes found by dynDLT correlate highly with the experimental times. We compare the results with other approaches used in pseudotime estimation, or approaches methodologically closely related to dictionary learning: ICA, NMF, PCA, t-SNE, and UMAP. DynDLT has the best overall performance on the simulated and real-world datasets.
Conclusions We introduce dynDLT, a method suitable for pseudotime estimation. Its main advantages are: (1) it is model-free, meaning that it does not restrict the dependence of the derived dimensions; (2) genes that are relevant to the detected dynamic processes can be identified from the dictionary matrix; (3) restricting the dictionary entries to positive values makes the dictionary atoms highly interpretable. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08276-9.
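A minimal sketch of the evaluation described above: it measures agreement between an inferred pseudotime and the experimental sampling times. The abstract reports high correlations but does not say which correlation was used here, so the Spearman rank correlation, the use of its absolute value (a pseudotime axis can run in either direction), the function names and the data values are all assumptions for illustration. This is not dynDLT itself.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pseudotime_agreement(pseudotime, sample_time):
    """Absolute Spearman correlation (Pearson on tie-averaged ranks)."""
    return abs(pearson(average_ranks(pseudotime), average_ranks(sample_time)))


# Hypothetical example: nine cells sampled at three time points (0, 6, 12 h),
# with an inferred pseudotime axis that happens to run backwards.
sample_time = [0, 0, 0, 6, 6, 6, 12, 12, 12]
pseudotime = [0.9, 0.8, 0.85, 0.5, 0.4, 0.45, 0.1, 0.05, 0.0]
print(round(pseudotime_agreement(pseudotime, sample_time), 4))  # 0.9487
```

Tied sampling times are common because many cells share a time point, which is why the ranks are tie-averaged rather than assigned in arbitrary order.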
Affiliation(s)
- Mona Rams
- Freie Universitaet Berlin, Arnimallee 6, Berlin, 14195, Germany.
- Tim O F Conrad
- Konrad-Zuse-Zentrum für Informationstechnik Berlin, Takustraße 7, Berlin, 14195, Germany
3
Jamalkandi SA, Kouhsar M, Salimian J, Ahmadi A. The identification of co-expressed gene modules in Streptococcus pneumonia from colonization to infection to predict novel potential virulence genes. BMC Microbiol 2020; 20:376. [PMID: 33334315 PMCID: PMC7745498 DOI: 10.1186/s12866-020-02059-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2020] [Accepted: 12/02/2020] [Indexed: 11/14/2022]
Abstract
Background Streptococcus pneumoniae (pneumococcus) is a human bacterial pathogen causing a range of mild to severe infections. The complex transcriptome patterns of pneumococci during the progression from colonization to infection in the human body are usually characterized by measuring the expression of essential virulence genes and by comparing pathogenic with non-pathogenic bacteria through microarray analyses. As systems biology studies have demonstrated, critical co-expressed modules and genes may serve as key players in biological processes. Sample Progression Discovery (SPD) is a computational approach used to decipher biological progression trends and their corresponding gene modules (clusters) across the clinical samples of a microarray dataset. The present study aimed to investigate the bacterial gene expression pattern from colonization to severe infection (specimens isolated from the nasopharynx, lung, blood, and brain) to find new genes and gene modules associated with infection progression. This strategy may lead to novel gene candidates for vaccine or drug design.
Results The results included essential genes whose expression patterns varied across bacterial conditions and had not been investigated in similar studies.
Conclusions The SPD algorithm, together with the detection of differentially expressed genes, can offer new ways of discovering gene products that could serve as therapeutic or vaccine targets. Supplementary Information The online version contains supplementary material available at 10.1186/s12866-020-02059-0.
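As we understand SPD, it groups co-expressed genes into modules and, for each module, links the samples by a minimum spanning tree (MST) whose structure suggests a progression order. The sketch below shows only that MST step, using Prim's algorithm on Euclidean distances. The distance metric, the function names and the toy module values for the four specimen sites are hypothetical. SPD's gene-clustering and module-selection steps are not shown.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prim_mst(samples):
    """Minimum spanning tree over samples (Prim's algorithm, O(n^2)).

    samples: non-empty list of equal-length expression vectors, e.g. the
    expression of one gene module's genes in each sample.
    Returns a list of (parent, child, distance) edges.
    """
    n = len(samples)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each sample to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest sample that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax distances from the newly added sample.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(samples[u], samples[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


# Hypothetical two-gene module measured in the four specimen sites.
sites = ["nasopharynx", "lung", "blood", "brain"]
module = [[0.0, 0.1], [1.0, 1.1], [2.1, 1.9], [3.0, 3.2]]
for i, j, d in prim_mst(module):
    print(f"{sites[i]} -- {sites[j]}  ({d:.2f})")
```

In this toy module the tree is a simple chain through the four sites, which is the kind of linear structure that would suggest a progression ordering.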
Affiliation(s)
- Sadegh Azimzadeh Jamalkandi
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Morteza Kouhsar
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Jafar Salimian
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Ali Ahmadi
- Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran.
4
Deng AC, Sun XQ. Dynamic gene regulatory network reconstruction and analysis based on clinical transcriptomic data of colorectal cancer. Math Biosci Eng 2020; 17:3224-3239. [PMID: 32987526 DOI: 10.3934/mbe.2020183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Inferring dynamic regulatory networks that rewire at different stages is a reasonable way to understand the mechanisms underlying cancer development. In this study, we reconstructed stage-specific gene regulatory networks (GRNs) for colorectal cancer to understand how gene regulation changes across disease stages. We combined multiple sets of clinical transcriptomic data from colorectal cancer patients and employed a supervised approach to select an initial gene set for network construction. We then developed a dynamical-system-based optimization method to infer dynamic GRNs by incorporating mutual-information-based network sparsification and a dynamic cascade technique into an ordinary differential equation model. Dynamic GRNs at four different stages of colorectal cancer were reconstructed and analyzed. Several important genes were revealed based on the rewiring of the reconstructed GRNs. Our study demonstrated that reconstructing dynamic GRNs from clinical transcriptomic profiles allows us to detect dynamic trends in gene regulation and to reveal genes critical for cancer development, which may be important candidate master regulators for further experimental testing.
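The abstract embeds mutual-information sparsification and a dynamic cascade technique in an ordinary differential equation (ODE) model, but it does not give the model's form. As an illustration only, the sketch below assumes a common linear ODE form, dx_i/dt = sum_j W[i][j] * x_j + basal_i - decay_i * x_i, and integrates it with forward Euler. A sparsified network is represented simply by zeros in W. The equation, the parameter names and the values are assumptions, not the authors' model.

```python
def simulate_grn(W, decay, basal, x0, dt=0.01, steps=1000):
    """Forward-Euler integration of a linear ODE gene-network model.

    dx_i/dt = sum_j W[i][j] * x_j + basal[i] - decay[i] * x_i
    A zero in W means "no edge", so a sparsified network is just a W with
    more zeros. Returns the trajectory as a list of state vectors.
    """
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = [
            sum(W[i][j] * x[j] for j in range(n)) + basal[i] - decay[i] * x[i]
            for i in range(n)
        ]
        x = [x[i] + dt * dx[i] for i in range(n)]
        trajectory.append(list(x))
    return trajectory


# Hypothetical two-gene network: gene 0 has a basal production rate and activates gene 1.
W = [[0.0, 0.0],
     [0.5, 0.0]]
traj = simulate_grn(W, decay=[1.0, 1.0], basal=[1.0, 0.0], x0=[0.0, 0.0], dt=0.01, steps=2000)
print([round(v, 3) for v in traj[-1]])  # approaches the steady state [1.0, 0.5]
```

Stage-specific networks, as in the study, would correspond to fitting a separate W for each disease stage. This sketch only simulates a given W and does not fit one to data.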
Affiliation(s)
- An Cheng Deng
- School of Life Science, Sun Yat-sen University, Guangzhou 510275, China
- Xiao Qiang Sun
- Key Laboratory of Tropical Disease Control, Chinese Ministry of Education, Zhong-Shan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
5
Liang S, Wang F, Han J, Chen K. Latent periodic process inference from single-cell RNA-seq data. Nat Commun 2020; 11:1441. [PMID: 32188848 PMCID: PMC7080821 DOI: 10.1038/s41467-020-15295-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/17/2019] [Accepted: 03/03/2020] [Indexed: 11/15/2022]
Abstract
The development of a phenotype in a multicellular organism often involves multiple, simultaneously occurring biological processes. Advances in single-cell RNA sequencing make it possible to infer latent developmental processes from the transcriptomic profiles of cells at various developmental stages. Accurate characterization is challenging, however, particularly for periodic processes such as the cell cycle. To address this, we develop Cyclum, an autoencoder approach that identifies circular trajectories in the gene expression space. Cyclum substantially improves the accuracy and robustness of cell-cycle characterization beyond existing approaches. Using Cyclum to remove cell-cycle effects substantially improves the delineation of cell subpopulations, which is useful for establishing various cell atlases and studying tumor heterogeneity.
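Cyclum learns each cell's position on a circle with an autoencoder. The sketch below skips that learning step and assumes a 2-D circular embedding is already available. It converts each cell's coordinates to an angle, fits one gene as c + a*cos(theta) + b*sin(theta) by least squares, and subtracts the periodic part, which is the basic idea behind removing cell-cycle effects. The sinusoidal form, the function names and the data are illustrative assumptions rather than Cyclum's actual architecture.

```python
import math


def cell_phase(u, v):
    """Map a 2-D embedding coordinate (u, v) to an angle in [0, 2*pi)."""
    return math.atan2(v, u) % (2 * math.pi)


def _solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def remove_periodic_component(expr, phases):
    """Fit expr ~ c + a*cos(phase) + b*sin(phase) by least squares.

    Needs at least three distinct phases. Returns ((c, a, b), corrected), where
    corrected keeps the baseline c but subtracts the fitted periodic term.
    """
    X = [[1.0, math.cos(t), math.sin(t)] for t in phases]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * y for row, y in zip(X, expr)) for i in range(3)]
    c, a, b = _solve3(XtX, Xty)
    corrected = [y - (a * math.cos(t) + b * math.sin(t)) for y, t in zip(expr, phases)]
    return (c, a, b), corrected


# Hypothetical embedding: 8 cells evenly spaced on a unit circle, one periodic gene.
coords = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
phases = [cell_phase(u, v) for u, v in coords]
expr = [5.0 + 2.0 * u - 1.0 * v for u, v in coords]
(c, a, b), corrected = remove_periodic_component(expr, phases)
print(round(c, 3), round(a, 3), round(b, 3))  # 5.0 2.0 -1.0
```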
Affiliation(s)
- Shaoheng Liang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, Houston, TX, 77005, USA.
- Fang Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Jincheng Han
- Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
6
Pierson E, Koh PW, Hashimoto T, Koller D, Leskovec J, Eriksson N, Liang P. Inferring Multidimensional Rates of Aging from Cross-Sectional Data. Proc Mach Learn Res 2019; 89:97-107. [PMID: 31538144 PMCID: PMC6752884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Modeling how individuals evolve over time is a fundamental problem in the natural and social sciences. However, existing datasets are often cross-sectional with each individual observed only once, making it impossible to apply traditional time-series methods. Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data. Our model represents each individual's features over time as a nonlinear function of a low-dimensional, linearly-evolving latent state. We prove that when this nonlinear function is constrained to be order-isomorphic, the model family is identifiable solely from cross-sectional data provided the distribution of time-independent variation is known. On the UK Biobank human health dataset, our model reconstructs the observed data while learning interpretable rates of aging associated with diseases, mortality, and aging risk factors.
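The generative side of the model described above can be sketched as follows. The abstract specifies only a low-dimensional, linearly evolving latent state mapped to features through a nonlinear function. The sketch therefore assumes, as simplifications, that z(t) = r * t with no offset, that the person-specific rates r are log-normal, and that each feature is exp(z) plus small Gaussian noise. A strictly increasing link is used loosely in the spirit of the order-isomorphic constraint. The sketch only simulates cross-sectional data (one observation per person) and performs no inference. All names and parameter values are hypothetical.

```python
import math
import random


def latent_state(rates, age):
    """Linearly evolving latent state: z(t) = r * t for each latent dimension."""
    return [r * age for r in rates]


def observe(z, link=math.exp):
    """Map the latent state to observed features through a strictly increasing link."""
    return [link(zj) for zj in z]


def simulate_cross_section(n_people, n_latent, seed=0):
    """Simulate one observation per person (hypothetical parameters)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        age = rng.uniform(40.0, 70.0)
        # Person-specific, non-negative aging rates (log-normal, an assumption).
        rates = [math.exp(rng.gauss(-4.0, 0.3)) for _ in range(n_latent)]
        noisy = [x + rng.gauss(0.0, 0.01) for x in observe(latent_state(rates, age))]
        rows.append((age, noisy))
    return rows


rows = simulate_cross_section(n_people=3, n_latent=2)
for age, features in rows:
    print(round(age, 1), [round(f, 3) for f in features])
```

Because every person appears once, the rate r and the age t are never observed separately over time. Disentangling them from such data is the identifiability problem the paper addresses.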
7
Identification of Novel Genes in Human Airway Epithelial Cells associated with Chronic Obstructive Pulmonary Disease (COPD) using Machine-Based Learning Algorithms. Sci Rep 2018; 8:15775. [PMID: 30361509 PMCID: PMC6202402 DOI: 10.1038/s41598-018-33986-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 07/06/2018] [Accepted: 10/07/2018] [Indexed: 01/26/2023]
Abstract
The aim of this project was to identify novel candidate therapeutic targets for COPD using machine-based learning (ML) algorithms and penalized regression models. In this study, 59 healthy smokers, 53 healthy non-smokers and 21 COPD smokers (9 GOLD stage I and 12 GOLD stage II) were included (n = 133). Expression data for 20,097 probes were obtained from a small airway epithelium (SAE) microarray dataset previously generated from these subjects. The associations of gene expression levels with smoking and with COPD were then assessed using AdaBoost Classification Trees, Decision Tree, Gradient Boosting Machines, Naive Bayes, Neural Network, Random Forest, Support Vector Machine, and adaptive LASSO, Elastic-Net, and Ridge logistic regression analyses. Using this methodology, we identified 44 candidate genes; 27 of these had previously been reported as important factors in the pathogenesis of COPD or in the regulation of lung function. The remaining 17 genes had not previously been associated with the pathogenesis of COPD or the regulation of lung function. The most significantly regulated of these genes included PRKAR2B, GAD1, LINC00930 and SLITRK6. These novel genes may provide the basis for the future development of novel therapeutics for COPD and its associated morbidities.
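The penalized models listed above differ mainly in their penalty term. The minimal sketch below fits a logistic regression with an elastic-net penalty by proximal gradient descent, so that alpha = 1 gives the LASSO and alpha = 0 gives ridge. Adaptive LASSO would additionally weight each coefficient's L1 term, which is not shown. The solver, step size, toy data and function names are assumptions for illustration, not the study's pipeline.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(v, thr):
    """Proximal operator of thr * |v|: shrink v toward zero by thr."""
    return math.copysign(max(abs(v) - thr, 0.0), v)


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, step=0.1, iters=2000):
    """Logistic regression with an elastic-net penalty, fitted by proximal gradient descent.

    Penalty: lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2); the intercept
    is not penalized. alpha=1 gives the LASSO and alpha=0 gives ridge.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        resid = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
                 for xi, yi in zip(X, y)]
        grad_w = [sum(r * xi[j] for r, xi in zip(resid, X)) / n + lam * (1 - alpha) * w[j]
                  for j in range(p)]
        grad_b = sum(resid) / n
        # Gradient step on the smooth part, then soft-thresholding for the L1 part.
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam * alpha) for j in range(p)]
        b -= step * grad_b
    return w, b


# Hypothetical data: feature 0 separates the classes, feature 1 is uninformative.
X = [[1.0, 1.0], [2.0, -1.0], [-1.0, 1.0], [-2.0, -1.0]]
y = [1, 1, 0, 0]
w, b = fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5)
print(w[0] > 0, w[1] == 0.0)  # True True: the L1 term zeroes the uninformative feature
```

The L1 part is what lets LASSO and Elastic-Net act as gene selectors, because coefficients can become exactly zero. Ridge only shrinks coefficients toward zero.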
8
Cook D, Achanta S, Hoek JB, Ogunnaike BA, Vadigepalli R. Cellular network modeling and single cell gene expression analysis reveals novel hepatic stellate cell phenotypes controlling liver regeneration dynamics. BMC Syst Biol 2018; 12:86. [PMID: 30285726 PMCID: PMC6171157 DOI: 10.1186/s12918-018-0605-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/17/2018] [Accepted: 08/21/2018] [Indexed: 12/26/2022]
Abstract
Background Recent results from single-cell gene and protein regulation studies are starting to uncover the previously underappreciated fact that individual cells within a population exhibit high variability in the expression of mRNA and proteins (i.e., molecular variability). By combining cellular network modeling and high-throughput gene expression measurements in single cells, we seek to reconcile the high molecular variability in single cells with the relatively low variability in tissue-scale gene and protein expression and the highly coordinated functional responses of tissues to physiological challenges. In this study, we focus on relating the dynamic changes in distributions of hepatic stellate cell (HSC) functional phenotypes to the tightly regulated physiological response of liver regeneration.
Results We develop a mathematical model describing contributions of HSC functional phenotype populations to liver regeneration and test model predictions through isolation and transcriptional characterization of single HSCs. We identify and characterize four HSC transcriptional states contributing to liver regeneration, two of which are described for the first time in this work. We show that HSC state populations change in vivo in response to acute challenges (in this case, 70% partial hepatectomy) and chronic challenges (chronic ethanol consumption). Our results indicate that HSCs influence the dynamics of liver regeneration through steady-state tissue preconditioning prior to an acute insult and through dynamic control of cell state balances. Furthermore, our modeling approach provides a framework to understand how balances among cell states influence tissue dynamics.
Conclusions Taken together, our combined modeling and experimental studies reveal novel HSC transcriptional states and indicate that baseline differences in HSC phenotypes, as well as a dynamic balance of transitions between these phenotypes, control liver regeneration responses.
Electronic supplementary material The online version of this article (10.1186/s12918-018-0605-7) contains supplementary material, which is available to authorized users.
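The abstract does not give the model's equations. As a hypothetical illustration of how "balances among cell states" can be modelled, the sketch below treats four HSC states as compartments with first-order transition rates and integrates the population fractions with forward Euler. The state labels (0 to 3), the rate values and the linear form are all assumptions. A challenge such as partial hepatectomy could be represented by switching to a different rate matrix partway through the simulation, but that is not shown.

```python
def step_populations(p, rates, dt):
    """One forward-Euler step of a linear cell-state transition model.

    p[i] is the fraction of cells in state i; rates[i][j] (i != j) is the rate of
    transitions from state i to state j. Diagonal entries are ignored.
    """
    n = len(p)
    new_p = []
    for j in range(n):
        inflow = sum(p[i] * rates[i][j] for i in range(n) if i != j)
        outflow = p[j] * sum(rates[j][k] for k in range(n) if k != j)
        new_p.append(p[j] + dt * (inflow - outflow))
    return new_p


def simulate_states(p0, rates, dt=0.01, steps=1000):
    """Integrate the state fractions forward for steps * dt time units."""
    p = list(p0)
    for _ in range(steps):
        p = step_populations(p, rates, dt)
    return p


# Hypothetical four-state example in which all cells start in state 0.
rates = [[0.0, 0.5, 0.1, 0.0],
         [0.2, 0.0, 0.3, 0.0],
         [0.0, 0.1, 0.0, 0.4],
         [0.3, 0.0, 0.0, 0.0]]
p = simulate_states([1.0, 0.0, 0.0, 0.0], rates, dt=0.01, steps=5000)
print([round(x, 3) for x in p], round(sum(p), 6))
```

Because every transition leaves one state and enters another, the fractions always sum to 1. The long-run balance among states is set entirely by the rate matrix.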
Affiliation(s)
- Daniel Cook
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE, USA
- Department of Pathology, Anatomy, and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Sirisha Achanta
- Department of Pathology, Anatomy, and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Jan B Hoek
- Department of Pathology, Anatomy, and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Babatunde A Ogunnaike
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE, USA
- Rajanikanth Vadigepalli
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE, USA
- Department of Pathology, Anatomy, and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
9
Uncovering pseudotemporal trajectories with covariates from single cell and bulk expression data. Nat Commun 2018; 9:2442. [PMID: 29934517 PMCID: PMC6015076 DOI: 10.1038/s41467-018-04696-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 07/05/2017] [Accepted: 05/17/2018] [Indexed: 12/29/2022]
Abstract
Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets, allowing dynamic biological processes to be studied in situations where the collection of time series data is challenging or prohibitive. Computational techniques have arisen from single-cell ’omics and cancer modelling, where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically implicitly assume homogeneous genetic, phenotypic or environmental backgrounds, which becomes limiting as data sets grow in size and complexity. We describe a novel statistical framework that learns how pseudotime trajectories can be modulated through covariates that encode such factors. We apply this model to both single-cell and bulk gene expression data sets and show that the approach can recover known and novel covariate-pseudotime interaction effects. This hybrid regression-latent variable model framework extends pseudotemporal modelling from its most prevalent area, single-cell genomics, to wider applications. Cross-sectional omic data often have non-homogeneous genetic, phenotypic, or environmental backgrounds. Here, the authors develop a statistical framework to infer pseudotime trajectories in the presence of such factors, as well as their interactions, in both single-cell and bulk gene expression analysis.
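A simplified covariate-pseudotime interaction can be written as E[y] = mu + (lam + beta * x) * z, where z is pseudotime, x is a covariate and beta is the interaction effect. The sketch below uses that form. It is an assumption, because the full model in the paper also infers z and places priors on the parameters. Treating z as known and x as binary, the sketch estimates beta as the difference between the pseudotime slopes in the two covariate groups. The names and data are hypothetical.

```python
def expected_expression(mu, lam, beta, z, x):
    """Mean expression of one gene in a cell with pseudotime z and covariate x."""
    return mu + (lam + beta * x) * z


def ols_slope(z, y):
    """Least-squares slope of y on z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    return sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum((a - mz) ** 2 for a in z)


def interaction_effect(z, x, y):
    """Estimate beta as the pseudotime slope for x == 1 cells minus that for x == 0 cells.

    Assumes z is already known and x is a binary covariate.
    """
    z0 = [zi for zi, xi in zip(z, x) if xi == 0]
    y0 = [yi for yi, xi in zip(y, x) if xi == 0]
    z1 = [zi for zi, xi in zip(z, x) if xi == 1]
    y1 = [yi for yi, xi in zip(y, x) if xi == 1]
    return ols_slope(z1, y1) - ols_slope(z0, y0)


# Hypothetical noiseless data: the gene rises along pseudotime, but more slowly when x == 1.
z = [0.0, 0.25, 0.5, 0.75, 1.0] * 2
x = [0] * 5 + [1] * 5
y = [expected_expression(1.0, 2.0, -1.5, zi, xi) for zi, xi in zip(z, x)]
print(round(interaction_effect(z, x, y), 6))  # -1.5
```

A nonzero beta means that the covariate changes how the gene responds along the trajectory, not just its baseline level.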
10
Hong CF, Chen YC, Chen WC, Tu KC, Tsai MH, Chan YK, Yu SS. Construction of diagnosis system and gene regulatory networks based on microarray analysis. J Biomed Inform 2018; 81:61-73. [PMID: 29550394 DOI: 10.1016/j.jbi.2018.03.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/02/2017] [Revised: 01/30/2018] [Accepted: 03/12/2018] [Indexed: 01/02/2023]
Abstract
A microarray analysis generally yields expression data for thousands of genes, but most of them are irrelevant to the disease of interest, which complicates analysis of the genes associated with a specific disease. Identifying a few essential genes and their regulatory networks is therefore critical, because a disease can then be diagnosed from the expression profiles of only a few critical genes. In this study, a target gene screening (TGS) system was developed to select a small subset of genes that can discriminate the malignancy stages of cancers. TGS is a microarray-based information system that integrates F-statistics, pattern recognition matching, a two-layer K-means classifier, a Parameter Detection Genetic Algorithm (PDGA), a genetic-based gene selector (GBG selector), and association rules. In the first stage, the F-statistic, pattern recognition matching, and the two-layer K-means classifier were applied to select the 20 critical genes most relevant to ovarian cancer from 9600 genes, and the PDGA was used to determine the optimal parameter values for these critical genes. Of the 20 critical genes, 15 are associated with cancer progression. In the second stage, we used the GBG selector and association rules to identify seven target gene sets, each containing only four to six genes and each able to precisely identify the malignancy stage of ovarian cancer from its expression profiles. We further deduced the gene regulatory networks of the 20 critical genes by using the Pearson correlation coefficient to evaluate the correlation between the expression of gene pairs, both within the same stage and across different stages. From these pairwise correlations, three regulatory networks were deduced, and the correlations were further confirmed by Ingenuity Pathway Analysis.
The prognostic significance of the genes identified through the regulatory networks was examined using online tools, and most of them represented biomarker candidates. In summary, our proposed system provides a new strategy for identifying critical genes or biomarkers, together with their regulatory networks, from microarray data.
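The following minimal Python sketch covers two steps named in this abstract. It ranks genes by a one-way ANOVA F-statistic across stages, then links gene pairs whose Pearson correlation reaches a cutoff. It assumes expression is stored as one list per gene, with one stage label per sample. The function names, the toy data, and the |r| ≥ 0.8 cutoff are hypothetical. This is not the authors' TGS system, which also uses pattern recognition matching, K-means, genetic algorithms, and association rules.

```python
# Hypothetical sketch, not the published TGS implementation.
from itertools import combinations
from math import sqrt


def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups (each a list of floats)."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_genes(expr, stages, k):
    """Return the k genes whose expression best separates the stage labels."""
    labels = sorted(set(stages))
    scores = {}
    for gene, values in expr.items():
        groups = [[v for v, s in zip(values, stages) if s == lab] for lab in labels]
        scores[gene] = f_statistic(groups)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def correlation_edges(expr, genes, cutoff=0.8):
    """Link gene pairs whose |Pearson r| is at least `cutoff` (assumed value)."""
    edges = []
    for g1, g2 in combinations(genes, 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= cutoff:
            edges.append((g1, g2, r))
    return edges


if __name__ == "__main__":
    expr = {
        "G1": [1.0, 1.1, 0.9, 3.0, 3.2, 2.9],
        "G2": [2.0, 2.2, 1.8, 6.1, 6.3, 5.9],
        "G3": [5.0, 4.0, 6.0, 5.5, 4.5, 5.0],
    }
    stages = ["early"] * 3 + ["late"] * 3
    selected = top_genes(expr, stages, k=2)
    print(selected)  # ['G2', 'G1']; G3 does not separate the stages
    print(correlation_edges(expr, selected))
```

Computing all pairwise correlations is quadratic in the number of genes. That cost is only practical after the F-statistic step has already cut the list down to a few dozen candidates.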
Affiliation(s)
- Chun-Fu Hong
- Department of Long-Term Care, National Quemoy University, Kinmen County 892, Taiwan, ROC
- Ying-Chen Chen
- Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan, ROC
- Wei-Chun Chen
- Department of Management Information System, National Chung Hsing University, Taichung City 402, Taiwan, ROC
- Keng-Chang Tu
- Department of Computer Science and Engineering, National Chung Hsing University, Taichung City 402, Taiwan, ROC
- Meng-Hsiun Tsai
- Department of Management Information System, National Chung Hsing University, Taichung City 402, Taiwan, ROC
- Yung-Kuan Chan
- Department of Management Information System, National Chung Hsing University, Taichung City 402, Taiwan, ROC
- Shyr Shen Yu
- Department of Computer Science and Engineering, National Chung Hsing University, Taichung City 402, Taiwan, ROC
11
Data-analysis strategies for image-based cell profiling. Nat Methods 2017; 14:849-863. [PMID: 28858338 PMCID: PMC6871000 DOI: 10.1038/nmeth.4397] [Received: 05/19/2016] [Accepted: 07/28/2017]
Abstract
Image-based cell profiling is a high-throughput strategy for the quantification of phenotypic differences among a variety of cell populations. It paves the way to studying biological systems on a large scale by using chemical and genetic perturbations. The general workflow for this technology involves image acquisition with high-throughput microscopy systems and subsequent image processing and analysis. Here, we introduce the steps required to create high-quality image-based (i.e., morphological) profiles from a collection of microscopy images. We recommend techniques that have proven useful in each stage of the data analysis process, on the basis of the experience of 20 laboratories worldwide that are refining their image-based cell-profiling methodologies in pursuit of biological discovery. The recommended techniques cover alternatives that may suit various biological goals, experimental designs, and laboratories' preferences.
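Before profiles are aggregated, image-based profiling pipelines commonly scale each feature against negative-control wells on the same plate. The sketch below uses a robust z-score for this, built from the median and the scaled median absolute deviation (MAD) of the controls. The plate layout, the well and feature names, and the choice of this particular normalization are assumptions for illustration. They are not a prescription taken from the article.

```python
# Minimal sketch: per-plate robust z-scoring of morphological features
# against negative-control wells (hypothetical data layout).
from statistics import median

MAD_SCALE = 1.4826  # makes the MAD comparable to a standard deviation for normal data


def robust_z(values, controls):
    """Scale `values` by the median and scaled MAD of `controls`."""
    center = median(controls)
    mad = MAD_SCALE * median(abs(c - center) for c in controls)
    if mad == 0:
        raise ValueError("control MAD is zero; feature cannot be scaled")
    return [(v - center) / mad for v in values]


def normalize_plate(plate, control_wells):
    """plate maps well -> {feature: value}; returns robust z-scores per feature."""
    wells = list(plate)
    features = list(plate[wells[0]])
    normalized = {w: {} for w in wells}
    for feature in features:
        controls = [plate[w][feature] for w in control_wells]
        scores = robust_z([plate[w][feature] for w in wells], controls)
        for w, z in zip(wells, scores):
            normalized[w][feature] = z
    return normalized


if __name__ == "__main__":
    plate = {
        "A01": {"cell_area": 100.0, "mean_intensity": 0.50},
        "A02": {"cell_area": 110.0, "mean_intensity": 0.60},
        "A03": {"cell_area": 90.0, "mean_intensity": 0.40},
        "B01": {"cell_area": 150.0, "mean_intensity": 0.50},  # treated well
    }
    z = normalize_plate(plate, control_wells=["A01", "A02", "A03"])
    print(round(z["B01"]["cell_area"], 2))  # 3.37, i.e. 50 / (1.4826 * 10)
```

Medians and MADs are preferred here over means and standard deviations because a few segmentation failures or debris-filled wells would otherwise distort the scaling.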
12
Sun Y, Yao J, Yang L, Chen R, Nowak NJ, Goodison S. Computational approach for deriving cancer progression roadmaps from static sample data. Nucleic Acids Res 2017; 45:e69. [PMID: 28108658 PMCID: PMC5436003 DOI: 10.1093/nar/gkx003] [Received: 11/14/2016] [Accepted: 01/07/2017]
Abstract
As with any biological process, cancer development is inherently dynamic. While major efforts continue to catalog the genomic events associated with human cancer, it remains difficult to interpret and extrapolate the accumulating data to provide insights into the dynamic aspects of the disease. Here, we present a computational strategy that enables the construction of a cancer progression model using static tumor sample data. The developed approach overcame many technical limitations of existing methods. Application of the approach to breast cancer data revealed a linear, branching model with two distinct trajectories for malignant progression. The validity of the constructed model was demonstrated in 27 independent breast cancer data sets, and through visualization of the data in the context of disease progression we were able to identify a number of potentially key molecular events in the advance of breast cancer to malignancy.
Affiliation(s)
- Yijun Sun
- Department of Microbiology and Immunology; Department of Computer Science and Engineering; Department of Biostatistics, The State University of New York, Buffalo, NY 14203, USA; Department of Biochemistry, The State University of New York, Buffalo, NY 14203, USA
- Jin Yao
- Department of Microbiology and Immunology
- Le Yang
- Department of Computer Science and Engineering
- Runpu Chen
- Department of Computer Science and Engineering
- Norma J Nowak
- Department of Bioinformatics and Biostatistics, Roswell Park Cancer Institute, Buffalo, NY 14201, USA
- Steve Goodison
- Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL 32224, USA
13
Abstract
A combination of single-cell techniques and computational analysis enables the simultaneous discovery of cell states, lineage relationships and the genes that control developmental decisions.
Affiliation(s)
- Xiuwei Zhang
- Department of Electrical Engineering and Computer Science and the Center for Computational Biology, University of California, Berkeley, United States; Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Cambridge, United States
- Nir Yosef
- Department of Electrical Engineering and Computer Science and the Center for Computational Biology, University of California, Berkeley, United States; Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Cambridge, United States
14
Eshleman R, Singh R. Reconstructing the Temporal Progression of Biological Data Using Cluster Spanning Trees. IEEE Trans Nanobioscience 2017; 16:140-147. [PMID: 28207402 DOI: 10.1109/tnb.2017.2667402]
Abstract
Identifying the temporal progression of a set of biological samples is crucial for understanding the dynamics of the underlying molecular interactions. It is often also a basic step in data denoising and synchronization. Finally, identifying the progression order is crucial for problems such as cell lineage identification, disease progression, tumor classification, and epidemiology, and thus has an impact across disciplines spanning basic biology, drug discovery, and public health. Current methods for this problem have difficulty when complex relationships within the data must be taken into account, such as grouping, partial ordering, or bifurcating and multifurcating progressions. We propose the notion of cluster spanning trees (CST), which can model both linear progressions and these more complex progression relationships in temporally evolving data. Through experiments on synthetic data sets and on data sets from the cell cycle, cellular differentiation, phenotypic screening, and genetic variation, we show that the proposed CST approach outperforms existing methods in reconstructing the temporal progression of the data.
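The general idea of linking groups of samples with a spanning tree can be shown in a simplified Python sketch. It computes one centroid per cluster, with cluster labels taken as given. It then connects the centroids with a Euclidean minimum spanning tree built by Prim's algorithm, so a branch point shows up as a centroid with three or more tree neighbors. This is an assumed simplification for illustration, not the CST algorithm from the paper, and all names are hypothetical.

```python
# Simplified illustration: minimum spanning tree over cluster centroids.
from math import dist


def centroids(points, labels):
    """Mean coordinate of each labelled cluster."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }


def minimum_spanning_tree(nodes):
    """Prim's algorithm on the complete Euclidean graph over `nodes` (name -> point)."""
    names = list(nodes)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        a, b, w = min(
            ((a, b, dist(nodes[a], nodes[b]))
             for a in in_tree for b in names if b not in in_tree),
            key=lambda edge: edge[2],
        )
        edges.append((a, b, w))
        in_tree.add(b)
    return edges


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.1), (0.9, -0.1),
              (2.0, 1.0), (2.1, 0.9), (2.0, -1.0), (1.9, -1.1)]
    labels = ["root", "root", "mid", "mid", "branchA", "branchA", "branchB", "branchB"]
    tree = minimum_spanning_tree(centroids(points, labels))
    for a, b, w in tree:
        print(f"{a} -- {b}  ({w:.2f})")
    # "mid" links to "root", "branchA" and "branchB": a bifurcating progression
```

Working on centroids rather than on individual samples keeps the tree small and less sensitive to noise. The trade-off is that the result depends on how the clusters were formed.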
15
Campbell KR, Yau C. Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference. PLoS Comput Biol 2016; 12:e1005212. [PMID: 27870852 PMCID: PMC5117567 DOI: 10.1371/journal.pcbi.1005212] [Received: 04/21/2016] [Accepted: 10/13/2016]
Abstract
Single-cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, by using computational methods to label each cell with a ‘pseudotime’ where true time-series experiments are too difficult to perform. However, because gene expression varies greatly between individual cells, there is inherent uncertainty in the precise temporal ordering of the cells. Existing methods for pseudotime estimation have predominantly given point estimates, which precludes a rigorous analysis of the implications of this uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate it into downstream differential expression analysis. We demonstrate that relying on a point estimate of pseudotime can lead to inflated false discovery rates, and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference. Understanding the “cellular programming” that controls fundamental, dynamic biological processes is important for determining normal cellular function and the potential perturbations that might give rise to physiological disorders. Ideally, investigations would use time-series experiments to periodically measure the properties of each cell, revealing the sequence of gene (in)activations that constitutes the program being followed. In practice, such experiments can be difficult to perform because cellular activity may be asynchronous, with each cell occupying a different phase of the process of interest. Furthermore, unbiased measurement of all transcripts or proteins requires the cells to be captured and lysed, which precludes continued monitoring of each cell.
In the absence of true time-series experiments, pseudotime algorithms exploit the asynchronous nature of these cellular systems to mathematically assign a “pseudotime” to each cell based on its molecular profile, allowing the cells to be aligned and the sequence of gene activation events to be inferred retrospectively. Existing approaches predominantly use deterministic methods that ignore the statistical uncertainty associated with the problem. This paper demonstrates that this statistical uncertainty limits the temporal resolution that can be extracted from static snapshots of cell expression profiles and can also adversely affect downstream analysis.
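The point made here, that a single pseudotime ordering hides uncertainty, can be shown with a deliberately simple sketch. It assigns pseudotime by projecting each cell onto an assumed start-to-end axis. It then re-ranks the cells many times under simulated Gaussian measurement noise and reports the range of ranks each cell takes. The projection, the noise model, and all names are hypothetical stand-ins. The paper itself uses probabilistic models rather than this resampling scheme.

```python
# Illustrative sketch: rank uncertainty of a projection-based pseudotime.
import random


def project(cell, start, end):
    """Position of `cell` along the start->end axis (0 at start, 1 at end)."""
    axis = [e - s for s, e in zip(start, end)]
    norm2 = sum(a * a for a in axis)
    return sum((c - s) * a for c, s, a in zip(cell, start, axis)) / norm2


def pseudotime_ranks(cells, start, end):
    """Rank of each cell (0 = earliest) when ordered by projected pseudotime."""
    times = [project(c, start, end) for c in cells]
    order = sorted(range(len(cells)), key=lambda i: times[i])
    ranks = [0] * len(cells)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks


def rank_intervals(cells, start, end, noise_sd, n_draws=200, seed=0):
    """(min rank, max rank) per cell across re-rankings under Gaussian noise."""
    rng = random.Random(seed)
    lo = [len(cells)] * len(cells)
    hi = [-1] * len(cells)
    for _ in range(n_draws):
        noisy = [[x + rng.gauss(0.0, noise_sd) for x in c] for c in cells]
        for i, r in enumerate(pseudotime_ranks(noisy, start, end)):
            lo[i] = min(lo[i], r)
            hi[i] = max(hi[i], r)
    return list(zip(lo, hi))


if __name__ == "__main__":
    cells = [[0.0, 0.0], [0.5, 0.5], [0.52, 0.49], [1.0, 1.0]]
    start, end = [0.0, 0.0], [1.0, 1.0]
    print(pseudotime_ranks(cells, start, end))  # [0, 1, 2, 3]
    print(rank_intervals(cells, start, end, noise_sd=0.05))
    # cells 1 and 2 are nearly tied, so their ranks swap across draws
```

The point estimate alone places cell 1 strictly before cell 2. Under the noise, their relative order is essentially a coin flip, which is the kind of ambiguity a downstream differential expression test should account for.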
Affiliation(s)
- Kieran R. Campbell
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Christopher Yau
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Department of Statistics, University of Oxford, Oxford, United Kingdom
16
Abstract
A revolution in cellular measurement technology is under way: for the first time, we can monitor global gene regulation in thousands of individual cells in a single experiment. Such experiments will allow us to discover new cell types and states and trace their developmental origins. They overcome fundamental limitations inherent in measurements of bulk cell populations that have frustrated efforts to resolve cellular states. Single-cell genomics and proteomics not only enable precise characterization of cell state but also provide a stunningly high-resolution view of transitions between states. These measurements may finally make explicit the metaphor that C.H. Waddington posed nearly 60 years ago to explain cellular plasticity: cells are residents of a vast “landscape” of possible states, over which they travel during development and in disease. Single-cell technology not only helps locate cells on this landscape but also illuminates the molecular mechanisms that shape the landscape itself. However, single-cell genomics is a field in its infancy, and many experimental and computational advances are needed to realize its full potential.
Affiliation(s)
- Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, USA
17
Abstract
High-throughput single-cell technologies provide an unprecedented view into cellular heterogeneity, yet they pose new challenges in data analysis and interpretation. In this protocol, we describe the use of Spanning-tree Progression Analysis of Density-normalized Events (SPADE), a density-based algorithm for visualizing single-cell data and inferring cellular hierarchies among subpopulations of similar cells. SPADE was initially developed for flow and mass cytometry single-cell data. We describe SPADE's implementation and application using an open-source R package that runs on Mac OS X, Linux and Windows systems. A typical SPADE analysis takes ∼5 min on a laptop with a 2.27-GHz processor. We demonstrate the applicability of SPADE to single-cell RNA-seq data. We compare SPADE with recently developed single-cell visualization approaches based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm. We contrast the implementation and outputs of these methods for normal and malignant hematopoietic cells analyzed by mass cytometry and provide recommendations for appropriate use. Finally, we provide an integrative strategy that combines the strengths of t-SNE and SPADE to infer cellular hierarchy from high-dimensional single-cell data.
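The "density-normalized" part of SPADE's name can be sketched as density-dependent downsampling, roughly as follows. Cells whose local density falls at or below an outlier density are dropped. Cells at moderate density are kept. Cells in dense regions are kept with a probability inversely proportional to their local density, so rare populations are not swamped by abundant ones. The radius, the percentile cutoffs, and the function names are assumptions, and the published R package differs in detail.

```python
# Sketch of density-dependent downsampling in the spirit of SPADE (assumed parameters).
import random
from math import dist


def local_density(points, radius):
    """Number of points (including itself) within `radius` of each point."""
    return [sum(1 for q in points if dist(p, q) <= radius) for p in points]


def percentile(values, pct):
    """Simple index-based percentile of a list of numbers."""
    ordered = sorted(values)
    idx = round(pct / 100 * (len(ordered) - 1))
    return ordered[min(len(ordered) - 1, max(0, idx))]


def downsample(points, radius, outlier_pct=1, target_pct=5, seed=0):
    """Drop outliers, keep sparse cells, thin dense regions toward the target density."""
    rng = random.Random(seed)
    density = local_density(points, radius)
    outlier_density = percentile(density, outlier_pct)
    target_density = percentile(density, target_pct)
    kept = []
    for point, d in zip(points, density):
        if d <= outlier_density:
            keep_prob = 0.0
        elif d <= target_density:
            keep_prob = 1.0
        else:
            keep_prob = target_density / d
        if rng.random() < keep_prob:
            kept.append(point)
    return kept


if __name__ == "__main__":
    abundant = [(i * 0.001, 0.0) for i in range(200)]        # one dense population
    rare = [(5.0 + i * 0.01, 5.0) for i in range(10)]        # a small, tight population
    outliers = [(-20.0, 0.0), (20.0, 20.0), (-20.0, -20.0)]  # isolated events
    kept = downsample(abundant + rare + outliers, radius=0.5)
    print(sum(p in rare for p in kept), "rare kept")          # 10 rare kept
    print(sum(p in abundant for p in kept), "abundant kept")  # roughly 5% of 200
```

The pairwise density computation here is quadratic and only suits toy data. Real cytometry data sets would need a spatial index or approximate neighbor search.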
18
Xu Y, Qiu P, Roysam B. Unsupervised Discovery of Subspace Trends. IEEE Trans Pattern Anal Mach Intell 2015; 37:2131-2145. [PMID: 26353189 DOI: 10.1109/tpami.2015.2394475]
Abstract
This paper presents unsupervised algorithms for discovering previously unknown subspace trends in high-dimensional data sets without the benefit of prior information. A subspace trend is a sustained pattern of gradual, progressive changes within an unknown subset of feature dimensions. A fundamental challenge in subspace trend discovery is the presence of irrelevant data dimensions, noise, outliers, and confusion from multiple intermixed subspace trends driven by independent factors. These factors can obscure the trends in conventional data visualizations based on dimension reduction and projection. To overcome these limitations, we propose a novel graph-theoretic neighborhood similarity measure for detecting concordant progressive changes across data dimensions. Using this measure, we present an unsupervised algorithm for trend-relevant feature selection, subspace trend discovery, quantification of trend strength, and validation. Our method identified verifiable subspace trends in diverse synthetic and real-world biomedical datasets. Visualizations derived from the selected trend-relevant features revealed biologically meaningful hidden subspace trends that were obscured by irrelevant features and noise. Although our examples are drawn from the biological domain, the proposed algorithm is broadly applicable to exploratory analysis of high-dimensional data, including visualization, hypothesis generation, knowledge discovery, and prediction, in diverse other applications.
19
Oscope identifies oscillatory genes in unsynchronized single-cell RNA-seq experiments. Nat Methods 2015; 12:947-950. [PMID: 26301841 PMCID: PMC4589503 DOI: 10.1038/nmeth.3549] [Received: 12/18/2014] [Accepted: 06/05/2015]
Abstract
Oscillatory gene expression is fundamental to mammalian development, but technologies to monitor expression oscillations are limited. We have developed a statistical approach called Oscope to identify and characterize the transcriptional dynamics of oscillating genes in single-cell RNA-seq data from an unsynchronized cell population. Applications to a number of data sets demonstrate the utility of the approach and also identify a potential artifact in the Fluidigm C1 platform.
20
Yuan K, Sakoparnig T, Markowetz F, Beerenwinkel N. BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies. Genome Biol 2015; 16:36. [PMID: 25786108 PMCID: PMC4359483 DOI: 10.1186/s13059-015-0592-6] [Received: 11/12/2014] [Accepted: 01/21/2015]
Abstract
Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework for reconstructing intra-tumor evolutionary pathways. Using a fully Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. We validate our approach in the controlled setting of a simulation study and compare it against several competing methods. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm.
Affiliation(s)
- Ke Yuan
- University of Cambridge, Cancer Research UK Cambridge Institute, Cambridge, UK
- Thomas Sakoparnig
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Current address: Biozentrum, University of Basel, Basel, Switzerland
- Florian Markowetz
- University of Cambridge, Cancer Research UK Cambridge Institute, Cambridge, UK
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
21
Francesconi M, Lehner B. Reconstructing and analysing cellular states, space and time from gene expression profiles of many cells and single cells. Mol Biosyst 2015; 11:2690-8. [DOI: 10.1039/c5mb00339c]
Abstract
Gene expression profiling is a fast, cheap and standardised analysis that provides a high dimensional measurement of the state of a biological sample, including of single cells. Computational methods to reconstruct the composition of samples and spatial and temporal information from expression profiles are described, as well as how they can be used to describe the effects of genetic variation.
Affiliation(s)
- Mirko Francesconi
- EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF)
- Ben Lehner
- EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF)
22
Xie R, Huang H, Li W, Chen B, Jiang J, He Y, Lv J, Ma B, Zhou Y, Feng C, Chen L, He W. Identifying progression related disease risk modules based on the human subcellular signaling networks. Mol Biosyst 2014; 10:3298-309. [PMID: 25315201 DOI: 10.1039/c4mb00482e]
Abstract
Many studies have shown that the structure and dynamics of the human signaling network are disturbed in complex diseases such as coronary artery disease, and gene expression profiles can distinguish variations in disease because they accurately reflect the status of cells. Integrating subcellular localization with the human signaling network holds promise for providing insight into human diseases. In this study, we applied a novel algorithm to identify progression-related disease risk modules (PRDRMs) among patients in different disease states within eleven subcellular sub-networks of a human signaling network. Functional annotation and literature retrieval showed that the PRDRMs were strongly associated with disease pathogenesis. When PRDRM expression values were used as classification features, they performed well in distinguishing patients in different disease states, and our approach achieved better classification performance than PageRank. Identifying PRDRMs in response to dynamic changes in gene expression could improve our understanding of the pathological basis of complex diseases. Our strategy could provide new insights into the potential use of prognostic biomarkers and effective guidance for clinical therapy from the perspective of the human subcellular signaling network.
Affiliation(s)
- Ruiqiang Xie
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
23
Sun Y, Yao J, Nowak NJ, Goodison S. Cancer progression modeling using static sample data. Genome Biol 2014; 15:440. [PMID: 25155694 PMCID: PMC4196119 DOI: 10.1186/s13059-014-0440-0] [Received: 04/30/2014] [Accepted: 08/14/2014]
Abstract
As molecular profiling data continue to accumulate, the design of integrative computational analyses that can provide insights into the dynamic aspects of cancer progression becomes feasible. Here, we present a novel computational method for the construction of cancer progression models based on the analysis of static tumor samples. We demonstrate the reliability of the method with simulated data, and describe the application to breast cancer data. Our findings support a linear, branching model for breast cancer progression. An interactive model facilitates the identification of key molecular events in the advance of disease to malignancy.
24
Wang Z, San Lucas FA, Qiu P, Liu Y. Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection. BMC Bioinformatics 2014; 15:153. [PMID: 24885641 PMCID: PMC4035826 DOI: 10.1186/1471-2105-15-153] [Received: 12/16/2013] [Accepted: 05/14/2014]
Abstract
Background: Many variable selection techniques have been proposed for clustering gene expression data. Although these methods tend to filter out irrelevant genes and identify informative genes that contribute to a clustering solution, they rely on criteria that do not consider potential interactions among individual genes. Ensemble clustering has motivated strong interest in leveraging the structure of gene networks for gene selection, so that relationship information between genes can be used effectively while the selected genes are expected to preserve all possible clustering structures in the data. Results: We present a new filter method that uses gene connectivity in the gene co-expression network as the evaluation criterion for variable selection. Gene connectivity measures the importance of a gene in terms of its expression similarity with other genes in the co-expression network. Hard-threshold and soft-threshold transformations are used to construct the gene co-expression networks. Both simulation studies and real data analysis show that the network based on soft thresholding is more effective at selecting relevant variables and yields better clustering results than the hard-thresholding transformation and two other canonical filter methods for variable selection. Furthermore, we propose a new module analysis approach to reveal the higher-order organization of the gene space, in which the genes of a module share significant topological similarity and are associated with a consensus partition of the sample space. We demonstrate that the identified modules can lead to biologically meaningful sample partitions that might be missed by other methods. Conclusions: By leveraging the structure of the gene co-expression network, we first propose a variable selection method that selects the individual genes with the highest connectivity.
Both simulation studies and real data applications demonstrate that our method performs better in terms of the reliability of the selected genes and the sample clustering results. In addition, we propose a module recovery method that can help discover novel sample partitions that might be hidden when clustering analyses are performed using all available genes. The source code of our program is available at http://nba.uth.tmc.edu/homepage/liu/netVar/.
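Gene connectivity in a co-expression network can be sketched as shown below. The sketch assumes the common weighted-network convention: the soft-threshold adjacency between two genes is |r|^β, the hard-threshold adjacency is 1 when |r| ≥ τ and 0 otherwise, and a gene's connectivity is the sum of its adjacencies. The values β = 6 and τ = 0.9, the toy data, and the function names are assumptions, not parameters taken from the paper.

```python
# Sketch of connectivity-based gene filtering (assumed beta and tau values).
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity(expr, beta=6, tau=None):
    """Sum of adjacencies per gene: |r|**beta (soft) or 1 if |r| >= tau (hard)."""
    k = {gene: 0.0 for gene in expr}
    for g1, g2 in combinations(expr, 2):
        r = abs(pearson(expr[g1], expr[g2]))
        if tau is None:
            adjacency = r ** beta
        else:
            adjacency = 1.0 if r >= tau else 0.0
        k[g1] += adjacency
        k[g2] += adjacency
    return k


def select_genes(expr, n, **kwargs):
    """Keep the n genes with the highest connectivity."""
    k = connectivity(expr, **kwargs)
    return sorted(k, key=k.get, reverse=True)[:n]


if __name__ == "__main__":
    expr = {
        "A": [1.0, 2.0, 3.0, 4.0, 5.0],
        "B": [2.0, 4.0, 6.0, 8.0, 10.5],
        "C": [5.0, 4.0, 3.0, 2.0, 1.2],
        "N": [3.0, 1.0, 4.0, 1.0, 5.0],  # not co-expressed with the others
    }
    print(select_genes(expr, 3))        # A, B and C (soft threshold, beta = 6)
    print(connectivity(expr, tau=0.9))  # hard threshold: N has no edges
```

Soft thresholding keeps every edge but shrinks weak correlations toward zero (0.35^6 is about 0.002). Hard thresholding discards all information below τ, so results can change abruptly near the cutoff.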
Affiliation(s)
- Yin Liu
- Department of Neurobiology and Anatomy, University of Texas Health Science Center at Houston, Houston, Texas, USA
25
The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 2014. [PMID: 24658644 DOI: 10.1038/nbt.2859]
Abstract
Defining the transcriptional dynamics of a temporal process such as cell differentiation is challenging owing to the high variability in gene expression between individual cells. Time-series gene expression analyses of bulk cells have difficulty distinguishing early and late phases of a transcriptional cascade or identifying rare subpopulations of cells, and single-cell proteomic methods rely on a priori knowledge of key distinguishing markers. Here we describe Monocle, an unsupervised algorithm that increases the temporal resolution of transcriptome dynamics using single-cell RNA-Seq data collected at multiple time points. Applied to the differentiation of primary human myoblasts, Monocle revealed switch-like changes in the expression of key regulatory factors, sequential waves of gene regulation, and expression of regulators that were not known to act in differentiation. We validated some of these predicted regulators in a loss-of-function screen. Monocle can in principle be used to recover single-cell gene expression kinetics from a wide array of cellular processes, including differentiation, proliferation and oncogenic transformation.
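One way to order cells along a trajectory is to build a minimum spanning tree over the cells, take the tree's longest path as a backbone, and measure pseudotime as distance along that path. Monocle's original description builds such a tree in a dimensionality-reduced space. This simplified sketch skips dimensionality reduction and branch handling, ignores cells off the backbone, orients the path arbitrarily, and uses hypothetical function names.

```python
# Simplified sketch: MST over cells, longest path as the pseudotime backbone.
from math import dist


def mst_adjacency(points):
    """Prim's algorithm; returns an adjacency list {i: [(j, weight), ...]}."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    adj = {i: [] for i in range(n)}
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = dist(points[u], points[parent[u]])
            adj[u].append((parent[u], w))
            adj[parent[u]].append((u, w))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def farthest(adj, source):
    """Farthest node from `source` in a tree, with distances and predecessors."""
    dists, prev, stack = {source: 0.0}, {source: None}, [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dists:
                dists[v] = dists[u] + w
                prev[v] = u
                stack.append(v)
    return max(dists, key=dists.get), dists, prev


def backbone_pseudotime(points):
    """Pseudotime for cells on the MST's longest path (direction is arbitrary)."""
    adj = mst_adjacency(points)
    a, _, _ = farthest(adj, 0)
    b, dists, prev = farthest(adj, a)
    path, node = [], b
    while node is not None:
        path.append(node)
        node = prev[node]
    return {i: dists[i] for i in reversed(path)}


if __name__ == "__main__":
    cells = [(3.0, 0.1), (0.0, 0.0), (1.0, -0.1), (4.0, 0.0), (2.0, 0.0)]
    pseudotime = backbone_pseudotime(cells)
    print(sorted(pseudotime, key=pseudotime.get))  # [1, 2, 4, 0, 3]
```

The two-pass "farthest node" search finds a tree's longest path in linear time. The start of the trajectory still has to be chosen from biological knowledge, for example the cells from the earliest time point.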
26
The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 2014; 32:381-386. [PMID: 24658644 PMCID: PMC4122333 DOI: 10.1038/nbt.2859] [Received: 11/10/2013] [Accepted: 02/25/2014]
27
Martinez E, Trevino V. Modelling gene expression profiles related to prostate tumor progression using binary states. Theor Biol Med Model 2013; 10:37. [PMID: 23721350 PMCID: PMC3691825 DOI: 10.1186/1742-4682-10-37] [Received: 11/04/2012] [Accepted: 05/21/2013]
Abstract
BACKGROUND: Cancer is a complex disease commonly characterized by the disrupted activity of several cancer-related genes, such as oncogenes and tumor-suppressor genes. Previous studies suggest that the progression of a tumor to malignancy is dynamic and can be traced by changes in gene expression. Despite the enormous efforts devoted to differential expression detection and biomarker discovery, few methods have been designed to model gene expression levels as a function of tumor stage during malignant progression. Such models could help us understand the dynamics of tumor progression and simplify or reveal its complexity. METHODS: We modeled an on-off state of gene activation, first per sample and then per stage, to select gene expression profiles associated with tumor progression. The selection is guided by the statistical significance of profiles, estimated from randomly permuted datasets. RESULTS: We show that our method identifies the expected profiles corresponding to oncogenes and tumor-suppressor genes in a prostate tumor progression dataset. Comparisons with other methods support our findings and indicate that a considerable proportion of the significant profiles are found neither by other statistical tests commonly used to detect differential expression between tumor stages nor by other tailored methods. Ontology and pathway analyses agreed with these findings. CONCLUSIONS: The results suggest that our methodology may be a valuable tool for studying the progression of tumors to malignancy, which might reveal novel cancer therapies.
Affiliation(s)
- Emmanuel Martinez
- Tecnológico de Monterrey, Campus Monterrey, Cátedra de Bioinformática, Monterrey, Nuevo León 64849, México
- Victor Trevino
- Tecnológico de Monterrey, Campus Monterrey, Cátedra de Bioinformática, Monterrey, Nuevo León 64849, México
28
Sánchez-Alvarez R, Gayen S, Vadigepalli R, Anni H. Ethanol diverts early neuronal differentiation trajectory of embryonic stem cells by disrupting the balance of lineage specifiers. PLoS One 2013; 8:e63794. [PMID: 23724002 PMCID: PMC3665827 DOI: 10.1371/journal.pone.0063794] [Received: 02/18/2013] [Accepted: 04/04/2013]
Abstract
Background: Ethanol is a toxin responsible for the neurodevelopmental deficits of Fetal Alcohol Spectrum Disorders (FASD). Recent evidence suggests that ethanol modulates the protein expression of the lineage-specifier transcription factors Oct4 (Pou5f1) and Sox2 in early stages of mouse embryonic stem (ES) cell differentiation. We hypothesized that ethanol induces an imbalance in the expression of Oct4 and Sox2 in early differentiation that dysregulates the expression of associated and target genes and signaling molecules and diverts cells from neuroectodermal (NE) formation. Methodology/Principal Findings: Using high-throughput microfluidic dynamic array chips measuring 2,304 real-time quantitative PCR assays, we showed that ethanol modulated 33 genes during ES cell differentiation. Based on the overall gene expression dynamics, ethanol drove cells along a differentiation trajectory away from the NE fate. These ethanol-induced gene expression changes were observed within as little as 2 days of differentiation and were independent of cell proliferation or apoptosis. The gene expression changes were correlated with fewer βIII-tubulin-positive cells of an immature neural progenitor phenotype and with a disrupted actin cytoskeleton. Moreover, the housekeeping genes Tuba1a and Gapdh were modulated by ethanol during differentiation and were replaced by a set of ribosomal genes with stable expression. Conclusions/Significance: These findings provide an ethanol-response gene signature and point to the transcriptional dynamics underlying a lineage imbalance that may be relevant to the FASD phenotype.
Collapse
Affiliation(s)
- Rosa Sánchez-Alvarez
- Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
- Saurabh Gayen
- Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
- Rajanikanth Vadigepalli
- Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
- Daniel Baugh Institute for Functional Genomics and Computational Biology, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
- Helen Anni
- Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
29
Qiu P, Plevritis SK. TreeVis: a MATLAB-based tool for tree visualization. Comput Methods Programs Biomed 2013; 109:74-6. [PMID: 23036855 PMCID: PMC3508366 DOI: 10.1016/j.cmpb.2012.08.008] [Received: 11/17/2011] [Revised: 06/02/2012] [Accepted: 08/15/2012]
Abstract
Network-based analyses of high-dimensional biological data often produce results in the form of tree structures. Generating easily interpretable layouts to visualize these tree structures is a non-trivial task. We present a new visualization algorithm to generate two-dimensional layouts for complex tree structures. Implementations in both MATLAB and R are provided.
Affiliation(s)
- Peng Qiu
- Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, USA
30
Qiu P, Zhang L. Identification of markers associated with global changes in DNA methylation regulation in cancers. BMC Bioinformatics 2012; 13 Suppl 13:S7. [PMID: 23320390 PMCID: PMC3426805 DOI: 10.1186/1471-2105-13-s13-s7]
Abstract
DNA methylation exhibits different patterns in different cancers. DNA methylation rates at different genomic loci appear to be highly correlated in some samples but not in others; we call such phenomena conditional concordant relationships (CCRs). In this study, we explored DNA methylation patterns in 12 common cancers using data from 2,434 patient samples collected by The Cancer Genome Atlas project. We developed an exploratory method to characterize CCRs in the methylation data and identified the 200 gene markers whose on/off DNA methylation status is most significantly associated with drastic changes in CCRs throughout the genome. Clustering analysis of the methylation data for these 200 markers showed that they are tightly associated with cancer subtypes. We also generated a library of significant CCRs that may be of interest to future studies of the DNA methylation regulatory network in cancer.
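The notion of a conditional concordant relationship can be made concrete with a small sketch: compute the correlation between two loci separately in samples where a marker is methylated ("on") and where it is not ("off"), then compare the two. This is a hypothetical illustration of the concept only, not the authors' exploratory method; the 0.5 on/off threshold, the function names, and the toy data are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Undefined (raises ZeroDivisionError) when either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def conditional_concordance(locus_a, locus_b, marker, threshold=0.5):
    """Correlation of two loci within marker-'on' and marker-'off' samples.

    Hypothetical illustration: the marker counts as 'on' in a sample when its
    methylation level exceeds `threshold`. Returns (r_on, r_off, |r_on - r_off|);
    a large gap means the two loci are concordant only conditionally.
    """
    on = [i for i, m in enumerate(marker) if m > threshold]
    off = [i for i, m in enumerate(marker) if m <= threshold]
    r_on = pearson([locus_a[i] for i in on], [locus_b[i] for i in on])
    r_off = pearson([locus_a[i] for i in off], [locus_b[i] for i in off])
    return r_on, r_off, abs(r_on - r_off)


# Toy data: loci A and B move together when the marker is on, oppositely when off.
marker = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
locus_a = [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4]
locus_b = [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1]
print(tuple(round(v, 6) for v in conditional_concordance(locus_a, locus_b, marker)))
# (1.0, -1.0, 2.0)
```

Pooling all eight samples would hide this relationship, which is why the comparison is done within marker-defined subsets.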
Affiliation(s)
- Peng Qiu
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
31
Ng JWY, Barrett LM, Wong A, Kuh D, Smith GD, Relton CL. The role of longitudinal cohort studies in epigenetic epidemiology: challenges and opportunities. Genome Biol 2012; 13:246. [PMID: 22747597 PMCID: PMC3446311 DOI: 10.1186/gb-2012-13-6-246]
Abstract
Longitudinal cohort studies are ideal for investigating how epigenetic patterns change over time and relate to changing exposure patterns and the development of disease. We highlight the challenges and opportunities in this approach.
32
Inferring phenotypic properties from single-cell characteristics. PLoS One 2012; 7:e37038. [PMID: 22662133 PMCID: PMC3360688 DOI: 10.1371/journal.pone.0037038] [Received: 01/20/2012] [Accepted: 04/11/2012]
Abstract
Flow cytometry provides multidimensional data at the single-cell level. Such data capture the cellular heterogeneity of bulk samples, making it possible to correlate single-cell features with phenotypic properties of bulk tissues. Predicting phenotypes from single-cell measurements is a difficult problem that has not been studied extensively. The 6th Dialogue for Reverse Engineering Assessments and Methods (DREAM6) invited the research community to solve a computational challenge: distinguishing patients with acute myeloid leukemia (AML) from healthy donors using flow cytometry data. DREAM6 provided flow cytometry data for 359 normal and AML samples, together with the class labels for half of them, and researchers were asked to predict the class labels of the remaining half. This paper describes one solution constructed by combining three algorithms: spanning-tree progression analysis of density-normalized events (SPADE), earth mover’s distance, and a nearest-neighbor classifier called Relief. This solution achieved 100% prediction accuracy and was among the top-performing methods.
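Two ingredients of the DREAM6 solution, comparing samples by earth mover's distance and labeling a sample by its nearest neighbor, can be sketched in a few lines. This is a simplified illustration, not the published pipeline: it assumes each sample has already been summarized as a normalized histogram over shared, ordered bins (the SPADE step is omitted), and it uses a plain 1-nearest-neighbor rule in place of Relief. The function names and histograms are hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth mover's distance between two histograms on the same ordered bins.

    Assumes unit spacing between adjacent bins and that each histogram sums
    to 1; in one dimension the EMD equals the L1 distance between the CDFs.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    return sum(abs(cp - cq) for cp, cq in zip(accumulate(p), accumulate(q)))


def nearest_neighbor_label(query, training):
    """Label of the training histogram with the smallest EMD to `query`.

    `training` is a list of (histogram, label) pairs.
    """
    _, label = min(training, key=lambda item: emd_1d(query, item[0]))
    return label


# Hypothetical 4-bin marker-intensity histograms (fractions of cells per bin).
training = [
    ([0.7, 0.2, 0.1, 0.0], "normal"),
    ([0.1, 0.1, 0.3, 0.5], "AML"),
]
print(nearest_neighbor_label([0.2, 0.1, 0.3, 0.4], training))  # AML
```

Unlike a bin-by-bin difference, the EMD charges for how far probability mass has to move, so a population that shifts by one intensity bin counts as closer than one that shifts by three.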
33
Cornero A, Acquaviva M, Fardin P, Versteeg R, Schramm A, Eva A, Bosco MC, Blengio F, Barzaghi S, Varesio L. Design of a multi-signature ensemble classifier predicting neuroblastoma patients' outcome. BMC Bioinformatics 2012; 13 Suppl 4:S13. [PMID: 22536959 PMCID: PMC3314564 DOI: 10.1186/1471-2105-13-s4-s13]
Abstract
Background: Neuroblastoma is the most common pediatric solid tumor of the sympathetic nervous system. Better predictive tools for patient stratification are a crucial requirement for neuroblastoma therapy. Several studies have used gene expression-based signatures to stratify neuroblastoma patients and demonstrated a clear advantage of adding genomic analysis to risk assessment. Because these signatures overlap little, merging their prognostic potential would be advantageous. Here, we describe a new strategy to merge published neuroblastoma-related gene signatures into a single, highly accurate Multi-Signature Ensemble (MuSE) classifier of neuroblastoma (NB) patient outcome.
Methods: Gene expression profiles of 182 neuroblastoma tumors, subdivided into three independent datasets, were used in the various phases of development and validation of the NB-MuSE-classifier. Thirty-three signatures were each evaluated for outcome prediction with 22 classification algorithms, generating 726 classifiers and prediction results. The best-performing algorithm for each signature was selected and validated on an independent dataset, and the 20 signatures with an accuracy ≥ 80% were retained.
Results: We combined the 20 predictions associated with the corresponding signatures into a single outcome predictor, selecting the best-performing algorithm for the combination. The best performance was obtained with the Decision Table algorithm, which produced the NB-MuSE-classifier with an external validation accuracy of 94%. Kaplan-Meier curves and the log-rank test demonstrated that patients predicted by the NB-MuSE-classifier to have good and poor outcomes have significantly different survival (p < 0.0001). Survival curves for subgroups of patients divided on the basis of known prognostic markers suggested excellent stratification of localized and stage 4s tumors, but more data are needed to prove this point.
Conclusions: The NB-MuSE-classifier is based on an ensemble approach that merges twenty heterogeneous, neuroblastoma-related gene signatures to blend their discriminating power, rather than their numeric values, into a single, highly accurate predictor of patient outcome. The novelty of our approach lies in how the gene expression signatures are integrated: each is optimally associated with a single paradigm, and the results are ultimately integrated into a single classifier. This model can be extended to other types of cancer and to other diseases for which dedicated databases exist.
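The selection-then-combination idea in this abstract can be sketched as follows. This is a hedged illustration under stated assumptions: it keeps signatures whose validation accuracy is at least 80% and then combines their per-patient predictions by simple majority vote, which is a swapped-in stand-in for the Decision Table combiner the authors report. Signature names, accuracies, and labels are hypothetical.

```python
from collections import Counter


def select_signatures(val_accuracy, min_accuracy=0.80):
    """Names of signatures whose validation accuracy is at least `min_accuracy`.

    `val_accuracy` maps signature name -> accuracy of its best algorithm.
    """
    return [name for name, acc in val_accuracy.items() if acc >= min_accuracy]


def majority_vote(predictions, selected):
    """Combine per-signature predictions for one patient by majority vote.

    `predictions` maps signature name -> predicted outcome label. Majority
    voting is a simple stand-in here, not the Decision Table used in the paper.
    Ties go to the label Counter encountered first.
    """
    votes = Counter(predictions[name] for name in selected)
    return votes.most_common(1)[0][0]


# Hypothetical validation accuracies and one patient's per-signature predictions.
val_accuracy = {"sigA": 0.85, "sigB": 0.78, "sigC": 0.90, "sigD": 0.80}
selected = select_signatures(val_accuracy)
patient = {"sigA": "poor", "sigB": "good", "sigC": "poor", "sigD": "good"}
print(selected)                          # ['sigA', 'sigC', 'sigD']
print(majority_vote(patient, selected))  # poor
```

Note that sigB's "good" vote is ignored because that signature failed the accuracy filter; a learned combiner such as a decision table could additionally weight or condition on particular signatures, which a flat vote cannot.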
Affiliation(s)
- Andrea Cornero
- Laboratory of Molecular Biology, G. Gaslini Institute, Genoa 16147, Italy
34
Ng JWY, Barrett LM, Wong A, Kuh D, Smith G, Relton CL. The role of longitudinal cohort studies in epigenetic epidemiology: challenges and opportunities. Genome Biol 2012. [DOI: 10.1186/gb4029]
35
Chen EY, Xu H, Gordonov S, Lim MP, Perkins MH, Ma'ayan A. Expression2Kinases: mRNA profiling linked to multiple upstream regulatory layers. Bioinformatics 2011; 28:105-11. [PMID: 22080467 DOI: 10.1093/bioinformatics/btr625]
Abstract
MOTIVATION: Genome-wide mRNA profiling provides a snapshot of the global state of cells under different conditions. However, mRNA levels do not directly reveal upstream regulatory mechanisms. Here, we present a new approach called Expression2Kinases (X2K) to identify upstream regulators likely responsible for observed patterns in genome-wide gene expression. By integrating chromatin immunoprecipitation (ChIP)-seq/chip and position weight matrix (PWM) data, protein-protein interactions and kinase-substrate phosphorylation reactions, we can better identify regulatory mechanisms upstream of genome-wide differences in gene expression. We validated X2K by applying it to recover the targets of Food and Drug Administration (FDA)-approved drugs from drug perturbations followed by mRNA expression profiling; to map the regulatory landscape of 44 stem cells and their differentiating progeny; to profile upstream regulatory mechanisms of 327 breast cancer tumors; and to detect pathways from profiled hepatic stellate cells and hippocampal neurons. The X2K approach can advance our understanding of cell signaling and help unravel drugs' mechanisms of action.
AVAILABILITY: The software and source code are freely available at: http://www.maayanlab.net/X2K.
CONTACT: avi.maayan@mssm.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Edward Y Chen
- Department of Pharmacology and Systems Therapeutics, Systems Biology Center New York, New York, NY, USA
36
Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nat Biotechnol 2011; 29:886-91. [PMID: 21964415 PMCID: PMC3196363 DOI: 10.1038/nbt.1991] [Received: 01/10/2011] [Accepted: 08/31/2011]
Abstract
Multiparametric single-cell analysis is critical for understanding cellular heterogeneity. Despite recent technological advances in single-cell measurements, methods for analyzing high-dimensional single-cell data are often subjective, labor intensive and require prior knowledge of the biological system under investigation. To objectively uncover cellular heterogeneity from single-cell measurements, we present a novel computational approach, Spanning-tree Progression Analysis of Density-normalized Events (SPADE). We applied SPADE to cytometry data of mouse and human bone marrow. In both cases, SPADE organized cells in a hierarchy of related phenotypes that partially recapitulated well-described patterns of hematopoiesis. In addition, SPADE produced a map of intracellular signal activation across the landscape of human hematopoietic development. SPADE revealed a functionally distinct cell population, natural killer (NK) cells, without using any NK-specific parameters. SPADE is a versatile method that facilitates the analysis of cellular heterogeneity, the identification of cell types, and comparison of functional markers in response to perturbations.
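Two steps implied by SPADE's name, density normalization of events and a spanning tree over them, can be sketched as below. This is a simplified illustration under assumptions, not the SPADE implementation: points are low-dimensional tuples, local density is a neighbor count within a fixed radius, and the outlier/target density thresholds are user-chosen (all parameter names are hypothetical). SPADE's clustering step is omitted, so the tree here is built directly over the retained points rather than over cluster centers.

```python
import math
import random


def local_density(points, radius):
    """Number of other points within `radius` of each point (O(n^2) Euclidean scan)."""
    return [sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
            for i, p in enumerate(points)]


def density_downsample(points, radius, outlier_density, target_density, seed=0):
    """Density-dependent downsampling in the spirit of SPADE (simplified).

    Points sparser than `outlier_density` are dropped as outliers, points at or
    below `target_density` are always kept, and denser points are kept with
    probability target_density / density, so abundant and rare populations end
    up with comparable representation.
    """
    rng = random.Random(seed)
    kept = []
    for p, d in zip(points, local_density(points, radius)):
        if d < outlier_density:
            continue
        if d <= target_density or rng.random() < target_density / d:
            kept.append(p)
    return kept


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, weight) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = math.dist(points[u], points[v])
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges


print(minimum_spanning_tree([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]))
# [(0, 1, 1.0), (1, 2, 2.0)]
```

Downsampling before building the tree keeps a dense, abundant population from dominating the structure, which is what allows rare populations to appear as their own branches.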