1
Ding J, Sharon N, Bar-Joseph Z. Temporal modelling using single-cell transcriptomics. Nat Rev Genet 2022; 23:355-368. [PMID: 35102309 DOI: 10.1038/s41576-021-00444-7] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Accepted: 12/14/2021] [Indexed: 12/16/2022]
Abstract
Methods for profiling genes at the single-cell level have revolutionized our ability to study several biological processes and systems including development, differentiation, response programmes and disease progression. In many of these studies, cells are profiled over time in order to infer dynamic changes in cell states and types, sets of expressed genes, active pathways and key regulators. However, time-series single-cell RNA sequencing (scRNA-seq) also raises several new analysis and modelling issues. These issues range from determining when and how deep to profile cells, linking cells within and between time points, learning continuous trajectories, and integrating bulk and single-cell data for reconstructing models of dynamic networks. In this Review, we discuss several approaches for the analysis and modelling of time-series scRNA-seq, highlighting their steps, key assumptions, and the types of data and biological questions they are most appropriate for.
2
Jiang Q, Zhang S, Wan L. Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data. PLoS Comput Biol 2022; 18:e1009821. [PMID: 35073331 PMCID: PMC8812873 DOI: 10.1371/journal.pcbi.1009821] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/22/2021] [Revised: 02/03/2022] [Accepted: 01/10/2022] [Indexed: 12/27/2022]
Abstract
Time series single-cell RNA sequencing (scRNA-seq) data are emerging. However, dynamic inference of an evolving cell population from time series scRNA-seq data is challenging owing to the stochasticity and nonlinearity of the underlying biological processes. This calls for the development of mathematical models and methods capable of reconstructing cellular dynamic transition processes and uncovering the nonlinear cell-cell interactions. In this study, we present GraphFP, a model and dynamic inference framework based on the nonlinear Fokker-Planck equation on graphs, with the aim of reconstructing the cell state-transition complex potential energy landscape from time series single-cell transcriptomic data. The free energy of our model explicitly accounts for cell-cell interactions through a nonlinear quadratic term. We then recast the model inference problem in the form of a dynamic optimal transport framework and solve it efficiently with the adjoint method of optimal control. We evaluated GraphFP on the time series scRNA-seq data set of embryonic murine cerebral cortex development. We illustrated that it 1) reconstructs cell state potential energy, which is a measure of cellular differentiation potency, 2) faithfully charts the probability flows between paired cell states over the dynamic processes of cell differentiation, and 3) accurately quantifies the stochastic dynamics of cell type frequencies on the probability simplex in continuous time. We also illustrated that GraphFP is robust to cluster labelling at different resolutions, as well as to parameter choices. Meanwhile, GraphFP provides a model-based approach to delineate the cell-cell interactions that drive cell differentiation. GraphFP software is available at https://github.com/QiJiang-QJ/GraphFP.
Dynamic inference of cell development processes from time series scRNA-seq data is a major challenge. Here, we present GraphFP, a coherent computational framework that simultaneously reconstructs the cell state-transition complex potential energy landscape and infers cell-cell interactions from time series single-cell transcriptomic data. Based on the mathematical framework of the nonlinear Fokker-Planck equation on graphs, GraphFP models the stochastic dynamics of the cell state/type frequencies on the probability simplex in continuous time, where the free energy with a nonlinear quadratic interaction term is employed to characterize cell-cell interactions. We formulate the model inference problem in the form of a dynamic optimal transport framework and solve it efficiently with the celebrated adjoint method. GraphFP allows for 1) reconstructing cell state potential energy, which is a measure of cellular differentiation potency, 2) charting the probability flows between paired cell states over dynamic processes, 3) quantifying the stochastic dynamics of cell type frequencies on the probability simplex in continuous time, and 4) delineating cell-cell interactions that drive cell differentiation. We show how GraphFP can be used to faithfully reveal and accurately quantify the cell development processes using the embryonic murine cerebral cortex development time series scRNA-seq dataset.
Affiliation(s)
- Qi Jiang
- NCMIS, LSC, LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Shuo Zhang
- NCMIS, LSC, LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Lin Wan
- NCMIS, LSC, LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
3
Where the genome meets the connectome: Understanding how genes shape human brain connectivity. Neuroimage 2021; 244:118570. [PMID: 34508898 DOI: 10.1016/j.neuroimage.2021.118570] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 04/15/2021] [Revised: 08/10/2021] [Accepted: 09/07/2021] [Indexed: 02/07/2023]
Abstract
The integration of modern neuroimaging methods with genetically informative designs and data can shed light on the molecular mechanisms underlying the structural and functional organization of the human connectome. Here, we review studies that have investigated the genetic basis of human brain network structure and function through three complementary frameworks: (1) the quantification of phenotypic heritability through classical twin designs; (2) the identification of specific DNA variants linked to phenotypic variation through association and related studies; and (3) the analysis of correlations between spatial variations in imaging phenotypes and gene expression profiles through the integration of neuroimaging and transcriptional atlas data. We consider the basic foundations, strengths, limitations, and discoveries associated with each approach. We present converging evidence to indicate that anatomical connectivity is under stronger genetic influence than functional connectivity and that genetic influences are not uniformly distributed throughout the brain, with phenotypic variation in certain regions and connections being under stronger genetic control than others. We also consider how the combination of imaging and genetics can be used to understand the ways in which genes may drive brain dysfunction in different clinical disorders.
4
Zhao C, Xiu W, Hua Y, Zhang N, Zhang Y. CStreet: a computed Cell State trajectory inference method for time-series single-cell RNA sequencing data. Bioinformatics 2021; 37:3774-3780. [PMID: 34196686 DOI: 10.1093/bioinformatics/btab488] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/27/2020] [Revised: 06/24/2021] [Accepted: 06/30/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION The increasing amount of time-series single-cell RNA sequencing (scRNA-seq) data raises the key issue of connecting cell states (i.e., cell clusters or cell types) to obtain the continuous temporal dynamics of transcription, which can highlight the unified biological mechanisms involved in cell state transitions. However, most existing trajectory methods are specifically designed for individual cells, so they can hardly meet the needs of accurately inferring the trajectory topology of the cell state, which usually contains cells assigned to different branches. RESULTS Here, we present CStreet, a computed Cell State trajectory inference method for time-series scRNA-seq data. It uses time-series information to construct the k-nearest neighbors connections between cells within each time point and between adjacent time points. Then, CStreet estimates the connection probabilities of the cell states and visualizes the trajectory, which may include multiple starting points and paths, using a force-directed graph. By comparing the performance of CStreet with that of six commonly used cell state trajectory reconstruction methods on simulated data and real data, we demonstrate the high accuracy and high tolerance of CStreet. AVAILABILITY AND IMPLEMENTATION CStreet is written in Python and freely available on the web at https://github.com/TongjiZhanglab/CStreet and https://doi.org/10.5281/zenodo.4483205. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chengchen Zhao
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Science and Technology, Tongji University, Shanghai, 200092, China
- Wenchao Xiu
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Science and Technology, Tongji University, Shanghai, 200092, China
- Yuwei Hua
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Science and Technology, Tongji University, Shanghai, 200092, China
- Naiqian Zhang
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai, 264209, China
- Yong Zhang
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Science and Technology, Tongji University, Shanghai, 200092, China
5
Zheng X, Huang Y, Zou X. scPADGRN: A preconditioned ADMM approach for reconstructing dynamic gene regulatory network using single-cell RNA sequencing data. PLoS Comput Biol 2020; 16:e1007471. [PMID: 32716923 PMCID: PMC7410337 DOI: 10.1371/journal.pcbi.1007471] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/04/2019] [Revised: 08/06/2020] [Accepted: 05/28/2020] [Indexed: 12/23/2022]
Abstract
Disease development and cell differentiation both involve dynamic changes; therefore, the reconstruction of dynamic gene regulatory networks (DGRNs) is an important but difficult problem in systems biology. With recent technical advances in single-cell RNA sequencing (scRNA-seq), large volumes of scRNA-seq data are being obtained for various processes. However, most current methods of inferring DGRNs from bulk samples may not be suitable for scRNA-seq data. In this work, we present scPADGRN, a novel DGRN inference method using “time-series” scRNA-seq data. scPADGRN combines the preconditioned alternating direction method of multipliers with cell clustering for DGRN reconstruction. It exhibits advantages in accuracy, robustness and fast convergence. Moreover, a quantitative index called Differentiation Genes’ Interaction Enrichment (DGIE) is presented to quantify the interaction enrichment of genes related to differentiation. From the DGIE scores of relevant subnetworks, we infer that the functions of embryonic stem (ES) cells are most active initially and may gradually fade over time. The communication strength of known contributing genes that facilitate cell differentiation increases from ES cells to terminally differentiated cells. We also identify several genes responsible for the changes in the DGIE scores occurring during cell differentiation based on three real single-cell datasets. Our results demonstrate that single-cell analyses based on network inference coupled with quantitative computations can reveal key transcriptional regulators involved in cell differentiation and disease development.
Single-cell RNA sequencing (scRNA-seq) data are gaining popularity for providing access to cell-level measurements. Currently, time-series scRNA-seq data allow researchers to study dynamic changes during biological processes. This work proposes a novel method, scPADGRN, for application to time-series scRNA-seq data to construct dynamic gene regulatory networks, which are informative for investigating dynamic changes during disease development and cell differentiation. The proposed method shows satisfactory performance on both simulated data and three real datasets concerning cell differentiation. To quantify network dynamics, we present a quantitative index, DGIE, to measure the degree of activity of a certain set of genes in a regulatory network. Quantitative computations based on dynamic networks identify key regulators in cell differentiation and reveal the activity states of the identified regulators. Specifically, Bhlhe40, Msx2, Foxa2 and Dnmt3l might be important regulatory genes involved in differentiation from mouse ES cells to primitive endoderm (PrE) cells. For differentiation from mouse embryonic fibroblast cells to myocytes, Scx, Fos and Tcf12 are suggested to be key regulators. Sox5, Meis2, Hoxb3, Tcf7l1 and Plagl1 critically contribute during differentiation from human ES cells to definitive endoderm cells. These results may guide further theoretical and experimental efforts to understand cell differentiation processes and explore cell heterogeneity.
Affiliation(s)
- Xiao Zheng
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
- Yuan Huang
- Department of Biostatistics, Yale University, New Haven, Connecticut, United States of America
- Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
6
Hie B, Peters J, Nyquist SK, Shalek AK, Berger B, Bryson BD. Computational Methods for Single-Cell RNA Sequencing. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-012220-100601] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Indexed: 11/09/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has provided a high-dimensional catalog of millions of cells across species and diseases. These data have spurred the development of hundreds of computational tools to derive novel biological insights. Here, we outline the components of scRNA-seq analytical pipelines and the computational methods that underlie these steps. We describe available methods, highlight well-executed benchmarking studies, and identify opportunities for additional benchmarking studies and computational methods. As the biochemical approaches for single-cell omics advance, we propose coupled development of robust analytical pipelines suited for the challenges that new data present and principled selection of analytical methods that are suited for the biological questions to be addressed.
Affiliation(s)
- Brian Hie
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Joshua Peters
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Sarah K. Nyquist
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Alex K. Shalek
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Department of Chemistry, Institute for Medical Engineering & Science (IMES), and Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Bryan D. Bryson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
7
Lin C, Bar-Joseph Z. Continuous-state HMMs for modeling time-series single-cell RNA-Seq data. Bioinformatics 2020; 35:4707-4715. [PMID: 31038684 DOI: 10.1093/bioinformatics/btz296] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/18/2018] [Revised: 02/11/2019] [Accepted: 04/18/2019] [Indexed: 12/11/2022]
Abstract
MOTIVATION Methods for reconstructing developmental trajectories from time-series single-cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods, are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact its accuracy. RESULTS We developed a new method based on continuous-state HMMs (CSHMMs) for representing and modeling time-series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms which allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single-cell datasets, we show that the CSHMM method accurately infers branching topology and correctly and continuously assigns cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types. AVAILABILITY AND IMPLEMENTATION Software and Supporting website: www.andrew.cmu.edu/user/chiehl1/CSHMM/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chieh Lin
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, US
- Ziv Bar-Joseph
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, US
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, US
8
9
Hurley K, Ding J, Villacorta-Martin C, Herriges MJ, Jacob A, Vedaie M, Alysandratos KD, Sun YL, Lin C, Werder RB, Huang J, Wilson AA, Mithal A, Mostoslavsky G, Oglesby I, Caballero IS, Guttentag SH, Ahangari F, Kaminski N, Rodriguez-Fraticelli A, Camargo F, Bar-Joseph Z, Kotton DN. Reconstructed Single-Cell Fate Trajectories Define Lineage Plasticity Windows during Differentiation of Human PSC-Derived Distal Lung Progenitors. Cell Stem Cell 2020; 26:593-608.e8. [PMID: 32004478 PMCID: PMC7469703 DOI: 10.1016/j.stem.2019.12.009] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 05/26/2019] [Revised: 11/04/2019] [Accepted: 12/19/2019] [Indexed: 12/17/2022]
Abstract
Alveolar epithelial type 2 cells (AEC2s) are the facultative progenitors responsible for maintaining lung alveoli throughout life but are difficult to isolate from patients. Here, we engineer AEC2s from human pluripotent stem cells (PSCs) in vitro and use time-series single-cell RNA sequencing with lentiviral barcoding to profile the kinetics of their differentiation in comparison to primary fetal and adult AEC2 benchmarks. We observe bifurcating cell-fate trajectories as primordial lung progenitors differentiate in vitro, with some progeny reaching their AEC2 fate target, while others diverge to alternative non-lung endodermal fates. We develop a Continuous State Hidden Markov model to identify the timing and type of signals, such as overexuberant Wnt responses, that induce some early multipotent NKX2-1+ progenitors to lose lung fate. Finally, we find that this initial developmental plasticity is regulatable and subsides over time, ultimately resulting in PSC-derived AEC2s that exhibit a stable phenotype and nearly limitless self-renewal capacity.
Affiliation(s)
- Killian Hurley
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA; Department of Medicine, Royal College of Surgeons in Ireland, Education and Research Centre, Beaumont Hospital, Dublin, Ireland; Tissue Engineering Research Group, Royal College of Surgeons in Ireland, Dublin, Ireland
- Jun Ding
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Carlos Villacorta-Martin
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- Michael J Herriges
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Anjali Jacob
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Marall Vedaie
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Konstantinos D Alysandratos
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Yuliang L Sun
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Chieh Lin
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15217, USA
- Rhiannon B Werder
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Jessie Huang
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Andrew A Wilson
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Aditya Mithal
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- Gustavo Mostoslavsky
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- Irene Oglesby
- Department of Medicine, Royal College of Surgeons in Ireland, Education and Research Centre, Beaumont Hospital, Dublin, Ireland; Tissue Engineering Research Group, Royal College of Surgeons in Ireland, Dublin, Ireland
- Ignacio S Caballero
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- Susan H Guttentag
- Department of Pediatrics, Monroe Carell Jr. Children's Hospital, Vanderbilt University, Nashville, TN 37232, USA
- Farida Ahangari
- Pulmonary, Critical Care and Sleep Medicine, Yale University School of Medicine, New Haven, CT 16520, USA
- Naftali Kaminski
- Pulmonary, Critical Care and Sleep Medicine, Yale University School of Medicine, New Haven, CT 16520, USA
- Fernando Camargo
- Stem Cell Program, Boston Children's Hospital, Boston, MA 02115, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA; Harvard Stem Cell Institute, Boston, MA 02115, USA
- Ziv Bar-Joseph
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15217, USA
- Darrell N Kotton
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA; The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
10
Lin C, Ding J, Bar-Joseph Z. Inferring TF activation order in time series scRNA-Seq studies. PLoS Comput Biol 2020; 16:e1007644. [PMID: 32069291 PMCID: PMC7048296 DOI: 10.1371/journal.pcbi.1007644] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/24/2019] [Revised: 02/28/2020] [Accepted: 01/09/2020] [Indexed: 12/11/2022]
Abstract
Methods for the analysis of time series single cell expression data (scRNA-Seq) either do not utilize information about transcription factors (TFs) and their targets or only study these as a post-processing step. Using such information can both improve the accuracy of the reconstructed model and cell assignments and provide information on how and when the process is regulated. We developed the Continuous-State Hidden Markov Models TF (CSHMM-TF) method, which integrates probabilistic modeling of scRNA-Seq data with the ability to assign TFs to specific activation points in the model. TFs are assumed to influence the emission probabilities for cells assigned to later time points, allowing us to identify not just the TFs controlling each path but also their order of activation. We tested CSHMM-TF on several mouse and human datasets. As we show, the method was able to identify known and novel TFs for all processes, the assigned time of activation agrees with both expression information and prior knowledge, and combinatorial predictions are supported by known interactions. We also show that CSHMM-TF improves upon prior methods that do not utilize TF-gene interaction.
An important attribute of time series single cell RNA-Seq (scRNA-Seq) data is the ability to infer continuous trajectories of genes based on orderings of the cells. While several methods have been developed for ordering cells and inferring such trajectories, to date it was not possible to use these to infer the temporal activity of several key TFs. These TFs are only post-transcriptionally regulated and so their expression does not provide complete information on their activity. To address this we developed the Continuous-State Hidden Markov Models TF (CSHMM-TF) method, which assigns continuous activation times to TFs based on both their expression and the expression of their targets. Applying our method to several time series scRNA-Seq datasets we show that it correctly identifies the key regulators for the processes being studied. We analyze the temporal assignments for these TFs and show that they provide new insights about combinatorial regulation and the ordering of TF activation. We used several complementary sources to validate some of these predictions and discuss a number of other novel suggestions based on the method. As we show, the method is able to scale to large and noisy datasets and so is appropriate for several studies utilizing time series scRNA-Seq data.
Affiliation(s)
- Chieh Lin
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Jun Ding
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Ziv Bar-Joseph
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
11
Ding J, Lin C, Bar-Joseph Z. Cell lineage inference from SNP and scRNA-Seq data. Nucleic Acids Res 2019; 47:e56. [PMID: 30820578 PMCID: PMC6547431 DOI: 10.1093/nar/gkz146] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/30/2018] [Revised: 02/13/2019] [Accepted: 02/20/2019] [Indexed: 12/15/2022]
Abstract
Several recent studies focus on the inference of developmental and response trajectories from single cell RNA-Seq (scRNA-Seq) data. A number of computational methods, often referred to as pseudo-time ordering, have been developed for this task. Recently, CRISPR has also been used to reconstruct lineage trees by inserting random mutations. However, both approaches suffer from drawbacks that limit their use. Here, we develop a method to detect significant, cell-type-specific sequence mutations from scRNA-Seq data. We show that only a few mutations are enough for reconstructing good branching models. Integrating these mutations with expression data further improves the accuracy of the reconstructed models. As we show, the majority of mutations we identify are likely RNA editing events, indicating that such information can be used to distinguish cell types.
Affiliation(s)
- Jun Ding
- Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
- Chieh Lin
- Machine Learning Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
- Ziv Bar-Joseph
- Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
- Machine Learning Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
12
Li J, Wang GZ. Application of Computational Biology to Decode Brain Transcriptomes. Genomics Proteomics Bioinformatics 2019; 17:367-380. [PMID: 31655213 PMCID: PMC6943780 DOI: 10.1016/j.gpb.2019.03.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/11/2018] [Revised: 02/21/2019] [Accepted: 03/15/2019] [Indexed: 01/03/2023]
Abstract
The rapid development of high-throughput sequencing technologies has generated massive valuable brain transcriptome atlases, providing great opportunities for systematically investigating gene expression characteristics across various brain regions throughout a series of developmental stages. Recent studies have revealed that the transcriptional architecture is the key to interpreting the molecular mechanisms of brain complexity. However, our knowledge of brain transcriptional characteristics remains very limited. With the immense efforts to generate high-quality brain transcriptome atlases, new computational approaches to analyze these high-dimensional multivariate data are greatly needed. In this review, we summarize some public resources for brain transcriptome atlases and discuss the general computational pipelines that are commonly used in this field, which would aid in making new discoveries in brain development and disorders.
Affiliation(s)
- Jie Li
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Guang-Zhong Wang
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Beijing 100049, China
|
13
|
Lignelli E, Palumbo F, Myti D, Morty RE. Recent advances in our understanding of the mechanisms of lung alveolarization and bronchopulmonary dysplasia. Am J Physiol Lung Cell Mol Physiol 2019; 317:L832-L887. [PMID: 31596603 DOI: 10.1152/ajplung.00369.2019] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Indexed: 12/19/2022]
Abstract
Bronchopulmonary dysplasia (BPD) is the most common cause of morbidity and mortality in preterm infants. A key histopathological feature of BPD is stunted late lung development, where the process of alveolarization (the generation of alveolar gas exchange units) is impeded through mechanisms that remain largely unclear. As such, there is interest in clarifying both the pathomechanisms at play in affected lungs and the mechanisms of de novo alveoli generation in healthy, developing lungs. A better understanding of normal and pathological alveolarization might reveal opportunities for improved medical management of affected infants. Furthermore, disturbances to the alveolar architecture are a key histopathological feature of several adult chronic lung diseases, including emphysema and fibrosis, and it is envisaged that knowledge about the mechanisms of alveologenesis might facilitate regeneration of healthy lung parenchyma in affected patients. To this end, recent efforts have interrogated clinical data; developed new, and refined existing, in vivo and in vitro models of BPD; applied new microscopic and radiographic approaches; and developed advanced cell-culture approaches, including organoid generation. Advances have also been made in the development of other methodologies, including single-cell analysis, metabolomics, lipidomics, and proteomics, as well as the generation and use of complex mouse genetics tools. The objective of this review is to present advances made in our understanding of the mechanisms of lung alveolarization and BPD over the period 1 January 2017 to 30 June 2019, a period that spans the 50th anniversary of the original clinical description of BPD in preterm infants.
Affiliation(s)
- Ettore Lignelli
  - Department of Lung Development and Remodeling, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; Department of Internal Medicine (Pulmonology), University of Giessen and Marburg Lung Center, member of the German Center for Lung Research, Giessen, Germany
- Francesco Palumbo
  - Department of Lung Development and Remodeling, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; Department of Internal Medicine (Pulmonology), University of Giessen and Marburg Lung Center, member of the German Center for Lung Research, Giessen, Germany
- Despoina Myti
  - Department of Lung Development and Remodeling, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; Department of Internal Medicine (Pulmonology), University of Giessen and Marburg Lung Center, member of the German Center for Lung Research, Giessen, Germany
- Rory E Morty
  - Department of Lung Development and Remodeling, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; Department of Internal Medicine (Pulmonology), University of Giessen and Marburg Lung Center, member of the German Center for Lung Research, Giessen, Germany
|
14
|
An S, Ma L, Wan L. TSEE: an elastic embedding method to visualize the dynamic gene expression patterns of time series single-cell RNA sequencing data. BMC Genomics 2019; 20:224. [PMID: 30967106 PMCID: PMC6456934 DOI: 10.1186/s12864-019-5477-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Time series single-cell RNA sequencing (scRNA-seq) data are emerging. However, the analysis of time series scRNA-seq data could be compromised by 1) distortion created by assorted sources of data collection and generation across time samples and 2) inheritance of cell-to-cell variations by stochastic dynamic patterns of gene expression. This calls for the development of an algorithm able to visualize time series scRNA-seq data in order to reveal latent structures and uncover dynamic transition processes. RESULTS: In this study, we propose an algorithm, termed time series elastic embedding (TSEE), by incorporating experimental temporal information into the elastic embedding (EE) method, in order to visualize time series scRNA-seq data. TSEE extends the EE algorithm by penalizing the proximal placement of latent points that correspond to data points otherwise separated by experimental time intervals. TSEE is herein used to visualize time series scRNA-seq datasets of embryonic developmental processes in human and zebrafish. We demonstrate that TSEE outperforms existing methods (e.g. PCA, tSNE and EE) in preserving local and global structures as well as enhancing the temporal resolution of samples. Meanwhile, TSEE reveals the dynamic oscillation patterns of gene expression waves during zebrafish embryogenesis. CONCLUSIONS: TSEE can efficiently visualize time series scRNA-seq data by diluting the distortions of assorted sources of data variation across time stages and achieves temporal resolution enhancement by preserving temporal order and structure. TSEE uncovers the subtle dynamic structures of gene expression patterns, facilitating further downstream dynamic modeling and analysis of gene expression processes. The computational framework of TSEE is generalizable by allowing the incorporation of other sources of information.
Affiliation(s)
- Shaokun An
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
- Liang Ma
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101 China
- Lin Wan
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
|
15
|
Schiebinger G, Shu J, Tabaka M, Cleary B, Subramanian V, Solomon A, Gould J, Liu S, Lin S, Berube P, Lee L, Chen J, Brumbaugh J, Rigollet P, Hochedlinger K, Jaenisch R, Regev A, Lander ES. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming. Cell 2019; 176:928-943.e22. [PMID: 30712874 PMCID: PMC6402800 DOI: 10.1016/j.cell.2019.01.006] [Citation(s) in RCA: 263] [Impact Index Per Article: 52.6] [Received: 07/03/2018] [Revised: 10/15/2018] [Accepted: 01/02/2019] [Indexed: 12/18/2022]
Abstract
Understanding the molecular programs that guide differentiation during development is a major challenge. Here, we introduce Waddington-OT, an approach for studying developmental time courses to infer ancestor-descendant fates and model the regulatory programs that underlie them. We apply the method to reconstruct the landscape of reprogramming from 315,000 single-cell RNA sequencing (scRNA-seq) profiles, collected at half-day intervals across 18 days. The results reveal a wider range of developmental programs than previously characterized. Cells gradually adopt either a terminal stromal state or a mesenchymal-to-epithelial transition state. The latter gives rise to populations related to pluripotent, extra-embryonic, and neural cells, with each harboring multiple finer subpopulations. The analysis predicts transcription factors and paracrine signals that affect fates and experiments validate that the TF Obox6 and the cytokine GDF9 enhance reprogramming efficiency. Our approach sheds light on the process and outcome of reprogramming and provides a framework applicable to diverse temporal processes in biology.
Affiliation(s)
- Geoffrey Schiebinger
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; MIT Center for Statistics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jian Shu
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Marcin Tabaka
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Brian Cleary
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Computational and Systems Biology Program, MIT, Cambridge, MA 02142, USA
- Vidya Subramanian
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Aryeh Solomon
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Joshua Gould
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Siyan Liu
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Biochemistry Program, Wellesley College, Wellesley, MA 02481, USA
- Stacie Lin
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Peter Berube
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Lia Lee
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jenny Chen
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139, USA
- Justin Brumbaugh
  - Cancer Center, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Molecular Biology, Center for Regenerative Medicine and Cancer Center, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA; Harvard Stem Cell Institute, Cambridge, MA 02138, USA; Harvard Medical School, Boston, MA 02115, USA
- Philippe Rigollet
  - MIT Center for Statistics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Konrad Hochedlinger
  - Department of Molecular Biology, Center for Regenerative Medicine and Cancer Center, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA; Harvard Stem Cell Institute, Cambridge, MA 02138, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Rudolf Jaenisch
  - Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA; Computational and Systems Biology Program, MIT, Cambridge, MA 02142, USA
- Aviv Regev
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Eric S Lander
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02125, USA
|
16
|
van der Wijst MGP, de Vries DH, Brugge H, Westra HJ, Franke L. An integrative approach for building personalized gene regulatory networks for precision medicine. Genome Med 2018; 10:96. [PMID: 30567569 PMCID: PMC6299585 DOI: 10.1186/s13073-018-0608-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 02/01/2023]
Abstract
Only a small fraction of patients respond to the drug prescribed to treat their disease, which means that most are at risk of unnecessary exposure to side effects through ineffective drugs. This inter-individual variation in drug response is driven by differences in gene interactions caused by each patient's genetic background, environmental exposures, and the proportions of specific cell types involved in disease. These gene interactions can now be captured by building gene regulatory networks, by taking advantage of RNA velocity (the time derivative of the gene expression state), the ability to study hundreds of thousands of cells simultaneously, and the falling price of single-cell sequencing. Here, we propose an integrative approach that leverages these recent advances in single-cell data with the sensitivity of bulk data to enable the reconstruction of personalized, cell-type- and context-specific gene regulatory networks. We expect this approach will allow the prioritization of key driver genes for specific diseases and will provide knowledge that opens new avenues towards improved personalized healthcare.
Affiliation(s)
- Monique G P van der Wijst
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Dylan H de Vries
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Harm Brugge
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Harm-Jan Westra
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Lude Franke
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
|
17
|
Hon CC, Shin JW, Carninci P, Stubbington MJT. The Human Cell Atlas: Technical approaches and challenges. Brief Funct Genomics 2018; 17:283-294. [PMID: 29092000 PMCID: PMC6063304 DOI: 10.1093/bfgp/elx029] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/23/2022]
Abstract
The Human Cell Atlas is a large, international consortium that aims to identify and describe every cell type in the human body. The comprehensive cellular maps that arise from this ambitious effort have the potential to transform many aspects of fundamental biology and clinical practice. Here, we discuss the technical approaches that could be used today to generate such a resource and also the technical challenges that will be encountered.
Affiliation(s)
- Chung-Chau Hon
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Jay W Shin
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Piero Carninci
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
|
18
|
Abstract
Motivation: Single-cell Hi-C (scHi-C) data promises to enable scientists to interrogate the 3D architecture of DNA in the nucleus of the cell, studying how this structure varies stochastically or along developmental or cell-cycle axes. However, Hi-C data analysis requires methods that take into account the unique characteristics of this type of data. In this work, we explore whether methods that have been developed previously for the analysis of bulk Hi-C data can be applied to scHi-C data. We apply methods designed for analysis of bulk Hi-C data to scHi-C data in conjunction with unsupervised embedding. Results: We find that one of these methods, HiCRep, when used in conjunction with multidimensional scaling (MDS), strongly outperforms three other methods, including a technique that has been used previously for scHi-C analysis. We also provide evidence that the HiCRep/MDS method is robust to extremely low per-cell sequencing depth, that this robustness is improved even further when high-coverage and low-coverage cells are projected together, and that the method can be used to jointly embed cells from multiple published datasets.
Affiliation(s)
- Jie Liu
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Dejun Lin
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
|
19
|
Jin S, MacLean AL, Peng T, Nie Q. scEpath: energy landscape-based inference of transition probabilities and cellular trajectories from single-cell transcriptomic data. Bioinformatics 2018; 34:2077-2086. [PMID: 29415263 PMCID: PMC6658715 DOI: 10.1093/bioinformatics/bty058] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 10/26/2017] [Revised: 01/11/2018] [Accepted: 02/03/2018] [Indexed: 01/18/2023]
Abstract
Motivation: Single-cell RNA-sequencing (scRNA-seq) offers unprecedented resolution for studying cellular decision-making processes. Robust inference of cell state transition paths and probabilities is an important yet challenging step in the analysis of these data. Results: Here we present scEpath, an algorithm that calculates energy landscapes and probabilistic directed graphs in order to reconstruct developmental trajectories. We quantify the energy landscape using 'single-cell energy' and distance-based measures, and find that the combination of these enables robust inference of the transition probabilities and lineage relationships between cell states. We also identify marker genes and gene expression patterns associated with cell state transitions. Our approach produces pseudotemporal orderings that are, in combination, more robust and accurate than current methods, and offers higher resolution dynamics of the cell state transitions, leading to new insight into key transition events during differentiation and development. Moreover, scEpath is robust to variation in the size of the input gene set, and is broadly unsupervised, requiring few parameters to be set by the user. Applications of scEpath led to the identification of a cell-cell communication network implicated in early human embryo development, and novel transcription factors important for myoblast differentiation. scEpath allows us to identify common and specific temporal dynamics and transcriptional factor programs along branched lineages, as well as the transition probabilities that control cell fates. Availability and implementation: A MATLAB package of scEpath is available at https://github.com/sqjin/scEpath. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Suoqin Jin
  - Department of Mathematics and Center for Complex Biological Systems
- Adam L MacLean
  - Department of Mathematics and Center for Complex Biological Systems
- Tao Peng
  - Department of Mathematics and Center for Complex Biological Systems
- Qing Nie
  - Department of Mathematics and Center for Complex Biological Systems
  - Department of Development and Cell Biology, University of California, Irvine, CA, USA
|
20
|
Ding J, Aronow BJ, Kaminski N, Kitzmiller J, Whitsett JA, Bar-Joseph Z. Reconstructing differentiation networks and their regulation from time series single-cell expression data. Genome Res 2018; 28:383-395. [PMID: 29317474 PMCID: PMC5848617 DOI: 10.1101/gr.225979.117] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 06/04/2017] [Accepted: 12/21/2017] [Indexed: 12/26/2022]
Abstract
Generating detailed and accurate organogenesis models using single-cell RNA-seq data remains a major challenge. Current methods have relied primarily on the assumption that descendant cells are similar to their parents in terms of gene expression levels. This assumption does not always hold for in vivo studies, which often include infrequently sampled, unsynchronized, and diverse cell populations. Thus, additional information may be needed to determine the correct ordering and branching of progenitor cells and the set of transcription factors (TFs) that are active during advancing stages of organogenesis. To enable such modeling, we have developed a method that learns a probabilistic model that integrates expression similarity with regulatory information to reconstruct the dynamic developmental cell trajectories. When applied to mouse lung developmental data, the method accurately distinguished different cell types and lineages. Existing and new experimental data validated the ability of the method to identify key regulators of cell fate.
Affiliation(s)
- Jun Ding
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Bruce J Aronow
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio 45229, USA
- Naftali Kaminski
  - Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, Connecticut 06520, USA
- Joseph Kitzmiller
  - Section of Neonatology, Perinatal and Pulmonary Biology, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio 45229, USA
- Jeffrey A Whitsett
  - Section of Neonatology, Perinatal and Pulmonary Biology, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio 45229, USA
- Ziv Bar-Joseph
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
|
21
|
Li J, Wang Z, Chu Q, Jiang K, Li J, Tang N. The Strength of Mechanical Forces Determines the Differentiation of Alveolar Epithelial Cells. Dev Cell 2018; 44:297-312.e5. [PMID: 29408236 DOI: 10.1016/j.devcel.2018.01.008] [Citation(s) in RCA: 137] [Impact Index Per Article: 22.8] [Received: 09/14/2017] [Revised: 12/19/2017] [Accepted: 01/08/2018] [Indexed: 11/17/2022]
Abstract
The differentiation of alveolar epithelial type I (AT1) and type II (AT2) cells is essential for the lung gas exchange function. Disruption of this process results in neonatal death or in severe lung diseases that last into adulthood. We developed live imaging techniques to characterize the mechanisms that control alveolar epithelial cell differentiation. We discovered that mechanical forces generated from the inhalation of amniotic fluid by fetal breathing movements are essential for AT1 cell differentiation. We found that a large subset of alveolar progenitor cells is able to protrude from the airway epithelium toward the mesenchyme in an FGF10/FGFR2 signaling-dependent manner. The cell protrusion process results in enrichment of myosin in the apical region of protruded cells; this myosin prevents these cells from being flattened by mechanical forces, thereby ensuring their AT2 cell fate. Our study demonstrates that mechanical forces and local growth factors synergistically control alveolar epithelial cell differentiation.
Affiliation(s)
- Jiao Li
  - China Agricultural University, Beijing 100083, China; National Institute of Biological Sciences, Beijing 102206, China
- Zheng Wang
  - National Institute of Biological Sciences, Beijing 102206, China; Graduate School of Peking Union Medical College, Beijing 100730, China
- Qiqi Chu
  - National Institute of Biological Sciences, Beijing 102206, China; College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Kewu Jiang
  - National Institute of Biological Sciences, Beijing 102206, China; College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Juan Li
  - National Institute of Biological Sciences, Beijing 102206, China
- Nan Tang
  - National Institute of Biological Sciences, Beijing 102206, China
|
22
|
Chen J, Rénia L, Ginhoux F. Constructing cell lineages from single-cell transcriptomes. Mol Aspects Med 2017; 59:95-113. [PMID: 29107741 DOI: 10.1016/j.mam.2017.10.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/01/2017] [Revised: 10/23/2017] [Accepted: 10/25/2017] [Indexed: 12/25/2022]
Abstract
Advances in single-cell RNA-sequencing have helped reveal the previously underappreciated level of cellular heterogeneity present during cellular differentiation. A static snapshot of single-cell transcriptomes provides a good representation of the various stages of differentiation as differentiation is rarely synchronized between cells. Data from numerous single-cell analyses has suggested that cellular differentiation and development can be conceptualized as continuous processes. Consequently, computational algorithms have been developed to infer lineage relationships between cell types and construct developmental trajectories along which cells are re-ordered such that similarity between successive cell pairs is maximized. Here, we compare and contrast the existing computational methods, and illustrate how they may be applied to build mouse myeloid progenitor lineages from massively parallel RNA single-cell sequencing data.
Affiliation(s)
- Jinmiao Chen
  - Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos Building, Level 4, Singapore 138648, Singapore
- Laurent Rénia
  - Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos Building, Level 4, Singapore 138648, Singapore
- Florent Ginhoux
  - Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos Building, Level 4, Singapore 138648, Singapore
|