201
Cahan P, Cacchiarelli D, Dunn SJ, Hemberg M, de Sousa Lopes SMC, Morris SA, Rackham OJL, Del Sol A, Wells CA. Computational Stem Cell Biology: Open Questions and Guiding Principles. Cell Stem Cell 2021; 28:20-32. [PMID: 33417869 PMCID: PMC7799393 DOI: 10.1016/j.stem.2020.12.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Abstract
Computational biology is enabling an explosive growth in our understanding of stem cells and our ability to use them for disease modeling, regenerative medicine, and drug discovery. We discuss four topics that exemplify applications of computation to stem cell biology: cell typing, lineage tracing, trajectory inference, and regulatory networks. We use these examples to articulate principles that have guided computational biology broadly and call for renewed attention to these principles as computation becomes increasingly important in stem cell biology. We also discuss important challenges for this field with the hope that it will inspire more to join this exciting area.
Affiliation(s)
- Patrick Cahan
- Institute for Cell Engineering, Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Davide Cacchiarelli
- Telethon Institute of Genetics and Medicine (TIGEM), Armenise/Harvard Laboratory of Integrative Genomics, Pozzuoli, Italy; Department of Translational Medicine, University of Naples "Federico II," Naples, Italy
- Sara-Jane Dunn
- DeepMind, 14-18 Handyside Street, London N1C 4DN, UK; Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Puddicombe Way, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
- Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Samantha A Morris
- Department of Developmental Biology, Department of Genetics, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Owen J L Rackham
- Centre for Computational Biology and The Program for Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, Singapore
- Antonio Del Sol
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, Belvaux 4366, Luxembourg; CIC bioGUNE, Bizkaia Technology Park, 801 Building, 48160 Derio, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao 48013, Spain
- Christine A Wells
- Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC 3010, Australia
202
Guillemin A, Stumpf MPH. Noise and the molecular processes underlying cell fate decision-making. Phys Biol 2021; 18:011002. [PMID: 33181489 DOI: 10.1088/1478-3975/abc9d1] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 02/08/2023]
Abstract
Cell fate decision-making events involve the interplay of many molecular processes, ranging from signal transduction to genetic regulation, as well as a set of molecular and physiological feedback loops. Each aspect offers a rich field of investigation in its own right, but to understand the whole process, even in simple terms, we need to consider them together. Here we attempt to characterise this process by focussing on the roles of noise during cell fate decisions. We use a range of recent results to develop a view of the sequence of events by which a cell progresses from a pluripotent or multipotent to a differentiated state: chromatin organisation, transcription factor stoichiometry, and cellular signalling all change during this progression, and all shape cellular variability, which becomes maximal at the transition state.
Affiliation(s)
- Anissa Guillemin
- School of BioSciences, University of Melbourne, Parkville, Australia
203
Discovering Higher-Order Interactions Through Neural Information Decomposition. Entropy 2021; 23:e23010079. [PMID: 33430463 PMCID: PMC7827712 DOI: 10.3390/e23010079] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/03/2020] [Revised: 12/21/2020] [Accepted: 12/25/2020] [Indexed: 11/25/2022]
Abstract
If regularity in data takes the form of higher-order functions among groups of variables, models which are biased towards lower-order functions may easily mistake the data for noise. To distinguish whether this is the case, one must be able to quantify the contribution of different orders of dependence to the total information. Recent work in information theory attempts to do this through measures of multivariate mutual information (MMI) and information decomposition (ID). Despite substantial theoretical progress, practical issues related to tractability and learnability of higher-order functions are still largely unaddressed. In this work, we introduce a new approach to information decomposition—termed Neural Information Decomposition (NID)—which is both theoretically grounded, and can be efficiently estimated in practice using neural networks. We show on synthetic data that NID can learn to distinguish higher-order functions from noise, while many unsupervised probability models cannot. Additionally, we demonstrate the usefulness of this framework as a tool for exploring biological and artificial neural networks.
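A standard textbook illustration of the problem this abstract describes is the XOR function: each input alone carries no information about the output, yet the two inputs together determine it completely. The sketch below computes the relevant mutual informations exactly. It is not the NID estimator from the paper; the helper names are hypothetical and chosen only for this illustration.

```python
from itertools import product
from math import log2

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Z = X XOR Y with X, Y independent fair bits: four equally likely triples.
triples = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def joint_of(pick_a, pick_b):
    """Joint distribution of two variables derived from each (x, y, z) triple."""
    joint = {}
    for t in triples:
        key = (pick_a(t), pick_b(t))
        joint[key] = joint.get(key, 0.0) + 1.0 / len(triples)
    return joint

mi_x_z = mutual_information(joint_of(lambda t: t[0], lambda t: t[2]))
mi_y_z = mutual_information(joint_of(lambda t: t[1], lambda t: t[2]))
mi_xy_z = mutual_information(joint_of(lambda t: (t[0], t[1]), lambda t: t[2]))
print(mi_x_z, mi_y_z, mi_xy_z)
```

Both pairwise terms are zero bits while the joint term is one bit, so a model restricted to pairwise dependence would see this relationship as noise.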
204
Kumar N, Mishra B, Athar M, Mukhtar S. Inference of Gene Regulatory Network from Single-Cell Transcriptomic Data Using pySCENIC. Methods Mol Biol 2021; 2328:171-182. [PMID: 34251625 DOI: 10.1007/978-1-0716-1534-8_10] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 05/06/2023]
Abstract
With the advent of next-generation sequencing (NGS) technologies in genomics, transcriptomics, and epigenomics, single-cell profiling became possible. Single-cell RNA sequencing (scRNA-seq) is widely used to characterize diverse cell populations and ascertain cell type-specific regulatory mechanisms. A gene regulatory network (GRN) mainly consists of genes and their regulators, transcription factors (TFs). Here, we describe pySCENIC, the lightning-fast Python implementation of the SCENIC (Single-Cell rEgulatory Network Inference and Clustering) pipeline. Using single-cell RNA-seq data, it maps TFs onto gene regulatory networks and integrates various cell types to infer cell-specific GRNs. Two fast and efficient GRN inference algorithms, GRNBoost2 and GENIE3, are optionally available with pySCENIC. The pipeline has three steps: (1) identification of potential TF targets based on co-expression; (2) TF-motif enrichment analysis to identify the direct targets (regulons); and (3) scoring the activity of regulons (or other gene sets) on single cell types.
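To make step (3) concrete, the sketch below scores a regulon in a single cell as the fraction of its genes that appear among that cell's most highly expressed genes. This is a deliberately simplified stand-in, not the AUCell statistic that pySCENIC uses and not the pySCENIC API; the function name, gene names, expression values, and `top_fraction` parameter are all hypothetical.

```python
def regulon_activity(expression, regulon, top_fraction=0.5):
    """Fraction of regulon genes found among a cell's top-ranked genes.

    Simplified illustration only; this is not the AUCell score.
    `expression` maps gene name -> expression value for one cell.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    top_k = max(1, int(len(ranked) * top_fraction))
    top_genes = set(ranked[:top_k])
    return len(top_genes & set(regulon)) / len(regulon)

# Hypothetical cells and regulon, for illustration only.
cell_1 = {"geneA": 9.0, "geneB": 8.0, "geneC": 7.0, "geneD": 6.0,
          "geneE": 1.0, "geneF": 0.5, "geneG": 0.2, "geneH": 0.0}
cell_2 = {"geneA": 0.1, "geneB": 0.0, "geneC": 0.3, "geneD": 5.0,
          "geneE": 6.0, "geneF": 7.0, "geneG": 8.0, "geneH": 9.0}
regulon = {"geneA", "geneB", "geneC"}

score_1 = regulon_activity(cell_1, regulon)
score_2 = regulon_activity(cell_2, regulon)
print(score_1, score_2)
```

Because the score depends only on the ranking of genes within a cell, it is unaffected by per-cell scaling of the expression values, which is the main reason rank-based scoring is attractive for single-cell data.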
Affiliation(s)
- Nilesh Kumar
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Bharat Mishra
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Mohammad Athar
- Department of Dermatology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Shahid Mukhtar
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
205
206
Augugliaro L, Abbruzzo A, Vinciotti V. ℓ1-Penalized censored Gaussian graphical model. Biostatistics 2020; 21:e1-e16. [PMID: 30203001 DOI: 10.1093/biostatistics/kxy043] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/09/2018] [Revised: 07/02/2018] [Accepted: 07/15/2018] [Indexed: 12/30/2022] Open
Abstract
Graphical lasso is one of the most widely used estimators for inferring genetic networks. Despite its wide use, there are several fields in applied research where the limits of detection of modern measurement technologies make the use of this estimator theoretically unfounded, even when the assumption of a multivariate Gaussian distribution is satisfied. Typical examples are data generated by polymerase chain reactions and flow cytometry. The combination of censoring and high dimensionality makes inference of the underlying genetic networks from these data very challenging. In this article, we propose an $\ell_1$-penalized Gaussian graphical model for censored data and derive two EM-like algorithms for inference. We evaluate the computational efficiency of the proposed algorithms by an extensive simulation study and show that, when censored data are available, our proposal is superior to existing competitors both in terms of network recovery and parameter estimation. We apply the proposed method to gene expression data generated by microfluidic Reverse Transcription quantitative Polymerase Chain Reaction technology in order to make inference on the regulatory mechanisms of blood development. A software implementation of our method is available on GitHub (https://github.com/LuigiAugugliaro/cglasso).
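The following sketch illustrates why a limit of detection is a problem for estimators built on the sample covariance, even when the underlying data are Gaussian. It simulates two correlated Gaussian variables, replaces every value above an assumed detection limit with the limit itself, and compares the naive Pearson correlation before and after. It does not implement the authors' EM-like algorithms or the cglasso package; the correlation, sample size, and limit are arbitrary assumptions.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(0)
rho, n, limit = 0.8, 20000, 0.0  # assumed true correlation, sample size, detection limit

xs, ys = [], []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    y = rho * x + sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(y)

# Right-censoring: values above the detection limit are recorded as the limit.
xs_cens = [min(x, limit) for x in xs]
ys_cens = [min(y, limit) for y in ys]

r_full = pearson(xs, ys)
r_cens = pearson(xs_cens, ys_cens)
print(f"uncensored: {r_full:.3f}  censored: {r_cens:.3f}")
```

The censored estimate is attenuated relative to the uncensored one, so any network estimator fed the naive covariance of censored data starts from a distorted input.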
Affiliation(s)
- Luigi Augugliaro
- Department of Economics, Business and Statistics, University of Palermo, Building 13, Viale delle Scienze, Palermo, Italy
- Antonino Abbruzzo
- Department of Economics, Business and Statistics, University of Palermo, Building 13, Viale delle Scienze, Palermo, Italy
- Veronica Vinciotti
- Department of Mathematics, Brunel University London, Kingston Lane, Uxbridge UB8 3PH, UK
207
Yuan B, Shen C, Luna A, Korkut A, Marks DS, Ingraham J, Sander C. CellBox: Interpretable Machine Learning for Perturbation Biology with Application to the Design of Cancer Combination Therapy. Cell Syst 2020; 12:128-140.e4. [PMID: 33373583 DOI: 10.1016/j.cels.2020.11.013] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 03/17/2020] [Revised: 07/13/2020] [Accepted: 11/25/2020] [Indexed: 01/13/2023]
Abstract
Systematic perturbation of cells followed by comprehensive measurements of molecular and phenotypic responses provides informative data resources for constructing computational models of cell biology. Models that generalize well beyond training data can be used to identify combinatorial perturbations of potential therapeutic interest. Major challenges for machine learning on large biological datasets are to find global optima in a complex multidimensional space and mechanistically interpret the solutions. To address these challenges, we introduce a hybrid approach that combines explicit mathematical models of cell dynamics with a machine-learning framework, implemented in TensorFlow. We tested the modeling framework on a perturbation-response dataset of a melanoma cell line after drug treatments. The models can be efficiently trained to describe cellular behavior accurately. Even though completely data driven and independent of prior knowledge, the resulting de novo network models recapitulate some known interactions. The approach is readily applicable to various kinetic models of cell biology. A record of this paper's Transparent Peer Review process is included in the Supplemental Information.
Affiliation(s)
- Bo Yuan
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA; cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute, Cambridge, MA, USA
- Ciyue Shen
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA; cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute, Cambridge, MA, USA
- Augustin Luna
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA; cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute, Cambridge, MA, USA
- Anil Korkut
- Department of Bioinformatics & Computational Biology, the University of Texas M D Anderson Cancer Center, Houston, TX, USA
- Debora S Marks
- Broad Institute, Cambridge, MA, USA; Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- John Ingraham
- MIT Computer Science & Artificial Intelligence Laboratory, Boston, MA, USA
- Chris Sander
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA; cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute, Cambridge, MA, USA
208
Fidanza A, Stumpf PS, Ramachandran P, Tamagno S, Babtie A, Lopez-Yrigoyen M, Taylor AH, Easterbrook J, Henderson BEP, Axton R, Henderson NC, Medvinsky A, Ottersbach K, Romanò N, Forrester LM. Single-cell analyses and machine learning define hematopoietic progenitor and HSC-like cells derived from human PSCs. Blood 2020; 136:2893-2904. [PMID: 32614947 PMCID: PMC7862875 DOI: 10.1182/blood.2020006229] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 04/13/2020] [Accepted: 06/20/2020] [Indexed: 01/19/2023] Open
Abstract
Hematopoietic stem and progenitor cells (HSPCs) develop in distinct waves at various anatomical sites during embryonic development. The in vitro differentiation of human pluripotent stem cells (hPSCs) recapitulates some of these processes; however, it has proven difficult to generate functional hematopoietic stem cells (HSCs). To define the dynamics and heterogeneity of HSPCs that can be generated in vitro from hPSCs, we explored single-cell RNA sequencing (scRNAseq) in combination with single-cell protein expression analysis. Bioinformatics analyses and functional validation defined the transcriptomes of naïve progenitors and erythroid-, megakaryocyte-, and leukocyte-committed progenitors, and we identified CD44, CD326, ICAM2/CD9, and CD18, respectively, as markers of these progenitors. Using an artificial neural network that we trained on scRNAseq derived from human fetal liver, we identified a wide range of hPSC-derived HSPCs phenotypes, including a small group classified as HSCs. This transient HSC-like population decreased as differentiation proceeded, and was completely missing in the data set that had been generated using cells selected on the basis of CD43 expression. By comparing the single-cell transcriptome of in vitro-generated HSC-like cells with those generated within the fetal liver, we identified transcription factors and molecular pathways that can be explored in the future to improve the in vitro production of HSCs.
Affiliation(s)
- Antonella Fidanza
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Patrick S Stumpf
- Joint Research Center for Computational Biomedicine, Uniklinik Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany
- Prakash Ramachandran
- Centre for Inflammation Research, University of Edinburgh, Edinburgh, United Kingdom
- Sara Tamagno
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Ann Babtie
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, United Kingdom
- Martha Lopez-Yrigoyen
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, United Kingdom
- A Helen Taylor
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Jennifer Easterbrook
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Beth E P Henderson
- Centre for Inflammation Research, University of Edinburgh, Edinburgh, United Kingdom
- Richard Axton
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Neil C Henderson
- Centre for Inflammation Research, University of Edinburgh, Edinburgh, United Kingdom
- Alexander Medvinsky
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Katrin Ottersbach
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Nicola Romanò
- Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Lesley M Forrester
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, United Kingdom
209
Osorio D, Zhong Y, Li G, Huang JZ, Cai JJ. scTenifoldNet: A Machine Learning Workflow for Constructing and Comparing Transcriptome-wide Gene Regulatory Networks from Single-Cell Data. Patterns (New York, N.Y.) 2020; 1:100139. [PMID: 33336197 PMCID: PMC7733883 DOI: 10.1016/j.patter.2020.100139] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/15/2020] [Revised: 09/29/2020] [Accepted: 10/12/2020] [Indexed: 02/02/2023]
Abstract
We present scTenifoldNet, a machine learning workflow built upon principal-component regression, low-rank tensor approximation, and manifold alignment, for constructing and comparing single-cell gene regulatory networks (scGRNs) using data from single-cell RNA sequencing. scTenifoldNet reveals regulatory changes in gene expression between samples by comparing the constructed scGRNs. With real data, scTenifoldNet identifies specific gene expression programs associated with different biological processes, providing critical insights into the underlying mechanism of regulatory networks governing cellular transcriptional activities.
Affiliation(s)
- Daniel Osorio
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Yan Zhong
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Guanxun Li
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Jianhua Z. Huang
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- James J. Cai
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Interdisciplinary Program of Genetics, Texas A&M University, College Station, TX 77843, USA
210
Yuan Y, Bar-Joseph Z. GCNG: graph convolutional networks for inferring gene interaction from spatial transcriptomics data. Genome Biol 2020; 21:300. [PMID: 33303016 PMCID: PMC7726911 DOI: 10.1186/s13059-020-02214-w] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 01/10/2020] [Accepted: 11/30/2020] [Indexed: 12/13/2022] Open
Abstract
Most methods for inferring gene-gene interactions from expression data focus on intracellular interactions. The availability of high-throughput spatial expression data opens the door to methods that can infer such interactions both within and between cells. To achieve this, we developed Graph Convolutional Neural networks for Genes (GCNG). GCNG encodes the spatial information as a graph and combines it with expression data using supervised training. GCNG improves upon prior methods used to analyze spatial transcriptomics data and can propose novel pairs of extracellular interacting genes. The output of GCNG can also be used for downstream analysis including functional gene assignment. Supporting website with software and data: https://github.com/xiaoyeye/GCNG.
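As a rough illustration of how a spatial graph can be combined with expression data, the sketch below applies one propagation step with the symmetric normalisation commonly used in graph convolutional networks, $D^{-1/2}(A + I)D^{-1/2}X$. It omits learned weights and nonlinearities and is not claimed to match GCNG's actual layers; the three-cell neighbourhood graph and the expression values are hypothetical.

```python
from math import sqrt

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def propagate(adj, features):
    """One propagation step: each cell's features become a degree-normalised
    mix of its own and its spatial neighbours' features (no learned weights)."""
    norm = normalized_adjacency(adj)
    n, d = len(features), len(features[0])
    return [[sum(norm[i][k] * features[k][j] for k in range(n))
             for j in range(d)] for i in range(n)]

# Hypothetical example: three cells on a line (0 - 1 - 2), two genes per cell.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
expression = [[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]]
smoothed = propagate(adjacency, expression)
for row in smoothed:
    print([round(v, 3) for v in row])
```

After propagation, the outer cells pick up signal for the second gene from their neighbour, which is the mechanism that lets a model relate a gene expressed in one cell to a gene expressed in an adjacent cell.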
Affiliation(s)
- Ye Yuan
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Ziv Bar-Joseph
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
211
Single-cell network biology for resolving cellular heterogeneity in human diseases. Exp Mol Med 2020; 52:1798-1808. [PMID: 33244151 PMCID: PMC8080824 DOI: 10.1038/s12276-020-00528-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 07/30/2020] [Revised: 08/26/2020] [Accepted: 08/31/2020] [Indexed: 01/10/2023] Open
Abstract
Understanding cellular heterogeneity is the holy grail of biology and medicine. Cells harboring identical genomes show a wide variety of behaviors in multicellular organisms. Genetic circuits underlying cell-type identities will facilitate the understanding of the regulatory programs for differentiation and maintenance of distinct cellular states. Such a cell-type-specific gene network can be inferred from coregulatory patterns across individual cells. Conventional methods of transcriptome profiling using tissue samples provide only average signals of diverse cell types. Therefore, reconstructing gene regulatory networks for a particular cell type is not feasible with tissue-based transcriptome data. Recently, single-cell omics technology has emerged and enabled the capture of the transcriptomic landscape of every individual cell. Although single-cell gene expression studies have already opened up new avenues, network biology using single-cell transcriptome data will further accelerate our understanding of cellular heterogeneity. In this review, we provide an overview of single-cell network biology and summarize recent progress in method development for network inference from single-cell RNA sequencing (scRNA-seq) data. Then, we describe how cell-type-specific gene networks can be utilized to study regulatory programs specific to disease-associated cell types and cellular states. Moreover, with scRNA data, modeling personal or patient-specific gene networks is feasible. Therefore, we also introduce potential applications of single-cell network biology for precision medicine. We envision a rapid paradigm shift toward single-cell network analysis for systems biology in the near future. Gene regulatory networks reconstructed from single-cell RNA sequencing datasets are allowing researchers to better understand the molecular circuits and cell states that contribute to complex human disease. 
Junha Cha and Insuk Lee from Yonsei University in Seoul, South Korea, review the concept of ‘single-cell network biology’, which involves using computational algorithms on genetic expression data from thousands of cells to infer functional interactions in various biological contexts. This systems biology approach to analyzing the profiles of messenger RNA in single cells is helping researchers discover new signaling pathways that could serve as disease biomarkers or therapeutic targets. In the future, patient-specific models of personal gene networks could explain why certain genetic variants affect disease risk. This research could also eventually lead to new types of individualized medical treatments.
212
Dai H, Jin QQ, Li L, Chen LN. Reconstructing gene regulatory networks in single-cell transcriptomic data analysis. Zool Res 2020; 41:599-604. [PMID: 33124218 PMCID: PMC7671911 DOI: 10.24272/j.issn.2095-8137.2020.215] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/03/2020] [Accepted: 10/20/2020] [Indexed: 11/07/2022] Open
Abstract
Gene regulatory networks play pivotal roles in our understanding of biological processes/mechanisms at the molecular level. Many studies have developed sample-specific or cell-type-specific gene regulatory networks from single-cell transcriptomic data based on a large amount of cell samples. Here, we review the state-of-the-art computational algorithms and describe various applications of gene regulatory networks in biological studies.
Affiliation(s)
- Hao Dai
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- Institute of Brain-Intelligence Technology, Zhangjiang Laboratory, Shanghai 201210, China
- Qi-Qi Jin
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Lin Li
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Luo-Nan Chen
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, Zhejiang 310024, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
213
Sha Y, Wang S, Zhou P, Nie Q. Inference and multiscale model of epithelial-to-mesenchymal transition via single-cell transcriptomic data. Nucleic Acids Res 2020; 48:9505-9520. [PMID: 32870263 PMCID: PMC7515733 DOI: 10.1093/nar/gkaa725] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 05/12/2020] [Revised: 07/19/2020] [Accepted: 08/20/2020] [Indexed: 12/17/2022] Open
Abstract
Rapid growth of single-cell transcriptomic data provides unprecedented opportunities for close scrutiny of dynamical cellular processes. Through investigating epithelial-to-mesenchymal transition (EMT), we develop an integrative tool that combines unsupervised learning of single-cell transcriptomic data and multiscale mathematical modeling to analyze transitions during cell fate decision. Our approach allows identification of individual cells making transitions between all cell states, and inference of genes that drive transitions. Multiscale extractions of single-cell scale outputs naturally reveal intermediate cell states (ICS) and ICS-regulated transition trajectories, producing emergent population-scale models to be explored for design principles. Testing on the newly designed single-cell gene regulatory network model and applying to twelve published single-cell EMT datasets in cancer and embryogenesis, we uncover the roles of ICS in adaptation, noise attenuation, and transition efficiency in EMT, and reveal their trade-off relations. Overall, our unsupervised learning method is applicable to general single-cell transcriptomic datasets, and our integrative approach at single-cell resolution may be adopted for other cell fate transition systems beyond EMT.
Affiliation(s)
- Yutong Sha
- Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA; The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA
- Shuxiong Wang
- Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA
- Peijie Zhou
- Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA
- Qing Nie
- Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA; The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA; Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697, USA
214
Klimm F, Toledo EM, Monfeuga T, Zhang F, Deane CM, Reinert G. Functional module detection through integration of single-cell RNA sequencing data with protein-protein interaction networks. BMC Genomics 2020; 21:756. [PMID: 33138772 PMCID: PMC7607865 DOI: 10.1186/s12864-020-07144-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/14/2019] [Accepted: 10/12/2020] [Indexed: 12/14/2022] Open
Abstract
Background: Recent advances in single-cell RNA sequencing have allowed researchers to explore transcriptional function at a cellular level. In particular, single-cell RNA sequencing reveals that there exist clusters of cells with similar gene expression profiles, representing different transcriptional states.
Results: In this study, we present scPPIN, a method for integrating single-cell RNA sequencing data with protein–protein interaction networks that detects active modules in cells of different transcriptional states. We achieve this by clustering RNA-sequencing data, identifying differentially expressed genes, constructing node-weighted protein–protein interaction networks, and finding the maximum-weight connected subgraphs with an exact Steiner-tree approach. As case studies, we investigate two RNA-sequencing data sets from human liver spheroids and human adipose tissue, respectively. With scPPIN we expand the output of differentially expressed gene analysis with information from protein interactions. We find that different transcriptional states have different significantly enriched subnetworks of the protein–protein interaction network, which represent biological pathways. In these pathways, scPPIN identifies proteins that are not differentially expressed but have a crucial biological function (e.g., as receptors) and therefore reveals biology beyond a standard differentially expressed gene analysis.
Conclusions: The introduced scPPIN method can be used to systematically analyse differentially expressed genes in single-cell RNA sequencing data by integrating it with protein interaction data. The detected modules that characterise each cluster help to identify and hypothesise a biological function associated with those cells. Our analysis suggests the participation of unexpected proteins in these pathways that are undetectable from the single-cell RNA sequencing data alone. The techniques described here are applicable to other organisms and tissues.
Supplementary Information: The online version contains supplementary material available at doi:10.1186/s12864-020-07144-2.
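The maximum-weight connected subgraph step can be illustrated on a toy network. The sketch below enumerates every connected node subset and keeps the one with the largest total weight. This brute-force search is only a stand-in for the exact Steiner-tree solver mentioned in the abstract and is feasible only for a handful of nodes; the protein names, node weights, and edges are hypothetical.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Depth-first search restricted to the chosen node set."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            v = b if a == u else a if b == u else None
            if v is not None and v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def max_weight_connected_subgraph(weights, edges):
    """Exhaustive search over node subsets; exponential in the number of nodes."""
    best_nodes, best_score = None, float("-inf")
    names = sorted(weights)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if is_connected(subset, edges):
                score = sum(weights[name] for name in subset)
                if score > best_score:
                    best_nodes, best_score = set(subset), score
    return best_nodes, best_score

# Hypothetical node weights: positive for strongly differentially expressed
# genes, negative for genes with weak or no differential expression.
weights = {"P1": 2.0, "P2": -0.5, "P3": 1.5, "P4": -3.0, "P5": 0.4}
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]

module, score = max_weight_connected_subgraph(weights, edges)
print(sorted(module), score)
```

In this toy case the selected module includes P2 even though its weight is negative, because it connects two high-scoring nodes. This mirrors the abstract's point that the method can surface proteins that are not themselves differentially expressed.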
Affiliation(s)
- Florian Klimm
- Department of Mathematics, Imperial College London, London, SW7 2AZ, UK. .,Mitochondrial Biology Unit, University of Cambridge, Cambridge, CB2 0XY, UK.
| | - Enrique M Toledo
- Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford, Oxford, OX3 7FZ, UK
| | - Thomas Monfeuga
- Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford, Oxford, OX3 7FZ, UK
| | - Fang Zhang
- Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford, Oxford, OX3 7FZ, UK
| | | | - Gesine Reinert
- Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
|
215
|
Abstract
Application of nonlinear dynamics to cancer ecosystems. Chemical turbulence and strange attractor models in tumor growth, invasion and pattern formation are investigated. Computational algorithms for detecting such structures are proposed. Complex systems applications to cancer dynamics.
Cancers are complex, adaptive ecosystems. They remain the leading cause of disease-related death among children in North America. As we approach computational oncology and Deep Learning Healthcare, our mathematical models of cancer dynamics must be revised. Recent findings support the perspective that cancer-microenvironment interactions may consist of chaotic gene expressions and turbulent protein flows during pattern formation. As such, cancer pattern formation, protein-folding and metastatic invasion are discussed herein as processes driven by chemical turbulence within the framework of complex systems theory. To conclude, cancer stem cells are presented as strange attractors of the Waddington landscape.
|
216
|
Stumpf MPH. Multi-model and network inference based on ensemble estimates: avoiding the madness of crowds. J R Soc Interface 2020; 17:20200419. [PMID: 33081645 PMCID: PMC7653378 DOI: 10.1098/rsif.2020.0419] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022] Open
Abstract
Recent progress in theoretical systems biology, applied mathematics and computational statistics allows us to compare the performance of different candidate models at describing a particular biological system quantitatively. Model selection has been applied with great success to problems where a small number (typically fewer than 10) of models are compared, but recent studies have started to consider thousands and even millions of candidate models. Often, however, we are left with sets of models that are compatible with the data, and then we can use ensembles of models to make predictions. These ensembles can have very desirable characteristics but, as I show here, are not guaranteed to improve on individual estimators or predictors. I will show in the cases of model selection and network inference when we can trust ensembles, and when we should be cautious. The analyses suggest that the careful construction of an ensemble (choosing good predictors) is of paramount importance, more than had perhaps been realized before: merely adding different methods does not suffice. The success of ensemble network inference methods is also shown to rest on their ability to suppress false-positive results. A Jupyter notebook which allows carrying out an assessment of ensemble estimators is provided.
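The claim that an ensemble is not guaranteed to beat its best member can be checked with a small worked calculation. The sketch below is not taken from the paper or its notebook; it assumes independent binary predictors combined by simple majority vote, and the accuracies are hypothetical. Under those assumptions it computes the exact probability that the majority is correct.

```python
from itertools import product


def majority_vote_accuracy(accuracies):
    """Exact probability that a strict majority of independent binary
    predictors is correct (intended for an odd number of predictors)."""
    n = len(accuracies)
    total = 0.0
    # Enumerate every pattern of correct/incorrect predictors.
    for outcome in product([True, False], repeat=n):
        if sum(outcome) * 2 > n:
            prob = 1.0
            for correct, p in zip(outcome, accuracies):
                prob *= p if correct else 1.0 - p
            total += prob
    return total


# Hypothetical accuracies: one strong predictor on its own ...
good_only = majority_vote_accuracy([0.9])
# ... the same predictor diluted by two mediocre ones ...
mixed = majority_vote_accuracy([0.9, 0.6, 0.6])
# ... and the same predictor joined by two other good ones.
curated = majority_vote_accuracy([0.9, 0.85, 0.85])

print(f"single: {good_only:.3f}, mixed: {mixed:.3f}, curated: {curated:.3f}")
```

With these numbers the mixed ensemble is worse than the single strong predictor, while the curated ensemble is better, which is consistent with the abstract's emphasis on choosing good predictors rather than merely adding methods.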
Affiliation(s)
- Michael P H Stumpf
- School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia; Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
|
217
|
Kwon MS, Lee BT, Lee SY, Kim HU. Modeling regulatory networks using machine learning for systems metabolic engineering. Curr Opin Biotechnol 2020; 65:163-170. [DOI: 10.1016/j.copbio.2020.02.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/20/2020] [Revised: 02/23/2020] [Accepted: 02/26/2020] [Indexed: 12/18/2022]
|
218
|
Dibaeinia P, Sinha S. SERGIO: A Single-Cell Expression Simulator Guided by Gene Regulatory Networks. Cell Syst 2020; 11:252-271.e11. [PMID: 32871105 PMCID: PMC7530147 DOI: 10.1016/j.cels.2020.08.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 10/31/2019] [Revised: 03/18/2020] [Accepted: 08/04/2020] [Indexed: 12/14/2022]
Abstract
A common approach to benchmarking of single-cell transcriptomics tools is to generate synthetic datasets that statistically resemble experimental data. However, most existing single-cell simulators do not incorporate transcription factor-gene regulatory interactions that underlie expression dynamics. Here, we present SERGIO, a simulator of single-cell gene expression data that models the stochastic nature of transcription as well as regulation of genes by multiple transcription factors according to a user-provided gene regulatory network. SERGIO can simulate any number of cell types in steady state or cells differentiating to multiple fates. We show that datasets generated by SERGIO are statistically comparable to experimental data generated by Illumina HiSeq2000, Drop-seq, Illumina 10X chromium, and Smart-seq. We use SERGIO to benchmark several single-cell analysis tools, including GRN inference methods, and identify Tcf7, Gata3, and Bcl11b as key drivers of T cell differentiation by performing in silico knockout experiments. SERGIO is freely available for download here: https://github.com/PayamDiba/SERGIO.
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute of Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Cancer Center at Illinois, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.
|
219
|
Møller AF, Natarajan KN. Predicting gene regulatory networks from cell atlases. Life Sci Alliance 2020; 3:3/11/e202000658. [PMID: 32958603 PMCID: PMC7536823 DOI: 10.26508/lsa.202000658] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/23/2020] [Revised: 08/24/2020] [Accepted: 08/31/2020] [Indexed: 12/17/2022] Open
Abstract
Integrated single-cell gene regulatory network from three mouse cell atlases captures global and cell type–specific regulatory modules and crosstalk, important for cellular identity. Recent single-cell RNA-sequencing atlases have surveyed and identified major cell types across different mouse tissues. Here, we computationally reconstruct gene regulatory networks from three major mouse cell atlases to capture functional regulators critical for cell identity, while accounting for a variety of technical differences, including sampled tissues, sequencing depth, and author assigned cell type labels. Extracting the regulatory crosstalk from mouse atlases, we identify and distinguish global regulons active in multiple cell types from specialised cell type–specific regulons. We demonstrate that regulon activities accurately distinguish individual cell types, despite differences between individual atlases. We generate an integrated network that further uncovers regulon modules with coordinated activities critical for cell types, and validate modules using available experimental data. Inferring regulatory networks during myeloid differentiation from wild-type and Irf8 KO cells, we uncover functional contribution of Irf8 regulon activity and composition towards monocyte lineage. Our analysis provides an avenue to further extract and integrate the regulatory crosstalk from single-cell expression data.
Affiliation(s)
- Andreas Fønss Møller
- Department of Biochemistry and Molecular Biology, Functional Genomics and Metabolism Unit, University of Southern Denmark, Odense, Denmark
| | - Kedar Nath Natarajan
- Department of Biochemistry and Molecular Biology, Functional Genomics and Metabolism Unit, University of Southern Denmark, Odense, Denmark; Danish Institute of Advanced Study, University of Southern Denmark, Odense, Denmark
|
220
|
Wu N, Yin F, Ou-Yang L, Zhu Z, Xie W. Joint learning of multiple gene networks from single-cell gene expression data. Comput Struct Biotechnol J 2020; 18:2583-2595. [PMID: 33033579 PMCID: PMC7527714 DOI: 10.1016/j.csbj.2020.09.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2020] [Revised: 08/31/2020] [Accepted: 09/01/2020] [Indexed: 11/24/2022] Open
Abstract
Inferring gene networks from gene expression data is important for understanding functional organizations within cells. With the accumulation of single-cell RNA sequencing (scRNA-seq) data, it is possible to infer gene networks at the single-cell level. However, due to the characteristics of scRNA-seq data, such as cellular heterogeneity and high sparsity caused by dropout events, traditional network inference methods may not be suitable for scRNA-seq data. In this study, we introduce a novel joint Gaussian copula graphical model (JGCGM) to jointly estimate multiple gene networks for multiple cell subgroups from scRNA-seq data. Our model can deal with non-Gaussian data with missing values, and identify the common and unique network structures of multiple cell subgroups, which makes it suitable for scRNA-seq data. Extensive experiments on synthetic data demonstrate that our proposed model outperforms the other state-of-the-art network inference models compared. We apply our model to real scRNA-seq data sets to infer gene networks of different cell subgroups. Hub genes in the estimated gene networks are found to be biologically significant.
Affiliation(s)
- Nuosi Wu
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Fu Yin
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), Shenzhen University, Shenzhen, China
- Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China
| | - Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
| | - Weixin Xie
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
|
221
|
NetExtractor: Extracting a Cerebellar Tissue Gene Regulatory Network Using Differentially Expressed High Mutual Information Binary RNA Profiles. G3-GENES GENOMES GENETICS 2020; 10:2953-2963. [PMID: 32665353 PMCID: PMC7466957 DOI: 10.1534/g3.120.401067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation that cannot typically detect latent, non-linear dependencies or require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise as well as embedded relationships between sample sub-populations reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm. NetExtractor examines all pairwise gene expression profiles first with Gaussian mixture models (GMMs) to identify sample sub-populations followed by mutual information (MI) analysis that is capable of detecting non-linear differential bigenic expression relationships. We applied NetExtractor to brain tissue RNA profiles from the Genotype-Tissue Expression (GTEx) project to obtain a brain tissue specific gene expression relationship network centered on cerebellar and cerebellar hemisphere enriched edges. We leveraged the PsychENCODE pre-frontal cortex (PFC) gene regulatory network (GRN) to construct a cerebellar cortex (cerebellar) GRN associated with transcriptionally active regions in cerebellar tissue. Thus, we demonstrate the utility of our NetExtractor approach to detect biologically relevant and novel non-linear binary gene relationships.
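The contrast between correlation and mutual information drawn in this abstract can be made concrete with a toy pair of genes. The sketch below is not NetExtractor: it skips the Gaussian mixture step entirely and uses hypothetical, already-discretised expression values with a simple plug-in MI estimate. It only illustrates why a non-monotonic dependency can be invisible to Pearson correlation while still carrying mutual information.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) for discrete-valued sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) written with counts to avoid extra divisions.
        mi += p_xy * math.log2(count * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical centred expression levels for gene A, and a gene B that
# depends on A through a symmetric, non-monotonic (quadratic) relationship.
gene_a = [-2, -1, 0, 1, 2] * 20
gene_b = [a * a for a in gene_a]

r = pearson(gene_a, gene_b)
mi = mutual_information(gene_a, gene_b)
print(f"Pearson r = {r:.3f}, mutual information = {mi:.3f} bits")
```

Here the correlation is zero because the relationship is symmetric around the mean of gene A, yet gene B is fully determined by gene A, so the mutual information equals the entropy of gene B.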
|
222
|
Li Y, Ma A, Mathé EA, Li L, Liu B, Ma Q. Elucidation of Biological Networks across Complex Diseases Using Single-Cell Omics. Trends Genet 2020; 36:951-966. [PMID: 32868128 DOI: 10.1016/j.tig.2020.08.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/17/2020] [Revised: 07/29/2020] [Accepted: 08/04/2020] [Indexed: 12/14/2022]
Abstract
Single-cell multimodal omics (scMulti-omics) technologies have made it possible to trace cellular lineages during differentiation and to identify new cell types in heterogeneous cell populations. The derived information is especially promising for computing cell-type-specific biological networks encoded in complex diseases and improving our understanding of the underlying gene regulatory mechanisms. The integration of these networks could, therefore, give rise to a heterogeneous regulatory landscape (HRL) in support of disease diagnosis and drug therapeutics. In this review, we provide an overview of this field and pay particular attention to how diverse biological networks can be inferred in a specific cell type based on integrative methods. Then, we discuss how HRL can advance our understanding of regulatory mechanisms underlying complex diseases and aid in the prediction of prognosis and therapeutic responses. Finally, we outline challenges and future trends that will be central to bringing the field of HRL in complex diseases forward.
Affiliation(s)
- Yang Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Ewy A Mathé
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health (NIH), Rockville, MD, 20892, USA
| | - Lang Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China.
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.
|
223
|
Shi H, Yan KK, Ding L, Qian C, Chi H, Yu J. Network Approaches for Dissecting the Immune System. iScience 2020; 23:101354. [PMID: 32717640 PMCID: PMC7390880 DOI: 10.1016/j.isci.2020.101354] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 05/09/2020] [Revised: 06/21/2020] [Accepted: 07/08/2020] [Indexed: 02/06/2023] Open
Abstract
The immune system is a complex biological network composed of hierarchically organized genes, proteins, and cellular components that combat external pathogens and monitor the onset of internal disease. To meet and ultimately defeat these challenges, the immune system orchestrates an exquisitely complex interplay of numerous cells, often with highly specialized functions, in a tissue-specific manner. One of the major methodologies of systems immunology is to measure quantitatively the components and interaction levels in the immunologic networks to construct a computational network and predict the response of the components to perturbations. The recent advances in high-throughput sequencing techniques have provided us with a powerful approach to dissecting the complexity of the immune system. Here we summarize the latest progress in integrating omics data and network approaches to construct networks and to infer the underlying signaling and transcriptional landscape, as well as cell-cell communication, in the immune system, with a focus on hematopoiesis, adaptive immunity, and tumor immunology. Understanding the network regulation of immune cells has provided new insights into immune homeostasis and disease, with important therapeutic implications for inflammation, cancer, and other immune-mediated disorders.
Affiliation(s)
- Hao Shi
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA; Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Koon-Kiu Yan
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Liang Ding
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Chenxi Qian
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA; Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Hongbo Chi
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Jiyang Yu
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
|
224
|
Sekula M, Gaskins J, Datta S. A sparse Bayesian factor model for the construction of gene co-expression networks from single-cell RNA sequencing count data. BMC Bioinformatics 2020; 21:361. [PMID: 32811424 PMCID: PMC7437941 DOI: 10.1186/s12859-020-03707-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/10/2020] [Accepted: 08/04/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Gene co-expression networks (GCNs) are powerful tools that enable biologists to examine associations between genes during different biological processes. With the advancement of new technologies, such as single-cell RNA sequencing (scRNA-seq), there is a need for developing novel network methods appropriate for new types of data. RESULTS We present a novel sparse Bayesian factor model to explore the network structure associated with genes in scRNA-seq data. Latent factors impact the gene expression values for each cell and provide flexibility to account for common features of scRNA-seq: high proportions of zero values, increased cell-to-cell variability, and overdispersion due to abnormally large expression counts. From our model, we construct a GCN by analyzing the positive and negative associations of the factors that are shared between each pair of genes. CONCLUSIONS Simulation studies demonstrate that our methodology has high power in identifying gene-gene associations while maintaining a nominal false discovery rate. In real data analyses, our model identifies more known and predicted protein-protein interactions than other competing network models.
Affiliation(s)
- Michael Sekula
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, USA.
| | - Jeremy Gaskins
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, USA
| | - Susmita Datta
- Department of Biostatistics, University of Florida, Gainesville, FL, USA
|
225
|
Del Sol A, Jung S. The Importance of Computational Modeling in Stem Cell Research. Trends Biotechnol 2020; 39:126-136. [PMID: 32800604 DOI: 10.1016/j.tibtech.2020.07.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/17/2020] [Revised: 07/13/2020] [Accepted: 07/15/2020] [Indexed: 12/30/2022]
Abstract
The generation of large amounts of omics data is increasingly enabling not only the processing and analysis of large data sets but also the development of computational models in the field of stem cell research. Although computational models have been proposed in recent decades, we believe that the stem cell community is not fully aware of the potentiality of computational modeling in guiding their experimental research. In this regard, we discuss how single-cell technologies provide the right framework for computational modeling at different scales of biological organization in order to address challenges in the stem cell field and to guide experimentalists in the design of new strategies for stem cell therapies and treatment of congenital disorders.
Affiliation(s)
- Antonio Del Sol
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, Esch-sur-Alzette, L-4367 Belvaux, Luxembourg; CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Bizkaia Technology Park, 801 Building, 48160 Derio, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao 48013, Spain.
| | - Sascha Jung
- CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Bizkaia Technology Park, 801 Building, 48160 Derio, Spain
|
226
|
Zheng X, Huang Y, Zou X. scPADGRN: A preconditioned ADMM approach for reconstructing dynamic gene regulatory network using single-cell RNA sequencing data. PLoS Comput Biol 2020; 16:e1007471. [PMID: 32716923 PMCID: PMC7410337 DOI: 10.1371/journal.pcbi.1007471] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/04/2019] [Revised: 08/06/2020] [Accepted: 05/28/2020] [Indexed: 12/23/2022] Open
Abstract
Disease development and cell differentiation both involve dynamic changes; therefore, the reconstruction of dynamic gene regulatory networks (DGRNs) is an important but difficult problem in systems biology. With recent technical advances in single-cell RNA sequencing (scRNA-seq), large volumes of scRNA-seq data are being obtained for various processes. However, most current methods of inferring DGRNs from bulk samples may not be suitable for scRNA-seq data. In this work, we present scPADGRN, a novel DGRN inference method using “time-series” scRNA-seq data. scPADGRN combines the preconditioned alternating direction method of multipliers with cell clustering for DGRN reconstruction. It exhibits advantages in accuracy, robustness and fast convergence. Moreover, a quantitative index called Differentiation Genes’ Interaction Enrichment (DGIE) is presented to quantify the interaction enrichment of genes related to differentiation. From the DGIE scores of relevant subnetworks, we infer that the functions of embryonic stem (ES) cells are most active initially and may gradually fade over time. The communication strength of known contributing genes that facilitate cell differentiation increases from ES cells to terminally differentiated cells. We also identify several genes responsible for the changes in the DGIE scores occurring during cell differentiation based on three real single-cell datasets. Our results demonstrate that single-cell analyses based on network inference coupled with quantitative computations can reveal key transcriptional regulators involved in cell differentiation and disease development. Single-cell RNA sequencing (scRNA-seq) data are gaining popularity for providing access to cell-level measurements. Currently, time-series scRNA-seq data allow researchers to study dynamic changes during biological processes. 
This work proposes a novel method, scPADGRN, for application to time-series scRNA-seq data to construct dynamic gene regulatory networks, which are informative for investigating dynamic changes during disease development and cell differentiation. The proposed method shows satisfactory performance on both simulated data and three real datasets concerning cell differentiation. To quantify network dynamics, we present a quantitative index, DGIE, to measure the degree of activity of a certain set of genes in a regulatory network. Quantitative computations based on dynamic networks identify key regulators in cell differentiation and reveal the activity states of the identified regulators. Specifically, Bhlhe40, Msx2, Foxa2 and Dnmt3l might be important regulatory genes involved in differentiation from mouse ES cells to primitive endoderm (PrE) cells. For differentiation from mouse embryonic fibroblast cells to myocytes, Scx, Fos and Tcf12 are suggested to be key regulators. Sox5, Meis2, Hoxb3, Tcf7l1 and Plagl1 critically contribute during differentiation from human ES cells to definitive endoderm cells. These results may guide further theoretical and experimental efforts to understand cell differentiation processes and explore cell heterogeneity.
Affiliation(s)
- Xiao Zheng
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
| | - Yuan Huang
- Department of Biostatistics, Yale University, New Haven, Connecticut, United States of America
| | - Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
|
227
|
Che D, Guo S, Jiang Q, Chen L. PFBNet: a priori-fused boosting method for gene regulatory network inference. BMC Bioinformatics 2020; 21:308. [PMID: 32664870 PMCID: PMC7362553 DOI: 10.1186/s12859-020-03639-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/29/2019] [Accepted: 07/02/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Inferring gene regulatory networks (GRNs) from gene expression data remains a challenge in systems biology. In the past decade, numerous methods have been developed for the inference of GRNs. The problem remains difficult because the data are noisy and high dimensional, and there is a large number of potential interactions. RESULTS We present a novel method, namely the priori-fused boosting network inference method (PFBNet), to infer GRNs from time-series expression data by using a non-linear boosting model and a scheme for fusing prior information (e.g., knockout data). Specifically, PFBNet first calculates the confidences of the regulation relationships using the boosting-based model, where the information about the accumulated impact of the gene expressions at previous time points is taken into account. Then, a newly defined strategy is applied to fuse the information from the prior data by elevating the confidences of the regulation relationships from the corresponding regulators. CONCLUSIONS The experiments on the benchmark datasets from the DREAM challenge as well as the E. coli datasets show that PFBNet achieves significantly better performance than other state-of-the-art methods (Jump3, GENIE3-lag, HiDi, iRafNet and BiXGBoost).
Affiliation(s)
- Dandan Che
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
| | - Shun Guo
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
| | - Qingshan Jiang
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
| | - Lifei Chen
- School of Mathematics and Computer Science, Fujian Normal University, Fujian, 350117 China
|
228
|
Gene regulatory network inference from sparsely sampled noisy data. Nat Commun 2020; 11:3493. [PMID: 32661225 PMCID: PMC7359369 DOI: 10.1038/s41467-020-17217-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/07/2019] [Accepted: 06/11/2020] [Indexed: 12/16/2022] Open
Abstract
The complexity of biological systems is encoded in gene regulatory networks. Unravelling this intricate web is a fundamental step in understanding the mechanisms of life and eventually developing efficient therapies to treat and cure diseases. The major obstacle in inferring gene regulatory networks is the lack of data. While time series data are nowadays widely available, they are typically noisy, with low sampling frequency and overall small number of samples. This paper develops a method called BINGO to specifically deal with these issues. Benchmarked with both real and simulated time-series data covering many different gene regulatory networks, BINGO clearly and consistently outperforms state-of-the-art methods. The novelty of BINGO lies in a nonparametric approach featuring statistical sampling of continuous gene expression profiles. BINGO's superior performance and ease of use, even by non-specialists, make gene regulatory network inference available to any researcher, helping to decipher the complex mechanisms of life.
|
229
|
Kang X, Hajek B, Hanzawa Y. From graph topology to ODE models for gene regulatory networks. PLoS One 2020; 15:e0235070. [PMID: 32603340 PMCID: PMC7326199 DOI: 10.1371/journal.pone.0235070] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/24/2020] [Accepted: 06/08/2020] [Indexed: 11/28/2022] Open
Abstract
A gene regulatory network can be described at a high level by a directed graph with signed edges, and at a more detailed level by a system of ordinary differential equations (ODEs). The former qualitatively models the causal regulatory interactions between ordered pairs of genes, while the latter quantitatively models the time-varying concentrations of mRNA and proteins. This paper clarifies the connection between the two types of models. We propose a property, called the constant sign property, for a general class of ODE models. The constant sign property characterizes the set of conditions (system parameters, external signals, or internal states) under which an ODE model is consistent with a signed, directed graph. If the constant sign property for an ODE model holds globally for all conditions, then the ODE model has a single signed, directed graph. If the constant sign property for an ODE model only holds locally, which may be more typical, then the ODE model corresponds to different graphs under different sets of conditions. In addition, two versions of the constant sign property are given and a relationship between them is proved. As an example, the ODE models that capture the effect of cis-regulatory elements involving protein complex binding, based on the model in the GeneNetWeaver source code, are described in detail and shown to satisfy the global constant sign property with a unique consistent gene regulatory graph. Even a single gene regulatory graph is shown to have many ODE models of GeneNetWeaver type consistent with it due to combinatorial complexity and continuous parameters. Finally, the question of how closely data generated by one ODE model can be fit by another ODE model is explored. It is observed that the fit is better if the two models come from the same graph.
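The difference between a globally and a locally constant sign can be illustrated numerically. The sketch below is only one informal reading of the idea and may differ from the paper's formal definition: it treats the sign of the derivative of a production-rate function with respect to the regulator concentration as the edge sign. Both rate functions, their parameters, and the concentration grid are hypothetical.

```python
def hill_activation(x, k=1.0, n=2):
    """Hypothetical production rate that increases monotonically with x."""
    return x**n / (k**n + x**n)


def biphasic(x):
    """Hypothetical production rate that rises, peaks at x = 1, then falls."""
    return x / (1.0 + x**2)


def derivative_signs(rate, grid, step=1e-6):
    """Collect the signs of the central-difference slope over the grid."""
    signs = set()
    for x in grid:
        slope = (rate(x + step) - rate(x - step)) / (2 * step)
        if slope > 0:
            signs.add("+")
        elif slope < 0:
            signs.add("-")
    return signs


# Regulator concentrations from 0.1 to 5.0 in steps of 0.1.
grid = [0.1 * i for i in range(1, 51)]

hill_signs = derivative_signs(hill_activation, grid)
biphasic_signs = derivative_signs(biphasic, grid)
print("Hill activation:", sorted(hill_signs))
print("Biphasic:", sorted(biphasic_signs))
```

On this grid the Hill function yields a single sign, so one activating edge describes it everywhere, whereas the biphasic function yields both signs, so the appropriate edge sign depends on the concentration range considered.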
Affiliation(s)
- Xiaohan Kang
- Department of Electrical and Computer Engineering, and Coordinated Science Laboratory, University of Illinois at Urbana–Champaign, Urbana, Illinois, United States of America
| | - Bruce Hajek
- Department of Electrical and Computer Engineering, and Coordinated Science Laboratory, University of Illinois at Urbana–Champaign, Urbana, Illinois, United States of America
| | - Yoshie Hanzawa
- Department of Biology, California State University, Northridge, Northridge, California, United States of America
|
230
|
Hu X, Hu Y, Wu F, Leung RWT, Qin J. Integration of single-cell multi-omics for gene regulatory network inference. Comput Struct Biotechnol J 2020; 18:1925-1938. [PMID: 32774787 PMCID: PMC7385034 DOI: 10.1016/j.csbj.2020.06.033] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/01/2020] [Revised: 06/17/2020] [Accepted: 06/20/2020] [Indexed: 12/20/2022] Open
Abstract
The advancement of single-cell sequencing technology in recent years has provided an opportunity to reconstruct gene regulatory networks (GRNs) with the data from thousands of single cells in one sample. This uncovers regulatory interactions in cells and speeds up the discovery of regulatory mechanisms in diseases and biological processes. Consequently, more methods have been proposed to reconstruct GRNs using single-cell sequencing data. In this review, we introduce technologies for sequencing the single-cell genome, transcriptome, and epigenome. We also present an overview of current GRN reconstruction strategies that use different types of single-cell sequencing data. Bioinformatics tools are grouped by their input data type and mathematical principles for the reader's convenience, and the fundamental mathematics underlying each group is discussed. Furthermore, the adaptability and limitations of these methods are summarized and compared, with the aim of helping researchers identify the tools most suitable for their needs.
Affiliation(s)
- Xinlin Hu
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China
| | - Yaohua Hu
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China
| | - Fanjie Wu
- School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China
| | - Ricky Wai Tak Leung
- School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China
| | - Jing Qin
- School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China
|
231
|
Falco MM, Peña-Chilet M, Loucera C, Hidalgo MR, Dopazo J. Mechanistic models of signaling pathways deconvolute the glioblastoma single-cell functional landscape. NAR Cancer 2020; 2:zcaa011. [PMID: 34316686 PMCID: PMC8210212 DOI: 10.1093/narcan/zcaa011] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/10/2020] [Revised: 06/08/2020] [Accepted: 06/11/2020] [Indexed: 02/07/2023] Open
Abstract
Single-cell RNA sequencing is revealing an unexpectedly large degree of heterogeneity in gene expression levels across cell populations. However, little is known about the functional consequences of this heterogeneity and the contribution of individual cell fate decisions to the collective behavior of the tissues these cells are part of. Here, we use mechanistic modeling of signaling circuits, which reveals a complex functional landscape at single-cell level. Different clusters of neoplastic glioblastoma cells have been defined according to their differences in signaling circuit activity profiles triggering specific cancer hallmarks, which suggest different functional strategies with distinct degrees of aggressiveness. Moreover, mechanistic modeling of the effects of targeted drug inhibitions at single-cell level revealed how, in some cells, the substitution of VEGFA, the target of bevacizumab, by other expressed proteins, like PDGFD, KITLG and FGF2, keeps the VEGF pathway active and insensitive to VEGFA inhibition by the drug. Here, we describe for the first time mechanisms that individual cells use to avoid the effect of a targeted therapy, providing an explanation for the innate resistance to the treatment displayed by some cells. Our results suggest that mechanistic modeling could become an important asset for the definition of personalized therapeutic interventions.
Affiliation(s)
- Matías M Falco
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, 41013 Sevilla, Spain
| | - María Peña-Chilet
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, 41013 Sevilla, Spain
| | - Carlos Loucera
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, 41013 Sevilla, Spain
| | - Marta R Hidalgo
- Unidad de Bioinformática y Bioestadística, Centro de Investigación Príncipe Felipe (CIPF), 46012 Valencia, Spain
| | - Joaquín Dopazo
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, 41013 Sevilla, Spain
|
232
|
Aubin-Frankowski PC, Vert JP. Gene regulation inference from single-cell RNA-seq data with linear differential equations and velocity inference. Bioinformatics 2020; 36:4774-4780. [DOI: 10.1093/bioinformatics/btaa576] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 04/19/2019] [Revised: 05/04/2020] [Accepted: 06/11/2020] [Indexed: 11/14/2022] Open
Abstract
Motivation
Single-cell RNA sequencing (scRNA-seq) offers new possibilities to infer gene regulatory networks (GRNs) for biological processes involving a notion of time, such as cell differentiation or cell cycles. It also raises many challenges due to the destructive measurements inherent to the technology.
Results
In this work, we propose a new method named GRISLI for de novo GRN inference from scRNA-seq data. GRISLI infers a velocity vector field in the space of scRNA-seq data from profiles of individual cells, and models the dynamics of cell trajectories with a linear ordinary differential equation to reconstruct the underlying GRN with a sparse regression procedure. We show on real data that GRISLI outperforms a recently proposed state-of-the-art method for GRN reconstruction from scRNA-seq data.
Availability and implementation
The MATLAB code of GRISLI is available at: https://github.com/PCAubin/GRISLI.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Jean-Philippe Vert
- MINES ParisTech, PSL Research University, CBIO – Centre for Computational Biology, F-75006 Paris, France
- Google Research, Brain team, 75009 Paris, France
|
233
|
Chanda P, Costa E, Hu J, Sukumar S, Van Hemert J, Walia R. Information Theory in Computational Biology: Where We Stand Today. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E627. [PMID: 33286399 PMCID: PMC7517167 DOI: 10.3390/e22060627] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/30/2020] [Revised: 05/31/2020] [Accepted: 06/03/2020] [Indexed: 12/30/2022]
Abstract
"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to address the problems in the field of data compression and communication over (noisy) communication channels. Since then, the concepts and ideas developed in Shannon's work have formed the basis of information theory, a cornerstone of statistical learning and inference, and has been playing a key role in disciplines such as physics and thermodynamics, probability and statistics, computational sciences and biological sciences. In this article we review the basic information theory based concepts and describe their key applications in multiple major areas of research in computational biology-gene expression and transcriptomics, alignment-free sequence comparison, sequencing and error correction, genome-wide disease-gene association mapping, metabolic networks and metabolomics, and protein sequence, structure and interaction analysis.
Affiliation(s)
- Pritam Chanda
- Corteva Agriscience™, Indianapolis, IN 46268, USA
- Computer and Information Science, Indiana University-Purdue University, Indianapolis, IN 46202, USA
| | - Eduardo Costa
- Corteva Agriscience™, Mogi Mirim, Sao Paulo 13801-540, Brazil
| | - Jie Hu
- Corteva Agriscience™, Indianapolis, IN 46268, USA
| | | | | | - Rasna Walia
- Corteva Agriscience™, Johnston, IA 50131, USA
|
234
|
Saint-Antoine MM, Singh A. Network inference in systems biology: recent developments, challenges, and applications. Curr Opin Biotechnol 2020; 63:89-98. [PMID: 31927423 PMCID: PMC7308210 DOI: 10.1016/j.copbio.2019.12.002] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 10/31/2019] [Accepted: 12/03/2019] [Indexed: 12/12/2022]
Abstract
One of the most interesting, difficult, and potentially useful topics in computational biology is the inference of gene regulatory networks (GRNs) from expression data. Although researchers have been working on this topic for more than a decade and much progress has been made, it remains an unsolved problem and even the most sophisticated inference algorithms are far from perfect. In this paper, we review the latest developments in network inference, including state-of-the-art algorithms like PIDC, Phixer, and more. We also discuss unsolved computational challenges, including the optimal combination of algorithms, integration of multiple data sources, and pseudo-temporal ordering of static expression data. Lastly, we discuss some exciting applications of network inference in cancer research, and provide a list of useful software tools for researchers hoping to conduct their own network inference analyses.
Affiliation(s)
- Michael M Saint-Antoine
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware 19716, USA
| | - Abhyudai Singh
- Electrical and Computer Engineering, University of Delaware, Newark, Delaware 19716, USA.
|
235
|
Lun XK, Bodenmiller B. Profiling Cell Signaling Networks at Single-cell Resolution. Mol Cell Proteomics 2020; 19:744-756. [PMID: 32132232 PMCID: PMC7196580 DOI: 10.1074/mcp.r119.001790] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 09/23/2019] [Revised: 03/03/2020] [Indexed: 12/24/2022] Open
Abstract
Signaling networks process intra- and extracellular information to modulate the functions of a cell. Deregulation of signaling networks results in abnormal cellular physiological states and often drives diseases. Network responses to a stimulus or a drug treatment can be highly heterogeneous across cells in a tissue because of many sources of cellular genetic and non-genetic variance. Signaling network heterogeneity is the key to many biological processes, such as cell differentiation and drug resistance. Only recently, the emergence of multiplexed single-cell measurement technologies has made it possible to evaluate this heterogeneity. In this review, we categorize currently established single-cell signaling network profiling approaches by their methodology, coverage, and application, and we discuss the advantages and limitations of each type of technology. We also describe the available computational tools for network characterization using single-cell data and discuss potential confounding factors that need to be considered in single-cell signaling network analyses.
Affiliation(s)
- Xiao-Kang Lun
- Institute of Molecular Life Sciences, University of Zürich, 8057 Zürich, Switzerland; Molecular Life Sciences PhD Program, Life Science Zürich Graduate School, ETH Zürich and University of Zürich, 8057 Zürich, Switzerland
| | - Bernd Bodenmiller
- Institute of Molecular Life Sciences, University of Zürich, 8057 Zürich, Switzerland.
|
236
|
Cang Z, Nie Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat Commun 2020; 11:2084. [PMID: 32350282 PMCID: PMC7190659 DOI: 10.1038/s41467-020-15968-5] [Citation(s) in RCA: 181] [Impact Index Per Article: 45.3] [Received: 09/19/2019] [Accepted: 03/27/2020] [Indexed: 01/20/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) provides details for individual cells; however, crucial spatial information is often lost. We present SpaOTsc, a method relying on structured optimal transport to recover spatial properties of scRNA-seq data by utilizing spatial measurements of a relatively small number of genes. A spatial metric for individual cells in scRNA-seq data is first established based on a map connecting it with the spatial measurements. The cell-cell communications are then obtained by "optimally transporting" signal senders to target signal receivers in space. Using partial information decomposition, we next compute the intercellular gene-gene information flow to estimate the spatial regulations between genes across cells. Four datasets are employed for cross-validation of spatial gene expression prediction and comparison to known cell-cell communications. SpaOTsc has broader applications, both in integrating non-spatial single-cell measurements with spatial data, and directly in spatial single-cell transcriptomics data to reconstruct spatial cellular dynamics in tissues.
Affiliation(s)
- Zixuan Cang
- Department of Mathematics, University of California, Irvine, Irvine, CA, 92697, USA
- The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA, 92697, USA
| | - Qing Nie
- Department of Mathematics, University of California, Irvine, Irvine, CA, 92697, USA.
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697, USA.
- The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA, 92697, USA.
|
237
|
Shang L, Smith JA, Zhou X. Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies. PLoS Genet 2020; 16:e1008734. [PMID: 32310941 PMCID: PMC7192514 DOI: 10.1371/journal.pgen.1008734] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/17/2019] [Revised: 04/30/2020] [Accepted: 03/24/2020] [Indexed: 12/11/2022] Open
Abstract
Genome-wide association studies (GWASs) have identified many SNPs associated with various common diseases. Understanding the biological functions of these identified SNP associations requires identifying disease/trait relevant tissues or cell types. Here, we develop a network method, CoCoNet, to facilitate the identification of trait-relevant tissues or cell types. Different from existing approaches, CoCoNet incorporates tissue-specific gene co-expression networks constructed from either bulk or single cell RNA sequencing (RNAseq) studies with GWAS data for trait-tissue inference. In particular, CoCoNet relies on a covariance regression network model to express gene-level effect measurements for the given GWAS trait as a function of the tissue-specific co-expression adjacency matrix. With a composite likelihood-based inference algorithm, CoCoNet is scalable to tens of thousands of genes. We validate the performance of CoCoNet through extensive simulations. We apply CoCoNet for an in-depth analysis of four neurological disorders and four autoimmune diseases, where we integrate the corresponding GWASs with bulk RNAseq data from 38 tissues and single cell RNAseq data from 10 cell types. In the real data applications, we show how CoCoNet can help identify specific glial cell types relevant for neurological disorders and identify disease-targeted colon tissues as relevant for autoimmune diseases.
Affiliation(s)
- Lulu Shang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
| | - Jennifer A. Smith
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, United States of America
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, United States of America
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States of America
|
238
|
Qiu X, Rahimzamani A, Wang L, Ren B, Mao Q, Durham T, McFaline-Figueroa JL, Saunders L, Trapnell C, Kannan S. Inferring Causal Gene Regulatory Networks from Coupled Single-Cell Expression Dynamics Using Scribe. Cell Syst 2020; 10:265-274.e11. [PMID: 32135093 PMCID: PMC7223477 DOI: 10.1016/j.cels.2020.02.003] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 09/12/2018] [Revised: 06/08/2019] [Accepted: 02/05/2020] [Indexed: 01/13/2023]
Abstract
Here, we present Scribe (https://github.com/aristoteleo/Scribe-py), a toolkit for detecting and visualizing causal regulatory interactions between genes and explore the potential for single-cell experiments to power network reconstruction. Scribe employs restricted directed information to determine causality by estimating the strength of information transferred from a potential regulator to its downstream target. We apply Scribe and other leading approaches for causal network reconstruction to several types of single-cell measurements and show that there is a dramatic drop in performance for "pseudotime"-ordered single-cell data compared with true time-series data. We demonstrate that performing causal inference requires temporal coupling between measurements. We show that methods such as "RNA velocity" restore some degree of coupling through an analysis of chromaffin cell fate commitment. These analyses highlight a shortcoming in experimental and computational methods for analyzing gene regulation at single-cell resolution and suggest ways of overcoming it.
Affiliation(s)
- Xiaojie Qiu
- Molecular & Cellular Biology Program, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Arman Rahimzamani
- Department of Electrical Engineering, University of Washington, Seattle, WA, USA
| | - Li Wang
- Department of Mathematics, University of Texas at Arlington, Arlington, TX, USA
| | - Bingcheng Ren
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Qi Mao
- HERE company, Chicago, IL 60606, USA
| | - Timothy Durham
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | - Lauren Saunders
- Molecular & Cellular Biology Program, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Cole Trapnell
- Molecular & Cellular Biology Program, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA; Brotman-Baty Institute for Precision Medicine, Seattle, WA, USA.
| | - Sreeram Kannan
- Department of Electrical Engineering, University of Washington, Seattle, WA, USA.
|
239
|
Uda S. Application of information theory in systems biology. Biophys Rev 2020; 12:377-384. [PMID: 32144740 PMCID: PMC7242537 DOI: 10.1007/s12551-020-00665-w] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/15/2020] [Accepted: 02/25/2020] [Indexed: 12/12/2022] Open
Abstract
Over recent years, new light has been shed on aspects of information processing in cells. The quantification of information, as described by Shannon’s information theory, is a basic and powerful tool that can be applied to various fields, such as communication, statistics, and computer science, as well as to information processing within cells. It has also been used to infer the network structure of molecular species. However, the difficulty of obtaining sufficient sample sizes and the computational burden associated with the high-dimensional data often encountered in biology can result in bottlenecks in the application of information theory to systems biology. This article provides an overview of the application of information theory to systems biology, discussing the associated bottlenecks and reviewing recent work.
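To make the network-inference use concrete, the following minimal sketch computes a plug-in (empirical-frequency) estimate of the mutual information between two discretized expression profiles. The function name `mutual_information` and the toy binary profiles are assumptions for illustration only. The plug-in estimator is known to be biased upward for small samples, which is one face of the sample-size bottleneck the abstract mentions.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        c / n * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

# Toy binarized expression (0 = off, 1 = on) across four cells.
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]   # perfectly coupled to gene_a
gene_c = [0, 1, 0, 1]   # empirically independent of gene_a

print(mutual_information(gene_a, gene_b))
print(mutual_information(gene_a, gene_c))
```

A pairwise score of this kind ranks candidate gene-gene links by statistical dependence; it carries no direction or sign, so it cannot by itself distinguish regulator from target.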
Affiliation(s)
- Shinsuke Uda
- Division of Integrated Omics, Research Center for Transomics Medicine, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka, 812-8582, Japan.
|
240
|
Finn C, Lizier JT. Generalised Measures of Multivariate Information Content. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E216. [PMID: 33285991 PMCID: PMC7851747 DOI: 10.3390/e22020216] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/11/2019] [Revised: 02/05/2020] [Accepted: 02/12/2020] [Indexed: 12/12/2022]
Abstract
The entropy of a pair of random variables is commonly depicted using a Venn diagram. This representation is potentially misleading, however, since the multivariate mutual information can be negative. This paper presents new measures of multivariate information content that can be accurately depicted using Venn diagrams for any number of random variables. These measures complement the existing measures of multivariate mutual information and are constructed by considering the algebraic structure of information sharing. It is shown that the distinct ways in which a set of marginal observers can share their information with a non-observing third party correspond to the elements of a free distributive lattice. The redundancy lattice from partial information decomposition is subsequently and independently derived by combining the algebraic structures of joint and shared information content.
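The claim that multivariate mutual information can be negative has a standard worked example: two fair, independent bits X and Y together with Z = X XOR Y. The sketch below evaluates the three-variable co-information by inclusion-exclusion over entropies, using the sign convention I(X;Y;Z) = I(X;Y) - I(X;Y|Z). The helper names `entropy` and `co_information` are chosen for this example and do not refer to the measures introduced in the paper.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(samples):
    """Shannon entropy in bits, treating the samples as equally likely outcomes."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def co_information(triples):
    """I(X;Y;Z) = H(X)+H(Y)+H(Z) - H(XY)-H(XZ)-H(YZ) + H(XYZ)."""
    xs, ys, zs = zip(*triples)
    return (
        entropy(xs) + entropy(ys) + entropy(zs)
        - entropy(list(zip(xs, ys)))
        - entropy(list(zip(xs, zs)))
        - entropy(list(zip(ys, zs)))
        + entropy(list(triples))
    )

# Z = X XOR Y with X, Y uniform and independent: purely synergistic.
xor_table = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
# X = Y = Z: purely redundant.
copy_table = [(x, x, x) for x in (0, 1)]

print(co_information(xor_table))   # negative: no region of a Venn diagram can have this "area"
print(co_information(copy_table))  # positive: the shared bit
```

For the XOR table every pair of variables is independent, yet any two determine the third, so the central region of a three-set Venn diagram would need a negative area.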
Affiliation(s)
- Conor Finn
- Centre for Complex Systems, The University of Sydney, Sydney NSW 2006, Australia;
- CSIRO Data61, Marsfield NSW 2122, Australia
| | - Joseph T. Lizier
- Centre for Complex Systems, The University of Sydney, Sydney NSW 2006, Australia;
|
241
|
Brückner DB, Fink A, Rädler JO, Broedersz CP. Disentangling the behavioural variability of confined cell migration. J R Soc Interface 2020. [DOI: 10.1098/rsif.2019.0689] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/14/2022] Open
Abstract
Cell-to-cell variability is inherent to numerous biological processes, including cell migration. Quantifying and characterizing the variability of migrating cells is challenging, as it requires monitoring many cells for long time windows under identical conditions. Here, we observe the migration of single human breast cancer cells (MDA-MB-231) in confining two-state micropatterns. To describe the stochastic dynamics of this confined migration, we employ a dynamical systems approach. We identify statistics to measure the behavioural variance of the migration, which significantly exceeds that predicted by a population-averaged stochastic model. This additional variance can be explained by the combination of an ‘ageing’ process and population heterogeneity. To quantify population heterogeneity, we decompose the cells into subpopulations of slow and fast cells, revealing the presence of distinct classes of dynamical systems describing the migration, ranging from bistable to limit cycle behaviour. Our findings highlight the breadth of migration behaviours present in cell populations.
Affiliation(s)
- David B. Brückner
- Arnold-Sommerfeld-Center for Theoretical Physics and Center for NanoScience, Ludwig-Maximilians-Universität, München, Bayern, Germany
| | - Alexandra Fink
- Faculty of Physics and Center for NanoScience, Ludwig-Maximilians-Universität, München, Bayern, Germany
| | - Joachim O. Rädler
- Faculty of Physics and Center for NanoScience, Ludwig-Maximilians-Universität, München, Bayern, Germany
| | - Chase P. Broedersz
- Arnold-Sommerfeld-Center for Theoretical Physics and Center for NanoScience, Ludwig-Maximilians-Universität, München, Bayern, Germany
|
242
|
Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods 2020; 17:147-154. [PMID: 31907445 PMCID: PMC7098173 DOI: 10.1038/s41592-019-0690-6] [Citation(s) in RCA: 326] [Impact Index Per Article: 81.5] [Received: 06/04/2019] [Accepted: 11/22/2019] [Indexed: 01/10/2023]
Abstract
We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the area under the precision-recall curve and early precision of the algorithms are moderate. The methods are better in recovering interactions in synthetic networks than Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of gene regulatory network inference algorithms.
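The two accuracy measures named above can be sketched for a ranked edge list as follows. This is an illustrative reading, not BEELINE's implementation: it assumes early precision means the fraction of true edges among the top k predictions, with k equal to the number of edges in the reference network, and it uses average precision as a step-wise summary of the precision-recall curve. The function names and the three-gene toy network are hypothetical.

```python
def early_precision(ranked_edges, true_edges):
    """Fraction of true edges among the top-k predictions, k = |true_edges|."""
    k = len(true_edges)
    top_k = ranked_edges[:k]
    return sum(edge in true_edges for edge in top_k) / k

def average_precision(ranked_edges, true_edges):
    """Mean of precision values at each rank where a true edge is recovered.

    True edges that never appear in the ranking contribute zero.
    """
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank
    return total / len(true_edges)

# Toy reference network and a predicted ranking (most confident first).
truth = {("A", "B"), ("B", "C"), ("C", "A")}
ranking = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B"), ("C", "A")]

print(early_precision(ranking, truth))
print(average_precision(ranking, truth))
```

Because a regulatory network is sparse relative to all possible gene pairs, precision-based measures like these are more informative than ROC-based ones, whose scores are dominated by the many true negatives.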
Affiliation(s)
- Aditya Pratapa
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
| | - Amogh P Jalihal
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
| | - Jeffrey N Law
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
| | - Aditya Bharadwaj
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
| | - T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA.
|
243
|
Gallivan CP, Ren H, Read EL. Analysis of Single-Cell Gene Pair Coexpression Landscapes by Stochastic Kinetic Modeling Reveals Gene-Pair Interactions in Development. Front Genet 2020; 10:1387. [PMID: 32082359 PMCID: PMC7005996 DOI: 10.3389/fgene.2019.01387] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/16/2019] [Accepted: 12/18/2019] [Indexed: 12/04/2022] Open
Abstract
Single-cell transcriptomics is advancing discovery of the molecular determinants of cell identity, while spurring development of novel data analysis methods. Stochastic mathematical models of gene regulatory networks help unravel the dynamic, molecular mechanisms underlying cell-to-cell heterogeneity, and can thus aid interpretation of heterogeneous cell-states revealed by single-cell measurements. However, integrating stochastic gene network models with single cell data is challenging. Here, we present a method for analyzing single-cell gene-pair coexpression patterns, based on biophysical models of stochastic gene expression and interaction dynamics. We first developed a high-computational-throughput approach to stochastic modeling of gene-pair coexpression landscapes, based on numerical solution of gene network Master Equations. We then comprehensively catalogued coexpression patterns arising from tens of thousands of gene-gene interaction models with different biochemical kinetic parameters and regulatory interactions. From the computed landscapes, we obtain a low-dimensional "shape-space" describing distinct types of coexpression patterns. We applied the theoretical results to analysis of published single cell RNA sequencing data and uncovered complex dynamics of coexpression among gene pairs during embryonic development. Our approach provides a generalizable framework for inferring evolution of gene-gene interactions during critical cell-state transitions.
Affiliation(s)
- Cameron P. Gallivan
- Department of Chemical & Biomolecular Engineering, University of California, Irvine, CA, United States
| | - Honglei Ren
- NSF-Simons Center for Multiscale Cell Fate, University of California, Irvine, CA, United States
- Mathematical and Computational Systems Biology Graduate Program, University of California, Irvine, CA, United States
| | - Elizabeth L. Read
- Department of Chemical & Biomolecular Engineering, University of California, Irvine, CA, United States
- NSF-Simons Center for Multiscale Cell Fate, University of California, Irvine, CA, United States
|
244
|
Jackson CA, Castro DM, Saldi GA, Bonneau R, Gresham D. Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments. eLife 2020; 9:e51254. [PMID: 31985403 PMCID: PMC7004572 DOI: 10.7554/elife.51254] [Citation(s) in RCA: 92] [Impact Index Per Article: 23.0] [Received: 08/21/2019] [Accepted: 01/10/2020] [Indexed: 11/13/2022] Open
Abstract
Understanding how gene expression programs are controlled requires identifying regulatory relationships between transcription factors and target genes. Gene regulatory networks are typically constructed from gene expression data acquired following genetic perturbation or environmental stimulus. Single-cell RNA sequencing (scRNAseq) captures the gene expression state of thousands of individual cells in a single experiment, offering advantages in combinatorial experimental design, large numbers of independent measurements, and accessing the interaction between the cell cycle and environmental responses that is hidden by population-level analysis of gene expression. To leverage these advantages, we developed a method for scRNAseq in budding yeast (Saccharomyces cerevisiae). We pooled diverse transcriptionally barcoded gene deletion mutants in 11 different environmental conditions and determined their expression state by sequencing 38,285 individual cells. We benchmarked a framework for learning gene regulatory networks from scRNAseq data that incorporates multitask learning and constructed a global gene regulatory network comprising 12,228 interactions.
Affiliation(s)
- Christopher A Jackson
- Center for Genomics and Systems Biology, New York University, New York, United States
- Department of Biology, New York University, New York, United States
| | | | | | - Richard Bonneau
- Center for Genomics and Systems Biology, New York University, New York, United States
- Department of Biology, New York University, New York, United States
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, United States
- Center for Data Science, New York University, New York, United States
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, United States
| | - David Gresham
- Center for Genomics and Systems Biology, New York University, New York, United States
- Department of Biology, New York University, New York, United States
|
245
|
Sanchez-Taltavull D, Perkins TJ, Dommann N, Melin N, Keogh A, Candinas D, Stroka D, Beldi G. Bayesian correlation is a robust gene similarity measure for single-cell RNA-seq data. NAR Genom Bioinform 2020; 2:lqaa002. [PMID: 33575552 PMCID: PMC7671344 DOI: 10.1093/nargab/lqaa002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/30/2019] [Revised: 11/30/2019] [Accepted: 01/09/2020] [Indexed: 02/07/2023] Open
Abstract
Assessing similarity is highly important for bioinformatics algorithms that determine correlations between biological entities. A common problem is that similarity can appear by chance, particularly for lowly expressed entities. This is especially relevant in single-cell RNA-seq (scRNA-seq) data because read counts are much lower than in bulk RNA-seq. Recently, a Bayesian correlation scheme that assigns low similarity to genes that have low-confidence expression estimates has been proposed to assess similarity for bulk RNA-seq. Our goal is to extend the Bayesian correlation to scRNA-seq data by considering three ways to compute similarity. First, we compute the similarity of pairs of genes over all cells. Second, we identify specific cell populations and compute the correlation in those populations. Third, we compute the similarity of pairs of genes over all clusters, by considering the total mRNA expression. We demonstrate that Bayesian correlations are more reproducible than Pearson correlations. Compared to Pearson correlations, Bayesian correlations have a smaller dependence on the number of input cells. We show that the Bayesian correlation algorithm assigns high similarity values to genes with biological relevance in a specific population. We conclude that Bayesian correlation is a robust similarity measure in scRNA-seq data.
Affiliation(s)
- Daniel Sanchez-Taltavull
- Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
| | - Theodore J Perkins
- Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, Ontario, ON K1H8L6, Canada
- Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, ON K1H8L6, Canada
| | - Noelle Dommann
- Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
| | - Nicolas Melin
- Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
| | - Adrian Keogh
- Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
| | - Daniel Candinas
- Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
| | - Deborah Stroka
- Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
| | - Guido Beldi
- Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland
|
246
|
He S, Tian Y, Feng S, Wu Y, Shen X, Chen K, He Y, Sun Q, Li X, Xu J, Wen Z, Qu JY. In vivo single-cell lineage tracing in zebrafish using high-resolution infrared laser-mediated gene induction microscopy. eLife 2020; 9:e52024. [PMID: 31904340 PMCID: PMC7018510 DOI: 10.7554/elife.52024] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/20/2019] [Accepted: 01/04/2020] [Indexed: 12/15/2022] Open
Abstract
Heterogeneity exists broadly in various cell types, both during development and at homeostasis. Investigating heterogeneity is crucial for comprehensively understanding the complexity of ontogeny, dynamics, and function of specific cell types. Traditional bulk-labeling techniques cannot dissect heterogeneity within a cell population, while the single-cell lineage tracing methodologies invented in the last decade can hardly achieve high-fidelity single-cell labeling and long-term in vivo observation simultaneously. In this work, we developed a high-precision infrared laser-evoked gene operator heat-shock system, which uses laser-induced CreERT2 combined with a loxP-DsRedx-loxP-GFP reporter to achieve precise single-cell labeling and tracing. An in vivo study indicated that this system can precisely label single cells in the brain, muscle and hematopoietic system of the zebrafish embryo. Using this system, we traced the hematopoietic potential of hemogenic endothelium (HE) in the posterior blood island (PBI) of the zebrafish embryo and found that HEs in the PBI are heterogeneous, comprising at least myeloid unipotent and myeloid-lymphoid bipotent subtypes.
Affiliation(s)
- Sicong He
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
| | - Ye Tian
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Division of Life Science, The Hong Kong University of Science and Technology, Kowloon, China
| | - Shachuan Feng
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Division of Life Science, The Hong Kong University of Science and Technology, Kowloon, China
| | - Yi Wu
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Division of Life Science, The Hong Kong University of Science and Technology, Kowloon, China
| | - Xinwei Shen
- Department of Mathematics, The Hong Kong University of Science and Technology, Kowloon, China
| | - Kani Chen
- Department of Mathematics, The Hong Kong University of Science and Technology, Kowloon, China
| | - Yingzhu He
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
| | - Qiqi Sun
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
| | - Xuesong Li
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
| | - Jin Xu
- Division of Cell, Developmental and Integrative Biology, School of Medicine, South China University of Technology, Guangzhou, China
| | - Zilong Wen
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Division of Life Science, The Hong Kong University of Science and Technology, Kowloon, China
| | - Jianan Y Qu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
|
247
|
Capdevila C, Calderon RI, Bush EC, Sheldon-Collins K, Sims PA, Yan KS. Single-Cell Transcriptional Profiling of the Intestinal Epithelium. Methods Mol Biol 2020; 2171:129-153. [PMID: 32705639 DOI: 10.1007/978-1-0716-0747-3_8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
Abstract
Emerging single-cell technologies, like single-cell RNA sequencing (scRNA-seq), enable the study of heterogeneous biological systems at cellular resolution. By profiling the set of expressed transcripts in each cell, single-cell transcriptomics has allowed for the cataloging of the cellular constituents of multiple organs and tissues, both in health and disease. In addition, these technologies have provided mechanistic insights into cellular function, cell state transitions, developmental trajectories and lineage relationships, as well as helped to dissect complex, population-level responses to environmental perturbations. scRNA-seq is particularly useful for characterizing the intestinal epithelium because it is a dynamic, rapidly self-renewing tissue comprised of more than a dozen specialized cell types. Here we discuss the fundamentals of single-cell transcriptomics of the murine small intestinal epithelium. We review the principles of proper experimental design and provide methods for the dissociation of the small intestinal epithelium into single cells followed by fluorescence-activated cell sorting (FACS) and for scRNA-seq using the 10× Genomics Chromium platform.
Affiliation(s)
- Claudia Capdevila
- Columbia Center for Human Development, Columbia Stem Cell Initiative, Division of Digestive & Liver Diseases, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Genetics & Development, Columbia University Irving Medical Center, New York, NY, USA
| | - Ruben I Calderon
- Columbia Center for Human Development, Columbia Stem Cell Initiative, Division of Digestive & Liver Diseases, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Genetics & Development, Columbia University Irving Medical Center, New York, NY, USA
| | - Erin C Bush
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biochemistry & Molecular Biophysics, Columbia University Irving Medical Center, New York, NY, USA
| | - Kismet Sheldon-Collins
- Columbia Center for Human Development, Columbia Stem Cell Initiative, Division of Digestive & Liver Diseases, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Genetics & Development, Columbia University Irving Medical Center, New York, NY, USA
| | - Peter A Sims
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biochemistry & Molecular Biophysics, Columbia University Irving Medical Center, New York, NY, USA
| | - Kelley S Yan
- Columbia Center for Human Development, Columbia Stem Cell Initiative, Division of Digestive & Liver Diseases, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA.
- Department of Genetics & Development, Columbia University Irving Medical Center, New York, NY, USA.
|
248
|
Chen X, Li M, Zheng R, Wu FX, Wang J. D3GRN: a data driven dynamic network construction method to infer gene regulatory networks. BMC Genomics 2019; 20:929. [PMID: 31881937 PMCID: PMC6933629 DOI: 10.1186/s12864-019-6298-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Inferring gene regulatory networks (GRNs) from gene-expression data is still a fundamental and challenging problem in systems biology. Several existing algorithms formulate GRN inference as a regression problem and obtain the network with an ensemble strategy. Recent studies on data driven dynamic network construction offer a new perspective for solving the regression problem. RESULTS: In this study, we propose a data driven dynamic network construction method to infer gene regulatory networks (D3GRN), which transforms the regulatory relationship of each target gene into a functional decomposition problem and solves each subproblem by using the Algorithm for Revealing Network Interactions (ARNI). To remedy the limitation of ARNI in constructing networks solely from the unit level, a bootstrapping and area based scoring method is used to infer the final network. On DREAM4 and DREAM5 benchmark datasets, D3GRN performs competitively with the state-of-the-art algorithms in terms of AUPR. CONCLUSIONS: We have proposed a novel data driven dynamic network construction method by combining ARNI with a bootstrapping and area based scoring strategy. The proposed method performs well on the benchmark datasets and offers a competitive approach to inferring gene regulatory networks from a new perspective.
Affiliation(s)
- Xiang Chen
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China.
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, China
|
249
|
Zhang J, Nie Q, Zhou T. Revealing Dynamic Mechanisms of Cell Fate Decisions From Single-Cell Transcriptomic Data. Front Genet 2019; 10:1280. [PMID: 31921315 PMCID: PMC6935941 DOI: 10.3389/fgene.2019.01280] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/18/2019] [Accepted: 11/21/2019] [Indexed: 02/05/2023] Open
Abstract
Cell fate decisions play a pivotal role in development, but technologies for dissecting them are limited. We developed a new multifunctional method, Topographer, to construct a "quantitative" Waddington's landscape from single-cell transcriptomic data. This method is able to identify complex cell-state transition trajectories and to estimate complex cell-type dynamics characterized by fate and transition probabilities. It also infers both marker gene networks and their dynamic changes as well as dynamic characteristics of transcriptional bursting along the cell-state transition trajectories. Applying this method to single-cell RNA-seq data on the differentiation of primary human myoblasts, we not only identified three known cell types, but also estimated both their fate probabilities and the transition probabilities among them. We found that the percentage of genes expressed in a bursty manner is significantly higher at (or near) the branch point (~97%) than before or after the branch (below 80%), and that both gene-gene and cell-cell correlation degrees are apparently lower near the branch point than away from it. Topographer reveals cell fate mechanisms in a coherent way at three scales: cell lineage (macroscopic), gene network (mesoscopic), and gene expression (microscopic).
Affiliation(s)
- Jiajun Zhang
- School of Mathematics, Sun Yat-Sen University, Guangzhou, China
- Guangdong Province Key Laboratory of Computational Science and School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, China
| | - Qing Nie
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, United States
- Department of Mathematics, University of California, Irvine, Irvine, CA, United States
| | - Tianshou Zhou
- School of Mathematics, Sun Yat-Sen University, Guangzhou, China
- Guangdong Province Key Laboratory of Computational Science and School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, China
|
250
|
Deep learning for inferring gene relationships from single-cell expression data. Proc Natl Acad Sci U S A 2019; 116:27151-27158. [PMID: 31822622 DOI: 10.1073/pnas.1911536116] [Citation(s) in RCA: 106] [Impact Index Per Article: 21.2] [Indexed: 12/22/2022] Open
Abstract
Several methods have been developed to mine gene-gene relationships from expression data. Examples include correlation and mutual information methods for coexpression analysis, clustering and undirected graphical models for functional assignments, and directed graphical models for pathway reconstruction. Using an encoding for gene expression data, followed by deep neural network analysis, we present a framework that can successfully address all of these diverse tasks. We show that our method, convolutional neural network for coexpression (CNNC), improves upon prior methods in tasks ranging from predicting transcription factor targets to identifying disease-related genes to causality inference. CNNC's encoding provides insights about some of the decisions it makes and their biological basis. CNNC is flexible and can easily be extended to integrate additional types of genomics data, leading to further improvements in its performance.
|