1
Qi G, Battle A. Computational methods for allele-specific expression in single cells. Trends Genet 2024:S0168-9525(24)00169-0. [PMID: 39127549 DOI: 10.1016/j.tig.2024.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 08/12/2024]
Abstract
Allele-specific expression (ASE) is a powerful signal that can be used to investigate multiple molecular mechanisms, such as cis-regulatory effects and imprinting. Single-cell RNA-sequencing (scRNA-seq) enables ASE characterization at the resolution of individual cells. In this review, we highlight the computational methods for processing and analyzing single-cell ASE data. We first describe a bioinformatics pipeline, synthesized from previous literature, to obtain ASE counts from raw reads. We then discuss statistical methods for detecting allelic imbalance and its variability across conditions using scRNA-seq data. In addition, we describe other methods that use single-cell ASE to address specific biological questions. Finally, we discuss future directions and emphasize the need for an integrated, optimized bioinformatics pipeline, and further development of statistical methods for different technologies.
Affiliation(s)
- Guanghao Qi
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA.
2
Zou LS, Cable DM, Barrera-Lopez IA, Zhao T, Murray E, Aryee MJ, Chen F, Irizarry RA. Detection of allele-specific expression in spatial transcriptomics with spASE. Genome Biol 2024; 25:180. [PMID: 38978101 PMCID: PMC11229351 DOI: 10.1186/s13059-024-03317-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 06/20/2024] [Indexed: 07/10/2024] Open
Abstract
Spatial transcriptomics technologies permit the study of the spatial distribution of RNA at near-single-cell resolution genome-wide. However, the feasibility of studying spatial allele-specific expression (ASE) from these data remains uncharacterized. Here, we introduce spASE, a computational framework for detecting and estimating spatial ASE. To tackle the challenges presented by cell type mixtures and a low signal to noise ratio, we implement a hierarchical model involving additive mixtures of spatial smoothing splines. We apply our method to allele-resolved Visium and Slide-seq from the mouse cerebellum and hippocampus and report new insight into the landscape of spatial and cell type-specific ASE therein.
Affiliation(s)
- Luli S Zou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Dylan M Cable
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, 02139, USA
- Tongtong Zhao
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Evan Murray
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Martin J Aryee
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Fei Chen
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Rafael A Irizarry
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
3
Emani PS, Liu JJ, Clarke D, Jensen M, Warrell J, Gupta C, Meng R, Lee CY, Xu S, Dursun C, Lou S, Chen Y, Chu Z, Galeev T, Hwang A, Li Y, Ni P, Zhou X, Bakken TE, Bendl J, Bicks L, Chatterjee T, Cheng L, Cheng Y, Dai Y, Duan Z, Flaherty M, Fullard JF, Gancz M, Garrido-Martín D, Gaynor-Gillett S, Grundman J, Hawken N, Henry E, Hoffman GE, Huang A, Jiang Y, Jin T, Jorstad NL, Kawaguchi R, Khullar S, Liu J, Liu J, Liu S, Ma S, Margolis M, Mazariegos S, Moore J, Moran JR, Nguyen E, Phalke N, Pjanic M, Pratt H, Quintero D, Rajagopalan AS, Riesenmy TR, Shedd N, Shi M, Spector M, Terwilliger R, Travaglini KJ, Wamsley B, Wang G, Xia Y, Xiao S, Yang AC, Zheng S, Gandal MJ, Lee D, Lein ES, Roussos P, Sestan N, Weng Z, White KP, Won H, Girgenti MJ, Zhang J, Wang D, Geschwind D, Gerstein M. Single-cell genomics and regulatory networks for 388 human brains. Science 2024; 384:eadi5199. [PMID: 38781369 DOI: 10.1126/science.adi5199] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/06/2023] [Accepted: 04/05/2024] [Indexed: 05/25/2024]
Abstract
Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type-specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.
Affiliation(s)
- Prashant S Emani
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Jason J Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Declan Clarke
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Matthew Jensen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Jonathan Warrell
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Chirag Gupta
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Ran Meng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Che Yu Lee
- Department of Computer Science, University of California, Irvine, CA 92697, USA
- Siwei Xu
- Department of Computer Science, University of California, Irvine, CA 92697, USA
- Cagatay Dursun
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Shaoke Lou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yuhang Chen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Zhiyuan Chu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Timur Galeev
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Ahyeon Hwang
- Department of Computer Science, University of California, Irvine, CA 92697, USA
- Mathematical, Computational and Systems Biology, University of California, Irvine, CA 92697, USA
- Yunyang Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Pengyu Ni
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Xiao Zhou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Lucy Bicks
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Tanima Chatterjee
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yuyan Cheng
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Department of Ophthalmology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yi Dai
- Department of Computer Science, University of California, Irvine, CA 92697, USA
- Ziheng Duan
- Department of Computer Science, University of California, Irvine, CA 92697, USA
- John F Fullard
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Michael Gancz
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Diego Garrido-Martín
- Department of Genetics, Microbiology and Statistics, Universitat de Barcelona, Barcelona 08028, Spain
- Sophia Gaynor-Gillett
- Tempus Labs, Chicago, IL 60654, USA
- Department of Biology, Cornell College, Mount Vernon, IA 52314, USA
- Jennifer Grundman
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Natalie Hawken
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Ella Henry
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Gabriel E Hoffman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Mental Illness Research Education and Clinical Center, James J. Peters VA Medical Center, Bronx, NY 10468, USA
- Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY 10468, USA
- Ao Huang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Ting Jin
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Riki Kawaguchi
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, CA 90095, USA
- Saniya Khullar
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Jianyin Liu
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Junhao Liu
- Department of Computer Science, University of California, Irvine, CA 92697, USA
- Shuang Liu
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Shaojie Ma
- Department of Neuroscience, Yale University, New Haven, CT 06510, USA
- Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Samantha Mazariegos
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Jill Moore
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Eric Nguyen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Nishigandha Phalke
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Milos Pjanic
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Henry Pratt
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Diana Quintero
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Tiernon R Riesenmy
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Nicole Shedd
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Rosemarie Terwilliger
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06520, USA
- Brie Wamsley
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Gaoyuan Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yan Xia
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Shaohua Xiao
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Andrew C Yang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Suchen Zheng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Michael J Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Donghoon Lee
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ed S Lein
- Allen Institute for Brain Science, Seattle, WA 98109, USA
- Department of Neurological Surgery, University of Washington, Seattle, WA 98195, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
- Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Mental Illness Research Education and Clinical Center, James J. Peters VA Medical Center, Bronx, NY 10468, USA
- Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY 10468, USA
- Nenad Sestan
- Department of Neuroscience, Yale University, New Haven, CT 06510, USA
- Zhiping Weng
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Kevin P White
- Yong Loo Lin School of Medicine, National University of Singapore, 117597 Singapore
- Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Matthew J Girgenti
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06520, USA
- Wu Tsai Institute, Yale University, New Haven, CT 06520, USA
- Clinical Neuroscience Division, National Center for Posttraumatic Stress Disorder, Veterans Affairs Connecticut Healthcare System, West Haven, CT 06516, USA
- Jing Zhang
- Department of Computer Science, University of California, Irvine, CA 92697, USA
- Daifeng Wang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Daniel Geschwind
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, USA
4
Ayyamperumal P, Naik HC, Naskar AJ, Bammidi LS, Gayen S. Epigenomic states contribute to coordinated allelic transcriptional bursting in iPSC reprogramming. Life Sci Alliance 2024; 7:e202302337. [PMID: 38320809 PMCID: PMC10847334 DOI: 10.26508/lsa.202302337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/30/2024] [Accepted: 01/30/2024] [Indexed: 02/12/2024] Open
Abstract
Two alleles of a gene can be transcribed independently or coordinatedly, which can lead to temporal expression heterogeneity with potentially distinct impacts on cell fate. Here, we profiled genome-wide allelic transcriptional burst kinetics during the reprogramming of MEF to induced pluripotent stem cells. We show that the degree of coordination of allelic bursting differs among genes, and alleles of many reprogramming-related genes burst in a highly coordinated fashion. Notably, we show that the chromatin accessibility of the two alleles of highly coordinated genes is similar, unlike the semi-coordinated or independent genes, suggesting the degree of coordination of allelic bursting is linked to allelic chromatin accessibility. Consistently, we show that many transcription factors have differential binding affinity between alleles of semi-coordinated or independent genes. We show that highly coordinated genes are enriched with chromatin accessibility regulators such as H3K4me3, H3K4me1, H3K36me3, H3K27ac, histone variant H3.3, and BRD4. Finally, we demonstrate that enhancer elements are highly enriched in highly coordinated genes. Our study demonstrates that epigenomic states contribute to coordinated allelic bursting to fine-tune gene expression during induced pluripotent stem cell reprogramming.
Affiliation(s)
- Parichitran Ayyamperumal
- https://ror.org/04dese585 Chromatin, RNA and Genome (CRG) Laboratory, Department of Developmental Biology and Genetics, Indian Institute of Science, Bangalore, India
- Hemant Chandru Naik
- https://ror.org/04dese585 Chromatin, RNA and Genome (CRG) Laboratory, Department of Developmental Biology and Genetics, Indian Institute of Science, Bangalore, India
- Amlan Jyoti Naskar
- https://ror.org/04dese585 Chromatin, RNA and Genome (CRG) Laboratory, Department of Developmental Biology and Genetics, Indian Institute of Science, Bangalore, India
- Lakshmi Sowjanya Bammidi
- https://ror.org/04dese585 Chromatin, RNA and Genome (CRG) Laboratory, Department of Developmental Biology and Genetics, Indian Institute of Science, Bangalore, India
- Srimonta Gayen
- https://ror.org/04dese585 Chromatin, RNA and Genome (CRG) Laboratory, Department of Developmental Biology and Genetics, Indian Institute of Science, Bangalore, India
5
Emani PS, Liu JJ, Clarke D, Jensen M, Warrell J, Gupta C, Meng R, Lee CY, Xu S, Dursun C, Lou S, Chen Y, Chu Z, Galeev T, Hwang A, Li Y, Ni P, Zhou X, Bakken TE, Bendl J, Bicks L, Chatterjee T, Cheng L, Cheng Y, Dai Y, Duan Z, Flaherty M, Fullard JF, Gancz M, Garrido-Martín D, Gaynor-Gillett S, Grundman J, Hawken N, Henry E, Hoffman GE, Huang A, Jiang Y, Jin T, Jorstad NL, Kawaguchi R, Khullar S, Liu J, Liu J, Liu S, Ma S, Margolis M, Mazariegos S, Moore J, Moran JR, Nguyen E, Phalke N, Pjanic M, Pratt H, Quintero D, Rajagopalan AS, Riesenmy TR, Shedd N, Shi M, Spector M, Terwilliger R, Travaglini KJ, Wamsley B, Wang G, Xia Y, Xiao S, Yang AC, Zheng S, Gandal MJ, Lee D, Lein ES, Roussos P, Sestan N, Weng Z, White KP, Won H, Girgenti MJ, Zhang J, Wang D, Geschwind D, Gerstein M. Single-cell genomics and regulatory networks for 388 human brains. bioRxiv 2024:2024.03.18.585576. [PMID: 38562822 PMCID: PMC10983939 DOI: 10.1101/2024.03.18.585576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet, little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multi-omics datasets into a resource comprising >2.8M nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550K cell-type-specific regulatory elements and >1.4M single-cell expression-quantitative-trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.
Affiliation(s)
- Prashant S Emani
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Jason J Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Declan Clarke
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Matthew Jensen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Jonathan Warrell
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Chirag Gupta
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Ran Meng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Che Yu Lee
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
- Siwei Xu
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
- Cagatay Dursun
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Shaoke Lou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Yuhang Chen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Zhiyuan Chu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Timur Galeev
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Ahyeon Hwang
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
- Mathematical, Computational and Systems Biology, University of California, Irvine, CA, 92697, USA
- Yunyang Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA
- Pengyu Ni
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Xiao Zhou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Lucy Bicks
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Tanima Chatterjee
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Yuyan Cheng
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Department of Ophthalmology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yi Dai
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
- Ziheng Duan
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
- John F Fullard
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Michael Gancz
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Diego Garrido-Martín
- Department of Genetics, Microbiology and Statistics, Universitat de Barcelona, Barcelona, 08028, Spain
| | - Sophia Gaynor-Gillett
- Tempus Labs, Inc., Chicago, IL, 60654, USA
- Department of Biology, Cornell College, Mount Vernon, IA, 52314, USA
| | - Jennifer Grundman
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Natalie Hawken
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Ella Henry
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Gabriel E Hoffman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Mental Illness Research Education and Clinical Center, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
- Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
| | - Ao Huang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Ting Jin
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | | | - Riki Kawaguchi
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, CA, 90095, USA
| | - Saniya Khullar
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Jianyin Liu
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Junhao Liu
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
| | - Shuang Liu
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Shaojie Ma
- Department of Neuroscience, Yale University, New Haven, CT, 06510, USA
- Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Michael Margolis
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Samantha Mazariegos
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Jill Moore
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | | | - Eric Nguyen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Nishigandha Phalke
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | - Milos Pjanic
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Henry Pratt
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | - Diana Quintero
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | | | - Tiernon R Riesenmy
- Department of Statistics & Data Science, Yale University, New Haven, CT, 06520, USA
| | - Nicole Shedd
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | - Manman Shi
- Tempus Labs, Inc., Chicago, IL, 60654, USA
| | | | - Rosemarie Terwilliger
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06520, USA
| | | | - Brie Wamsley
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Gaoyuan Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Yan Xia
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Shaohua Xiao
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Andrew C Yang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Suchen Zheng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Michael J Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Donghoon Lee
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Ed S Lein
- Allen Institute for Brain Science, Seattle, WA, 98109, USA
- Department of Neurological Surgery, University of Washington, Seattle, WA, 98195, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195, USA
| | - Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Mental Illness Research Education and Clinical Center, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
- Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
| | - Nenad Sestan
- Department of Neuroscience, Yale University, New Haven, CT, 06510, USA
| | - Zhiping Weng
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | - Kevin P White
- Yong Loo Lin School of Medicine, National University of Singapore, 117597, Singapore
| | - Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Matthew J Girgenti
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06520, USA
- Wu Tsai Institute, Yale University, New Haven, CT, 06520, USA
- Clinical Neuroscience Division, National Center for Posttraumatic Stress Disorder, Veterans Affairs Connecticut Healthcare System, West Haven, CT, 06516, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
| | - Daifeng Wang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Daniel Geschwind
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA
- Department of Statistics & Data Science, Yale University, New Haven, CT, 06520, USA
- Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT, 06520, USA
|
6
|
Weideman AMK, Wang R, Ibrahim JG, Jiang Y. Canopy2: tumor phylogeny inference by bulk DNA and single-cell RNA sequencing. bioRxiv 2024:2024.03.18.585595. [PMID: 38562795 PMCID: PMC10983938 DOI: 10.1101/2024.03.18.585595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Tumors comprise a mixture of distinct cell populations that differ in terms of genetic makeup and function. Such heterogeneity plays a role in the development of drug resistance and the ineffectiveness of targeted cancer therapies. Insight into this complexity can be obtained through the construction of a phylogenetic tree, which illustrates the evolutionary lineage of tumor cells as they acquire mutations over time. We propose Canopy2, a Bayesian framework that uses single nucleotide variants derived from bulk DNA and single-cell RNA sequencing to infer tumor phylogeny and conduct mutational profiling of tumor subpopulations. Canopy2 uses Markov chain Monte Carlo methods to sample from a joint probability distribution involving a mixture of binomial and beta-binomial distributions, specifically chosen to account for the sparsity and stochasticity of the single-cell data. Canopy2 demystifies the sources of zeros in the single-cell data and separates zeros categorized as non-cancerous (cells without mutations), stochastic (mutations not expressed due to bursting), and technical (expressed mutations not picked up by sequencing). Simulations demonstrate that Canopy2 consistently outperforms competing methods and reconstructs the clonal tree with high fidelity, even in situations involving low sequencing depth, poor single-cell yield, and highly advanced and polyclonal tumors. We further assess the performance of Canopy2 through application to breast cancer and glioblastoma data, benchmarking against existing methods. Canopy2 is an open-source R package available at https://github.com/annweideman/canopy2.
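To illustrate why a beta-binomial component can help with sparse, bursty single-cell read counts, the following minimal sketch compares the probability of observing zero mutant reads under a binomial and under a beta-binomial with the same mean. It is not the Canopy2 implementation; the function names and the parameter values (`n = 10`, `p = 0.5`, `a = b = 0.5`) are hypothetical and chosen only for illustration.

```python
import math


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(x, y):
    """Log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_binom_pmf(k, n, p):
    """Log-probability of k mutant reads out of n under Binomial(n, p), 0 < p < 1."""
    return log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1 - p)


def log_betabinom_pmf(k, n, a, b):
    """Log-probability of k mutant reads out of n under BetaBinomial(n, a, b)."""
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)


if __name__ == "__main__":
    n = 10  # hypothetical read depth at one variant site in one cell
    p_zero_binom = math.exp(log_binom_pmf(0, n, 0.5))
    p_zero_bb = math.exp(log_betabinom_pmf(0, n, 0.5, 0.5))
    # Both distributions have mean n/2, but the beta-binomial puts far more
    # mass on zero, which is one way to model excess zeros from bursting.
    print(f"P(0 reads), binomial:      {p_zero_binom:.4f}")
    print(f"P(0 reads), beta-binomial: {p_zero_bb:.4f}")
```

With small shape parameters the beta-binomial becomes U-shaped, so zero mutant reads in an individual cell is much more probable than under a binomial with the same mean.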
Affiliation(s)
- Ann Marie K. Weideman
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Rujin Wang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Joseph G. Ibrahim
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Yuchao Jiang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
7
|
Conte MI, Fuentes-Trillo A, Domínguez Conde C. Opportunities and tradeoffs in single-cell transcriptomic technologies. Trends Genet 2024; 40:83-93. [PMID: 37953195 DOI: 10.1016/j.tig.2023.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 09/26/2023] [Accepted: 10/03/2023] [Indexed: 11/14/2023]
Abstract
Recent technological and algorithmic advances enable single-cell transcriptomic analysis with remarkable depth and breadth. Nonetheless, a persistent challenge is the compromise between the ability to profile high numbers of cells and the achievement of full-length transcript coverage. Currently, the field is progressing and developing new and creative solutions that improve cellular throughput, gene detection sensitivity and full-length transcript capture. Furthermore, long-read sequencing approaches for single-cell transcripts are breaking frontiers that have previously blocked full transcriptome characterization. We here present a comprehensive overview of available options for single-cell transcriptome profiling, highlighting the key advantages and disadvantages of each approach.
Affiliation(s)
- Matilde I Conte
- Human Technopole, Viale Rita Levi-Montalcini 1, 20157 Milan, Italy
|
8
|
Chen L, Chang D, Tandukar B, Deivendran D, Pozniak J, Cruz-Pacheco N, Cho RJ, Cheng J, Yeh I, Marine C, Bastian BC, Ji AL, Shain AH. STmut: a framework for visualizing somatic alterations in spatial transcriptomics data of cancer. Genome Biol 2023; 24:273. [PMID: 38037084 PMCID: PMC10688493 DOI: 10.1186/s13059-023-03121-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Accepted: 11/22/2023] [Indexed: 12/02/2023] Open
Abstract
Spatial transcriptomic technologies, such as the Visium platform, measure gene expression in different regions of tissues. Here, we describe new software, STmut, to visualize somatic point mutations, allelic imbalance, and copy number alterations in Visium data. STmut is tested on fresh-frozen Visium data, formalin-fixed paraffin-embedded (FFPE) Visium data, and tumors with and without matching DNA sequencing data. Copy number is inferred on all conditions, but the chemistry of the FFPE platform does not permit analyses of single nucleotide variants. Taken together, we propose solutions to add the genetic dimension to spatial transcriptomic data and describe the limitations of different datatypes.
Affiliation(s)
- Limin Chen
- Department of Dermatology, University of California, San Francisco, San Francisco, USA
| | - Darwin Chang
- Department of Immunology, H. Lee Moffitt Cancer Center, Tampa, USA
| | - Bishal Tandukar
- Department of Dermatology, University of California, San Francisco, San Francisco, USA
| | - Delahny Deivendran
- Department of Dermatology, University of California, San Francisco, San Francisco, USA
| | - Joanna Pozniak
- Laboratory for Molecular Cancer Biology, Center for Cancer Biology, VIB, Louvain, Belgium
- Laboratory for Molecular Cancer Biology, Department of Oncology, KU Leuven, Louvain, Belgium
| | - Noel Cruz-Pacheco
- Department of Dermatology, University of California, San Francisco, San Francisco, USA
| | - Raymond J Cho
- Department of Dermatology, University of California, San Francisco, San Francisco, USA
| | - Jeffrey Cheng
- Department of Dermatology, University of California, San Francisco, San Francisco, USA
| | - Iwei Yeh
- Department of Dermatology, University of California, San Francisco, San Francisco, USA
- Department of Pathology, University of California, San Francisco, San Francisco, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, USA
| | - Chris Marine
- Laboratory for Molecular Cancer Biology, Center for Cancer Biology, VIB, Louvain, Belgium
- Laboratory for Molecular Cancer Biology, Department of Oncology, KU Leuven, Louvain, Belgium
| | - Boris C Bastian
- Department of Dermatology, University of California, San Francisco, San Francisco, USA
- Department of Pathology, University of California, San Francisco, San Francisco, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, USA
| | - Andrew L Ji
- Department of Dermatology, Department of Oncological Sciences, Black Family Stem Cell Institute, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York City, USA
| | - A Hunter Shain
- Department of Dermatology, University of California, San Francisco, San Francisco, USA.
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, USA.
|
9
|
Michoel T, Zhang JD. Causal inference in drug discovery and development. Drug Discov Today 2023; 28:103737. [PMID: 37591410 DOI: 10.1016/j.drudis.2023.103737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Revised: 07/31/2023] [Accepted: 08/10/2023] [Indexed: 08/19/2023]
Abstract
To discover new drugs is to seek and to prove causality. As an emerging approach leveraging human knowledge and creativity, data, and machine intelligence, causal inference holds the promise of reducing cognitive bias and improving decision-making in drug discovery. Although it has been applied across the value chain, the concepts and practice of causal inference remain obscure to many practitioners. This article offers a nontechnical introduction to causal inference, reviews its recent applications, and discusses opportunities and challenges of adopting the causal language in drug discovery and development.
Affiliation(s)
- Tom Michoel
- Computational Biology Unit, Department of Informatics, University of Bergen, Postboks 7803, 5020 Bergen, Norway
| | - Jitao David Zhang
- Pharma Early Research and Development, Roche Innovation Centre Basel, F. Hoffmann-La Roche, Grenzacherstrasse 124, 4070 Basel, Switzerland; Department of Mathematics and Computer Science, University of Basel, Spiegelgasse 1, 4051 Basel, Switzerland.
|
10
|
Cuomo ASE, Nathan A, Raychaudhuri S, MacArthur DG, Powell JE. Single-cell genomics meets human genetics. Nat Rev Genet 2023; 24:535-549. [PMID: 37085594 PMCID: PMC10784789 DOI: 10.1038/s41576-023-00599-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Accepted: 03/29/2023] [Indexed: 04/23/2023]
Abstract
Single-cell genomic technologies are revealing the cellular composition, identities and states in tissues at unprecedented resolution. They have now scaled to the point that it is possible to query samples at the population level, across thousands of individuals. Combining single-cell information with genotype data at this scale provides opportunities to link genetic variation to the cellular processes underpinning key aspects of human biology and disease. This strategy has potential implications for disease diagnosis, risk prediction and development of therapeutic solutions. But, effectively integrating large-scale single-cell genomic data, genetic variation and additional phenotypic data will require advances in data generation and analysis methods. As single-cell genetics begins to emerge as a field in its own right, we review its current state and the challenges and opportunities ahead.
Affiliation(s)
- Anna S E Cuomo
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
| | - Aparna Nathan
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Soumya Raychaudhuri
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Joseph E Powell
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, New South Wales, Australia.
|
11
|
Tang W, Jørgensen ACS, Marguerat S, Thomas P, Shahrezaei V. Modelling capture efficiency of single-cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics. Bioinformatics 2023; 39:btad395. [PMID: 37354494 PMCID: PMC10318389 DOI: 10.1093/bioinformatics/btad395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 05/18/2023] [Accepted: 06/22/2023] [Indexed: 06/26/2023] Open
Abstract
MOTIVATION: Gene expression is characterized by stochastic bursts of transcription that occur at brief and random periods of promoter activity. The kinetics of gene expression burstiness differs across the genome and is dependent on the promoter sequence, among other factors. Single-cell RNA sequencing (scRNA-seq) has made it possible to quantify the cell-to-cell variability in transcription at a global genome-wide level. However, scRNA-seq data are prone to technical variability, including low and variable capture efficiency of transcripts from individual cells. RESULTS: Here, we propose a novel mathematical theory for the observed variability in scRNA-seq data. Our method captures burst kinetics and variability in both the cell size and capture efficiency, which allows us to propose several likelihood-based and simulation-based methods for the inference of burst kinetics from scRNA-seq data. Using both synthetic and real data, we show that the simulation-based methods provide an accurate, robust and flexible tool for inferring burst kinetics from scRNA-seq data. In particular, in a supervised manner, a simulation-based inference method based on neural networks proves to be accurate and useful when applied to both allele and nonallele-specific scRNA-seq data. AVAILABILITY AND IMPLEMENTATION: The code for Neural Network and Approximate Bayesian Computation inference is available at https://github.com/WT215/nnRNA and https://github.com/WT215/Julia_ABC, respectively.
Affiliation(s)
- Wenhao Tang
- Department of Mathematics, Imperial College London, London SW7 2BX, United Kingdom
| | - Andreas Christ Sølvsten Jørgensen
- Department of Mathematics, Imperial College London, London SW7 2BX, United Kingdom
- I-X Centre for AI in Science, Imperial College London, White City Campus, London W12 0BZ, United Kingdom
| | - Samuel Marguerat
- MRC London Institute of Medical Sciences (LMS), London W12 0NN, United Kingdom
- Institute of Clinical Sciences (ICS), Faculty of Medicine, Imperial College London, London W12 0NN, United Kingdom
| | - Philipp Thomas
- Department of Mathematics, Imperial College London, London SW7 2BX, United Kingdom
| | - Vahid Shahrezaei
- Department of Mathematics, Imperial College London, London SW7 2BX, United Kingdom
|
12
|
Dong M, He Y, Jiang Y, Zou F. Joint gene network construction by single-cell RNA sequencing data. Biometrics 2023; 79:915-925. [PMID: 35184277 PMCID: PMC10548400 DOI: 10.1111/biom.13645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2021] [Revised: 11/30/2021] [Accepted: 02/07/2022] [Indexed: 11/26/2022]
Abstract
In contrast to differential gene expression analysis at the single-gene level, gene regulatory network (GRN) analysis depicts complex transcriptomic interactions among genes for a better understanding of the underlying genetic architectures of human diseases and traits. Recent advances in single-cell RNA sequencing (scRNA-seq) allow constructing GRNs at a much finer resolution than bulk RNA-seq and microarray data. However, scRNA-seq data are inherently sparse, which hinders the direct application of the popular Gaussian graphical models (GGMs). Furthermore, most existing approaches for constructing GRNs with scRNA-seq data only consider gene networks under one condition. To better understand GRNs across different but related conditions at single-cell resolution, we propose to construct Joint Gene Networks with scRNA-seq data (JGNsc) under the GGMs framework. To facilitate the use of GGMs, JGNsc first proposes a hybrid imputation procedure that combines a Bayesian zero-inflated Poisson model with an iterative low-rank matrix completion step to efficiently impute zero-inflated counts resulting from technical artifacts. JGNsc then transforms the imputed data via a nonparanormal transformation, based on which joint GGMs are constructed. We demonstrate JGNsc and assess its performance using synthetic data. Applying JGNsc to two cancer clinical studies of medulloblastoma and glioblastoma yields novel insights in addition to confirming well-known biological results.
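The nonparanormal step mentioned above can be pictured with a simplified rank-based sketch: each gene's values are replaced by standard normal quantiles of their scaled ranks, so that a Gaussian graphical model can be fit to the transformed data. This is an illustrative assumption, not the JGNsc code; the helper names `average_ranks` and `normal_scores` are hypothetical, and the sketch omits the truncation of the empirical distribution function that published nonparanormal estimators typically apply.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def normal_scores(values):
    """Map values to standard normal quantiles of rank / (n + 1)."""
    n = len(values)
    std_normal = NormalDist()
    # rank / (n + 1) lies strictly between 0 and 1, so inv_cdf is defined.
    return [std_normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]


if __name__ == "__main__":
    imputed_expression = [0.0, 0.0, 1.2, 3.5, 40.0]  # hypothetical values for one gene
    print(normal_scores(imputed_expression))
```

Because the transformation depends only on ranks, it is monotone and insensitive to extreme counts such as the 40.0 in the toy input.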
Affiliation(s)
- Meichen Dong
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Yiping He
- Department of Pathology, School of Medicine, Duke University, Durham, North Carolina, USA
| | - Yuchao Jiang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Fei Zou
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
|
13
|
Krueger K, Lamenza F, Gu H, El-Hodiri H, Wester J, Oberdick J, Fischer AJ, Oghumu S. Sex differences in susceptibility to substance use disorder: Role for X chromosome inactivation and escape? Mol Cell Neurosci 2023; 125:103859. [PMID: 37207894 PMCID: PMC10286730 DOI: 10.1016/j.mcn.2023.103859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Revised: 05/01/2023] [Accepted: 05/08/2023] [Indexed: 05/21/2023] Open
Abstract
There is a sex-based disparity associated with substance use disorders (SUDs) as demonstrated by clinical and preclinical studies. Females are known to escalate from initial drug use to compulsive drug-taking behavior (telescoping) more rapidly, and experience greater negative withdrawal effects than males. Although these biological differences have largely been attributed to sex hormones, there is evidence for non-hormonal factors, such as the influence of the sex chromosome, which underlie sex disparities in addiction behavior. However, genetic and epigenetic mechanisms underlying sex chromosome influences on substance abuse behavior are not completely understood. In this review, we discuss the role that escape from X-chromosome inactivation (XCI) in females plays in sex-associated differences in addiction behavior. Females have two X chromosomes (XX), and during XCI, one X chromosome is randomly chosen to be transcriptionally silenced. However, some X-linked genes escape XCI and display biallelic gene expression. We generated a mouse model using an X-linked gene-specific bicistronic dual reporter mouse as a tool to visualize allelic usage and measure XCI escape in a cell-specific manner. Our results revealed a previously undiscovered X-linked gene XCI escaper (CXCR3), which is variable and cell-type dependent. This illustrates the highly complex and context-dependent nature of XCI escape, which is largely understudied in the context of SUD. Novel approaches such as single-cell RNA sequencing will provide a global molecular landscape and impact of XCI escape in addiction and facilitate our understanding of the contribution of XCI escape to sex disparities in SUD.
Affiliation(s)
- Kate Krueger
- Department of Pharmacy, The Ohio State University, Columbus, OH, USA
| | - Felipe Lamenza
- Department of Pathology, The Ohio State University Wexner Medical Center, Columbus, OH, USA; Department of Microbiology, The Ohio State University, Columbus, OH, USA
| | - Howard Gu
- Department of Biological Chemistry and Pharmacology, The Ohio State University, Columbus, OH, USA
| | - Heithem El-Hodiri
- Department of Neuroscience, The Ohio State University, Columbus, OH, USA
| | - Jason Wester
- Department of Neuroscience, The Ohio State University, Columbus, OH, USA
| | - John Oberdick
- Department of Neuroscience, The Ohio State University, Columbus, OH, USA
| | - Andy J Fischer
- Department of Neuroscience, The Ohio State University, Columbus, OH, USA
| | - Steve Oghumu
- Department of Pathology, The Ohio State University Wexner Medical Center, Columbus, OH, USA.
| |
Collapse
|
14
|
Solomon BD, Zheng H, Dillon LW, Goldman J, Hourigan CS, Heath J, Khatri P. Prediction of HLA genotypes from single-cell transcriptome data. Front Immunol 2023; 14:1146826. [PMID: 37180102 PMCID: PMC10167300 DOI: 10.3389/fimmu.2023.1146826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 04/04/2023] [Indexed: 05/15/2023] Open
Abstract
The human leukocyte antigen (HLA) locus plays a central role in adaptive immune function and has significant clinical implications for tissue transplant compatibility and allelic disease associations. Studies using bulk-cell RNA sequencing have demonstrated that HLA transcription may be regulated in an allele-specific manner and single-cell RNA sequencing (scRNA-seq) has the potential to better characterize these expression patterns. However, quantification of allele-specific expression (ASE) for HLA loci requires sample-specific reference genotyping due to extensive polymorphism. While genotype prediction from bulk RNA sequencing is well described, the feasibility of predicting HLA genotypes directly from single-cell data is unknown. Here we evaluate and expand upon several computational HLA genotyping tools by comparing predictions from human single-cell data to gold-standard molecular genotyping. The highest 2-field accuracy averaged across all loci was 76% by arcasHLA and increased to 86% using a composite model of multiple genotyping tools. We also developed a highly accurate model (AUC 0.93) for predicting HLA-DRB345 copy number in order to improve genotyping accuracy of the HLA-DRB locus. Genotyping accuracy improved with read depth and was reproducible at repeat sampling. Using a meta-analytic approach, we also show that HLA genotypes from PHLAT and OptiType can generate ASE ratios that are highly correlated (R² = 0.8 and 0.94, respectively) with those derived from gold-standard genotyping.
Affiliation(s)
- Hong Zheng
  - Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, Stanford, CA, United States
  - Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, United States
- Laura W. Dillon
  - Laboratory of Myeloid Malignancies, National Heart Lung and Blood Institute, Bethesda, MD, United States
- Jason D. Goldman
  - Swedish Center for Research and Innovation, Swedish Medical Center, Seattle, WA, United States
  - Providence St. Joseph Health, Renton, WA, United States
  - Division of Allergy & Infectious Diseases, University of Washington, Seattle, WA, United States
- Christopher S. Hourigan
  - Laboratory of Myeloid Malignancies, National Heart Lung and Blood Institute, Bethesda, MD, United States
- James R. Heath
  - Institute for Systems Biology, Seattle, WA, United States
  - Department of Bioengineering, University of Washington, Seattle, WA, United States
- Purvesh Khatri
  - Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, Stanford, CA, United States
  - Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, United States
15
Luo S, Zhang Z, Wang Z, Yang X, Chen X, Zhou T, Zhang J. Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph model. R Soc Open Sci 2023; 10:221057. [PMID: 37035293 PMCID: PMC10073913 DOI: 10.1098/rsos.221057] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2022] [Accepted: 03/13/2023] [Indexed: 06/19/2023]
Abstract
Gene expression is inherently stochastic because transcription occurs in bursts. Single-cell snapshot data can be exploited to rigorously infer transcriptional burst kinetics, using mathematical models as blueprints. The classical telegraph model (CTM) has been widely used to explain transcriptional bursting with Markovian assumptions. However, growing evidence suggests that the gene-state dwell times are generally non-exponential, as gene-state switching is a multi-step process in organisms. Therefore, interpretable non-Markovian mathematical models and efficient statistical inference methods are urgently required in investigating transcriptional burst kinetics. We develop an interpretable and tractable model, the generalized telegraph model (GTM), to characterize transcriptional bursting that allows arbitrary dwell-time distributions, rather than exponential distributions, to be incorporated into the ON and OFF switching process. Based on the GTM, we propose an inference method for transcriptional bursting kinetics using an approximate Bayesian computation framework. This method demonstrates an efficient and scalable estimation of burst frequency and burst size on synthetic data. Further, the application of inference to genome-wide data from mouse embryonic fibroblasts reveals that the GTM estimates lower burst frequency and higher burst size than the CTM. In conclusion, the GTM and the corresponding inference method are effective tools to infer dynamic transcriptional bursting from static single-cell snapshot data.
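For readers unfamiliar with the classical telegraph model named in this abstract, a minimal sketch may help. It is not the authors' code and does not implement the GTM or their inference method. It assumes the usual Markovian two-state formulation (exponential ON and OFF dwell times) and simulates it with a Gillespie algorithm. The rate names `k_on`, `k_off`, `k_syn`, `k_deg`, the function name, and all numeric values are hypothetical, and taking burst size as `k_syn / k_off` is the common convention rather than a definition taken from the paper.

```python
import random


def simulate_telegraph(k_on, k_off, k_syn, k_deg, t_end, rng):
    """Gillespie simulation of the classical two-state telegraph model.

    Returns the mRNA count of one cell at time t_end, starting from an
    OFF gene with zero transcripts. All rate names are illustrative.
    """
    t, gene_on, mrna = 0.0, False, 0
    while True:
        # Propensities of the reactions available in the current state.
        switch = k_off if gene_on else k_on   # gene-state switching
        synth = k_syn if gene_on else 0.0     # transcription only when ON
        decay = k_deg * mrna                  # first-order mRNA degradation
        total = switch + synth + decay
        t += rng.expovariate(total)           # exponential waiting time
        if t >= t_end:
            return mrna
        u = rng.random() * total              # choose which reaction fires
        if u < switch:
            gene_on = not gene_on
        elif u < switch + synth:
            mrna += 1
        elif mrna > 0:
            mrna -= 1


rng = random.Random(0)
k_on, k_off, k_syn, k_deg = 0.5, 5.0, 50.0, 1.0   # assumed example rates

# A "snapshot": one count per cell, each cell simulated independently.
counts = [simulate_telegraph(k_on, k_off, k_syn, k_deg, 20.0, rng)
          for _ in range(2000)]

mean_sim = sum(counts) / len(counts)
# Steady-state mean: synthesis rate times the fraction of time spent ON,
# divided by the degradation rate.
mean_theory = k_syn * k_on / (k_on + k_off) / k_deg
# Common convention: mean number of transcripts made per ON period.
burst_size = k_syn / k_off
```

Comparing `mean_sim` with `mean_theory` is a quick sanity check on the simulation. Replacing the exponential waiting times for gene-state switching with other dwell-time distributions is the generalization the abstract describes, and it is not attempted here.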
Affiliation(s)
- Songhao Luo
  - Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
  - School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
- Zhenquan Zhang
  - Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
  - School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
- Zihao Wang
  - Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
  - School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
- Xiyan Yang
  - School of Financial Mathematics and Statistics, Guangdong University of Finance, Guangzhou 510521, People's Republic of China
- Xiaoxuan Chen
  - Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
  - School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
- Tianshou Zhou
  - Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
  - School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
- Jiajun Zhang
  - Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
  - School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong Province 510275, People's Republic of China
16
Jun SH, Toosi H, Mold J, Engblom C, Chen X, O'Flanagan C, Hagemann-Jensen M, Sandberg R, Aparicio S, Hartman J, Roth A, Lagergren J. Reconstructing clonal tree for phylo-phenotypic characterization of cancer using single-cell transcriptomics. Nat Commun 2023; 14:982. [PMID: 36813776 PMCID: PMC9946941 DOI: 10.1038/s41467-023-36202-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/17/2021] [Accepted: 01/20/2023] [Indexed: 02/24/2023] Open
Abstract
Functional characterization of the cancer clones can shed light on the evolutionary mechanisms driving cancer's proliferation and relapse mechanisms. Single-cell RNA sequencing data provide grounds for understanding the functional state of cancer as a whole; however, much research remains to identify and reconstruct clonal relationships toward characterizing the changes in functions of individual clones. We present PhylEx, which integrates bulk genomics data with co-occurrences of mutations from single-cell RNA sequencing data to reconstruct high-fidelity clonal trees. We evaluate PhylEx on synthetic and well-characterized high-grade serous ovarian cancer cell line datasets. PhylEx outperforms the state-of-the-art methods both when comparing capacity for clonal tree reconstruction and for identifying clones. We analyze high-grade serous ovarian cancer and breast cancer data to show that PhylEx exploits clonal expression profiles beyond what is possible with expression-based clustering methods and clears the way for accurate inference of clonal trees and robust phylo-phenotypic analysis of cancer.
Affiliation(s)
- Seong-Hwan Jun
  - SciLifeLab, School of EECS, KTH Royal Institute of Technology, Stockholm, Sweden; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, USA
- Hosein Toosi
  - SciLifeLab, School of EECS, KTH Royal Institute of Technology, Stockholm, Sweden
- Jeff Mold
  - Department of Cell and Molecular Biology, Karolinska Institutet, Solna, Sweden
- Camilla Engblom
  - Department of Cell and Molecular Biology, Karolinska Institutet, Solna, Sweden
- Xinsong Chen
  - Department of Oncology and Pathology, Karolinska Institutet, Solna, Sweden
- Ciara O'Flanagan
  - Department of Molecular Oncology, BC Cancer, Vancouver, BC, Canada
- Rickard Sandberg
  - Department of Cell and Molecular Biology, Karolinska Institutet, Solna, Sweden
- Samuel Aparicio
  - Department of Molecular Oncology, BC Cancer, Vancouver, BC, Canada; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Johan Hartman
  - Department of Oncology and Pathology, Karolinska Institutet, Solna, Sweden; Department of Clinical Pathology and Cytology, Karolinska University Laboratory, Stockholm, Sweden
- Andrew Roth
  - Department of Molecular Oncology, BC Cancer, Vancouver, BC, Canada; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada; Department of Computer Science, University of British Columbia, Vancouver, Canada.
- Jens Lagergren
  - SciLifeLab, School of EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
17
Luo S, Wang Z, Zhang Z, Zhou T, Zhang J. Genome-wide inference reveals that feedback regulations constrain promoter-dependent transcriptional burst kinetics. Nucleic Acids Res 2022; 51:68-83. [PMID: 36583343 PMCID: PMC9874261 DOI: 10.1093/nar/gkac1204] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/01/2022] [Revised: 11/06/2022] [Accepted: 12/06/2022] [Indexed: 12/31/2022] Open
Abstract
Gene expression in mammalian cells is highly variable and episodic, resulting in a series of discontinuous bursts of mRNAs. A challenge is to understand how static promoter architecture and dynamic feedback regulations dictate bursting on a genome-wide scale. Although single-cell RNA sequencing (scRNA-seq) provides an opportunity to address this challenge, effective analytical methods are scarce. We developed an interpretable and scalable inference framework, which combined experimental data with a mechanistic model to infer transcriptional burst kinetics (sizes and frequencies) and feedback regulations. Applying this framework to scRNA-seq data generated from embryonic mouse fibroblast cells, we found Simpson's paradoxes, i.e. genome-wide burst kinetics exhibit different characteristics in two cases without and with distinguishing feedback regulations. We also showed that feedbacks differently modulate burst frequencies and sizes and conceal the effects of transcription start site distributions on burst kinetics. Notably, only in the presence of positive feedback, TATA genes are expressed with high burst frequencies and enhancer-promoter interactions mainly modulate burst frequencies. The developed inference method provided a flexible and efficient way to investigate transcriptional burst kinetics and the obtained results would be helpful for understanding cell development and fate decision.
Affiliation(s)
- Zhenquan Zhang
  - Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, 510275, P. R. China; School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong Province, 510275, P. R. China
- Tianshou Zhou
  - Correspondence may also be addressed to Tianshou Zhou. Tel: +86 20 84134958;
- Jiajun Zhang
  - To whom correspondence should be addressed. Tel: +86 20 84111829;
18
Harmanci A, Harmanci AS, Klisch TJ, Patel AJ. XCVATR: detection and characterization of variant impact on the embeddings of single-cell and bulk RNA-sequencing samples. BMC Genomics 2022; 23:841. [PMID: 36539717 PMCID: PMC9764736 DOI: 10.1186/s12864-022-09004-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Accepted: 11/09/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: RNA-sequencing has become a standard tool for analyzing gene activity in bulk samples and at the single-cell level. By increasing sample sizes and cell counts, this technique can uncover substantial information about cellular transcriptional states. Beyond quantification of gene expression, RNA-seq can be used for detecting variants, including single nucleotide polymorphisms, small insertions/deletions, and larger variants, such as copy number variants. Notably, joint analysis of variants with cellular transcriptional states may provide insights into the impact of mutations, especially for complex and heterogeneous samples. However, this analysis is often challenging due to a prohibitively high number of variants and cells, which are difficult to summarize and visualize. Further, there is a dearth of methods that assess and summarize the association between detected variants and cellular transcriptional states. RESULTS: Here, we introduce XCVATR (eXpressed Clusters of Variant Alleles in Transcriptome pRofiles), a method that identifies variants and detects local enrichment of expressed variants within embedding of samples and cells in single-cell and bulk RNA-seq datasets. XCVATR visualizes local "clumps" of small and large-scale variants and searches for patterns of association between each variant and cellular states, as described by the coordinates of cell embedding, which can be computed independently using any type of distance metrics, such as principal component analysis or t-distributed stochastic neighbor embedding. Through simulations and analysis of real datasets, we demonstrate that XCVATR can detect enrichment of expressed variants and provide insight into the transcriptional states of cells and samples. We next sequenced 2 new single cell RNA-seq tumor samples and applied XCVATR. XCVATR revealed subtle differences in CNV impact on tumors. CONCLUSIONS: XCVATR is publicly available to download from https://github.com/harmancilab/XCVATR.
Affiliation(s)
- Arif Harmanci
  - University of Texas Health Science Center, School of Biomedical Informatics, Center for Secure Artificial intelligence For hEalthcare (SAFE), Center for Precision Health, Houston, USA
- Akdes Serin Harmanci
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX 77030 USA
- Tiemo J. Klisch
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030 USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Akash J. Patel
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX 77030 USA; Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030 USA; Department of Otolaryngology – Head and Neck Surgery, Baylor College of Medicine, Houston, TX 77030 USA
19
Luo X, Qin F, Xiao F, Cai G. BISC: accurate inference of transcriptional bursting kinetics from single-cell transcriptomic data. Brief Bioinform 2022; 23:6793779. [PMID: 36326081 DOI: 10.1093/bib/bbac464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 09/20/2022] [Accepted: 09/27/2022] [Indexed: 11/06/2022] Open
Abstract
Gene expression in mammalian cells is inherently stochastic and mRNAs are synthesized in discrete bursts. Single-cell transcriptomics provides an unprecedented opportunity to explore the transcriptome-wide kinetics of transcriptional bursting. However, current analysis methods provide limited accuracy in bursting inference due to substantial noise inherent to single-cell transcriptomic data. In this study, we developed BISC, a Bayesian method for inferring bursting parameters from single cell transcriptomic data. Based on a beta-gamma-Poisson model, BISC modeled the mean-variance dependency to achieve accurate estimation of bursting parameters from noisy data. Evaluation based on both simulation and real intron sequential RNA fluorescence in situ hybridization data showed improved accuracy and reliability of BISC over existing methods, especially for genes with low expression values. Further application of BISC found bursting frequency but not bursting size was strongly associated with gene expression regulation. Moreover, our analysis provided new mechanistic insights into the functional role of enhancer and superenhancer by modulating both bursting frequency and size. BISC also formulated a downstream framework to identify differential bursting (in frequency and size separately) genes in samples under different conditions. Applying to multiple datasets (a mouse embryonic cell and fibroblast dataset, a human immune cell dataset and a human pancreatic cell dataset), BISC identified known cell-type signature genes that were missed by differential expression analysis, providing additional insights in understanding the cell-specific stochastic gene transcription. Applying to datasets of human lung and colon cancers, BISC successfully detected tumor signature genes based on alterations in bursting kinetics, which illustrates its value in understanding disease development regarding transcriptional bursting. Collectively, BISC provides a new tool for accurately inferring bursting kinetics and detecting differential bursting genes. This study also produced new insights in the role of transcriptional bursting in regulating gene expression, cell identity and tumor progression.
Affiliation(s)
- Xizhi Luo
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Fei Qin
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Feifei Xiao
  - Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
- Guoshuai Cai
  - Department of Environmental Health Science, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
20
Jiang Y, Harigaya Y, Zhang Z, Zhang H, Zang C, Zhang NR. Nonparametric single-cell multiomic characterization of trio relationships between transcription factors, target genes, and cis-regulatory regions. Cell Syst 2022; 13:737-751.e4. [PMID: 36055233 PMCID: PMC9509445 DOI: 10.1016/j.cels.2022.08.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 04/22/2022] [Revised: 06/23/2022] [Accepted: 08/11/2022] [Indexed: 01/26/2023]
Abstract
The epigenetic control of gene expression is highly cell-type and context specific. Yet, despite its complexity, gene regulatory logic can be broken down into modular components consisting of a transcription factor (TF) activating or repressing the target gene expression through its binding to a cis-regulatory region. We propose a nonparametric approach, TRIPOD, to detect and characterize the three-way relationships between a TF, its target gene, and the accessibility of the TF's binding site using single-cell RNA and ATAC multiomic data. We apply TRIPOD to interrogate the cell-type-specific regulatory logic in peripheral blood mononuclear cells and contrast our results to detections from enhancer databases, cis-eQTL studies, ChIP-seq experiments, and TF knockdown/knockout studies. We then apply TRIPOD to mouse embryonic brain data and identify regulatory relationships, validated by ChIP-seq and PLAC-seq. Finally, we demonstrate TRIPOD on the SHARE-seq data of differentiating mouse hair follicle cells and identify lineage-specific regulation supported by histone marks and super-enhancer annotations. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Yuchao Jiang
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA; Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA.
- Yuriko Harigaya
  - Curriculum in Bioinformatics and Computational Biology, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
- Zhaojun Zhang
  - Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA
- Hongpan Zhang
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
- Chongzhi Zang
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Nancy R Zhang
  - Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA.
21
Quinones-Valdez G, Fu T, Chan TW, Xiao X. scAllele: A versatile tool for the detection and analysis of variants in scRNA-seq. Sci Adv 2022; 8:eabn6398. [PMID: 36054357 DOI: 10.1126/sciadv.abn6398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) data contain rich information at the gene, transcript, and nucleotide levels. Most analyses of scRNA-seq have focused on gene expression profiles, and it remains challenging to extract nucleotide variants and isoform-specific information. Here, we present scAllele, an integrative approach that detects single-nucleotide variants, insertions, deletions, and their allelic linkage with splicing patterns in scRNA-seq. We demonstrate that scAllele achieves better performance in identifying nucleotide variants than other commonly used tools. In addition, the read-specific variant calls by scAllele enable allele-specific splicing analysis, a unique feature not afforded by other methods. Applied to a lung cancer scRNA-seq dataset, scAllele identified variants with strong allelic linkage to alternative splicing, some of which are cancer specific and enriched in cancer-relevant pathways. scAllele represents a versatile tool to uncover multilayer information and previously unidentified biological insights from scRNA-seq data.
Affiliation(s)
- Ting Fu
  - Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Tracey W Chan
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Xinshu Xiao
  - Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
22
Gupta A, Martin-Rufino JD, Jones TR, Subramanian V, Qiu X, Grody EI, Bloemendal A, Weng C, Niu SY, Min KH, Mehta A, Zhang K, Siraj L, Al' Khafaji A, Sankaran VG, Raychaudhuri S, Cleary B, Grossman S, Lander ES. Inferring gene regulation from stochastic transcriptional variation across single cells at steady state. Proc Natl Acad Sci U S A 2022; 119:e2207392119. [PMID: 35969771 PMCID: PMC9407670 DOI: 10.1073/pnas.2207392119] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/09/2022] [Accepted: 07/20/2022] [Indexed: 12/24/2022] Open
Abstract
Regulatory relationships between transcription factors (TFs) and their target genes lie at the heart of cellular identity and function; however, uncovering these relationships is often labor-intensive and requires perturbations. Here, we propose a principled framework to systematically infer gene regulation for all TFs simultaneously in cells at steady state by leveraging the intrinsic variation in the transcriptional abundance across single cells. Through modeling and simulations, we characterize how transcriptional bursts of a TF gene are propagated to its target genes, including the expected ranges of time delay and magnitude of maximum covariation. We distinguish these temporal trends from the time-invariant covariation arising from cell states, and we delineate the experimental and technical requirements for leveraging these small but meaningful cofluctuations in the presence of measurement noise. While current technology does not yet allow adequate power for definitively detecting regulatory relationships for all TFs simultaneously in cells at steady state, we investigate a small-scale dataset to inform future experimental design. This study supports the potential value of mapping regulatory connections through stochastic variation, and it motivates further technological development to achieve its full potential.
Affiliation(s)
- Anika Gupta
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
- Jorge D. Martin-Rufino
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115
  - Dana-Farber Cancer Institute, Boston, MA 02215
- Xiaojie Qiu
  - Whitehead Institute for Biomedical Research, Cambridge, MA 02142
  - HHMI, Massachusetts Institute of Technology, Cambridge, MA 02139
- Chen Weng
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115
  - Dana-Farber Cancer Institute, Boston, MA 02215
  - Whitehead Institute for Biomedical Research, Cambridge, MA 02142
- Kyung Hoi Min
  - Whitehead Institute for Biomedical Research, Cambridge, MA 02142
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Arnav Mehta
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Dana-Farber Cancer Institute, Boston, MA 02215
  - Department of Medicine, Massachusetts General Hospital, Boston, MA 02114
- Kaite Zhang
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Layla Siraj
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Vijay G. Sankaran
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115
  - Dana-Farber Cancer Institute, Boston, MA 02215
- Soumya Raychaudhuri
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
  - Center for Data Sciences, Brigham and Women’s Hospital, Boston, MA 02115
- Brian Cleary
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Eric S. Lander
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115
23
Burkart V, Kowalski K, Aldag-Niebling D, Beck J, Frick DA, Holler T, Radocaj A, Piep B, Zeug A, Hilfiker-Kleiner D, dos Remedios CG, van der Velden J, Montag J, Kraft T. Transcriptional bursts and heterogeneity among cardiomyocytes in hypertrophic cardiomyopathy. Front Cardiovasc Med 2022; 9:987889. [PMID: 36082122 PMCID: PMC9445301 DOI: 10.3389/fcvm.2022.987889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 08/02/2022] [Indexed: 12/01/2022] Open
Abstract
Transcriptional bursting is a common expression mode for most genes where independent transcription of alleles leads to different ratios of allelic mRNA from cell to cell. Here we investigated burst-like transcription and its consequences in cardiac tissue from Hypertrophic Cardiomyopathy (HCM) patients with heterozygous mutations in the sarcomeric proteins cardiac myosin binding protein C (cMyBP-C, MYBPC3) and cardiac troponin I (cTnI, TNNI3). Using fluorescence in situ hybridization (RNA-FISH) we found that both MYBPC3 and TNNI3 are transcribed burst-like. Along with that, we show unequal allelic ratios of TNNI3-mRNA among single cardiomyocytes and unequally distributed wildtype cMyBP-C protein across tissue sections from heterozygous HCM-patients. The mutations led to opposing functional alterations, namely increasing (cMyBP-C c.927−2A>G) or decreasing (cTnI R145W) calcium sensitivity. Regardless, all patients revealed highly variable calcium-dependent force generation between individual cardiomyocytes, indicating contractile imbalance, which appears widespread in HCM-patients. Altogether, we provide strong evidence that burst-like transcription of sarcomeric genes can lead to an allelic mosaic among neighboring cardiomyocytes at mRNA and protein level. In HCM-patients, this presumably induces the observed contractile imbalance among individual cardiomyocytes and promotes HCM-development.
Affiliation(s)
- Valentin Burkart
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
| | - Kathrin Kowalski
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
| | - David Aldag-Niebling
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
| | - Julia Beck
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
| | - Dirk Alexander Frick
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
| | - Tim Holler
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
| | - Ante Radocaj
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
| | - Birgit Piep
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
| | - Andre Zeug
- Institute for Cellular Neurophysiology, Hannover Medical School, Hannover, Germany
| | | | - Cristobal G. dos Remedios
- Mechanosensory Biophysics Laboratory, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
| | | | - Judith Montag
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
- *Correspondence: Judith Montag
| | - Theresia Kraft
- Institute for Molecular and Cell Physiology, Hannover Medical School, Hannover, Germany
| |
|
24
|
Wang R, Lin DY, Jiang Y. EPIC: Inferring relevant cell types for complex traits by integrating genome-wide association studies and single-cell RNA sequencing. PLoS Genet 2022; 18:e1010251. [PMID: 35709291 PMCID: PMC9242467 DOI: 10.1371/journal.pgen.1010251] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/19/2021] [Revised: 06/29/2022] [Accepted: 05/12/2022] [Indexed: 11/18/2022] Open
Abstract
More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific gene expression measurements from single-cell RNA sequencing (scRNA-seq). We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We apply our framework to multiple scRNA-seq datasets from different platforms and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and scRNA-seq datasets and further validated using PubMed search and existing bulk case-control testing results.
Affiliation(s)
- Rujin Wang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Dan-Yu Lin
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- * E-mail: (D-YL); (YJ)
| | - Yuchao Jiang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, United States of America
- * E-mail: (D-YL); (YJ)
| |
|
25
|
Single-cell analysis reveals X upregulation is not global in pre-gastrulation embryos. iScience 2022; 25:104465. [PMID: 35707719 PMCID: PMC9189126 DOI: 10.1016/j.isci.2022.104465] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/28/2022] [Revised: 04/27/2022] [Accepted: 05/18/2022] [Indexed: 11/25/2022] Open
Abstract
In mammals, transcriptional inactivation of one X chromosome in females compensates for the dosage of X-linked gene expression between the sexes. Additionally, it is believed that the upregulation of the active X chromosome in males and females balances the dosage of X-linked gene expression relative to autosomal genes, as proposed by Ohno. However, the existence of X chromosome upregulation (XCU) remains controversial. Here, we have profiled gene-wise dynamics of XCU in pre-gastrulation mouse embryos at the single-cell level and found that XCU is dynamically linked with X chromosome inactivation (XCI); however, XCU is not global like XCI. Moreover, we show that upregulated genes are enriched with activating marks and have enhanced burst frequency. Finally, our in silico model predicts that recruitment probabilities of activating factors and a surge of these factors upon X-inactivation trigger XCU. Altogether, our study provides significant insight into the gene-wise dynamics and mechanistic basis of XCU during early development and extends support for Ohno's hypothesis.
Highlights: X-upregulation coincides with X chromosome inactivation in pre-gastrulation embryos; X-upregulation is not chromosome-wide like X-inactivation; upregulated genes have enhanced burst frequency and are enriched with activating marks; a surge of activating factors on X-inactivation triggers X-upregulation.
|
26
|
Mu W, Sarkar H, Srivastava A, Choi K, Patro R, Love MI. Airpart: interpretable statistical models for analyzing allelic imbalance in single-cell datasets. Bioinformatics 2022; 38:2773-2780. [PMID: 35561168 PMCID: PMC9113279 DOI: 10.1093/bioinformatics/btac212] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/26/2021] [Revised: 03/05/2022] [Accepted: 04/05/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation, which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial- or time-dependent AI signals may be dampened or not detected. RESULTS We introduce a statistical method airpart for identifying differential CTS AI from single-cell RNA-sequencing data, or dynamics AI from other spatially or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. In order to account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower Root Mean Square Error (RMSE) of allelic ratio estimates than existing methods. In real data, airpart identified differential allelic imbalance patterns across cell states and could be used to define trends of AI signal over spatial or time axes. AVAILABILITY AND IMPLEMENTATION The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
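The abstract above notes that low counts in single-cell data motivate a binomial likelihood combined with Bayesian inference. The sketch below is not airpart's model (which uses a Generalized Fused Lasso and a hierarchical Bayesian model); it is a much simpler stand-in, assuming a fixed Beta prior, that only illustrates why shrinkage helps when a cell contributes very few allelic reads. The names `AllelicCounts`, `raw_ratio`, `shrunken_ratio`, `pooled_ratio` and the prior values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AllelicCounts:
    """Allelic read (or UMI) counts for one gene in one cell (hypothetical container)."""
    alt: int    # reads supporting the alternate allele
    total: int  # reads supporting either allele


def raw_ratio(c: AllelicCounts) -> float:
    """Maximum-likelihood allelic ratio under a binomial model: alt / total."""
    if c.total == 0:
        return float("nan")
    return c.alt / c.total


def shrunken_ratio(c: AllelicCounts, prior_a: float = 5.0, prior_b: float = 5.0) -> float:
    """Posterior mean of the allelic ratio under a Beta(prior_a, prior_b) prior.

    The prior is an assumption of this sketch; it pulls estimates from
    low-count cells toward prior_a / (prior_a + prior_b).
    """
    return (c.alt + prior_a) / (c.total + prior_a + prior_b)


def pooled_ratio(cells: Iterable[AllelicCounts]) -> float:
    """Allelic ratio after summing counts over a group of cells (e.g. one cell type)."""
    cells = list(cells)
    alt = sum(c.alt for c in cells)
    total = sum(c.total for c in cells)
    return alt / total if total else float("nan")
```

For a cell with 2 alternate reads out of 2, `raw_ratio` returns 1.0, whereas `shrunken_ratio` with the default prior returns 7/12 (about 0.58), which is a more cautious estimate given so little evidence. Pooling counts within a group of cells is the other common way to stabilize the ratio.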
Affiliation(s)
- Wancen Mu
- To whom correspondence should be addressed.
| | - Hirak Sarkar
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | | | | | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
| | | |
|
27
|
Abstract
This review discusses our understanding of platelet diversity with implications for the roles of platelets in hemostasis and thrombosis and identifies advanced technologies set to provide new insights. We use the term diversity to capture intrasubject platelet variability that can be intrinsic or governed by the environment and lead to a heterogeneous response pattern of aggregation, clot promotion, and external communication. Using choice examples, we discuss how the use of advanced technologies can provide new insights into the underlying causes of platelet molecular, structural, and functional diversity. As sources of diversity, we discuss the proliferating megakaryocytes with different allele-specific expression patterns, the asymmetrical formation of proplatelets, changes in platelets induced by aging and priming, interplatelet heterogeneity in thrombus organization and stability, and platelet-dependent communications. We provide indications how current knowledge gaps can be addressed using promising technologies, such as next-generation sequencing, proteomic approaches, advanced imaging techniques, multicolor flow and mass cytometry, multifunctional microfluidics assays, and organ-on-a-chip platforms. We then argue how this technology base can aid in characterizing platelet populations and in identifying platelet biomarkers relevant for the treatment of cardiovascular disease.
Affiliation(s)
- Johan W M Heemskerk
- Department of Biochemistry, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, the Netherlands (J.W.M.H.)
| | - Jonathan West
- Faculty of Medicine and Centre for Hybrid Biodevices, University of Southampton, United Kingdom (J.W.)
| |
|
28
|
Heinen T, Secchia S, Reddington JP, Zhao B, Furlong EEM, Stegle O. scDALI: modeling allelic heterogeneity in single cells reveals context-specific genetic regulation. Genome Biol 2022; 23:8. [PMID: 34991671 PMCID: PMC8734213 DOI: 10.1186/s13059-021-02593-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/28/2021] [Accepted: 12/27/2021] [Indexed: 01/04/2023] Open
Abstract
While it is established that the functional impact of genetic variation can vary across cell types and states, capturing this diversity remains challenging. Current studies using bulk sequencing either ignore this heterogeneity or use sorted cell populations, reducing discovery and explanatory power. Here, we develop scDALI, a versatile computational framework that integrates information on cellular states with allelic quantifications of single-cell sequencing data to characterize cell-state-specific genetic effects. We apply scDALI to scATAC-seq profiles from developing F1 Drosophila embryos and scRNA-seq from differentiating human iPSCs, uncovering heterogeneous genetic effects in specific lineages, developmental stages, or cell types.
Affiliation(s)
- Tobias Heinen
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany
| | - Stefano Secchia
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Faculty of Biosciences, Collaboration for Joint PhD Degree between EMBL and Heidelberg University, Heidelberg, Germany
| | - James P Reddington
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Bingqing Zhao
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Eileen E M Furlong
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
| | - Oliver Stegle
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
| |
|
29
|
Chen L, Zhu C, Jiao F. A generalized moment-based method for estimating parameters of stochastic gene transcription. Math Biosci 2022; 345:108780. [DOI: 10.1016/j.mbs.2022.108780] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/21/2021] [Revised: 12/27/2021] [Accepted: 01/13/2022] [Indexed: 12/22/2022]
|
30
|
Jang J, Amblard F, Ghim CM. Heterogeneity is not always a source of noise: Stochastic gene expression in regulatory heterozygote. Phys Rev E 2021; 104:044401. [PMID: 34781474 DOI: 10.1103/physreve.104.044401] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/26/2020] [Accepted: 09/16/2021] [Indexed: 01/22/2023]
Abstract
Zygosity of diploid genome (i.e., degree to which two parental alleles of a gene have varied genetic sequences) adds another dimension to stochastic gene expression. The allelic imbalance in chromatin accessibility or divergence in regulatory sequences leads to fitness effects but the quantitative aspects thereof are largely left unexplored. We investigate diploid gene expression systems with homozygous (the same) and heterozygous (varied) combination of alleles in cis-regulatory sequences, not in structural gene loci, and characterize the zygosity-associated stochastic fluctuations in protein abundance. An emerging feat of heterozygosity is its counterintuitive capacity for genetic noise control. Especially when the noise is dominantly contributed to by the fluctuations in duty cycle ("reliability") rather than in process speed ("productivity") of gene expression machinery, its interallelic discrepancy acts to reduce the gene expression noise. These findings offer a novel insight into the rich repertoire of balancing selection enriched in the regulatory elements of immune response genes.
Affiliation(s)
- Juneil Jang
- Department of Biomedical Engineering, Ulsan National Institute of Science & Technology, Ulsan 44919, Republic of Korea
| | - François Amblard
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea.,Department of Physics, Ulsan National Institute of Science & Technology, Ulsan 44919, Republic of Korea
| | - C-M Ghim
- Department of Biomedical Engineering, Ulsan National Institute of Science & Technology, Ulsan 44919, Republic of Korea.,Department of Physics, Ulsan National Institute of Science & Technology, Ulsan 44919, Republic of Korea
| |
|
31
|
Fu R, Qin P, Zou X, Hu Z, Hong N, Wang Y, Jin W. A Comprehensive Characterization of Monoallelic Expression During Hematopoiesis and Leukemogenesis via Single-Cell RNA-Sequencing. Front Cell Dev Biol 2021; 9:702897. [PMID: 34722498 PMCID: PMC8548578 DOI: 10.3389/fcell.2021.702897] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2021] [Accepted: 09/13/2021] [Indexed: 12/30/2022] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) is becoming a powerful tool to investigate monoallelic expression (MAE) in various developmental and pathological processes. However, our knowledge of MAE during hematopoiesis and leukemogenesis is limited. In this study, we conducted a systematic interrogation of MAEs in bone marrow mononuclear cells (BMMCs) at single-cell resolution to construct an MAE atlas of BMMCs. We identified 1,020 constitutive MAEs in BMMCs, which included imprinted genes such as MEG8, NAP1L5, and IRAIN. We classified the BMMCs into six cell types and identified 74 cell-type-specific MAEs including MTSS1, MOB1A, and TCF12. We further identified 114 random MAEs (rMAEs) at the single-cell level, with 78.1% single-allele rMAE and 21.9% biallelic mosaic rMAE. Many MAEs identified in BMMCs have not been reported and are potentially hematopoietic specific, supporting the functional relevance of MAEs. Comparison of BMMC samples from a leukemia patient at multiple clinical stages showed that the fractions of constitutive MAE were correlated with the fractions of leukemia cells in BMMCs. Further separation of the BMMCs into leukemia cells and normal cells showed that leukemia cells have much higher constitutive MAE and rMAEs than normal cells. We identified leukemia cell-specific MAEs and relapsed leukemia cell-specific MAEs, which were enriched in immune-related functions. These results indicate that MAE is prevalent and is an important gene regulation mechanism during hematopoiesis and leukemogenesis. As the first systematic interrogation of constitutive MAEs, cell-type-specific MAEs, and rMAEs during hematopoiesis and leukemogenesis, the study significantly increased our knowledge about the features and functions of MAEs.
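Calling monoallelic expression at the single-cell level, as described in the abstract above, starts from per-cell allelic read counts. The following is a minimal sketch of such a per-cell call, not the procedure used in the study: the function names, the minimum-read cutoff of 3, and the 0.95 allelic-fraction threshold are all assumptions chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_cell(ref: int, alt: int, min_reads: int = 3,
                  mono_threshold: float = 0.95) -> str:
    """Label one gene in one cell from its reference/alternate read counts.

    Thresholds are hypothetical; real analyses tune them to coverage
    and to the allelic dropout rate of the protocol.
    """
    total = ref + alt
    if total < min_reads:
        return "undetermined"      # too few reads to call either way
    ref_fraction = ref / total
    if ref_fraction >= mono_threshold:
        return "mono_ref"          # essentially only the reference allele
    if ref_fraction <= 1 - mono_threshold:
        return "mono_alt"          # essentially only the alternate allele
    return "biallelic"


def summarize_gene(cells: Iterable[Tuple[int, int]]) -> Counter:
    """Count per-cell labels for one gene across (ref, alt) count pairs."""
    return Counter(classify_cell(ref, alt) for ref, alt in cells)
```

A gene whose informative cells are all `mono_ref` (or all `mono_alt`) would look constitutively monoallelic under this sketch, while a mix of `mono_ref` and `mono_alt` cells would look like random monoallelic expression. Because allelic dropout in scRNA-seq inflates apparent monoallelic calls, such labels should be read as candidates rather than conclusions.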
Affiliation(s)
- Ruiqing Fu
- Shenzhen Key Laboratory of Microbiology and Gene Engineering, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China.,Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen, China.,School of Food Engineering and Biotechnology, Hanshan Normal University, Chaozhou, China
| | - Pengfei Qin
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Xianghui Zou
- School of Food Engineering and Biotechnology, Hanshan Normal University, Chaozhou, China
| | - Zhangli Hu
- Shenzhen Key Laboratory of Microbiology and Gene Engineering, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China.,Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen, China
| | - Ni Hong
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Yun Wang
- Shenzhen Key Laboratory of Microbiology and Gene Engineering, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China
| | - Wenfei Jin
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| |
|
32
|
Auerbach BJ, Hu J, Reilly MP, Li M. Applications of single-cell genomics and computational strategies to study common disease and population-level variation. Genome Res 2021; 31:1728-1741. [PMID: 34599006 PMCID: PMC8494214 DOI: 10.1101/gr.275430.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
The advent and rapid development of single-cell technologies have made it possible to study cellular heterogeneity at an unprecedented resolution and scale. Cellular heterogeneity underlies phenotypic differences among individuals, and studying cellular heterogeneity is an important step toward our understanding of the disease molecular mechanism. Single-cell technologies offer opportunities to characterize cellular heterogeneity from different angles, but how to link cellular heterogeneity with disease phenotypes requires careful computational analysis. In this article, we will review the current applications of single-cell methods in human disease studies and describe what we have learned so far from existing studies about human genetic variation. As single-cell technologies are becoming widely applicable in human disease studies, population-level studies have become a reality. We will describe how we should go about pursuing and designing these studies, particularly how to select study subjects, how to determine the number of cells to sequence per subject, and the needed sequencing depth per cell. We also discuss computational strategies for the analysis of single-cell data and describe how single-cell data can be integrated with bulk tissue data and data generated from genome-wide association studies. Finally, we point out open problems and future research directions.
Affiliation(s)
- Benjamin J Auerbach
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
| | - Jian Hu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
| | - Muredach P Reilly
- Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, New York 10032, USA
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
| |
|
33
|
Semicoordinated allelic-bursting shape dynamic random monoallelic expression in pregastrulation embryos. iScience 2021; 24:102954. [PMID: 34458702 PMCID: PMC8379509 DOI: 10.1016/j.isci.2021.102954] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/13/2021] [Revised: 07/27/2021] [Accepted: 07/30/2021] [Indexed: 01/14/2023] Open
Abstract
Recently, allele-specific single-cell RNA-seq analysis has demonstrated widespread dynamic random monoallelic expression of autosomal genes (aRME) in different cell types. However, the prevalence of dynamic aRME during pregastrulation remains unknown. Here, we show that dynamic aRME is widespread in different lineages of pregastrulation embryos. Additionally, the origin of dynamic aRME remains elusive. It is believed that independent transcriptional bursting from each allele leads to dynamic aRME. Here, we show that allelic bursting is not perfectly independent; instead, it happens in a semicoordinated fashion. Importantly, we show that semicoordinated allelic bursting of genes, particularly those with low burst frequency, leads to frequent asynchronous allelic bursting, thereby contributing to dynamic aRME. Furthermore, we found that coordination of allelic bursting is lineage specific and that genes regulating development have a higher degree of coordination. Altogether, our study provides significant insights into the prevalence and origin of dynamic aRME and their developmental relevance during early development.
Highlights: dynamic aRME is widespread in different lineages of pregastrulation embryos; semicoordinated bursting of genes with low burst frequency leads to dynamic aRME; the degree of coordination of allelic bursting is lineage specific; developmental genes have a higher degree of coordination of allelic bursting.
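The contrast drawn above between independent and semicoordinated allelic bursting can be made concrete with a simple check on per-cell on/off states. The sketch below is an illustrative measure, not the statistic used in the study: it compares the observed fraction of cells expressing both alleles with the product of the marginal fractions (the expectation under independence) and reports the phi coefficient as a coordination score. The function name and the input layout are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def allelic_coordination(cells: Sequence[Tuple[bool, bool]]) -> Dict[str, float]:
    """Compare observed co-expression of two alleles with the independence expectation.

    Each element of `cells` is (allele_a_on, allele_b_on) for one cell.
    phi is about 0 when the alleles burst independently and 1 when they
    are always on or off together.
    """
    n = len(cells)
    if n == 0:
        raise ValueError("need at least one cell")
    p_a = sum(1 for a, _ in cells if a) / n          # fraction with allele A on
    p_b = sum(1 for _, b in cells if b) / n          # fraction with allele B on
    p_ab = sum(1 for a, b in cells if a and b) / n   # fraction with both on
    expected = p_a * p_b                             # both on, if independent
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    phi = (p_ab - expected) / denom if denom > 0 else float("nan")
    return {"observed_both": p_ab, "expected_both": expected, "phi": phi}
```

Intermediate values of `phi`, between 0 and 1, would correspond to the semicoordinated regime described in the abstract. In real data, technical dropout also produces apparent single-allele cells, so a score like this needs a dropout-aware null before it can be interpreted biologically.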
|
34
|
Mompart F, Kamgoué A, Lahbib-Mansais Y, Robelin D, Bonnet A, Rogel-Gaillard C, Kocanova S, Yerle-Bouissou M. The 3D nuclear conformation of the major histocompatibility complex changes upon cell activation both in porcine and human macrophages. BMC Mol Cell Biol 2021; 22:45. [PMID: 34521351 PMCID: PMC8442435 DOI: 10.1186/s12860-021-00384-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2021] [Accepted: 08/30/2021] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND The crucial role of the major histocompatibility complex (MHC) for the immune response to infectious diseases is well-known, but no information is available on the 3D nuclear organization of this gene-dense region in immune cells, whereas nuclear architecture is known to play an essential role on genome function regulation. We analyzed the spatial arrangement of the three MHC regions (class I, III and II) in macrophages using 3D-FISH. Since this complex presents major differences in humans and pigs with, notably, the presence of the centromere between class III and class II regions in pigs, the analysis was implemented in both species to determine the impact of this organization on the 3D conformation of the MHC. The expression level of the three genes selected to represent each MHC region was assessed by quantitative real-time PCR. Resting and lipopolysaccharide (LPS)-activated states were investigated to ascertain whether a response to a pathogen modifies their expression level and their 3D organization. RESULTS While the three MHC regions occupy an intermediate radial position in porcine macrophages, the class I region was clearly more peripheral in humans. The BAC center-to-center distances allowed us to propose a 3D nuclear organization of the MHC in each species. LPS/IFNγ activation induces a significant decompaction of the chromatin between class I and class III regions in pigs and between class I and class II regions in humans. We detected a strong overexpression of TNFα (class III region) in both species. Moreover, a single nucleus analysis revealed that the two alleles can have either the same or a different compaction pattern. In addition, macrophage activation leads to an increase in alleles that present a decompacted pattern in humans and pigs. 
CONCLUSIONS The data presented demonstrate that: (i) the MHC harbors a different 3D organization in humans and pigs; (ii) LPS/IFNγ activation induces chromatin decompaction, but it is not the same area affected in the two species. These findings were supported by the application of an original computation method based on the geometrical distribution of the three target genes. Finally, the position of the centromere inside the swine MHC could influence chromatin reorganization during the activation process.
Affiliation(s)
- Florence Mompart
- GenPhySE, Université de Toulouse, INRAE, ENVT, 1388 GenPhySE, 24 Chemin de Borde Rouge, 31326 Cedex, Castanet-Tolosan, France
| | - Alain Kamgoué
- Laboratoire de Biologie Moléculaire Eucaryote (LBME), Centre de Biologie Intégrative (CBI), CNRS, UPS, University of Toulouse, 31062, Toulouse, France
| | - Yvette Lahbib-Mansais
- GenPhySE, Université de Toulouse, INRAE, ENVT, 1388 GenPhySE, 24 Chemin de Borde Rouge, 31326 Cedex, Castanet-Tolosan, France
| | - David Robelin
- GenPhySE, Université de Toulouse, INRAE, ENVT, 1388 GenPhySE, 24 Chemin de Borde Rouge, 31326 Cedex, Castanet-Tolosan, France
| | - Agnès Bonnet
- GenPhySE, Université de Toulouse, INRAE, ENVT, 1388 GenPhySE, 24 Chemin de Borde Rouge, 31326 Cedex, Castanet-Tolosan, France
| | | | - Silvia Kocanova
- Laboratoire de Biologie Moléculaire Eucaryote (LBME), Centre de Biologie Intégrative (CBI), CNRS, UPS, University of Toulouse, 31062, Toulouse, France
| | - Martine Yerle-Bouissou
- GenPhySE, Université de Toulouse, INRAE, ENVT, 1388 GenPhySE, 24 Chemin de Borde Rouge, 31326 Cedex, Castanet-Tolosan, France.
| |
|
35
|
Picard CL, Povilus RA, Williams BP, Gehring M. Transcriptional and imprinting complexity in Arabidopsis seeds at single-nucleus resolution. Nat Plants 2021; 7:730-738. [PMID: 34059805 PMCID: PMC8217372 DOI: 10.1038/s41477-021-00922-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 12/04/2020] [Accepted: 04/15/2021] [Indexed: 05/06/2023]
Abstract
Seeds are a key life cycle stage for many plants. Seeds are also the basis of agriculture and the primary source of calories consumed by humans [1]. Here, we employ single-nucleus RNA-sequencing to generate a transcriptional atlas of developing Arabidopsis thaliana seeds, with a focus on endosperm. Endosperm, the primary site of gene imprinting in flowering plants, mediates the relationship between the maternal parent and the embryo [2]. We identify transcriptionally uncharacterized nuclei types in the chalazal endosperm, which interfaces with maternal tissue for nutrient unloading [3,4]. We demonstrate that the extent of parental bias of maternally expressed imprinted genes varies with cell-cycle phase, and that imprinting of paternally expressed imprinted genes is strongest in chalazal endosperm. Thus, imprinting is spatially and temporally heterogeneous. Increased paternal expression in the chalazal region suggests that parental conflict, which is proposed to drive imprinting evolution, is fiercest at the boundary between filial and maternal tissues.
Affiliation(s)
- Colette L Picard
- Whitehead Institute for Biomedical Research, Cambridge, MA, USA
- Computational and Systems Biology Graduate Program, Massachusetts Institute of Technology, Cambridge, MA, USA
| | | | - Ben P Williams
- Whitehead Institute for Biomedical Research, Cambridge, MA, USA
| | - Mary Gehring
- Whitehead Institute for Biomedical Research, Cambridge, MA, USA.
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
| |
|
36
|
Nagle MP, Tam GS, Maltz E, Hemminger Z, Wollman R. Bridging scales: From cell biology to physiology using in situ single-cell technologies. Cell Syst 2021; 12:388-400. [PMID: 34015260 DOI: 10.1016/j.cels.2021.03.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/11/2020] [Revised: 01/30/2021] [Accepted: 03/08/2021] [Indexed: 12/14/2022]
Abstract
Biological organization crosses multiple spatial scales: from molecular, cellular, to tissues and organs. The proliferation of molecular profiling technologies enables increasingly detailed cataloging of the components at each scale. However, the scarcity of spatial profiling has made it challenging to bridge across these scales. Emerging technologies based on highly multiplexed in situ profiling are paving the way to study the spatial organization of cells and tissues in greater detail. These new technologies provide the data needed to cross the scale from cell biology to physiology and identify the fundamental principles that govern tissue organization. Here, we provide an overview of these key technologies and discuss the current and future insights these powerful techniques enable.
Affiliation(s)
- Maeve P Nagle
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA; Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Gabriela S Tam
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA; Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Evan Maltz
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA; Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Zachary Hemminger
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA; Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Roy Wollman
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA; Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA.
|
37
|
A New Hypothesis for Type 1 Diabetes Risk: The At-Risk Allele at rs3842753 Associates With Increased Beta-cell INS Messenger RNA in a Meta-Analysis of Single-Cell RNA-Sequencing Data. Can J Diabetes 2021; 45:775-784.e2. [PMID: 34052132 DOI: 10.1016/j.jcjd.2021.03.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/09/2020] [Revised: 03/25/2021] [Accepted: 03/26/2021] [Indexed: 12/26/2022]
Abstract
OBJECTIVES: Type 1 diabetes is characterized by the autoimmune destruction of insulin-secreting beta cells. Genetic variants upstream at the insulin (INS) locus contribute to ∼10% of type 1 diabetes heritable risk. Previous studies showed an association between rs3842753 C/C genotype and type 1 diabetes susceptibility, but the molecular mechanisms remain unclear. To date, no large-scale studies have looked at the effect of genetic variation at rs3842753 on INS mRNA at the single-cell level. METHODS: We aligned all human islet single-cell RNA sequencing data sets available to us in year 2020 to the reference genome GRCh38.98 and genotyped rs3842753, integrating 2,315 β cells and 1,223 β-like cells from 13 A/A protected donors, 23 A/C heterozygous donors and 35 C/C at-risk donors, including adults without diabetes and with type 2 diabetes. RESULTS: INS expression mean and variance were significantly higher in single β cells from females compared with males. On comparing across β cells and β-like cells, we found that rs3842753 C‒containing cells (either homozygous or heterozygous) had the highest INS expression. We also found that β cells with the rs3842753 C allele had significantly higher endoplasmic reticulum stress marker gene expression compared with the A/A homozygous genotype. CONCLUSIONS: These findings support the emerging concept that inherited risk of type 1 diabetes may be associated with inborn, persistent elevated insulin production, which may lead to β-cell endoplasmic reticulum stress and fragility.
|
38
|
Larsson AJM, Ziegenhain C, Hagemann-Jensen M, Reinius B, Jacob T, Dalessandri T, Hendriks GJ, Kasper M, Sandberg R. Transcriptional bursts explain autosomal random monoallelic expression and affect allelic imbalance. PLoS Comput Biol 2021; 17:e1008772. [PMID: 33690599 PMCID: PMC7978379 DOI: 10.1371/journal.pcbi.1008772] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/28/2020] [Revised: 03/19/2021] [Accepted: 02/03/2021] [Indexed: 12/02/2022] Open
Abstract
Transcriptional bursts render substantial biological noise in cellular transcriptomes. Here, we investigated the theoretical extent of allelic expression resulting from transcriptional bursting and how it compared to the amount of biallelic, monoallelic and allele-biased expression observed in single-cell RNA-sequencing (scRNA-seq) data. We found that transcriptional bursting can explain the allelic expression patterns observed in single cells, including the frequent observations of autosomal monoallelic gene expression. Importantly, we identified that the burst frequency largely determined the fraction of cells with monoallelic expression, whereas the burst size had little effect on monoallelic observations. The high consistency between the bursting model predictions and scRNA-seq observations made it possible to assess the heterogeneity of a group of cells as their deviation in allelic observations from the expected. Finally, both burst frequency and size contributed to allelic imbalance observations and reinforced that studies of allelic imbalance can be confounded by the inherent noise in transcriptional bursting. Altogether, we demonstrate that allele-level transcriptional bursting renders widespread, although predictable, amounts of monoallelic and biallelic expression in single cells and cell populations.
Genes are transcribed into RNA and further translated into proteins. The maternal and paternal copies of each gene are typically transcribed independently, and transcription itself occurs in discrete stochastic bursts (transcriptional bursts). Pioneering single-cell analysis of RNA across cells revealed abundant fluctuations in the amounts of maternal and paternal RNA in cells, with frequent observations of RNA from only the maternal or paternal gene copy (monoallelic expression). In this study, we investigated to what extent the observed monoallelic expression across single cells can be explained by transcriptional bursting. We demonstrate that the process of transcriptional bursting is sufficient to explain the amount of monoallelic expression, and we further demonstrate that the frequency of bursts mainly determines the frequency of monoallelic observations. Furthermore, we show that transcriptional bursts may lead to false positive observations of monoallelic expression across cell populations. Therefore, stochastic transcription renders large fluctuations in allelic origin of RNA in cells over time, including frequent monoallelic observations when profiling single cells.
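The dependence of monoallelic observations on burst kinetics can be illustrated with a small calculation. The sketch below is not the authors' model or code. It assumes the textbook bursty-transcription result that steady-state mRNA counts per allele are negative binomial, with shape equal to the burst frequency (in units of the mRNA degradation rate) and mean burst size b, so that an allele shows zero transcripts with probability (1 + b)^(-f). It also assumes the two alleles are independent and ignores technical dropout. The function names and parameter values are hypothetical.

```python
def p_allele_silent(burst_freq: float, burst_size: float) -> float:
    """P(zero transcripts from one allele) under an assumed negative
    binomial steady state: shape = burst_freq, mean = burst_freq * burst_size."""
    return (1.0 + burst_size) ** (-burst_freq)


def allelic_fractions(burst_freq: float, burst_size: float) -> dict:
    """Expected fractions of cells in each allelic state, assuming the
    two alleles burst independently with identical kinetics."""
    p0 = p_allele_silent(burst_freq, burst_size)
    return {
        "silent": p0 * p0,                 # neither allele detected
        "monoallelic": 2 * p0 * (1 - p0),  # exactly one allele detected
        "biallelic": (1 - p0) ** 2,        # both alleles detected
    }


if __name__ == "__main__":
    # Illustrative parameter grid (values are assumptions, not fitted).
    for freq in (0.5, 1.0, 2.0):
        for size in (5.0, 10.0, 20.0):
            frac = allelic_fractions(freq, size)
            print(f"freq={freq:<4} size={size:<5} "
                  f"monoallelic={frac['monoallelic']:.3f}")
```

Scanning the printed grid gives a rough sense of how strongly the monoallelic fraction moves along each axis under these simplifying assumptions. A realistic comparison would also need to model capture efficiency and sequencing depth.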
Affiliation(s)
- Anton J. M. Larsson
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
| | - Christoph Ziegenhain
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
| | | | - Björn Reinius
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
| | - Tina Jacob
- Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden
| | - Tim Dalessandri
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
| | - Gert-Jan Hendriks
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
| | - Maria Kasper
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
| | - Rickard Sandberg
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
|
39
|
Wainer-Katsir K, Linial M. BIRD: identifying cell doublets via biallelic expression from single cells. Bioinformatics 2021; 36:i251-i257. [PMID: 32657402 PMCID: PMC7355245 DOI: 10.1093/bioinformatics/btaa474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023] Open
Abstract
Summary: Current technologies for single-cell transcriptomics allow thousands of cells to be analyzed in a single experiment. The increased scale of these methods raises the risk of cell doublet contamination. Available tools and algorithms for identifying doublets and estimating their occurrence in single-cell experimental data focus on doublets of different species, cell types or individuals. In this study, we analyze transcriptomic data from single cells having an identical genetic background. We claim that the ratio of monoallelic to biallelic expression provides discriminating power for doublet identification. We present a pipeline called BIallelic Ratio for Doublets (BIRD) that relies on heterologous genetic variations from single-cell RNA sequencing. For each dataset, doublets were artificially created from the actual data and used to train a predictive model. BIRD was applied to Smart-seq data from 163 primary fibroblast single cells. The model achieved 100% accuracy in annotating the randomly simulated doublets. Bona fide doublets were verified based on a biallelic expression signal on the X chromosome of female fibroblasts. Data from 10X Genomics microfluidics of human peripheral blood cells achieved on average 83% (±3.7%) accuracy, and an area under the curve of 0.88 (±0.04) for a collection of ∼13 300 single cells. BIRD addresses instances of doublets that were formed from cell mixtures of identical genetic background and cell identity. Maximal performance is achieved for high-coverage data from Smart-seq. Success in identifying doublets is data specific and varies according to the experimental methodology, genomic diversity between haplotypes, sequence coverage and depth. Supplementary information: Supplementary data are available at Bioinformatics online.
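The idea of creating artificial doublets from real cells and scoring them by biallelic expression can be sketched in a few lines. This is a minimal illustration and not the BIRD pipeline. The data layout (one `(ref_reads, alt_reads)` pair per heterozygous site), the function names, and the `min_reads` threshold are all assumptions made for the example, and the predictive model itself is omitted.

```python
from typing import List, Tuple

# Hypothetical layout: one (ref_reads, alt_reads) pair per heterozygous site.
Counts = List[Tuple[int, int]]


def biallelic_ratio(cell: Counts, min_reads: int = 1) -> float:
    """Fraction of expressed heterozygous sites where both alleles are seen."""
    expressed = [(r, a) for r, a in cell if r + a >= min_reads]
    if not expressed:
        return 0.0
    biallelic = sum(1 for r, a in expressed if r > 0 and a > 0)
    return biallelic / len(expressed)


def make_doublet(cell_a: Counts, cell_b: Counts) -> Counts:
    """Simulate a doublet by summing allelic counts of two cells site by site."""
    if len(cell_a) != len(cell_b):
        raise ValueError("cells must be measured at the same sites")
    return [(ra + rb, aa + ab) for (ra, aa), (rb, ab) in zip(cell_a, cell_b)]


if __name__ == "__main__":
    # Two toy singlets, each monoallelic at every expressed site.
    cell_a = [(3, 0), (0, 2), (0, 0)]
    cell_b = [(0, 4), (1, 0), (0, 0)]
    doublet = make_doublet(cell_a, cell_b)
    print(biallelic_ratio(cell_a), biallelic_ratio(cell_b), biallelic_ratio(doublet))
```

In a fuller workflow, ratios like these computed on real cells and simulated doublets would serve as features for a classifier. The choice of features and model here is left open because the abstract does not specify them.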
Affiliation(s)
- Kerem Wainer-Katsir
- Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Givat Ram 91904, Israel
| | - Michal Linial
- Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Givat Ram 91904, Israel
|
40
|
Yu X, Abbas-Aghababazadeh F, Chen YA, Fridley BL. Statistical and Bioinformatics Analysis of Data from Bulk and Single-Cell RNA Sequencing Experiments. Methods Mol Biol 2021; 2194:143-175. [PMID: 32926366 PMCID: PMC7771369 DOI: 10.1007/978-1-0716-0849-4_9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
High-throughput sequencing (HTS) has revolutionized researchers' ability to study the human transcriptome, particularly as it relates to cancer. Recently, HTS technology has advanced to the point where now one is able to sequence individual cells (i.e., "single-cell sequencing"). Prior to single-cell sequencing technology, HTS would be completed on RNA extracted from a tissue sample consisting of multiple cell types (i.e., "bulk sequencing"). In this chapter, we review the various bioinformatics and statistical methods used in the processing, quality control, and analysis of bulk and single-cell RNA sequencing methods. Additionally, we discuss how these methods are also being used to study tumor heterogeneity.
Affiliation(s)
- Xiaoqing Yu
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
| | - Farnoosh Abbas-Aghababazadeh
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
| | - Y Ann Chen
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
| | - Brooke L Fridley
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
|
41
|
Auer JMT, Stoddart JJ, Christodoulou I, Lima A, Skouloudaki K, Hall HN, Vukojević V, Papadopoulos DK. Of numbers and movement - understanding transcription factor pathogenesis by advanced microscopy. Dis Model Mech 2020; 13:dmm046516. [PMID: 33433399 PMCID: PMC7790199 DOI: 10.1242/dmm.046516] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/09/2023] Open
Abstract
Transcription factors (TFs) are life-sustaining and, therefore, the subject of intensive research. By regulating gene expression, TFs control a plethora of developmental and physiological processes, and their abnormal function commonly leads to various developmental defects and diseases in humans. Normal TF function often depends on gene dosage, which can be altered by copy-number variation or loss-of-function mutations. This explains why TF haploinsufficiency (HI) can lead to disease. Since aberrant TF numbers frequently result in pathogenic abnormalities of gene expression, quantitative analyses of TFs are a priority in the field. In vitro single-molecule methodologies have significantly aided the identification of links between TF gene dosage and transcriptional outcomes. Additionally, advances in quantitative microscopy have contributed mechanistic insights into normal and aberrant TF function. However, to understand TF biology, TF-chromatin interactions must be characterised in vivo, in a tissue-specific manner and in the context of both normal and altered TF numbers. Here, we summarise the advanced microscopy methodologies most frequently used to link TF abundance to function and dissect the molecular mechanisms underlying TF HIs. Increased application of advanced single-molecule and super-resolution microscopy modalities will improve our understanding of how TF HIs drive disease.
Affiliation(s)
- Julia M T Auer
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 1XU, UK
| | - Jack J Stoddart
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 1XU, UK
| | | | - Ana Lima
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 1XU, UK
| | | | - Hildegard N Hall
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 1XU, UK
| | - Vladana Vukojević
- Center for Molecular Medicine (CMM), Department of Clinical Neuroscience, Karolinska Institutet, 17176 Stockholm, Sweden
|
42
|
Ochiai H, Hayashi T, Umeda M, Yoshimura M, Harada A, Shimizu Y, Nakano K, Saitoh N, Liu Z, Yamamoto T, Okamura T, Ohkawa Y, Kimura H, Nikaido I. Genome-wide kinetic properties of transcriptional bursting in mouse embryonic stem cells. Sci Adv 2020; 6:eaaz6699. [PMID: 32596448 PMCID: PMC7299619 DOI: 10.1126/sciadv.aaz6699] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 10/03/2019] [Accepted: 03/25/2020] [Indexed: 05/03/2023]
Abstract
Transcriptional bursting is the stochastic activation and inactivation of promoters, contributing to cell-to-cell heterogeneity in gene expression. However, the mechanism underlying the regulation of transcriptional bursting kinetics (burst size and frequency) in mammalian cells remains elusive. In this study, we performed single-cell RNA sequencing to analyze the intrinsic noise and mRNA levels for elucidating the transcriptional bursting kinetics in mouse embryonic stem cells. Informatics analyses and functional assays revealed that transcriptional bursting kinetics was regulated by a combination of promoter- and gene body-binding proteins, including the polycomb repressive complex 2 and transcription elongation factors. Furthermore, large-scale CRISPR-Cas9-based screening identified that the Akt/MAPK signaling pathway regulated bursting kinetics by modulating transcription elongation efficiency. These results uncovered the key molecular mechanisms underlying transcriptional bursting and cell-to-cell gene expression noise in mammalian cells.
Affiliation(s)
- Hiroshi Ochiai
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-0046, Japan
- Genome Editing Innovation Center, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-0046, Japan
| | - Tetsutaro Hayashi
- Laboratory for Bioinformatics Research, RIKEN BDR, Wako, Saitama 351-0198, Japan
| | - Mana Umeda
- Laboratory for Bioinformatics Research, RIKEN BDR, Wako, Saitama 351-0198, Japan
| | - Mika Yoshimura
- Laboratory for Bioinformatics Research, RIKEN BDR, Wako, Saitama 351-0198, Japan
| | - Akihito Harada
- Division of Transcriptomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka 812-0054, Japan
| | - Yukiko Shimizu
- Department of Animal Medicine, National Center for Global Health and Medicine (NCGM), Tokyo 812-0054, Japan
| | - Kenta Nakano
- Department of Animal Medicine, National Center for Global Health and Medicine (NCGM), Tokyo 812-0054, Japan
| | - Noriko Saitoh
- Division of Cancer Biology, The Cancer Institute of JFCR, Tokyo 135-8550, Japan
| | - Zhe Liu
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, USA
| | - Takashi Yamamoto
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-0046, Japan
- Genome Editing Innovation Center, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-0046, Japan
| | - Tadashi Okamura
- Department of Animal Medicine, National Center for Global Health and Medicine (NCGM), Tokyo 812-0054, Japan
- Section of Animal Models, Department of Infectious Diseases, National Center for Global Health and Medicine (NCGM), Tokyo 812-0054, Japan
| | - Yasuyuki Ohkawa
- Division of Transcriptomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka 812-0054, Japan
| | - Hiroshi Kimura
- Cell Biology Center, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8503, Japan
| | - Itoshi Nikaido
- Laboratory for Bioinformatics Research, RIKEN BDR, Wako, Saitama 351-0198, Japan
- Bioinformatics Course, Master’s/Doctoral Program in Life Science Innovation (T-LSI), School of Integrative and Global Majors (SIGMA), University of Tsukuba, Wako 351-0198, Japan
|
43
|
Deeke JM, Gagnon-Bartsch JA. Stably expressed genes in single-cell RNA sequencing. J Bioinform Comput Biol 2020; 18:2040004. [DOI: 10.1142/s0219720020400041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Motivation: In single-cell RNA-sequencing (scRNA-seq) experiments, RNA transcripts are extracted and measured from isolated cells to understand gene expression at the cellular level. Measurements from this technology are affected by many technical artifacts, including batch effects. In analogous bulk gene expression experiments, external references, e.g. synthetic gene spike-ins often from the External RNA Controls Consortium (ERCC), may be incorporated into the experimental protocol for use in adjusting measurements for technical artifacts. In scRNA-seq experiments, the use of external spike-ins is controversial due to dissimilarities with endogenous genes and uncertainty about sufficient precision of their introduction. Instead, endogenous genes with highly stable expression could be used as references within scRNA-seq to help normalize the data. First, however, a specific notion of stable expression at the single-cell level needs to be formulated; genes could be stable in absolute expression, in proportion to cell volume, or in proportion to total gene expression. Different types of stable genes will be useful for different normalizations and will need different methods for discovery. Results: We compile gene sets whose products are associated with cellular structures and record these gene sets for future reuse and analysis. We find that genes whose final products are associated with the cytosolic ribosome have expression levels that are highly stable with respect to the total RNA content. Notably, these genes appear to be stable in bulk measurements as well. Supplementary information: Supplementary data are available through GitHub (johanngb/sc-stable).
Affiliation(s)
- Julie M. Deeke
- Department of Statistics, University of Michigan, 1085 South University Ave, Ann Arbor, MI 48109, USA
| | - Johann A. Gagnon-Bartsch
- Department of Statistics, University of Michigan, 1085 South University Ave, Ann Arbor, MI 48109, USA
|
44
|
Sun M, Zhang J. Allele-specific single-cell RNA sequencing reveals different architectures of intrinsic and extrinsic gene expression noises. Nucleic Acids Res 2020; 48:533-547. [PMID: 31799601 PMCID: PMC6954418 DOI: 10.1093/nar/gkz1134] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/28/2019] [Revised: 10/19/2019] [Accepted: 11/20/2019] [Indexed: 01/13/2023] Open
Abstract
Gene expression noise refers to the variation of the expression level of a gene among isogenic cells in the same environment, and has two sources: extrinsic noise arising from the disparity of the cell state and intrinsic noise arising from the stochastic process of gene expression in the same cell state. Due to the low throughput of the existing method for measuring the two noise components, the architectures of intrinsic and extrinsic expression noises remain elusive. Using allele-specific single-cell RNA sequencing, we here estimate the two noise components of 3975 genes in mouse fibroblast cells. Our analyses verify predicted influences of several factors such as the TATA-box and microRNA targeting on intrinsic or extrinsic noises and reveal gene function-associated noise trends implicating the action of natural selection. These findings unravel differential regulations, optimizations, and biological consequences of intrinsic and extrinsic noises and can aid the construction of desired synthetic circuits.
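Splitting noise into intrinsic and extrinsic components from two alleles measured in the same cells can be illustrated with the classical dual-reporter estimators, which treat the two alleles as two equivalent reporters. The abstract does not state which estimator the authors used, so the formulas below are an assumption for illustration only, and the function name and toy data are hypothetical. Both returned values are squared noise (variance-like, normalized by the product of the allele means).

```python
from statistics import fmean
from typing import Sequence, Tuple


def noise_decomposition(allele1: Sequence[float],
                        allele2: Sequence[float]) -> Tuple[float, float]:
    """Return (intrinsic_sq, extrinsic_sq) using dual-reporter estimators:

        intrinsic_sq = <(x - y)^2> / (2 <x><y>)
        extrinsic_sq = (<x y> - <x><y>) / (<x><y>)

    where x and y are per-cell expression of the two alleles of one gene.
    """
    if len(allele1) != len(allele2) or len(allele1) == 0:
        raise ValueError("need equal-length, non-empty per-cell vectors")
    m1, m2 = fmean(allele1), fmean(allele2)
    if m1 == 0 or m2 == 0:
        raise ValueError("both alleles need non-zero mean expression")
    mean_sq_diff = fmean([(x - y) ** 2 for x, y in zip(allele1, allele2)])
    mean_product = fmean([x * y for x, y in zip(allele1, allele2)])
    intrinsic_sq = mean_sq_diff / (2 * m1 * m2)
    extrinsic_sq = (mean_product - m1 * m2) / (m1 * m2)
    return intrinsic_sq, extrinsic_sq


if __name__ == "__main__":
    # Toy data: alleles that co-vary perfectly carry no intrinsic noise.
    print(noise_decomposition([1, 2, 3], [1, 2, 3]))
    # Toy data: anti-correlated alleles are dominated by intrinsic noise.
    print(noise_decomposition([1, 3], [3, 1]))
```

On sparse scRNA-seq counts these moment estimators are sensitive to technical sampling noise, so a real analysis would need to account for capture efficiency and depth before interpreting the two components.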
Affiliation(s)
- Mengyi Sun
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
| | - Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
|
45
|
Zhou Z, Xu B, Minn A, Zhang NR. DENDRO: genetic heterogeneity profiling and subclone detection by single-cell RNA sequencing. Genome Biol 2020; 21:10. [PMID: 31937348 PMCID: PMC6961311 DOI: 10.1186/s13059-019-1922-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/16/2019] [Accepted: 12/16/2019] [Indexed: 12/18/2022] Open
Abstract
Although scRNA-seq is now ubiquitously adopted in studies of intratumor heterogeneity, detection of somatic mutations and inference of clonal membership from scRNA-seq is currently unreliable. We propose DENDRO, an analysis method for scRNA-seq data that clusters single cells into genetically distinct subclones and reconstructs the phylogenetic tree relating the subclones. DENDRO utilizes transcribed point mutations and accounts for technical noise and expression stochasticity. We benchmark DENDRO and demonstrate its application on simulation data and real data from three cancer types. In particular, on a mouse melanoma model in response to immunotherapy, DENDRO delineates the role of neoantigens in treatment response.
Affiliation(s)
- Zilu Zhou
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA USA
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA USA
| | - Bihui Xu
- Department of Radiation Oncology, Parker Institute for Cancer Immunotherapy, Abramson Family Cancer Research Institute, Graduate Group in Cell and Molecular Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA
| | - Andy Minn
- Department of Radiation Oncology, Parker Institute for Cancer Immunotherapy, Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA
| | - Nancy R. Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA USA
|
46
|
Dong M, Thennavan A, Urrutia E, Li Y, Perou CM, Zou F, Jiang Y. SCDC: bulk gene expression deconvolution by multiple single-cell RNA sequencing references. Brief Bioinform 2020; 22:416-427. [PMID: 31925417 PMCID: PMC7820884 DOI: 10.1093/bib/bbz166] [Citation(s) in RCA: 121] [Impact Index Per Article: 30.3] [Received: 09/03/2019] [Revised: 11/04/2019] [Accepted: 12/02/2019] [Indexed: 12/14/2022] Open
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.
Affiliation(s)
| | | | | | | | | | - Fei Zou
- Corresponding authors: Fei Zou and Yuchao Jiang, Department of Biostatistics and Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
| | - Yuchao Jiang
- Corresponding authors: Fei Zou and Yuchao Jiang, Department of Biostatistics and Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
|
47
|
Choi K, Raghupathy N, Churchill GA. A Bayesian mixture model for the analysis of allelic expression in single cells. Nat Commun 2019; 10:5188. [PMID: 31729374 PMCID: PMC6858378 DOI: 10.1038/s41467-019-13099-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/16/2018] [Accepted: 10/09/2019] [Indexed: 11/09/2022] Open
Abstract
Allele-specific expression (ASE) at single-cell resolution is a critical tool for understanding the stochastic and dynamic features of gene expression. However, low read coverage and high biological variability present challenges for analyzing ASE. We demonstrate that discarding multi-mapping reads leads to higher variability in estimates of allelic proportions, an increased frequency of sampling zeros, and can lead to spurious findings of dynamic and monoallelic gene expression. Here, we report a method for ASE analysis from single-cell RNA-Seq data that accurately classifies allelic expression states and improves estimation of allelic proportions by pooling information across cells. We further demonstrate that combining information across cells using a hierarchical mixture model reduces sampling variability without sacrificing cell-to-cell heterogeneity. We applied our approach to re-evaluate the statistical independence of allelic bursting and track changes in the allele-specific expression patterns of cells sampled over a developmental time course.
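The benefit of pooling information across cells can be shown with a much simpler stand-in than the hierarchical mixture model described above: a beta-binomial posterior mean with a shared prior centred on the pooled allelic proportion. This is a deliberately reduced sketch, not the authors' method. The function name, the `(maternal_reads, paternal_reads)` layout, and the `prior_strength` pseudo-count are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def pooled_allelic_proportions(counts: Sequence[Tuple[int, int]],
                               prior_strength: float = 2.0) -> List[float]:
    """Shrink per-cell allelic proportions toward the pooled proportion.

    counts: one (maternal_reads, paternal_reads) pair per cell for a gene.
    prior_strength: total pseudo-counts given to the shared Beta prior
        (an assumed tuning constant, not estimated from the data here).
    """
    total_maternal = sum(m for m, _ in counts)
    total_reads = sum(m + p for m, p in counts)
    if total_reads == 0:
        raise ValueError("no allelic reads observed for this gene")
    pooled = total_maternal / total_reads
    alpha = prior_strength * pooled  # prior pseudo-counts for the maternal allele
    # Posterior mean of a Beta(alpha, prior_strength - alpha) prior
    # updated with each cell's own reads.
    return [(m + alpha) / (m + p + prior_strength) for m, p in counts]


if __name__ == "__main__":
    # A cell with a single maternal read, a well-covered balanced cell,
    # and a cell with no allelic reads at all.
    cells = [(1, 0), (5, 5), (0, 0)]
    for (m, p), est in zip(cells, pooled_allelic_proportions(cells)):
        print(f"maternal={m} paternal={p} shrunk_proportion={est:.3f}")
```

A cell with one read is no longer called fully monoallelic, and a cell with zero reads falls back to the pooled estimate. The published model additionally classifies cells into allelic expression states, which this sketch does not attempt.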
Affiliation(s)
- Kwangbom Choi
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
| | | | - Gary A Churchill
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA.
|
48
|
Zeng T, Dai H. Single-Cell RNA Sequencing-Based Computational Analysis to Describe Disease Heterogeneity. Front Genet 2019; 10:629. [PMID: 31354786 PMCID: PMC6640157 DOI: 10.3389/fgene.2019.00629] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 01/17/2019] [Accepted: 06/17/2019] [Indexed: 12/25/2022] Open
Abstract
The trillions of cells in the human body can be viewed as elementary but essential biological units that achieve different body states, but the low resolution of previous cell isolation and measurement approaches limits our understanding of the cell-specific molecular profiles. The recent establishment and rapid growth of single-cell sequencing technology has facilitated the identification of molecular profiles of heterogeneous cells, especially on the transcription level of single cells [single-cell RNA sequencing (scRNA-seq)]. As a novel method, the robustness of scRNA-seq under changing conditions will determine its practical potential in major research programs and clinical applications. In this review, we first briefly presented the scRNA-seq-related methods from the point of view of experiments and computation. Then, we compared several state-of-the-art scRNA-seq analysis frameworks mainly by analyzing their performance robustness on independent scRNA-seq datasets for the same complex disease. Finally, we elaborated on our hypothesis on consensus scRNA-seq analysis and summarized the potential indicative and predictive roles of individual cells in understanding disease heterogeneity by single-cell technologies.
Affiliation(s)
- Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
|
49
|
Zhao C, Xie S, Wu H, Luan Y, Hu S, Ni J, Lin R, Zhao S, Zhang D, Li X. Quantification of allelic differential expression using a simple Fluorescence primer PCR-RFLP-based method. Sci Rep 2019; 9:6334. [PMID: 31004110 PMCID: PMC6474871 DOI: 10.1038/s41598-019-42815-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2018] [Accepted: 03/29/2019] [Indexed: 12/04/2022] Open
Abstract
Allelic differential expression (ADE) is common in diploid organisms, and is often the key reason for specific phenotype variations. Thus, ADE detection is important for identification of major genes and causal mutations. To date, sensitive and simple methods to detect ADE are still lacking. In this study, we have developed an accurate, simple, and sensitive method, named fluorescence primer PCR-RFLP quantitative method (fPCR-RFLP), for ADE analysis. This method involves two rounds of PCR amplification using a pair of primers, one of which is double-labeled with an overhang 6-FAM. The two alleles are then separated by RFLP and quantified by fluorescence density. fPCR-RFLP could precisely distinguish ADE across a range of 1- to 32-fold differences. Using this method, we verified that PLAG1 and KIT, two candidate genes related to growth rate and immune response traits of pigs, show ADE both at different developmental stages and in different tissues. Our data demonstrate that fPCR-RFLP is an accurate and sensitive method for detecting ADE at both the DNA and RNA levels. Therefore, this powerful tool provides a way to analyze mutations that cause ADE.
Affiliation(s)
- Changzhi Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
- Shengsong Xie
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
- Hui Wu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
- Yu Luan
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
- Suqin Hu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
- Juan Ni
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
- Ruiyi Lin
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
- Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
- Dingxiao Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
- Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
|
50
|
Chen G, Ning B, Shi T. Single-Cell RNA-Seq Technologies and Related Computational Data Analysis. Front Genet 2019; 10:317. [PMID: 31024627 PMCID: PMC6460256 DOI: 10.3389/fgene.2019.00317] [Citation(s) in RCA: 497] [Impact Index Per Article: 99.4] [Received: 12/05/2018] [Accepted: 03/21/2019] [Indexed: 12/15/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies allow the dissection of gene expression at single-cell resolution, which has revolutionized transcriptomic studies. A number of scRNA-seq protocols have been developed, each with unique features and distinct advantages and disadvantages. Due to technical limitations and biological factors, scRNA-seq data are noisier and more complex than bulk RNA-seq data. The high variability of scRNA-seq data raises computational challenges in data analysis. Although an increasing number of bioinformatics methods have been proposed for analyzing and interpreting scRNA-seq data, novel algorithms are required to ensure the accuracy and reproducibility of results. In this review, we provide an overview of currently available single-cell isolation protocols and scRNA-seq technologies, and discuss the methods for diverse scRNA-seq data analyses including quality control, read mapping, gene expression quantification, batch effect correction, normalization, imputation, dimensionality reduction, feature selection, cell clustering, trajectory inference, differential expression calling, alternative splicing, allelic expression, and gene regulatory network reconstruction. Further, we outline the prospective development and applications of scRNA-seq technologies.
Affiliation(s)
- Geng Chen
- Center for Bioinformatics and Computational Biology, and Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China.
- Baitang Ning
- National Center for Toxicological Research, United States Food and Drug Administration, Jefferson, AR, United States.
- Tieliu Shi
- Center for Bioinformatics and Computational Biology, and Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China.
|