1
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. SCIENCE CHINA. LIFE SCIENCES 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique developmental trajectories and molecular features. Exploring how genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA sequencing technology, single-cell sequencing technologies have advanced rapidly. They have expanded horizontally to include the single-cell genome, epigenome, proteome, and metabolome, and vertically to integrate multi-omics data and incorporate additional information such as spatial transcriptomics and CRISPR screening. Single-cell omics represents a groundbreaking advance in the biomedical field, offering profound insights into complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on methodology. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
  - Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
  - Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
  - Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
  - Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
  - Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Fangqing Zhao
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
- Jing-Dong J Han
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Qi Liu
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Xiaohui Fan
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
  - Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China
- Chen Li
  - Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Chenfei Wang
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Tieliu Shi
  - Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
  - Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
  - Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China

2
Missarova A, Dann E, Rosen L, Satija R, Marioni J. Leveraging neighborhood representations of single-cell data to achieve sensitive DE testing with miloDE. Genome Biol 2024; 25:189. [PMID: 39026254 PMCID: PMC11256449 DOI: 10.1186/s13059-024-03334-3] [Received: 09/01/2023] [Accepted: 07/10/2024] [Indexed: 07/20/2024]
Abstract
Single-cell RNA sequencing enables testing for differential expression (DE) between conditions at the cell-type level. While powerful, such approaches have a key limitation: the sensitivity of DE testing is dictated by the sensitivity of clustering, which is often suboptimal. To overcome this, we present miloDE, a cluster-free framework for DE testing (available as an open-source R package). We illustrate the performance of miloDE on both simulated and real data. Using miloDE, we identify a transient hemogenic endothelia-like state in mouse embryos lacking Tal1 and detect distinct programs during macrophage activation in idiopathic pulmonary fibrosis.
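The idea of cluster-free DE testing can be illustrated with a minimal Python sketch. It is not the miloDE implementation (miloDE is an R package, and its neighborhood construction and statistical testing differ); the helper names `knn_neighborhoods` and `neighborhood_log_fc` are hypothetical, the embedding and counts are simulated, and a plain per-neighborhood log2 fold change stands in for a proper DE test. It shows how overlapping kNN neighborhoods sampled on a cell embedding can localize a condition effect that is confined to one cell state.

```python
import numpy as np


def knn_neighborhoods(embedding, n_index, k, rng):
    # Hypothetical helper: sample n_index "index" cells and, for each, take
    # its k nearest cells (brute-force Euclidean distance, index cell included).
    n_cells = embedding.shape[0]
    index_cells = rng.choice(n_cells, size=n_index, replace=False)
    hoods = []
    for i in index_cells:
        dists = np.linalg.norm(embedding - embedding[i], axis=1)
        hoods.append(np.argsort(dists)[:k])
    return hoods


def neighborhood_log_fc(counts, condition, hoods, pseudocount=1.0):
    # Per-neighborhood log2 fold change (condition 1 vs condition 0) for every
    # gene. A real analysis would run a proper statistical test here instead.
    lfc = np.full((len(hoods), counts.shape[1]), np.nan)
    for h, members in enumerate(hoods):
        cond = condition[members]
        if cond.min() == cond.max():
            continue  # only one condition present: leave this row as NaN
        sub = counts[members]
        mean1 = sub[cond == 1].mean(axis=0)
        mean0 = sub[cond == 0].mean(axis=0)
        lfc[h] = np.log2((mean1 + pseudocount) / (mean0 + pseudocount))
    return lfc


# Toy data: two well-separated cell states, random condition labels, and a
# gene (column 0) that responds to condition 1 only in state 1.
rng = np.random.default_rng(0)
n_cells = 400
state = np.repeat([0, 1], n_cells // 2)
embedding = rng.normal(size=(n_cells, 2)) + 10.0 * state[:, None]
condition = rng.integers(0, 2, size=n_cells)
rate = np.full((n_cells, 3), 5.0)
rate[(state == 1) & (condition == 1), 0] = 40.0
counts = rng.poisson(rate)

hoods = knn_neighborhoods(embedding, n_index=20, k=50, rng=rng)
lfc = neighborhood_log_fc(counts, condition, hoods)
hood_state = np.array([state[h[0]] for h in hoods])  # h[0] is the index cell
```

Rows of `lfc` for neighborhoods drawn from state 1 show a large positive fold change for gene 0, while those from state 0 stay near zero; an analysis that pooled both states into one cluster would dilute this signal.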
Affiliation(s)
- Alsu Missarova
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Emma Dann
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Leah Rosen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Rahul Satija
  - Center for Genomics and Systems Biology, NYU, New York, USA
  - New York Genome Center, New York, USA
- John Marioni
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK

3
Duhan L, Kumari D, Naime M, Parmar VS, Chhillar AK, Dangi M, Pasrija R. Single-cell transcriptomics: background, technologies, applications, and challenges. Mol Biol Rep 2024; 51:600. [PMID: 38689046 DOI: 10.1007/s11033-024-09553-y] [Received: 02/09/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]
Abstract
Single-cell sequencing was developed as a high-throughput tool to elucidate unusual and transient cell states that are barely detectable in bulk measurements. It reveals the evolutionary status of cells and differences between populations; helps identify unique cell subtypes and states; and uncovers regulatory relationships between genes, targets and molecular mechanisms in disease processes, tumor heterogeneity, the state of the immune environment, and more. The high cost and technical limitations of single-cell sequencing initially prevented its widespread application, but with advances in research, numerous new single-cell sequencing techniques have been developed, lowering the cost barrier. Many single-cell sequencing platforms and bioinformatics methods have recently become commercially available, allowing researchers to make fascinating observations, and they are increasingly used across various industries. Many protocols have been developed in this context, and each technique has unique characteristics, capabilities and challenges. This review presents the latest advances in single-cell transcriptomics technologies, including single-cell transcriptomics approaches, workflows and statistical approaches to data processing, as well as the potential advances, applications, opportunities and challenges of single-cell transcriptomics technology. It also provides an overview of the entry points into spatial transcriptomics and multi-omics.
Affiliation(s)
- Lucky Duhan
  - Department of Biochemistry, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Deepika Kumari
  - Department of Biochemistry, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Mohammad Naime
  - Central Research Institute of Unani Medicine (Under Central Council for Research in Unani Medicine, Ministry of Ayush, Govt of India), Uttar Pradesh, Lucknow, India
- Virinder S Parmar
  - CUNY-Graduate Center and Departments of Chemistry, Nanoscience Program, City College & Medgar Evers College, The City University of New York, 1638 Bedford Avenue, Brooklyn, NY, 11225, USA
  - Institute of Click Chemistry Research and Studies, Amity University, Noida, Uttar Pradesh, 201303, India
- Anil K Chhillar
  - Centre for Biotechnology, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Mehak Dangi
  - Centre for Bioinformatics, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Ritu Pasrija
  - Department of Biochemistry, Maharshi Dayanand University, Rohtak, Haryana, 124001, India

4
Guo X, Ning J, Chen Y, Liu G, Zhao L, Fan Y, Sun S. Recent advances in differential expression analysis for single-cell RNA-seq and spatially resolved transcriptomic studies. Brief Funct Genomics 2024; 23:95-109. [PMID: 37022699 DOI: 10.1093/bfgp/elad011] [Received: 07/08/2022] [Revised: 12/09/2022] [Accepted: 03/10/2023] [Indexed: 04/07/2023]
Abstract
Differential expression (DE) analysis is a necessary step in the analysis of single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data. Unlike in traditional bulk RNA-seq, DE analysis of scRNA-seq or SRT data has unique characteristics that may make DE genes harder to detect. Moreover, the plethora of DE tools, each working under different assumptions, makes it difficult to choose an appropriate one. In addition, a comprehensive review of detecting DE genes in scRNA-seq or SRT data from multi-condition, multi-sample experimental designs is lacking. To bridge this gap, we first focus on the challenges of DE detection, then highlight potential opportunities to facilitate further progress in scRNA-seq and SRT analysis, and finally provide insights and guidance for selecting appropriate DE tools or developing new computational DE methods.
Affiliation(s)
- Xiya Guo
  - School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
  - Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Jin Ning
  - School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
  - Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yuanze Chen
  - School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
  - Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Guoliang Liu
  - School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
  - Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Liyan Zhao
  - School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
  - Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yue Fan
  - School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
  - Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Shiquan Sun
  - School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
  - Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China

5
Gorin G, Vastola JJ, Pachter L. Studying stochastic systems biology of the cell with single-cell genomics data. Cell Syst 2023; 14:822-843.e22. [PMID: 37751736 PMCID: PMC10725240 DOI: 10.1016/j.cels.2023.08.004] [Received: 05/29/2023] [Revised: 08/16/2023] [Accepted: 08/25/2023] [Indexed: 09/28/2023]
Abstract
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
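The generating-function approach can be illustrated with a standard textbook case, chosen here for illustration only; the paper treats a broader family of transcription and sequencing models. Assume bursty transcription with burst frequency $k$, geometrically distributed burst sizes with mean $b$, and degradation rate $\gamma$, so the steady-state molecule count $N$ has the probability generating function $G_{\mathrm{bio}}$ below. Assume further that each molecule is captured independently with probability $p$ during library construction (binomial thinning). Substituting the thinning generating function $1 - p + pz$ into $G_{\mathrm{bio}}$ gives the observed-count distribution:

```latex
\begin{aligned}
G_{\mathrm{bio}}(z) &= \mathbb{E}\!\left[z^{N}\right] = \bigl(1 - b\,(z-1)\bigr)^{-k/\gamma}, \\
G_{\mathrm{obs}}(z) &= G_{\mathrm{bio}}\bigl(1 - p + p z\bigr) = \bigl(1 - b p\,(z-1)\bigr)^{-k/\gamma}, \\
\mathbb{E}\!\left[N_{\mathrm{obs}}\right] &= G_{\mathrm{obs}}'(1) = \frac{k}{\gamma}\, b p .
\end{aligned}
```

Under these assumptions, the capture step keeps the count negative binomial but rescales the apparent burst size from $b$ to $bp$, so the biological burst size and the technical capture probability enter the observed data only through their product.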
Affiliation(s)
- Gennady Gorin
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- John J Vastola
  - Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA

6
Liu H, Ma W. scHiCDiff: detecting differential chromatin interactions in single-cell Hi-C data. Bioinformatics 2023; 39:btad625. [PMID: 37847655 PMCID: PMC10598576 DOI: 10.1093/bioinformatics/btad625] [Received: 09/28/2022] [Revised: 08/15/2023] [Accepted: 10/16/2023] [Indexed: 10/19/2023]
Abstract
SUMMARY Here, we present the scHiCDiff software tool, which provides both nonparametric tests and parametric models to detect differential chromatin interactions (DCIs) from single-cell Hi-C data. We thoroughly evaluated the scHiCDiff methods on both simulated and real data. Our results demonstrate that scHiCDiff, especially the zero-inflated negative binomial model option, can effectively detect reliable and consistent single-cell DCIs between two conditions, thereby facilitating the study of cell-type-specific variations in chromatin structure at the single-cell level. AVAILABILITY AND IMPLEMENTATION scHiCDiff is implemented in R and freely available at GitHub (https://github.com/wmalab/scHiCDiff).
Affiliation(s)
- Huiling Liu
  - Department of Statistics, University of California Riverside, Riverside, CA 92521, United States
- Wenxiu Ma
  - Department of Statistics, University of California Riverside, Riverside, CA 92521, United States

7
Liu Y, Zhao J, Adams TS, Wang N, Schupp JC, Wu W, McDonough JE, Chupp GL, Kaminski N, Wang Z, Yan X. iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects. BMC Bioinformatics 2023; 24:318. [PMID: 37608264 PMCID: PMC10463720 DOI: 10.1186/s12859-023-05432-8] [Received: 12/30/2022] [Accepted: 07/18/2023] [Indexed: 08/24/2023]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) technology has enabled assessment of transcriptome-wide changes at single-cell resolution. Due to heterogeneity in environmental exposure and genetic background across subjects, the subject effect is a major source of variation in scRNA-seq data with multiple subjects, and it severely confounds cell-type-specific differential expression (DE) analysis. Moreover, dropout events are prevalent in scRNA-seq data, leading to an excessive number of zeros, which further aggravates the challenge of DE analysis. RESULTS We developed iDESC to detect cell-type-specific DE genes between two groups of subjects in scRNA-seq data. iDESC uses a zero-inflated negative binomial mixed model to account for both the subject effect and dropouts. The prevalence of dropout events (dropout rate) was shown to depend on gene expression level, which is modeled by pooling information across genes. The subject effect is modeled as a random effect in the log-mean of the negative binomial component. We evaluated and compared the performance of iDESC with eleven existing DE analysis methods. Using simulated data, we demonstrated that iDESC had well-controlled type I error and higher power than the existing methods. Applying the methods with well-controlled type I error to three real scRNA-seq datasets from the same tissue and disease showed that iDESC achieved the best consistency between datasets and the best disease relevance. CONCLUSIONS iDESC achieved more accurate and robust DE analysis results by separating the subject effect from the disease effect while accounting for dropouts, suggesting the importance of considering subject effect and dropouts in the DE analysis of scRNA-seq data with multiple subjects.
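The likelihood building blocks named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not iDESC's code or API: the function names are hypothetical, the negative binomial is parameterized by mean `mu` and dispersion `theta`, the dropout probability `pi` is a fixed number (iDESC instead links the dropout rate to expression level), and the subject effect is plugged in as one realized value rather than integrated over as a random effect.

```python
import math


def nb_pmf(y, mu, theta):
    # Negative binomial pmf parameterized by mean mu and dispersion (size)
    # theta, so that Var = mu + mu**2 / theta.
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi):
    # Zero-inflated NB: with probability pi the observation is an extra
    # (dropout) zero; otherwise it is drawn from NB(mu, theta).
    # Assumption: pi is a fixed number here, not a function of expression.
    nb_part = (1.0 - pi) * nb_pmf(y, mu, theta)
    return nb_part + pi if y == 0 else nb_part


def log_mean(beta0, beta_group, group, subject_effect):
    # log(mu) = beta0 + beta_group * group + b_subject. In a mixed model
    # b_subject ~ Normal(0, sigma^2) is a random effect that is integrated
    # out; this sketch simply plugs in one realized value.
    return beta0 + beta_group * group + subject_effect


mu = math.exp(log_mean(beta0=1.0, beta_group=0.5, group=1, subject_effect=-0.2))
probs = [zinb_pmf(y, mu, theta=2.0, pi=0.3) for y in range(200)]
```

The zero-inflation term only adds probability mass at zero, so `probs[0]` exceeds the plain negative binomial probability of a zero, while the probabilities still sum to one.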
Affiliation(s)
- Yunqing Liu
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Jiayi Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Taylor S Adams
  - Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Ningya Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Jonas C Schupp
  - Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
  - Department of Respiratory Medicine, Hannover Medical School and Biomedical Research in End-Stage and Obstructive Lung Disease Hannover, German Center for Lung Research (DZL), Hannover, Germany
- Weimiao Wu
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
  - Meta Platforms, Inc, Cambridge, USA
- John E McDonough
  - Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Geoffrey L Chupp
  - Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Naftali Kaminski
  - Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Xiting Yan
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
  - Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA

8
Kervadec A, Kezos J, Ni H, Yu M, Marchant J, Spiering S, Kannan S, Kwon C, Andersen P, Bodmer R, Grandi E, Ocorr K, Colas AR. Multiplatform modeling of atrial fibrillation identifies phospholamban as a central regulator of cardiac rhythm. Dis Model Mech 2023; 16:dmm049962. [PMID: 37293707 PMCID: PMC10387351 DOI: 10.1242/dmm.049962] [Received: 10/25/2022] [Accepted: 05/26/2023] [Indexed: 06/10/2023]
Abstract
Atrial fibrillation (AF) is a common and genetically inheritable form of cardiac arrhythmia; however, it is currently not known how these genetic predispositions contribute to the initiation and/or maintenance of AF-associated phenotypes. One major barrier to progress is the lack of experimental systems to investigate the effects of gene function on rhythm parameters in models with human atrial and whole-organ relevance. Here, we assembled a multi-model platform enabling high-throughput characterization of the effects of gene function on action potential duration and rhythm parameters using human induced pluripotent stem cell-derived atrial-like cardiomyocytes and a Drosophila heart model, and validation of the findings using computational models of human adult atrial myocytes and tissue. As proof of concept, we screened 20 AF-associated genes and identified phospholamban loss of function as a top conserved hit that shortens action potential duration and increases the incidence of arrhythmia phenotypes upon stress. Mechanistically, our study reveals that phospholamban regulates rhythm homeostasis by functionally interacting with L-type Ca²⁺ channels and NCX. In summary, our study illustrates how a multi-model system approach paves the way for the discovery and molecular delineation of gene regulatory networks controlling atrial rhythm with application to AF.
Affiliation(s)
- Anaïs Kervadec
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- James Kezos
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Haibo Ni
  - Department of Pharmacology, UC Davis, Davis, CA 95616, USA
- Michael Yu
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- James Marchant
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Sean Spiering
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Suraj Kannan
  - Johns Hopkins University, Baltimore, MD 21205, USA
- Chulan Kwon
  - Johns Hopkins University, Baltimore, MD 21205, USA
- Rolf Bodmer
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Karen Ocorr
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Alexandre R. Colas
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA

9
Gorin G, Vastola JJ, Pachter L. Studying stochastic systems biology of the cell with single-cell genomics data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.17.541250. [PMID: 37292934 PMCID: PMC10245677 DOI: 10.1101/2023.05.17.541250] [Indexed: 06/10/2023]
Abstract
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
Affiliation(s)
- Gennady Gorin
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, 91125
- John J. Vastola
  - Department of Neurobiology, Harvard Medical School, Boston, MA, 02115
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125

10
Luo J, Wu X, Cheng Y, Chen G, Wang J, Song X. Expression quantitative trait locus studies in the era of single-cell omics. Front Genet 2023; 14:1182579. [PMID: 37284065 PMCID: PMC10239882 DOI: 10.3389/fgene.2023.1182579] [Received: 03/10/2023] [Accepted: 04/26/2023] [Indexed: 06/08/2023]
Abstract
Genome-wide association studies have revealed that the regulation of gene expression bridges genetic variants and complex phenotypes. Profiling of the bulk transcriptome coupled with linkage analysis, i.e., expression quantitative trait locus (eQTL) mapping, has advanced our understanding of the relationship between genetic variants and gene regulation in the context of complex phenotypes. However, bulk transcriptomics has inherent limitations, as the regulation of gene expression tends to be cell-type-specific. The advent of single-cell RNA-seq technology now enables the identification of cell-type-specific regulation of gene expression through single-cell eQTL (sc-eQTL) mapping. In this review, we first provide an overview of sc-eQTL studies, including data processing and the sc-eQTL mapping procedure. We then discuss the benefits and limitations of sc-eQTL analyses. Finally, we present an overview of the current and future applications of sc-eQTL discoveries.
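One common sc-eQTL mapping strategy, pseudobulk aggregation per donor within one cell type followed by linear regression on genotype dosage, can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any specific study in the review: the helper names are hypothetical, expression values are simulated and treated as already normalized, and a single variant-gene pair is tested.

```python
import numpy as np


def pseudobulk_by_donor(expr, donor_ids, n_donors):
    # Mean expression of one gene per donor, within a single cell type.
    sums = np.bincount(donor_ids, weights=expr, minlength=n_donors)
    n_cells = np.bincount(donor_ids, minlength=n_donors)
    return sums / n_cells


def eqtl_fit(pseudobulk, dosage):
    # Ordinary least squares of donor-level expression on genotype dosage
    # (0, 1 or 2 copies of the alternative allele). Returns (intercept, slope).
    X = np.column_stack([np.ones(len(dosage)), dosage])
    coef, *_ = np.linalg.lstsq(X, pseudobulk, rcond=None)
    return coef[0], coef[1]


# Simulated example: 60 donors, 50 cells each, true allelic effect of 0.8.
rng = np.random.default_rng(1)
n_donors, cells_per_donor = 60, 50
dosage = rng.integers(0, 3, size=n_donors)
donor_ids = np.repeat(np.arange(n_donors), cells_per_donor)
donor_noise = rng.normal(0.0, 0.2, size=n_donors)
expr = (2.0 + 0.8 * dosage[donor_ids] + donor_noise[donor_ids]
        + rng.normal(0.0, 1.0, size=donor_ids.size))

pseudobulk = pseudobulk_by_donor(expr, donor_ids, n_donors)
intercept, slope = eqtl_fit(pseudobulk, dosage)
```

Aggregating per donor treats the donor, not the cell, as the unit of replication, since genotype varies between donors rather than between cells; in practice, covariates such as genotype principal components are usually added to the regression.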
Affiliation(s)
- Jie Luo
  - State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Xinyi Wu
  - Institute of Vegetables, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Yuan Cheng
  - Institute of Vegetables, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Guang Chen
  - State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Jian Wang
  - State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Xijiao Song
  - State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China

11
Luo X, Qin F, Xiao F, Cai G. BISC: accurate inference of transcriptional bursting kinetics from single-cell transcriptomic data. Brief Bioinform 2022; 23:6793779. [PMID: 36326081 DOI: 10.1093/bib/bbac464] [Received: 05/26/2022] [Revised: 09/20/2022] [Accepted: 09/27/2022] [Indexed: 11/06/2022]
Abstract
Gene expression in mammalian cells is inherently stochastic, and mRNAs are synthesized in discrete bursts. Single-cell transcriptomics provides an unprecedented opportunity to explore the transcriptome-wide kinetics of transcriptional bursting. However, current analysis methods provide limited accuracy in bursting inference because of the substantial noise inherent to single-cell transcriptomic data. In this study, we developed BISC, a Bayesian method for inferring bursting parameters from single-cell transcriptomic data. Based on a beta-gamma-Poisson model, BISC modeled the mean-variance dependency to achieve accurate estimation of bursting parameters from noisy data. Evaluation based on both simulations and real intron sequential RNA fluorescence in situ hybridization data showed improved accuracy and reliability of BISC over existing methods, especially for genes with low expression values. Further application of BISC found that bursting frequency, but not bursting size, was strongly associated with gene expression regulation. Moreover, our analysis provided new mechanistic insights into the functional roles of enhancers and super-enhancers, which modulate both bursting frequency and size. BISC also provides a downstream framework to identify genes with differential bursting (in frequency and size separately) between samples under different conditions. Applied to multiple datasets (a mouse embryonic cell and fibroblast dataset, a human immune cell dataset and a human pancreatic cell dataset), BISC identified known cell-type signature genes that were missed by differential expression analysis, providing additional insights into cell-specific stochastic gene transcription. Applied to datasets of human lung and colon cancers, BISC successfully detected tumor signature genes based on alterations in bursting kinetics, which illustrates its value for understanding disease development in terms of transcriptional bursting.
Collectively, BISC provides a new tool for accurately inferring bursting kinetics and detecting differential bursting genes. This study also produced new insights into the role of transcriptional bursting in regulating gene expression, cell identity and tumor progression.
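BISC's Bayesian beta-gamma-Poisson estimator is not reproduced here. For orientation, the minimal Python sketch below shows the naive moment-based baseline that such methods improve on. It assumes the bursty limit of the two-state (telegraph) model, in which counts follow a negative binomial distribution with mean f·b and variance f·b·(1 + b), where f is the burst frequency and b the mean burst size (both in units of the mRNA degradation rate). It ignores capture efficiency and other technical noise, which is exactly the noise the abstract identifies as the weakness of such estimators. All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam (< ~500)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_bursty_counts(freq, size, n_cells, seed=0):
    """Gamma-Poisson (negative binomial) counts: bursty limit of the telegraph model."""
    rng = random.Random(seed)
    # Per-cell rate ~ Gamma(shape=freq, scale=size): mean freq*size, variance freq*size**2.
    return [poisson(rng.gammavariate(freq, size), rng) for _ in range(n_cells)]


def bursting_moments(counts):
    """Moment estimates (burst frequency, burst size), or None if not over-dispersed."""
    m = mean(counts)
    v = variance(counts)  # sample variance, n - 1 denominator
    if m == 0 or v <= m:
        return None  # empty or Poisson-like: bursting parameters not identifiable
    size = v / m - 1.0  # Fano factor = 1 + burst size
    freq = m / size     # mean = frequency * size
    return freq, size


if __name__ == "__main__":
    counts = simulate_bursty_counts(freq=2.0, size=5.0, n_cells=20000)
    print(bursting_moments(counts))
```

The Fano factor (variance/mean) fixes the burst size and the mean then fixes the frequency; genes whose Fano factor is at or below 1 return None, because Poisson-like counts carry no bursting information under this model.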
Affiliation(s)
- Xizhi Luo
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Fei Qin
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Feifei Xiao
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
- Guoshuai Cai
- Department of Environmental Health Science, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
12
Sardoo AM, Zhang S, Ferraro TN, Keck TM, Chen Y. Decoding brain memory formation by single-cell RNA sequencing. Brief Bioinform 2022; 23:6713514. [PMID: 36156112 PMCID: PMC9677489 DOI: 10.1093/bib/bbac412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 07/10/2022] [Accepted: 08/25/2022] [Indexed: 12/14/2022]
Abstract
How distinct memories are formed and stored in the brain is an important and fundamental question in neuroscience and computational biology. A population of neurons, termed engram cells, represents the physiological manifestation of a specific memory trace and is characterized by dynamic changes in gene expression, which in turn alter the synaptic connectivity and excitability of these cells. Recent applications of single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) are promising approaches for delineating the dynamic expression profiles in these subsets of neurons, and thus for understanding memory-specific genes, their combinatorial patterns and regulatory networks. The aim of this article is to review and discuss the experimental and computational procedures of sc/snRNA-seq, new studies of molecular mechanisms of memory aided by sc/snRNA-seq in human brain diseases and related mouse models, and computational challenges in understanding the regulatory mechanisms underlying long-term memory formation.
Affiliation(s)
- Atlas M Sardoo
- Department of Biological & Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Thomas N Ferraro
- Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, NJ 08103, USA
- Thomas M Keck
- Department of Biological & Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Department of Chemistry & Biochemistry, Rowan University, Glassboro, NJ 08028, USA
- Yong Chen
- Corresponding author. Yong Chen, Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA. Tel.: +1 856 256 4500; E-mail:
13
Gorin G, Fang M, Chari T, Pachter L. RNA velocity unraveled. PLoS Comput Biol 2022; 18:e1010492. [PMID: 36094956 PMCID: PMC9499228 DOI: 10.1371/journal.pcbi.1010492] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 03/17/2022] [Revised: 09/22/2022] [Accepted: 08/14/2022] [Indexed: 11/24/2022]
Abstract
We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems. Single-cell sequencing data are snapshots of biological processes, making it challenging to infer dynamic relationships between cell types. RNA velocity attempts to bypass this challenge by treating the unspliced RNA content as a proxy for spliced RNA content in the near future, and using this “extrapolation” to build directional relationships. However, the method, as implemented in several software packages, is not yet reliable enough to be actionable, in part due to the large number of arbitrary, user-set hyperparameters, as well as fundamental incompatibilities between the biophysics of transcription in the living cell and the models used throughout the velocity workflows. In this study, we review these issues, and use existing results from the fields of stochastic modeling and fluorescence transcriptomics to develop an alternative theoretical framework. We show that our framework can facilitate the development and inference of physically consistent models for sequencing data, as well as the unification of single-cell analyses to self-consistently treat variation due to cell type dynamics and identities, the stochasticity inherent to single-molecule processes, and the uncertainty introduced by sequencing experiments.
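To make the assumption under critique concrete, here is a minimal Python sketch of the steady-state RNA velocity heuristic popularized by velocyto and scVelo, in a simplified variant; it is not the Markovian framework this paper proposes. With the splicing rate beta fixed to 1, each gene obeys du/dt = alpha - u and ds/dt = u - gamma*s. Cells at the extremes of spliced abundance are assumed to sit near steady state, where u ≈ gamma*s, so gamma is fit by least squares through the origin on those cells, and the velocity is v = u - gamma*s. The 5% quantile, ranking cells by spliced counts alone, and the function names are illustrative assumptions.

```python
def fit_gamma_steady_state(u, s, quantile=0.05):
    """Fit gamma in u = gamma * s on cells with extreme spliced abundance."""
    n = len(s)
    k = max(1, int(n * quantile))
    order = sorted(range(n), key=lambda i: s[i])
    extremes = order[:k] + order[-k:]  # lowest and highest spliced counts
    num = sum(u[i] * s[i] for i in extremes)
    den = sum(s[i] * s[i] for i in extremes)
    if den == 0:
        raise ValueError("spliced counts are zero on all extreme cells")
    return num / den  # least-squares slope through the origin


def rna_velocity(u, s, gamma):
    """Steady-state velocity ds/dt = u - gamma * s (splicing rate beta = 1)."""
    return [ui - gamma * si for ui, si in zip(u, s)]


if __name__ == "__main__":
    s = [float(x) for x in range(1, 101)]
    u = [0.5 * x for x in s]
    u[50] += 10.0  # one cell with excess unspliced RNA, as for an induced gene
    g = fit_gamma_steady_state(u, s)
    v = rna_velocity(u, s, g)
    print(g, v[50] > 0)
```

Every number this sketch produces depends on hand-set choices (the quantile, the rule for picking "steady-state" cells, beta = 1), which is the kind of user-set hyperparameter sensitivity the abstract argues makes velocity estimates hard to act on.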
Affiliation(s)
- Gennady Gorin
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
- Meichen Fang
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, United States of America
- * E-mail:
14
Jones A, Townes FW, Li D, Engelhardt BE. Contrastive latent variable modeling with application to case-control sequencing experiments. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Andrew Jones
- Department of Computer Science, Princeton University
- Didong Li
- Department of Computer Science, Princeton University
15
Zhang S, Xie L, Cui Y, Carone BR, Chen Y. Detecting Fear-Memory-Related Genes from Neuronal scRNA-seq Data by Diverse Distributions and Bhattacharyya Distance. Biomolecules 2022; 12:biom12081130. [PMID: 36009024 PMCID: PMC9405875 DOI: 10.3390/biom12081130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 08/12/2022] [Accepted: 08/15/2022] [Indexed: 11/16/2022]
Abstract
The detection of differentially expressed genes (DEGs) is one of the most important computational challenges in the analysis of single-cell RNA sequencing (scRNA-seq) data. However, because of the high heterogeneity and dropout noise inherent in scRNA-seq data, detecting DEGs is difficult when a single distribution of gene expression levels is assumed, leaving much room to improve the precision and robustness of current DEG detection methods. Here, we propose a new method, DEGman, which combines several candidate distributions with the Bhattacharyya distance. DEGman automatically selects the best-fitting distributions of gene expression levels and then detects DEGs by permutation testing of the Bhattacharyya distances between the selected distributions of two cell groups. Compared with several popular DEG analysis tools on both large-scale simulated data and real scRNA-seq data, DEGman shows an overall improvement in the balance of sensitivity and precision. We applied DEGman to scRNA-seq data of TRAP; Ai14 mouse neurons to detect fear-memory-related genes that are significantly differentially expressed in neurons with and without fear memory. DEGman detected well-known fear-memory-related genes and many novel candidates. Interestingly, we found 25 DEGs common to five neuron clusters that are functionally enriched for synaptic vesicles, indicating that the coupled dynamics of synaptic vesicles across neurons play a critical role in remote memory formation. The proposed method leverages diverse distributions in DEG analysis and performs better on composite scRNA-seq datasets in real applications.
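DEGman first fits the best of several parametric distributions for each gene. The minimal Python sketch below swaps that fitting step for empirical count histograms (an assumption made for brevity) and keeps the two other ingredients named in the abstract: the Bhattacharyya distance D = -ln(sum_k sqrt(p_k * q_k)) between the two groups' distributions, and a label-permutation test on that distance. Function names are hypothetical.

```python
import math
import random
from collections import Counter


def bhattacharyya_distance(x, y):
    """D_B = -ln(sum_k sqrt(p_k * q_k)) over the empirical distributions of x and y."""
    px, py = Counter(x), Counter(y)
    nx, ny = len(x), len(y)
    bc = sum(math.sqrt((px[k] / nx) * (py[k] / ny)) for k in px.keys() & py.keys())
    if bc == 0.0:
        return math.inf  # disjoint supports
    return -math.log(min(bc, 1.0))  # clamp guards against rounding just above 1


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """One-sided p-value: how often shuffled labels give a distance >= the observed one."""
    rng = random.Random(seed)
    observed = bhattacharyya_distance(x, y)
    pooled = list(x) + list(y)
    nx = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if bhattacharyya_distance(pooled[:nx], pooled[nx:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the distance compares whole distributions rather than means, a gene can score as differential when the two groups differ in dropout rate or shape even if their averages are similar.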
Affiliation(s)
- Shaoqiang Zhang
- Department of Computer Science, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Linjuan Xie
- Department of Computer Science, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yaxuan Cui
- Department of Computer Science, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Benjamin R. Carone
- Department of Biology and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Yong Chen
- Department of Biology and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Correspondence: ; Tel.: +1-856-256-4500
16
Das S, Rai A, Rai SN. Differential Expression Analysis of Single-Cell RNA-Seq Data: Current Statistical Approaches and Outstanding Challenges. ENTROPY 2022; 24:e24070995. [PMID: 35885218 PMCID: PMC9315519 DOI: 10.3390/e24070995] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/20/2022] [Revised: 06/25/2022] [Accepted: 07/09/2022] [Indexed: 01/11/2023]
Abstract
With the advent of single-cell RNA-sequencing (scRNA-seq), it is possible to measure the expression dynamics of genes at the single-cell level. A single scRNA-seq experiment can generate expression data for thousands of genes across up to millions of cells. Differential expression analysis is the primary downstream analysis of such data: it identifies gene markers for cell type detection and also provides inputs to other secondary analyses. Many statistical approaches for differential expression analysis have been reported in the literature. Here, we critically discuss the statistical principles underlying these approaches and divide them into six major classes: generalized linear, generalized additive, hurdle, mixture model, two-class parametric, and non-parametric approaches. We also succinctly discuss the limitations specific to each class of approaches and how they are addressed by subsequent classes. We identify a number of challenges that must be addressed to develop the next class of innovative approaches. Furthermore, we emphasize the methodological challenges in differential expression analysis of scRNA-seq data that researchers must address to draw maximum benefit from this recent single-cell technology. This study will serve as a guide for genome researchers and experimental biologists to objectively select options for their analysis.
Affiliation(s)
- Samarendra Das
- ICAR-Directorate of Foot and Mouth Disease, Arugul, Bhubaneswar 752050, India
- International Centre for Foot and Mouth Disease, Arugul, Bhubaneswar 752050, India
- Correspondence: or (S.D.); (S.N.R.)
- Anil Rai
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Shesh N. Rai
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Biostatistics and Bioinformatics Facility, Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA
- Biostatistics and Informatics Facility, Center for Integrative Environmental Health Sciences, University of Louisville, Louisville, KY 40202, USA
- Data Analysis and Sample Management Facility, The University of Louisville Super Fund Center, University of Louisville, Louisville, KY 40202, USA
- Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA
- Christina Lee Brown Envirome Institute, University of Louisville, Louisville, KY 40202, USA
- Correspondence: or (S.D.); (S.N.R.)
17
Zhang M, Guo FR. BSDE: barycenter single-cell differential expression for case-control studies. Bioinformatics 2022; 38:2765-2772. [PMID: 35561165 PMCID: PMC9113363 DOI: 10.1093/bioinformatics/btac171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Revised: 03/14/2022] [Accepted: 03/23/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Single-cell sequencing provides unprecedentedly high resolution for finding differentially expressed genes (DEGs) by disentangling highly heterogeneous cell tissues. Yet, such analysis has so far mostly focused on comparing different cell types from the same individual. As single-cell sequencing becomes cheaper and easier to use, an increasing number of datasets from case-control studies are becoming available, which call for new methods for identifying differential expression between case and control individuals. RESULTS To bridge this gap, we propose barycenter single-cell differential expression (BSDE), a nonparametric method for finding DEGs in case-control studies. By using optimal transport to aggregate distributions and compute their distances, our method overcomes the restrictive parametric assumptions imposed by standard mixed-effect modeling approaches. Through simulations, we show that BSDE can accurately detect a variety of differential expression patterns while maintaining the type I error at a prescribed level. Further, BSDE identifies 1345 and 1568 cell type-specific DEGs from datasets on pulmonary fibrosis and multiple sclerosis, and the top findings are supported by previous results in the literature. AVAILABILITY AND IMPLEMENTATION The R package BSDE is freely available from doi.org/10.5281/zenodo.6332254. For real data analysis with the R package, see doi.org/10.5281/zenodo.6332566. These can also be accessed through GitHub at github.com/mqzhanglab/BSDE and github.com/mqzhanglab/BSDE_pipeline. The two single-cell sequencing datasets can be downloaded with the UCSC Cell Browser from cells.ucsc.edu/?ds=ms and cells.ucsc.edu/?ds=lung-pf-control. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
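In one dimension, the 2-Wasserstein barycenter of several distributions has a closed form: its quantile function is the average of the input quantile functions, and the 2-Wasserstein distance between two distributions is the L2 distance between their quantile functions. The minimal Python sketch below uses these facts to mimic the idea described above for one gene in one cell type: aggregate each group's individuals into a barycenter, measure the distance between the case and control barycenters, and calibrate it by permuting case/control labels across individuals rather than cells. It is an illustration under these assumptions, not the BSDE R package; the quantile grid and all function names are hypothetical.

```python
import math
import random

GRID = [(i + 0.5) / 100 for i in range(100)]  # probability levels for quantiles


def quantile(sorted_vals, p):
    """Empirical quantile with linear interpolation between order statistics."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_vector(cells):
    """One individual's expression values for a gene -> quantile function on GRID."""
    s = sorted(cells)
    return [quantile(s, p) for p in GRID]


def barycenter(qvecs):
    """1-D W2 barycenter: pointwise mean of the individuals' quantile functions."""
    return [sum(col) / len(col) for col in zip(*qvecs)]


def w2(qa, qb):
    """W2 distance approximated on GRID as the RMS gap between quantile functions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(qa, qb)) / len(qa))


def barycenter_de_test(cases, controls, n_perm=500, seed=0):
    """cases/controls: lists of per-individual cell expression lists for one gene."""
    rng = random.Random(seed)
    qvecs = [quantile_vector(c) for c in cases + controls]
    k = len(cases)
    observed = w2(barycenter(qvecs[:k]), barycenter(qvecs[k:]))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(qvecs)  # permute labels at the individual level
        if w2(barycenter(qvecs[:k]), barycenter(qvecs[k:])) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Permuting labels across individuals respects the case-control design, since cells from the same person are not independent replicates; with few individuals per group, the smallest attainable p-value is limited by the number of distinct label splits.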
Affiliation(s)
- Mengqi Zhang
- Department of Surgery, Perelman Medical School, University of Pennsylvania, Philadelphia, PA 19104, USA
18
Zhu B, Li H, Zhang L, Chandra SS, Zhao H. A Markov random field model-based approach for differentially expressed gene detection from single-cell RNA-seq data. Brief Bioinform 2022; 23:6581434. [PMID: 35514182 PMCID: PMC9487630 DOI: 10.1093/bib/bbac166] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2022] [Revised: 04/02/2022] [Accepted: 04/13/2022] [Indexed: 11/13/2022]
Abstract
The development of single-cell RNA-sequencing (scRNA-seq) technologies has offered insights into complex biological systems at single-cell resolution. In particular, these techniques facilitate the identification of genes showing cell-type-specific differential expression (DE). In this paper, we introduce MARBLES, a novel statistical model for cross-condition DE gene detection from scRNA-seq data. MARBLES employs a Markov random field model to borrow information across similar cell types and uses cell-type-specific pseudobulk counts to account for sample-level variability. Our simulation results showed that MARBLES is more powerful than existing methods at detecting DE genes, with appropriate control of the false positive rate. Applications of MARBLES to real data identified novel disease-related DE genes and biological pathways from both a single-cell lipopolysaccharide mouse dataset with 24 381 cells and 11 076 genes and a Parkinson's disease human dataset with 76 212 cells and 15 891 genes. Overall, MARBLES is a powerful tool for identifying cell-type-specific DE genes across conditions from scRNA-seq data.
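MARBLES' Markov random field is beyond a short example, but its pseudobulk input is simple to illustrate. The minimal Python sketch below sums raw counts over all cells that share a (sample, cell type) label, yielding one count vector per sample and cell type, so that downstream tests see sample-level rather than cell-level replicates. The tuple-based input format and the function name are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts per (sample, cell type).

    cells: iterable of (sample_id, cell_type, {gene: count}) tuples.
    Returns ({(sample, cell_type): {gene: summed count}},
             {(sample, cell_type): number of cells aggregated}).
    """
    totals = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for sample, cell_type, counts in cells:
        key = (sample, cell_type)
        n_cells[key] += 1
        for gene, c in counts.items():
            totals[key][gene] += c
    return {k: dict(v) for k, v in totals.items()}, dict(n_cells)


if __name__ == "__main__":
    cells = [
        ("sample1", "typeA", {"geneX": 3, "geneY": 1}),
        ("sample1", "typeA", {"geneX": 2}),
        ("sample2", "typeA", {"geneY": 4}),
    ]
    print(pseudobulk(cells))
```

Keeping the per-group cell counts alongside the sums is useful because pseudobulk libraries built from very few cells are noisy and are often filtered or down-weighted before testing.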
Affiliation(s)
- Biqing Zhu
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Hongyu Li
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06511, USA
- Le Zhang
- Department of Neurology, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Sreeganga S Chandra
- Department of Neurology, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Neuroscience, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06511, USA
- Corresponding author. Hongyu Zhao, 300 George Street, Ste 503, New Haven, CT 06511. E-mail:
19
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022]
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse scRNA-seq data, 13) spatial transcriptomics, and 14) comparison of single cell AD mouse model studies and single cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied them to a large single nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single cell sequencing data and offers specific guidelines for study design and a variety of analytic directions.
The review and the accompanying software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single cell level.
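As an illustration of the first analytic direction listed above (quality control and normalization), the minimal Python sketch below drops cells with too few detected genes or too high a mitochondrial read fraction, then scales each remaining cell to a common total and applies log1p. The thresholds (200 genes, 10% mitochondrial reads, a target sum of 10,000), the "MT-" gene-name prefix, the dict-per-cell input format and the function name are illustrative assumptions, not the authors' recommended values or their published scripts.

```python
import math


def qc_and_normalize(cells, min_genes=200, max_mito_frac=0.1, target_sum=1e4):
    """Filter low-quality cells, then library-size normalize and log1p-transform.

    cells: list of {gene_name: raw_count} dicts, one per cell.
    Returns the kept cells as {gene_name: log1p(count / total * target_sum)}.
    """
    kept = []
    for counts in cells:
        total = sum(counts.values())
        n_detected = sum(1 for c in counts.values() if c > 0)
        mito = sum(c for g, c in counts.items() if g.upper().startswith("MT-"))
        if total == 0 or n_detected < min_genes or mito / total > max_mito_frac:
            continue  # likely empty droplet, damaged cell, or debris
        kept.append({g: math.log1p(c / total * target_sum) for g, c in counts.items()})
    return kept
```

Thresholds like these are usually tuned per dataset by inspecting the distributions of detected genes and mitochondrial fraction; in snRNA-seq of postmortem brain, mitochondrial reads are expected to be low, so a high fraction is a stronger warning sign than in whole-cell data.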
Affiliation(s)
- Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Won-min Song
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Chen Ming
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Qian Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Xianxiao Zhou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Peng Xu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Azra Krek
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Yonejung Yoon
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Lap Ho
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Miranda E. Orr
- Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
20
Missarova A, Jain J, Butler A, Ghazanfar S, Stuart T, Brusko M, Wasserfall C, Nick H, Brusko T, Atkinson M, Satija R, Marioni JC. geneBasis: an iterative approach for unsupervised selection of targeted gene panels from scRNA-seq. Genome Biol 2021; 22:333. [PMID: 34872616 PMCID: PMC8650258 DOI: 10.1186/s13059-021-02548-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/20/2021] [Accepted: 11/19/2021] [Indexed: 12/13/2022]
Abstract
scRNA-seq datasets are increasingly used to identify gene panels that can be probed using alternative technologies, such as spatial transcriptomics, where choosing the best subset of genes is vital. Existing methods are limited by a reliance on pre-existing cell type labels or by difficulties in identifying markers of rare cells. We introduce an iterative approach, geneBasis, for selecting an optimal gene panel, where each newly added gene captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel. Our approach outperforms existing strategies and can resolve cell types and subtle cell state differences.
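The greedy loop described above can be illustrated with a heavily simplified, hypothetical Python sketch; it follows the abstract's description only loosely and is not the geneBasis implementation. A k-nearest-neighbour graph built on all genes stands in for the "true" manifold and a second graph is built on the current panel; for every candidate gene, the sketch compares how well its expression is predicted by averaging over neighbours in each graph, and adds the gene whose prediction degrades most on the panel graph. Seeding with the most variable gene, Euclidean distances on a small dense matrix, squared-error scoring and all function names are assumptions made here.

```python
def knn_graph(X, genes, k):
    """k nearest neighbours of each cell (row of X), using only the columns in `genes`."""
    graph = []
    for i, row in enumerate(X):
        dists = sorted(
            (sum((row[g] - other[g]) ** 2 for g in genes), j)
            for j, other in enumerate(X)
            if j != i
        )
        graph.append([j for _, j in dists[:k]])
    return graph


def smoothing_error(X, gene, graph):
    """Mean squared gap between a gene's expression and its neighbour average."""
    err = 0.0
    for i, nbrs in enumerate(graph):
        avg = sum(X[j][gene] for j in nbrs) / len(nbrs)
        err += (X[i][gene] - avg) ** 2
    return err / len(graph)


def variance(X, gene):
    col = [row[gene] for row in X]
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)


def select_panel(X, n_genes, k=5):
    """Greedy panel selection; X is a cells x genes list of lists (log expression)."""
    all_genes = range(len(X[0]))
    true_graph = knn_graph(X, all_genes, k)
    panel = [max(all_genes, key=lambda g: variance(X, g))]  # seed: most variable gene
    while len(panel) < n_genes:
        panel_graph = knn_graph(X, panel, k)
        candidates = [g for g in all_genes if g not in panel]
        # Pick the gene whose local structure the current panel captures worst.
        best = max(
            candidates,
            key=lambda g: smoothing_error(X, g, panel_graph) - smoothing_error(X, g, true_graph),
        )
        panel.append(best)
    return panel
```

Because a gene that is redundant with the panel is already well predicted on the panel graph, the score naturally skips near-copies of selected genes and favours genes that separate cells the panel cannot yet distinguish.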
Affiliation(s)
- Alsu Missarova
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Andrew Butler
- New York Genome Center, New York, USA
- Center for Genomics and Systems Biology, NYU, New York, USA
- Shila Ghazanfar
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Tim Stuart
- New York Genome Center, New York, USA
- Center for Genomics and Systems Biology, NYU, New York, USA
- Maigan Brusko
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Clive Wasserfall
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Harry Nick
- Department of Neuroscience, College of Medicine, University of Florida, Jacksonville, USA
- Todd Brusko
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Mark Atkinson
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Rahul Satija
- New York Genome Center, New York, USA
- Center for Genomics and Systems Biology, NYU, New York, USA
- John C Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
21
Das S, Rai A, Merchant ML, Cave MC, Rai SN. A Comprehensive Survey of Statistical Approaches for Differential Expression Analysis in Single-Cell RNA Sequencing Studies. Genes (Basel) 2021; 12:1947. [PMID: 34946896 PMCID: PMC8701051 DOI: 10.3390/genes12121947] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/29/2021] [Revised: 11/27/2021] [Accepted: 11/27/2021] [Indexed: 12/13/2022]
Abstract
Single-cell RNA-sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expression at the cell level. Differential expression (DE) analysis is a major downstream analysis of scRNA-seq data. DE analysis in the presence of noise from different sources remains a key challenge in scRNA-seq. Earlier practices for addressing this involved borrowing methods from bulk RNA-seq, which are based on non-zero differences in the average expression of genes across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to comprehensively study the performance of DE analysis methods. Here, we provide a review and classification of different DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods, and that their performance depends on the underlying models, DE test statistic(s), and data characteristics. Further, it is difficult to identify a globally best-performing method from any individual performance criterion. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal the similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides guidelines for selecting the tool that performs best under particular experimental settings in the context of scRNA-seq.
Affiliation(s)
- Samarendra Das
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Anil Rai
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Michael L. Merchant
- Department of Medicine, School of Medicine, University of Louisville, Louisville, KY 40202, USA
- Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA
- Matthew C. Cave
- Biostatistics and Informatics Facility, Center for Integrative Environmental Health Sciences, University of Louisville, Louisville, KY 40202, USA
- Shesh N. Rai
- Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA
- Biostatistics and Informatics Facility, Center for Integrative Environmental Health Sciences, University of Louisville, Louisville, KY 40202, USA
- Christina Lee Brown Envirome Institute, University of Louisville, Louisville, KY 40202, USA
- Department of Bioinformatics and Biostatistics, School of Public Health and Information Science, University of Louisville, Louisville, KY 40202, USA
22
Li H, Zhu B, Xu Z, Adams T, Kaminski N, Zhao H. A Markov random field model for network-based differential expression analysis of single-cell RNA-seq data. BMC Bioinformatics 2021; 22:524. [PMID: 34702190 PMCID: PMC8549347 DOI: 10.1186/s12859-021-04412-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/25/2020] [Accepted: 09/15/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. In this article, we propose to borrow information through known biological networks to increase statistical power to identify differentially expressed genes (DEGs). RESULTS We develop MRFscRNAseq, which is based on a Markov random field (MRF) model to appropriately accommodate gene network information as well as dependencies among cell types to identify cell-type specific DEGs. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. Simulation study shows that our method has better power to detect cell-type specific DEGs than conventional methods while appropriately controlling type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types on lung tissues from 32 IPF patients and 28 normal controls. CONCLUSIONS The proposed MRF model is implemented in the R package MRFscRNAseq available on GitHub. By utilizing gene-gene and cell-cell networks, our method increases statistical power to detect differentially expressed genes from scRNA-seq data.
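As a rough illustration of how an MRF lets a gene borrow strength from its network neighbours, the sketch below runs a Gibbs sampler over binary DE indicators with an Ising-type prior. It is not the MRFscRNAseq implementation: the per-gene log-likelihood ratios, the baseline `alpha`, the coupling `beta` and the toy network are all hypothetical, and the cell-type dimension and the EM step for estimating parameters are omitted.

```python
import math
import random


def gibbs_de_posterior(llr, neighbors, alpha=-1.0, beta=0.5,
                       n_iter=2000, burn_in=500, seed=0):
    """Estimate P(gene is DE | data) under an Ising-type MRF prior.

    llr[g]       -- hypothetical log-likelihood ratio (DE vs. not DE) for gene g
    neighbors[g] -- genes linked to g in a known gene network
    alpha, beta  -- baseline log-odds of DE and network coupling strength
    """
    rng = random.Random(seed)
    genes = list(llr)
    z = {g: 0 for g in genes}        # current DE indicators (1 = DE)
    counts = {g: 0 for g in genes}   # post-burn-in draws in which g was DE
    for it in range(n_iter):
        for g in genes:
            # Conditional log-odds: baseline + number of DE neighbours + data evidence
            logit = alpha + beta * sum(z[h] for h in neighbors.get(g, [])) + llr[g]
            p = 1.0 / (1.0 + math.exp(-logit))
            z[g] = 1 if rng.random() < p else 0
        if it >= burn_in:
            for g in genes:
                counts[g] += z[g]
    kept = n_iter - burn_in
    return {g: counts[g] / kept for g in genes}


# Toy network A - B - C, with D isolated; B and D carry the same weak evidence.
llr = {"A": 4.0, "B": 0.5, "C": 4.0, "D": 0.5}
net = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
post = gibbs_de_posterior(llr, net)
print(post)
```

In this toy run gene B ends up with a higher posterior DE probability than gene D, despite identical evidence, because its neighbours A and C are strongly DE; that is the kind of gain in power the abstract attributes to using gene networks.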
Affiliation(s)
- Hongyu Li
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06511 USA
- Biqing Zhu
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511 USA
- Zhichao Xu
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06511 USA
- Taylor Adams
- Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT 06520 USA
- Naftali Kaminski
- Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT 06520 USA
- Hongyu Zhao
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06511 USA
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511 USA
23
Desai RV, Chen X, Martin B, Chaturvedi S, Hwang DW, Li W, Yu C, Ding S, Thomson M, Singer RH, Coleman RA, Hansen MMK, Weinberger LS. A DNA repair pathway can regulate transcriptional noise to promote cell fate transitions. Science 2021; 373:science.abc6506. [PMID: 34301855 DOI: 10.1126/science.abc6506] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 05/31/2020] [Accepted: 07/08/2021] [Indexed: 12/13/2022]
Abstract
Stochastic fluctuations in gene expression ("noise") are often considered detrimental, but fluctuations can also be exploited for benefit (e.g., dither). We show here that DNA base excision repair amplifies transcriptional noise to facilitate cellular reprogramming. Specifically, the DNA repair protein Apex1, which recognizes both naturally occurring and unnatural base modifications, amplifies expression noise while homeostatically maintaining mean expression levels. This amplified expression noise originates from shorter-duration, higher-intensity transcriptional bursts generated by Apex1-mediated DNA supercoiling. The remodeling of DNA topology first impedes and then accelerates transcription to maintain mean levels. This mechanism, which we refer to as "discordant transcription through repair" ("DiThR," which is pronounced "dither"), potentiates cellular reprogramming and differentiation. Our study reveals a potential functional role for transcriptional fluctuations mediated by DNA base modifications in embryonic development and disease.
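Expression noise of this kind is commonly summarised by the Fano factor (variance/mean) or the squared coefficient of variation (CV² = variance/mean²). The sketch below, using made-up counts, shows two cell populations with the same mean but very different noise, which is the situation the abstract describes as amplified noise with homeostatically maintained means. The choice of these two metrics is ours; the paper's own noise measures may differ.

```python
from statistics import mean, pvariance


def noise_summary(counts):
    """Return mean, Fano factor and CV^2 for per-cell transcript counts of one gene."""
    m = mean(counts)
    v = pvariance(counts)  # population variance across cells
    return {"mean": m, "fano": v / m, "cv2": v / m ** 2}


# Hypothetical counts for one gene in two populations with equal means
low_noise = [9, 10, 11, 10, 10, 9, 11, 10]
high_noise = [2, 18, 5, 15, 0, 20, 10, 10]

print(noise_summary(low_noise))
print(noise_summary(high_noise))
```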
Affiliation(s)
- Ravi V Desai
- Gladstone/UCSF Center for Cell Circuitry, Gladstone Institutes, San Francisco, CA 94158, USA; Medical Scientist Training Program and Tetrad Graduate Program, University of California, San Francisco, CA 94158, USA
- Xinyue Chen
- Gladstone/UCSF Center for Cell Circuitry, Gladstone Institutes, San Francisco, CA 94158, USA
- Benjamin Martin
- Gladstone/UCSF Center for Cell Circuitry, Gladstone Institutes, San Francisco, CA 94158, USA; Institute for Molecules and Materials, Radboud University, 6525 AJ Nijmegen, the Netherlands
- Sonali Chaturvedi
- Gladstone/UCSF Center for Cell Circuitry, Gladstone Institutes, San Francisco, CA 94158, USA
- Dong Woo Hwang
- Department of Anatomy and Structural Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Weihan Li
- Department of Anatomy and Structural Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Chen Yu
- Gladstone Institute of Cardiovascular Disease, Gladstone Institutes, San Francisco, CA 94158, USA
- Sheng Ding
- Gladstone Institute of Cardiovascular Disease, Gladstone Institutes, San Francisco, CA 94158, USA; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Matt Thomson
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Robert H Singer
- Department of Anatomy and Structural Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Robert A Coleman
- Department of Anatomy and Structural Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Maike M K Hansen
- Institute for Molecules and Materials, Radboud University, 6525 AJ Nijmegen, the Netherlands
- Leor S Weinberger
- Gladstone/UCSF Center for Cell Circuitry, Gladstone Institutes, San Francisco, CA 94158, USA; Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94158, USA
24
Ma X, Korthauer K, Kendziorski C, Newton MA. A compositional model to assess expression changes from single-cell RNA-seq data. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Xiuyu Ma
- Department of Statistics, University of Wisconsin–Madison
- Christina Kendziorski
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison
25
Li L, Xiong F, Wang Y, Zhang S, Gong Z, Li X, He Y, Shi L, Wang F, Liao Q, Xiang B, Zhou M, Li X, Li Y, Li G, Zeng Z, Xiong W, Guo C. What are the applications of single-cell RNA sequencing in cancer research: a systematic review. JOURNAL OF EXPERIMENTAL & CLINICAL CANCER RESEARCH : CR 2021; 40:163. [PMID: 33975628 PMCID: PMC8111731 DOI: 10.1186/s13046-021-01955-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 12/18/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a tool for studying gene expression at the single-cell level that has been widely used due to its unprecedented high resolution. In the present review, we outline the preparation process and sequencing platforms for the scRNA-seq analysis of solid tumor specimens and discuss the main steps and methods used during data analysis, including quality control, batch-effect correction, normalization, cell cycle phase assignment, clustering, cell trajectory and pseudo-time reconstruction, differential expression analysis and gene set enrichment analysis, as well as gene regulatory network inference. Traditional bulk RNA sequencing does not address the heterogeneity within and between tumors, and since the development of the first scRNA-seq technique, this approach has been widely used in cancer research to better understand cancer cell biology and pathogenetic mechanisms. ScRNA-seq has been of great significance for the development of targeted therapy and immunotherapy. In the second part of this review, we focus on the application of scRNA-seq in solid tumors, and summarize the findings and achievements in tumor research afforded by its use. ScRNA-seq holds promise for improving our understanding of the molecular characteristics of cancer, and potentially contributing to improved diagnosis, prognosis, and therapeutics.
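Of the analysis steps listed, normalization is the easiest to show compactly. A common, though not the only, choice is library-size scaling followed by a log transform. The sketch below assumes a genes-by-cells count matrix stored as nested lists and a scale factor of 10,000; both are illustrative choices rather than the review's prescription.

```python
import math


def log_normalize(counts, scale=10_000):
    """Library-size normalise a genes x cells count matrix, then apply log1p.

    counts[g][c] is the raw count of gene g in cell c.
    """
    n_genes, n_cells = len(counts), len(counts[0])
    # Total counts per cell (library size)
    lib = [sum(counts[g][c] for g in range(n_genes)) for c in range(n_cells)]
    return [
        [math.log1p(counts[g][c] / lib[c] * scale) if lib[c] > 0 else 0.0
         for c in range(n_cells)]
        for g in range(n_genes)
    ]


# Hypothetical 3 genes x 2 cells; cell 0 has twice the library size of cell 1
raw = [[10, 0],
       [90, 5],
       [0, 45]]
norm = log_normalize(raw)
print(norm)
```

Gene 0 in cell 0 and gene 1 in cell 1 each make up 10% of their cell's library, so they receive the same normalized value even though their raw counts differ.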
Affiliation(s)
- Lvyuan Li
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China; Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
- Fang Xiong
- Department of Stomatology, Xiangya Hospital, Central South University, Changsha, China
- Yumin Wang
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China; Department of Stomatology, Xiangya Hospital, Central South University, Changsha, China
- Shanshan Zhang
- Department of Stomatology, Xiangya Hospital, Central South University, Changsha, China
- Zhaojian Gong
- Department of Oral and Maxillofacial Surgery, The Second Xiangya Hospital, Central South University, Changsha, China
- Xiayu Li
- Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, The Third Xiangya Hospital, Central South University, Changsha, China
- Yi He
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
- Lei Shi
- Department of Oral and Maxillofacial Surgery, The Second Xiangya Hospital, Central South University, Changsha, China
- Fuyan Wang
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
- Qianjin Liao
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
- Bo Xiang
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China; Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
- Ming Zhou
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China; Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
- Xiaoling Li
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China; Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
- Yong Li
- Department of Medicine, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Guiyuan Li
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China; Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
- Zhaoyang Zeng
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China; Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
- Wei Xiong
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China; Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
- Can Guo
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China; Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
26
Thurman AL, Ratcliff JA, Chimenti MS, Pezzulo AA. Differential gene expression analysis for multi-subject single cell RNA sequencing studies with aggregateBioVar. Bioinformatics 2021; 37:3243-3251. [PMID: 33970215 PMCID: PMC8504643 DOI: 10.1093/bioinformatics/btab337] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/03/2020] [Revised: 04/07/2021] [Accepted: 04/30/2021] [Indexed: 11/14/2022]
Abstract
Motivation Single-cell RNA-sequencing (scRNA-seq) provides more granular biological information than bulk RNA-sequencing; bulk RNA-sequencing nevertheless remains popular because its lower cost allows more biological replicates to be processed and more powerful studies to be designed. As scRNA-seq costs have decreased, collecting data from more than one biological replicate has become more feasible, but careful modeling of different layers of biological variation remains challenging for many users. Here, we propose a statistical model for scRNA-seq gene counts, describe a simple method for estimating model parameters and show that failing to account for additional biological variation in scRNA-seq studies can inflate false discovery rates (FDRs) of statistical tests. Results First, in a simulation study, we show that when the gene expression distribution of a population of cells varies between subjects, a naïve approach to differential expression analysis will inflate the FDR. We then compare multiple differential expression testing methods on scRNA-seq datasets from human samples and from animal models. These analyses suggest that a naïve approach to differential expression testing could lead to many false discoveries; in contrast, an approach based on pseudobulk counts has better FDR control. Availability and implementation A software package, aggregateBioVar, is freely available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/aggregateBioVar.html) to ensure compatibility with upstream and downstream methods in scRNA-seq data analysis pipelines. Supplementary information Raw gene-by-cell count matrices for pig scRNA-seq data are available as GEO accession GSE150211. Supplementary data are available at Bioinformatics online.
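The pseudobulk strategy that the authors find better at controlling FDR amounts to summing each gene's counts over all cells from the same subject (typically within one cell type) before applying a bulk-style test. A minimal sketch with hypothetical inputs, written independently of the aggregateBioVar API:

```python
from collections import defaultdict


def pseudobulk(counts, subjects):
    """Sum a genes x cells count matrix into a genes x subjects matrix.

    counts[g][c] -- raw count of gene g in cell c
    subjects[c]  -- subject (biological replicate) that cell c came from
    Returns (sorted subject ids, aggregated matrix).
    """
    ids = sorted(set(subjects))
    agg = []
    for row in counts:
        totals = defaultdict(int)
        for c, value in enumerate(row):
            totals[subjects[c]] += value
        agg.append([totals[s] for s in ids])
    return ids, agg


counts = [[1, 0, 3, 2],
          [0, 5, 1, 1]]
subjects = ["s1", "s1", "s2", "s2"]
ids, agg = pseudobulk(counts, subjects)
print(ids, agg)
```

Each subject then contributes one observation per gene, so between-subject variability enters the downstream test as it would in bulk RNA-seq instead of being hidden by treating individual cells as independent replicates.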
Affiliation(s)
- Andrew L Thurman
- Department of Internal Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA
- To whom correspondence should be addressed.
- Jason A Ratcliff
- Iowa Institute of Human Genetics, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA
- Michael S Chimenti
- Iowa Institute of Human Genetics, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA
- Alejandro A Pezzulo
- Department of Internal Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA
- To whom correspondence should be addressed.
27
Kim HJ, Tam PPL, Yang P. Defining cell identity beyond the premise of differential gene expression. CELL REGENERATION (LONDON, ENGLAND) 2021; 10:20. [PMID: 33931812 PMCID: PMC8087741 DOI: 10.1186/s13619-021-00083-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Identifying genes that define cell identity is a requisite step for characterising cell types and cell states and predicting cell fate choices. By far the most widely used approach for this task is based on differential expression (DE) of genes, whereby shifts in mean expression are used as the primary statistic for identifying gene transcripts that are specific to cell types and states. While DE-based methods are useful for pinpointing genes that discriminate cell types, their reliance on measuring differences in mean expression may not reflect the biological attributes of cell identity genes. Here, we highlight the quest for non-DE methods and provide an overview of these methods and their applications to identify genes that define cell identity and functionality.
Affiliation(s)
- Hani Jieun Kim
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia; Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, 2145, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Patrick P L Tam
- Embryology Unit, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, 2145, Australia; School of Medical Science, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, 2006, Australia
- Pengyi Yang
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia; Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, 2145, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; School of Medical Science, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, 2006, Australia
28
Das S, Rai SN. SwarnSeq: An improved statistical approach for differential expression analysis of single-cell RNA-seq data. Genomics 2021; 113:1308-1324. [PMID: 33662531 PMCID: PMC10150572 DOI: 10.1016/j.ygeno.2021.02.014] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/14/2020] [Revised: 01/22/2021] [Accepted: 02/22/2021] [Indexed: 11/27/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a powerful technology that is capable of generating gene expression data at the resolution of individual cells. scRNA-seq data are characterized by dropout events, which severely bias the results if left unaddressed. Few differential expression (DE) approaches incorporate the biological processes that lead to dropout events into the modeling process. We therefore develop SwarnSeq, an improved method for DE and other downstream analyses that accounts for the molecular capture process when modeling scRNA-seq data. The performance of the proposed method is benchmarked against 11 existing methods on 10 different real scRNA-seq datasets under three comparison settings. We demonstrate that the SwarnSeq method has improved performance over the 11 existing methods. This improvement is consistently observed across several public scRNA-seq datasets generated using different scRNA-seq protocols. External spike-in data can be used in the SwarnSeq method to enhance its performance. AVAILABILITY AND IMPLEMENTATION: The method is implemented as a publicly available R package available at https://github.com/sam-uofl/SwarnSeq.
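One simple way to see why the molecular capture process produces dropouts: if each of a cell's m transcripts of a gene is captured independently with efficiency p, the observed count is Binomial(m, p) and the chance of observing zero is (1 − p)^m. This toy model is our illustration of the general idea, not SwarnSeq's likelihood, and the 10% efficiency below is hypothetical (the abstract notes that external spike-ins can be used in SwarnSeq, and spike-ins are one way such efficiencies are estimated in practice).

```python
def dropout_probability(molecules, capture_efficiency):
    """P(observed count == 0) when each molecule is captured independently."""
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must lie in [0, 1]")
    return (1.0 - capture_efficiency) ** molecules


# With a hypothetical 10% capture efficiency, even moderately expressed
# genes are frequently unobserved.
for m in (1, 5, 20):
    print(m, round(dropout_probability(m, 0.1), 3))
```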
Affiliation(s)
- Samarendra Das
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India; Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA; School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA.
- Shesh N Rai
- Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA; School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA; Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA; Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA; Biostatistics and Informatics Facility, Center for Integrative Environmental Research Sciences, University of Louisville, Louisville, KY 40202, USA; Christina Lee Brown Envirome Institute, University of Louisville, Louisville, KY 40202, USA.
29
Adil A, Kumar V, Jan AT, Asger M. Single-Cell Transcriptomics: Current Methods and Challenges in Data Acquisition and Analysis. Front Neurosci 2021; 15:591122. [PMID: 33967674 PMCID: PMC8100238 DOI: 10.3389/fnins.2021.591122] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 08/03/2020] [Accepted: 03/19/2021] [Indexed: 11/17/2022]
Abstract
Rapid drops in cost and advances in next-generation sequencing have made profiling cells at the individual level a conventional practice in scientific laboratories worldwide. Single-cell transcriptomics [single-cell RNA sequencing (SC-RNA-seq)] has immense potential for uncovering new insights into the basis of human life. The well-known heterogeneity of cells at the individual level can be better studied by single-cell transcriptomics. Proper downstream analysis of these data will provide new insights to the scientific community. However, because of the small amount of starting material, SC-RNA-seq data pose various computational challenges: normalization, differential gene expression analysis, dimensionality reduction, etc. Additionally, new methods like 10× Chromium can profile millions of cells in parallel, which creates a considerable amount of data. Thus, single-cell data handling is another big challenge. This paper reviews single-cell sequencing methods, library preparation, and data generation. We highlight some of the main computational challenges that need to be addressed by introducing new bioinformatics algorithms and tools for analysis. We also frame single-cell transcriptomics data as a big-data problem.
Affiliation(s)
- Asif Adil
- Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri, India
- Vijay Kumar
- Department of Biotechnology, Yeungnam University, Gyeongsan, South Korea
- Arif Tasleem Jan
- School of Biosciences and Biotechnology, Baba Ghulam Shah Badshah University, Rajouri, India
- Mohammed Asger
- Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri, India
30
Software Benchmark—Classification Tree Algorithms for Cell Atlases Annotation Using Single-Cell RNA-Sequencing Data. MICROBIOLOGY RESEARCH 2021. [DOI: 10.3390/microbiolres12020022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The classification tree is a widely used machine learning method. It has multiple R implementations: rpart, ctree, evtree, tree and C5.0. The details of these implementations differ, and hence their performance varies from one application to another. We are interested in their performance in classifying cells using single-cell RNA-sequencing data. In this paper, we conducted a benchmark study using 22 single-cell RNA-sequencing data sets. Using cross-validation, we compared the packages’ prediction performance based on Precision, Recall, F1-score and Area Under the Curve (AUC). We also compared the Complexity and Run-time of these R packages. Our study shows that rpart and evtree have the best Precision; evtree is the best in Recall, F1-score and AUC; C5.0 prefers more complex trees; tree is consistently much faster than the others, although its complexity is often higher.
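The benchmark scores classifiers by precision, recall and F1. For multi-class cell-type labels these are usually computed per class and then averaged; the sketch below uses macro-averaging, which is an assumption on our part since the abstract does not say how per-class scores were combined. The labels are hypothetical.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over the labels present in y_true."""
    labels = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


truth = ["T", "T", "B", "B", "NK", "NK"]
pred = ["T", "B", "B", "B", "NK", "T"]
p, r, f = macro_prf(truth, pred)
print(round(p, 3), round(r, 3), round(f, 3))
```

Macro-averaging weights every cell type equally, so rare cell types influence the score as much as abundant ones; a weighted average would instead favour the common types.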
31
Cui L, Wang B, Ren C, Wang A, An H, Liang W. A Novel Method to Identify the Differences Between Two Single Cell Groups at Single Gene, Gene Pair, and Gene Module Levels. Front Genet 2021; 12:648898. [PMID: 33790951 PMCID: PMC8005607 DOI: 10.3389/fgene.2021.648898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/02/2021] [Accepted: 02/15/2021] [Indexed: 11/13/2022]
Abstract
Single-cell sequencing technology can not only reveal the heterogeneity of cells from a molecular perspective but also discover new cell types. Although there are many effective methods for dropout imputation, cell clustering, and lineage reconstruction based on single-cell RNA sequencing (RNA-seq) data, there is no systematic pipeline for comparing two single-cell clusters at the molecular level. In this study, we present a novel pipeline for comparing two single-cell clusters, including calling differential gene expression, coexpression network modules, and so on. The pipeline could reveal mechanisms behind the biological differences between cell clusters and cell types, and identify cell type-specific molecular mechanisms. We applied the pipeline to two well-known single-cell datasets, Usoskin from mouse brain and Xin from human pancreas, which contained 622 and 1,600 cells, respectively, both of which were composed of four types of cells. As a result, we identified many significant differential genes, differential gene coexpression and network modules among the cell clusters, which confirmed that different cell clusters might perform different functions.
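Comparing two clusters at the gene-pair level can start with contrasting a pair's correlation in each cluster. The sketch below uses Pearson correlation and a plain difference on hypothetical expression values; the published pipeline may use different statistics and adds significance testing, which is omitted here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_change(x_a, y_a, x_b, y_b):
    """Difference in a gene pair's correlation between cluster A and cluster B."""
    return pearson(x_a, y_a) - pearson(x_b, y_b)


# Hypothetical expression of genes X and Y in two clusters
x_a, y_a = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]   # positively coexpressed
x_b, y_b = [1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]   # negatively coexpressed
delta = coexpression_change(x_a, y_a, x_b, y_b)
print(delta)
```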
Affiliation(s)
- Lingyu Cui
- School of Science, Dalian Maritime University, Dalian, China
- Bo Wang
- Geneis (Beijing) Co., Ltd., Beijing, China
- Changjing Ren
- School of Science, Dalian Maritime University, Dalian, China
- Ailan Wang
- Geneis (Beijing) Co., Ltd., Beijing, China
- Hong An
- Guangzhou Anjie Biomedical Technology Co., Ltd., Guangzhou, China
- Wei Liang
- Medical Clinical Laboratory, The Second People's Hospital of Lianyungang, Lianyungang, China
32
Handling the Cellular Complex Systems in Alzheimer’s Disease Through a Graph Mining Approach. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2021; 1338:135-144. [DOI: 10.1007/978-3-030-78775-2_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
33
Zhang W, Wei Y, Zhang D, Xu EY. ZIAQ: a quantile regression method for differential expression analysis of single-cell RNA-seq data. Bioinformatics 2020; 36:3124-3130. [PMID: 32053182 DOI: 10.1093/bioinformatics/btaa098] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/29/2019] [Revised: 01/11/2020] [Accepted: 02/06/2020] [Indexed: 02/07/2023]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) has enabled the simultaneous transcriptomic profiling of individual cells under different biological conditions. scRNA-seq data have two unique challenges that can affect the sensitivity and specificity of single-cell differential expression analysis: a large proportion of expressed genes with zero or low read counts ('dropout' events) and multimodal data distributions. RESULTS We have developed a zero-inflation-adjusted quantile (ZIAQ) algorithm, which is the first method to account for both dropout rates and complex scRNA-seq data distributions in the same model. ZIAQ demonstrates superior performance over several existing methods on simulated scRNA-seq datasets by finding more differentially expressed genes. When ZIAQ was applied to the comparison of neoplastic and non-neoplastic cells from a human glioblastoma dataset, the ranking of biologically relevant genes and pathways showed clear improvement over existing methods. AVAILABILITY AND IMPLEMENTATION ZIAQ is implemented in the R language and available at https://github.com/gefeizhang/ZIAQ. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
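ZIAQ combines a dropout (zero) component with quantile regression on the non-zero values. A much-simplified descriptive analogue, assuming two groups and no covariates, is to compare each group's zero fraction together with several quantiles of its non-zero values, which can expose shifts in distribution shape that a comparison of means would miss. This is not the ZIAQ algorithm; the nearest-rank quantile rule and the data are illustrative choices.

```python
import math


def nearest_rank_quantile(sorted_values, q):
    """Nearest-rank empirical quantile of an already sorted list (0 < q <= 1)."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def zero_adjusted_profile(values, qs=(0.25, 0.5, 0.75)):
    """Zero fraction plus quantiles of the non-zero part of one gene's values."""
    nonzero = sorted(v for v in values if v > 0)
    zero_frac = 1 - len(nonzero) / len(values)
    quantiles = [nearest_rank_quantile(nonzero, q) for q in qs] if nonzero else []
    return zero_frac, quantiles


# Hypothetical counts of one gene in two groups of cells
group1 = [0, 0, 1, 2, 2, 3, 4, 8]
group2 = [0, 5, 6, 6, 7, 9, 12, 15]
print(zero_adjusted_profile(group1))
print(zero_adjusted_profile(group2))
```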
Affiliation(s)
- Wenfei Zhang
- Department of Biostatistics and Programming, Sanofi, Framingham, MA 01701, USA
- Ying Wei
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Donghui Zhang
- Department of Biostatistics and Programming, Sanofi, Framingham, MA 01701, USA
- Ethan Y Xu
- Translational Sciences, Sanofi, Framingham, MA 01701, USA
34
Abstract
BACKGROUND With the explosion in the number of methods designed to analyze bulk and single-cell RNA-seq data, there is a growing need for approaches that assess and compare these methods. The usual technique is to compare methods on data simulated according to some theoretical model. However, as real data often violate the assumptions of theoretical models, this can result in unsubstantiated claims of a method's performance. RESULTS Rather than generate data from a theoretical model, in this paper we develop methods to add signal to real RNA-seq datasets. Since the resulting simulated data are not generated from an unrealistic theoretical model, they exhibit realistic (annoying) attributes of real data. This lets RNA-seq methods developers assess their procedures in non-ideal (model-violating) scenarios. Our procedures may be applied to both single-cell and bulk RNA-seq. We show that our simulation method results in more realistic datasets and can alter the conclusions of a differential expression analysis study. We also demonstrate our approach by comparing various factor analysis techniques on RNA-seq datasets. CONCLUSIONS Using data simulated from a theoretical model can substantially impact the results of a study. We developed more realistic simulation techniques for RNA-seq data. Our tools are available in the seqgendiff R package on the Comprehensive R Archive Network: https://cran.r-project.org/package=seqgendiff.
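One way to add signal to real counts, and the approach assumed in this sketch, is binomial thinning: to impose a log2 fold change of −δ on a gene in one group, each count in that group is replaced by a Binomial(count, 2^−δ) draw. The expected count shrinks by the intended factor while the data keep their real-world quirks. The function names, the choice to thin only one group and the counts below are hypothetical, and this is not the seqgendiff interface.

```python
import random


def binomial_thin(count, p, rng):
    """Draw Binomial(count, p) by keeping each read independently with probability p."""
    return sum(1 for _ in range(count) if rng.random() < p)


def add_downregulation(counts, in_group, log2fc, seed=0):
    """Thin counts in one group to impose a (non-positive) log2 fold change.

    counts   -- observed counts of one gene across samples or cells
    in_group -- booleans marking the group to be down-regulated
    log2fc   -- desired log2 fold change; must be <= 0
    """
    if log2fc > 0:
        raise ValueError("thinning can only lower expression (log2fc <= 0)")
    rng = random.Random(seed)
    p = 2.0 ** log2fc
    return [binomial_thin(c, p, rng) if g else c
            for c, g in zip(counts, in_group)]


real = [120, 95, 130, 110, 105, 125]
group = [False, False, False, True, True, True]
thinned = add_downregulation(real, group, log2fc=-1.0)
print(thinned)
```

Because thinning only removes reads, it cannot create up-regulation directly; a fold change in the other direction would be obtained by thinning the opposite group.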
Affiliation(s)
- David Gerard
- Department of Mathematics and Statistics, American University, Massachusetts Ave NW, Washington, DC, 20016, USA.
35
SCeQTL: an R package for identifying eQTL from single-cell parallel sequencing data. BMC Bioinformatics 2020; 21:184. [PMID: 32393315 PMCID: PMC7216638 DOI: 10.1186/s12859-020-3534-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/28/2019] [Accepted: 05/05/2020] [Indexed: 11/16/2022]
Abstract
Background With the rapid development of single-cell genomics, technologies for parallel sequencing of the transcriptome and genome in each single cell are being explored in several labs and are becoming available. This brings the opportunity to uncover associations between genotypes and gene expression phenotypes at the single-cell level by eQTL analysis on single-cell data. A new method is needed for such tasks because of the special characteristics of single-cell sequencing data. Results We developed an R package, SCeQTL, that uses zero-inflated negative binomial regression to perform eQTL analysis on single-cell data. It can distinguish two types of gene-expression differences among different genotype groups. It can also be used to find gene expression variations associated with other grouping factors, such as cell lineages or cell types. Conclusions The SCeQTL method is capable of eQTL analysis on single-cell data as well as detecting associations of gene expression with other grouping factors. The R package of the method is available at https://github.com/XuegongLab/SCeQTL/.
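To make the model family concrete, the sketch below evaluates a zero-inflated negative binomial (ZINB) probability mass function. It assumes the common mean/size parametrisation of the NB (mean `mu`, size `r`) plus an excess-zero probability `pi`; these names are ours, not SCeQTL's, and fitting the regression itself is left to the package. One natural reading of the "two types" of difference (our assumption, not stated in the abstract) is that a genotype can shift `pi`, `mu`, or both.

```python
import math


def nb_logpmf(k: int, mu: float, r: float) -> float:
    """Log-PMF of a negative binomial with mean mu and size (inverse dispersion) r."""
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def zinb_pmf(k: int, mu: float, r: float, pi: float) -> float:
    """P(Y = k) when Y is an excess zero with probability pi, otherwise NB(mu, r)."""
    nb = math.exp(nb_logpmf(k, mu, r))
    # Zeros can come from either component; positive counts only from the NB.
    return pi + (1 - pi) * nb if k == 0 else (1 - pi) * nb
```

For example, comparing `zinb_pmf(0, 5.0, 2.0, 0.1)` with `zinb_pmf(0, 5.0, 2.0, 0.4)` shows how a change in zero inflation alone alters the zero fraction while the NB component is unchanged.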
36
Domingues AF, Kulkarni R, Giotopoulos G, Gupta S, Vinnenberg L, Arede L, Foerner E, Khalili M, Adao RR, Johns A, Tan S, Zeka K, Huntly BJ, Prabakaran S, Pina C. Loss of Kat2a enhances transcriptional noise and depletes acute myeloid leukemia stem-like cells. eLife 2020; 9:e51754. [PMID: 31985402 PMCID: PMC7039681 DOI: 10.7554/elife.51754] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/09/2019] [Accepted: 01/24/2020] [Indexed: 12/21/2022]
Abstract
Acute Myeloid Leukemia (AML) is an aggressive hematological malignancy with abnormal progenitor self-renewal and defective white blood cell differentiation. Its pathogenesis comprises subversion of transcriptional regulation, through mutation and by hijacking normal chromatin regulation. Kat2a is a histone acetyltransferase central to promoter activity, which we recently associated with the stability of pluripotency networks and identified as a genetic vulnerability in AML. Through combined chromatin profiling and single-cell transcriptomics of a conditional knockout mouse, we demonstrate that Kat2a contributes to leukemia propagation through preservation of leukemia stem-like cells. Kat2a loss impacts transcription factor binding and reduces transcriptional burst frequency in a subset of gene promoters, generating enhanced variability of transcript levels. Destabilization of target programs shifts leukemia cell fate out of self-renewal into differentiation. We propose that control of transcriptional variability is central to leukemia stem-like cell propagation, and establish a paradigm exploitable in different tumors and distinct stages of cancer evolution.
Affiliation(s)
- Ana Filipa Domingues
- Department of Haematology, University of Cambridge, NHS-BT Blood Donor Centre, Cambridge, United Kingdom
- Rashmi Kulkarni
- Department of Haematology, University of Cambridge, NHS-BT Blood Donor Centre, Cambridge, United Kingdom
- George Giotopoulos
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
- Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge, United Kingdom
- Shikha Gupta
- Department of Haematology, University of Cambridge, NHS-BT Blood Donor Centre, Cambridge, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Laura Vinnenberg
- Department of Haematology, University of Cambridge, NHS-BT Blood Donor Centre, Cambridge, United Kingdom
- Liliana Arede
- Department of Haematology, University of Cambridge, NHS-BT Blood Donor Centre, Cambridge, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Elena Foerner
- Department of Haematology, University of Cambridge, NHS-BT Blood Donor Centre, Cambridge, United Kingdom
- Mitra Khalili
- Department of Haematology, University of Cambridge, NHS-BT Blood Donor Centre, Cambridge, United Kingdom
- Department of Medical Genetics and Molecular Medicine, School of Medicine, Zanjan University of Medical Sciences (ZUMS), Zanjan, Islamic Republic of Iran
- Rita Romano Adao
- Department of Haematology, University of Cambridge, NHS-BT Blood Donor Centre, Cambridge, United Kingdom
- Ayona Johns
- Division of Biosciences, College of Health and Life Sciences, Brunel University London, Uxbridge, United Kingdom
- Shengjiang Tan
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
- Keti Zeka
- Department of Haematology, University of Cambridge, NHS-BT Blood Donor Centre, Cambridge, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Brian J Huntly
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
- Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge, United Kingdom
- Sudhakaran Prabakaran
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Department of Biology, IISER, Pune, India
- Cristina Pina
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Division of Biosciences, College of Health and Life Sciences, Brunel University London, Uxbridge, United Kingdom
37
Mou T, Deng W, Gu F, Pawitan Y, Vu TN. Reproducibility of Methods to Detect Differentially Expressed Genes from Single-Cell RNA Sequencing. Front Genet 2020; 10:1331. [PMID: 32010190 PMCID: PMC6979262 DOI: 10.3389/fgene.2019.01331] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 08/06/2019] [Accepted: 12/05/2019] [Indexed: 12/31/2022]
Abstract
Detection of differentially expressed genes is a common task in single-cell RNA-seq (scRNA-seq) studies. Various methods based on both bulk-cell and single-cell approaches are in current use. Due to the unique distributional characteristics of single-cell data, it is important to compare these methods with rigorous statistical assessments. In this study, we assess the reproducibility of 9 tools for differential expression analysis in scRNA-seq data. These tools include four methods originally designed for scRNA-seq data, three popular methods originally developed for bulk-cell RNA-seq data that have since been applied in scRNA-seq analysis, and two general statistical tests. Instead of comparing the performance across all genes, we compare the methods in terms of the rediscovery rates (RDRs) of top-ranked genes, separately for highly and lowly expressed genes. Three real and one simulated scRNA-seq data sets are used for the comparisons. The results indicate that some widely used methods, such as edgeR and monocle, perform worse in terms of RDR than the other methods, especially for the top-ranked genes. For highly expressed genes, many bulk-cell–based methods can perform similarly to the methods designed for scRNA-seq data. For lowly expressed genes, however, performance varies substantially; edgeR and monocle are too liberal and have poor control of false positives, while DESeq2 is too conservative and consequently loses sensitivity compared to the other methods. BPSC, Limma, DEsingle, MAST, t-test and Wilcoxon have similar performances in the real data sets. Overall, the scRNA-seq based method BPSC performs well against the other methods, particularly when there is a sufficient number of cells.
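A rediscovery rate for top-ranked genes can be sketched as the fraction of the top-k genes found in a subsample analysis that are also among the top-k genes from the full data. The sketch assumes that definition; the paper's exact protocol (subsample sizes, number of repeats, ranking statistic) may differ, and `rediscovery_rate` and `mean_rdr` are hypothetical names.

```python
from typing import Sequence


def rediscovery_rate(full_ranking: Sequence[str], sub_ranking: Sequence[str], k: int) -> float:
    """Fraction of the top-k genes in sub_ranking that are also top-k in full_ranking.

    Both rankings are gene IDs ordered from most to least significant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_full = set(full_ranking[:k])
    top_sub = set(sub_ranking[:k])
    return len(top_full & top_sub) / k


def mean_rdr(full_ranking: Sequence[str], sub_rankings: Sequence[Sequence[str]], k: int) -> float:
    """Average RDR over several subsample analyses."""
    return sum(rediscovery_rate(full_ranking, s, k) for s in sub_rankings) / len(sub_rankings)
```

Evaluating highly and lowly expressed genes separately, as the study does, amounts to splitting the gene lists by expression level before ranking and calling these functions on each part.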
Affiliation(s)
- Tian Mou
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Wenjiang Deng
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Fengyun Gu
- School of Mathematical Sciences, University College Cork, Cork, Ireland
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Trung Nghia Vu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
38
Wu Z, Zhang Y, Stitzel ML, Wu H. Two-phase differential expression analysis for single cell RNA-seq. Bioinformatics 2019; 34:3340-3348. [PMID: 29688282 DOI: 10.1093/bioinformatics/bty329] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/23/2017] [Accepted: 04/21/2018] [Indexed: 12/13/2022]
Abstract
Motivation Single-cell RNA-sequencing (scRNA-seq) has brought the study of the transcriptome to higher resolution and makes it possible for scientists to answer the question of 'differential expression' with more clarity. However, most computational methods still stick with the old mentality of viewing differential expression as a simple 'up or down' phenomenon. We advocate that we should fully embrace the features of single cell data, which allow us to observe binary (from Off to On) as well as continuous (the amount of expression) regulations. Results We develop a method, termed SC2P, that first identifies the phase of expression a gene is in, taking into account both cell- and gene-specific contexts, in a model-based and data-driven fashion. We then identify two forms of transcription regulation: phase transition, and magnitude tuning. We demonstrate that compared with existing methods, SC2P provides substantial improvement in sensitivity without sacrificing the control of false discovery, as well as better robustness. Furthermore, the analysis provides better interpretation of the nature of regulation types in different genes. Availability and implementation SC2P is implemented as an open source R package publicly available at https://github.com/haowulab/SC2P. Supplementary information Supplementary data are available at Bioinformatics online.
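The two forms of regulation can be illustrated with a deliberately simplified sketch: a cell is called "on" for a gene when its count exceeds a fixed threshold, a phase transition is summarised as the change in the fraction of "on" cells, and magnitude tuning as the change in mean expression among "on" cells. SC2P itself assigns phases with a model that accounts for cell- and gene-specific context; the fixed `threshold` here is a stand-in assumption, the function names are hypothetical, and no significance testing is done.

```python
from typing import Dict, List, Tuple


def phase_summary(counts: List[float], threshold: float) -> Tuple[float, float]:
    """Return (fraction of cells 'on', mean expression among 'on' cells)."""
    on = [c for c in counts if c > threshold]
    frac_on = len(on) / len(counts)
    mean_on = sum(on) / len(on) if on else 0.0
    return frac_on, mean_on


def two_phase_contrast(
    group_a: List[float], group_b: List[float], threshold: float = 0.0
) -> Dict[str, float]:
    """Differences (B minus A) in phase occupancy and in 'on'-phase magnitude."""
    frac_a, mag_a = phase_summary(group_a, threshold)
    frac_b, mag_b = phase_summary(group_b, threshold)
    return {"phase_transition": frac_b - frac_a, "magnitude_tuning": mag_b - mag_a}
```

With `a = [0, 0, 0, 4, 6]`, the group `[0, 5, 5, 5, 5]` differs from `a` only in how many cells are "on", whereas `[0, 0, 0, 10, 10]` differs only in how strongly the "on" cells express the gene; a single mean-based "up or down" test would not separate these two cases.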
Affiliation(s)
- Zhijin Wu
- Department of Biostatistics, Brown University, Providence, RI, USA; Center for Statistical Sciences, Brown University, Providence, RI, USA; Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Yi Zhang
- Department of Biostatistics, Brown University, Providence, RI, USA
- Michael L Stitzel
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA; Department of Genetics & Genome Sciences, University of Connecticut, Farmington, CT, USA
- Hao Wu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
39
Chen G, Ning B, Shi T. Single-Cell RNA-Seq Technologies and Related Computational Data Analysis. Front Genet 2019; 10:317. [PMID: 31024627 PMCID: PMC6460256 DOI: 10.3389/fgene.2019.00317] [Citation(s) in RCA: 495] [Impact Index Per Article: 99.0] [Received: 12/05/2018] [Accepted: 03/21/2019] [Indexed: 12/15/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies allow the dissection of gene expression at single-cell resolution, which has revolutionized transcriptomic studies. A number of scRNA-seq protocols have been developed, each with its own features and distinct advantages and disadvantages. Due to technical limitations and biological factors, scRNA-seq data are noisier and more complex than bulk RNA-seq data. The high variability of scRNA-seq data raises computational challenges in data analysis. Although an increasing number of bioinformatics methods are proposed for analyzing and interpreting scRNA-seq data, novel algorithms are required to ensure the accuracy and reproducibility of results. In this review, we provide an overview of currently available single-cell isolation protocols and scRNA-seq technologies, and discuss the methods for diverse scRNA-seq data analyses including quality control, read mapping, gene expression quantification, batch effect correction, normalization, imputation, dimensionality reduction, feature selection, cell clustering, trajectory inference, differential expression calling, alternative splicing, allelic expression, and gene regulatory network reconstruction. Further, we outline the prospective development and applications of scRNA-seq technologies.
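Normalization is one of the listed analysis steps. As a minimal illustration, the sketch below applies simple library-size scaling followed by a log transform: each cell's counts are divided by its total, multiplied by a fixed `scale`, and passed through log1p. This basic scheme is chosen here as an assumption (many scRNA-seq-specific normalization methods exist), the function name is hypothetical, and the default `scale` of 10,000 is an arbitrary convention.

```python
import math
from typing import Dict, List


def log_normalize(cell_counts: Dict[str, List[int]], scale: float = 1e4) -> Dict[str, List[float]]:
    """Library-size normalization: divide by each cell's total, multiply by scale, apply log1p.

    cell_counts maps a cell barcode to its per-gene count vector (same gene order for all cells).
    """
    normalized = {}
    for cell, counts in cell_counts.items():
        total = sum(counts)
        if total == 0:
            raise ValueError(f"cell {cell} has no counts; filter it out first")
        normalized[cell] = [math.log1p(c / total * scale) for c in counts]
    return normalized
```

Two cells with the same relative composition but a tenfold difference in sequencing depth, such as `[1, 3]` and `[10, 30]`, end up with identical normalized profiles.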
Affiliation(s)
- Geng Chen
- Center for Bioinformatics and Computational Biology, and Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
- Baitang Ning
- National Center for Toxicological Research, United States Food and Drug Administration, Jefferson, AR, United States
- Tieliu Shi
- Center for Bioinformatics and Computational Biology, and Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
40
Wang T, Li B, Nelson CE, Nabavi S. Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC Bioinformatics 2019; 20:40. [PMID: 30658573 PMCID: PMC6339299 DOI: 10.1186/s12859-019-2599-6] [Citation(s) in RCA: 147] [Impact Index Per Article: 29.4] [Received: 05/16/2018] [Accepted: 01/03/2019] [Indexed: 12/16/2022]
Abstract
Background The analysis of single-cell RNA sequencing (scRNAseq) data plays an important role in understanding the intrinsic and extrinsic cellular processes in biological and biomedical research. One significant effort in this area is the detection of differentially expressed (DE) genes. scRNAseq data, however, are highly heterogeneous and have a large number of zero counts, which introduces challenges in detecting DE genes. Addressing these challenges requires employing new approaches beyond the conventional ones, which are based on a nonzero difference in average expression. Several methods have been developed for differential gene expression analysis of scRNAseq data. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to evaluate and compare the performance of differential gene expression analysis methods for scRNAseq data. Results In this study, we conducted a comprehensive evaluation of the performance of eleven differential gene expression analysis software tools, which are designed for scRNAseq data or can be applied to them. We used simulated and real data to evaluate the accuracy and precision of detection. Using simulated data, we investigated the effect of sample size on the detection accuracy of the tools. Using real data, we examined the agreement among the tools in identifying DE genes, the run time of the tools, and the biological relevance of the detected DE genes. Conclusions In general, agreement among the tools in calling DE genes is not high. There is a trade-off between true-positive rates and the precision of calling DE genes. Methods with higher true positive rates tend to show low precision because they introduce false positives, whereas methods with high precision show low true positive rates because they identify few DE genes. We observed that current methods designed for scRNAseq data do not tend to show better performance compared to methods designed for bulk RNAseq data. 
Data multimodality and abundance of zero read counts are the main characteristics of scRNAseq data, which play important roles in the performance of differential gene expression analysis methods and need to be considered when developing new methods. Electronic supplementary material The online version of this article (10.1186/s12859-019-2599-6) contains supplementary material, which is available to authorized users.
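The trade-off described in this abstract is between two standard quantities, computed against a known set of true DE genes (available in simulations): the true-positive rate (recall), TP / (TP + FN), and precision, TP / (TP + FP). A small sketch with hypothetical gene IDs and a hypothetical function name:

```python
from typing import Iterable, Tuple


def tpr_and_precision(called: Iterable[str], truth: Iterable[str]) -> Tuple[float, float]:
    """Return (true-positive rate, precision) of a set of called DE genes."""
    called_set, truth_set = set(called), set(truth)
    tp = len(called_set & truth_set)  # correctly called DE genes
    tpr = tp / len(truth_set) if truth_set else 0.0
    precision = tp / len(called_set) if called_set else 0.0
    return tpr, precision
```

With four true DE genes, a liberal caller that reports six genes of which three are true scores (0.75, 0.5), while a conservative caller that reports a single true gene scores (0.25, 1.0), which mirrors the pattern the authors describe.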
Affiliation(s)
- Tianyu Wang
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Boyang Li
- Department of Molecular & Cell Biology, University of Connecticut, Storrs, CT, USA
- Craig E Nelson
- Department of Molecular & Cell Biology, The Institute for Systems Genomics, CLAS, University of Connecticut, Storrs, CT, USA
- Sheida Nabavi
- Computer Science and Engineering Department, The Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
41
Ngara M, Palmkvist M, Sagasser S, Hjelmqvist D, Björklund ÅK, Wahlgren M, Ankarklev J, Sandberg R. Exploring parasite heterogeneity using single-cell RNA-seq reveals a gene signature among sexual stage Plasmodium falciparum parasites. Exp Cell Res 2018; 371:130-138. [PMID: 30096287 DOI: 10.1016/j.yexcr.2018.08.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/07/2018] [Revised: 08/01/2018] [Accepted: 08/02/2018] [Indexed: 10/28/2022]
Abstract
The malaria parasite has a complex lifecycle, including several events of differentiation and stage progression, while actively evading immunity in both its mosquito and human hosts. Important parasite gene expression and regulation during these events remain hidden in rare populations of cells. Here, we combine a capillary-based platform for cell isolation with single-cell RNA-sequencing to transcriptionally profile 165 single infected red blood cells (iRBCs) during the intra-erythrocytic developmental cycle (IDC). Unbiased analyses of single-cell data grouped the cells into eight transcriptional states during the IDC. Interestingly, we uncovered a gene signature from the single iRBC analyses that can successfully discriminate between developing asexual and sexual stage parasites at cellular resolution, and we verify five previously undefined gametocyte-stage-specific genes. Moreover, we show that expressed genes from the variable gene families can be detected in single parasites, despite the sparse nature of the data. Overall, single-parasite transcriptomics holds promise for the molecular dissection of rare parasite phenotypes throughout the malaria lifecycle.
Affiliation(s)
- Mtakai Ngara
- Ludwig Institute for Cancer Research, Karolinska Institutet, Box 240, SE-171 77 Stockholm, Sweden; Dept. of Cell and Molecular Biology, Karolinska Institutet, Solnavägen 1, Box 285, SE-171 77 Stockholm, Sweden
- Mia Palmkvist
- Department of Microbiology, Tumor and Cell Biology, Nobels väg 16, Karolinska Institutet, SE-171 77 Stockholm, Sweden
- Sven Sagasser
- Ludwig Institute for Cancer Research, Karolinska Institutet, Box 240, SE-171 77 Stockholm, Sweden
- Daisy Hjelmqvist
- Dept. of Cell and Molecular Biology, Karolinska Institutet, Solnavägen 1, Box 285, SE-171 77 Stockholm, Sweden
- Åsa K Björklund
- Ludwig Institute for Cancer Research, Karolinska Institutet, Box 240, SE-171 77 Stockholm, Sweden
- Mats Wahlgren
- Department of Microbiology, Tumor and Cell Biology, Nobels väg 16, Karolinska Institutet, SE-171 77 Stockholm, Sweden
- Johan Ankarklev
- Department of Microbiology, Tumor and Cell Biology, Nobels väg 16, Karolinska Institutet, SE-171 77 Stockholm, Sweden; Department of Microbiology and Immunology, Weill-Cornell Medical College of Cornell University, 1300 York Avenue, Box 62, New York, NY 10062, United States; Department of Molecular Biosciences, The Wenner Gren Institute, Stockholm University, Svante Arrhenius väg 20C, SE-106 91 Stockholm, Sweden
- Rickard Sandberg
- Ludwig Institute for Cancer Research, Karolinska Institutet, Box 240, SE-171 77 Stockholm, Sweden; Dept. of Cell and Molecular Biology, Karolinska Institutet, Solnavägen 1, Box 285, SE-171 77 Stockholm, Sweden
42
Hon CC, Shin JW, Carninci P, Stubbington MJT. The Human Cell Atlas: Technical approaches and challenges. Brief Funct Genomics 2018; 17:283-294. [PMID: 29092000 PMCID: PMC6063304 DOI: 10.1093/bfgp/elx029] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/23/2022]
Abstract
The Human Cell Atlas is a large, international consortium that aims to identify and describe every cell type in the human body. The comprehensive cellular maps that arise from this ambitious effort have the potential to transform many aspects of fundamental biology and clinical practice. Here, we discuss the technical approaches that could be used today to generate such a resource and also the technical challenges that will be encountered.
Affiliation(s)
- Chung-Chau Hon
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Jay W Shin
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Piero Carninci
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
43
Abstract
Single-cell RNA sequencing (scRNA-seq) is currently transforming our understanding of biology, as it is a powerful tool to resolve cellular heterogeneity and molecular networks. Over 50 protocols have been developed in recent years, and data processing and analysis tools are also evolving fast. Here, we review the basic principles underlying the different experimental protocols and how to benchmark them. We also review and compare the essential methods to process scRNA-seq data, from mapping, filtering, normalization and batch correction to basic differential expression analysis. We hope that this helps readers choose appropriate experimental and computational methods for the research question at hand.
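Filtering, one of the processing steps reviewed here, often begins by discarding cells with too few UMIs or too few detected genes, which are likely empty droplets or damaged cells. The sketch below assumes per-cell count vectors keyed by barcode and uses hypothetical thresholds (`min_umis`, `min_genes`); suitable values depend on the protocol and data set, and `filter_cells` is an illustrative name.

```python
from typing import Dict, List


def filter_cells(
    cell_counts: Dict[str, List[int]], min_umis: int = 500, min_genes: int = 200
) -> Dict[str, List[int]]:
    """Keep cells whose total UMI count and number of detected genes pass both thresholds."""
    kept = {}
    for cell, counts in cell_counts.items():
        total_umis = sum(counts)
        genes_detected = sum(1 for c in counts if c > 0)
        if total_umis >= min_umis and genes_detected >= min_genes:
            kept[cell] = counts
    return kept
```

Applying both thresholds matters: a barcode with many UMIs concentrated in a single gene passes a depth filter but fails the detected-genes filter.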
Affiliation(s)
- Christoph Ziegenhain
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Beate Vieth
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Swati Parekh
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Ines Hellmann
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Wolfgang Enard
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
44
Abstract
We developed deconvolution of single-cell expression distribution (DESCEND), a method to recover the cross-cell distribution of the true gene expression level from observed counts in single-cell RNA sequencing, allowing adjustment of known confounding cell-level factors. With the recovered distribution, DESCEND provides reliable estimates of distribution-based measurements, such as the dispersion of true gene expression and the probability that true gene expression is positive. This is important, as with better estimates of these measurements, DESCEND clarifies and improves many downstream analyses including finding differentially expressed genes, identifying cell types, and selecting differentiation markers. In addition, using nine public datasets, we verified a simple “Poisson-alpha” noise model for the technical noise of unique molecular identifier-based single-cell RNA-sequencing data, clarifying the current intense debate on this issue. Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene’s expression distribution across cells, thus allowing the assessment of the dispersion, nonzero fraction, and other aspects of its distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data are noisy, with each cell typically sequenced at low coverage, thus making it difficult to infer properties of the gene expression distribution from raw counts. Based on a reexamination of nine public datasets, we propose a simple technical noise model for scRNA-seq data with unique molecular identifiers (UMI).
We develop deconvolution of single-cell expression distribution (DESCEND), a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and nonzero fraction. DESCEND can adjust for cell-level covariates such as cell size, cell cycle, and batch effects. DESCEND’s noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations and through its effectiveness in removing known batch effects. We demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially expressed genes, identifying cell types, and selecting differentiation markers.
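A Poisson-alpha noise model can be written as Y ~ Poisson(alpha_c * lambda), where lambda is a cell's true expression of a gene and alpha_c is a cell-specific capture efficiency. The sketch assumes that form (our reading; the DESCEND paper gives the exact parametrisation) and shows two consequences: a gene with positive true expression is still observed as zero with probability exp(-alpha_c * lambda), and averaging Y / alpha_c gives an unbiased estimate of mean true expression. DESCEND's deconvolution of the full distribution is not reproduced, and all function names are hypothetical.

```python
import math
import random
from typing import List


def prob_observed_zero(true_expr: float, alpha: float) -> float:
    """P(Y = 0) under the assumed model Y ~ Poisson(alpha * true_expr)."""
    return math.exp(-alpha * true_expr)


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw from Poisson(rate) with Knuth's multiplication method; fine for small rates."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def estimate_mean_true_expr(counts: List[int], alphas: List[float]) -> float:
    """Average of Y / alpha, whose expectation equals the mean true expression."""
    return sum(y / a for y, a in zip(counts, alphas)) / len(counts)
```

For example, with `lambda = 3` and `alpha = 0.2`, `prob_observed_zero(3.0, 0.2)` is exp(-0.6), about 0.55, so most observed zeros for that cell would be technical rather than a sign that the gene is off.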
45
Chen S, Mar JC. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics 2018; 19:232. [PMID: 29914350 PMCID: PMC6006753 DOI: 10.1186/s12859-018-2217-z] [Citation(s) in RCA: 119] [Impact Index Per Article: 19.8] [Received: 10/28/2017] [Accepted: 05/24/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND It is a fundamental fact of biology that genes do not operate in isolation, and yet methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially developed for data collected from bulk samples may not be suitable for single cells. Meanwhile, although methods that are specific for single cell data are now emerging, whether they have improved performance over general methods is unknown. In this study, we evaluate the applicability of five general methods and three single cell methods for inferring gene regulatory networks from both experimental single cell gene expression data and in silico simulated data. RESULTS Standard evaluation metrics using ROC curves and Precision-Recall curves against reference sets sourced from the literature demonstrated that most of the methods performed poorly when they were applied to either experimental single cell data or simulated single cell data, indicating their lack of performance for this task. Using default settings, network methods were applied to the same datasets. Comparisons of the learned networks highlighted the uniqueness of some predicted edges for each method. The fact that different methods infer networks that vary substantially reflects the underlying mathematical rationale and assumptions that distinguish network methods from each other. CONCLUSIONS This study provides a comprehensive evaluation of network modeling algorithms applied to experimental single cell gene expression data and in silico simulated datasets where the network structure is known. Comparisons demonstrate that most of these assessed network methods are not able to predict network structures from single cell expression data accurately, even if they are specifically developed for single cell data. 
Also, single cell methods, which usually depend on more elaborate algorithms, in general have less similarity to each other in the sets of edges detected. The results from this study emphasize the importance of developing more accurate, optimized network modeling methods that are compatible with single cell data. Newly-developed single cell methods may uniquely capture particular features of potential gene-gene relationships, and caution should be taken when we interpret these results.
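Evaluating an inferred network against a literature-derived reference, as described above, comes down to ranking candidate edges by a method's score and checking how well reference edges rank above non-edges. The area under the ROC curve can be computed with the Mann-Whitney formulation: the probability that a randomly chosen reference edge outscores a randomly chosen non-edge, with ties counting one half. The sketch assumes edge scores and 0/1 reference labels are already aligned; it is quadratic in the number of edges and intended only for illustration.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative edges."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

An AUROC near 0.5 means the method ranks reference edges no better than chance, while 1.0 means every reference edge outscores every non-edge.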
Affiliation(s)
- Shuonan Chen
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Jessica C Mar
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA; Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA; Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
46
Stévant I, Nef S. Single cell transcriptome sequencing: A new approach for the study of mammalian sex determination. Mol Cell Endocrinol 2018; 468:11-18. [PMID: 29371022 DOI: 10.1016/j.mce.2018.01.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/03/2017] [Revised: 01/21/2018] [Accepted: 01/21/2018] [Indexed: 10/18/2022]
Abstract
Mammalian sex determination is a highly complex developmental process that is particularly difficult to study due to the limited number of gonadal cells present at the bipotential stage, the large cellular heterogeneity in both testes and ovaries, and the rapid sex-dependent differentiation processes. Single-cell RNA-sequencing (scRNA-seq) circumvents the averaging artifacts associated with methods traditionally used to profile bulk populations of cells. It is a powerful tool that allows the identification and classification of cell populations in a comprehensive and unbiased manner. In particular, scRNA-seq enables the tracing of cells along developmental trajectories and characterization of the transcriptional dynamics controlling their differentiation. In this review, we describe the current state-of-the-art experimental methods used for scRNA-seq and discuss their strengths and limitations. Additionally, we summarize the multiple key insights that scRNA-seq has provided to the understanding of mammalian sex determination. Finally, we briefly discuss the future of this technology, as well as complementary applications in single-cell omics in the context of mammalian sex determination.
Affiliation(s)
- Isabelle Stévant
- Department of Genetic Medicine and Development, University of Geneva, 1211 Geneva, Switzerland; iGE3, Institute of Genetics and Genomics of Geneva, University of Geneva, 1211 Geneva, Switzerland; SIB, Swiss Institute of Bioinformatics, University of Geneva, 1211 Geneva, Switzerland
- Serge Nef
- Department of Genetic Medicine and Development, University of Geneva, 1211 Geneva, Switzerland; iGE3, Institute of Genetics and Genomics of Geneva, University of Geneva, 1211 Geneva, Switzerland
47
Miao Z, Deng K, Wang X, Zhang X. DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics 2018; 34:3223-3224. [DOI: 10.1093/bioinformatics/bty332] [Citation(s) in RCA: 121] [Impact Index Per Article: 20.2] [Received: 03/05/2018] [Accepted: 04/20/2018] [Indexed: 01/08/2023]
Affiliation(s)
- Zhun Miao
- MOE Key Laboratory of Bioinformatics, Division of Bioinformatics and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, China
- Ke Deng
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
- Xiaowo Wang
- MOE Key Laboratory of Bioinformatics, Division of Bioinformatics and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, China
- Xuegong Zhang
- MOE Key Laboratory of Bioinformatics, Division of Bioinformatics and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, China
- School of Life Sciences, Tsinghua University, Beijing, China
48
Wang T, Nabavi S. SigEMD: A powerful method for differential gene expression analysis in single-cell RNA sequencing data. Methods 2018; 145:25-32. [PMID: 29702224 DOI: 10.1016/j.ymeth.2018.04.017] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 02/01/2018] [Revised: 04/13/2018] [Accepted: 04/19/2018] [Indexed: 10/17/2022]
Abstract
Differential gene expression analysis is a key task in single-cell RNA sequencing (scRNAseq) analysis, used to discover specific changes in the expression levels of individual cell types. Because scRNAseq data exhibit multimodality, large numbers of zero counts, and sparsity, they differ from traditional bulk RNA sequencing (RNAseq) data. These new challenges have motivated the development of new methods for identifying differentially expressed (DE) genes. In this study, we propose a new method, SigEMD, that combines a data imputation approach, a logistic regression model and a nonparametric method based on the Earth Mover's Distance to identify DE genes in scRNAseq data precisely and efficiently. The regression model and data imputation are used to reduce the impact of the large numbers of zero counts, and the nonparametric method is used to improve the sensitivity of detecting DE genes in multimodal scRNAseq data. By additionally using gene interaction network information to adjust the final states of DE genes, we further reduce false-positive DE calls. We used simulated and real datasets to evaluate the detection accuracy of the proposed method and to compare its performance with that of other differential expression analysis methods. Results indicate that the proposed method performs well overall in terms of detection precision, sensitivity, and specificity.
Affiliation(s)
- Tianyu Wang
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA.
- Sheida Nabavi
- Computer Science and Engineering Department and Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA.
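The SigEMD abstract above combines three components; only its nonparametric Earth Mover's Distance (EMD) step is illustrated here. The following is a minimal Python sketch, assuming one gene's expression values measured in two groups of cells and a simple label-permutation test for significance. It omits SigEMD's imputation, logistic regression, and gene-network adjustment, and the function names (`emd_1d`, `emd_permutation_test`), the permutation scheme, and the example values are hypothetical illustrations, not the published implementation. For one-dimensional data the EMD equals the area between the two empirical cumulative distribution functions, which the sketch computes exactly.

```python
import random
from bisect import bisect_right


def emd_1d(x, y):
    """Earth Mover's (Wasserstein-1) distance between two 1-D samples.

    For one-dimensional data this equals the area between the two
    empirical CDFs; both CDFs are step functions, so the integral is an
    exact sum over the pooled, sorted breakpoints. x and y must be non-empty.
    """
    xs, ys = sorted(x), sorted(y)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        fx = bisect_right(xs, left) / len(xs)
        fy = bisect_right(ys, left) / len(ys)
        total += abs(fx - fy) * (right - left)
    return total


def emd_permutation_test(x, y, n_perm=1000, seed=0):
    """Permutation p-value for H0: x and y share one distribution.

    Hypothetical helper for illustration; not SigEMD's published code.
    """
    rng = random.Random(seed)
    observed = emd_1d(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel cells at random under H0
        if emd_1d(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up, zero-inflated expression values for one gene in two groups.
    group_a = [0.0, 0.0, 0.0, 1.2, 2.5, 0.0, 3.1, 0.0]
    group_b = [0.0, 4.0, 5.1, 4.7, 0.0, 5.5, 6.2, 4.9]
    distance, p_value = emd_permutation_test(group_a, group_b)
    print(f"EMD = {distance:.3f}, permutation p = {p_value:.4f}")
```

Because the EMD compares whole distributions rather than means, it can register differences in the shape of multimodal or zero-inflated expression profiles, which is the motivation the abstract gives for using a nonparametric method. In a genome-wide analysis, the per-gene p-values would still need multiple-testing correction.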
49
Soneson C, Robinson MD. Bias, robustness and scalability in single-cell differential expression analysis. Nat Methods 2018; 15:255-261. [DOI: 10.1038/nmeth.4612] [Citation(s) in RCA: 429] [Impact Index Per Article: 71.5] [Received: 06/06/2017] [Accepted: 01/16/2018] [Indexed: 12/31/2022]
50
Huang X, Liu S, Wu L, Jiang M, Hou Y. High Throughput Single Cell RNA Sequencing, Bioinformatics Analysis and Applications. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2018; 1068:33-43. [PMID: 29943294 DOI: 10.1007/978-981-13-0502-3_4] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 01/06/2023]
Abstract
Single-cell sequencing (SCS) can be used to acquire the genomes, transcriptomes and epigenomes of individual cells. Next-generation sequencing (NGS) technology is the driving force behind single-cell sequencing. Single-cell RNA sequencing (scRNA-seq) requires a lengthy pipeline comprising single-cell sorting, RNA extraction, reverse transcription, amplification, library construction, sequencing and subsequent bioinformatic analysis. Computational algorithms are essential for accomplishing many analysis tasks with scRNA-seq data. scRNA-seq has already enabled researchers to revisit long-standing questions in cancer biology, including cancer metastasis, heterogeneity and evolution. Circulating tumor cells (CTCs) are not only an important mechanism of cancer metastasis but also offer a convenient way to diagnose and monitor cancer independently of surgical resection.