1
Landa B, Kluger Y. The Dyson equalizer: adaptive noise stabilization for low-rank signal detection and recovery. Information and Inference: A Journal of the IMA 2025; 14:iaae036. [PMID: 39830802 PMCID: PMC11735832 DOI: 10.1093/imaiai/iaae036] [Received: 06/22/2023] [Revised: 07/19/2024] [Accepted: 12/10/2024] [Indexed: 01/22/2025]
Abstract
Detecting and recovering a low-rank signal in a noisy data matrix is a fundamental task in data analysis. Typically, this task is addressed by inspecting and manipulating the spectrum of the observed data, e.g. thresholding the singular values of the data matrix at a certain critical level. This approach is well established in the case of homoskedastic noise, where the noise variance is identical across the entries. However, in numerous applications, the noise can be heteroskedastic, where the noise characteristics may vary considerably across the rows and columns of the data. In this scenario, the spectral behaviour of the noise can differ significantly from the homoskedastic case, posing various challenges for signal detection and recovery. To address these challenges, we develop an adaptive normalization procedure that equalizes the average noise variance across the rows and columns of a given data matrix. Our proposed procedure is data-driven and fully automatic, supporting a broad range of noise distributions, variance patterns and signal structures. Our approach relies on random matrix theory results that describe the resolvent of the noise via the so-called Dyson equation. By leveraging this relation, we can accurately infer the noise level in each row and each column directly from the resolvent of the data. We establish that in many cases, our normalization enforces the standard spectral behaviour of homoskedastic noise, namely the Marchenko-Pastur (MP) law, allowing for simple and reliable detection of signal components. Furthermore, we demonstrate that our approach can substantially improve signal recovery in heteroskedastic settings by manipulating the spectrum after normalization. Lastly, we apply our method to single-cell RNA sequencing and spatial transcriptomics data, showcasing accurate fits to the MP law after normalization.
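The homoskedastic baseline mentioned in this abstract (thresholding singular values at a critical level) can be sketched as follows. This is only an illustration of the classical Marchenko-Pastur edge for i.i.d. noise, not the Dyson equalizer itself. The function names, the known noise level `sigma`, and the safety factor `slack` are assumptions introduced here for the example.

```python
import numpy as np


def mp_singular_value_edge(n_rows, n_cols, sigma=1.0):
    """Asymptotic largest singular value of an n_rows x n_cols matrix of
    i.i.d. zero-mean noise with standard deviation sigma (MP bulk edge)."""
    return sigma * (np.sqrt(n_rows) + np.sqrt(n_cols))


def count_signal_components(data, sigma=1.0, slack=1.05):
    """Count singular values above the MP edge.

    `slack` is a hypothetical safety factor that absorbs finite-size
    fluctuations of the largest noise singular value.
    """
    singular_values = np.linalg.svd(data, compute_uv=False)
    threshold = slack * mp_singular_value_edge(*data.shape, sigma=sigma)
    return int(np.sum(singular_values > threshold))


def make_low_rank(n_rows, n_cols, strengths, rng):
    """Build a matrix whose nonzero singular values are exactly `strengths`."""
    u, _ = np.linalg.qr(rng.standard_normal((n_rows, len(strengths))))
    v, _ = np.linalg.qr(rng.standard_normal((n_cols, len(strengths))))
    return (u * np.asarray(strengths)) @ v.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal((200, 300))
    signal = make_low_rank(200, 300, [100.0, 80.0], rng)
    print(count_signal_components(noise))
    print(count_signal_components(signal + noise))
```

The threshold above assumes a single noise variance shared by all entries. When the variance differs across rows and columns, the noise spectrum no longer follows this edge, which is the situation the normalization described in the abstract is meant to address.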
Affiliation(s)
- Boris Landa
  - Department of Electrical Engineering, Yale University, New Haven, CT 06520, US
  - Program in Applied Mathematics, Yale University, New Haven, CT 06520, US
- Yuval Kluger
  - Program in Applied Mathematics, Yale University, New Haven, CT 06520, US
  - Department of Pathology, Yale University, New Haven, CT 06520, US
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, US
2
Mosquera-Yuqui F, Ramos-Lopez D, Hu X, Yang Y, Mendoza JL, Asare E, Habiger J, Hurtado-Gonzales OP, Espindola AS. A comparative template-switching cDNA approach for HTS-based multiplex detection of three viruses and one viroid commonly found in apple trees. Sci Rep 2025; 15:1657. [PMID: 39794400 PMCID: PMC11724120 DOI: 10.1038/s41598-025-86065-0] [Received: 11/21/2024] [Accepted: 01/08/2025] [Indexed: 01/13/2025]
Abstract
Exclusion is a keystone of integrated pest management to prevent the introduction of pathogens. U.S. plant quarantine programs employ PCR and high-throughput sequencing (HTS) to test imported plants for viruses and viroids of concern. Achieving a low limit of detection in any HTS protocol could be challenging. Following a template-switching cDNA amplification protocol, seven cDNA synthesis treatments were used to test simultaneously the relative abundance and coverage of the three most commonly latent RNA viruses found in apples: apple chlorotic leaf spot virus, apple stem grooving virus, and apple stem pitting virus, as well as the viroid apple hammerhead viroid. Amplified double-stranded cDNAs were subjected to library preparation using Nanopore SQK-DCS109 and Illumina Nextera XT, and sequenced with MinION and NextSeq2000, respectively. Treatments with oligo d(T)23-VN or its combination with random hexamers yielded the highest relative reads for viruses, while treatments containing the reverse primer pool produced more relative reads for AHVd. These treatments and random hexamers also generated the highest genome coverages, which were typically similar in both HTS workflows. However, relative abundances of viruses determined with SQK-DCS109 were up to 2.22-fold higher compared to Nextera XT. In contrast, Nextera XT yielded viroid reads 3.30-fold higher than SQK-DCS109. A framework of considerations for expanding this sensitive approach to other targets and crops is discussed.
Affiliation(s)
- Francisco Mosquera-Yuqui
  - Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK, USA
  - Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, OK, USA
- Daniel Ramos-Lopez
  - Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK, USA
  - Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, OK, USA
- Xiaojun Hu
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), United States, Beltsville, MD, USA
- Yu Yang
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), United States, Beltsville, MD, USA
- Joshua L Mendoza
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), United States, Beltsville, MD, USA
- Emmanuel Asare
  - Department of Statistics, Oklahoma State University, Stillwater, OK, USA
- Joshua Habiger
  - Department of Statistics, Oklahoma State University, Stillwater, OK, USA
- Oscar P Hurtado-Gonzales
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), United States, Beltsville, MD, USA
- Andres S Espindola
  - Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK, USA
  - Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, OK, USA
3
Yang S, Deng C, Pu C, Bai X, Tian C, Chang M, Feng M. Single-Cell RNA Sequencing and Its Applications in Pituitary Research. Neuroendocrinology 2024; 114:875-893. [PMID: 39053437 PMCID: PMC11460981 DOI: 10.1159/000540352] [Received: 03/10/2024] [Accepted: 07/10/2024] [Indexed: 07/27/2024]
Abstract
BACKGROUND: Mounting evidence underscores the significance of cellular diversity within the endocrine system and the intricate interplay between different cell types and tissues, essential for preserving physiological balance and influencing disease trajectories. The pituitary gland, a central player in the endocrine orchestra, exemplifies this complexity with its assortment of hormone-secreting and nonsecreting cells.
SUMMARY: The pituitary gland houses several types of cells responsible for hormone production, alongside nonsecretory cells like fibroblasts and endothelial cells, each playing a crucial role in the gland's function and regulatory mechanisms. Despite the acknowledged importance of these cellular interactions, the detailed mechanisms by which they contribute to pituitary gland physiology and pathology remain largely uncharted. The last decade has seen the emergence of groundbreaking technologies such as single-cell RNA sequencing, offering unprecedented insights into cellular heterogeneity and interactions. However, the application of this advanced tool in exploring the pituitary gland's complexities has been scant. This review provides an overview of this methodology, highlighting its strengths and limitations, and discusses future possibilities for employing it to deepen our understanding of the pituitary gland and its dysfunction in disease states.
KEY MESSAGE: Single-cell RNA sequencing technology offers an unprecedented means to study the heterogeneity and interactions of pituitary cells, though its application has been limited thus far. Further utilization of this tool will help uncover the complex physiological and pathological mechanisms of the pituitary, advancing research and treatment of pituitary diseases.
Affiliation(s)
- Shuangjian Yang
  - Department of Neurosurgery, China Pituitary Disease Registry Center, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Congcong Deng
  - Department of Neurosurgery, China Pituitary Disease Registry Center, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Changqin Pu
  - Department of Neurosurgery, China Pituitary Disease Registry Center, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Xuexue Bai
  - Department of Neurosurgery, China Pituitary Disease Registry Center, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Chenxin Tian
  - Department of Neurosurgery, China Pituitary Disease Registry Center, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Mengqi Chang
  - Department of Neurosurgery, China Pituitary Disease Registry Center, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Ming Feng
  - Department of Neurosurgery, China Pituitary Disease Registry Center, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
4
Xiong J, Gong F, Ma L, Wan L. scVIC: deep generative modeling of heterogeneity for scRNA-seq data. Bioinformatics Advances 2024; 4:vbae086. [PMID: 39027640 PMCID: PMC11256938 DOI: 10.1093/bioadv/vbae086] [Received: 02/01/2024] [Revised: 05/15/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024]
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) has become a valuable tool for studying cellular heterogeneity. However, the analysis of scRNA-seq data is challenging because of inherent noise and technical variability. Existing methods often struggle to simultaneously explore heterogeneity across cells, handle dropout events, and account for batch effects. These drawbacks call for a robust and comprehensive method that can address these challenges and provide accurate insights into heterogeneity at the single-cell level.
Results: In this study, we introduce scVIC, an algorithm designed to account for variational inference, while simultaneously handling biological heterogeneity and batch effects at the single-cell level. scVIC explicitly models both biological heterogeneity and technical variability to learn cellular heterogeneity in a manner free from dropout events and the bias of batch effects. By leveraging variational inference, we provide a robust framework for inferring the parameters of scVIC. To test the performance of scVIC, we employed both simulated and biological scRNA-seq datasets, either including, or not, batch effects. scVIC was found to outperform other approaches because of its superior clustering ability and circumvention of the batch effects problem.
Availability and implementation: The code of scVIC and replication for this study are available at https://github.com/HiBearME/scVIC/tree/v1.0.
Affiliation(s)
- Jiankang Xiong
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Fuzhou Gong
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Liang Ma
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Lin Wan
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
5
Duo H, Li Y, Lan Y, Tao J, Yang Q, Xiao Y, Sun J, Li L, Nie X, Zhang X, Liang G, Liu M, Hao Y, Li B. Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios. Genome Biol 2024; 25:145. [PMID: 38831386 PMCID: PMC11149245 DOI: 10.1186/s13059-024-03290-y] [Received: 10/27/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024]
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines.
RESULTS: We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool Simsite (https://www.ciblab.net/software/simshiny/) for data simulation.
CONCLUSIONS: No method performs best on all criteria; thus, a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.
Affiliation(s)
- Hongrui Duo
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Yinghong Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People's Republic of China
- Yang Lan
  - Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Army Medical University, Chongqing, 400038, People's Republic of China
- Jingxin Tao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Qingxia Yang
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, People's Republic of China
- Yingxue Xiao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Jing Sun
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Lei Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Xiner Nie
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Xiaoxi Zhang
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Mingwei Liu
  - Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, People's Republic of China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
6
Cuevas-Diaz Duran R, Wei H, Wu J. Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets. BMC Genomics 2024; 25:444. [PMID: 38711017 PMCID: PMC11073985 DOI: 10.1186/s12864-024-10364-5] [Received: 09/02/2023] [Accepted: 04/29/2024] [Indexed: 05/08/2024]
Abstract
BACKGROUND: Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for technical and biological variability. Numerous normalization methods have been developed addressing different sources of dispersion and making specific assumptions about the count data.
MAIN BODY: The selection of a normalization method has a direct impact on downstream analysis, for example differential gene expression and cluster identification. Thus, the objective of this review is to guide the reader in making an informed decision on the most appropriate normalization method to use. To this aim, we first give an overview of the different single cell sequencing platforms and methods commonly used including isolation and library preparation protocols. Next, we discuss the inherent sources of variability of scRNA-seq datasets. We describe the categories of normalization methods and include examples of each. We also delineate imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. We also discuss common scRNA-seq methods and toolkits used for integrated data analysis.
CONCLUSIONS: According to the correction performed, normalization methods can be broadly classified as within-sample and between-sample algorithms. Moreover, with respect to the mathematical model used, normalization methods can further be classified into global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these methods has pros and cons and makes different statistical assumptions. However, there is no single best-performing normalization method. Instead, metrics such as silhouette width, K-nearest neighbor batch-effect test, or Highly Variable Genes are recommended to assess the performance of normalization methods.
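To make the "global scaling" category named in this abstract concrete, here is a minimal sketch, assuming a cells-by-genes count matrix: each cell is rescaled to a common total and then log-transformed. The function name `global_scale_normalize` and the default `target_sum` of 10,000 are illustrative assumptions, not taken from the review.

```python
import numpy as np


def global_scale_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p.

    `counts` is assumed to be a cells x genes array of raw counts.
    """
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for empty cells; their rows stay all zero.
    library_size[library_size == 0] = 1.0
    return np.log1p(counts / library_size * target_sum)


if __name__ == "__main__":
    raw = np.array([[1, 1, 2], [10, 10, 20], [0, 0, 0]])
    print(global_scale_normalize(raw))
```

In this toy input the first two cells have the same relative composition at different sequencing depths, so they map to identical normalized profiles. That property is what makes counts comparable between cells under a global scaling assumption.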
Affiliation(s)
- Raquel Cuevas-Diaz Duran
  - Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, Nuevo Leon, 64710, Mexico
- Haichao Wei
  - The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, 77030, USA
- Jiaqian Wu
  - The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, 77030, USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 77030, USA
7
Park Y, Hauschild AC. The effect of data transformation on low-dimensional integration of single-cell RNA-seq. BMC Bioinformatics 2024; 25:171. [PMID: 38689234 PMCID: PMC11059821 DOI: 10.1186/s12859-024-05788-5] [Received: 11/02/2023] [Accepted: 04/16/2024] [Indexed: 05/02/2024]
Abstract
BACKGROUND: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods.
RESULTS: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing datasets. Our results show that data transformations strongly influence the results of single-cell clustering on low-dimensional data space, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space significantly affect trajectory analysis using multiple datasets, as well. However, the performance of the data transformations greatly varies across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of deep neural network models.
CONCLUSIONS: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.
Affiliation(s)
- Youngjun Park
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  - International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany
- Anne-Christin Hauschild
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  - Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany
8
Lin KZ, Qiu Y, Roeder K. eSVD-DE: cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. BMC Bioinformatics 2024; 25:113. [PMID: 38486150 PMCID: PMC10941434 DOI: 10.1186/s12859-024-05724-7] [Received: 12/06/2023] [Accepted: 02/28/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes.
RESULTS: We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression.
CONCLUSIONS: eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Yixuan Qiu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
9
Fourneaux C, Racine L, Koering C, Dussurgey S, Vallin E, Moussy A, Parmentier R, Brunard F, Stockholm D, Modolo L, Picard F, Gandrillon O, Paldi A, Gonin-Giraud S. Differentiation is accompanied by a progressive loss in transcriptional memory. BMC Biol 2024; 22:58. [PMID: 38468285 PMCID: PMC10929117 DOI: 10.1186/s12915-024-01846-9] [Received: 04/05/2023] [Accepted: 02/13/2024] [Indexed: 03/13/2024]
Abstract
BACKGROUND: Cell differentiation requires the integration of two opposite processes, a stabilizing cellular memory, especially at the transcriptional scale, and a burst of gene expression variability which follows the differentiation induction. Therefore, the actual capacity of a cell to undergo phenotypic change during a differentiation process relies upon a modification in this balance which favors change-inducing gene expression variability. However, there are no experimental data providing insight on how fast the transcriptomes of identical cells would diverge on the scale of the very first two cell divisions during the differentiation process.
RESULTS: In order to quantitatively address this question, we developed different experimental methods to recover the transcriptomes of related cells, after one and two divisions, while preserving the information about their lineage at the scale of a single cell division. We analyzed the transcriptomes of related cells from two differentiation biological systems (human CD34+ cells and T2EC chicken primary erythrocytic progenitors) using two different single-cell transcriptomics technologies (scRT-qPCR and scRNA-seq).
CONCLUSIONS: We identified that the gene transcription profiles of differentiating sister cells are more similar to each other than to those of non-related cells of the same type, sharing the same environment and undergoing similar biological processes. More importantly, we observed greater discrepancies between differentiating sister cells than between self-renewing sister cells. Furthermore, a progressive increase in this divergence from first generation to second generation was observed when comparing differentiating cousin cells to self-renewing cousin cells. Our results are in favor of a gradual erasure of transcriptional memory during the differentiation process.
Affiliation(s)
- Camille Fourneaux
  - Laboratoire de Biologie et Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Laëtitia Racine
  - Ecole Pratique des Hautes Etudes, PSL Research University, Sorbonne Université, INSERM, CRSA, Paris, 75012, France
- Catherine Koering
  - Laboratoire de Biologie et Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Sébastien Dussurgey
  - Plateforme AniRA-Cytométrie, Université Claude Bernard Lyon 1, CNRS UAR3444, Inserm US8, ENS de Lyon, SFR Biosciences, Lyon, F-69007, France
- Elodie Vallin
  - Laboratoire de Biologie et Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Alice Moussy
  - Ecole Pratique des Hautes Etudes, PSL Research University, Sorbonne Université, INSERM, CRSA, Paris, 75012, France
- Romuald Parmentier
  - Ecole Pratique des Hautes Etudes, PSL Research University, Sorbonne Université, INSERM, CRSA, Paris, 75012, France
- Fanny Brunard
  - Laboratoire de Biologie et Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Daniel Stockholm
  - Ecole Pratique des Hautes Etudes, PSL Research University, Sorbonne Université, INSERM, CRSA, Paris, 75012, France
- Laurent Modolo
  - Laboratoire de Biologie et Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Franck Picard
  - Laboratoire de Biologie et Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Olivier Gandrillon
  - Laboratoire de Biologie et Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
  - Inria Center, Grenoble Rhone-Alpes, Equipe Dracula, Villeurbanne, F69100, France
- Andras Paldi
  - Ecole Pratique des Hautes Etudes, PSL Research University, Sorbonne Université, INSERM, CRSA, Paris, 75012, France
- Sandrine Gonin-Giraud
  - Laboratoire de Biologie et Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
10
Lin KZ, Qiu Y, Roeder K. eSVD-DE: Cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. bioRxiv 2024:2023.11.22.568369. [PMID: 38045428 PMCID: PMC10690270 DOI: 10.1101/2023.11.22.568369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Background Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes. Results We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression. Conclusions eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
  - Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Yixuan Qiu
  - School of Statistics & Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
  - Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
11
Pullin JM, McCarthy DJ. A comparison of marker gene selection methods for single-cell RNA sequencing data. Genome Biol 2024; 25:56. [PMID: 38409056 PMCID: PMC10895860 DOI: 10.1186/s13059-024-03183-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 02/07/2024] [Indexed: 02/28/2024] Open
Abstract
BACKGROUND The development of single-cell RNA sequencing (scRNA-seq) has enabled scientists to catalog and probe the transcriptional heterogeneity of individual cells in unprecedented detail. A common step in the analysis of scRNA-seq data is the selection of so-called marker genes, most commonly to enable annotation of the biological cell types present in the sample. In this paper, we benchmark 59 computational methods for selecting marker genes in scRNA-seq data. RESULTS We compare the performance of the methods using 14 real scRNA-seq datasets and over 170 additional simulated datasets. Methods are compared on their ability to recover simulated and expert-annotated marker genes, the predictive performance and characteristics of the gene sets they select, their memory usage and speed, and their implementation quality. In addition, various case studies are used to scrutinize the most commonly used methods, highlighting issues and inconsistencies. CONCLUSIONS Overall, we present a comprehensive evaluation of methods for selecting marker genes in scRNA-seq data. Our results highlight the efficacy of simple methods, especially the Wilcoxon rank-sum test, Student's t-test, and logistic regression.
Affiliation(s)
- Jeffrey M Pullin
  - Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, 9 Princes St, Fitzroy, 3065, VIC, Australia
  - School of Mathematics and Statistics, University of Melbourne, Parkville, 3010, VIC, Australia
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, 3010, VIC, Australia
- Davis J McCarthy
  - Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, 9 Princes St, Fitzroy, 3065, VIC, Australia
  - School of Mathematics and Statistics, University of Melbourne, Parkville, 3010, VIC, Australia
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, 3010, VIC, Australia
12
Li Y, Chen S, Liu W, Zhao D, Gao Y, Hu S, Liu H, Li Y, Qu L, Liu X. A full-body transcription factor expression atlas with completely resolved cell identities in C. elegans. Nat Commun 2024; 15:358. [PMID: 38195740 PMCID: PMC10776613 DOI: 10.1038/s41467-023-42677-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 10/18/2023] [Indexed: 01/11/2024] Open
Abstract
Invariant cell lineage in C. elegans enables spatiotemporal resolution of transcriptional regulatory mechanisms controlling the fate of each cell. Here, we develop RAPCAT (Robust-point-matching- And Piecewise-affine-based Cell Annotation Tool) to automate cell identity assignment in three-dimensional image stacks of L1 larvae and profile reporter expression of 620 transcription factors in every cell. Transcription factor profile-based clustering analysis defines 80 cell types distinct from conventional phenotypic cell types and identifies three general phenotypic modalities related to these classifications. First, transcription factors are broadly downregulated in quiescent stage Hermaphrodite Specific Neurons, suggesting stage- and cell type-specific variation in transcriptome size. Second, transcription factor expression is more closely associated with morphology than other phenotypic modalities in different pre- and post-differentiation developmental stages. Finally, embryonic cell lineages can be associated with specific transcription factor expression patterns and functions that persist throughout postembryonic life. This study presents a comprehensive transcription factor atlas for investigation of intra-cell type heterogeneity.
Affiliation(s)
- Yongbin Li
  - College of Life Sciences, Capital Normal University, Beijing, 100048, China
- Siyu Chen
  - School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Weihong Liu
  - School of Life Sciences, Tsinghua University, Beijing, 100084, China
  - Intelligent Perception Lab, Hanwang Technology Co., Ltd, Beijing, 100193, China
- Di Zhao
  - School of Life Sciences, Tsinghua University, Beijing, 100084, China
  - Tianjin Key Laboratory of Exercise Physiology and Sports Medicine, Institute of Sport, Exercise & Health, Tianjin University of Sport, Tianjin, 300381, China
- Yimeng Gao
  - College of Life Sciences, Capital Normal University, Beijing, 100048, China
- Shipeng Hu
  - College of Life Sciences, Capital Normal University, Beijing, 100048, China
- Hanyu Liu
  - College of Life Sciences, Capital Normal University, Beijing, 100048, China
- Yuanyuan Li
  - Ministry of Education Key Laboratory of Intelligent Computation & Signal Processing, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Electronics and Information Engineering, Anhui University, Hefei, 230039, China
- Lei Qu
  - Ministry of Education Key Laboratory of Intelligent Computation & Signal Processing, Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Electronics and Information Engineering, Anhui University, Hefei, 230039, China
- Xiao Liu
  - College of Life Sciences, Capital Normal University, Beijing, 100048, China
13
Li D, Ge S, Liu Y, Pan M, Wang X, Han G, Zou S, Liu R, Niu K, Zhao C, Liu N, Qu L. Epitranscriptome analysis of NAD-capped RNA by spike-in-based normalization and prediction of chronological age. iScience 2023; 26:108558. [PMID: 38094247 PMCID: PMC10716591 DOI: 10.1016/j.isci.2023.108558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 10/14/2023] [Accepted: 11/20/2023] [Indexed: 01/20/2025] Open
Abstract
Nicotinamide adenine dinucleotide (NAD) can be used as an initiating nucleotide in RNA transcription to produce NAD-capped RNA (NAD-RNA). RNA modification by NAD that links metabolite with expressed transcript is a poorly studied epitranscriptomic modification. Current NAD-RNA profiling methods involve multi-steps of chemo-enzymatic labeling and affinity-based enrichment, thus presenting a critical analytical challenge to remove unwanted variations, particularly batch effects. Here, we propose a computational framework, enONE, to remove unwanted variations. We demonstrate that designed spike-in RNA, together with modular normalization procedures and evaluation metrics, can mitigate technical noise, empowering quantitative and comparative assessment of NAD-RNA across different datasets. Using enONE and a human aging cohort, we reveal age-associated features of NAD-capping and further develop an accurate RNA-based aging clock that combines signatures from both transcriptome and NAD-modified epitranscriptome. enONE facilitates the discovery of NAD-RNA responsive to physiological changes, laying an important foundation for functional investigations into this modification.
Affiliation(s)
- Dean Li
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 100 Hai Ke Road, Pudong, Shanghai 201210, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Shuwen Ge
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 100 Hai Ke Road, Pudong, Shanghai 201210, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yandong Liu
  - Department of Vascular and Endovascular Surgery, Chang Zheng Hospital, Naval Medical University, Shanghai 200003, China
- Miaomiao Pan
  - National Clinical Research Center for Aging and Medicine, Huashan Hospital, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, 131 Dong An Road, Shanghai 200032, China
  - Metalife Biotechnology, 1000 Zhen Chen Road, Baoshan, Shanghai 200444, China
- Xueting Wang
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 100 Hai Ke Road, Pudong, Shanghai 201210, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Guojing Han
  - Department of Vascular and Endovascular Surgery, Chang Zheng Hospital, Naval Medical University, Shanghai 200003, China
- Sili Zou
  - Department of Vascular and Endovascular Surgery, Chang Zheng Hospital, Naval Medical University, Shanghai 200003, China
- Rui Liu
  - Singlera Genomics, 500 Fu Rong Hua Road, Pudong, Shanghai 201204, China
- Kongyan Niu
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 100 Hai Ke Road, Pudong, Shanghai 201210, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Chao Zhao
  - National Clinical Research Center for Aging and Medicine, Huashan Hospital, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, 131 Dong An Road, Shanghai 200032, China
- Nan Liu
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 100 Hai Ke Road, Pudong, Shanghai 201210, China
  - National Clinical Research Center for Aging and Medicine, Huashan Hospital, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, 131 Dong An Road, Shanghai 200032, China
  - Shanghai Key Laboratory of Aging Studies, 100 Hai Ke Road, Pudong, Shanghai 201210, China
- Lefeng Qu
  - Department of Vascular and Endovascular Surgery, Chang Zheng Hospital, Naval Medical University, Shanghai 200003, China
14
Aslan Kamil M, Fourneaux C, Yilmaz A, Stavros S, Parmentier R, Paldi A, Gonin-Giraud S, deMello AJ, Gandrillon O. An image-guided microfluidic system for single-cell lineage tracking. PLoS One 2023; 18:e0288655. [PMID: 37527253 PMCID: PMC10393162 DOI: 10.1371/journal.pone.0288655] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Accepted: 06/30/2023] [Indexed: 08/03/2023] Open
Abstract
Cell lineage tracking is a long-standing and unresolved problem in biology. Microfluidic technologies have the potential to address this problem, by virtue of their ability to manipulate and process single-cells in a rapid, controllable and efficient manner. Indeed, when coupled with traditional imaging approaches, microfluidic systems allow the experimentalist to follow single-cell divisions over time. Herein, we present a valve-based microfluidic system able to probe the decision-making processes of single-cells, by tracking their lineage over multiple generations. The system operates by trapping single-cells within growth chambers, allowing the trapped cells to grow and divide, isolating sister cells after a user-defined number of divisions and finally extracting them for downstream transcriptome analysis. The platform incorporates multiple cell manipulation operations, image processing-based automation for cell loading and growth monitoring, reagent addition and device washing. To demonstrate the efficacy of the microfluidic workflow, 6C2 (chicken erythroleukemia) and T2EC (primary chicken erythrocytic progenitors) cells are tracked inside the microfluidic device over two generations, with a cell viability rate in excess of 90%. Sister cells are successfully isolated after division and extracted within a 500 nL volume, which was demonstrated to be compatible with downstream single-cell RNA sequencing analysis.
Affiliation(s)
- Mahmut Aslan Kamil
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich, Switzerland
- Camille Fourneaux
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard, Lyon, France
- Stavrakis Stavros
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich, Switzerland
- Romuald Parmentier
  - Ecole Pratique des Hautes Etudes, St-Antoine Research Center, Inserm U938, PSL Research University, Paris, France
- Andras Paldi
  - Ecole Pratique des Hautes Etudes, St-Antoine Research Center, Inserm U938, PSL Research University, Paris, France
- Sandrine Gonin-Giraud
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard, Lyon, France
- Andrew J deMello
  - Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich, Switzerland
- Olivier Gandrillon
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard, Lyon, France
  - Inria, France
15
Zhang S, Li X, Lin J, Lin Q, Wong KC. Review of single-cell RNA-seq data clustering for cell-type identification and characterization. RNA (New York, N.Y.) 2023; 29:517-530. [PMID: 36737104 PMCID: PMC10158997 DOI: 10.1261/rna.078965.121] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 03/27/2022] [Accepted: 01/03/2023] [Indexed: 05/06/2023]
Abstract
In recent years, the advances in single-cell RNA-seq techniques have enabled us to perform large-scale transcriptomic profiling at single-cell resolution in a high-throughput manner. Unsupervised learning such as data clustering has become the central component to identify and characterize novel cell types and gene expression patterns. In this study, we review the existing single-cell RNA-seq data clustering methods with critical insights into the related advantages and limitations. In addition, we also review the upstream single-cell RNA-seq data processing techniques such as quality control, normalization, and dimension reduction. We conduct performance comparison experiments to evaluate several popular single-cell RNA-seq clustering approaches on simulated and multiple single-cell transcriptomic data sets.
Affiliation(s)
- Shixiong Zhang
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Jilin 130012, China
- Jiecong Lin
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Qiuzhen Lin
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
16
Crowell HL, Morillo Leonardo SX, Soneson C, Robinson MD. The shaky foundations of simulating single-cell RNA sequencing data. Genome Biol 2023; 24:62. [PMID: 36991470 PMCID: PMC10061781 DOI: 10.1186/s13059-023-02904-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/23/2021] [Accepted: 03/20/2023] [Indexed: 03/31/2023] Open
Abstract
BACKGROUND With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, on their own as well as in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data. RESULTS Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigate the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity. CONCLUSIONS Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.
Affiliation(s)
- Helena L Crowell
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Charlotte Soneson
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
  - Current address: Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Mark D Robinson
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
17
Pirrotta S, Masatti L, Corrà A, Pedrini F, Esposito G, Martini P, Risso D, Romualdi C, Calura E. signifinder enables the identification of tumor cell states and cancer expression signatures in bulk, single-cell and spatial transcriptomic data. bioRxiv 2023:2023.03.07.530940. [PMID: 36945491 PMCID: PMC10028855 DOI: 10.1101/2023.03.07.530940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
Abstract
Over the last decade, many studies and some clinical trials have proposed gene expression signatures as a valuable tool for understanding cancer mechanisms, defining subtypes, monitoring patient prognosis, and therapy efficacy. However, technical and biological concerns about reproducibility have been raised. Technical reproducibility is a major concern: we currently lack a computational implementation of the proposed signatures, which would provide detailed signature definition and assure reproducibility, dissemination, and usability of the classifier. Another concern regards intratumor heterogeneity, which has never been addressed when studying these types of biomarkers using bulk transcriptomics. With the aim of providing a tool able to improve the reproducibility and usability of gene expression signatures, we propose signifinder, an R package that provides the infrastructure to collect, implement, and compare expression-based signatures from cancer literature. The included signatures cover a wide range of biological processes from metabolism and programmed cell death, to morphological changes, such as quantification of epithelial or mesenchymal-like status. Collected signatures can score tumor cell characteristics, such as the predicted response to therapy or the survival association, and can quantify microenvironmental information, including hypoxia and immune response activity. signifinder has been used to characterize tumor samples and to investigate intra-tumor heterogeneity, extending its application to single-cell and spatial transcriptomic data. Through these higher-resolution technologies, it has become increasingly apparent that the single-sample score assessment obtained by transcriptional signatures is conditioned by the phenotypic and genetic intratumor heterogeneity of tumor masses. 
Since the characteristics of the most abundant cell type or clone might not necessarily predict the properties of mixed populations, signature prediction efficacy is lowered, thus impeding effective clinical diagnostics. Through signifinder, we offer general principles for interpreting and comparing transcriptional signatures, as well as suggestions for additional signatures that would allow for more complete and robust data inferences. We consider signifinder a useful tool to pave the way for reproducibility and comparison of transcriptional signatures in oncology.
Affiliation(s)
- Laura Masatti
  - Department of Biology, University of Padua, Padua, Italy
- Anna Corrà
  - Department of Biology, University of Padua, Padua, Italy
- Giovanni Esposito
  - Immunology and Molecular Oncology Diagnostic Unit of The Veneto Institute of Oncology IOV – IRCCS, Padua, Italy
- Paolo Martini
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Davide Risso
  - Department of Statistical Sciences, University of Padua, Italy
- Enrica Calura
  - Department of Biology, University of Padua, Padua, Italy
18
Metayer C, Imani P, Dudoit S, Morimoto L, Ma X, Wiemels JL, Petrick LM. One-Carbon (Folate) Metabolism Pathway at Birth and Risk of Childhood Acute Lymphoblastic Leukemia: A Biomarker Study in Newborns. Cancers (Basel) 2023; 15:1011. [PMID: 36831356 PMCID: PMC9953980 DOI: 10.3390/cancers15041011] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/05/2023] [Revised: 01/25/2023] [Accepted: 02/02/2023] [Indexed: 02/08/2023] Open
Abstract
Leukemia is the most common cancer in children in industrialized countries, and its initiation often occurs prenatally. Folic acid is a key vitamin in the production and modification of DNA, and prenatal folic acid intake is known to reduce the risk of childhood leukemia. We characterized the one-carbon (folate) metabolism nutrients that may influence risk of childhood acute lymphoblastic leukemia (ALL) among 122 cases diagnosed at age 0-14 years during 1988-2011 and 122 controls matched on sex, age, and race/ethnicity. Using hydrophilic interaction chromatography (HILIC) applied to neonatal dried blood spots, we evaluated 11 folate pathway metabolites, overall and by sex, race/ethnicity, and age at diagnosis. To conduct the prediction analyses, the 244 samples were separated into learning (75%) and test (25%) sets, maintaining the matched pairings. The learning set was used to train classification methods which were evaluated on the test set. High classification error rates indicate that the folate pathway metabolites measured have little predictive capacity for pediatric ALL. In conclusion, the one-carbon metabolism nutrients measured at birth were unable to predict subsequent leukemia in children. These negative findings are reflective of the last weeks of pregnancy and our study does not address the impact of these nutrients at the time of conception or during the first trimester of pregnancy that are critical for the embryo's DNA methylation programming.
Affiliation(s)
- Catherine Metayer
  - Division of Epidemiology, School of Public Health, University of California, Berkeley, CA 94704, USA
- Partow Imani
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, CA 94704, USA
- Sandrine Dudoit
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, CA 94704, USA
  - Department of Statistics, University of California, Berkeley, CA 94720, USA
- Libby Morimoto
  - Division of Epidemiology, School of Public Health, University of California, Berkeley, CA 94704, USA
- Xiaomei Ma
  - Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, CT 06510, USA
- Joseph L. Wiemels
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Lauren M. Petrick
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine, Mount Sinai, New York, NY 10029, USA
  - The Bert Strassburger Metabolic Center, Sheba Medical Center, Tel-Hashomer, Ramat Gan 5211401, Israel
19
Lu J, Sheng Y, Qian W, Pan M, Zhao X, Ge Q. scRNA-seq data analysis method to improve analysis performance. IET Nanobiotechnol 2023; 17:246-256. [PMID: 36727937 DOI: 10.1049/nbt2.12115] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/23/2022] [Revised: 12/28/2022] [Accepted: 12/30/2022] [Indexed: 02/03/2023] Open
Abstract
With the development of single-cell RNA sequencing technology (scRNA-seq), we have the ability to study biological questions at the level of the individual cell transcriptome. Nowadays, many analysis tools, specifically suitable for single-cell RNA sequencing data, have been developed. In this review, the currently commonly used scRNA-seq protocols are discussed. The upstream processing flow pipeline of scRNA-seq data, including goals and popular tools for reads mapping and expression quantification, quality control, normalization, imputation, and batch effect removal is also introduced. Finally, methods to evaluate these tools in both cellular and genetic dimensions, clustering and differential expression analysis are presented.
Affiliation(s)
- Junru Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Yuqi Sheng
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Weiheng Qian
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Min Pan
  - School of Medicine, Southeast University, Nanjing, China
- Xiangwei Zhao
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Qinyu Ge
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
20
Mishra S, Pandey N, Chawla S, Sharma M, Chandra O, Jha IP, SenGupta D, Natarajan KN, Kumar V. Matching queried single-cell open-chromatin profiles to large pools of single-cell transcriptomes and epigenomes for reference supported analysis. Genome Res 2023; 33:218-231. [PMID: 36653120 PMCID: PMC10069468 DOI: 10.1101/gr.277015.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Accepted: 01/09/2023] [Indexed: 01/19/2023]
Abstract
The true benefits of large single-cell transcriptome and epigenome data sets can be realized only with the development of new approaches and search tools for annotating individual cells. Matching a single-cell epigenome profile to a large pool of reference cells remains a major challenge. Here, we present scEpiSearch, which enables searching, comparison, and independent classification of single-cell open-chromatin profiles against a large reference of single-cell expression and open-chromatin data sets. Across performance benchmarks, scEpiSearch outperformed multiple methods in accuracy of search and low-dimensional coembedding of single-cell profiles, irrespective of platforms and species. Here we also demonstrate the unconventional utilities of scEpiSearch by applying it on single-cell epigenome profiles of K562 cells and samples from patients with acute leukaemia to reveal different aspects of their heterogeneity, multipotent behavior, and dedifferentiated states. Applying scEpiSearch on our single-cell open-chromatin profiles from embryonic stem cells (ESCs), we identified ESC subpopulations with more activity and poising for endoplasmic reticulum stress and unfolded protein response. Thus, scEpiSearch solves the nontrivial problem of amalgamating information from a large pool of single cells to identify and study the regulatory states of cells using their single-cell epigenomes.
Affiliation(s)
- Shreya Mishra
  - Department for Computational Biology, IIIT Delhi 110020, India
- Neetesh Pandey
  - Department for Computational Biology, IIIT Delhi 110020, India
- Smriti Chawla
  - Department for Computational Biology, IIIT Delhi 110020, India
- Madhu Sharma
  - Department for Computational Biology, IIIT Delhi 110020, India
- Omkar Chandra
  - Department for Computational Biology, IIIT Delhi 110020, India
- Debarka SenGupta
  - Department for Computational Biology, IIIT Delhi 110020, India
  - Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane 4001, Australia
- Kedar Nath Natarajan
  - DTU Bioengineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Vibhor Kumar
  - Department for Computational Biology, IIIT Delhi 110020, India

21
Juan H, Huang H. Quantitative analysis of high‐throughput biological data. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2023. [DOI: 10.1002/wcms.1658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Affiliation(s)
- Hsueh‐Fen Juan
  - Department of Life Science, Institute of Biomedical Electronics and Bioinformatics, and Center for Systems Biology, National Taiwan University, Taipei, Taiwan
  - Taiwan AI Labs, Taipei, Taiwan
- Hsuan‐Cheng Huang
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan

22
Tian T, Zhong C, Lin X, Wei Z, Hakonarson H. Complex hierarchical structures in single-cell genomics data unveiled by deep hyperbolic manifold learning. Genome Res 2023; 33:232-246. [PMID: 36849204 PMCID: PMC10069463 DOI: 10.1101/gr.277068.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 01/24/2023] [Indexed: 03/01/2023]
Abstract
With the advances in single-cell sequencing techniques, numerous analytical methods have been developed for delineating cell development. However, most are based on Euclidean space, which would distort the complex hierarchical structure of cell differentiation. Recently, methods acting on hyperbolic space have been proposed to visualize hierarchical structures in single-cell RNA-seq (scRNA-seq) data and have been proven to be superior to methods acting on Euclidean space. However, these methods have fundamental limitations and are not optimized for the highly sparse single-cell count data. To address these limitations, we propose scDHMap, a model-based deep learning approach to visualize the complex hierarchical structures of scRNA-seq data in low-dimensional hyperbolic space. The evaluations on extensive simulation and real experiments show that scDHMap outperforms existing dimensionality-reduction methods in various common analytical tasks as needed for scRNA-seq data, including revealing trajectory branches, batch correction, and denoising the count matrix with high dropout rates. In addition, we extend scDHMap to visualize single-cell ATAC-seq data.
Affiliation(s)
- Tian Tian
  - Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Cheng Zhong
  - Department of Computer Science, Ying Wu College of Computing, New Jersey Institute of Technology, Newark, New Jersey 07102, USA
- Xiang Lin
  - Department of Computer Science, Ying Wu College of Computing, New Jersey Institute of Technology, Newark, New Jersey 07102, USA
- Zhi Wei
  - Department of Computer Science, Ying Wu College of Computing, New Jersey Institute of Technology, Newark, New Jersey 07102, USA
- Hakon Hakonarson
  - Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
  - Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA

23
Dong X, Bacher R. Analysis of Single-Cell RNA-seq Data. Methods Mol Biol 2023; 2629:95-114. [PMID: 36929075 DOI: 10.1007/978-1-0716-2986-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
As single-cell RNA sequencing experiments continue to advance scientific discoveries across biological disciplines, an increasing number of analysis tools and workflows for analyzing the data have been developed. In this chapter, we describe a standard workflow and elaborate on relevant data analysis tools for analyzing single-cell RNA sequencing data. We provide recommendations for the appropriate use of commonly used methods, with code examples and analysis interpretations.
Affiliation(s)
- Xiaoru Dong
  - Department of Biostatistics, University of Florida, Gainesville, Florida, USA
- Rhonda Bacher
  - Department of Biostatistics, University of Florida, Gainesville, Florida, USA

24
Shu Z, Long Q, Zhang L, Yu Z, Wu XJ. Robust Graph Regularized NMF with Dissimilarity and Similarity Constraints for ScRNA-seq Data Clustering. J Chem Inf Model 2022; 62:6271-6286. [PMID: 36459053 DOI: 10.1021/acs.jcim.2c01305] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022]
Abstract
The notable progress in single-cell RNA sequencing (ScRNA-seq) technology is beneficial to accurately discover the heterogeneity and diversity of cells. Clustering is an extremely important step during the ScRNA-seq data analysis. However, directly clustering ScRNA-seq data does not achieve satisfactory performance because of the data's high dimensionality and noise. To address these issues, we propose a novel ScRNA-seq data representation model, termed Robust Graph regularized Non-Negative Matrix Factorization with Dissimilarity and Similarity constraints (RGNMF-DS), for ScRNA-seq data clustering. To accurately characterize the structure information of the labeled samples and the unlabeled samples, respectively, the proposed RGNMF-DS model adopts a pair of complementary regularizers (i.e., similarity and dissimilarity regularizers) to guide matrix decomposition. In addition, we construct a graph regularizer to discover the local geometric structure hidden in ScRNA-seq data. Moreover, we adopt the l2,1-norm to measure the reconstruction error and thereby effectively improve the robustness of the proposed RGNMF-DS model to noise. Experimental results on several ScRNA-seq datasets have demonstrated that our proposed RGNMF-DS model outperforms other state-of-the-art competitors in clustering.
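As a brief aside on the notation: the l2,1-norm named in this abstract is usually defined as a sum of Euclidean norms taken over the columns (or, in some papers, the rows) of a matrix. Each sample then contributes its un-squared residual, so outlying samples weigh less than they would under the squared Frobenius norm. The sketch below states that definition; the symbols $E$, $X$, $U$, $V$, $m$ and $n$ are assumptions introduced here for illustration, the column-wise convention is only one of the two in common use, and the exact objective used in the paper may differ.

```latex
% Column-wise l_{2,1} norm of a reconstruction residual (illustrative notation).
% X is an m x n data matrix, U is m x k, V is n x k, and e_j is the j-th column of E.
\[
  \|E\|_{2,1} \;=\; \sum_{j=1}^{n} \|e_j\|_2
            \;=\; \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{m} E_{ij}^{2}},
  \qquad E = X - UV^{\top} \in \mathbb{R}^{m \times n}.
\]
% For comparison, the squared Frobenius norm squares each column norm:
\[
  \|E\|_F^{2} \;=\; \sum_{j=1}^{n} \|e_j\|_2^{2}.
\]
```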
Affiliation(s)
- Zhenqiu Shu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China
- Qinghan Long
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China
- Luping Zhang
  - Library of Kunming Medical University, Kunming 650031, China
- Zhengtao Yu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China
- Xiao-Jun Wu
  - Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China

25
Zeng L, Yang K, Zhang T, Zhu X, Hao W, Chen H, Ge J. Research progress of single-cell transcriptome sequencing in autoimmune diseases and autoinflammatory disease: A review. J Autoimmun 2022; 133:102919. [PMID: 36242821 DOI: 10.1016/j.jaut.2022.102919] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/13/2022] [Revised: 09/16/2022] [Accepted: 09/19/2022] [Indexed: 12/07/2022]
Abstract
Autoimmunity refers to the phenomenon that the body's immune system produces antibodies or sensitized lymphocytes to its own tissues to cause an immune response. Immune disorders caused by autoimmunity can mediate autoimmune diseases. Autoimmune diseases have complicated pathogenesis due to the many types of cells involved, and the mechanism is still unclear. The emergence of single-cell research technology can solve the problem that ordinary transcriptome technology cannot be accurate to cell type. It provides unbiased results through independent analysis of cells in tissues and provides more mRNA information for identifying cell subpopulations, which provides a novel approach to study disruption of immune tolerance and disturbance of pro-inflammatory pathways on a cellular basis. It may fundamentally change the understanding of molecular pathways in the pathogenesis of autoimmune diseases and develop targeted drugs. Single-cell transcriptome sequencing (scRNA-seq) has been widely applied in autoimmune diseases, which provides a powerful tool for demonstrating the cellular heterogeneity of tissues involved in various immune inflammations, identifying pathogenic cell populations, and revealing the mechanism of disease occurrence and development. This review describes the principles of scRNA-seq, introduces common sequencing platforms and practical procedures, and focuses on the progress of scRNA-seq in 41 autoimmune diseases, which include 9 systemic autoimmune diseases and autoinflammatory diseases (rheumatoid arthritis, systemic lupus erythematosus, etc.) and 32 organ-specific autoimmune diseases (5 Skin diseases, 3 Nervous system diseases, 4 Eye diseases, 2 Respiratory system diseases, 2 Circulatory system diseases, 6 Liver, Gallbladder and Pancreas diseases, 2 Gastrointestinal system diseases, 3 Muscle, Bones and joint diseases, 3 Urinary system diseases, 2 Reproductive system diseases). 
This review also looks ahead to potential molecular mechanism targets in autoimmune diseases, drawing on multi-molecular-level and multi-dimensional analysis combined with single-cell multi-omics sequencing technologies (such as scRNA-seq, single-cell ATAC-seq and single cell immune group library sequencing), and thereby provides a reference for further exploring the pathogenesis and marker screening of autoimmune diseases and autoimmune inflammatory diseases in the future.
Affiliation(s)
- Liuting Zeng
  - Department of Rheumatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, National Clinical Research Center for Dermatologic and Immunologic Diseases, State Key Laboratory of Complex Severe and Rare Diseases, Beijing, China
- Kailin Yang
  - Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China
- Tianqing Zhang
  - Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China
- Xiaofei Zhu
  - Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China
- Wensa Hao
  - Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Hua Chen
  - Department of Rheumatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, National Clinical Research Center for Dermatologic and Immunologic Diseases, State Key Laboratory of Complex Severe and Rare Diseases, Beijing, China
- Jinwen Ge
  - Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China
  - Hunan Academy of Chinese Medicine, Changsha, China

26
Van den Berge K, Chou HJ, Roux de Bézieux H, Street K, Risso D, Ngai J, Dudoit S. Normalization benchmark of ATAC-seq datasets shows the importance of accounting for GC-content effects. CELL REPORTS METHODS 2022; 2:100321. [PMID: 36452861 PMCID: PMC9701614 DOI: 10.1016/j.crmeth.2022.100321] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/01/2021] [Revised: 02/23/2022] [Accepted: 10/06/2022] [Indexed: 06/17/2023]
Abstract
The assay for transposase-accessible chromatin using sequencing (ATAC-seq) allows the study of epigenetic regulation of gene expression by assessing chromatin configuration for an entire genome. Despite its popularity, there have been limited studies investigating the analytical challenges related to ATAC-seq data, with most studies leveraging tools developed for bulk transcriptome sequencing. Here, we show that GC-content effects are omnipresent in ATAC-seq datasets. Since the GC-content effects are sample specific, they can bias downstream analyses such as clustering and differential accessibility analysis. We introduce a normalization method based on smooth-quantile normalization within GC-content bins and evaluate it together with 11 different normalization procedures on 8 public ATAC-seq datasets. Accounting for GC-content effects in the normalization is crucial for common downstream ATAC-seq data analyses, improving accuracy and interpretability. Through case studies, we show that exploratory data analysis is essential to guide the choice of an appropriate normalization method for a given dataset.
Affiliation(s)
- Koen Van den Berge
  - Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Hsin-Jung Chou
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Hector Roux de Bézieux
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, CA, USA
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Kelly Street
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Davide Risso
  - Department of Statistical Sciences, University of Padova, Padova, Italy
- John Ngai
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Sandrine Dudoit
  - Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, CA, USA
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA

27
Cuevas-Diaz Duran R, González-Orozco JC, Velasco I, Wu JQ. Single-cell and single-nuclei RNA sequencing as powerful tools to decipher cellular heterogeneity and dysregulation in neurodegenerative diseases. Front Cell Dev Biol 2022; 10:884748. [PMID: 36353512 PMCID: PMC9637968 DOI: 10.3389/fcell.2022.884748] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 02/27/2022] [Accepted: 10/06/2022] [Indexed: 08/10/2023]
Abstract
Neurodegenerative diseases affect millions of people worldwide and there are currently no cures. Two types of common neurodegenerative diseases are Alzheimer's (AD) and Parkinson's disease (PD). Single-cell and single-nuclei RNA sequencing (scRNA-seq and snRNA-seq) have become powerful tools to elucidate the inherent complexity and dynamics of the central nervous system at cellular resolution. This technology has allowed the identification of cell types and states, providing new insights into cellular susceptibilities and molecular mechanisms underlying neurodegenerative conditions. Exciting research using high throughput scRNA-seq and snRNA-seq technologies to study AD and PD is emerging. Herein we review the recent progress in understanding these neurodegenerative diseases using these state-of-the-art technologies. We discuss the fundamental principles and implications of single-cell sequencing of the human brain. Moreover, we review some examples of the computational and analytical tools required to interpret the extensive amount of data generated from these assays. We conclude by highlighting challenges and limitations in the application of these technologies in the study of AD and PD.
Affiliation(s)
- Iván Velasco
  - Instituto de Fisiología Celular—Neurociencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Laboratorio de Reprogramación Celular, Instituto Nacional de Neurología y Neurocirugía “Manuel Velasco Suárez”, Mexico City, Mexico
- Jia Qian Wu
  - The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, United States
  - Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, United States
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, United States

28
Wen B, Jaehnig EJ, Zhang B. OmicsEV: a tool for comprehensive quality evaluation of omics data tables. Bioinformatics 2022; 38:5463-5465. [PMID: 36271853 PMCID: PMC9750102 DOI: 10.1093/bioinformatics/btac698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2022] [Accepted: 10/20/2022] [Indexed: 12/25/2022]
Abstract
SUMMARY: RNA-Seq and mass spectrometry-based studies generate omics data tables with measurements for tens of thousands of genes across all samples in a study. The success of a study relies on the quality of these data tables, which is determined by both experimental data generation and computational methods used to process raw experimental data into quantitative data tables. We present OmicsEV, an R package for the quality evaluation of omics data tables. For each data table, OmicsEV uses a series of methods to evaluate data depth, data normalization, batch effect, biological signal, platform reproducibility and multi-omics concordance, producing comprehensive visual and quantitative evaluation results that help assess the data quality of individual data tables and facilitate the identification of the optimal data processing method and parameters for the omics study under investigation. AVAILABILITY AND IMPLEMENTATION: The source code and the user manual of OmicsEV are available at https://github.com/bzhanglab/OmicsEV, and the source code is released under the GPL-3 license.
Affiliation(s)
- Bo Wen
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Eric J Jaehnig
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
  - To whom correspondence should be addressed.

29
Fang S, Chen B, Zhang Y, Sun H, Liu L, Liu S, Li Y, Xu X. Computational Approaches and Challenges in Spatial Transcriptomics. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00129-2. [PMID: 36252814 PMCID: PMC10372921 DOI: 10.1016/j.gpb.2022.10.001] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 02/06/2022] [Revised: 09/08/2022] [Accepted: 10/09/2022] [Indexed: 01/19/2023]
Abstract
The development of spatial transcriptomics (ST) technologies has transformed genetic research from a single-cell data level to a two-dimensional spatial coordinate system and facilitated the study of the composition and function of various cell subsets in different environments and organs. The large-scale data generated by these ST technologies, which contain spatial gene expression information, have elicited the need for spatially resolved approaches to meet the requirements of computational and biological data interpretation. These requirements include dealing with the explosive growth of data to determine the cell-level and gene-level expression, correcting the inner batch effect and loss of expression to improve the data quality, conducting efficient interpretation and in-depth knowledge mining both at the single-cell and tissue-wide levels, and conducting multi-omics integration analysis to provide an extensible framework toward the in-depth understanding of biological processes. However, algorithms designed specifically for ST technologies to meet these requirements are still in their infancy. Here, we review computational approaches to these problems in light of corresponding issues and challenges, and present forward-looking insights into algorithm development.
30
Junttila S, Smolander J, Elo LL. Benchmarking methods for detecting differential states between conditions from multi-subject single-cell RNA-seq data. Brief Bioinform 2022; 23:6649780. [PMID: 35880426 PMCID: PMC9487674 DOI: 10.1093/bib/bbac286] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 02/17/2022] [Revised: 06/07/2022] [Accepted: 06/23/2022] [Indexed: 12/13/2022]
Abstract
Single-cell RNA-sequencing (scRNA-seq) enables researchers to quantify transcriptomes of thousands of cells simultaneously and study transcriptomic changes between cells. scRNA-seq datasets increasingly include multisubject, multicondition experiments to investigate cell-type-specific differential states (DS) between conditions. This can be performed by first identifying the cell types in all the subjects and then by performing a DS analysis between the conditions within each cell type. Naïve single-cell DS analysis methods that treat cells as statistically independent are subject to false positives in the presence of variation between biological replicates, an issue known as the pseudoreplicate bias. While several methods have already been introduced to carry out the statistical testing in multisubject scRNA-seq analysis, comparisons that include all these methods are currently lacking. Here, we performed a comprehensive comparison of 18 methods for the identification of DS changes between conditions from multisubject scRNA-seq data. Our results suggest that the pseudobulk methods generally performed best. Both pseudobulks and mixed models that model the subjects as a random effect were superior to the naïve single-cell methods that do not model the subjects in any way. While the naïve models achieved higher sensitivity than the pseudobulk methods and the mixed models, they were subject to a high number of false positives. In addition, accounting for subjects through latent variable modeling did not improve the performance of the naïve methods.
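The pseudobulk idea referred to in this abstract can be illustrated with a small sketch: counts from all cells that share a subject and a cell type are summed into one profile, so that downstream testing compares subjects rather than individual cells. The function name `pseudobulk` and its arguments are hypothetical and are not taken from any of the benchmarked tools; summation is assumed as the aggregation step, although some implementations use means instead.

```python
from collections import defaultdict


def pseudobulk(counts, subjects, cell_types):
    """Sum per-cell count vectors within each (subject, cell type) pair.

    counts     : list of per-cell gene-count lists, all of the same length
    subjects   : list of subject labels, one per cell
    cell_types : list of cell-type labels, one per cell

    Returns a dict mapping (subject, cell_type) to a summed count vector.
    This is an illustrative helper, not the API of any published package.
    """
    if not (len(counts) == len(subjects) == len(cell_types)):
        raise ValueError("counts, subjects and cell_types must have equal length")

    n_genes = len(counts[0]) if counts else 0
    # Each new (subject, cell type) key starts from an all-zero profile.
    sums = defaultdict(lambda: [0] * n_genes)

    for row, subject, cell_type in zip(counts, subjects, cell_types):
        if len(row) != n_genes:
            raise ValueError("every cell must have the same number of genes")
        profile = sums[(subject, cell_type)]
        for gene_index, value in enumerate(row):
            profile[gene_index] += value

    return dict(sums)
```

Calling `pseudobulk(counts, subjects, cell_types)` on a cell-by-gene count table returns one summed profile per subject and cell type; those profiles, rather than the individual cells, would then be passed to a bulk-style differential test.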
Affiliation(s)
- Laura L Elo
  - Corresponding author: Laura L. Elo, Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland. Tel.: +358504680795; E-mail:

31
Gorin G, Fang M, Chari T, Pachter L. RNA velocity unraveled. PLoS Comput Biol 2022; 18:e1010492. [PMID: 36094956 PMCID: PMC9499228 DOI: 10.1371/journal.pcbi.1010492] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 03/17/2022] [Revised: 09/22/2022] [Accepted: 08/14/2022] [Indexed: 11/24/2022]
Abstract
We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.
Affiliation(s)
- Gennady Gorin
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
- Meichen Fang
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Tara Chari
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, United States of America

32
Nguyen H, Chen H, Vuppalapaty M, Whisler E, Logas KR, Sampathkumar P, Fletcher RB, Sura A, Suen N, Gupta S, Lopez T, Ye J, Tu S, Bolaki M, Yeh WC, Li Y, Lee SJ. SZN-413, a FZD4 Agonist, as a Potential Novel Therapeutic for the Treatment of Diabetic Retinopathy. Transl Vis Sci Technol 2022; 11:19. [PMID: 36149648 PMCID: PMC9520515 DOI: 10.1167/tvst.11.9.19] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 02/07/2023]
Abstract
Purpose: There remains a high unmet need for therapies with new mechanisms of action to achieve reperfusion of ischemic retina in diabetic retinopathy. We examined whether a novel frizzled class receptor 4 (FZD4) agonist could promote regeneration of functional blood vessels in animal models of retinopathy. Methods: We developed a novel Norrin mimetic (SZN-413-p) targeting FZD4 and low-density lipoprotein receptor-related protein 5 (LRP5) and examined its effect on retinal and brain endothelial cells in vitro. SZN-413-p was subsequently humanized, resulting in the therapeutic candidate SZN-413, and was examined in animal models of retinopathy. In an oxygen-induced retinopathy mouse model, avascular and neovascularization areas were measured. Furthermore, in a vascular endothelial growth factor (VEGF)-induced retinal vascular leakage rabbit model, the impact on vascular leakage by SZN-413 was examined by measuring fluorescein leakage. Results: SZN-413-p induced Wnt/β-catenin signaling and upregulated blood-brain barrier/blood-retina barrier gene expressions in endothelial cells. In the oxygen-induced retinopathy mouse model, SZN-413-p and SZN-413 significantly reduced the neovascularization area size (P < 0.001) to a level comparable to, or better than the positive control aflibercept. Both agonists also showed a reduction in avascular area size compared to vehicle (P < 0.001) and aflibercept groups (P < 0.05 and P < 0.01 for SZN-413-p and SZN-413, respectively). In the VEGF-induced retinal vascular leakage rabbit model, SZN-413 reduced retinal vascular leakage by ∼80%, compared to the vehicle-treated group (P < 0.01). Conclusions: Reduction of neovascular tufts and avascular areas and of VEGF-driven retinal vascular leakage suggests that SZN-413 can simultaneously address retinal non-perfusion and vascular leakage.
Translational Relevance: FZD4 signaling modulation by SZN-413 is a novel mechanism of action that can offer a new therapeutic strategy for diabetic retinopathy.
Affiliation(s)
- Huy Nguyen
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Hui Chen
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Asmiti Sura
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Nicholas Suen
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Suhani Gupta
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Tom Lopez
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Jay Ye
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Shengjiang Tu
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Menaka Bolaki
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Wen-Chen Yeh
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Yang Li
  - Surrozen Operating, Inc., South San Francisco, CA, USA
- Sung-Jin Lee
  - Surrozen Operating, Inc., South San Francisco, CA, USA

33
Buen Abad Najar CF, Burra P, Yosef N, Lareau LF. Identifying cell state-associated alternative splicing events and their coregulation. Genome Res 2022; 32:1385-1397. [PMID: 35858747 PMCID: PMC9341514 DOI: 10.1101/gr.276109.121] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/13/2021] [Accepted: 06/01/2022] [Indexed: 11/25/2022]
Abstract
Alternative splicing shapes the transcriptome and contributes to each cell's unique identity, but single-cell RNA sequencing (scRNA-seq) has struggled to capture the impact of alternative splicing. We previously showed that low recovery of mRNAs from single cells led to erroneous conclusions about the cell-to-cell variability of alternative splicing. Here, we present a method, Psix, to confidently identify splicing that changes across a landscape of single cells, using a probabilistic model that is robust against the data limitations of scRNA-seq. Its autocorrelation-inspired approach finds patterns of alternative splicing that correspond to patterns of cell identity, such as cell type or developmental stage, without the need for explicit cell clustering, labeling, or trajectory inference. Applying Psix to data that follow the trajectory of mouse brain development, we identify exons whose alternative splicing patterns cluster into modules of coregulation. We show that the exons in these modules are enriched for binding by distinct neuronal splicing factors and that their changes in splicing correspond to changes in expression of these splicing factors. Thus, Psix reveals cell type-dependent splicing patterns and the wiring of the splicing regulatory networks that control them. Our new method will enable scRNA-seq analysis to go beyond transcription to understand the roles of post-transcriptional regulation in determining cell identity.
Affiliation(s)
| | - Prakruthi Burra
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
| | - Nir Yosef
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Department of Electrical Engineering and Computer Science, University of California, Berkeley, California 94720, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Chan Zuckerberg Biohub, San Francisco, California 94158, USA
| | - Liana F Lareau
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Chan Zuckerberg Biohub, San Francisco, California 94158, USA
- Department of Bioengineering, University of California, Berkeley, California 94720, USA
| |
Collapse

34
Kothalawala WJ, Barták BK, Nagy ZB, Zsigrai S, Szigeti KA, Valcz G, Takács I, Kalmár A, Molnár B. A Detailed Overview About the Single-Cell Analyses of Solid Tumors Focusing on Colorectal Cancer. Pathology and Oncology Research 2022; 28:1610342. [PMID: 35928965 PMCID: PMC9344373 DOI: 10.3389/pore.2022.1610342] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/01/2022] [Accepted: 06/15/2022] [Indexed: 11/21/2022]
Abstract
In recent years, the evolution of the molecular biological technical background led to the widespread application of single-cell sequencing, a versatile tool particularly useful in the investigation of tumor heterogeneity. Even 10 years ago the comprehensive characterization of colorectal cancers by The Cancer Genome Atlas was based on measurements of bulk samples. Nowadays, with single-cell approaches, tumor heterogeneity, the tumor microenvironment, and the interplay between tumor cells and their surroundings can be described in unprecedented detail. In this review article we aimed to emphasize the importance of single-cell analyses by presenting tumor heterogeneity and the limitations of conventional investigational approaches, followed by an overview of the whole single-cell analytic workflow from sample isolation to amplification, sequencing and bioinformatic analysis and a review of recent literature regarding the single-cell analysis of colorectal cancers.
Affiliation(s)
- William J. Kothalawala
  - Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
  - *Correspondence: William J. Kothalawala
- Barbara K. Barták
  - Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
- Zsófia B. Nagy
  - Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
- Sára Zsigrai
  - Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
- Krisztina A. Szigeti
  - Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
- Gábor Valcz
  - Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
  - Molecular Medicine Research Group, Eötvös Loránd Research Network, Budapest, Hungary
- István Takács
  - Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
- Alexandra Kalmár
  - Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
  - Molecular Medicine Research Group, Eötvös Loránd Research Network, Budapest, Hungary
- Béla Molnár
  - Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
  - Molecular Medicine Research Group, Eötvös Loránd Research Network, Budapest, Hungary

35
Zreika S, Fourneaux C, Vallin E, Modolo L, Seraphin R, Moussy A, Ventre E, Bouvier M, Ozier-Lafontaine A, Bonnaffoux A, Picard F, Gandrillon O, Gonin-Giraud S. Evidence for close molecular proximity between reverting and undifferentiated cells. BMC Biol 2022; 20:155. [PMID: 35794592 PMCID: PMC9258043 DOI: 10.1186/s12915-022-01363-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/07/2022] [Accepted: 06/27/2022] [Indexed: 11/28/2022]
Abstract
BACKGROUND: According to Waddington's epigenetic landscape concept, the differentiation process can be illustrated by a cell akin to a ball rolling down from the top of a hill (proliferation state) and crossing furrows before stopping in basins or "attractor states" to reach its stable differentiated state. However, it is now clear that some committed cells can retain a certain degree of plasticity and reacquire phenotypical characteristics of a more pluripotent cell state. In line with this dynamic model, we have previously shown that differentiating cells (chicken erythrocytic progenitors (T2EC)) retain for 24 h the ability to self-renew when transferred back in self-renewal conditions. Despite those intriguing and promising results, the underlying molecular state of those "reverting" cells remains unexplored. The aim of the present study was therefore to molecularly characterize the T2EC reversion process by combining advanced statistical tools to make the most of single-cell transcriptomic data. For this purpose, T2EC, initially maintained in a self-renewal medium (0H), were induced to differentiate for 24H (24H differentiating cells); then, a part of these cells was transferred back to the self-renewal medium (48H reverting cells) and the other part was maintained in the differentiation medium for another 24H (48H differentiating cells). For each time point, cell transcriptomes were generated using scRT-qPCR and scRNAseq.
RESULTS: Our results showed a strong overlap between 0H and 48H reverting cells when applying dimensional reduction. Moreover, the statistical comparison of cell distributions and differential expression analysis indicated no significant differences between these two cell groups. Interestingly, gene pattern distributions highlighted that, while 48H reverting cells have gene expression pattern more similar to 0H cells, they are not completely identical, which suggest that for some genes a longer delay may be required for the cells to fully recover. Finally, sparse PLS (sparse partial least square) analysis showed that only the expression of 3 genes discriminates 48H reverting and 0H cells.
CONCLUSIONS: Altogether, we show that reverting cells return to an earlier molecular state almost identical to undifferentiated cells and demonstrate a previously undocumented physiological and molecular plasticity during the differentiation process, which most likely results from the dynamic behavior of the underlying molecular network.
Affiliation(s)
- Souad Zreika
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
  - Azm Center for Research in Biotechnology and its Applications, LBA3B, EDST, Lebanese University, Tripoli, 1300, Lebanon
- Camille Fourneaux
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Elodie Vallin
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Laurent Modolo
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Rémi Seraphin
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Alice Moussy
  - Ecole Pratique des Hautes Etudes, PSL Research University, UMRS951, INSERM, Univ-Evry, Paris, France
- Elias Ventre
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
  - Inria Team Dracula, Inria Center Grenoble Rhone-Alpes, Grenoble, France
  - Institut Camille Jordan, CNRS UMR 5208, Université Claude Bernard Lyon 1, Villeurbanne, France
- Matteo Bouvier
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
  - Vidium solutions, Lyon, France
- Anthony Ozier-Lafontaine
  - Nantes Université, Centrale Nantes, Laboratoire de mathématiques Jean Leray, LMJL, F-44000, Nantes, France
- Arnaud Bonnaffoux
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
  - Vidium solutions, Lyon, France
- Franck Picard
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- Olivier Gandrillon
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
  - Inria Team Dracula, Inria Center Grenoble Rhone-Alpes, Grenoble, France
- Sandrine Gonin-Giraud
  - Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France

36
Chowdhury HA, Bhattacharyya DK, Kalita JK. UIPBC: An effective clustering for scRNA-seq data analysis without user input. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

37
DAE-TPGM: A deep autoencoder network based on a two-part-gamma model for analyzing single-cell RNA-seq data. Comput Biol Med 2022; 146:105578. [DOI: 10.1016/j.compbiomed.2022.105578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 04/29/2022] [Accepted: 04/30/2022] [Indexed: 11/18/2022]

38
He J, Lin L, Chen J. Practical bioinformatics pipelines for single-cell RNA-seq data analysis. Biophysics Reports 2022; 8:158-169. [PMID: 37288243 PMCID: PMC10189648 DOI: 10.52601/bpr.2022.210041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2021] [Accepted: 03/01/2022] [Indexed: 11/05/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a revolutionary tool to explore cells. With an increasing number of scRNA-seq data analysis tools that have been developed, it is challenging for users to choose and compare their performance. Here, we present an overview of the workflow for computational analysis of scRNA-seq data. We detail the steps of a typical scRNA-seq analysis, including experimental design, pre-processing and quality control, feature selection, dimensionality reduction, cell clustering and annotation, and downstream analysis including batch correction, trajectory inference and cell-cell communication. We provide guidelines according to our best practice. This review will be helpful for the experimentalists interested in analyzing their data, and will aid the users seeking to update their analysis pipelines.
Affiliation(s)
- Jiangping He
  - Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510320, China
- Lihui Lin
  - Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Jiekai Chen
  - Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510320, China
  - Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China

39
Zandavi SM, Koch FC, Vijayan A, Zanini F, Mora F, Ortega D, Vafaee F. Disentangling single-cell omics representation with a power spectral density-based feature extraction. Nucleic Acids Res 2022; 50:5482-5492. [PMID: 35639509 PMCID: PMC9178020 DOI: 10.1093/nar/gkac436] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/21/2021] [Revised: 04/26/2022] [Accepted: 05/10/2022] [Indexed: 12/13/2022]
Abstract
Emerging single-cell technologies provide high-resolution measurements of distinct cellular modalities opening new avenues for generating detailed cellular atlases of many and diverse tissues. The high dimensionality, sparsity, and inaccuracy of single cell sequencing measurements, however, can obscure discriminatory information, mask cellular subtype variations and complicate downstream analyses which can limit our understanding of cell function and tissue heterogeneity. Here, we present a novel pre-processing method (scPSD) inspired by power spectral density analysis that enhances the accuracy for cell subtype separation from large-scale single-cell omics data. We comprehensively benchmarked our method on a wide range of single-cell RNA-sequencing datasets and showed that scPSD pre-processing, while being fast and scalable, significantly reduces data complexity, enhances cell-type separation, and enables rare cell identification. Additionally, we applied scPSD to transcriptomics and chromatin accessibility cell atlases and demonstrated its capacity to discriminate over 100 cell types across the whole organism and across different modalities of single-cell omics data.
Affiliation(s)
- Seid Miad Zandavi
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Australia
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Forrest C Koch
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Australia
- Abhishek Vijayan
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Australia
- Fabio Zanini
  - Prince of Wales Clinical School, UNSW Sydney, Australia
  - Cellular Genomics Future Institute, UNSW Sydney, Australia
- Fatima Valdes Mora
  - Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Australia
  - School of Women's and Children's Health, Faculty of Medicine, UNSW, Sydney, Australia
- David Gallego Ortega
  - School of Biomedical Engineering, University of Technology Sydney (UTS), Australia
- Fatemeh Vafaee
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Australia
  - Cellular Genomics Future Institute, UNSW Sydney, Australia
  - UNSW Data Science Hub (uDASH), UNSW Sydney, Australia

40
Abondio P, De Intinis C, da Silva Gonçalves Vianez Júnior JL, Pace L. Single cell multiomic approaches to disentangle T cell heterogeneity. Immunol Lett 2022; 246:37-51. [DOI: 10.1016/j.imlet.2022.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Revised: 04/16/2022] [Accepted: 04/26/2022] [Indexed: 11/29/2022]

41
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022]
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse sc-RNA-seq data, 13) spatial transcriptomics, and 14) comparison of single cell AD mouse model studies and single cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied them to a large single nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. 
The review and the accompanied software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single cell level.
Affiliation(s)
- Minghui Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Won-min Song
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Chen Ming
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Qian Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Xianxiao Zhou
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Peng Xu
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Azra Krek
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Yonejung Yoon
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Lap Ho
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Miranda E. Orr
  - Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
  - Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Guo-Cheng Yuan
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Bin Zhang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA

42
Wang R, Zheng X, Wang J, Wan S, Song F, Wong MH, Leung KS, Cheng L. Improving bulk RNA-seq classification by transferring gene signature from single cells in acute myeloid leukemia. Brief Bioinform 2022; 23:6523149. [PMID: 35136933 DOI: 10.1093/bib/bbac002] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/25/2021] [Revised: 12/22/2021] [Accepted: 01/04/2022] [Indexed: 12/13/2022]
Abstract
The advances in single-cell RNA sequencing (scRNA-seq) technologies enable the characterization of transcriptomic profiles at the cellular level and demonstrate great promise in bulk sample analysis thereby offering opportunities to transfer gene signature from scRNA-seq to bulk data. However, the gene expression signatures identified from single cells are typically inapplicable to bulk RNA-seq data due to the profiling differences of distinct sequencing technologies. Here, we propose single-cell pair-wise gene expression (scPAGE), a novel method to develop single-cell gene pair signatures (scGPSs) that were beneficial to bulk RNA-seq classification to transfer knowledge across platforms. PAGE was adopted to tackle the challenge of profiling differences. We applied the method to acute myeloid leukemia (AML) and identified the scGPS from mouse scRNA-seq that allowed discriminating between AML and control cells. The scGPS was validated in bulk RNA-seq datasets and demonstrated better performance (average area under the curve [AUC] = 0.96) than the conventional gene expression strategies (average AUC ≤ 0.88) suggesting its potential in disclosing the molecular mechanism of AML. The scGPS also outperformed its bulk counterpart, which highlighted the benefit of gene signature transfer. Furthermore, we confirmed the utility of scPAGE in sepsis as an example of other disease scenarios. scPAGE leveraged the advantages of single-cell profiles to enhance the analysis of bulk samples revealing great potential of transferring knowledge from single-cell to bulk transcriptome studies.
Affiliation(s)
- Ran Wang
  - Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Xubin Zheng
  - Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Jun Wang
  - Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
- Shibiao Wan
  - Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Fangda Song
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen 518000, China
- Man Hon Wong
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Kwong Sak Leung
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Lixin Cheng
  - Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China

43
Baruzzo G, Patuzzi I, Di Camillo B. Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results. BMC Bioinformatics 2022; 22:618. [PMID: 35130833 PMCID: PMC8822630 DOI: 10.1186/s12859-022-04587-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 01/27/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: 16S rRNA-gene sequencing is a valuable approach to characterize the taxonomic content of the whole bacterial population inhabiting a metabolic and spatial niche, providing an important opportunity to study bacteria and their role in many health and environmental mechanisms. The analysis of data produced by amplicon sequencing, however, brings very specific methodological issues that need to be properly addressed to obtain reliable biological conclusions. Among these, 16S count data tend to be very sparse, with many null values reflecting species that are present but got unobserved due to the multiplexing constraints. However, current data workflows do not consider a step in which the information about unobserved species is recovered.
RESULTS: In this work, we evaluate for the first time the effects of introducing in the 16S data workflow a new preprocessing step, zero-imputation, to recover this lost information. Due to the lack of published zero-imputation methods specifically designed for 16S count data, we considered a set of zero-imputation strategies available for other frameworks, and benchmarked them using in silico 16S count data reflecting different experimental designs. Additionally, we assessed the effect of combining zero-imputation and normalization, i.e. the only preprocessing step in current 16S workflow. Overall, we benchmarked 35 16S preprocessing pipelines assessing their ability to handle data sparsity, identify species presence/absence, recovery sample proportional abundance distributions, and improve typical downstream analyses such as computation of alpha and beta diversity indices and differential abundance analysis.
CONCLUSIONS: The results clearly show that 16S data analysis greatly benefits from a properly-performed zero-imputation step, despite the choice of the right zero-imputation method having a pivotal role. In addition, we identify a set of best-performing pipelines that could be a valuable indication for data analysts.
Affiliation(s)
- Giacomo Baruzzo
  - Department of Information Engineering, University of Padova, Padua, Italy
- Ilaria Patuzzi
  - Department of Information Engineering, University of Padova, Padua, Italy
  - Microbial Ecology Unit, Istituto Zooprofilattico Sperimentale Delle Venezie, Padua, Italy
  - Research & Development Division, EuBiome S.R.L., Padua, Italy
- Barbara Di Camillo
  - Department of Information Engineering, University of Padova, Padua, Italy
  - CRIBI Biotechnology Centre, University of Padova, Padua, Italy
  - Department of Comparative Biomedicine and Food Science, University of Padova, Padua, Italy

44
Effect of imputation on gene network reconstruction from single-cell RNA-seq data. Patterns 2022; 3:100414. [PMID: 35199064 PMCID: PMC8848013 DOI: 10.1016/j.patter.2021.100414] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/12/2021] [Revised: 07/30/2021] [Accepted: 11/25/2021] [Indexed: 01/03/2023]
Abstract
Despite the advances in single-cell transcriptomics, the reconstruction of gene regulatory networks remains challenging. Both the large amount of zero counts in experimental data and the lack of a consensus preprocessing pipeline for single-cell RNA sequencing (scRNA-seq) data make it hard to infer networks. Imputation can be applied in order to enhance gene-gene correlations and facilitate downstream analysis. However, it is unclear what consequences imputation methods have on the reconstruction of gene regulatory networks. To study this, we evaluate the differences on the performance and structure of reconstructed networks before and after imputation in single-cell data. We observe an inflation of gene-gene correlations that affects the predicted network structures and may decrease the performance of network reconstruction in general. However, within the modest limits of achievable results, we also make a recommendation as to an advisable combination of algorithms while warning against the indiscriminate use of imputation before network reconstruction in general.
- Gene network reconstruction does not necessarily profit from imputation
- Imputation rather than network reconstruction method influences network result
- Inflation of enhanced gene-gene correlations can obscure inferred network structures
Data analysis for single-cell transcriptomics requires sophisticated software pipelines. By studying the interplay between two prominent tasks, imputation of missing data, and gene network reconstruction, we point out the pitfalls of freely combining components as part of an analysis pipeline. In our application, an earlier decision for a particular imputation algorithm is shown to largely determine the results achievable in the later gene network reconstruction task. This interdependence constitutes the flip side of the convenience that comes with the availability of user-friendly computational pipelines.

45
Gong B, Zhou Y, Purdom E. Cobolt: integrative analysis of multimodal single-cell sequencing data. Genome Biol 2021; 22:351. [PMID: 34963480 PMCID: PMC8715620 DOI: 10.1186/s13059-021-02556-z] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 07/15/2021] [Accepted: 11/22/2021] [Indexed: 11/29/2022]
Abstract
A growing number of single-cell sequencing platforms enable joint profiling of multiple omics from the same cells. We present Cobolt, a novel method that not only allows for analyzing the data from joint-modality platforms, but provides a coherent framework for the integration of multiple datasets measured on different modalities. We demonstrate its performance on multi-modality data of gene expression and chromatin accessibility and illustrate the integration abilities of Cobolt by jointly analyzing this multi-modality data with single-cell RNA-seq and ATAC-seq datasets.
Affiliation(s)
- Boying Gong
  - Division of Biostatistics, University of California, Berkeley, Berkeley, CA, USA
- Yun Zhou
  - Division of Biostatistics, University of California, Berkeley, Berkeley, CA, USA
- Elizabeth Purdom
  - Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
46
Azodi CB, Zappia L, Oshlack A, McCarthy DJ. splatPop: simulating population scale single-cell RNA sequencing data. Genome Biol 2021; 22:341. [PMID: 34911537 DOI: 10.1186/s13059-021-02546-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/26/2021] [Accepted: 11/19/2021] [Indexed: 11/10/2022]
Abstract
Population-scale single-cell RNA sequencing (scRNA-seq) is now viable, enabling finer resolution functional genomics studies and leading to a rush to adapt bulk methods and develop new single-cell-specific methods to perform these studies. Simulations are useful for developing, testing, and benchmarking methods but current scRNA-seq simulation frameworks do not simulate population-scale data with genetic effects. Here, we present splatPop, a model for flexible, reproducible, and well-documented simulation of population-scale scRNA-seq data with known expression quantitative trait loci. splatPop can also simulate complex batch, cell group, and conditional effects between individuals from different cohorts as well as genetically-driven co-expression.
Affiliation(s)
- Christina B Azodi
  - St. Vincent's Institute of Medical Research, 9 Princes Street, Fitzroy, 3065, VIC, Australia
  - University of Melbourne, Royal Parade, Parkville, 3010, VIC, Australia
- Luke Zappia
  - Department of Mathematics, Technical University of Munich, Boltzmannstraße 3, Garching bei München, 85748, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, Neuherberg, 85764, Germany
- Alicia Oshlack
  - University of Melbourne, Royal Parade, Parkville, 3010, VIC, Australia
  - Peter MacCallum Cancer Centre, Grattan Street, Melbourne, 3000, VIC, Australia
- Davis J McCarthy
  - St. Vincent's Institute of Medical Research, 9 Princes Street, Fitzroy, 3065, VIC, Australia
  - University of Melbourne, Royal Parade, Parkville, 3010, VIC, Australia
47
You Y, Tian L, Su S, Dong X, Jabbari JS, Hickey PF, Ritchie ME. Benchmarking UMI-based single-cell RNA-seq preprocessing workflows. Genome Biol 2021; 22:339. [PMID: 34906205 PMCID: PMC8672463 DOI: 10.1186/s13059-021-02552-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/17/2021] [Accepted: 11/22/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied.
RESULTS: Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis.
CONCLUSIONS: In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
Affiliation(s)
- Yue You
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Luyi Tian
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Shian Su
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Xueyi Dong
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Jafar S. Jabbari
  - Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Australia
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, The University of Melbourne at The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Peter F. Hickey
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
  - Single-Cell Open Research Endeavour (SCORE), The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Matthew E. Ritchie
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
  - School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia
48
Sheng J, Li WV. Selecting gene features for unsupervised analysis of single-cell gene expression data. Brief Bioinform 2021; 22:bbab295. [PMID: 34351383 PMCID: PMC8574996 DOI: 10.1093/bib/bbab295] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/07/2021] [Revised: 06/17/2021] [Accepted: 07/12/2021] [Indexed: 11/15/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies facilitate the characterization of transcriptomic landscapes in diverse species, tissues, and cell types with unprecedented molecular resolution. In order to evaluate various biological hypotheses using high-dimensional single-cell gene expression data, most computational and statistical methods depend on a gene feature selection step to identify genes with high biological variability and reduce computational complexity. Even though many gene selection methods have been developed for scRNA-seq analysis, a systematic comparison of the assumptions, statistical models, and selection criteria used by these methods is lacking. In this article, we summarize and discuss 17 computational methods for selecting gene features in unsupervised analysis of single-cell gene expression data, with unified notations and statistical frameworks. Our discussion provides a useful summary to help practitioners select appropriate methods based on their assumptions and applicability, and to assist method developers in designing new computational tools for unsupervised learning of scRNA-seq data.
Affiliation(s)
- Jie Sheng
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Wei Vivian Li
  - Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, NJ 08854, USA
49
Yao Y, Wyrożemski Ł, Lundin KEA, Sandve GK, Qiao SW. Differential expression profile of gluten-specific T cells identified by single-cell RNA-seq. PLoS One 2021; 16:e0258029. [PMID: 34618841 PMCID: PMC8496852 DOI: 10.1371/journal.pone.0258029] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/20/2020] [Accepted: 09/17/2021] [Indexed: 01/22/2023]
Abstract
Gluten-specific CD4+ T cells drive the pathogenesis of celiac disease, and circulating gluten-specific T cells can be identified by staining with HLA-DQ:gluten tetramers. In this first single-cell RNA-seq study of tetramer-sorted T cells from the blood of untreated celiac disease patients, we found that gluten-specific T cells showed distinct transcriptomic profiles consistent with activated effector memory T cells that shared features with Th1 and follicular helper T cells. Compared to non-specific cells, gluten-specific T cells showed differential expression of several genes involved in T-cell receptor signaling, translational processes, apoptosis, fatty acid transport, and redox potentials. Many of the gluten-specific T cells studied shared T-cell receptors with each other, indicating that circulating gluten-specific T cells belong to a limited number of clones. Moreover, the transcriptional profiles of cells that shared the same clonal origin were more similar to one another than those of clonally unrelated gluten-specific cells.
Affiliation(s)
- Ying Yao
  - Department of Immunology, University of Oslo, Oslo, Norway
  - Centre for Immune Regulation, University of Oslo, Oslo, Norway
  - K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, Norway
- Łukasz Wyrożemski
  - Department of Immunology, University of Oslo, Oslo, Norway
  - K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, Norway
- Knut E. A. Lundin
  - Department of Immunology, University of Oslo, Oslo, Norway
  - Centre for Immune Regulation, University of Oslo, Oslo, Norway
  - K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, Norway
- Geir Kjetil Sandve
  - K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, Norway
  - Department of Informatics, University of Oslo, Oslo, Norway
- Shuo-Wang Qiao
  - Department of Immunology, University of Oslo, Oslo, Norway
  - Centre for Immune Regulation, University of Oslo, Oslo, Norway
  - K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, Norway
50
Borella M, Martello G, Risso D, Romualdi C. PsiNorm: a scalable normalization for single-cell RNA-seq data. Bioinformatics 2021; 38:164-172. [PMID: 34499096 PMCID: PMC8696108 DOI: 10.1093/bioinformatics/btab641] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/18/2021] [Revised: 08/30/2021] [Accepted: 09/06/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) enables transcriptome-wide gene expression measurements at single-cell resolution, providing a comprehensive view of the composition and dynamics of tissue and organism development. The evolution of scRNA-seq protocols has led to a dramatic increase in cell throughput, exacerbating many of the computational and statistical issues that previously arose for bulk sequencing. In particular, with scRNA-seq data all analysis steps, including normalization, have become computationally intensive, both in terms of memory usage and computational time. In this context, new accurate methods able to scale efficiently are desirable.
RESULTS: Here, we propose PsiNorm, a between-sample normalization method based on the power-law Pareto distribution parameter estimate. We show that the Pareto distribution closely resembles scRNA-seq data, especially those coming from platforms that use unique molecular identifiers. Motivated by this result, we implement PsiNorm, a simple and highly scalable normalization method. We benchmark PsiNorm against seven other methods in terms of cluster identification, concordance and computational resources required. We demonstrate that PsiNorm is among the top-performing methods, showing a good trade-off between accuracy and scalability. Moreover, PsiNorm does not need a reference, a characteristic that makes it useful in supervised classification settings, in which new out-of-sample data need to be normalized.
AVAILABILITY AND IMPLEMENTATION: PsiNorm is implemented in the scone Bioconductor package and available at https://bioconductor.org/packages/scone/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matteo Borella
  - Department of Biology, University of Padova, Padua 35121, Italy