1
Xie X, Wang P, Jin M, Wang Y, Qi L, Wu C, Guo S, Li C, Zhang X, Yuan Y, Ma X, Liu F, Liu W, Liu H, Duan C, Ye P, Li X, Borish L, Zhao W, Feng X. IL-1β-induced epithelial cell and fibroblast transdifferentiation promotes neutrophil recruitment in chronic rhinosinusitis with nasal polyps. Nat Commun 2024; 15:9101. [PMID: 39438439 PMCID: PMC11496833 DOI: 10.1038/s41467-024-53307-0] [Received: 01/23/2024] [Accepted: 10/09/2024] [Indexed: 10/25/2024]
Abstract
Neutrophilic inflammation contributes to multiple chronic inflammatory airway diseases, including asthma and chronic rhinosinusitis with nasal polyps (CRSwNP), and is associated with an unfavorable prognosis. Here, using single-cell RNA sequencing (scRNA-seq) to profile human nasal mucosa obtained from the inferior turbinates, middle turbinates, and nasal polyps of CRSwNP patients, we identify two IL-1 signaling-induced cell subsets, LY6D+ club cells and IDO1+ fibroblasts, that promote neutrophil recruitment by releasing S100A8/A9 and CXCL1/2/3/5/6/8, respectively, into inflammatory regions. IL-1β, a pro-inflammatory cytokine involved in IL-1 signaling, induces the transdifferentiation of primary epithelial cells into LY6D+ club cells and of primary fibroblasts into IDO1+ fibroblasts. In an LPS-induced neutrophilic CRSwNP mouse model, blocking IL-1β activity with a receptor antagonist significantly reduces the numbers of LY6D+ club cells and IDO1+ fibroblasts and mitigates nasal inflammation. This study implicates these two cell subsets in neutrophil recruitment and demonstrates an IL-1-based intervention for mitigating neutrophilic inflammation in CRSwNP.
Affiliation(s)
- Xinyu Xie
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Shandong Provincial Key Medical and Health Discipline, Qilu Hospital of Shandong University, Jinan, China
- Pin Wang
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Shandong Provincial Key Medical and Health Discipline, Qilu Hospital of Shandong University, Jinan, China
- Min Jin
- Department of Anesthesiology, Qilu Hospital of Shandong University, Jinan, China
- Yue Wang
- Department of Gastroenterology, Qilu Hospital of Shandong University, Jinan, China
- Lijie Qi
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Shandong Provincial Key Medical and Health Discipline, Qilu Hospital of Shandong University, Jinan, China
- Changhua Wu
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Shu Guo
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Changqing Li
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Xiaojun Zhang
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Shandong Provincial Key Medical and Health Discipline, Qilu Hospital of Shandong University, Jinan, China
- Ye Yuan
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Xinyi Ma
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Fangying Liu
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Weiyuan Liu
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Heng Liu
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Chen Duan
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Shandong Provincial Key Medical and Health Discipline, Qilu Hospital of Shandong University, Jinan, China
- Ping Ye
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Shandong Provincial Key Medical and Health Discipline, Qilu Hospital of Shandong University, Jinan, China
- Xuezhong Li
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Shandong Provincial Key Medical and Health Discipline, Qilu Hospital of Shandong University, Jinan, China
- Larry Borish
- Department of Medicine, University of Virginia Health System, Charlottesville, VA, USA
- Department of Microbiology, University of Virginia Health System, Charlottesville, VA, USA
- Wei Zhao
- Key Laboratory for Experimental Teratology of the Chinese Ministry of Education, School of Basic Medical Science, Shandong University, Jinan, China
- Key Laboratory of Infection and Immunity of Shandong Province, School of Basic Medical Science, Shandong University, Jinan, China
- Xin Feng
- Department of Otorhinolaryngology, National Health Commission Key Laboratory of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- Shandong Provincial Key Medical and Health Discipline, Qilu Hospital of Shandong University, Jinan, China
2
Sullivan DK, Min KHJ, Hjörleifsson KE, Luebbert L, Holley G, Moses L, Gustafsson J, Bray NL, Pimentel H, Booeshaghi AS, Melsted P, Pachter L. kallisto, bustools and kb-python for quantifying bulk, single-cell and single-nucleus RNA-seq. Nat Protoc 2024:10.1038/s41596-024-01057-0. [PMID: 39390263 DOI: 10.1038/s41596-024-01057-0] [Received: 01/18/2024] [Accepted: 07/29/2024] [Indexed: 10/12/2024]
Abstract
The term 'RNA-seq' refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, single cells or single nuclei. The kallisto, bustools and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples or both. Additionally, these tools allow gene expression values to be classified as originating from nascent RNA species or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data. Execution of this protocol requires basic familiarity with a command line environment. With this protocol, quantification of a moderately sized RNA-seq dataset can be completed within minutes.
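As a rough illustration of the final counting step described above, the Python sketch below collapses hypothetical read records, loosely modelled on the BUS (barcode, UMI, set) records that bustools operates on, into a cell-by-gene UMI count table. It assumes that reads sharing a barcode and UMI come from one molecule, intersects their compatible-gene sets, and counts the molecule only if exactly one gene remains. The record layout, the function name `count_umis` and the discard rule are simplifying assumptions for illustration; this is not the bustools implementation, file format or command-line interface.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical BUS-like record: (cell barcode, UMI, genes the read is compatible with).
Record = Tuple[str, str, FrozenSet[str]]


def count_umis(records: Iterable[Record]) -> Dict[str, Dict[str, int]]:
    """Count distinct UMIs per cell and gene.

    Simplifying assumption: all reads sharing a (barcode, UMI) pair come from one
    molecule, so their compatible-gene sets are intersected; the molecule is
    counted only if exactly one gene survives the intersection.
    """
    # Collapse reads to molecules by intersecting gene sets per (barcode, UMI).
    molecules: Dict[Tuple[str, str], FrozenSet[str]] = {}
    for barcode, umi, genes in records:
        key = (barcode, umi)
        molecules[key] = molecules[key] & genes if key in molecules else genes

    counts: Dict[str, Dict[str, int]] = {}
    for (barcode, _umi), genes in molecules.items():
        if len(genes) != 1:
            continue  # multi-gene or empty intersection: discarded in this sketch
        (gene,) = genes
        cell_counts = counts.setdefault(barcode, {})
        cell_counts[gene] = cell_counts.get(gene, 0) + 1
    return counts


if __name__ == "__main__":
    reads = [
        ("AAAC", "UMI1", frozenset({"GeneA"})),
        ("AAAC", "UMI1", frozenset({"GeneA", "GeneB"})),  # same molecule; intersection is {GeneA}
        ("AAAC", "UMI2", frozenset({"GeneB"})),
        ("TTTG", "UMI1", frozenset({"GeneA", "GeneB"})),  # ambiguous between genes: dropped
        ("TTTG", "UMI3", frozenset({"GeneC"})),
    ]
    print(count_umis(reads))
    # {'AAAC': {'GeneA': 1, 'GeneB': 1}, 'TTTG': {'GeneC': 1}}
```

Discarding gene-ambiguous molecules keeps the sketch simple; the real tools offer more options, including the nascent/mature classification mentioned in the abstract, which is not modelled here.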
Affiliation(s)
- Delaney K Sullivan
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Laura Luebbert
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Lambda Moses
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Harold Pimentel
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- A Sina Booeshaghi
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Páll Melsted
- deCODE Genetics/Amgen Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
3
Tiberi S, Meili J, Cai P, Soneson C, He D, Sarkar H, Avalos-Pacheco A, Patro R, Robinson MD. DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes. Biostatistics 2024; 25:1079-1093. [PMID: 38887902 DOI: 10.1093/biostatistics/kxae017] [Received: 08/17/2023] [Revised: 03/21/2024] [Accepted: 05/15/2024] [Indexed: 06/20/2024]
Abstract
Although transcriptomics data are typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples and rarely allow comparisons between groups of samples (e.g., healthy vs. diseased). Furthermore, this kind of inference is challenging because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e., reads compatible with multiple transcripts (or genes) and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions in the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin and to the respective splice version. In several benchmarks we designed, our approach shows good performance in terms of sensitivity and error control compared with state-of-the-art competitors. Importantly, our tool is flexible and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.
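To make the latent-allocation idea concrete, the sketch below estimates the unspliced proportion for a single gene in a single sample from reads that may be compatible with the spliced version, the unspliced version, or both. It deliberately swaps in a simple expectation-maximization (EM) mixture estimate in place of the Bayesian hierarchical model described in the abstract, and the per-read likelihood pairs are assumed inputs; the function name `estimate_unspliced_fraction` is hypothetical and not part of the DifferentialRegulation package.

```python
from typing import List, Tuple

# Per-read likelihoods (l_spliced, l_unspliced) for one gene in one sample.
# A read compatible only with the spliced version has l_unspliced == 0, and vice versa;
# an ambiguous read has both likelihoods > 0.
ReadLik = Tuple[float, float]


def estimate_unspliced_fraction(reads: List[ReadLik], n_iter: int = 500,
                                tol: float = 1e-12) -> float:
    """EM estimate of pi, the proportion of reads from unspliced mRNA."""
    if not reads:
        raise ValueError("need at least one read")
    pi = 0.5  # starting value
    for _ in range(n_iter):
        # E-step: posterior probability that each read originates from unspliced mRNA.
        expected_unspliced = 0.0
        for l_s, l_u in reads:
            denom = pi * l_u + (1.0 - pi) * l_s
            if denom > 0.0:
                expected_unspliced += pi * l_u / denom
        # M-step: the updated proportion is the mean posterior allocation.
        new_pi = expected_unspliced / len(reads)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi


if __name__ == "__main__":
    # 3 spliced-only, 1 unspliced-only and 4 equally ambiguous reads.
    reads = [(1.0, 0.0)] * 3 + [(0.0, 1.0)] + [(1.0, 1.0)] * 4
    print(round(estimate_unspliced_fraction(reads), 6))  # 0.25
```

With equal likelihoods, the ambiguous reads end up split in the same 1:3 ratio as the unambiguous ones; unequal likelihoods are where the allocation step matters. Comparing such proportions between conditions, which is the point of the published method, is not reproduced here.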
Affiliation(s)
- Simone Tiberi
- Department of Statistical Sciences, University of Bologna, Via delle Belle Arti 41, Bologna, 40126, Italy
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- Joël Meili
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- Peiying Cai
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- Charlotte Soneson
- Computational Biology Platform, Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Fabrikstrasse 24, Basel, 4056, Switzerland
- Dongze He
- Department of Cell Biology and Molecular Genetics, University of Maryland, 4062 Campus Drive, College Park, MD 20742, United States
- Center for Bioinformatics and Computational Biology, University of Maryland, 8125 Paint Branch Dr, College Park, MD 20742, United States
- Hirak Sarkar
- Department of Computer Science, Princeton University, 35 Olden St, Princeton, NJ 08540, United States
- Alejandra Avalos-Pacheco
- Research Unit of Applied Statistics, TU Wien, Wiedner Hauptstraße 8-10/105, Wien 1040, Austria
- Harvard-MIT Center for Regulatory Science, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, United States
- Rob Patro
- Center for Bioinformatics and Computational Biology, University of Maryland, 8125 Paint Branch Dr, College Park, MD 20742, United States
- Department of Computer Science, University of Maryland, 8125 Paint Branch Dr, College Park, MD 20742, United States
- Mark D Robinson
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
4
Cheng S, Li L, Yeh Y, Shi Y, Franco O, Corey E, Yu X. Unveiling novel double-negative prostate cancer subtypes through single-cell RNA sequencing analysis. NPJ Precis Oncol 2024; 8:171. [PMID: 39095562 PMCID: PMC11297170 DOI: 10.1038/s41698-024-00667-x] [Received: 05/04/2024] [Accepted: 07/24/2024] [Indexed: 08/04/2024]
Abstract
Recent advancements in single-cell RNA sequencing (scRNAseq) have facilitated the discovery of previously unrecognized subtypes within prostate cancer (PCa), offering new insights into cancer heterogeneity and progression. In this study, we integrated scRNAseq data from multiple studies, comprising publicly available cohorts and data generated by our research team, and established the Human Prostate Single cell Atlas (HuPSA) and Mouse Prostate Single cell Atlas (MoPSA) datasets. Through comprehensive analysis, we identified two novel double-negative PCa populations: KRT7 cells, characterized by elevated KRT7 expression, and progenitor-like cells, marked by SOX2 and FOXA2 expression, distinct from neuroendocrine PCa (NEPCa) and displaying stem/progenitor features. Furthermore, HuPSA-based deconvolution re-classified human PCa specimens, validating the presence of these novel subtypes. We then developed a user-friendly web application, "HuPSA-MoPSA" (https://pcatools.shinyapps.io/HuPSA-MoPSA/), for visualizing gene expression across all newly established datasets. Our study provides comprehensive tools for PCa research and uncovers novel cancer subtypes that can inform clinical diagnosis and treatment strategies.
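The abstract does not state which deconvolution method was applied with HuPSA. Purely as a generic illustration of reference-based deconvolution, the sketch below estimates cell-type proportions for one bulk sample from a reference signature matrix using non-negative least squares, solved by cyclic coordinate descent with clipping at zero, and then rescales the weights to sum to one. The function name `deconvolve`, the solver choice and the signature values are assumptions made for this example, not the study's pipeline.

```python
from typing import List


def deconvolve(signatures: List[List[float]], bulk: List[float],
               n_iter: int = 5000) -> List[float]:
    """Estimate cell-type proportions by non-negative least squares.

    signatures[g][k]: reference expression of gene g in cell type k (hypothetical values).
    bulk[g]: bulk expression of gene g.
    Solved by cyclic coordinate descent with clipping at zero, then
    rescaled so the proportions sum to 1.
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    weights = [0.0] * n_types
    resid = list(bulk)  # residual bulk - signatures @ weights, updated in place
    col_sq = [sum(signatures[g][k] ** 2 for g in range(n_genes)) for k in range(n_types)]
    for _ in range(n_iter):
        for k in range(n_types):
            if col_sq[k] == 0.0:
                continue
            # Exact minimiser along coordinate k, projected onto w_k >= 0.
            corr = sum(signatures[g][k] * resid[g] for g in range(n_genes))
            new_w = max(0.0, weights[k] + corr / col_sq[k])
            delta = new_w - weights[k]
            if delta != 0.0:
                for g in range(n_genes):
                    resid[g] -= signatures[g][k] * delta
                weights[k] = new_w
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


if __name__ == "__main__":
    # Hypothetical 4-gene x 3-cell-type reference and a bulk sample mixed 60/40/0.
    sig = [[10.0, 1.0, 0.0], [1.0, 8.0, 1.0], [0.0, 2.0, 9.0], [3.0, 3.0, 3.0]]
    bulk = [6.4, 3.8, 0.8, 3.0]
    print([round(p, 3) for p in deconvolve(sig, bulk)])  # [0.6, 0.4, 0.0]
```

In a re-classification setting, a specimen could then be labelled by its dominant estimated component; how the study actually assigned subtypes is not described in the abstract.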
Affiliation(s)
- Siyuan Cheng
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA, USA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA, USA
- Lin Li
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA, USA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA, USA
- Yunshin Yeh
- Pathology & Laboratory Medicine Service, Overton Brooks VA Medical Center, Shreveport, LA, USA
- Yingli Shi
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA, USA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA, USA
- Omar Franco
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA, USA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA, USA
- Eva Corey
- Department of Urology, University of Washington, Seattle, WA, USA
- Xiuping Yu
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA, USA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA, USA
- Department of Urology, LSU Health Shreveport, Shreveport, LA, USA
5
Barrios EL, Rincon JC, Willis M, Polcz VE, Leary JR, Darden DB, Balch JA, Larson SD, Loftus TJ, Mohr AM, Wallet S, Brusko MA, Balzano-Nogueira L, Cai G, Sharma A, Upchurch GR, Kladde MP, Mathews CE, Maile R, Moldawer LL, Bacher R, Efron PA. Transcriptomic differences in peripheral monocyte populations in septic patients based on outcome. Shock 2024; 62:208-216. [PMID: 38713581 DOI: 10.1097/shk.0000000000002379] [Indexed: 05/09/2024]
Abstract
Early mortality after sepsis is increasingly being replaced by survival, with survivors experiencing either a rapid recovery and favorable hospital discharge or the development of chronic critical illness with suboptimal outcomes. The underlying immunological response that determines these clinical trajectories remains poorly defined at the transcriptomic level. As classical and nonclassical monocytes are key leukocytes in both the innate and adaptive immune systems, we sought to delineate the transcriptomic response of these cell types. Using single-cell RNA sequencing and pathway analyses, we identified gene expression patterns that differ between these two outcome groups (rapid recovery vs. chronic critical illness) and are consistent with differences in TNF-α production. These findings may provide therapeutic targets for patients at risk of chronic critical illness, with the aim of improving their phenotype/endotype, morbidity, and long-term mortality.
Affiliation(s)
- Evan L Barrios
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Jaimar C Rincon
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Micah Willis
- Department of Oral Biology, College of Dentistry, Gainesville, Florida
- Valerie E Polcz
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Jack R Leary
- Department of Biostatistics, College of Medicine, Gainesville, Florida
- Dijoia B Darden
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Jeremy A Balch
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Shawn D Larson
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Tyler J Loftus
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Alicia M Mohr
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Shannon Wallet
- Department of Oral Biology, College of Dentistry, Gainesville, Florida
- Maigan A Brusko
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, Gainesville, Florida
- Guoshuai Cai
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Ashish Sharma
- Department of Surgery, College of Medicine, Gainesville, Florida
- Michael P Kladde
- Department of Biochemistry and Molecular Biology, College of Medicine, Gainesville, Florida
- Clayton E Mathews
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, Gainesville, Florida
- Robert Maile
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Lyle L Moldawer
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
- Rhonda Bacher
- Department of Biostatistics, College of Medicine, Gainesville, Florida
- Philip A Efron
- Sepsis and Critical Illness Research Center, Department of Surgery, College of Medicine, Gainesville, Florida
6
Zhou T, Yang H, Assa C, DeRoo E, Bontekoe J, Burkel B, Ponik S, Lu HS, Daugherty A, Liu B. Myeloid-Specific Thrombospondin-1 Deficiency Exacerbates Aortic Rupture via Broad Suppression of Extracellular Matrix Proteins. bioRxiv 2024:2024.07.30.605216. [PMID: 39211130 PMCID: PMC11361016 DOI: 10.1101/2024.07.30.605216] [Indexed: 09/04/2024]
Abstract
Rationale: Rupture of abdominal aortic aneurysms (AAA) is associated with high mortality. However, the precise molecular and cellular drivers of AAA rupture remain elusive. Our prior study showed that global and myeloid-specific deletion of the matricellular protein thrombospondin-1 (TSP1) protects mice from aneurysm formation, primarily by inhibiting vascular inflammation. Objective: To investigate the cellular and molecular mechanisms that drive AAA rupture by testing how TSP1 deficiency in different cell populations affects the rupture event. Methods and Results: We deleted TSP1 in endothelial cells and macrophages (the major TSP1-expressing cells in aneurysmal tissues) by crossbreeding Thbs1 flox/flox mice with VE-cadherin-Cre and Lyz2-Cre mice, respectively. Aortic aneurysm and rupture were induced by angiotensin II in hypercholesterolemic mice. Myeloid-specific Thbs1 knockout, but not endothelial-specific knockout, more than doubled the rate of lethal aortic rupture. Combined single-cell RNA sequencing and histological analyses revealed a unique cellular and molecular signature of the rupture-prone aorta, characterized by broad suppression of inflammation and extracellular matrix production. Visium spatial transcriptomic analysis of human AAA tissues showed a correlation between low TSP1 expression and aortic dissection. Conclusions: TSP1 expression by myeloid cells negatively regulates aneurysm rupture, likely by promoting matrix-repair phenotypes of vascular smooth muscle cells and thereby increasing the strength of the vascular wall.
7
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique developmental trajectories and molecular features. Exploring how genomes orchestrate the formation and maintenance of each cell and control the cellular phenotypes of various organisms is both captivating and intricate. Since the inception of the first single-cell RNA sequencing technology, single-cell sequencing technologies have advanced rapidly. Horizontally, they have expanded to include the single-cell genome, epigenome, proteome, and metabolome; vertically, they have progressed to integrate multiple omics data and to incorporate additional information, as in spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on methodology. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China
8
Downie JM, Musich RJ, Geraghty CM, Caraballo A, He S, Khawaled S, Lachut K, Long T, Zhou JY, Yilmaz OH, Stappenbeck T, Chan AT, Drew DA. Optimizing single-cell RNA sequencing methods for human colon biopsies: droplet-based vs. picowell-based platforms. bioRxiv 2024:2024.06.24.600526. [PMID: 38979379 PMCID: PMC11230261 DOI: 10.1101/2024.06.24.600526] [Indexed: 07/10/2024]
Abstract
Background & Aims: Single-cell RNA sequencing (scRNA) has enabled many insights into gastrointestinal microenvironments. However, profiling human biopsies using droplet-based scRNA (D-scRNA) is challenging because it requires immediate processing to minimize epithelial cell damage. In contrast, picowell-based scRNA (P-scRNA) platforms permit short-term frozen storage before sequencing. We compared P- and D-scRNA platforms on cells derived from human colon biopsies. Methods: Endoscopic rectosigmoid mucosal biopsies were obtained from two adults, and D-scRNA (10X Chromium) and P-scRNA (Honeycomb HIVE) were run in parallel on each individual's pool of single cells (>10,000 cells/participant). Three experiments were performed to evaluate 1) P-scRNA under different storage conditions (immediately processed [fresh] vs. frozen for 2 weeks at -20 °C or -80 °C); 2) fresh P-scRNA vs. fresh D-scRNA; and 3) P-scRNA stored at -80 °C vs. fresh D-scRNA. Results: Significant recovery of loaded cells was achieved for fresh (80.9%) and -80 °C (48.5%) P-scRNA and for D-scRNA (76.6%), but not for -20 °C P-scRNA (3.7%). However, D-scRNA captured more typeable cells among recovered cells (71.5% vs. 15.8% for fresh and 18.4% for -80 °C P-scRNA), and these cells exhibited higher gene coverage at the expense of higher mitochondrial read fractions across most cell types. Cells profiled using D-scRNA showed more consistent gene expression profiles within the same cell type than those profiled using P-scRNA. Significant intra-cell-type differences in profiled gene classes were observed across platforms. Conclusions: Our results highlight non-overlapping advantages of P-scRNA and D-scRNA and underscore the need for innovation to enable high-fidelity capture of colonic epithelial cells. The platform-specific variation highlights the challenges of maintaining rigor and reproducibility across studies that use different platforms.
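The two rates reported for each platform use different denominators: recovery is a fraction of loaded cells, while typeability is a fraction of recovered cells. Assuming (this is not stated in the abstract) that the two rates can simply be multiplied, a back-of-the-envelope estimate of typeable cells per loaded cell is shown below. The -20 °C condition is omitted because its typeable fraction is not reported; the function name is illustrative only.

```python
# Rates reported in the abstract: (recovery of loaded cells, typeable fraction of recovered cells).
platforms = {
    "D-scRNA (fresh)": (0.766, 0.715),
    "P-scRNA (fresh)": (0.809, 0.158),
    "P-scRNA (-80 °C)": (0.485, 0.184),
}


def typeable_per_loaded(recovery: float, typeable: float) -> float:
    """Fraction of loaded cells that end up typeable, assuming the rates compose multiplicatively."""
    return recovery * typeable


for name, (rec, typ) in platforms.items():
    print(f"{name}: {typeable_per_loaded(rec, typ):.1%}")
# D-scRNA (fresh): 54.8%
# P-scRNA (fresh): 12.8%
# P-scRNA (-80 °C): 8.9%
```

On this rough reading, the higher raw recovery of fresh P-scRNA does not offset its lower typeable fraction.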
9
He D, Gao Y, Chan SS, Quintana-Parrilla N, Patro R. Forseti: a mechanistic and predictive model of the splicing status of scRNA-seq reads. Bioinformatics 2024; 40:i297-i306. [PMID: 38940130 PMCID: PMC11256924 DOI: 10.1093/bioinformatics/btae207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
MOTIVATION Short-read single-cell RNA-sequencing (scRNA-seq) has been used to study cellular heterogeneity, cellular fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging; even the seemingly straightforward task of determining the splicing status of the molecules from which sequenced fragments are drawn is inherently difficult. This difficulty arises, in part, from limited read length and positional biases, which substantially reduce the specificity of the sequenced fragments. As a result, the splicing status of many reads in scRNA-seq is ambiguous because definitive evidence is lacking. Methods are therefore needed that can recover the splicing status of ambiguous reads, which in turn can lead to greater accuracy and confidence in downstream analyses. RESULTS We develop Forseti, a predictive model that probabilistically assigns a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model to assign a probability that a given transcriptomic site is used in fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets derived from different species and tissue types. Forseti combines these two trained models to predict the splicing status of each read's molecule of origin by scoring putative fragments that associate each alignment of a sequenced read with nearby potential priming sites. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of many reads and identify the true gene origin of multi-gene mapped reads. AVAILABILITY AND IMPLEMENTATION Forseti and the code used for producing the results are available at https://github.com/COMBINE-lab/forseti under a BSD 3-clause license.
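As a toy illustration of the scoring idea (not Forseti's implementation or trained models), assume a read's alignment position is known on both the spliced transcript and the unspliced pre-mRNA, that each candidate priming site downstream of the read carries a priming probability from some affinity model, and that fragment lengths follow a simple Gaussian. Each (alignment, site) pair implies a fragment length; summing priming probability times length density per splicing status and normalizing gives a posterior-like split. The Gaussian parameters, site positions and probabilities below are all hypothetical.

```python
import math
from typing import Sequence


def fragment_length_density(length: float, mean: float = 300.0, sd: float = 80.0) -> float:
    """Hypothetical Gaussian stand-in for a trained fragment length model."""
    if length <= 0:
        return 0.0
    z = (length - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def status_score(read_pos: int, priming_sites: Sequence[tuple[int, float]]) -> float:
    """Sum over downstream priming sites of P(site primes) * density(implied fragment length)."""
    total = 0.0
    for site_pos, prime_prob in priming_sites:
        if site_pos > read_pos:  # assume the fragment spans from the read to the priming site
            total += prime_prob * fragment_length_density(site_pos - read_pos)
    return total


def splicing_posterior(spliced: tuple[int, list], unspliced: tuple[int, list],
                       prior_spliced: float = 0.5) -> float:
    """Normalized probability that the read came from the spliced molecule."""
    s = prior_spliced * status_score(*spliced)
    u = (1 - prior_spliced) * status_score(*unspliced)
    if s + u == 0:
        return prior_spliced  # no plausible fragment either way: fall back to the prior
    return s / (s + u)


# Hypothetical read: position 1000 on the spliced transcript (poly(A) site at 1300),
# position 5000 on the pre-mRNA (A-rich intronic site at 5900, poly(A) site at 9000).
p = splicing_posterior((1000, [(1300, 0.9)]), (5000, [(5900, 0.3), (9000, 0.9)]))
print(f"P(spliced) = {p:.3f}")
```

Here only the spliced interpretation implies a fragment length near the assumed mode, so nearly all of the probability goes to the spliced status.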
Affiliation(s)
- Dongze He
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, United States
- Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, United States
- Yuan Gao
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, United States
- Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, United States
- Spencer Skylar Chan
- Department of Computer Science, University of Maryland, College Park, MD 20742, United States
- Rob Patro
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, United States
- Department of Computer Science, University of Maryland, College Park, MD 20742, United States
10
Tiberi S, Meili J, Cai P, Soneson C, He D, Sarkar H, Avalos-Pacheco A, Patro R, Robinson MD. DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.08.17.553679. [PMID: 37645841 PMCID: PMC10462127 DOI: 10.1101/2023.08.17.553679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Motivation Although transcriptomics data are typically used to analyse mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor) mRNA, which can be used to study gene regulation and changes in gene expression. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples and rarely allow comparisons between groups of samples (e.g., healthy vs. diseased). Furthermore, this kind of inference is challenging because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e., reads compatible with multiple transcripts (or genes) and/or with both their spliced and unspliced versions. Results Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions in the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, in which reads are allocated to their gene/transcript of origin and to the respective splice version. In several benchmarks we designed, our approach shows good sensitivity and error control compared with state-of-the-art competitors. Importantly, our tool is flexible and works with both bulk and single-cell RNA-sequencing data. Availability and implementation DifferentialRegulation is distributed as a Bioconductor R package.
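The latent-variable idea of allocating ambiguous reads to a splice version can be illustrated with a much simpler estimator. DifferentialRegulation itself is a Bayesian hierarchical model across samples and conditions; the sketch below swaps in plain expectation-maximization (EM) for a single gene in a single sample. Each read is summarized by a hypothetical pair of likelihoods, p(read | unspliced) and p(read | spliced); unambiguous reads have one of the two set to zero, and ambiguous reads are fractionally allocated according to the current estimate of the unspliced proportion.

```python
from typing import Sequence


def em_unspliced_fraction(read_likelihoods: Sequence[tuple[float, float]],
                          tol: float = 1e-10, max_iter: int = 10_000) -> float:
    """Estimate pi_U, the share of one gene's reads that come from unspliced mRNA.

    Each read is a pair (p(read | unspliced), p(read | spliced)).
    """
    if not read_likelihoods:
        raise ValueError("need at least one read")
    if any(l_u + l_s <= 0 for l_u, l_s in read_likelihoods):
        raise ValueError("every read must be compatible with at least one version")
    pi_u = 0.5  # initial guess
    for _ in range(max_iter):
        # E-step: posterior probability that each read is unspliced, given the current pi_u
        resp = [pi_u * l_u / (pi_u * l_u + (1 - pi_u) * l_s) for l_u, l_s in read_likelihoods]
        # M-step: the updated proportion is the mean responsibility
        new_pi = sum(resp) / len(resp)
        if abs(new_pi - pi_u) < tol:
            return new_pi
        pi_u = new_pi
    return pi_u


# Hypothetical gene: 30 unspliced-only, 70 spliced-only and 100 ambiguous reads;
# ambiguous reads are assumed to be twice as likely under the spliced version.
reads = [(1.0, 0.0)] * 30 + [(0.0, 1.0)] * 70 + [(0.5, 1.0)] * 100
print(f"estimated unspliced fraction: {em_unspliced_fraction(reads):.3f}")
```

With equal likelihoods for the ambiguous reads, the estimate converges to the unambiguous ratio 30/(30 + 70) = 0.3; the unequal likelihoods in the example pull it down to about 0.21, which is where modeling the allocation explicitly changes the answer.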
Affiliation(s)
- Simone Tiberi
- Department of Statistical Sciences, University of Bologna, Bologna, Italy
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Joël Meili
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Peiying Cai
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Charlotte Soneson
- Computational Biology Platform, Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Dongze He
- Department of Cell Biology and Molecular Genetics, University of Maryland, MD, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
- Hirak Sarkar
- Department of Computer Science, Princeton University, NJ, USA
- Alejandra Avalos-Pacheco
- Research Unit of Applied Statistics, TU Wien, Vienna, Austria
- Harvard-MIT Center for Regulatory Science, Harvard Medical School, Boston, MA, USA
- Rob Patro
- Department of Computer Science, University of Maryland, MD, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
- Mark D Robinson
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
11
Cheng S, Li L, Yeh Y, Shi Y, Franco O, Corey E, Yu X. Unveiling Novel Double-Negative Prostate Cancer Subtypes Through Single-Cell RNA Sequencing Analysis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.08.11.553009. [PMID: 38746150 PMCID: PMC11092429 DOI: 10.1101/2023.08.11.553009] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/16/2024]
Abstract
Recent advances in single-cell RNA sequencing (scRNAseq) have facilitated the discovery of previously unrecognized subtypes within prostate cancer (PCa), offering new insights into disease heterogeneity and progression. In this study, we integrated scRNAseq data from multiple studies, comprising both publicly available cohorts and data generated by our research team, and established the HuPSA (Human Prostate Single cell Atlas) and MoPSA (Mouse Prostate Single cell Atlas) datasets. Through comprehensive analysis, we identified two novel double-negative PCa populations: KRT7 cells, characterized by elevated KRT7 expression, and progenitor-like cells, marked by SOX2 and FOXA2 expression, which are distinct from neuroendocrine PCa (NEPCa) and display stem/progenitor features. Furthermore, HuPSA-based deconvolution allowed the re-classification of human PCa specimens, validating the presence of these novel subtypes. Leveraging these findings, we developed a user-friendly web application, "HuPSA-MoPSA" (https://pcatools.shinyapps.io/HuPSA-MoPSA/), for visualizing gene expression across all newly established datasets. Our study provides comprehensive tools for PCa research and uncovers novel cancer subtypes that can inform clinical diagnosis and treatment strategies.
Affiliation(s)
- Siyuan Cheng
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA
- Lin Li
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA
- Yunshin Yeh
- Pathology & Laboratory Medicine Service, Overton Brooks VA Medical Center, Shreveport, LA
- Yingli Shi
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA
- Omar Franco
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA
- Eva Corey
- Department of Urology, University of Washington, Seattle, WA
- Xiuping Yu
- Department of Biochemistry and Molecular Biology, LSU Health Shreveport, Shreveport, LA
- Feist-Weiller Cancer Center, LSU Health Shreveport, Shreveport, LA
- Department of Urology, LSU Health Shreveport, Shreveport, LA
12
Barrios EL, Leary JR, Darden DB, Rincon JC, Willis M, Polcz VE, Gillies GS, Munley JA, Dirain ML, Ungaro R, Nacionales DC, Gauthier MPL, Larson SD, Morel L, Loftus TJ, Mohr AM, Maile R, Kladde MP, Mathews CE, Brusko MA, Brusko TM, Moldawer LL, Bacher R, Efron PA. The post-septic peripheral myeloid compartment reveals unexpected diversity in myeloid-derived suppressor cells. Front Immunol 2024; 15:1355405. [PMID: 38720891 PMCID: PMC11076668 DOI: 10.3389/fimmu.2024.1355405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 04/09/2024] [Indexed: 05/12/2024]
Abstract
Introduction Sepsis engenders distinct host immunologic changes that include the expansion of myeloid-derived suppressor cells (MDSCs). These cells play a physiologic role in tempering acute inflammatory responses but can persist in patients who develop chronic critical illness. Methods Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) and transcriptomic analysis were used to describe MDSC subpopulations based on differential gene expression, RNA velocities, and biologic process clustering. Results We identify a unique lineage and differentiation pathway for MDSCs after sepsis and describe a novel MDSC subpopulation. Additionally, we report that the heterogeneous response of the blood myeloid compartment to sepsis depends on clinical outcome. Discussion The origins and lineage of these MDSC subpopulations were previously assumed to be discrete and unidirectional; however, these cells exhibit a dynamic phenotype with considerable plasticity.
Affiliation(s)
- Evan L. Barrios
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Jack R. Leary
- Department of Biostatistics, University of Florida College of Medicine and Public Health and Health Sciences, Gainesville, FL, United States
- Dijoia B. Darden
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Jaimar C. Rincon
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Micah Willis
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Valerie E. Polcz
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Gwendolyn S. Gillies
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Jennifer A. Munley
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Marvin L. Dirain
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Ricardo Ungaro
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Dina C. Nacionales
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Marie-Pierre L. Gauthier
- Department of Biochemistry and Molecular Biology, University of Florida College of Medicine, Gainesville, FL, United States
- Shawn D. Larson
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Laurence Morel
- Department of Microbiology and Immunology, University of Texas San Antonio School of Medicine, San Antonio, TX, United States
- Tyler J. Loftus
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Alicia M. Mohr
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Robert Maile
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Michael P. Kladde
- Department of Biochemistry and Molecular Biology, University of Florida College of Medicine, Gainesville, FL, United States
- Clayton E. Mathews
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida College of Medicine, Gainesville, FL, United States
- Maigan A. Brusko
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida College of Medicine, Gainesville, FL, United States
- Todd M. Brusko
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida College of Medicine, Gainesville, FL, United States
- Lyle L. Moldawer
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Rhonda Bacher
- Department of Biostatistics, University of Florida College of Medicine and Public Health and Health Sciences, Gainesville, FL, United States
- Philip A. Efron
- Sepsis and Critical Illness Research Center, Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States
13
Su Z, Tong Y, Wei GW. Hodge Decomposition of Single-Cell RNA Velocity. J Chem Inf Model 2024; 64:3558-3568. [PMID: 38572676 PMCID: PMC11035094 DOI: 10.1021/acs.jcim.4c00132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 03/21/2024] [Accepted: 03/22/2024] [Indexed: 04/05/2024]
Abstract
RNA velocity can capture dynamic cellular information in biological processes; however, a comprehensive analysis of cell state transitions and their associated chemical and biological processes is still lacking. In this work, we apply the Hodge decomposition, coupled with discrete exterior calculus (DEC), to unveil cell dynamics by examining the curl-free, divergence-free, and harmonic components of the RNA velocity field in a low-dimensional representation, such as a UMAP or t-SNE embedding. The decomposed components distinctly reveal key cell dynamic features such as the cell cycle, bifurcation, and cell lineage differentiation, regardless of the choice of low-dimensional representation. This consistency across representations demonstrates that the Hodge decomposition is a reliable and robust way to extract these cell dynamic features, offering unique analysis and insightful visualization of single-cell RNA velocity fields.
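The paper applies discrete exterior calculus on the low-dimensional embedding. As a simpler, swapped-in illustration of what the three components mean, the sketch below performs the analogous Helmholtz-Hodge split of a smooth 2-D vector field sampled on a periodic grid over [0, 2π)², using FFTs in NumPy. The curl-free part is the projection of each Fourier mode onto its wavevector, the harmonic part on a periodic domain reduces to the constant (mean) field, and the divergence-free part is the remainder. Periodic boundaries and regular grid sampling are assumptions of this sketch; a UMAP or t-SNE embedding satisfies neither.

```python
import numpy as np


def hodge_decompose_periodic(vx: np.ndarray, vy: np.ndarray):
    """Split a periodic 2-D field on [0, 2*pi)^2 into curl-free, divergence-free and harmonic parts."""
    ny, nx = vx.shape
    # Integer angular wavenumbers for a domain of length 2*pi
    kx = np.fft.fftfreq(nx, d=1.0 / nx)
    ky = np.fft.fftfreq(ny, d=1.0 / ny)
    KX, KY = np.meshgrid(kx, ky)  # shape (ny, nx); KX varies along axis 1
    Vx, Vy = np.fft.fft2(vx), np.fft.fft2(vy)

    k2 = KX**2 + KY**2
    k2[0, 0] = 1.0  # avoid 0/0 at the mean mode, where KX = KY = 0 anyway
    proj = (KX * Vx + KY * Vy) / k2  # component of each mode along its wavevector
    Cx, Cy = proj * KX, proj * KY    # gradient (curl-free) part in Fourier space

    harmonic = (vx.mean(), vy.mean())  # on a flat torus the harmonic fields are constants
    Dx, Dy = Vx - Cx, Vy - Cy
    Dx[0, 0] = Dy[0, 0] = 0.0          # the mean belongs to the harmonic part

    curl_free = (np.fft.ifft2(Cx).real, np.fft.ifft2(Cy).real)
    div_free = (np.fft.ifft2(Dx).real, np.fft.ifft2(Dy).real)
    return curl_free, div_free, harmonic
```

For example, adding the gradient of sin(x)cos(y), the rotated gradient of sin(x)sin(2y) and a constant vector, then decomposing the sum, recovers each of the three pieces separately.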
Affiliation(s)
- Zhe Su
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yiying Tong
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
14
Wang S, Prieux M, de Bernard S, Dubois M, Laubreton D, Djebali S, Zala M, Arpin C, Genestier L, Leverrier Y, Gandrillon O, Crauste F, Jiang W, Marvel J. Exogenous IL-2 delays memory precursors generation and is essential for enhancing memory cells effector functions. iScience 2024; 27:109411. [PMID: 38510150 PMCID: PMC10952031 DOI: 10.1016/j.isci.2024.109411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 11/27/2023] [Accepted: 02/29/2024] [Indexed: 03/22/2024]
Abstract
To investigate the impact of paracrine IL-2 signals on memory precursor (MP) cell differentiation, we activated CD8 T cells in vitro in the presence or absence of exogenous IL-2 (ex-IL-2). We assessed memory differentiation by transferring these cells into virus-infected mice. Both conditions generated CD8 T cells that participated in the ongoing response and gave rise to similar memory cells. Nevertheless, when transferred into a naive host, T cells activated with ex-IL-2 generated a higher frequency of memory cells displaying increased functional memory traits. Single-cell RNA-seq analysis indicated that without ex-IL-2, cells rapidly acquired an MP signature, whereas in its presence they adopted an effector signature. This was confirmed at the protein level and in a functional assay. Overall, ex-IL-2 delays the transition into MP cells, allowing the acquisition of effector functions that become imprinted in their progeny. These findings may help to optimize the generation of therapeutic T cells.
Affiliation(s)
- Shaoying Wang
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Margaux Prieux
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Laboratoire de Biologie et de Modélisation de la Cellule, Université de Lyon, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Lyon, France
- Maxence Dubois
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Daphne Laubreton
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Sophia Djebali
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Manon Zala
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Faculté de Médecine Lyon-Sud, Université de Lyon, Oullins, France
- Christophe Arpin
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Laboratoire de Biologie et de Modélisation de la Cellule, Université de Lyon, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Lyon, France
- Laurent Genestier
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Faculté de Médecine Lyon-Sud, Université de Lyon, Oullins, France
- Yann Leverrier
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Olivier Gandrillon
- Inria, Villeurbanne, France
- Laboratoire de Biologie et de Modélisation de la Cellule, Université de Lyon, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Lyon, France
- Fabien Crauste
- Laboratoire MAP5 (UMR CNRS 8145), Université Paris Cité, Paris, France
- Wenzheng Jiang
- Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Jacqueline Marvel
- Centre International de Recherche en Infectiologie, INSERM, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
15
Kuijpers L, Hornung B, van den Hout-van Vroonhoven MCGN, van IJcken WFJ, Grosveld F, Mulugeta E. Split Pool Ligation-based Single-cell Transcriptome sequencing (SPLiT-seq) data processing pipeline comparison. BMC Genomics 2024; 25:361. [PMID: 38609853 PMCID: PMC11010347 DOI: 10.1186/s12864-024-10285-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 04/03/2024] [Indexed: 04/14/2024]
Abstract
BACKGROUND Single-cell sequencing techniques are revolutionizing every field of biology by providing the ability to measure the abundance of biological molecules at single-cell resolution. Although single-cell sequencing approaches have been developed for several molecular modalities, single-cell transcriptome sequencing is the most prevalent and widely applied technique. SPLiT-seq (split-pool ligation-based transcriptome sequencing) is one such single-cell transcriptome technique; it applies a unique combinatorial-barcoding approach by splitting and pooling cells into multi-well plates containing barcodes. This approach required the development of dedicated computational tools to preprocess the data and extract the count matrices. Here we compare eight bioinformatic pipelines (alevin-fry splitp, LR-splitpipe, SCSit, splitpipe, splitpipeline, SPLiTseq-demultiplex, STARsolo and zUMI) that have been developed to process SPLiT-seq data. We provide an overview of the tools, their computational performance, their functionality and their impact on downstream processing of the single-cell data, all of which vary greatly depending on the tool used. RESULTS We show that STARsolo, splitpipe and alevin-fry splitp can all handle large amounts of data within a reasonable time. In contrast, the other five pipelines are slow when handling large datasets. With smaller datasets, cell barcode results are similar, with the exception of SPLiTseq-demultiplex and splitpipeline. LR-splitpipe, which was originally designed for processing long-read sequencing data, is the slowest of all pipelines. Alevin-fry produced different downstream results that are difficult to interpret. STARsolo functions nearly identically to splitpipe, and the two produce highly similar results. However, STARsolo lacks a function to collapse random hexamer reads, so some additional coding is required.
CONCLUSION Our comprehensive comparative analysis aids users in selecting the most suitable analysis tool for efficient SPLiT-seq data processing, while also detailing the specific prerequisites for each of these pipelines. From the available pipelines, we recommend splitpipe or STARsolo for SPLiT-seq data analysis.
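The combinatorial-barcoding idea behind SPLiT-seq can be made concrete with a little arithmetic. Assuming, purely for illustration, three split-pool rounds of 96 wells each, every cell receives one of 96³ barcode combinations. If cells land in combinations uniformly at random, the chance that a given cell shares its combination with at least one other cell follows the birthday-problem formula. The round count, well count and cell numbers are hypothetical and are not taken from the benchmark.

```python
import math


def n_barcode_combinations(wells_per_round: list[int]) -> int:
    """Distinct barcode combinations when each cell visits one well per split-pool round."""
    return math.prod(wells_per_round)


def collision_fraction(n_cells: int, n_combinations: int) -> float:
    """Expected fraction of cells whose combination is shared with at least one other cell,
    assuming each cell lands in a uniformly random combination."""
    return 1.0 - (1.0 - 1.0 / n_combinations) ** (n_cells - 1)


rounds = [96, 96, 96]  # hypothetical: three rounds of 96 wells each
combos = n_barcode_combinations(rounds)
for cells in (10_000, 100_000):
    print(f"{cells} cells, {combos} combinations: "
          f"{collision_fraction(cells, combos):.2%} in collisions")
```

With 884,736 combinations, roughly 1% of 10,000 cells but about 11% of 100,000 cells would share a barcode, which illustrates why the number of barcoded cells is kept well below the number of combinations.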
Affiliation(s)
- Lucas Kuijpers
- Department of Cell Biology, Erasmus University Medical Center Rotterdam (Erasmus MC), Wytemaweg 80, Rotterdam, 3015CN, The Netherlands
- Bastian Hornung
- Center for Biomics, Erasmus University Medical Center Rotterdam (Erasmus MC), Rotterdam, The Netherlands
- Wilfred F J van IJcken
- Center for Biomics, Erasmus University Medical Center Rotterdam (Erasmus MC), Rotterdam, The Netherlands
- Frank Grosveld
- Department of Cell Biology, Erasmus University Medical Center Rotterdam (Erasmus MC), Wytemaweg 80, Rotterdam, 3015CN, The Netherlands
- Eskeatnaf Mulugeta
- Department of Cell Biology, Erasmus University Medical Center Rotterdam (Erasmus MC), Wytemaweg 80, Rotterdam, 3015CN, The Netherlands
16
Booeshaghi AS, Chen X, Pachter L. A machine-readable specification for genomics assays. Bioinformatics 2024; 40:btae168. [PMID: 38579259 PMCID: PMC11009023 DOI: 10.1093/bioinformatics/btae168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 10/04/2023] [Accepted: 04/04/2024] [Indexed: 04/07/2024]
Abstract
MOTIVATION Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries. RESULTS We present seqspec, a machine-readable specification for libraries produced by genomics assays that facilitates standardization of preprocessing and enables tracking and comparison of genomics assays. AVAILABILITY AND IMPLEMENTATION The specification and the associated seqspec command line tool are available at https://www.doi.org/10.5281/zenodo.10213865.
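To illustrate what a machine-readable read-structure description buys, the sketch below encodes a read layout as an ordered list of named, fixed-length regions and slices a read accordingly, so the same code serves any assay whose layout is declared as data. The layout shown (a 16 bp cell barcode followed by a 12 bp UMI, as in 10x Genomics Chromium 3′ v3 Read 1), the `Region` class and the `split_read` helper are illustrative only; this is not the seqspec file format or its command line interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One named, fixed-length element of a read (hypothetical, not seqspec's schema)."""
    name: str
    length: int


# Illustrative layout: 10x Chromium 3' v3 Read 1 (16 bp cell barcode + 12 bp UMI)
READ1_LAYOUT = [Region("cell_barcode", 16), Region("umi", 12)]


def split_read(seq: str, layout: list[Region]) -> dict[str, str]:
    """Cut a read into named regions according to a fixed-length layout."""
    needed = sum(r.length for r in layout)
    if len(seq) < needed:
        raise ValueError(f"read of length {len(seq)} is shorter than the layout ({needed})")
    out, start = {}, 0
    for region in layout:
        out[region.name] = seq[start:start + region.length]
        start += region.length
    return out


read = "AAACCCAAGAAACACT" + "GATTACAGATTA"
print(split_read(read, READ1_LAYOUT))
```

Changing assays then means editing the layout list rather than the parsing code, which is the kind of reuse a shared specification is meant to enable.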
Affiliation(s)
- Ali Sina Booeshaghi
- Department of Bioengineering, University of California, Berkeley, CA, 94720, United States
- Xi Chen
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, 518055, China
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, United States
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, United States
17
Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. BEERS2: RNA-Seq simulation through high fidelity in silico modeling. Brief Bioinform 2024; 25:bbae164. [PMID: 38605641 PMCID: PMC11009461 DOI: 10.1093/bib/bbae164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 01/26/2024] [Accepted: 03/26/2024] [Indexed: 04/13/2024]
Abstract
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need, we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) either from customizable input or from CAMPAREE-simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM or BAM format, with the SAM or BAM output containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
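GC-content bias during PCR amplification is one of the effects BEERS2 simulates. The toy sketch below shows why such a bias matters: if per-cycle amplification efficiency falls off as a fragment's GC fraction departs from an optimum, small efficiency differences compound multiplicatively over cycles. The Gaussian efficiency curve, its parameters and the cycle count are hypothetical and are not BEERS2's model.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def pcr_efficiency(gc: float, max_eff: float = 0.95, optimum: float = 0.5,
                   width: float = 0.15) -> float:
    """Hypothetical per-cycle efficiency: a Gaussian in GC fraction peaking at `optimum`."""
    return max_eff * math.exp(-((gc - optimum) / width) ** 2)


def expected_copies(seq: str, cycles: int = 12) -> float:
    """Expected copies of one template after `cycles` cycles, each multiplying by (1 + efficiency)."""
    return (1.0 + pcr_efficiency(gc_fraction(seq))) ** cycles


balanced = "ACGT" * 25     # 50% GC
gc_rich = "GCGCGCAT" * 12  # 75% GC
ratio = expected_copies(balanced) / expected_copies(gc_rich)
print(f"over-representation of the balanced fragment: {ratio:.0f}x")
```

Even in this crude model, two fragments present at equal starting abundance end up differing by orders of magnitude after 12 cycles, which is the kind of distortion a realistic simulator needs to reproduce.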
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Dimitra Sarantopoulou
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Soumyashant Nayak
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: Statistics and Mathematics Unit, Indian Statistical Institute, Bengaluru, Karnataka, India
- Amruta Naik
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Shaon Sengupta
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Peter S Choi
- Division of Cancer Pathobiology, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology & Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
18
Harkany T, Tretiakov E, Varela L, Jarc J, Rebernik P, Newbold S, Keimpema E, Verkhratsky A, Horvath T, Romanov R. Molecularly stratified hypothalamic astrocytes are cellular foci for obesity. RESEARCH SQUARE 2024:rs.3.rs-3748581. [PMID: 38405925 PMCID: PMC10889077 DOI: 10.21203/rs.3.rs-3748581/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Astrocytes safeguard the homeostasis of the central nervous system [1,2]. Despite their prominent morphological plasticity under conditions that challenge the brain's adaptive capacity [3-5], attempts to classify astrocytes and to relate their molecular make-up to spatially devolved neuronal operations that specify behavior or metabolism have remained mostly futile [6,7]. Although it seems unexpected in the era of single-cell biology, the lack of a major advance in stratifying astrocytes under physiological conditions rests on the incompatibility of 'neurocentric' classification algorithms, which rely on stable developmental endpoints and lifelong transcriptional, neurotransmitter, and neuropeptide signatures [6-8], with the dynamic functional states, anatomic allocation, and allostatic plasticity of astrocytes [1]. Simplistically, therefore, astrocytes are still grouped as 'resting' vs. 'reactive', the latter referring to pathological states marked by various inducible genes [3,9,10]. Here, we introduced a machine learning-based feature recognition algorithm that uses the cumulative power of published single-cell RNA-seq data on astrocytes as a reference map to eliminate pleiotropic and inducible cellular features stepwise. For the healthy hypothalamus, this walk-back approach revealed gene regulatory networks (GRNs) that specified subsets of astrocytes and could be used as landmarking tools for their anatomical assignment. The core molecular censuses retained by astrocyte subsets were sufficient to stratify them by allostatic competence, chiefly their signaling and metabolic interplay with neurons. In particular, we found differentially expressed mitochondrial genes in insulin-sensing astrocytes and demonstrated their reciprocal signaling with neurons that work antagonistically within the food intake circuitry.
As a proof-of-concept, we showed that disrupting Mfn2 expression in astrocytes reduced their ability to support dynamic circuit reorganization, a time-locked feature of satiety in the hypothalamus, thus leading to obesity in mice. Overall, our results suggest that astrocytes in the healthy brain are fundamentally more heterogeneous than previously thought and topologically mirror the specificity of local neurocircuits.
Affiliation(s)
- Tibor Harkany
- Center for Brain Research, Medical University of Vienna
- Jasna Jarc
- Center for Brain Research, Medical University of Vienna
- Erik Keimpema
- Medical University of Vienna, Center for Brain Research
19
He D, Gao Y, Chan SS, Quintana-Parrilla N, Patro R. Forseti: A mechanistic and predictive model of the splicing status of scRNA-seq reads. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.01.577813. [PMID: 38370848 PMCID: PMC10871212 DOI: 10.1101/2024.02.01.577813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Motivation: Short-read single-cell RNA sequencing (scRNA-seq) has been used to study cellular heterogeneity, cell fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging; even the seemingly straightforward task of determining the splicing status of the molecules from which sequenced fragments are drawn is inherently difficult. This difficulty arises, in part, from limited read length and positional biases, which substantially reduce the specificity of the sequenced fragments. As a result, the splicing status of many scRNA-seq reads is ambiguous for lack of definitive evidence. Methods are therefore needed that can recover the splicing status of ambiguous reads, which in turn can improve the accuracy of, and confidence in, downstream analyses.
Results: We develop Forseti, a predictive model that probabilistically assigns a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model that assigns a probability that a given transcriptomic site is used in fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets from different species and tissue types. Forseti combines these two trained models to predict the splicing status of each read's molecule of origin by scoring putative fragments that associate each alignment of a sequenced read with nearby potential priming sites. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of reads and identify the true gene of origin of multi-gene-mapped reads.
Availability: Forseti and the code used to produce the results are available at https://github.com/COMBINE-lab/forseti under a BSD 3-clause license.
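The fragment-scoring idea above, where each putative fragment is weighted by how likely its priming site is to be used and how plausible its implied length is, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Forseti's implementation: the `Candidate` record, the `splicing_posterior` function, and the Gaussian stand-in for the fragment length model (mean 300 bp, SD 100 bp) are hypothetical, and the site probabilities are taken as given rather than predicted by a trained binding affinity model. Both splicing statuses are treated as equally likely a priori.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A putative fragment linking one read alignment to a nearby priming site."""
    status: str        # "spliced" or "unspliced" molecule of origin (hypothetical labels)
    site_prob: float   # P(site is used for priming); assumed given, not learned here
    frag_len: int      # implied fragment length in bases


def frag_len_density(length: int, mean: float = 300.0, sd: float = 100.0) -> float:
    """Placeholder fragment-length model: a Gaussian density (not Forseti's fitted model)."""
    z = (length - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def splicing_posterior(candidates: list[Candidate]) -> dict[str, float]:
    """Sum site_prob * fragment-length density per status, then normalize to probabilities."""
    scores = {"spliced": 0.0, "unspliced": 0.0}
    for c in candidates:
        scores[c.status] += c.site_prob * frag_len_density(c.frag_len)
    total = sum(scores.values())
    if total == 0.0:
        # No supporting fragment at all: return an uninformative 50/50 split.
        return {status: 0.5 for status in scores}
    return {status: score / total for status, score in scores.items()}


# Usage: the spliced interpretation implies a typical fragment length,
# while the unspliced one would imply an unusually long fragment.
post = splicing_posterior([
    Candidate("spliced", site_prob=0.8, frag_len=290),
    Candidate("unspliced", site_prob=0.3, frag_len=650),
])
print(post)
```

Summing over candidates lets a read with several nearby priming sites accumulate evidence for one status; resolving multi-gene-mapped reads, which the abstract also mentions, is outside the scope of this sketch.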
Affiliation(s)
- Dongze He
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
- Yuan Gao
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
- Spencer Skylar Chan
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Rob Patro
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
20
Morris JA, Sun JS, Sanjana NE. Next-generation forward genetic screens: uniting high-throughput perturbations with single-cell analysis. Trends Genet 2024; 40:118-133. [PMID: 37989654 PMCID: PMC10872607 DOI: 10.1016/j.tig.2023.10.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 10/22/2023] [Accepted: 10/23/2023] [Indexed: 11/23/2023]
Abstract
Programmable genome-engineering technologies, such as CRISPR (clustered regularly interspaced short palindromic repeats) nucleases and massively parallel CRISPR screens that capitalize on this programmability, have transformed biomedical science. These screens connect genes and noncoding genome elements to disease-relevant phenotypes, but until recently have been limited to individual phenotypes such as growth or fluorescent reporters of gene expression. By pairing massively parallel screens with high-dimensional profiling of single-cell types/states, we can now measure how individual genetic perturbations or combinations of perturbations impact the cellular transcriptome, proteome, and epigenome. We review technologies that pair CRISPR screens with single-cell multiomics and the unique opportunities afforded by extending pooled screens using deep multimodal phenotyping.
Affiliation(s)
- John A Morris
- New York Genome Center, New York, NY 10013, USA; Department of Biology, New York University, New York, NY 10003, USA
- Jennifer S Sun
- New York Genome Center, New York, NY 10013, USA; Department of Biology, New York University, New York, NY 10003, USA
- Neville E Sanjana
- New York Genome Center, New York, NY 10013, USA; Department of Biology, New York University, New York, NY 10003, USA.
21
He D, Mount SM, Patro R. scCensus: Off-target scRNA-seq reads reveal meaningful biology. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.29.577807. [PMID: 38352549 PMCID: PMC10862729 DOI: 10.1101/2024.01.29.577807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) provides unprecedented insights into cellular heterogeneity. Although scRNA-seq reads from the most prevalent and popular tagged-end protocols are expected to arise from the 3' end of polyadenylated RNAs, recent studies have shown that "off-target" reads can constitute a substantial portion of the read population. In this work, we introduced scCensus, a comprehensive analysis workflow for systematically evaluating and categorizing off-target reads in scRNA-seq. We applied scCensus to seven scRNA-seq datasets. Our analysis of intergenic reads shows that these off-target reads contain information about chromatin structure and can be used to identify similar cells across modalities. Our analysis of antisense reads suggests that these reads can be used to improve gene detection and to capture interesting transcriptional activities such as antisense transcription. Furthermore, using splice-aware quantification, we find that spliced and unspliced reads provide distinct information about cell clusters and biomarkers, suggesting the utility of integrating signals from reads with different splicing statuses. Overall, our results suggest that off-target scRNA-seq reads contain underappreciated information about various transcriptional activities. These observations about yet-unexploited information in existing scRNA-seq data will help guide and motivate the community to improve current algorithms and analysis methods, and to develop novel approaches that use off-target reads to extend the reach and accuracy of single-cell data analysis pipelines.
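To make the read categories discussed above concrete, the following Python sketch assigns a read to an exonic, intronic, antisense, or intergenic category against a toy gene annotation. It is an illustrative assumption, not the scCensus workflow: `Gene`, `classify_read`, summarizing each read by one aligned position, and the premise that the read strand has already been oriented to the captured RNA are all hypothetical simplifications. Real pipelines work from full alignments and protocol-specific strand rules.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    strand: str                      # "+" or "-"
    exons: list[tuple[int, int]]     # sorted, half-open [start, end) exon intervals

    @property
    def span(self) -> tuple[int, int]:
        """Genomic extent of the gene, from first exon start to last exon end."""
        return self.exons[0][0], self.exons[-1][1]


def classify_read(pos: int, read_strand: str, genes: list[Gene]) -> str:
    """Place one read, summarized by a single aligned position, into a coarse category."""
    overlapping = [g for g in genes if g.span[0] <= pos < g.span[1]]
    sense = [g for g in overlapping if g.strand == read_strand]
    if sense:
        in_exon = any(start <= pos < end for g in sense for start, end in g.exons)
        return "exonic" if in_exon else "intronic"
    if overlapping:
        return "antisense"   # only opposite-strand genes cover this position
    return "intergenic"      # no annotated gene covers this position


genes = [Gene("A", "+", [(100, 200), (300, 400)])]
labels = [classify_read(p, s, genes) for p, s in [(150, "+"), (250, "+"), (150, "-"), (500, "+")]]
print(labels)  # ['exonic', 'intronic', 'antisense', 'intergenic']
```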
Affiliation(s)
- Dongze He
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
- Stephen M. Mount
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
- Rob Patro
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
22
Sullivan DK, Min KH(J), Hjörleifsson KE, Luebbert L, Holley G, Moses L, Gustafsson J, Bray NL, Pimentel H, Booeshaghi AS, Melsted P, Pachter L. kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.21.568164. [PMID: 38045414 PMCID: PMC10690192 DOI: 10.1101/2023.11.21.568164] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/05/2023]
Abstract
The term "RNA-seq" refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, from single cells, or from single nuclei. The kallisto, bustools, and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples, or both. Additionally, these tools allow gene expression values to be classified as originating from nascent RNA species or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data.
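The quantification described above rests on kallisto's pseudoalignment, whose core idea is that a read is compatible with the transcripts that contain all of its k-mers. The Python sketch below illustrates that intersection under heavy simplifying assumptions: it indexes only the forward strand, uses a plain dictionary instead of kallisto's de Bruijn graph-based index, and simply ignores k-mers that are missing from the index. `build_kmer_index` and `pseudoalign` are hypothetical names, not functions of kallisto, bustools, or kb-python.

```python
from typing import Optional


def build_kmer_index(transcripts: dict[str, str], k: int) -> dict[str, frozenset[str]]:
    """Map every k-mer to the set of transcripts that contain it (forward strand only)."""
    index: dict[str, set[str]] = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return {kmer: frozenset(names) for kmer, names in index.items()}


def pseudoalign(read: str, index: dict[str, frozenset[str]], k: int) -> frozenset[str]:
    """Intersect the transcript sets of the read's k-mers; unknown k-mers are skipped."""
    compatible: Optional[frozenset[str]] = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # e.g. a k-mer containing a sequencing error
        compatible = hits if compatible is None else compatible & hits
    return compatible if compatible is not None else frozenset()


idx = build_kmer_index({"T1": "ACGTACGGA", "T2": "ACGTACGTT"}, k=4)
print(pseudoalign("GTACGG", idx, 4))  # frozenset({'T1'})
```

The resulting transcript set (an equivalence class) is what downstream tools count per cell or sample, rather than a base-level alignment.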
Affiliation(s)
- Delaney K. Sullivan
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Laura Luebbert
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Lambda Moses
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Nicolas L. Bray
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Harold Pimentel
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- A. Sina Booeshaghi
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Páll Melsted
- deCODE Genetics/Amgen Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA
23
Fan J, Khan J, Singh NP, Pibiri GE, Patro R. Fulgor: a fast and compact k-mer index for large-scale matching and color queries. Algorithms Mol Biol 2024; 19:3. [PMID: 38254124 PMCID: PMC10810250 DOI: 10.1186/s13015-024-00251-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024]
Abstract
The problem of sequence identification or matching (determining the subset of reference sequences from a given collection that are likely to contain a short, queried nucleotide sequence) is relevant for many important tasks in computational biology, such as metagenomics and pangenome analysis. Because of the complex nature of such analyses and the large scale of the reference collections, a resource-efficient solution to this problem is of utmost importance. This poses the threefold challenge of representing the reference collection with a data structure that is efficient to query, has light memory usage, and scales well to large collections. To solve this problem, we describe an efficient colored de Bruijn graph index, arising as the combination of a k-mer dictionary with a compressed inverted index. The proposed index takes full advantage of the fact that unitigs in the colored compacted de Bruijn graph are monochromatic (i.e., all k-mers in a unitig have the same set of references of origin, or color). Specifically, the unitigs are kept in the dictionary in color order, which allows the map from k-mers to their colors to be encoded in as little as 1 + o(1) bits per unitig. Hence, one color per unitig is stored in the index with almost no space/time overhead. By combining this property with simple but effective compression methods for integer lists, the index achieves very small space. We implement these methods in a tool called Fulgor and conduct an extensive experimental analysis to demonstrate the improvement of our tool over previous solutions. For example, compared to Themisto (the strongest competitor in terms of the index space vs. query time trade-off), Fulgor requires significantly less space (up to 43% less for a collection of 150,000 Salmonella enterica genomes), is at least twice as fast for color queries, and is 2-6× faster to construct.
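The color-order property can be illustrated with a small Python model. Once unitigs are sorted so that those sharing a color set are adjacent, one plausible way to reach a 1 + o(1) bits-per-unitig map is a bitvector with one bit per unitig marking where each new color run begins, queried with rank. In this sketch a list of prefix sums stands in for the succinct rank structure a real index would use (so it is not actually 1 + o(1) bits), the explicit old-to-new ID map exists only because the toy input arrives unsorted, and the class name `ColorOrderedUnitigs` is hypothetical; none of this is Fulgor's code.

```python
from itertools import accumulate


class ColorOrderedUnitigs:
    """Toy map from unitig IDs to color sets, exploiting a color-sorted unitig order."""

    def __init__(self, unitig_colors: list[frozenset[int]]):
        # Assign IDs to the distinct color sets (sorted only to make IDs deterministic).
        distinct = sorted(set(unitig_colors), key=sorted)
        color_id = {colors: cid for cid, colors in enumerate(distinct)}
        # Reorder unitigs so that unitigs sharing a color set become contiguous.
        order = sorted(range(len(unitig_colors)), key=lambda u: color_id[unitig_colors[u]])
        # In a real index a unitig's ID is its position in the dictionary; here we map explicitly.
        self.position = {old: new for new, old in enumerate(order)}
        self.color_sets = [sorted(colors) for colors in distinct]
        # One bit per unitig: 1 where a new run of identical color sets begins.
        self.bits = [
            1 if i == 0 or unitig_colors[order[i]] != unitig_colors[order[i - 1]] else 0
            for i in range(len(order))
        ]
        # Prefix sums stand in for a succinct rank structure over self.bits.
        self.rank = list(accumulate(self.bits))

    def colors_of(self, unitig: int) -> list[int]:
        """A unitig's color set is the (rank - 1)-th stored color set."""
        return self.color_sets[self.rank[self.position[unitig]] - 1]


idx = ColorOrderedUnitigs([frozenset({1, 2}), frozenset({0}), frozenset({1, 2}), frozenset({0, 3})])
print(idx.colors_of(0), idx.bits)  # [1, 2] [1, 1, 1, 0]
```

Because every distinct color set appears exactly once in `color_sets`, unitigs 0 and 2 share one stored list; only the bitvector grows with the number of unitigs.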
Affiliation(s)
- Jason Fan
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Jamshed Khan
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Noor Pratap Singh
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA.
24
Cui H, Maan H, Vladoiu MC, Zhang J, Taylor MD, Wang B. DeepVelo: deep learning extends RNA velocity to multi-lineage systems with cell-specific kinetics. Genome Biol 2024; 25:27. [PMID: 38243313 PMCID: PMC10799431 DOI: 10.1186/s13059-023-03148-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 12/18/2023] [Indexed: 01/21/2024]
Abstract
Existing RNA velocity estimation methods strongly rely on predefined dynamics and cell-agnostic constant transcriptional kinetic rates, assumptions often violated in complex and heterogeneous single-cell RNA sequencing (scRNA-seq) data. Using a graph convolution network, DeepVelo overcomes these limitations by generalizing RNA velocity to cell populations containing time-dependent kinetics and multiple lineages. DeepVelo infers time-varying cellular rates of transcription, splicing, and degradation, recovers each cell's stage in the differentiation process, and detects functionally relevant driver genes regulating these processes. Application to various developmental and pathogenic processes demonstrates DeepVelo's capacity to study complex differentiation and lineage decision events in heterogeneous scRNA-seq data.
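RNA velocity methods build on a kinetic model in which unspliced abundance u and spliced abundance s evolve as du/dt = α - βu and ds/dt = βu - γs, with β the splicing rate and γ the degradation rate; the velocity is ds/dt. The NumPy sketch below contrasts the cell-agnostic assumption (one β and γ per gene, shared by all cells) with cell-specific rates of the kind DeepVelo infers. The rates are hand-picked toy numbers rather than the output of a graph convolution network, and the `velocity` helper is a hypothetical name.

```python
import numpy as np


def velocity(u: np.ndarray, s: np.ndarray, beta: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """ds/dt = beta * u - gamma * s for (cells x genes) unspliced u and spliced s.

    beta and gamma may be per-gene vectors (shape: genes), shared by every cell, or
    per-cell, per-gene matrices (shape: cells x genes); NumPy broadcasting handles both.
    """
    return beta * u - gamma * s


u = np.array([[1.0, 2.0], [3.0, 4.0]])   # unspliced abundances, 2 cells x 2 genes
s = np.array([[2.0, 2.0], [1.0, 1.0]])   # spliced abundances
beta_gene, gamma_gene = np.array([1.0, 0.5]), np.array([0.5, 1.0])

# Cell-agnostic kinetics: the same rates for every cell.
v_shared = velocity(u, s, beta_gene, gamma_gene)

# Cell-specific kinetics: cell 1 degrades gene 0 twice as fast as cell 0.
gamma_cell = np.array([[0.5, 1.0], [1.0, 1.0]])
v_cell = velocity(u, s, np.tile(beta_gene, (2, 1)), gamma_cell)
print(v_shared, v_cell, sep="\n")
```

Only the entry for cell 1, gene 0 changes (from 2.5 to 2.0), which is the kind of per-cell difference a constant-rate model cannot express.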
Affiliation(s)
- Haotian Cui
- Peter Munk Cardiac Center, University Health Network, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Hassaan Maan
- Peter Munk Cardiac Center, University Health Network, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Maria C Vladoiu
- Department of Pathology and Molecular Medicine, McMaster University, Hamilton, Ontario, Canada
- Jiao Zhang
- The Arthur and Sonia Labatt Brain Tumor Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Michael D Taylor
- The Arthur and Sonia Labatt Brain Tumor Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Baylor College of Medicine, Houston, TX, USA
- Texas Children's Hospital, Houston, TX, USA
- Bo Wang
- Peter Munk Cardiac Center, University Health Network, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Vector Institute, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada.
25
George N, Fexova S, Fuentes AM, Madrigal P, Bi Y, Iqbal H, Kumbham U, Nolte N, Zhao L, Thanki A, Yu I, Marugan Calles J, Erdos K, Vilmovsky L, Kurri S, Vathrakokoili-Pournara A, Osumi-Sutherland D, Prakash A, Wang S, Tello-Ruiz M, Kumari S, Ware D, Goutte-Gattat D, Hu Y, Brown N, Perrimon N, Vizcaíno JA, Burdett T, Teichmann S, Brazma A, Papatheodorou I. Expression Atlas update: insights from sequencing data at both bulk and single cell level. Nucleic Acids Res 2024; 52:D107-D114. [PMID: 37992296 PMCID: PMC10767917 DOI: 10.1093/nar/gkad1021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/13/2023] [Accepted: 10/30/2023] [Indexed: 11/24/2023]
Abstract
Expression Atlas (www.ebi.ac.uk/gxa) and its newest counterpart, the Single Cell Expression Atlas (www.ebi.ac.uk/gxa/sc), are EMBL-EBI's knowledgebases for gene and protein expression and localisation in bulk and at the single-cell level. These resources aim to allow users to investigate gene and protein expression in normal tissue (baseline) or in response to perturbations such as disease or changes to genotype (differential) across multiple species. Users can search for genes or metadata terms across species or biological conditions in a standardised, consistent interface. Alongside these data, new features in the Single Cell Expression Atlas allow users to query metadata through a new cell type wheel search. At the experiment level, data can be explored through two types of dimensionality reduction plots, t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP), overlaid with either clustering or metadata information to aid interpretation. Data are also visualised as marker gene heatmaps identifying genes that help confer cluster identity. For some data, additional visualisations are available as interactive cell-level anatomograms and cell type gene expression heatmaps.
Affiliation(s)
- Nancy George
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Silvie Fexova
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Alfonso Munoz Fuentes
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Pedro Madrigal
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Yalan Bi
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Haider Iqbal
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Upendra Kumbham
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Nadja Francesca Nolte
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Lingyun Zhao
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Anil S Thanki
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Iris D Yu
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Jose C Marugan Calles
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Karoly Erdos
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Liora Vilmovsky
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Sandeep R Kurri
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- David Osumi-Sutherland
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Marcela K Tello-Ruiz
- Cold Spring Harbour Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Sunita Kumari
- Cold Spring Harbour Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Doreen Ware
- Cold Spring Harbour Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
- USDA ARS NEA, Plant Soil & Nutrition Laboratory Research Unit, Ithaca, NY 14853, USA
- Damien Goutte-Gattat
- FlyBase-Cambridge, Department of Physiology, Development and Neuroscience, University of Cambridge Downing Street, Cambridge CB2 3DY, UK
- Yanhui Hu
- Perrimon Lab, Department of Genetics, Harvard Medical School, Boston MA 02115, USA
- Nick Brown
- FlyBase-Cambridge, Department of Physiology, Development and Neuroscience, University of Cambridge Downing Street, Cambridge CB2 3DY, UK
- Norbert Perrimon
- Perrimon Lab, Department of Genetics, Harvard Medical School, Boston MA 02115, USA
- FlyBase-Harvard Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Sarah Teichmann
- Wellcome Trust Sanger Institute. Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
- Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
26
Gayoso A, Weiler P, Lotfollahi M, Klein D, Hong J, Streets A, Theis FJ, Yosef N. Deep generative modeling of transcriptional dynamics for RNA velocity analysis in single cells. Nat Methods 2024; 21:50-59. [PMID: 37735568 PMCID: PMC10776389 DOI: 10.1038/s41592-023-01994-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/12/2022] [Accepted: 08/08/2023] [Indexed: 09/23/2023]
Abstract
RNA velocity has been rapidly adopted to guide interpretation of transcriptional dynamics in snapshot single-cell data; however, current approaches for estimating RNA velocity lack effective strategies for quantifying uncertainty and determining the overall applicability to the system of interest. Here, we present veloVI (velocity variational inference), a deep generative modeling framework for estimating RNA velocity. veloVI learns a gene-specific dynamical model of RNA metabolism and provides a transcriptome-wide quantification of velocity uncertainty. We show that veloVI compares favorably to previous approaches with respect to goodness of fit, consistency across transcriptionally similar cells and stability across preprocessing pipelines for quantifying RNA abundance. Further, we demonstrate that veloVI's posterior velocity uncertainty can be used to assess whether velocity analysis is appropriate for a given dataset. Finally, we highlight veloVI as a flexible framework for modeling transcriptional dynamics by adapting the underlying dynamical model to use time-dependent transcription rates.
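veloVI's uncertainty estimates come from posterior samples of velocity. One simple way to turn such samples into a per-cell directional score, offered here as an assumption rather than veloVI's exact statistic, is to measure how much each sampled velocity vector's cosine similarity to the cell's mean velocity varies across samples. The sketch assumes the samples are already available as a NumPy array of shape (samples, cells, genes); `directional_variance` is a hypothetical name, and the quantities veloVI actually reports may be computed differently.

```python
import numpy as np


def directional_variance(samples: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-cell variance of cosine similarity between sampled velocities and their mean.

    samples: array of shape (n_samples, n_cells, n_genes) holding posterior velocity draws.
    """
    mean = samples.mean(axis=0)                                   # (n_cells, n_genes)
    dots = np.einsum("scg,cg->sc", samples, mean)                 # (n_samples, n_cells)
    norms = np.linalg.norm(samples, axis=2) * np.linalg.norm(mean, axis=1)[None, :]
    cosine = dots / np.maximum(norms, eps)                        # guard against zero vectors
    return cosine.var(axis=0)                                     # 0 means a perfectly stable direction


samples = np.array([
    [[1.0, 1.0], [1.0, 0.0]],   # draw 1: (cell 0, cell 1)
    [[1.0, 1.0], [0.0, 1.0]],   # draw 2
    [[1.0, 1.0], [1.0, 1.0]],   # draw 3
])
uncertainty = directional_variance(samples)
print(uncertainty)
```

Cell 0 receives the same direction in every draw and scores 0, while cell 1's draws disagree and score higher; aggregating such per-cell scores is one way to judge whether velocity analysis suits a dataset.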
Affiliation(s)
- Adam Gayoso
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Philipp Weiler
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
- Dominik Klein
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
- Justin Hong
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Computer Science, Columbia University, New York, NY, USA
- Aaron Streets
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Department of Mathematics, Technical University of Munich, Munich, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
- Nir Yosef
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA.
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA.
27
Li H, Rahman MA, Ruesch M, Eisele CD, Anderson EM, Wright PW, Cao J, Ratnayake S, Chen Q, Yan C, Meerzaman D, Abraham RS, Freud AG, Anderson SK. Abundant binary promoter switches in lineage-determining transcription factors indicate a digital component of cell fate determination. Cell Rep 2023; 42:113454. [PMID: 37976160 PMCID: PMC10842785 DOI: 10.1016/j.celrep.2023.113454] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/28/2022] [Revised: 10/02/2023] [Accepted: 11/01/2023] [Indexed: 11/19/2023]
Abstract
Previous studies of the murine Ly49 and human KIR gene clusters implicated competing sense and antisense promoters in the control of variegated gene expression. In the current study, an examination of transcription factor genes defines an abundance of convergent and divergent sense/antisense promoter pairs, suggesting that competing promoters may control cell fate determination. Differentiation of CD34+ hematopoietic progenitors in vitro shows that cells with GATA1 antisense transcription have enhanced GATA2 transcription and a mast cell phenotype, whereas cells with GATA2 antisense transcription have increased GATA1 transcripts and an erythroblast phenotype. Detailed analyses of the AHR and RORC genes demonstrate the ability of competing promoters to act as binary switches and the association of antisense transcription with an immature/progenitor cell phenotype. These data indicate that alternative cell fates generated by promoter competition in lineage-determining transcription factors contribute to the programming of cell differentiation.
Affiliation(s)
- Hongchuan Li
- Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Md Ahasanur Rahman
- Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA
- Michael Ruesch
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA; Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA; Medical Scientist Training Program, The Ohio State University, Columbus, OH 43210, USA
- Caprice D Eisele
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA; Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Erik M Anderson
- Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA
- Paul W Wright
- Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Jennie Cao
- Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA
- Shashikala Ratnayake
- Cancer Genomics and Bioinformatics Branch, Center for Biomedical Informatics & Information Technology, National Cancer Institute, Bethesda, MD 20892, USA
- Qingrong Chen
- Cancer Genomics and Bioinformatics Branch, Center for Biomedical Informatics & Information Technology, National Cancer Institute, Bethesda, MD 20892, USA
- Chunhua Yan
- Cancer Genomics and Bioinformatics Branch, Center for Biomedical Informatics & Information Technology, National Cancer Institute, Bethesda, MD 20892, USA
- Daoud Meerzaman
- Cancer Genomics and Bioinformatics Branch, Center for Biomedical Informatics & Information Technology, National Cancer Institute, Bethesda, MD 20892, USA
- Roshini S Abraham
- Department of Pathology and Laboratory Medicine, Nationwide Children's Hospital, Columbus, OH 43210, USA; Department of Pathology, The Ohio State University, Columbus, OH 43210, USA
- Aharon G Freud
- Department of Pathology, The Ohio State University, Columbus, OH 43210, USA
- Stephen K Anderson
- Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA.
28
He D, Patro R. simpleaf: a simple, flexible, and scalable framework for single-cell data processing using alevin-fry. Bioinformatics 2023; 39:btad614. [PMID: 37802884 PMCID: PMC10580267 DOI: 10.1093/bioinformatics/btad614] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/03/2023] [Revised: 09/02/2023] [Accepted: 10/05/2023] [Indexed: 10/08/2023]
Abstract
Summary: The alevin-fry ecosystem provides a robust and growing suite of programs for single-cell data processing. However, as new single-cell technologies are introduced, as the community continues to adjust best practices for data processing, and as the alevin-fry ecosystem itself expands, it is becoming increasingly important to manage the complexity of alevin-fry's single-cell preprocessing workflows while retaining the performance and flexibility that make these tools enticing. We introduce simpleaf, a program that simplifies the processing of single-cell data using tools from the alevin-fry ecosystem and adds new functionality, while retaining the flexibility and performance of the underlying tools.
Availability and implementation: Simpleaf is written in Rust and released under a BSD 3-Clause license. It is freely available from its GitHub repository https://github.com/COMBINE-lab/simpleaf and via bioconda. Documentation for simpleaf is available at https://simpleaf.readthedocs.io/en/latest/, and tutorials for simpleaf can be accessed at https://combine-lab.github.io/alevin-fry-tutorials.
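simpleaf drives alevin-fry, which can report spliced (S), unspliced (U), and ambiguous (A) counts as separate layers in its USA mode. The sketch below shows how such layers might be collapsed into one count matrix for downstream analysis. The convention used (S + A for single-cell data, S + U + A for single-nucleus data) is an assumption drawn from common alevin-fry practice rather than from the abstract, and `combine_usa` is a hypothetical helper, not part of simpleaf.

```python
import numpy as np


def combine_usa(spliced: np.ndarray, unspliced: np.ndarray,
                ambiguous: np.ndarray, assay: str) -> np.ndarray:
    """Collapse (cells x genes) USA count layers into one matrix (assumed convention)."""
    if assay == "single-cell":
        # Whole cells: mostly mature mRNA, so keep spliced plus ambiguous counts.
        return spliced + ambiguous
    if assay == "single-nucleus":
        # Nuclei: much of the signal is nascent (unspliced) RNA, so keep all three layers.
        return spliced + unspliced + ambiguous
    raise ValueError(f"unknown assay type: {assay!r}")


S = np.array([[3, 0], [1, 2]])
U = np.array([[1, 1], [0, 4]])
A = np.array([[0, 2], [1, 0]])
print(combine_usa(S, U, A, "single-cell"))
print(combine_usa(S, U, A, "single-nucleus"))
```

Keeping the layers separate until this step leaves the choice open, so the same quantification can feed both a conventional count matrix and splicing-aware analyses such as RNA velocity.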
Affiliation(s)
- Dongze He
- Department of Cell Biology and Molecular Genetics and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, United States
- Rob Patro
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, United States
29
Pool AH, Poldsam H, Chen S, Thomson M, Oka Y. Recovery of missing single-cell RNA-sequencing data with optimized transcriptomic references. Nat Methods 2023; 20:1506-1515. [PMID: 37697162 DOI: 10.1038/s41592-023-02003-w] [Received: 04/20/2022] [Accepted: 08/15/2023] [Indexed: 09/13/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) is an indispensable tool for characterizing cellular diversity and generating hypotheses throughout biology. Droplet-based scRNA-seq datasets often lack expression data for genes that can be detected with other methods. Here we show that the observed sensitivity deficits stem from three sources: (1) poor annotation of 3' gene ends; (2) issues with intronic read incorporation; and (3) gene overlap-derived read loss. We show that missing gene expression data can be recovered by optimizing the reference transcriptome for scRNA-seq through recovering false intergenic reads, implementing a hybrid pre-mRNA mapping strategy and resolving gene overlaps. We demonstrate, with a diverse collection of mouse and human tissue data, that reference optimization can substantially improve cellular profiling resolution and reveal missing cell types and marker genes. Our findings argue that transcriptomic references need to be optimized for scRNA-seq analysis and warrant a reanalysis of previously published datasets and cell atlases.
Affiliation(s)
- Allan-Hermann Pool
- Department of Neuroscience, University of Texas Southwestern Medical Center, Dallas, TX, USA.
- Peter O'Donnell Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA.
- Department of Anesthesiology and Pain Management, University of Texas Southwestern Medical Center, Dallas, TX, USA.
- Helen Poldsam
- Department of Neuroscience, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Chemistry and Biotechnology, Tallinn University of Technology, Tallinn, Estonia
- Sisi Chen
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Matt Thomson
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Yuki Oka
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
30
Booeshaghi AS, Sullivan DK, Pachter L. Universal preprocessing of single-cell genomics data. bioRxiv 2023:2023.09.14.543267. [PMID: 37745572 PMCID: PMC10515959 DOI: 10.1101/2023.09.14.543267] [Indexed: 09/26/2023]
Abstract
We describe a workflow for preprocessing a wide variety of single-cell genomics data types. The approach is based on parsing of machine-readable seqspec assay specifications to customize inputs for kb-python, which uses kallisto and bustools to catalog reads, error correct barcodes, and count reads. The universal preprocessing method is implemented in the Python package cellatlas that is available for download at: https://github.com/cellatlas/cellatlas/.
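The abstract describes turning an assay description into kb-python inputs. As a rough illustration only (not cellatlas or seqspec code), the sketch below converts a simplified read layout into a technology string. It uses the `file,start,stop` triples for barcode, UMI and cDNA that `kallisto bus -x` accepts, where a stop of 0 means "to the end of the read". The `Region` class and the layout format are hypothetical stand-ins and do not represent the seqspec schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Region:
    """One element of a read in a simplified, hypothetical layout (not the seqspec schema)."""
    kind: str    # "barcode", "umi" or "cdna"
    length: int  # fixed length in bases; 0 means "to the end of the read" (must be last)


def technology_string(reads: list[list[Region]]) -> str:
    """Build a kallisto-style 'barcode:umi:cdna' string of file,start,stop triples.

    reads[i] lists the regions of read file i in 5'->3' order.
    A stop of 0 follows the kallisto convention of "until the end of the read".
    """
    coords: dict[str, str] = {}
    for file_idx, regions in enumerate(reads):
        offset = 0
        for region in regions:
            stop = 0 if region.length == 0 else offset + region.length
            coords[region.kind] = f"{file_idx},{offset},{stop}"
            offset = stop
    return ":".join(coords[kind] for kind in ("barcode", "umi", "cdna"))


# 10x Chromium v3-like layout: read 1 = 16 bp barcode + 12 bp UMI, read 2 = cDNA.
layout = [[Region("barcode", 16), Region("umi", 12)], [Region("cdna", 0)]]
print(technology_string(layout))  # 0,0,16:0,16,28:1,0,0
```

Keeping the layout as data rather than hard-coding one string per assay illustrates the idea behind a machine-readable specification: only the description changes from one assay to the next.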
Affiliation(s)
- A. Sina Booeshaghi
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Delaney K. Sullivan
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Computing & Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
31
Booeshaghi AS, Chen X, Pachter L. A machine-readable specification for genomics assays. bioRxiv 2023:2023.03.17.533215. [PMID: 36993635 PMCID: PMC10055303 DOI: 10.1101/2023.03.17.533215] [Indexed: 03/31/2023]
Abstract
Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries. We present seqspec, a machine-readable specification for libraries produced by genomics assays that facilitates standardization of preprocessing and enables tracking and comparison of genomics assays. The specification and associated seqspec command line tool is available at https://github.com/IGVF/seqspec.
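The sketch below makes the idea of a machine-readable library structure concrete. It encodes a read as an ordered list of named regions with fixed lengths, then uses that list to slice a read into its elements. The region names and the list-of-tuples format are hypothetical simplifications. They are not the seqspec file format or the behaviour of the seqspec command-line tool.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical, simplified read specification: ordered (region_name, length) pairs.
# A length of None marks a variable-length region that runs to the end of the read.
READ1_SPEC: list[tuple[str, Optional[int]]] = [("barcode", 16), ("umi", 12), ("polyT", None)]


def split_read(sequence: str, spec: list[tuple[str, Optional[int]]]) -> dict[str, str]:
    """Slice a read into named regions, following the order given in spec."""
    parts: dict[str, str] = {}
    pos = 0
    for name, length in spec:
        end = len(sequence) if length is None else pos + length
        if end > len(sequence):
            raise ValueError(f"read too short for region {name!r}")
        parts[name] = sequence[pos:end]
        pos = end
    return parts


read1 = "ACGT" * 4 + "AAAACCCCGGGG" + "TTTTTTTT"
print(split_read(read1, READ1_SPEC))
# {'barcode': 'ACGTACGTACGTACGT', 'umi': 'AAAACCCCGGGG', 'polyT': 'TTTTTTTT'}
```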
Affiliation(s)
- A Sina Booeshaghi
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Xi Chen
- School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
32
Liao Y, Raghu D, Pal B, Mielke LA, Shi W. cellCounts: an R function for quantifying 10x Chromium single-cell RNA sequencing data. Bioinformatics 2023; 39:btad439. [PMID: 37462540 PMCID: PMC10365925 DOI: 10.1093/bioinformatics/btad439] [Received: 06/15/2022] [Revised: 06/29/2023] [Accepted: 07/17/2023] [Indexed: 07/26/2023]
Abstract
Summary: The 10x Genomics Chromium single-cell RNA sequencing technology is a powerful gene expression profiling platform, which is capable of profiling expression of thousands of genes in tens of thousands of cells simultaneously. This platform can produce hundreds of millions of reads in a single experiment, making it a very challenging task to quantify expression of genes in individual cells due to the massive data volume. Here, we present cellCounts, a new tool for efficient and accurate quantification of Chromium data. cellCounts employs the seed-and-vote strategy to align reads to a reference genome, collapses reads to Unique Molecular Identifiers (UMIs) and then assigns UMIs to genes based on the featureCounts program. Using both simulation and real datasets for evaluation, cellCounts was found to compare favourably to cellRanger and STARsolo. cellCounts is implemented in R, making it easily integrated with other R programs for analysing Chromium data.
Availability and implementation: cellCounts was implemented as a function in R package Rsubread that can be downloaded from http://bioconductor.org/packages/release/bioc/html/Rsubread.html. Data and analysis code used in this study can be freely accessed via La Trobe University's Institutional Repository at https://doi.org/10.26181/21588276.
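The counting step summarized above works in two stages. Reads that share a cell barcode and UMI are collapsed into one molecule, and each molecule is then assigned to a gene. The sketch below is a generic, simplified illustration of those two stages under stated assumptions: UMIs must match exactly, the gene is chosen by majority vote, and tied molecules are dropped. It is not the cellCounts or featureCounts algorithm. It omits the seed-and-vote alignment and does not correct sequencing errors in barcodes or UMIs.

```python
from collections import Counter, defaultdict


def count_umis(alignments):
    """Collapse reads to UMIs and count molecules per (cell, gene).

    alignments: iterable of (cell_barcode, umi, gene) tuples, one per read,
    where gene is None for reads that were not assigned to any gene.
    """
    # Gather the gene "votes" of all reads belonging to one (cell, UMI) molecule.
    genes_per_molecule = defaultdict(Counter)
    for cell, umi, gene in alignments:
        if gene is not None:
            genes_per_molecule[(cell, umi)][gene] += 1

    counts = Counter()
    for (cell, _umi), gene_votes in genes_per_molecule.items():
        ranked = gene_votes.most_common(2)
        # Keep the molecule only if one gene has strictly more supporting reads.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            counts[(cell, ranked[0][0])] += 1
    return counts


reads = [
    ("AAAC", "UMI1", "GeneA"), ("AAAC", "UMI1", "GeneA"), ("AAAC", "UMI1", "GeneB"),
    ("AAAC", "UMI2", "GeneA"),
    ("TTTG", "UMI1", "GeneB"),
    ("TTTG", "UMI3", "GeneA"), ("TTTG", "UMI3", "GeneB"),  # tie: molecule discarded
]
print(dict(count_umis(reads)))  # {('AAAC', 'GeneA'): 2, ('TTTG', 'GeneB'): 1}
```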
Affiliation(s)
- Yang Liao
- Olivia Newton-John Cancer Research Institute, Heidelberg, Victoria 3084, Australia
- School of Cancer Medicine, La Trobe University, Bundoora, Victoria 3086, Australia
- Dinesh Raghu
- Olivia Newton-John Cancer Research Institute, Heidelberg, Victoria 3084, Australia
- School of Cancer Medicine, La Trobe University, Bundoora, Victoria 3086, Australia
- Bhupinder Pal
- Olivia Newton-John Cancer Research Institute, Heidelberg, Victoria 3084, Australia
- School of Cancer Medicine, La Trobe University, Bundoora, Victoria 3086, Australia
- Lisa A Mielke
- Olivia Newton-John Cancer Research Institute, Heidelberg, Victoria 3084, Australia
- School of Cancer Medicine, La Trobe University, Bundoora, Victoria 3086, Australia
- Wei Shi
- Olivia Newton-John Cancer Research Institute, Heidelberg, Victoria 3084, Australia
- School of Cancer Medicine, La Trobe University, Bundoora, Victoria 3086, Australia
33
Fan J, Singh NP, Khan J, Pibiri GE, Patro R. Fulgor: A fast and compact k-mer index for large-scale matching and color queries. bioRxiv 2023:2023.05.09.539895. [PMID: 37214944 PMCID: PMC10197524 DOI: 10.1101/2023.05.09.539895] [Indexed: 05/24/2023]
Abstract
The problem of sequence identification or matching - determining the subset of references from a given collection that are likely to contain a query nucleotide sequence - is relevant for many important tasks in Computational Biology, such as metagenomics and pan-genome analysis. Due to the complex nature of such analyses and the large scale of the reference collections a resource-efficient solution to this problem is of utmost importance. The reference collection should therefore be pre-processed into an index for fast queries. This poses the threefold challenge of designing an index that is efficient to query, has light memory usage, and scales well to large collections. To solve this problem, we describe how recent advancements in associative, order-preserving, k-mer dictionaries can be combined with a compressed inverted index to implement a fast and compact colored de Bruijn graph data structure. This index takes full advantage of the fact that unitigs in the colored de Bruijn graph are monochromatic (all k-mers in a unitig have the same set of references of origin, or "color"), leveraging the order-preserving property of its dictionary. In fact, k-mers are kept in unitig order by the dictionary, thereby allowing for the encoding of the map from k-mers to their inverted lists in as little as 1+o(1) bits per unitig. Hence, one inverted list per unitig is stored in the index with almost no space/time overhead. By combining this property with simple but effective compression methods for inverted lists, the index achieves very small space. We implement these methods in a tool called Fulgor. Compared to Themisto, the prior state of the art, Fulgor indexes a heterogeneous collection of 30,691 bacterial genomes in 3.8× less space, a collection of 150,000 Salmonella enterica genomes in approximately 2× less space, is at least twice as fast for color queries, and is 2-6× faster to construct.
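The central observation in the abstract is that every k-mer of a unitig has the same color set. An index can therefore store a k-mer-to-unitig map plus a single color list per unitig, rather than one list per k-mer. The toy sketch below shows that layout together with an intersection-based color query. It makes several simplifying assumptions. It uses a plain Python dict in place of Fulgor's order-preserving k-mer dictionary and compressed inverted lists. It takes the unitigs and their colors as given rather than building the compacted de Bruijn graph. It ignores reverse complements and skips query k-mers that are absent from the index.

```python
class ToyColoredIndex:
    """Toy colored de Bruijn graph index: k-mer -> unitig id -> color list."""

    def __init__(self, k, unitigs, unitig_colors):
        # unitigs: list of unitig sequences; unitig_colors[i]: the reference ids
        # containing unitig i (one inverted list per unitig, not per k-mer).
        self.k = k
        self.unitig_colors = [sorted(colors) for colors in unitig_colors]
        self.kmer_to_unitig = {}
        for uid, seq in enumerate(unitigs):
            for i in range(len(seq) - k + 1):
                self.kmer_to_unitig[seq[i:i + k]] = uid

    def query(self, read):
        """Return the references that contain every indexed k-mer of the read."""
        result = None
        for i in range(len(read) - self.k + 1):
            uid = self.kmer_to_unitig.get(read[i:i + self.k])
            if uid is None:
                continue  # k-mer absent from the index: skipped in this toy version
            colors = set(self.unitig_colors[uid])
            result = colors if result is None else result & colors
        return sorted(result) if result is not None else []


index = ToyColoredIndex(3, ["ACGTA", "GTTCA"], [[0, 1], [1, 2]])
print(index.query("ACGT"))      # [0, 1]: both k-mers lie on unitig 0
print(index.query("CGTAGTTC"))  # [1]: spans unitigs 0 and 1, colors intersected
```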
Affiliation(s)
- Jason Fan
- Department of Computer Science, University of Maryland, College Park, MD 20440, USA
- Noor Pratap Singh
- Department of Computer Science, University of Maryland, College Park, MD 20440, USA
- Jamshed Khan
- Department of Computer Science, University of Maryland, College Park, MD 20440, USA
- Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD 20440, USA
34
Berger B, Yu YW. Navigating bottlenecks and trade-offs in genomic data analysis. Nat Rev Genet 2023; 24:235-250. [PMID: 36476810 PMCID: PMC10204111 DOI: 10.1038/s41576-022-00551-z] [Accepted: 10/27/2022] [Indexed: 12/12/2022]
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell-to-cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that often the analytical pipelines are struggling to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever-increasing importance. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
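Data sketching, which the review lists among recent methodological advances, gives up exact answers in exchange for small, fixed memory. The sketch below uses MinHash as an example. MinHash is one widely used sketch, chosen here for illustration only, and the review covers others. The code estimates the Jaccard similarity of two sets, such as k-mer sets, from fixed-size signatures. Salted BLAKE2 is an arbitrary choice of hash family, and the parameter values are illustrative.

```python
from __future__ import annotations

import hashlib


def kmers(seq: str, k: int) -> set[str]:
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(items, num_hashes: int = 128) -> list[int]:
    """For each of num_hashes salted hash functions, keep the minimum hash over items.

    items must be a non-empty collection of strings. Memory is num_hashes integers,
    independent of how many items were sketched.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(x.encode(), digest_size=8, salt=salt).digest(), "little")
            for x in items
        ))
    return signature


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of hash functions whose minima agree estimates |A & B| / |A | B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


a = {f"x{i}" for i in range(300)}
b = {f"x{i}" for i in range(100, 400)}  # true Jaccard similarity = 200 / 400 = 0.5
print(estimate_jaccard(minhash_signature(a, 256), minhash_signature(b, 256)))
```

More hash functions give a more accurate estimate at the cost of a larger signature, which is the accuracy-versus-memory trade-off the abstract mentions. The standard error shrinks roughly as one over the square root of `num_hashes`.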
Affiliation(s)
- Bonnie Berger
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Yun William Yu
- Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Tri-Campus Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
35
He D, Patro R. simpleaf: A simple, flexible, and scalable framework for single-cell transcriptomics data processing using alevin-fry. bioRxiv 2023:2023.03.28.534653. [PMID: 37034702 PMCID: PMC10081176 DOI: 10.1101/2023.03.28.534653] [Indexed: 06/19/2023]
Abstract
Summary: The alevin-fry ecosystem provides a robust and growing suite of programs for single-cell data processing. However, as new single-cell technologies are introduced, as the community continues to adjust best practices for data processing, and as the alevin-fry ecosystem itself expands and grows, it is becoming increasingly important to manage the complexity of alevin-fry's single-cell preprocessing workflows while retaining the performance and flexibility that make these tools enticing. We introduce simpleaf, a program that simplifies the processing of single-cell data using tools from the alevin-fry ecosystem, and adds new functionality and capabilities, while retaining the flexibility and performance of the underlying tools.
Availability and implementation: Simpleaf is written in Rust and released under a BSD 3-Clause license. It is freely available from its GitHub repository https://github.com/COMBINE-lab/simpleaf, and via bioconda. Documentation for simpleaf is available at https://simpleaf.readthedocs.io/en/latest/ and tutorials for simpleaf, which are under development, can be accessed at https://combine-lab.github.io/alevin-fry-tutorials.
Affiliation(s)
- Dongze He
- Department of Cell Biology and Molecular Genetics and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
- Rob Patro
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
36
Depletion of HIV reservoir by activation of ISR signaling in resting CD4+ T cells. iScience 2023; 26:105743. [PMID: 36590168 PMCID: PMC9800255 DOI: 10.1016/j.isci.2022.105743] [Received: 09/07/2022] [Revised: 10/21/2022] [Accepted: 12/02/2022] [Indexed: 12/13/2022]
Abstract
HIV reservoirs are extremely stable and pose a tremendous challenge to clearing HIV infection. Here, we demonstrate that activation of ISR/ATF4 signaling reverses HIV latency, which also selectively eliminates HIV+ cells in a primary CD4+ T cell model of latency without effect on HIV-negative CD4+ T cells. The reduction of HIV+ cells is associated with apoptosis enhancement, but surprisingly is largely seen in HIV-infected cells in which gag-pol RNA transcripts are detected in HIV RNA-induced ATF4/IFIT signaling. In resting CD4+ (rCD4+) T cells isolated from people living with HIV on antiretroviral therapy, induction of ISR/ATF4 signaling reduced HIV reservoirs by depletion of replication-competent HIV without global reduction in the rCD4+ T cell population. These findings suggest that compromised ISR/ATF4 signaling maintains stable and quiescent HIV reservoirs whereas activation of ISR/ATF4 signaling results in the disruption of latent HIV and clearance of persistently infected CD4+ T cells.
37
He D, Soneson C, Patro R. Understanding and evaluating ambiguity in single-cell and single-nucleus RNA-sequencing. bioRxiv 2023:2023.01.04.522742. [PMID: 36711921 PMCID: PMC9881993 DOI: 10.1101/2023.01.04.522742] [Indexed: 01/06/2023]
Abstract
Recently, a new modification has been proposed by Hjörleifsson and Sullivan et al. to the model used to classify the splicing status of reads (as spliced (mature), unspliced (nascent), or ambiguous) in single-cell and single-nucleus RNA-seq data. Here, we evaluate both the theoretical basis and practical implementation of the proposed method. The proposed method is highly conservative and therefore unlikely to mischaracterize reads as spliced (mature) or unspliced (nascent) when they are not. However, we find that it leaves a large fraction of reads classified as ambiguous, and, in practice, allocates these ambiguous reads in an all-or-nothing manner, and differently between single-cell and single-nucleus RNA-seq data. Further, as implemented in practice, the ambiguous classification is implicit and based on the index against which the reads are mapped, which leads to several drawbacks compared to methods that consider both spliced (mature) and unspliced (nascent) mapping targets simultaneously (for example, the ability to use confidently assigned reads to rescue ambiguous reads based on shared UMIs and gene targets). Nonetheless, we show that these conservative assignment rules can be obtained directly in existing approaches simply by altering the set of targets that are indexed. To this end, we introduce the spliceu reference and show that its use with alevin-fry recapitulates the more conservative proposed classification. We also observe that, on experimental data, and under the proposed allocation rules for ambiguous UMIs, the difference between the proposed classification scheme and existing conventions appears much smaller than previously reported. We demonstrate the use of the new piscem index for mapping simultaneously against spliced (mature) and unspliced (nascent) targets, allowing classification against the full nascent and mature transcriptome in human or mouse in <3 GB of memory.
Finally, we discuss the potential of incorporating probabilistic evidence into the inference of splicing status, and suggest that it may provide benefits beyond what can be obtained from discrete classification of UMIs as splicing-ambiguous.
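The three-way classification in the abstract, together with the rescue of ambiguous reads through shared UMIs, can be pictured as follows. A read is labeled by the kinds of targets it is compatible with: spliced (S), unspliced (U), or both. The sketch below combines the reads of one (cell, UMI, gene) molecule by intersecting their compatible sets, so that a confidently assigned read can resolve an ambiguous one. The label names, the intersection rule, and the "conflicting" outcome are simplifying assumptions for illustration. This is not the exact alevin-fry, spliceu, or piscem logic.

```python
# Map the set of compatible target kinds for a molecule to its splicing status.
LABELS = {
    frozenset("S"): "spliced",     # only spliced (mature) targets
    frozenset("U"): "unspliced",   # only unspliced (nascent) targets
    frozenset("SU"): "ambiguous",  # compatible with both
}


def classify_umis(reads):
    """Assign a splicing status to each (cell, umi, gene) molecule.

    reads: iterable of (cell, umi, gene, compatible), where compatible is a set
    drawn from {'S', 'U'}: the target kinds the read maps to for that gene.
    """
    evidence = {}
    for cell, umi, gene, compatible in reads:
        key = (cell, umi, gene)
        # Intersect with earlier reads of the same molecule (start from "both").
        evidence[key] = evidence.get(key, frozenset("SU")) & frozenset(compatible)
    # Molecules whose reads disagree (empty intersection) are reported as conflicting.
    return {key: LABELS.get(kinds, "conflicting") for key, kinds in evidence.items()}


status = classify_umis([
    ("c1", "u1", "g1", {"S", "U"}), ("c1", "u1", "g1", {"S"}),  # ambiguous read rescued
    ("c1", "u2", "g1", {"S", "U"}),                             # stays ambiguous
    ("c1", "u3", "g1", {"U"}),
])
print(status[("c1", "u1", "g1")], status[("c1", "u2", "g1")], status[("c1", "u3", "g1")])
# spliced ambiguous unspliced
```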
Affiliation(s)
- Dongze He
- Department of Cell Biology and Molecular Genetics and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
- Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Rob Patro
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
38
Ideno H, Imaizumi K, Shimada H, Sanosaka T, Nemoto A, Kohyama J, Okano H. Human PSCs determine the competency of cerebral organoid differentiation via FGF signaling and epigenetic mechanisms. iScience 2022; 25:105140. [PMID: 36185382 PMCID: PMC9523398 DOI: 10.1016/j.isci.2022.105140] [Received: 02/22/2022] [Revised: 08/06/2022] [Accepted: 09/10/2022] [Indexed: 11/17/2022]
Affiliation(s)
- Hirosato Ideno
- Department of Physiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Kent Imaizumi
- Department of Physiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Corresponding author
- Hiroko Shimada
- Department of Physiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Tsukasa Sanosaka
- Department of Physiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Akisa Nemoto
- Department of Physiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Jun Kohyama
- Department of Physiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Hideyuki Okano
- Department of Physiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Corresponding author