1
Ritchie C, Li L. PELI2 is a negative regulator of STING signaling that is dynamically repressed during viral infection. Mol Cell 2024; 84:2423-2435.e5. [PMID: 38917796 PMCID: PMC11246219 DOI: 10.1016/j.molcel.2024.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 03/12/2024] [Accepted: 06/01/2024] [Indexed: 06/27/2024]
Abstract
The innate immune cGAS-STING pathway is activated by cytosolic double-stranded DNA (dsDNA), a ubiquitous danger signal, to produce interferon, a potent anti-viral and anti-cancer cytokine. However, STING activation must be tightly controlled because aberrant interferon production leads to debilitating interferonopathies. Here, we discover PELI2 as a crucial negative regulator of STING. Mechanistically, PELI2 inhibits the transcription factor IRF3 by binding to phosphorylated Thr354 and Thr356 on the C-terminal tail of STING, leading to ubiquitination and inhibition of the kinase TBK1. PELI2 sets a threshold for STING activation that tolerates low levels of cytosolic dsDNA, such as that caused by silenced TREX1, RNASEH2B, BRCA1, or SETX. When this threshold is reached, such as during viral infection, STING-induced interferon production temporarily downregulates PELI2, creating a positive feedback loop allowing a robust immune response. Lupus patients have insufficient PELI2 levels and high basal interferon production, suggesting that PELI2 dysregulation may drive the onset of lupus and other interferonopathies.
Affiliation(s)
- Christopher Ritchie
- Department of Biochemistry, Stanford University, Stanford, CA 94305, USA; Sarafan ChEM-H Institute, Stanford University, Stanford, CA 94305, USA; Arc Institute, Palo Alto, CA 94304, USA.
- Lingyin Li
- Department of Biochemistry, Stanford University, Stanford, CA 94305, USA; Sarafan ChEM-H Institute, Stanford University, Stanford, CA 94305, USA; Arc Institute, Palo Alto, CA 94304, USA.
2
Shi M, Cheng X, Dai Y. STPDA: Leveraging spatial-temporal patterns for downstream analysis in spatial transcriptomic data. Comput Biol Chem 2024; 112:108127. [PMID: 38870559 DOI: 10.1016/j.compbiolchem.2024.108127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/17/2024] [Accepted: 06/10/2024] [Indexed: 06/15/2024]
Abstract
Spatial transcriptomics, a groundbreaking field in cellular biology, faces the challenge of effectively deciphering complex spatial-temporal gene expression patterns. Traditional data analysis methods often fail to capture the intricate nuances of this data, limiting the depth of understanding in spatial distribution and gene interactions. In response, we present Spatial-Temporal Patterns for Downstream Analysis (STPDA), a sophisticated computational framework tailored for spatial transcriptomic data analysis. STPDA leverages high-resolution mapping to bridge the gap between genomics and histopathology, offering a comprehensive perspective on the spatial dynamics of gene expression within tissues. This approach enables a view of cellular function and organization, marking a paradigm shift in our comprehension of biological systems. By employing Autoregressive Moving Average (ARMA) and Long Short-Term Memory (LSTM) models, STPDA effectively deciphers both global and local spatio-temporal dynamics in cellular environments. This integration of spatial-temporal patterns for downstream analysis offers a transformative approach to spatial transcriptomics data analysis. STPDA excels in various single-cell analytical tasks, including the identification of ligand-receptor interactions and cell type classification. Its ability to harness spatial-temporal patterns not only matches but frequently surpasses the performance of existing state-of-the-art methods. To ensure widespread usability and impact, we have encapsulated STPDA in a scalable and accessible Python package, addressing single-cell tasks through advanced spatial-temporal pattern analysis. This development promises to enhance our understanding of cellular biology, offering novel insights and therapeutic strategies, and represents a substantial advancement in the field of spatial transcriptomics.
Affiliation(s)
- Mingguang Shi
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China.
- Xudong Cheng
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
- Yulong Dai
- Anhui Medical University, Hefei, Anhui 230032, China
3
Theunissen L, Mortier T, Saeys Y, Waegeman W. Uncertainty-aware single-cell annotation with a hierarchical reject option. Bioinformatics 2024; 40:btae128. [PMID: 38441258 PMCID: PMC10957513 DOI: 10.1093/bioinformatics/btae128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 02/23/2024] [Accepted: 03/01/2024] [Indexed: 03/23/2024] Open
Abstract
MOTIVATION Automatic cell type annotation methods assign cell type labels to new datasets by extracting relationships from a reference RNA-seq dataset. However, due to the limited resolution of gene expression features, there is always uncertainty present in the label assignment. To enhance the reliability and robustness of annotation, most machine learning methods address this uncertainty by providing a full reject option, i.e. when the predicted confidence score of a cell type label falls below a user-defined threshold, no label is assigned and no prediction is made. As a better alternative, some methods deploy hierarchical models and consider a so-called partial rejection by returning internal nodes of the hierarchy as label assignment. However, because a detailed experimental analysis of various rejection approaches is missing in the literature, there is currently no consensus on best practices. RESULTS We evaluate three annotation approaches (i) full rejection, (ii) partial rejection, and (iii) no rejection for both flat and hierarchical probabilistic classifiers. Our findings indicate that hierarchical classifiers are superior when rejection is applied, with partial rejection being the preferred rejection approach, as it preserves a significant amount of label information. For optimal rejection implementation, the rejection threshold should be determined through careful examination of a method's rejection behavior. Without rejection, flat and hierarchical annotation perform equally well, as long as the cell type hierarchy accurately captures transcriptomic relationships. AVAILABILITY AND IMPLEMENTATION Code is freely available at https://github.com/Latheuni/Hierarchical_reject and https://doi.org/10.5281/zenodo.10697468.
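The difference between full and partial rejection described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is in the repository linked above); the toy hierarchy, the cell-type labels, the probabilities, and the function names `full_reject` and `partial_reject` are all assumptions made for illustration.

```python
# Hypothetical toy hierarchy: each node maps to its parent (None for the root).
PARENT = {
    "root": None,
    "T cell": "root",
    "B cell": "root",
    "CD4 T": "T cell",
    "CD8 T": "T cell",
}


def node_probability(node, leaf_probs):
    """Probability of a node = sum of the probabilities of the leaves at or below it."""
    total = 0.0
    for leaf, p in leaf_probs.items():
        current = leaf
        while current is not None:
            if current == node:
                total += p
                break
            current = PARENT[current]
    return total


def full_reject(leaf_probs, threshold):
    """Return the most probable leaf, or None when its confidence is below the threshold."""
    best = max(leaf_probs, key=leaf_probs.get)
    return best if leaf_probs[best] >= threshold else None


def partial_reject(leaf_probs, threshold):
    """Walk up from the most probable leaf until a node reaches the threshold."""
    node = max(leaf_probs, key=leaf_probs.get)
    while node != "root" and node_probability(node, leaf_probs) < threshold:
        node = PARENT[node]
    return node


# Illustrative (made-up) leaf probabilities for one cell.
probs = {"CD4 T": 0.45, "CD8 T": 0.40, "B cell": 0.15}
print(full_reject(probs, 0.7))
print(partial_reject(probs, 0.7))
```

With these made-up numbers, full rejection discards the cell entirely because no leaf reaches 0.7, whereas partial rejection still reports the internal node "T cell", which is the sense in which label information is preserved. Walking up from the top leaf is only one simple way to realise partial rejection; a hierarchical classifier may instead descend from the root.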
Affiliation(s)
- Lauren Theunissen
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Thomas Mortier
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Yvan Saeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Willem Waegeman
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
4
Croydon-Veleslavov IA, Stumpf MPH. Repeated Decision Stumping Distils Simple Rules from Single-Cell Data. J Comput Biol 2024; 31:21-40. [PMID: 38170180 DOI: 10.1089/cmb.2021.0613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024] Open
Abstract
Single-cell data afford unprecedented insights into molecular processes. But the complexity and size of these data sets have proved challenging and given rise to a large armory of statistical and machine learning approaches. The majority of approaches focuses on either describing features of these data, or making predictions and classifying unlabeled samples. In this study, we introduce repeated decision stumping (ReDX) as a method to distill simple models from single-cell data. We develop decision trees of depth one (hence "stumps") to identify, in an inductive manner, gene products involved in driving cell fate transitions, and in applications to published data we are able to discover the key players involved in these processes in an unbiased manner without prior knowledge. Our algorithm is deliberately targeting the simplest possible candidate hypotheses that can be extracted from complex high-dimensional data. There are three reasons for this: (1) the predictions become straightforwardly testable hypotheses; (2) the identified candidates form the basis for further mechanistic model development, for example, for engineering and synthetic biology interventions; and (3) this approach complements existing descriptive modeling approaches and frameworks. The approach is computationally efficient, has remarkable predictive power, including in simulation studies where the ground truth is known, and yields robust and statistically stable predictors; the same set of candidates is generated by applying the algorithm to different subsamples of experimental data.
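A decision tree of depth one can be sketched in a few lines of Python. The block below is an illustrative assumption, not the ReDX algorithm: it fits a single stump by exhaustive search over genes and thresholds on a made-up expression matrix, and it does not attempt the repetition scheme, whose details the abstract does not give. The function name `fit_stump` and the toy data are hypothetical.

```python
def fit_stump(X, y):
    """Fit a depth-one decision tree (a stump) on binary labels.

    X is a list of rows (cells) of numeric features (genes); y holds 0/1 labels.
    Returns (feature_index, threshold, label_if_above, errors) with the fewest errors.
    """
    best = None
    n_features = len(X[0])
    for j in range(n_features):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for label_if_above in (0, 1):
                # Count cells whose predicted label disagrees with the true label.
                errors = sum(
                    (label_if_above if row[j] > t else 1 - label_if_above) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[3]:
                    best = (j, t, label_if_above, errors)
    return best


# Toy expression matrix: 6 cells x 3 hypothetical genes; gene 1 separates the two fates.
X = [
    [5.0, 0.1, 2.0],
    [4.8, 0.3, 7.5],
    [5.2, 0.2, 4.1],
    [5.1, 3.9, 2.2],
    [4.9, 4.2, 7.0],
    [5.0, 3.7, 4.4],
]
y = [0, 0, 0, 1, 1, 1]

feature, threshold, label_if_above, errors = fit_stump(X, y)
print(feature, threshold, label_if_above, errors)
```

The fitted stump reads as a single testable rule of the form "gene `feature` above `threshold` implies fate `label_if_above`", which is the kind of simple candidate hypothesis the abstract argues for.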
Affiliation(s)
- Ivan A Croydon-Veleslavov
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
- Michael P H Stumpf
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
- School of BioSciences, University of Melbourne, Parkville, Australia
- School of Mathematics and Statistics, University of Melbourne, Parkville, Australia
5
Xu C, Prete M, Webb S, Jardine L, Stewart BJ, Hoo R, He P, Meyer KB, Teichmann SA. Automatic cell-type harmonization and integration across Human Cell Atlas datasets. Cell 2023; 186:5876-5891.e20. [PMID: 38134877 DOI: 10.1016/j.cell.2023.11.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 08/24/2023] [Accepted: 11/23/2023] [Indexed: 12/24/2023]
Abstract
Harmonizing cell types across the single-cell community and assembling them into a common framework is central to building a standardized Human Cell Atlas. Here, we present CellHint, a predictive clustering tree-based tool to resolve cell-type differences in annotation resolution and technical biases across datasets. CellHint accurately quantifies cell-cell transcriptomic similarities and places cell types into a relationship graph that hierarchically defines shared and unique cell subtypes. Application to multiple immune datasets recapitulates expert-curated annotations. CellHint also reveals underexplored relationships between healthy and diseased lung cell states in eight diseases. Furthermore, we present a workflow for fast cross-dataset integration guided by harmonized cell types and cell hierarchy, which uncovers underappreciated cell types in adult human hippocampus. Finally, we apply CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database with ∼3.7 million cells and various machine learning models for automatic cell annotation across human tissues.
Affiliation(s)
- Chuan Xu
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Martin Prete
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Simone Webb
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Laura Jardine
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Benjamin J Stewart
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, UK; Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge CB2 0QQ, UK
- Regina Hoo
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Peng He
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Kerstin B Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Theory of Condensed Matter Group, Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK.
6
Molstad AJ, Motwani K. Multiresolution categorical regression for interpretable cell-type annotation. Biometrics 2023; 79:3485-3496. [PMID: 37798600 DOI: 10.1111/biom.13926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 08/07/2023] [Indexed: 10/07/2023]
Abstract
In many categorical response regression applications, the response categories admit a multiresolution structure. That is, subsets of the response categories may naturally be combined into coarser response categories. In such applications, practitioners are often interested in estimating the resolution at which a predictor affects the response category probabilities. In this paper, we propose a method for fitting the multinomial logistic regression model in high dimensions that addresses this problem in a unified and data-driven way. Our method allows practitioners to identify which predictors distinguish between coarse categories but not fine categories, which predictors distinguish between fine categories, and which predictors are irrelevant. For model fitting, we propose a scalable algorithm that can be applied when the coarse categories are defined by either overlapping or nonoverlapping sets of fine categories. Statistical properties of our method reveal that it can take advantage of this multiresolution structure in a way existing estimators cannot. We use our method to model cell-type probabilities as a function of a cell's gene expression profile (i.e., cell-type annotation). Our fitted model provides novel biological insights which may be useful for future automated and manual cell-type annotation methodology.
Affiliation(s)
- Aaron J Molstad
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
- Keshav Motwani
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
7
Ren P, Shi X, Yu Z, Dong X, Ding X, Wang J, Sun L, Yan Y, Hu J, Zhang P, Chen Q, Zhang J, Li T, Wang C. Single-cell assignment using multiple-adversarial domain adaptation network with large-scale references. Cell Rep Methods 2023; 3:100577. [PMID: 37751689 PMCID: PMC10545911 DOI: 10.1016/j.crmeth.2023.100577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 06/11/2023] [Accepted: 08/09/2023] [Indexed: 09/28/2023]
Abstract
The rapid accumulation of single-cell RNA-seq data has provided rich resources to characterize various human cell populations. However, achieving accurate cell-type annotation using public references presents challenges due to inconsistent annotations, batch effects, and rare cell types. Here, we introduce SELINA (single-cell identity navigator), an integrative and automatic cell-type annotation framework based on a pre-curated reference atlas spanning various tissues. SELINA employs a multiple-adversarial domain adaptation network to remove batch effects within the reference dataset. Additionally, it enhances the annotation of less frequent cell types by synthetic minority oversampling and fits query data with the reference data using an autoencoder. SELINA culminates in the creation of a comprehensive and uniform reference atlas, encompassing 1.7 million cells covering 230 distinct human cell types. We substantiate its robustness and superiority across a multitude of human tissues. Notably, SELINA could accurately annotate cells within diverse disease contexts. SELINA provides a complete solution for human single-cell RNA-seq data annotation with both python and R packages.
Affiliation(s)
- Pengfei Ren
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China
- Xiaoying Shi
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Zhiguang Yu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Guangxi 530004, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xuanxin Ding
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Jin Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Liangdong Sun
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
- Yilv Yan
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
- Junjie Hu
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
- Peng Zhang
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
- Qianming Chen
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing 211166, China
- Jing Zhang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Science and Technology, Tongji University, Shanghai, China.
- Taiwen Li
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing 211166, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China.
8
Tan K, Song Y, Xu M, You Z. Clinical evidence for a role of E2F1-induced replication stress in modulating tumor mutational burden and immune microenvironment. DNA Repair (Amst) 2023; 129:103531. [PMID: 37453246 DOI: 10.1016/j.dnarep.2023.103531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 06/05/2023] [Accepted: 06/28/2023] [Indexed: 07/18/2023]
Abstract
DNA replication stress (RS) is frequently induced by oncogene activation and is believed to promote tumorigenesis. However, clinical evidence for the role of oncogene-induced RS in tumorigenesis remains scarce, and the mechanisms by which RS promotes cancer development remain incompletely understood. By performing a series of bioinformatic analyses on the oncogene E2F1, other RS-inducing factors, and replication fork processing factors in TCGA cancer database using previously established tools, we show that hyperactivity of E2F1 likely promotes the expression of several of these factors in virtually all types of cancer to induce RS and cytosolic self-DNA production. In addition, the expression of these factors positively correlates with that of ATR and Chk1 that govern the cellular response to RS, the tumor mutational load, and tumor infiltration of immune-suppressive CD4+Th2 cells and myeloid-derived suppressor cells (MDSCs). Consistently, high expression of these factors is associated with poor patient survival. Our study provides new insights into the role of E2F1-induced RS in tumorigenesis and suggests therapeutic approaches for E2F1-overexpressing cancers by targeting genomic instability, cytosolic self-DNA and the tumor immune microenvironment.
Affiliation(s)
- Ke Tan
- Department of Gastroenterology, Affiliated Hospital of Jiangsu University, Zhenjiang, Jiangsu 212013, China; Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Yizhe Song
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63110, USA
- Min Xu
- Department of Gastroenterology, Affiliated Hospital of Jiangsu University, Zhenjiang, Jiangsu 212013, China
- Zhongsheng You
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO 63110, USA.
9
Xiong YX, Wang MG, Chen L, Zhang XF. Cell-type annotation with accurate unseen cell-type identification using multiple references. PLoS Comput Biol 2023; 19:e1011261. [PMID: 37379341 PMCID: PMC10335708 DOI: 10.1371/journal.pcbi.1011261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 07/11/2023] [Accepted: 06/11/2023] [Indexed: 06/30/2023] Open
Abstract
The recent advances in single-cell RNA sequencing (scRNA-seq) techniques have stimulated efforts to identify and characterize the cellular composition of complex tissues. With the advent of various sequencing techniques, automated cell-type annotation using a well-annotated scRNA-seq reference becomes popular. But it relies on the diversity of cell types in the reference, which may not capture all the cell types present in the query data of interest. There are generally unseen cell types in the query data of interest because most data atlases are obtained for different purposes and techniques. Identifying previously unseen cell types is essential for improving annotation accuracy and uncovering novel biological discoveries. To address this challenge, we propose mtANN (multiple-reference-based scRNA-seq data annotation), a new method to automatically annotate query data while accurately identifying unseen cell types with the aid of multiple references. Key innovations of mtANN include the integration of deep learning and ensemble learning to improve prediction accuracy, and the introduction of a new metric that considers three complementary aspects to distinguish between unseen cell types and shared cell types. Additionally, we provide a data-driven method to adaptively select a threshold for identifying previously unseen cell types. We demonstrate the advantages of mtANN over state-of-the-art methods for unseen cell-type identification and cell-type annotation on two benchmark dataset collections, as well as its predictive power on a collection of COVID-19 datasets. The source code and tutorial are available at https://github.com/Zhangxf-ccnu/mtANN.
Affiliation(s)
- Yi-Xuan Xiong
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, China
- Meng-Guo Wang
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, China
- Luonan Chen
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
- Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong, China
- Xiao-Fei Zhang
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, China
10
Long F, Wu H, Li H, Zuo W, Ao Q. Genome-Wide Analysis of MYB Transcription Factors and Screening of MYBs Involved in the Red Color Formation in Rhododendron delavayi. Int J Mol Sci 2023; 24:ijms24054641. [PMID: 36902072 PMCID: PMC10037418 DOI: 10.3390/ijms24054641] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/12/2022] [Revised: 02/23/2023] [Accepted: 02/24/2023] [Indexed: 03/06/2023] Open
Abstract
Flower color is one of the crucial traits of ornamental plants. Rhododendron delavayi Franch. is a famous ornamental plant species distributed in the mountain areas of Southwest China. This plant has red inflorescence and young branchlets. However, the molecular basis of the color formation of R. delavayi is unclear. In this study, 184 MYB genes were identified based on the released genome of R. delavayi. These genes included 78 1R-MYB, 101 R2R3-MYB, 4 3R-MYB, and 1 4R-MYB. The MYBs were divided into 35 subgroups using phylogenetic analysis of the MYBs of Arabidopsis thaliana. The members of the same subgroup in R. delavayi had similar conserved domains and motifs, gene structures, and promoter cis-acting elements, which indicate their relatively conserved function. In addition, transcriptome based on unique molecular identifier strategy and color difference of the spotted petals, unspotted petals, spotted throat, unspotted throat, and branchlet cortex were detected. Results showed significant differences in the expression levels of R2R3-MYB genes. Weighted co-expression network analysis between transcriptome and chromatic aberration values of five types of red samples showed that the MYBs were the most important TFs involved in the color formation, of which seven were R2R3-MYB, and three were 1R-MYB. Two R2R3-MYB (DUH019226.1 and DUH019400.1) had the highest connectivity in the whole regulation network, and they were identified as hub genes for red color formation. These two MYB hub genes provide references for the study of transcriptional regulation of the red color formation of R. delavayi.
Affiliation(s)
- Fenfang Long
- College of Agriculture, Guizhou University, Guiyang 550025, China
- Hairong Wu
- College of Agriculture, Guizhou University, Guiyang 550025, China
- Huie Li
- College of Agriculture, Guizhou University, Guiyang 550025, China
- Weiwei Zuo
- College of Agriculture, Guizhou University, Guiyang 550025, China
- Qian Ao
- College of Agriculture, Guizhou University, Guiyang 550025, China
11
Cuevas-Diaz Duran R, González-Orozco JC, Velasco I, Wu JQ. Single-cell and single-nuclei RNA sequencing as powerful tools to decipher cellular heterogeneity and dysregulation in neurodegenerative diseases. Front Cell Dev Biol 2022; 10:884748. [PMID: 36353512 PMCID: PMC9637968 DOI: 10.3389/fcell.2022.884748] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 02/27/2022] [Accepted: 10/06/2022] [Indexed: 08/10/2023] Open
Abstract
Neurodegenerative diseases affect millions of people worldwide and there are currently no cures. Two types of common neurodegenerative diseases are Alzheimer's (AD) and Parkinson's disease (PD). Single-cell and single-nuclei RNA sequencing (scRNA-seq and snRNA-seq) have become powerful tools to elucidate the inherent complexity and dynamics of the central nervous system at cellular resolution. This technology has allowed the identification of cell types and states, providing new insights into cellular susceptibilities and molecular mechanisms underlying neurodegenerative conditions. Exciting research using high throughput scRNA-seq and snRNA-seq technologies to study AD and PD is emerging. Herein we review the recent progress in understanding these neurodegenerative diseases using these state-of-the-art technologies. We discuss the fundamental principles and implications of single-cell sequencing of the human brain. Moreover, we review some examples of the computational and analytical tools required to interpret the extensive amount of data generated from these assays. We conclude by highlighting challenges and limitations in the application of these technologies in the study of AD and PD.
Affiliation(s)
- Iván Velasco
- Instituto de Fisiología Celular—Neurociencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Laboratorio de Reprogramación Celular, Instituto Nacional de Neurología y Neurocirugía “Manuel Velasco Suárez”, Mexico City, Mexico
- Jia Qian Wu
- The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, United States
- MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, United States
12
Kleino I, Frolovaitė P, Suomi T, Elo LL. Computational solutions for spatial transcriptomics. Comput Struct Biotechnol J 2022; 20:4870-4884. [PMID: 36147664 PMCID: PMC9464853 DOI: 10.1016/j.csbj.2022.08.043] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 04/22/2022] [Revised: 08/18/2022] [Accepted: 08/18/2022] [Indexed: 11/18/2022] Open
Abstract
Transcriptome-level expression data connected to the spatial organization of cells and molecules would allow a comprehensive understanding of how gene expression is connected to structure and function in biological systems. Spatial transcriptomics (ST) platforms may soon provide such information. However, the current platforms still lack spatial resolution, capture only a fraction of the transcriptome heterogeneity, or lack the throughput for large-scale studies. The strengths and weaknesses of current ST platforms and computational solutions need to be taken into account when planning spatial transcriptomics studies. The basis of computational ST analysis is the solutions developed for single-cell RNA-sequencing (scRNA-seq) data, with advancements taking into account the spatial connectedness of the transcriptomes. scRNA-seq tools are modified for spatial transcriptomics, or new solutions such as deep learning-based joint analysis of expression, spatial, and image data are developed, to extract biological information from spatially resolved transcriptomes. Computational ST analysis can reveal remarkable biological insights into spatial patterns of gene expression, cell signaling, and cell type variations in connection with cell type-specific signaling and organization in complex tissues. This review covers topics that help in choosing the platform and computational solutions for spatial transcriptomics research. We focus on the currently available ST methods and platforms and their strengths and limitations. Of the computational solutions, we provide an overview of the analysis steps and tools used in ST data analysis. The compatibility with the data types and the tools provided by the current ST analysis frameworks are summarized.
Key Words
- AOI, area of illumination
- BICCN, Brain Initiative Cell Census Network
- BOLORAMIS, barcoded oligonucleotides ligated on RNA amplified for multiplexed and parallel in situ analyses
- Baysor, Bayesian Segmentation of Spatial Transcriptomics Data
- BinSpect, Binary Spatial Extraction
- CCC, cell–cell communication
- CCI, cell–cell interactions
- CNV, copy-number variation
- Computational biology
- DSP, digital spatial profiling
- DbiT-Seq, Deterministic Barcoding in Tissue for spatial omics sequencing
- FA, factor analysis
- FFPE, formalin-fixed, paraffin-embedded
- FISH, fluorescence in situ hybridization
- FISSEQ, fluorescence in situ sequencing of RNA
- FOV, Field of view
- GRNs, gene regulation networks
- GSEA, gene set enrichment analysis
- GSVA, gene set variation analysis
- HDST, high definition spatial transcriptomics
- HMRF, hidden Markov random field
- ICG, interaction changed genes
- ISH, in situ hybridization
- ISS, in situ sequencing
- JSTA, Joint cell segmentation and cell type annotation
- KNN, k-nearest neighbor
- LCM, Laser Capture Microdissection
- LCM-seq, laser capture microdissection coupled with RNA sequencing
- LOH, loss of heterozygosity analysis
- MC, Molecular Cartography
- MERFISH, multiplexed error-robust FISH
- NMF (NNMF), Non-negative matrix factorization
- PCA, Principal Component Analysis
- PIXEL-seq, Polony (or DNA cluster)-indexed library-sequencing
- PL-lig, padlock ligation
- QC, quality control
- RNAseq, RNA sequencing
- ROI, region of interest
- SCENIC, Single-Cell rEgulatory Network Inference and Clustering
- SME, Spatial Morphological gene Expression normalization
- SPATA, SPAtial Transcriptomic Analysis
- ST Pipeline, Spatial Transcriptomics Pipeline
- ST, Spatial transcriptomics
- STARmap, spatially-resolved transcript amplicon readout mapping
- Single-cell analysis
- Spatial data analysis frameworks
- Spatial deconvolution
- Spatial transcriptomics
- TIVA, Transcriptome in Vivo Analysis
- TMA, tissue microarray
- TME, tumor micro environment
- UMAP, Uniform Manifold Approximation and Projection for Dimension Reduction
- UMI, unique molecular identifier
- ZipSeq, zipcoded sequencing
- scRNA-seq, single-cell RNA sequencing
- scvi-tools, single-cell variational inference tools
- seqFISH, sequential fluorescence in situ hybridization
- sequ-smFISH, sequential single-molecule fluorescent in situ hybridization
- smFISH, single molecule FISH
- t-SNE, t-distributed stochastic neighbor embedding
Affiliation(s)
- Iivari Kleino
- Turku Bioscience Centre, University of Turku and Åbo Akademi University Turku, Turku, Finland
- Paulina Frolovaitė
- Turku Bioscience Centre, University of Turku and Åbo Akademi University Turku, Turku, Finland
- Tomi Suomi
- Turku Bioscience Centre, University of Turku and Åbo Akademi University Turku, Turku, Finland
- Laura L. Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University Turku, Turku, Finland
- Institute of Biomedicine, University of Turku, Turku, Finland
13
Zhang D, Zhang T, Zhang Y, Li Z, Li H, Zhang Y, Liu C, Han Z, Li J, Zhu J. Screening the components of Saussurea involucrata for novel targets for the treatment of NSCLC using network pharmacology. BMC Complement Med Ther 2022; 22:53. [PMID: 35227278 PMCID: PMC8886885 DOI: 10.1186/s12906-021-03501-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/18/2021] [Accepted: 12/30/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Saussurea involucrata (SAIN), also known as Snow lotus (SI), is mainly distributed in high-altitude areas such as Tibet and Xinjiang in China. To identify novel targets for the prevention or treatment of lung adenocarcinoma and lung squamous cell carcinoma (LUAD&LUSC), and to facilitate better alternative new drug discovery as well as clinical application services, the therapeutic effects of SAIN on LUAD&LUSC were evaluated by gene differential analysis of clinical samples, compound target molecular docking, and GROMACS molecular dynamics simulation.
RESULTS: Through data screening, alignment, analysis, and validation it was confirmed that three of the major active ingredients in SAIN, namely quercetin (Q), luteolin (L), and kaempferol (K), mainly act on six protein targets, which mainly regulate signaling pathways in cancer, transcriptional misregulation in cancer, EGFR tyrosine kinase inhibitor resistance, adherens junction, IL-17 signaling pathway, melanoma, and non-small cell lung cancer. In addition, microRNAs in cancer exert preventive or therapeutic effects on LUAD&LUSC. Molecular dynamics (MD) simulations of Q, L, or K in complex with EGFR, MET, MMP1, or MMP3 revealed the presence of Q in a very stable tertiary structure in the human body.
CONCLUSION: Three active compounds in SAIN (Q, L, and K) play a role in the treatment and prevention of non-small cell lung cancer (NSCLC) by directly or indirectly regulating the expression of genes such as MMP1, MMP3, and EGFR.
Affiliation(s)
- Dongdong Zhang
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
- Tieying Zhang
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
- Yao Zhang
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
- Zhongqing Li
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
- He Li
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
- Yueyang Zhang
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
- Chenggong Liu
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
- Zichao Han
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
- Jin Li
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
- Jianbo Zhu
- School of Life Sciences, Shihezi University, Xiangyang street, Shihezi, 832003, PR China
14
Li J, Yang S, Yang X, Wu H, Tang H, Yang L. PlantGF: an analysis and annotation platform for plant gene families. Database (Oxford) 2022; 2022:6520816. [PMID: 35134149 PMCID: PMC9278324 DOI: 10.1093/database/baab088] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/17/2021] [Revised: 12/26/2021] [Accepted: 01/01/2022] [Indexed: 12/22/2022]
Abstract
Gene families contain genes that come from the same ancestor and have similar sequences and structures. They perform certain specific functions within and among different species. Currently, there is no complete process or platform for the rapid analysis of plant gene families. In this study, a comprehensive query and analysis platform of plant gene families, the Plant Gene Family Platform (PlantGF), was constructed. The platform is composed of four main parts: Search, Tools, Statistics and Auxiliary. A total of 2 909 580 gene family members were identified from 138 plant species in PlantGF. The data can be queried in the Search section through a user-friendly interface. A general process for gene family analysis, having nine steps, is provided. The platform also includes four online tools (HMM-Search, BLAST, MAFFT and HMMER) in the Tools section for useful additional analyses. The statistical analysis of the relevant gene families is shown on the Statistics page. Auxiliary pages are provided for data downloading. The datasets for all 138 plant species' protein sequences and their gene families can be acquired on the Download page. A user's manual and some useful links are displayed on the Manual and Links pages, respectively. To the best of our knowledge, PlantGF is the first comprehensive platform for studying plant gene families, and it will make important contributions to plant gene family-related research. Database URL: http://biodb.sdau.edu.cn/PGF/index.html.
Affiliation(s)
- Xiaojie Yang
- Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Daizong Road No.61, Taian 271018, China
- Hui Wu
- Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Daizong Road No.61, Taian 271018, China
- Heng Tang
- *Corresponding author: Tel: (+86) 0538-8241575; Email Correspondence may also be addressed to Heng Tang. Tel: (+86) 0538-8241575; Email
- Long Yang
- *Corresponding author: Tel: (+86) 0538-8241575; Email Correspondence may also be addressed to Heng Tang. Tel: (+86) 0538-8241575; Email
15
Li J, Sheng Q, Shyr Y, Liu Q. scMRMA: single cell multiresolution marker-based annotation. Nucleic Acids Res 2022; 50:e7. [PMID: 34648021 PMCID: PMC8789072 DOI: 10.1093/nar/gkab931] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/16/2021] [Revised: 09/10/2021] [Accepted: 09/28/2021] [Indexed: 01/22/2023] Open
Abstract
Single-cell RNA sequencing has become a powerful tool for identifying and characterizing cellular heterogeneity. One essential step to understanding cellular heterogeneity is determining cell identities. The widely used strategy predicts identities by projecting cells or cell clusters unidirectionally against a reference to find the best match. Here, we develop a bidirectional method, scMRMA, where a hierarchical reference guides iterative clustering and deep annotation with enhanced resolutions. Taking full advantage of the reference, scMRMA greatly improves the annotation accuracy. scMRMA achieved better performance than existing methods in four benchmark datasets and successfully revealed the expansion of CD8 T cell populations in squamous cell carcinoma after anti-PD-1 treatment.
Affiliation(s)
- Jia Li
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Quanhu Sheng
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Yu Shyr
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Qi Liu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
16
Osumi-Sutherland D, Xu C, Keays M, Levine AP, Kharchenko PV, Regev A, Lein E, Teichmann SA. Cell type ontologies of the Human Cell Atlas. Nat Cell Biol 2021; 23:1129-1135. [PMID: 34750578 DOI: 10.1038/s41556-021-00787-7] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 05/18/2021] [Accepted: 09/28/2021] [Indexed: 12/24/2022]
Abstract
Massive single-cell profiling efforts have accelerated our discovery of the cellular composition of the human body while at the same time raising the need to formalize this new knowledge. Here, we discuss current efforts to harmonize and integrate different sources of annotations of cell types and states into a reference cell ontology. We illustrate with examples how a unified ontology can consolidate and advance our understanding of cell types across scientific communities and biological domains.
Affiliation(s)
- Chuan Xu
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Maria Keays
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Adam P Levine
- Research Department of Pathology, University College London, London, UK
- Peter V Kharchenko
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Aviv Regev
- Genentech, South San Francisco, CA, USA
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ed Lein
- Allen Institute for Brain Science, Seattle, WA, USA
- Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Cavendish Laboratory, University of Cambridge, Cambridge, UK
17
Wartmann H, Heins S, Kloiber K, Bonn S. Bias-invariant RNA-sequencing metadata annotation. Gigascience 2021; 10:giab064. [PMID: 34553213 PMCID: PMC8559615 DOI: 10.1093/gigascience/giab064] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/16/2020] [Revised: 06/11/2021] [Accepted: 09/01/2021] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs.
FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples.
CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation, making them more searchable.
Affiliation(s)
- Hannes Wartmann
- Institute of Medical Systems Biology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sven Heins
- Institute of Medical Systems Biology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Karin Kloiber
- Institute of Medical Systems Biology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Stefan Bonn
- Institute of Medical Systems Biology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
18
Wang S, Pisco AO, McGeever A, Brbic M, Zitnik M, Darmanis S, Leskovec J, Karkanias J, Altman RB. Leveraging the Cell Ontology to classify unseen cell types. Nat Commun 2021; 12:5556. [PMID: 34548483 PMCID: PMC8455606 DOI: 10.1038/s41467-021-25725-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/07/2021] [Accepted: 08/17/2021] [Indexed: 11/09/2022] Open
Abstract
Single cell technologies are rapidly generating large amounts of data that enable us to understand biological systems at single-cell resolution. However, joint analysis of datasets generated by independent labs remains challenging due to a lack of consistent terminology to describe cell types. Here, we present OnClass, an algorithm and accompanying software for automatically classifying cells into cell types that are part of the controlled vocabulary that forms the Cell Ontology. A key advantage of OnClass is its capability to classify cells into cell types not present in the training data because it uses the Cell Ontology graph to infer cell type relationships. Furthermore, OnClass can be used to identify marker genes for all the Cell Ontology categories, regardless of whether the cell types are present or absent in the training data. This suggests that OnClass goes beyond a simple annotation tool for single cell datasets: it is the first algorithm capable of identifying marker genes specific to all terms of the Cell Ontology, and it offers the possibility of refining the Cell Ontology using a data-centric approach.
Affiliation(s)
- Sheng Wang
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Maria Brbic
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Jure Leskovec
- Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Jim Karkanias
- Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
- Russ B Altman
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
19
Abstract
Cell type annotation is important in the analysis of single-cell RNA-seq data. CellO is a machine-learning-based tool for annotating cells using the Cell Ontology, a rich hierarchy of known cell types. We provide a protocol for using the CellO Python package to annotate human cells. We demonstrate how to use CellO in conjunction with Scanpy, a Python library for performing single-cell analysis, annotate a lung tissue data set, interpret its hierarchically structured cell type annotations, and create publication-ready figures. For complete details on the use and execution of this protocol, please refer to Bernstein et al. (2021).
- CellO is a Python package for annotating cell types in single-cell RNA-seq data
- CellO classifies cells against the hierarchically structured Cell Ontology
- CellO can be integrated into single-cell analysis pipelines implemented with Scanpy
- We present a tutorial that classifies cells in an existing lung tumor data set
Affiliation(s)
- Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53792, USA
- Department of Computer Sciences, University of Wisconsin - Madison, Madison, WI 53706, USA
20
Schaffer LV, Ideker T. Mapping the multiscale structure of biological systems. Cell Syst 2021; 12:622-635. [PMID: 34139169 PMCID: PMC8245186 DOI: 10.1016/j.cels.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/05/2021] [Revised: 05/04/2021] [Accepted: 05/14/2021] [Indexed: 01/14/2023]
Abstract
Biological systems are by nature multiscale, consisting of subsystems that factor into progressively smaller units in a deeply hierarchical structure. At any level of the hierarchy, an ever-increasing diversity of technologies can be applied to characterize the corresponding biological units and their relations, resulting in large networks of physical or functional proximities (e.g., proximities of amino acids within a protein, of proteins within a complex, or of cell types within a tissue). Here, we review general concepts and progress in using network proximity measures as a basis for creation of multiscale hierarchical maps of biological systems. We discuss the functionalization of these maps to create predictive models, including those useful in translation of genotype to phenotype, along with strategies for model visualization and challenges faced by multiscale modeling in the near future. Collectively, these approaches enable a unified hierarchical approach to biological data, with application from the molecular to the macroscopic.
Affiliation(s)
- Leah V Schaffer
- Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA
- Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA
21
Roychowdhury A, Jondhale M, Saldanha E, Ghosh D, Kumar Panda C, Chandrani P, Mukherjee N. Landscape of toll-like receptors expression in tumor microenvironment of triple negative breast cancer (TNBC): Distinct roles of TLR4 and TLR8. Gene 2021; 792:145728. [PMID: 34022297 DOI: 10.1016/j.gene.2021.145728] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/21/2020] [Revised: 05/16/2021] [Accepted: 05/17/2021] [Indexed: 12/11/2022]
Abstract
TNBC is the most aggressive and hormone receptor-negative subtype of breast cancer, with molecular heterogeneity in bulk tumors hindering effective treatment. Toll-like receptors (TLRs) have the potential to ignite diverse immune responses in the tumor microenvironment (TME). This encouraged us to screen their transcript expression in the publicly available TCGA datasets. Reported molecular subtypes of TNBC may represent different TMEs, and we observed that differentially expressed TLRs (DETs), i.e., TLR3/4/6/8/9, have unique expression patterns in the TNBC subtypes, particularly in the Immunomodulatory (IM) TNBC subtype. We then dissected expression of the DETs in immune and other components of the TME. TLR4 and TLR8 showed significant (p-value ≤ 0.05) negative partial correlation with tumor purity compared to other DETs. Interestingly, TLR4 and TLR8 expression showed a significant (adjusted p-value ≤ 0.05) correlation with different subsets of immune infiltrating cells, having the highest correlation with monocytes/macrophage/dendritic cell populations mediating both innate and adaptive response in TNBC. The co-expression network identified genes correlated with these immune cells. Further, GSEA analysis of co-expressed genes showed a significant association of TLR8 partners with 'Peptide ligand binding', 'Gα-signaling', and 'Cytokine-cytokine interaction', while TLR4-associated genes correlated with the 'Adaptive immune system' and 'Systemic lupus erythematosus' interactome. Finally, the expression of TLR4 protein was validated in a panel of TNBC cell lines. TLR4 expression in chemoresponsive TNBC was also validated in TNBC cell lines upon Paclitaxel (PTX) treatment. Collectively, the present study identified specific DETs in TNBC and discovered a prospective role of TLR4 and TLR8 in the maintenance of tumor-immune-microenvironment.
Affiliation(s)
- Anirban Roychowdhury
- Department of Oncogene Regulation, Chittaranjan National Cancer Institute, Kolkata, India
- Mayur Jondhale
- Department of Molecular and Cellular Biology, National Institute for Research on Reproductive Health, Mumbai, India
- Elveera Saldanha
- Medical Oncology Molecular Laboratory, Medical Oncology Department, Tata Memorial Hospital, Mumbai, India
- Deblina Ghosh
- Department of Life Science & Biotechnology, Jadavpur University, Kolkata, India
- Chinmay Kumar Panda
- Department of Oncogene Regulation, Chittaranjan National Cancer Institute, Kolkata, India
- Pratik Chandrani
- Medical Oncology Molecular Laboratory, Medical Oncology Department, Tata Memorial Hospital, Mumbai, India; Centre for Computational Biology, Bioinformatics and Crosstalk Laboratory, ACTREC-Tata Memorial Centre, Navi Mumbai, India; Homi Bhabha National Institute, Mumbai, India
- Nupur Mukherjee
- Department of Molecular and Cellular Biology, National Institute for Research on Reproductive Health, Mumbai, India
22
Bernstein MN, Ni Z, Collins M, Burkard ME, Kendziorski C, Stewart R. CHARTS: a web application for characterizing and comparing tumor subpopulations in publicly available single-cell RNA-seq data sets. BMC Bioinformatics 2021; 22:83. [PMID: 33622236 PMCID: PMC7903756 DOI: 10.1186/s12859-021-04021-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/05/2020] [Accepted: 02/11/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Single-cell RNA-seq (scRNA-seq) enables the profiling of genome-wide gene expression at the single-cell level and in so doing facilitates insight into and information about cellular heterogeneity within a tissue. This is especially important in cancer, where tumor and tumor microenvironment heterogeneity directly impact development, maintenance, and progression of disease. While publicly available scRNA-seq cancer data sets offer unprecedented opportunity to better understand the mechanisms underlying tumor progression, metastasis, drug resistance, and immune evasion, much of the available information has been underutilized, in part, due to the lack of tools available for aggregating and analysing these data.
RESULTS: We present CHARacterizing Tumor Subpopulations (CHARTS), a web application for exploring publicly available scRNA-seq cancer data sets in the NCBI's Gene Expression Omnibus. More specifically, CHARTS enables the exploration of individual gene expression, cell type, malignancy-status, differentially expressed genes, and gene set enrichment results in subpopulations of cells across tumors and data sets. Along with the web application, we also make available the backend computational pipeline that was used to produce the analyses that are available for exploration in the web application.
CONCLUSION: CHARTS is an easy to use, comprehensive platform for exploring single-cell subpopulations within tumors across the ever-growing collection of public scRNA-seq cancer data sets. CHARTS is freely available at charts.morgridge.org.
Affiliation(s)
- Zijian Ni
- Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
- Mark E Burkard
- Department of Medicine, Hematology/Oncology, University of Wisconsin - Madison, Madison, WI, 53705, USA
- University of Wisconsin Carbone Cancer Center, Madison, WI, 53705, USA
- Christina Kendziorski
- Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI, 53792, USA
- Ron Stewart
- Morgridge Institute for Research, Madison, WI, 53715, USA