1
Aragones SD, Ferrer E. Clustering Analysis of Time Series of Affect in Dyadic Interactions. Multivariate Behavioral Research 2024; 59:320-341. [PMID: 38407099 DOI: 10.1080/00273171.2023.2283633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
An important goal when analyzing multivariate time series is the identification of heterogeneity, both within and across individuals over time. This heterogeneity can represent different ways in which psychological processes manifest, either between people or within a person across time. In many instances, those differences can have systematic patterns that can be related to future outcomes. In close relationships, for example, the daily exchange of affect between two individuals in a couple can contain a particular structure that is different across people and can result in varying levels of relationship satisfaction. In this paper we use Louvain, a clustering method, as a tool to characterize heterogeneity in multivariate time series data. Using affect measures from dyadic interactions, we first determine that Louvain is adept at detecting homogeneous patterns that are distinct from one another. Additionally, these homogeneous points are linked, at some level, by time. Thus, we find that clustering via Louvain is useful to find time periods of stable, reoccurring patterns. However, using measures founded on information theory reveals that there is some level of information loss that is inevitable when clustering on levels of variable expression. Finally, we evaluate the predictive validity of the clustering method by examining the relation between the identified clusters of affect and measures outside the time series (i.e., relationship satisfaction and breakup taken one and two years later).
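The abstract's idea of "time periods of stable, reoccurring patterns" can be illustrated with a minimal sketch. It assumes cluster labels (from Louvain or any other method) have already been assigned to each time point; the function name `stable_periods`, the `min_length` threshold, and the example labels are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def stable_periods(labels, min_length=2):
    """Return (label, start, end) for every run of identical cluster labels
    lasting at least `min_length` consecutive time points (end is inclusive)."""
    periods = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if length >= min_length:
            periods.append((label, position, position + length - 1))
        position += length
    return periods


# Hypothetical cluster labels for ten consecutive daily observations.
daily_labels = ["A", "A", "A", "B", "A", "A", "C", "C", "C", "C"]
periods = stable_periods(daily_labels)
```

Here cluster "A" appears in two separate stable periods, which is one simple way to read "reoccurring" in the abstract; the single-day "B" observation falls below the assumed threshold and is dropped.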
2
Nazaret A, Fan JL, Lavallée VP, Cornish AE, Kiseliovas V, Masilionis I, Chun J, Bowman RL, Eisman SE, Wang J, Shi L, Levine RL, Mazutis L, Blei D, Pe'er D, Azizi E. Deep generative model deciphers derailed trajectories in acute myeloid leukemia. bioRxiv: The Preprint Server for Biology 2023:2023.11.11.566719. [PMID: 38014231 PMCID: PMC10680623 DOI: 10.1101/2023.11.11.566719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Single-cell genomics has the potential to map cell states and their dynamics in an unbiased way in response to perturbations like disease. However, elucidating the cell-state transitions from healthy to disease requires analyzing data from perturbed samples jointly with unperturbed reference samples. Existing methods for integrating and jointly visualizing single-cell datasets from distinct contexts tend to remove key biological differences or do not correctly harmonize shared mechanisms. We present Decipher, a model that combines variational autoencoders with deep exponential families to reconstruct derailed trajectories ( https://github.com/azizilab/decipher ). Decipher jointly represents normal and perturbed single-cell RNA-seq datasets, revealing shared and disrupted dynamics. It further introduces a novel approach to visualize data, without the need for methods such as UMAP or TSNE. We demonstrate Decipher on data from acute myeloid leukemia patient bone marrow specimens, showing that it successfully characterizes the divergence from normal hematopoiesis and identifies transcriptional programs that become disrupted in each patient when they acquire NPM1 driver mutations.
3
Gunawan I, Vafaee F, Meijering E, Lock JG. An introduction to representation learning for single-cell data analysis. Cell Reports Methods 2023; 3:100547. [PMID: 37671013 PMCID: PMC10475795 DOI: 10.1016/j.crmeth.2023.100547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]
Abstract
Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
Affiliation(s)
- Ihuan Gunawan
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- Fatemeh Vafaee
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Erik Meijering
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- John George Lock
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
4
Li Y, Nguyen J, Anastasiu DC, Arriaga EA. CosTaL: an accurate and scalable graph-based clustering algorithm for high-dimensional single-cell data analysis. Brief Bioinform 2023; 24:bbad157. [PMID: 37150778 PMCID: PMC10199777 DOI: 10.1093/bib/bbad157] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/10/2022] [Revised: 03/28/2023] [Accepted: 04/02/2023] [Indexed: 05/09/2023]
Abstract
With the aim of analyzing large multidimensional single-cell datasets, we describe a method for Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL). As a graph-based clustering method, CosTaL transforms the cells with high-dimensional features into a weighted k-nearest-neighbor (kNN) graph. The cells are represented by the vertices of the graph, while an edge between two vertices in the graph represents the close relatedness between the two cells. Specifically, CosTaL builds an exact kNN graph using cosine similarity and uses the Tanimoto coefficient as the refining strategy to re-weight the edges in order to improve the effectiveness of clustering. We demonstrate that CosTaL generally achieves equivalent or higher effectiveness scores on seven benchmark cytometry datasets and six single-cell RNA-sequencing datasets using six different evaluation metrics, compared with other state-of-the-art graph-based clustering methods, including PhenoGraph, Scanpy and PARC. As indicated by the combined evaluation metrics, CosTaL has high efficiency with small datasets and acceptable scalability for large datasets, which is beneficial for large-scale analysis.
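The graph construction described in this abstract can be sketched roughly as follows. This is not the CosTaL implementation: it assumes the Tanimoto coefficient is computed on the feature vectors themselves, T(a, b) = a·b / (|a|² + |b|² − a·b), which the abstract does not specify, and it uses a brute-force neighbor search that would not scale to large datasets. The function names are hypothetical, and the Leiden community detection step is omitted.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def tanimoto(a, b):
    """Tanimoto coefficient on real-valued vectors (assumed definition)."""
    denom = dot(a, a) + dot(b, b) - dot(a, b)
    return dot(a, b) / denom if denom else 0.0


def knn_graph(cells, k):
    """Exact kNN graph: neighbors are chosen by cosine similarity,
    then each undirected edge is re-weighted by the Tanimoto coefficient."""
    edges = {}
    for i, a in enumerate(cells):
        ranked = sorted(
            ((cosine(a, b), j) for j, b in enumerate(cells) if j != i),
            reverse=True,
        )
        for _, j in ranked[:k]:
            edges[(min(i, j), max(i, j))] = tanimoto(a, cells[j])
    return edges


# Four hypothetical cells with two features each.
cells = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
edges = knn_graph(cells, k=1)
```

Cosine similarity ignores magnitude, so cells 0 and 1 are perfect neighbors; the Tanimoto re-weighting then lowers the edge weight because their magnitudes differ. The resulting weighted edge list is the kind of input a community detection algorithm such as Leiden would take.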
Affiliation(s)
- Yijia Li
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, 420 Washington Ave. S.E., Minneapolis, 55455, Minnesota, USA
- Jonathan Nguyen
- Department of Computer Science and Engineering, Santa Clara University, 500 El Camino Real, Santa Clara, 95053, California, USA
- David C Anastasiu
- Department of Computer Science and Engineering, Santa Clara University, 500 El Camino Real, Santa Clara, 95053, California, USA
- Edgar A Arriaga
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, 420 Washington Ave. S.E., Minneapolis, 55455, Minnesota, USA
- Department of Chemistry, University of Minnesota, Smith Hall, 139 Smith Hall, Pleasant St SE, Minneapolis, 55455, Minnesota, USA
5
Li K, Sun YH, Ouyang Z, Negi S, Gao Z, Zhu J, Wang W, Chen Y, Piya S, Hu W, Zavodszky MI, Yalamanchili H, Cao S, Gehrke A, Sheehan M, Huh D, Casey F, Zhang X, Zhang B. scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing. BMC Genomics 2023; 24:228. [PMID: 37131143 PMCID: PMC10155351 DOI: 10.1186/s12864-023-09332-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 04/25/2023] [Indexed: 05/04/2023]
Abstract
BACKGROUND Single-cell RNA sequencing is a state-of-the-art technology to understand gene expression in complex tissues. With the growing amount of data being generated, the standardization and automation of data analysis are critical to generating hypotheses and discovering biological insights. RESULTS Here, we present scRNASequest, a semi-automated single-cell RNA-seq (scRNA-seq) data analysis workflow which allows (1) preprocessing from raw UMI count data, (2) harmonization by one or multiple methods, (3) reference-dataset-based cell type label transfer and embedding projection, (4) multi-sample, multi-condition single-cell level differential gene expression analysis, and (5) seamless integration with cellxgene VIP for visualization and with CellDepot for data hosting and sharing by generating compatible h5ad files. CONCLUSIONS We developed scRNASequest, an end-to-end pipeline for single-cell RNA-seq data analysis, visualization, and publishing. The source code under MIT open-source license is provided at https://github.com/interactivereport/scRNASequest . We also prepared a bookdown tutorial for the installation and detailed usage of the pipeline: https://interactivereport.github.io/scRNAsequest/tutorial/docs/ . Users have the option to run it on a local computer with a Linux/Unix system including MacOS, or interact with SGE/Slurm schedulers on high-performance computing (HPC) clusters.
Affiliation(s)
- Kejie Li
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Yu H Sun
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Soumya Negi
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Zhen Gao
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Jing Zhu
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Wanli Wang
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Yirui Chen
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Sarbottam Piya
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Wenxing Hu
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Maria I Zavodszky
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Hima Yalamanchili
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Shaolong Cao
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Andrew Gehrke
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Mark Sheehan
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Dann Huh
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Fergal Casey
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA
- Xinmin Zhang
- Data Science, BioInfoRx Inc., Madison, WI, 53719, USA
- Baohong Zhang
- Research Data Sciences, Translational Biology, Biogen Inc., Cambridge, MA, 02142, USA.
6
Zhang Y, Sun H, Lian X, Tang J, Zhu F. ANPELA: Significantly Enhanced Quantification Tool for Cytometry-Based Single-Cell Proteomics. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023; 10:e2207061. [PMID: 36950745 DOI: 10.1002/advs.202207061] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/30/2022] [Revised: 02/13/2023] [Indexed: 05/27/2023]
Abstract
ANPELA is widely used for quantifying traditional bulk proteomic data. Recently, there has been a clear shift from bulk proteomics to single-cell proteomics (SCP), for which powerful cytometry techniques demonstrate a remarkable capacity for capturing cellular heterogeneity that is completely overlooked by traditional bulk profiling. However, in-depth and high-quality quantification of SCP data is still challenging and is severely affected by the large number of quantification workflows and the extreme dependence of performance on the studied datasets. In other words, the proper selection of well-performing workflow(s) for any studied dataset is elusive, and a significantly enhanced and accelerated tool is urgently needed to address this issue. However, no such tool has been developed yet. Herein, ANPELA is therefore updated to version 2.0 (https://idrblab.org/anpela/), which is unique in providing the most comprehensive set of quantification alternatives (>1000 workflows) among all existing tools, enabling systematic performance evaluation from multiple perspectives based on machine learning, and identifying the optimal workflow(s) using overall performance ranking together with parallel computation. Extensive validation on different benchmark datasets and representative application scenarios suggests the great application potential of ANPELA in current SCP research for gaining more accurate and reliable biological insights.
Affiliation(s)
- Ying Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Huaicheng Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Xichen Lian
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Jing Tang
- Department of Bioinformatics, Chongqing Medical University, Chongqing, 400016, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
7
Becker LM, Chen SH, Rodor J, de Rooij LPMH, Baker AH, Carmeliet P. Deciphering endothelial heterogeneity in health and disease at single-cell resolution: progress and perspectives. Cardiovasc Res 2023; 119:6-27. [PMID: 35179567 PMCID: PMC10022871 DOI: 10.1093/cvr/cvac018] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 10/29/2021] [Revised: 12/16/2021] [Accepted: 02/16/2022] [Indexed: 11/14/2022]
Abstract
Endothelial cells (ECs) constitute the inner lining of vascular beds in mammals and are crucial for homeostatic regulation of blood vessel physiology, but also play a key role in pathogenesis of many diseases, thereby representing realistic therapeutic targets. However, it has become evident that ECs are heterogeneous, encompassing several subtypes with distinct functions, which makes EC targeting and modulation in diseases challenging. The rise of the new single-cell era has led to an emergence of studies aimed at interrogating transcriptome diversity along the vascular tree, and has revolutionized our understanding of EC heterogeneity from both a physiological and pathophysiological context. Here, we discuss recent landmark studies aimed at teasing apart the heterogeneous nature of ECs. We cover driving (epi)genetic, transcriptomic, and metabolic forces underlying EC heterogeneity in health and disease, as well as current strategies used to combat disease-enriched EC phenotypes, and propose strategies to transcend largely descriptive heterogeneity towards prioritization and functional validation of therapeutically targetable drivers of EC diversity. Lastly, we provide an overview of the most recent advances and hurdles in single EC OMICs.
Affiliation(s)
- Andrew H Baker
- Corresponding authors. Tel: +32 16 32 62 47, E-mail: (P.C.); Tel: +44 (0)131 242 6774, E-mail: (A.H.B.)
- Peter Carmeliet
- Corresponding authors. Tel: +32 16 32 62 47, E-mail: (P.C.); Tel: +44 (0)131 242 6774, E-mail: (A.H.B.)
8
Rather AA, Chachoo MA. Robust correlation estimation and UMAP assisted topological analysis of omics data for disease subtyping. Comput Biol Med 2023; 155:106640. [PMID: 36774889 DOI: 10.1016/j.compbiomed.2023.106640] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/07/2022] [Revised: 01/08/2023] [Accepted: 02/05/2023] [Indexed: 02/10/2023]
Abstract
Deciphering the information hidden in gene expression assays to identify disease subtypes is of significant importance in precision medicine. However, computational limitations thwart this process due to the intricacy of biological networks and the curse of dimensionality of gene expression data. Therefore, clustering often becomes the first choice of exploratory data analysis to identify natural structures and intrinsic patterns in the data. However, the sparse and high-dimensional nature of omics data prevents conventional clustering algorithms from discovering subtypes that are clinically relevant and statistically significant. Hence, coupling non-linear dimensionality reduction techniques with clustering often becomes imperative to improve the clustering results. In this study, we present a robust pipeline to discover disease subtypes with clinical relevance. Specifically, we focus on discovering patient sub-groups whose residual life patterns are remarkably different from those of other sub-groups. This is significant because, by refining prognosis, subtyping can reduce uncertainty in approximating a patient's expected outcome. The methodology presented is based on robust correlation estimation; UMAP, a non-linear dimensionality reduction method; and mapper, a tool from topology. Notably, we suggest a method for improving the robustness of the correlation matrix of gene expression data in order to improve the clustering results. The performance of the model is evaluated by applying it to five cancer datasets obtained through TCGA, and comparisons are performed with the state-of-the-art methods NEMO, RSC-OTRI and SNF with regard to the log-rank test and Restricted Life Expectancy Difference. For example, in the GBM dataset, the minimum separation for any two discovered subtypes is 221 days, which is significantly higher than for the other methodologies. We also compared the results without using the robust correlation-based estimate and observed that robust correlation improves separability between survival curves significantly. From the results we infer that our methodology performs better than other methodologies with regard to separating survival curves of patient sub-groups, despite using single omics profiles of patients compared to the multiple omics profiles used by SNF and NEMO. Pathway over-representation analysis is performed on the final clustering results to investigate the biological underpinnings characterizing each subtype.
Affiliation(s)
- Arif Ahmad Rather
- Department of Computer Sciences, University of Kashmir, Srinagar, JK, India.
9
Tian SZ, Li G, Ning D, Jing K, Xu Y, Yang Y, Fullwood MJ, Yin P, Huang G, Plewczynski D, Zhai J, Dai Z, Chen W, Zheng M. MCIBox: a toolkit for single-molecule multi-way chromatin interaction visualization and micro-domains identification. Brief Bioinform 2022; 23:6696142. [PMID: 36094071 DOI: 10.1093/bib/bbac380] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/15/2022] [Revised: 08/05/2022] [Accepted: 08/09/2022] [Indexed: 12/14/2022]
Abstract
The emerging ligation-free three-dimensional (3D) genome mapping technologies can identify multiplex chromatin interactions with single-molecule precision. These technologies not only offer new insight into high-dimensional chromatin organization and gene regulation, but also introduce new challenges in data visualization and analysis. To overcome these challenges, we developed MCIBox, a toolkit for multi-way chromatin interaction (MCI) analysis, including a visualization tool and a platform for identifying micro-domains with clustered single-molecule chromatin complexes. MCIBox is based on various clustering algorithms integrated with dimensionality reduction methods that can display multiplex chromatin interactions at single-molecule level, allowing users to explore chromatin extrusion patterns and super-enhancers regulation modes in transcription, and to identify single-molecule chromatin complexes that are clustered into micro-domains. Furthermore, MCIBox incorporates a two-dimensional kernel density estimation algorithm to identify micro-domains boundaries automatically. These micro-domains were stratified with distinctive signatures of transcription activity and contained different cell-cycle-associated genes. Taken together, MCIBox represents an invaluable tool for the study of multiple chromatin interactions and inaugurates a previously unappreciated view of 3D genome structure.
Affiliation(s)
- Simon Zhongyuan Tian
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Guoliang Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, 430070, Hubei, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, No.1, Shizishan Street, Hongshan District, Wuhan, 430070, Hubei, China
- Duo Ning
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Kai Jing
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Yewen Xu
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Yang Yang
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Melissa J Fullwood
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, 637551, Singapore.,Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, 117599, Singapore.,Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), 61 Biopolis Dr, 138673, Singapore
- Pengfei Yin
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Guangyu Huang
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Dariusz Plewczynski
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Pl. Politechniki 1, 00-661, Warsaw, Poland.,Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, S. Banacha 2c, 00-927, Warsaw, Poland
- Jixian Zhai
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China.,Institute of Plant and Food Science, Southern University of Science and Technology, Southern University of Science and Technology, 1088, Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China.,Key Laboratory of Molecular Design for Plant Cell Factory of Guangdong Higher Education Institutes, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Ziwei Dai
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Wei Chen
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
- Meizhen Zheng
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
10
Chowdhury HA, Bhattacharyya DK, Kalita JK. UIPBC: An effective clustering for scRNA-seq data analysis without user input. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
11
Dong J, Zhang Y, Wang F. scSemiAE: a deep model with semi-supervised learning for single-cell transcriptomics. BMC Bioinformatics 2022; 23:161. [PMID: 35513780 PMCID: PMC9069784 DOI: 10.1186/s12859-022-04703-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/01/2021] [Accepted: 04/28/2022] [Indexed: 11/30/2022]
Abstract
Background: With the development of modern sequencing technology, hundreds of thousands of single-cell RNA-sequencing (scRNA-seq) profiles make it possible to explore heterogeneity at the cell level, but their analysis faces the challenges of high dimensionality and high sparsity. Dimensionality reduction is essential for downstream analysis, such as clustering to identify cell subpopulations. Usually, dimensionality reduction follows an unsupervised approach. Results: In this paper, we introduce a semi-supervised dimensionality reduction method named scSemiAE, which is based on an autoencoder model. It transfers the information contained in available datasets with cell subpopulation labels to guide the search for better low-dimensional representations, which can ease further analysis. Conclusions: Experiments on five public datasets show that scSemiAE outperforms both unsupervised and semi-supervised baselines, whether the transferred information, embodied in the number of labeled cells and labeled cell subpopulations, is large or small. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04703-0.
Affiliation(s)
- Jiayi Dong
- Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China; School of Computer Science and Technology, Fudan University, Shanghai, China
- Yin Zhang
- Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China; School of Computer Science and Technology, Fudan University, Shanghai, China
- Fei Wang
- Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China; School of Computer Science and Technology, Fudan University, Shanghai, China
12
CASSL: A cell-type annotation method for single cell transcriptomics data using semi-supervised learning. Appl Intell 2022. [DOI: 10.1007/s10489-022-03440-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
13
Seth S, Mallik S, Bhadra T, Zhao Z. Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data. Front Genet 2022; 13:828479. [PMID: 35198011 PMCID: PMC8859265 DOI: 10.3389/fgene.2022.828479] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/03/2021] [Accepted: 01/05/2022] [Indexed: 02/02/2023]
Abstract
The major interest domains of single-cell RNA sequential analysis are identification of existing and novel types of cells, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity in different cells. Single-cell clustering plays an important role to solve the aforementioned questions of interest. Cluster identification in high dimensional single-cell sequencing data faces some challenges due to its nature. Dimensionality reduction models can solve the problem. Here, we introduce a potential cluster specified frequent biomarkers discovery framework using dimensionality reduction and hierarchical agglomerative clustering Louvain for single-cell RNA sequencing data analysis. First, we pre-filtered the features with fewer number of cells and the cells with fewer number of features. Then we created a Seurat object to store data and analysis together and used quality control metrics to discard low quality or dying cells. Afterwards we applied global-scaling normalization method "LogNormalize" for data normalization. Next, we computed cell-to-cell highly variable features from our dataset. Then, we applied a linear transformation and linear dimensionality reduction technique, Principal Component Analysis (PCA) to project high dimensional data to an optimal low-dimensional space. After identifying fifty "significant"principal components (PCs) based on strong enrichment of low p-value features, we implemented a graph-based clustering algorithm Louvain for the cell clustering of 10 top significant PCs. We applied our model to a single-cell RNA sequential dataset for a rare intestinal cell type in mice (NCBI accession ID:GSE62270, 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.885 1. 
After detecting the cell clusters, we found 3871 cluster-specific biomarkers using Model-based Analysis of Single-cell Transcriptomics (MAST), a statistical tool for expression feature extraction from single-cell sequencing data, with a log2 fold-change (FC) threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest-degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of the cluster markers using the DAVID 6.8 software tool. In summary, our proposed framework, which integrates dimensionality reduction and agglomerative hierarchical clustering, provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers, from single-cell RNA sequencing data.
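The abstract above reports a maximum modularity for the Louvain partition. As a rough illustration of what that score measures, the following sketch computes the standard Newman-Girvan modularity Q for a fixed partition of a small undirected graph. The toy graph, the labels and the function name `modularity` are assumptions made for this example; it is not the authors' Seurat pipeline and it does not perform the Louvain optimization itself.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both ends in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = defaultdict(int)  # total degree per community
    for node, d in degree.items():
        degree_sum[community[node]] += d
    # Q = sum over communities of (internal edge share - expected share)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph at the bridge scores higher than leaving every node in one group, which is the kind of comparison a modularity-maximizing method such as Louvain makes when it merges or separates communities.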
Affiliation(s)
- Soumita Seth
  - Department of Computer Science & Engineering, Aliah University, Kolkata, India
- Saurav Mallik
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Tapas Bhadra
  - Department of Computer Science & Engineering, Aliah University, Kolkata, India
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States
*Correspondence: Saurav Mallik; Zhongming Zhao
|
14
|
Vasighizaker A, Danda S, Rueda L. Discovering cell types using manifold learning and enhanced visualization of single-cell RNA-Seq data. Sci Rep 2022; 12:120. [PMID: 34996927 PMCID: PMC8742092 DOI: 10.1038/s41598-021-03613-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/15/2021] [Accepted: 12/07/2021] [Indexed: 01/03/2023] Open
Abstract
Identifying relevant disease modules such as target cell types is a significant step in studying diseases. High-throughput single-cell RNA-Seq (scRNA-seq) technologies have advanced in recent years, enabling researchers to investigate cells individually and understand their biological mechanisms. Computational techniques such as clustering are the most suitable approach in scRNA-seq data analysis when the cell types have not been well characterized. These techniques can be used to identify a group of genes that belong to a specific cell type based on their similar gene expression patterns. However, due to the sparsity and high dimensionality of scRNA-seq data, classical clustering methods are not efficient. Therefore, the use of non-linear dimensionality reduction techniques to improve clustering results is crucial. We introduce a method that identifies representative clusters of different cell types by combining non-linear dimensionality reduction techniques and clustering algorithms. We assess the impact of different dimensionality reduction techniques combined with clustering on thirteen publicly available scRNA-seq datasets of different tissues, sizes, and technologies. We further performed gene set enrichment analysis to evaluate the proposed method's performance. Our results show that modified locally linear embedding combined with independent component analysis yields the best overall performance relative to the existing unsupervised methods across different datasets.
Affiliation(s)
- Saiteja Danda
  - School of Computer Science, University of Windsor, Windsor, ON, Canada
- Luis Rueda
  - School of Computer Science, University of Windsor, Windsor, ON, Canada
|
15
|
Abstract
Epigenome regulation has emerged as an important mechanism for the maintenance of organ function in health and disease. Dissecting epigenomic alterations and resultant gene expression changes in single cells provides unprecedented resolution and insight into cellular diversity, modes of gene regulation, transcription factor dynamics and 3D genome organization. In this chapter, we summarize the transformative single-cell epigenomic technologies that have deepened our understanding of the fundamental principles of gene regulation. We provide a historical perspective on these methods and a brief procedural outline, with emphasis on the computational tools used to meaningfully dissect the information. Our overall goal is to aid scientists using these technologies in their favorite system of interest.
Affiliation(s)
- Krystyna Mazan-Mamczarz
  - Laboratory of Genetics and Genomics, National Institute on Aging (NIA), Intramural Research Program (IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Jisu Ha
  - Laboratory of Genetics and Genomics, National Institute on Aging (NIA), Intramural Research Program (IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Supriyo De
  - Laboratory of Genetics and Genomics, National Institute on Aging (NIA), Intramural Research Program (IRP), National Institutes of Health (NIH), Baltimore, MD, USA
  - Laboratory of Genetics and Genomics, and Computational Biology and Genomics Core, National Institute on Aging-Intramural Research Program, National Institute of Health, Baltimore, MD, USA
- Payel Sen
  - Laboratory of Genetics and Genomics, National Institute on Aging (NIA), Intramural Research Program (IRP), National Institutes of Health (NIH), Baltimore, MD, USA
|
16
|
Abstract
High-throughput single-cell transcriptomic approaches have revolutionized our view of gene expression at the level of individual cells, providing new insights into their heterogeneity, identities, and functions. Recently, technical challenges to the application of single-cell transcriptomics to plants have been overcome, and many plant organs and tissues have now been subjected to analyses at single-cell resolution. In this review, we describe these studies and their impact on our understanding of the diversity, differentiation, and activities of plant cells. We particularly highlight their impact on plant cell identity, including unprecedented views of cell transitions and definitions of rare and novel cell types. We also point out current challenges and future opportunities for the application and analyses of single-cell transcriptomics in plants. Expected final online publication date for the Annual Review of Genetics, Volume 55 is November 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Kook Hui Ryu
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yan Zhu
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
- John Schiefelbein
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
|
17
|
Chowdhury HA, Bhattacharyya DK, Kalita JK. UICPC: Centrality-based clustering for scRNA-seq data analysis without user input. Comput Biol Med 2021; 137:104820. [PMID: 34508973 DOI: 10.1016/j.compbiomed.2021.104820] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/02/2021] [Revised: 08/24/2021] [Accepted: 08/27/2021] [Indexed: 11/16/2022]
Abstract
scRNA-seq data analysis enables new possibilities for the identification of novel cells, the specific characterization of known cells and the study of cell heterogeneity. The performance of most clustering methods developed specifically for scRNA-seq is greatly influenced by user input. We propose a centrality-based clustering method named UICPC and compare its performance with 9 state-of-the-art clustering methods on 11 real-world scRNA-seq datasets to demonstrate its effectiveness and usefulness in discovering cell groups. Our method does not require user input. However, it does require threshold settings, which were benchmarked through extensive experiments. We observe that most of the compared approaches perform poorly due to high heterogeneity and large dataset dimensions. UICPC, in contrast, shows excellent performance in terms of NMI, Purity and ARI. UICPC is available as an R package and can be downloaded from https://sites.google.com/view/hussinchowdhury/software.
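The abstract above evaluates clusterings by NMI, Purity and ARI. As a hedged illustration of two of these scores, the sketch below computes purity and the adjusted Rand index from plain label lists using only the standard library. The function names and the toy labels are assumptions for this example and are not taken from the UICPC package; NMI is omitted for brevity, and the inputs are assumed to contain at least two cells.

```python
from collections import Counter
from math import comb

def purity(true_labels, pred_labels):
    """Fraction of cells that carry the majority true label of their cluster."""
    by_cluster = {}
    for t, p in zip(true_labels, pred_labels):
        by_cluster.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(true_labels)

def adjusted_rand_index(true_labels, pred_labels):
    """ARI from the contingency table (Hubert and Arabie formulation)."""
    n = len(true_labels)
    pairs = Counter(zip(true_labels, pred_labels))
    sum_ij = sum(comb(v, 2) for v in pairs.values())
    sum_a = sum(comb(v, 2) for v in Counter(true_labels).values())
    sum_b = sum(comb(v, 2) for v in Counter(pred_labels).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_ij - expected) / (max_index - expected)

# Hypothetical example: six cells, two true types, one cell misassigned.
truth = ["T", "T", "T", "B", "B", "B"]
pred = [0, 0, 1, 1, 1, 1]
p = purity(truth, pred)
ari = adjusted_rand_index(truth, pred)
```

Purity rewards clusters dominated by one true label but does not penalize over-splitting, which is one reason it is usually reported alongside a chance-corrected score such as ARI.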
Affiliation(s)
- Jugal Kumar Kalita
  - Computer Science, College of Engineering and Applied Science, University of Colorado, Colorado Springs, CO, 80933-7150, USA
|
18
|
Zhang W, Xue X, Zheng X, Fan Z. NMFLRR: Clustering scRNA-seq data by integrating non-negative matrix factorization with low rank representation. IEEE J Biomed Health Inform 2021; 26:1394-1405. [PMID: 34310328 DOI: 10.1109/jbhi.2021.3099127] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/01/2023]
Abstract
Fast-developing single-cell technologies create unprecedented opportunities to reveal cell heterogeneity and diversity. Accurate classification of single cells is a critical prerequisite for recovering the mechanisms of heterogeneity. However, the scRNA-seq profiles currently obtained have high dimensionality, sparsity, and noise, which pose challenges for existing clustering methods in grouping cells that belong to the same subpopulation based on transcriptomic profiles. Although many computational methods have been proposed, developing novel and effective computational methods to accurately identify cell types remains a considerable challenge. We present a new computational framework to identify cell types by integrating low-rank representation (LRR) and nonnegative matrix factorization (NMF); this framework is named NMFLRR. The LRR captures the global properties of the original data by using nuclear norms, and a locality-constrained graph regularization term is introduced to characterize the data's local geometric information. The similarity matrix and low-dimensional features of the data can be obtained simultaneously by applying the alternating direction method of multipliers (ADMM) algorithm, which updates each variable alternately in an iterative way. Finally, we obtained the predicted cell types by using a spectral algorithm based on the optimized similarity matrix. Nine real scRNA-seq datasets were used to test the performance of NMFLRR and fifteen other competitive methods, and the accuracy and robustness of the simulation results suggest that NMFLRR is a promising algorithm for the classification of single cells. The simulation code is freely available at: https://github.com/wzhangwhu/NMFLRR_code.
|
19
|
Kopf A, Fortuin V, Somnath VR, Claassen M. Mixture-of-Experts Variational Autoencoder for clustering and generating from similarity-based representations on single cell data. PLoS Comput Biol 2021; 17:e1009086. [PMID: 34191792 PMCID: PMC8277074 DOI: 10.1371/journal.pcbi.1009086] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/15/2020] [Revised: 07/13/2021] [Accepted: 05/14/2021] [Indexed: 12/17/2022] Open
Abstract
Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering has gained popularity due to its flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the lower dimensional latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on the MNIST benchmark data set and challenging real-world tasks of clustering mouse organs from single-cell RNA-sequencing measurements and defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines as well as competitor methods.
Affiliation(s)
- Andreas Kopf
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zürich, Zurich, Switzerland
  - Life Science Graduate School Zurich, PhD Program Systems Biology, Zurich, Switzerland
- Vincent Fortuin
  - Biomedical Informatics Group, Department of Computer Science, ETH Zürich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Zurich, Switzerland
- Vignesh Ram Somnath
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zürich, Zurich, Switzerland
- Manfred Claassen
  - Division of Clinical Bioinformatics, Department of Internal Medicine I, University of Tübingen, Tübingen, Germany
|
20
|
Gaydosik AM, Tabib T, Domsic R, Khanna D, Lafyatis R, Fuschiotti P. Single-cell transcriptome analysis identifies skin-specific T-cell responses in systemic sclerosis. Ann Rheum Dis 2021; 80:1453-1460. [PMID: 34031030 DOI: 10.1136/annrheumdis-2021-220209] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 02/20/2021] [Accepted: 05/08/2021] [Indexed: 12/13/2022]
Abstract
OBJECTIVES: Although T cells have been implicated in the pathogenesis of systemic sclerosis (SSc), a comprehensive study of T-cell-mediated immune responses in the affected skin of patients with progressive SSc is lacking. Droplet-based single-cell transcriptome analysis of SSc skin biopsies opens avenues for dissecting patient-specific T-cell heterogeneity, providing a basis for identifying novel gene expression related to functional pathways associated with severity of SSc skin disease. METHODS: Single-cell RNA sequencing was performed by droplet-based sequencing (10x Genomics), focusing on 3729 CD3+ lymphocytes (867 cells from normal and 2862 cells from SSc skin samples) from skin biopsies of 27 patients with active SSc and 10 healthy donors. Confocal immunofluorescence microscopy of progressive SSc skin samples validated transcriptional results and visualised spatial localisations of T-cell subsets. RESULTS: We identified several subsets of recirculating and tissue-resident T cells in healthy and SSc skin that were associated with distinct signalling pathways. While most clusters shared a common gene expression signature between patients and controls, we identified a unique cluster of recirculating CXCL13+ T cells in SSc skin that expressed a T helper follicular-like gene expression signature and appears to be poised to promote B-cell responses within the inflamed skin of patients. CONCLUSIONS: Currently available therapies to reverse or even slow progression of SSc lead to broad killing of immune cells and consequent toxicities, including death. Identifying the precise immune mechanism(s) driving SSc pathogenesis could lead to innovative therapies that selectively target the aberrant immune response, resulting in better efficacy and less toxicity.
Affiliation(s)
- Alyxzandria M Gaydosik
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Tracy Tabib
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Robyn Domsic
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Dinesh Khanna
  - Division of Rheumatology, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Robert Lafyatis
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Patrizia Fuschiotti
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
|
21
|
Moehlin J, Mollet B, Colombo BM, Mendoza-Parra MA. Inferring biologically relevant molecular tissue substructures by agglomerative clustering of digitized spatial transcriptomes with multilayer. Cell Syst 2021; 12:694-705.e3. [PMID: 34159899 DOI: 10.1016/j.cels.2021.04.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/25/2020] [Revised: 01/08/2021] [Accepted: 04/13/2021] [Indexed: 01/04/2023]
Abstract
Spatially resolved transcriptomics (SrT) can investigate organ or tissue architecture from the angle of gene programs that define their molecular complexity. However, computational methods to analyze SrT data underexploit their spatial signature. Inspired by contextual pixel classification strategies applied to image analysis, we developed MULTILAYER to stratify maps into functionally relevant molecular substructures. MULTILAYER applies agglomerative clustering within contiguous locally defined transcriptomes (gene expression elements or "gexels") combined with community detection methods for graphical partitioning. MULTILAYER resolves molecular tissue substructures within a variety of SrT data with superior performance to commonly used dimensionality reduction strategies and still detects differentially expressed genes on par with existing methods. MULTILAYER can process high-resolution as well as multiple SrT data in a comparative mode, anticipating future needs in the field. MULTILAYER provides a digital image perspective for SrT analysis and opens the door to contextual gexel classification strategies for developing self-supervised molecular diagnosis solutions. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Julien Moehlin
  - Génomique métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Bastien Mollet
  - Génomique métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
  - École Normale Supérieure de Lyon, Université Claude Bernard - Lyon 1, Université de Lyon, 69342 Lyon Cedex 07, France
- Bruno Maria Colombo
  - Génomique métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Marco Antonio Mendoza-Parra
  - Génomique métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
|
22
|
Nussbaum YI, Manjunath Y, Suvilesh KN, Warren WC, Shyu CR, Kaifi JT, Ciorba MA, Mitchem JB. Current and Prospective Methods for Assessing Anti-Tumor Immunity in Colorectal Cancer. Int J Mol Sci 2021; 22:4802. [PMID: 33946558 PMCID: PMC8125332 DOI: 10.3390/ijms22094802] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/19/2021] [Revised: 04/23/2021] [Accepted: 04/27/2021] [Indexed: 02/06/2023] Open
Abstract
Colorectal cancer (CRC) remains one of the deadliest malignancies worldwide despite recent progress in treatment strategies. Though immune checkpoint inhibition has proven effective for a number of other tumors, it offers benefits in only a small group of CRC patients with high microsatellite instability. In general, heterogeneous cell groups in the tumor microenvironment are considered the major barrier to unveiling the causes of low immune response. Therefore, deconvolution of cellular components in highly heterogeneous microenvironments is crucial for understanding the immune contexture of cancer. In this review, we assimilate current knowledge and recent studies examining anti-tumor immunity in CRC. We also discuss the utilization of novel immune contexture assessment methods that have not been used in CRC research to date.
Affiliation(s)
- Yulia I. Nussbaum
  - Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65201, USA
- Yariswamy Manjunath
  - Department of Surgery, Columbia, MO 65212, USA
  - Harry S. Truman Memorial Veterans’ Hospital, Columbia, MO 65201, USA
- Kanve N. Suvilesh
  - Department of Surgery, Columbia, MO 65212, USA
- Wesley C. Warren
  - Department of Surgery, Columbia, MO 65212, USA
  - Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Chi-Ren Shyu
  - Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65201, USA
- Jussuf T. Kaifi
  - Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65201, USA
  - Department of Surgery, Columbia, MO 65212, USA
  - Harry S. Truman Memorial Veterans’ Hospital, Columbia, MO 65201, USA
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Matthew A. Ciorba
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Division of Gastroenterology, Department of Medicine, Washington School of Medicine, St. Louis, MO 63110, USA
- Jonathan B. Mitchem
  - Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65201, USA
  - Department of Surgery, Columbia, MO 65212, USA
  - Harry S. Truman Memorial Veterans’ Hospital, Columbia, MO 65201, USA
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
|
23
|
Xi NM, Li JJ. Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data. Cell Syst 2021; 12:176-194.e6. [PMID: 33338399 PMCID: PMC7897250 DOI: 10.1016/j.cels.2020.11.008] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 06/28/2020] [Revised: 10/06/2020] [Accepted: 11/19/2020] [Indexed: 12/29/2022]
Abstract
In single-cell RNA sequencing (scRNA-seq), doublets form when two cells are encapsulated into one reaction volume. The existence of doublets, which appear to be, but are not, real cells, is a key confounder in scRNA-seq data analysis. Computational methods have been developed to detect doublets in scRNA-seq data; however, the scRNA-seq field lacks a comprehensive benchmarking of these methods, making it difficult for researchers to choose an appropriate method for specific analyses. We conducted a systematic benchmark study of nine cutting-edge computational doublet-detection methods. Our study included 16 real datasets, which contained experimentally annotated doublets, and 112 realistic synthetic datasets. We compared doublet-detection methods regarding detection accuracy under various experimental settings, impacts on downstream analyses, and computational efficiencies. Our results show that existing methods exhibited diverse performance and distinct advantages in different aspects. Overall, the DoubletFinder method has the best detection accuracy, and the cxds method has the highest computational efficiency. A record of this paper's transparent peer review process is included in the Supplemental Information.
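To make the notion of a doublet concrete, the sketch below builds an artificial doublet by summing the count vectors of two distinct cells, mimicking two cells captured in one reaction volume. The abstract does not describe how any benchmarked method or synthetic dataset is constructed, so this is only an assumed, simplified illustration; the counts, the function name `simulate_doublet` and the fixed seed are hypothetical.

```python
import random

def simulate_doublet(cell_a, cell_b):
    """Sum two cells' count vectors gene by gene, as if captured together."""
    return [a + b for a, b in zip(cell_a, cell_b)]

# Hypothetical UMI counts for four genes in three cells.
cells = [
    [5, 0, 2, 1],
    [0, 7, 0, 3],
    [1, 1, 9, 0],
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
i, j = rng.sample(range(len(cells)), 2)  # two distinct parent cells
doublet = simulate_doublet(cells[i], cells[j])
```

The simulated profile has a library size equal to the sum of its parents and can mix marker genes of both, which is why doublets can masquerade as a distinct cell population in downstream clustering.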
Affiliation(s)
- Nan Miles Xi
  - Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Jingyi Jessica Li
  - Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
  - Department of Human Genetics, University of California, Los Angeles, CA 90095-7088, USA
  - Department of Computational Medicine, University of California, Los Angeles, CA 90095-1766, USA
|
24
|
Technique of Gene Expression Profiles Extraction Based on the Complex Use of Clustering and Classification Methods. Diagnostics (Basel) 2020; 10:diagnostics10080584. [PMID: 32806785 PMCID: PMC7460566 DOI: 10.3390/diagnostics10080584] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/05/2020] [Revised: 08/10/2020] [Accepted: 08/11/2020] [Indexed: 11/16/2022] Open
Abstract
In this paper, we present the results of research on extracting informative gene expression profiles from a high-dimensional array of gene expressions, taking into account the state of the patients' health, using a clustering method, ML-based binary classifiers and a fuzzy inference system. The proposed stepwise procedure allows us to extract the most informative genes, taking into account both the subtypes of the disease and the state of the patient's health, for further reconstruction of gene regulatory networks based on the allocated genes and subsequent simulation of the reconstructed models. As experimental data, we used publicly available gene expression data that were obtained from DNA microarray experiments and contained two types of gene expression profiles: those of patients with lung cancer tumors and those of healthy patients. The stepwise data processing procedure comprises the following steps. First, we reduce the number of genes by removing genes that are non-informative in terms of statistical criteria and Shannon entropy; then, we perform stepwise hierarchical clustering of the gene expression profiles at hierarchical levels from 1 to 10 using the SOTA (Self-Organizing Tree Algorithm) clustering algorithm with the correlation distance metric. The quality of the obtained clustering was evaluated using a complex clustering quality criterion that considers both the distribution of the gene expression profiles relative to the centers of the clusters to which they are allocated and the distribution of the cluster centers. The result of this stage was the selection, at each hierarchical level, of the optimal cluster, which corresponded to the minimum value of the quality criterion. At the next step, we implemented a classification procedure for the examined objects using four well-known binary classifiers: logistic regression, support-vector machine, decision trees and random forest.
The effectiveness of the respective technique was evaluated by ROC (Receiver Operating Characteristic) analysis, using criteria that include errors of both the first and the second kind as components. The final decision on extracting the most informative subset of gene expression profiles was made using the fuzzy inference system, whose inputs are the results of the individual classifiers and whose output is the final decision concerning the state of the patient's health. In our opinion, the proposed stepwise procedure for extracting informative gene expression profiles creates the conditions for increasing the effectiveness of the subsequent reconstruction of gene regulatory networks and the simulation of the reconstructed models, considering the subtypes of the disease and/or the state of the patient's health.
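Two quantities named in the abstract above, Shannon entropy and the correlation distance, can be sketched in a few lines. The code below is an illustration under stated assumptions only: the equal-width binning with four bins, the function names and the toy profiles are hypothetical, and the abstract does not specify how the authors estimate entropy or which threshold they use to discard genes.

```python
from collections import Counter
from math import log2, sqrt

def shannon_entropy(values, bins=4):
    """Entropy in bits of an expression profile after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a flat profile carries no information
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def correlation_distance(x, y):
    """1 minus the Pearson correlation between two non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)

# Hypothetical expression profiles across four samples.
flat = [2.0, 2.0, 2.0, 2.0]
spread = [0.0, 1.0, 2.0, 3.0]
scaled = [1.0, 3.0, 5.0, 7.0]      # 2 * spread + 1
reversed_ = [3.0, 2.0, 1.0, 0.0]   # perfectly anti-correlated with spread

h_flat = shannon_entropy(flat)
h_spread = shannon_entropy(spread)
d_same = correlation_distance(spread, scaled)
d_opposite = correlation_distance(spread, reversed_)
```

Because the correlation distance ignores scale and offset, profiles with the same shape fall close together even when their absolute expression levels differ, which suits clustering by expression pattern.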
|