1
|
Gong M, Yu Y, Wang Z, Zhang J, Wang X, Fu C, Zhang Y, Wang X. scAuto as a comprehensive framework for single-cell chromatin accessibility data analysis. Comput Biol Med 2024; 171:108230. [PMID: 38442554 DOI: 10.1016/j.compbiomed.2024.108230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/06/2024] [Accepted: 02/25/2024] [Indexed: 03/07/2024]
Abstract
Interpreting single-cell chromatin accessibility data is crucial for understanding intercellular heterogeneity regulation. Despite the progress in computational methods for analyzing this data, there is still a lack of a comprehensive analytical framework and a user-friendly online analysis tool. To fill this gap, we developed a pre-trained deep learning-based framework, single-cell auto-correlation transformers (scAuto), to overcome the challenge. Following DNABERT's methodology of pre-training and fine-tuning, scAuto learns a general understanding of DNA sequence's grammar by being pre-trained on unlabeled human genome via self-supervision; it is then transferred to the single-cell chromatin accessibility analysis task of scATAC-seq data for supervised fine-tuning. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating its superior performance on chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis. It integrates tutorial-style interfaces for those with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. To our knowledge, this work is expected to help analyze single-cell chromatin accessibility data and facilitate the development of precision medicine.
Affiliation(s)
- Meiqin Gong
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China
| | - Yun Yu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Zixuan Wang
- College of Electronics and Information Engineering, Sichuan University, Chengdu, 610065, China
| | - Junming Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Xiongyi Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Cheng Fu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Xiaodong Wang
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
| |
|
2
|
Yano A, Liu S, Suzuki Y, Imai M, Mogi M, Sugiyama T. Single-cell transcriptomic architecture and cellular communication circuits of parametrial adipose tissue in pregnant mice. Life Sci 2023; 334:122214. [PMID: 37907153 DOI: 10.1016/j.lfs.2023.122214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 10/16/2023] [Accepted: 10/25/2023] [Indexed: 11/02/2023]
Abstract
AIMS The activity and interactions of cellular subpopulations in the adipose tissue microenvironment are critical for the coordination of local and systemic adaptation during pregnancy. With a particular interest in parametrial adipose tissue (PmAT), single-cell RNA-sequencing (scRNA-seq) was utilized to unveil the gestative cellular composition and functional shift. MATERIALS AND METHODS To identify cell-type-enriched transcriptome profiles, a total of 18,074 cells in adipose tissue were studied. The cell populations were cataloged, and signaling crosstalk between adipocytes and other composition fractions via soluble and membrane-bound factors was evaluated. KEY FINDINGS A marked decline of pregnancy adipocytes and relative elevation of non-adipocyte fractions were observed. A subpopulation of adipocytes, Adipo_5, with unique properties in the response to estrogen and the embryonic processes involved in pregnancy, was defined. Interactome analysis revealed the potential contribution of PmAT to the establishment of maternal-fetal immune tolerance. During gestation, adipocytes shut down outgoing signaling, resulting in deterioration of the resistin-related incoming signaling network in B cells, which would therefore benefit tissue-specific maternal-fetal tolerance. Furthermore, a subpopulation of adipocytes, Adipo_2, was also considered to take part in a paradigm shift in the process of pregnancy-induced chemical stiffness-triggered vesicular remodeling via the THBS signaling pathway network. SIGNIFICANCE These data-derived findings will encourage investigation into the role of pregnant PmAT in pregnancy-related immunological, hypertensive and metabolic disorders, with the ultimate goal of establishing preventive strategies to mitigate these pregnancy-related health challenges. This translational aspect of our work holds significant promise for improving maternal and fetal well-being.
Affiliation(s)
- Akiko Yano
- Department of Obstetrics & Gynecology, Ehime University School of Medicine, Shitsukawa, Toon, Ehime, Japan; Department of Pharmacology, Ehime University Graduate School of Medicine, Shitsukawa, Toon, Ehime, Japan
| | - Shuang Liu
- Department of Pharmacology, Ehime University Graduate School of Medicine, Shitsukawa, Toon, Ehime, Japan.
| | - Yasuyuki Suzuki
- Department of Pharmacology, Ehime University Graduate School of Medicine, Shitsukawa, Toon, Ehime, Japan; Department of Anesthesiology, Saiseikai Matsuyama Hospital, Matsuyama, Japan; Research Division, Saiseikai Research Institute of Health Care and Welfare, Tokyo, Japan
| | - Matome Imai
- Department of Obstetrics & Gynecology, Ehime University School of Medicine, Shitsukawa, Toon, Ehime, Japan; Department of Pharmacology, Ehime University Graduate School of Medicine, Shitsukawa, Toon, Ehime, Japan
| | - Masaki Mogi
- Department of Pharmacology, Ehime University Graduate School of Medicine, Shitsukawa, Toon, Ehime, Japan
| | - Takashi Sugiyama
- Department of Obstetrics & Gynecology, Ehime University School of Medicine, Shitsukawa, Toon, Ehime, Japan
| |
|
3
|
Tan J, Zhao Y, Burns CC, Tian D, Zhao K. Novel Network Method Major Minor Variation Clustering Enables Identification of Poliovirus Clusters with High-Resolution Linkages. J Comput Biol 2023; 30:409-419. [PMID: 36112351 PMCID: PMC11299649 DOI: 10.1089/cmb.2022.0292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
The Global Polio Eradication Initiative uses an outbreak response protocol that defines type 2 Sabin or Sabin-like virus as those with 0-5 nucleotides diverging from their parental strain in the complete VP1 genomic region. Sabin or Sabin-like viruses share highly similar genome sequences, regardless of their origin. Thus, it is challenging to distinguish viruses at a higher resolution to detect polio clusters or trace sources for local transmissions of viruses at an early stage. To identify type 2 Sabin or Sabin-like sources and improve our ability to map viral sources to campaigns during the polio endgame, we investigated the feasibility of a new method for genetic sequence analysis. We named the method Major Minor Variation Clustering (MMVC), which uses a network model to simultaneously incorporate sequence similarity in major and minor variants in addition to onset dates to detect fine-scale polio clusters. Each identified cluster represents a collection of sequences that are highly similar in both major and minor variants, enabling the discovery of new links between viruses. By applying the method to a published data set collected in Nigeria during 2009-2012, we found that clusters identified using this method have several improvements over clusters derived from a phylogenetic tree approach. Integrative data analysis reveals that sequences in the same cluster have greater genomic similarities and better agreement with onset dates. As a complement to current phylogenetic tree approaches, MMVC has the potential to improve epidemiological surveillance and investigation precision to guide polio eradication.
Affiliation(s)
- Jiahui Tan
- School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, China
| | - Yutong Zhao
- School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, China
| | - Cara C Burns
- Polio and Picornavirus Laboratory Branch, Division of Viral Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Dechao Tian
- School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, China
| | - Kun Zhao
- Polio and Picornavirus Laboratory Branch, Division of Viral Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| |
|
4
|
Zhao J, Luo Y, Xiao R, Wu R, Fan T. Tri-Training Algorithm for Adaptive Nearest Neighbor Density Editing and Cross Entropy Evaluation. Entropy (Basel, Switzerland) 2023; 25:480. [PMID: 36981368 PMCID: PMC10047771 DOI: 10.3390/e25030480] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/13/2022] [Revised: 02/14/2023] [Accepted: 03/02/2023] [Indexed: 06/18/2023]
Abstract
Tri-training expands the training set by adding pseudo-labels to unlabeled data, which effectively improves the generalization ability of the classifier. However, unlabeled data are easily mislabeled, introducing training noise that damages the learning efficiency of the classifier, and the explicit decision mechanism tends to let this noise degrade the accuracy of the classification model in the prediction stage. This study proposes the Tri-training algorithm for adaptive nearest neighbor density editing and cross-entropy evaluation (TTADEC), which reduces the training noise formed during classifier iteration and addresses the inaccurate predictions of the explicit decision mechanism. First, the TTADEC algorithm uses nearest neighbor editing to label high-confidence samples. Then, it uses the relative nearest neighbor to define the local density of samples, screens the pre-training samples, and dynamically expands the training set with an adaptive technique. Finally, the decision process uses cross-entropy to evaluate the trained base classifiers and assigns appropriate weights to them to construct a decision function. The effectiveness of the TTADEC algorithm is verified on UCI datasets, and the experimental results show that, compared with the standard Tri-training algorithm and its improved variants, the TTADEC algorithm has better classification performance and can effectively deal with semi-supervised classification problems where the training set is insufficient.
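The final decision step described above can be illustrated with a small sketch. The abstract does not give the weighting formula, so the inverse cross-entropy weighting used here is an assumption, as are the function names and the idea of scoring each base classifier on a labeled validation set.

```python
import math

def cross_entropy(probs, labels):
    """Mean cross-entropy of predicted class probabilities against integer labels."""
    eps = 1e-12
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

def classifier_weights(val_probs_per_clf, val_labels):
    """Assumed scheme: weight each classifier by inverse cross-entropy, normalised to sum to 1."""
    inv = [1.0 / (cross_entropy(p, val_labels) + 1e-12) for p in val_probs_per_clf]
    total = sum(inv)
    return [v / total for v in inv]

def weighted_predict(test_probs_per_clf, weights):
    """Combine per-classifier probabilities with the weights and return the argmax class."""
    n_samples = len(test_probs_per_clf[0])
    n_classes = len(test_probs_per_clf[0][0])
    preds = []
    for i in range(n_samples):
        scores = [
            sum(w * clf[i][c] for w, clf in zip(weights, test_probs_per_clf))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds
```

With three base classifiers, a classifier with low validation cross-entropy can outvote two weaker ones, which is the behaviour a soft weighted decision function is meant to provide in place of a hard majority vote.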
Affiliation(s)
- Jia Zhao
- School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China
| | - Yuhang Luo
- School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China
| | - Renbin Xiao
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Runxiu Wu
- School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China
| | - Tanghuai Fan
- School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China
| |
|
5
|
Liao N, Li C, Cao L, Chen Y, Ren C, Chen X, Mok H, Wen L, Li K, Wang Y, Zhang Y, Li Y, Lv J, Cao F, Luo Y, Li H, Wu W, Balch CM, Giuliano AE. Single-cell profile of tumor and immune cells in primary breast cancer, sentinel lymph node, and metastatic lymph node. Breast Cancer 2023; 30:77-87. [PMID: 36129636 DOI: 10.1007/s12282-022-01400-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Accepted: 08/24/2022] [Indexed: 01/07/2023]
Abstract
PURPOSE Little is known about the host-tumor interaction in the lymph-node basin at a single-cell level. This study examines single-cell sequences in breast cancer nodal metastases of a patient with triple-negative breast cancer (TNBC). METHODS The primary breast tumor, sentinel lymph node, an adjacent lymph node with metastatic involvement and a clinically normal-appearing lymph node were collected during surgery. Single-cell sequencing was performed on all four specimens. RESULTS 14,016 cells were clustered into 6 cell subpopulations. Cancer cells demonstrated the molecular characteristics of TNBC basal B subtype and highly expressed genes in the MAPK signaling cascade. Tumor-associated macrophages regulated antigen processing and presentation and other immune-related pathways to promote tumor invasion. CD8+ and CD4+ T lymphocytes concentrated more in the sentinel lymph node and mainly stratified into two transcriptional states. The immune-cell amount variation among primary tumor, sentinel and normal lymph nodes showed a similar tendency between the scRNA-seq profile of TNBC samples and a previously reported bulk RNA-seq profile of a breast cancer cohort, including all four breast cancer subtype samples. DISCUSSION Single-cell sequencing analysis suggested that the sentinel lymph node was the initial meeting site of tumor infiltration and immune response, where some T lymphocytes perform anti-tumor activity, while other T cells exhibit an exhausted state. We proposed a molecular explanation for the well-established clinical principle that the 5-year and 10-year survival outcomes were noninferior between SLND and ALND.
Affiliation(s)
- Ning Liao
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China.
| | - Cheukfai Li
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Li Cao
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Yanhua Chen
- Berry Oncology Corporation, No.2 Road Donghu, Fuzhou, 350200, China
| | - Chongyang Ren
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Xiaoqing Chen
- Foshan Maternity and Children's Healthcare Hospital, Affiliated to Southern Medical University, Foshan, China
| | - Hsiaopei Mok
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Lingzhu Wen
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Kai Li
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Yulei Wang
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Yuchen Zhang
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Yingzi Li
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Jiaoyi Lv
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Fangrong Cao
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Yuting Luo
- Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, 106 Zhongshan Er Road, Guangzhou, 510080, China
| | - Hongrui Li
- Berry Oncology Corporation, No.2 Road Donghu, Fuzhou, 350200, China
| | - Wendy Wu
- Berry Oncology Corporation, No.2 Road Donghu, Fuzhou, 350200, China.
| | - Charles M Balch
- University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | | |
|
6
|
Huang Y, Chang H, Chen X, Meng J, Han M, Huang T, Yuan L, Zhang G. A cell marker-based clustering strategy (cmCluster) for precise cell type identification of scRNA-seq data. Quantitative Biology 2023. [DOI: 10.15302/j-qb-022-0311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
|
7
|
Liu Q, Zhao X, Wang G. A Clustering Ensemble Method for Cell Type Detection by Multiobjective Particle Optimization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1-14. [PMID: 34860653 DOI: 10.1109/tcbb.2021.3132400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a new technology different from previous sequencing methods that measure the average expression level for each gene across a large population of cells. Thus, new computational methods are required to reveal cell types among cell populations. We present a clustering ensemble algorithm using optimized multiobjective particle (CEMP). It is featured with several mechanisms: 1) A multi-subspace projection method for mapping the original data to low-dimensional subspaces is applied in order to detect complex data structure at both gene level and sample level. 2) The basic partition module in different subspaces is utilized to generate clustering solutions. 3) A transforming representation between clusters and particles is used to bridge the gap between the discrete clustering ensemble optimization problem and the continuous multiobjective optimization algorithm. 4) We propose a clustering ensemble optimization. To guide the multiobjective ensemble optimization process, three cluster metrics are embedded into CEMP as objective functions in which the final clustering will be dynamically evaluated. Experiments on 9 real scRNA-seq datasets indicated that CEMP had superior performance over several other clustering algorithms in clustering accuracy and robustness. The case study conducted on mouse neuronal cells identified main cell types and cell subtypes successfully.
|
8
|
Sengupta S, Das S, Crespo AC, Cornel AM, Patel AG, Mahadevan NR, Campisi M, Ali AK, Sharma B, Rowe JH, Huang H, Debruyne DN, Cerda ED, Krajewska M, Dries R, Chen M, Zhang S, Soriano L, Cohen MA, Versteeg R, Jaenisch R, Spranger S, Romee R, Miller BC, Barbie DA, Nierkens S, Dyer MA, Lieberman J, George RE. Mesenchymal and adrenergic cell lineage states in neuroblastoma possess distinct immunogenic phenotypes. Nature Cancer 2022; 3:1228-1246. [PMID: 36138189 PMCID: PMC10171398 DOI: 10.1038/s43018-022-00427-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 01/11/2021] [Accepted: 07/20/2022] [Indexed: 11/08/2022]
Abstract
Apart from the anti-GD2 antibody, immunotherapy for neuroblastoma has had limited success due to immune evasion mechanisms, coupled with an incomplete understanding of predictors of response. Here, from bulk and single-cell transcriptomic analyses, we identify a subset of neuroblastomas enriched for transcripts associated with immune activation and inhibition and show that these are predominantly characterized by gene expression signatures of the mesenchymal lineage state. By contrast, tumors expressing adrenergic lineage signatures are less immunogenic. The inherent presence or induction of the mesenchymal state through transcriptional reprogramming or therapy resistance is accompanied by innate and adaptive immune gene activation through epigenetic remodeling. Mesenchymal lineage cells promote T cell infiltration by secreting inflammatory cytokines, are efficiently targeted by cytotoxic T and natural killer cells and respond to immune checkpoint blockade. Together, we demonstrate that distinct immunogenic phenotypes define the divergent lineage states of neuroblastoma and highlight the immunogenic potential of the mesenchymal lineage.
Affiliation(s)
- Satyaki Sengupta
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Sanjukta Das
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Angela C Crespo
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA, USA
| | - Annelisa M Cornel
- Center for Translational Immunology, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Princess Máxima Center for Pediatric Oncology, Utrecht University, Utrecht, The Netherlands
| | - Anand G Patel
- Department of Developmental Neurobiology, St Jude Children's Research Hospital, Memphis, TN, USA
- Department of Oncology, St Jude Children's Research Hospital, Memphis, TN, USA
| | - Navin R Mahadevan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
| | - Marco Campisi
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Alaa K Ali
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Cellular Therapy and Stem Cell Transplant Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Bandana Sharma
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Jared H Rowe
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Hao Huang
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - David N Debruyne
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Esther D Cerda
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Malgorzata Krajewska
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Ruben Dries
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Minyue Chen
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Shupei Zhang
- Whitehead Institute for Biomedical Research, Cambridge, MA, USA
| | - Luigi Soriano
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Malkiel A Cohen
- Whitehead Institute for Biomedical Research, Cambridge, MA, USA
| | - Rogier Versteeg
- Department of Oncogenomics, University Medical Center Amsterdam, University of Amsterdam, Amsterdam, The Netherlands
| | - Rudolf Jaenisch
- Whitehead Institute for Biomedical Research, Cambridge, MA, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Stefani Spranger
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Koch Institute for Integrative Cancer Research, Cambridge, MA, USA
| | - Rizwan Romee
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Cellular Therapy and Stem Cell Transplant Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Brian C Miller
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Immunology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Evergrande Center for Immunological Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA, USA
| | - David A Barbie
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Stefan Nierkens
- Center for Translational Immunology, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Princess Máxima Center for Pediatric Oncology, Utrecht University, Utrecht, The Netherlands
| | - Michael A Dyer
- Department of Developmental Neurobiology, St Jude Children's Research Hospital, Memphis, TN, USA
| | - Judy Lieberman
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA, USA
| | - Rani E George
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA.
| |
|
9
|
Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity. Nat Commun 2022; 13:5455. [PMID: 36114209 PMCID: PMC9481560 DOI: 10.1038/s41467-022-33136-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2020] [Accepted: 09/05/2022] [Indexed: 11/30/2022] Open
Abstract
Clustering is a powerful machine learning method for discovering similar patterns according to the proximity of elements in feature space. It is widely used in computer science, bioscience, geoscience, and economics. Although state-of-the-art partition-based and connectivity-based clustering methods have been developed, weak connectivity and heterogeneous density in data impede their effectiveness. In this work, we propose a boundary-seeking Clustering algorithm using the local Direction Centrality (CDC). It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish between internal and boundary points. The boundary points generate enclosed cages to bind the connections of internal points, thereby preventing cross-cluster connections and separating weakly-connected clusters. We demonstrate the validity of CDC by detecting complex structured clusters in challenging synthetic datasets, identifying cell types from single-cell RNA sequencing (scRNA-seq) and mass cytometry (CyTOF) data, recognizing speakers on voice corpuses, and testing on various types of real-world benchmarks. Here the authors propose a local direction centrality clustering algorithm that copes with heterogeneous density and weak connectivity issues.
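To make the idea of a density-independent, KNN-based boundary metric concrete, here is a minimal 2D sketch. The abstract does not define the metric, so the specific form used here (variance of the angular gaps between consecutive neighbour directions) is an assumption, and the function names are hypothetical. The intuition is that neighbours surround an internal point evenly, while the neighbours of a boundary point lie mostly on one side.

```python
import math

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force, Euclidean)."""
    px, py = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (points[j][0] - px) ** 2 + (points[j][1] - py) ** 2)
    return others[:k]

def direction_centrality(points, i, k):
    """Variance of the angular gaps between consecutive KNN directions around points[i].

    Low values: neighbours surround the point evenly (likely internal).
    High values: neighbours sit on one side (likely boundary).
    """
    px, py = points[i]
    angles = sorted(
        math.atan2(points[j][1] - py, points[j][0] - px)
        for j in knn_indices(points, i, k)
    )
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    mean_gap = 2 * math.pi / k  # the k gaps always sum to a full turn
    return sum((g - mean_gap) ** 2 for g in gaps) / k
```

Because the score depends only on directions and not on distances, scaling the local density up or down leaves it unchanged, which is the property the abstract highlights.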
|
10
|
Blair AP, Hu RK, Farah EN, Chi NC, Pollard KS, Przytycki PF, Kathiriya IS, Bruneau BG. Cell Layers: uncovering clustering structure in unsupervised single-cell transcriptomic analysis. Bioinformatics Advances 2022; 2:vbac051. [PMID: 35967929 PMCID: PMC9362878 DOI: 10.1093/bioadv/vbac051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 05/23/2022] [Accepted: 08/01/2022] [Indexed: 11/19/2022]
Abstract
Motivation Unsupervised clustering of single-cell transcriptomics is a powerful method for identifying cell populations. Static visualization techniques for single-cell clustering only display results for a single resolution parameter. Analysts will often evaluate more than one resolution parameter but then only report one. Results We developed Cell Layers, an interactive Sankey tool for the quantitative investigation of gene expression, co-expression, biological processes and cluster integrity across clustering resolutions. Cell Layers enhances the interpretability of single-cell clustering by linking molecular data and cluster evaluation metrics, providing novel insight into cell populations. Availability and implementation https://github.com/apblair/CellLayers.
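The data behind a Sankey view across clustering resolutions can be sketched as follows. This is not the Cell Layers API; `sankey_links` is a hypothetical helper that assumes each cell has one cluster label per resolution and simply counts how many cells move from each cluster at one resolution to each cluster at the next.

```python
from collections import Counter

def sankey_links(labels_low, labels_high):
    """Count cells flowing from each cluster at one resolution to each cluster at the next."""
    if len(labels_low) != len(labels_high):
        raise ValueError("label vectors must cover the same cells")
    flows = Counter(zip(labels_low, labels_high))
    return [
        {"source": s, "target": t, "value": n}
        for (s, t), n in sorted(flows.items())
    ]
```

A cluster that splits cleanly at higher resolution shows up as one source with a few large links, whereas an unstable cluster scatters into many small links, which is the kind of cluster integrity signal a multi-resolution view is meant to expose.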
Affiliation(s)
- Andrew P Blair
- Biological and Medical Informatics Graduate Program, University of California, San Francisco, CA 94143, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Robert K Hu
- Division of Cardiology, Department of Medicine, University of California, San Diego, CA 92093, USA
| | - Elie N Farah
- Division of Cardiology, Department of Medicine, University of California, San Diego, CA 92093, USA
- Biomedical Sciences Graduate Program, University of California, San Diego, CA 92093, USA
| | - Neil C Chi
- Division of Cardiology, Department of Medicine, University of California, San Diego, CA 92093, USA
- Institute for Genomic Medicine, University of California, San Diego, CA 92093, USA
| | - Katherine S Pollard
- Gladstone Institutes, San Francisco, CA 94158, USA
- Chan-Zuckerberg Biohub, San Francisco, CA 94143, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA 94143, USA
- Institute for Human Genetics, University of California, San Francisco, CA 94143, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94143, USA
- Quantitative Biology Institute, University of California, San Francisco, CA 94143, USA
| | | | - Irfan S Kathiriya
- Gladstone Institutes, San Francisco, CA 94158, USA
- Department of Anesthesia and Perioperative Care, University of California, San Francisco, CA 94143, USA
| | - Benoit G Bruneau
- Gladstone Institutes, San Francisco, CA 94158, USA
- Roddenberry Center for Stem Cell Biology and Medicine, Gladstone Institutes, San Francisco, CA 94158, USA
- Cardiovascular Research Institute, University of California, San Francisco, CA 94143, USA
- Department of Pediatrics, University of California, San Francisco, CA 94143, USA
| |
|
11
|
Zhu X, Li J, Lin Y, Zhao L, Wang J, Peng X. Dimensionality Reduction of Single-Cell RNA Sequencing Data by Combining Entropy and Denoising AutoEncoder. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 2022; 29:1074-1084. [PMID: 35834604 DOI: 10.1089/cmb.2022.0118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) can present cellular heterogeneity at higher resolution when measuring the gene expression in an individual cell. However, there are still some computational problems in scRNA-seq data, including high dimensionality, high sparseness, and high noise. To solve them, dimensionality reduction is essential as it reduces dimensions and also removes most of the zeros and noise. Therefore, we propose a hybrid dimensionality reduction algorithm for scRNA-seq data by integrating binning-based entropy and a denoising autoencoder, named ScEDA. In ScEDA, a novel binning-based entropy estimation method is performed to select efficient genes, while removing noise. For each gene, binning-based entropy is designed to describe the differences in its expression across all cells, that is, the distribution of expression of each gene in all cells. Genes are regarded as inefficient and removed when they achieve low binning-based entropy. Moreover, by combining Kullback-Leibler (KL) divergence with the autoencoder, the objective function is reconstructed to maximize the similarity in distribution between input data and reconstructed data. Furthermore, by adding Poisson-distributed noise to the original input data, the denoising autoencoder is used to improve robustness. Compared with three other clustering methods, ScEDA provides superior average performance on 16 real scRNA-seq datasets, with obvious enhancement in large-scale datasets.
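The gene-filtering step can be sketched as below. The abstract does not specify the binning scheme, the number of bins, or the cutoff, so equal-width bins, `n_bins`, and `threshold` are all assumptions, and the function names are hypothetical. The sketch only shows the general idea: a gene whose expression falls into a single bin across cells has zero entropy and is dropped.

```python
import math

def binned_entropy(values, n_bins=10):
    """Shannon entropy (in bits) of one gene's expression after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # constant expression carries no information
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def select_genes(expr_by_gene, threshold, n_bins=10):
    """Keep genes whose binned entropy exceeds the (assumed) threshold."""
    return [
        gene for gene, vals in expr_by_gene.items()
        if binned_entropy(vals, n_bins) > threshold
    ]
```

In practice the threshold would need tuning per dataset, since sparse scRNA-seq matrices push most genes toward low entropy.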
Affiliation(s)
- Xiaoshu Zhu
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Jian Li
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Yongchang Lin
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Liquan Zhao
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Xiaoqing Peng
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
12
scCAN: single-cell clustering using autoencoder and network fusion. Sci Rep 2022; 12:10267. [PMID: 35715568 PMCID: PMC9206025 DOI: 10.1038/s41598-022-14218-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/09/2021] [Accepted: 06/02/2022] [Indexed: 11/30/2022] Open
Abstract
Unsupervised clustering of single-cell RNA sequencing data (scRNA-seq) is important because it allows us to identify putative cell types. However, the large number of cells (up to millions), the high-dimensionality of the data (tens of thousands of genes), and the high dropout rates all present substantial challenges in single-cell analysis. Here we introduce a new method, named single-cell Clustering using Autoencoder and Network fusion (scCAN), that can overcome these challenges to accurately segregate different cell types in large and sparse scRNA-seq data. In an extensive analysis using 28 real scRNA-seq datasets (more than three million cells) and 243 simulated datasets, we validate that scCAN: (1) correctly estimates the number of true cell types, (2) accurately segregates cells of different types, (3) is robust against dropouts, and (4) is fast and memory efficient. We also compare scCAN with CIDR, SEURAT3, Monocle3, SHARP, and SCANPY. scCAN outperforms these state-of-the-art methods in terms of both accuracy and scalability. The scCAN package is available at https://cran.r-project.org/package=scCAN. Data and R scripts are available at http://sccan.tinnguyen-lab.com/
13
Zhu X, Li J, Li HD, Xie M, Wang J. Sc-GPE: A Graph Partitioning-Based Cluster Ensemble Method for Single-Cell. Front Genet 2020; 11:604790. [PMID: 33384718 PMCID: PMC7770236 DOI: 10.3389/fgene.2020.604790] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/10/2020] [Accepted: 11/23/2020] [Indexed: 01/23/2023] Open
Abstract
Clustering is an efficient way to analyze single-cell RNA sequencing data. It is commonly used to identify cell types, which can help in understanding cell differentiation processes. However, different clustering results can be obtained from different single-cell clustering methods, sometimes including conflicting conclusions, and biologists will often fail to get the right clustering results and interpret the biological significance. The cluster ensemble strategy can be an effective solution for the problem. As the graph partitioning-based clustering methods are good at clustering single-cell, we developed Sc-GPE, a novel cluster ensemble method combining five single-cell graph partitioning-based clustering methods. The five methods are SNN-cliq, PhenoGraph, SC3, SSNN-Louvain, and MPGS-Louvain. In Sc-GPE, a consensus matrix is constructed based on the five clustering solutions by calculating the probability that the cell pairs are divided into the same cluster. It solved the problem in the hypergraph-based ensemble approach, including the different cluster labels that were assigned in the individual clustering method, and it was difficult to find the corresponding cluster labels across all methods. Then, to distinguish the different importance of each method in a clustering ensemble, a weighted consensus matrix was constructed by designing an importance score strategy. Finally, hierarchical clustering was performed on the weighted consensus matrix to cluster cells. To evaluate the performance, we compared Sc-GPE with the individual clustering methods and the state-of-the-art SAME-clustering on 12 single-cell RNA-seq datasets. The results show that Sc-GPE obtained the best average performance, and achieved the highest NMI and ARI value in five datasets.
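The consensus-matrix step described in this abstract can be sketched as follows. This is an illustrative reading, not the Sc-GPE code: the abstract does not give the importance score formula, so the `weights` argument stands in for it as a caller-supplied assumption, and the function name `weighted_consensus` is hypothetical.

```python
import numpy as np

def weighted_consensus(labelings, weights=None):
    """Weighted co-clustering matrix from several clustering solutions.

    labelings: array of shape (n_methods, n_cells), one label vector per method.
    weights:   optional per-method importance scores (assumed to be given;
               the paper derives them with its own scoring strategy).
    Entry (i, j) is the weighted fraction of methods that place cells i and j
    in the same cluster.
    """
    labelings = np.asarray(labelings)
    n_methods = labelings.shape[0]
    if weights is None:
        weights = np.ones(n_methods)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so entries lie in [0, 1]
    # same[m, i, j] is True when method m assigns cells i and j the same label.
    same = labelings[:, :, None] == labelings[:, None, :]
    # Weighted sum over the method axis gives an (n_cells, n_cells) matrix.
    return np.tensordot(weights, same.astype(float), axes=1)

# Two toy solutions over three cells; the label IDs are not aligned across methods.
toy_labels = [[0, 0, 1],
              [5, 5, 5]]
toy_equal = weighted_consensus(toy_labels)
toy_weighted = weighted_consensus(toy_labels, weights=[3, 1])
```

Because only pairwise co-membership is compared, the label IDs never need to be matched across methods, which is the difficulty the abstract attributes to hypergraph-based ensembles. A hierarchical clustering could then be run on one minus this matrix as a distance, as the abstract describes for the final step.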
Affiliation(s)
- Xiaoshu Zhu
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China; Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Jian Li
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Hong-Dong Li
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Miao Xie
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Jianxin Wang
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China