1
Dong M, Agrawal K, Fan R, Sefik E, Flavell RA, Kluger Y. Scaling deep identifiable models enables zero-shot characterization of single-cell biological states. bioRxiv 2024:2023.11.11.566161. [PMID: 38014345 PMCID: PMC10680588 DOI: 10.1101/2023.11.11.566161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
How to identify true biological differences across samples while overcoming batch effects has been a persistent challenge in single-cell RNA-seq data analysis, hindering analyses across datasets for transferable biological findings. In this work, we show that scaling up deep identifiable models leads to a surprisingly effective solution for this challenging task. We developed scShift, a deep variational inference framework with theoretical support in disentangling batch-dependent and independent variations. By training the model with compendiums of scRNA-seq atlases, scShift shows remarkable zero-shot capabilities in revealing representations of cell types and biological states in single-cell data while overcoming batch effects. We employed scShift to systematically compare lung fibrosis states across different datasets, tissues and experimental systems. scShift uniquely extrapolates lung fibrosis states to previously unseen post-COVID-19 fibrosis, characterizing universal myeloid-fibrosis signatures, potential repurposing drug targets and fibrosis-associated cell interactions. Evaluations of over 200 trained scShift models demonstrate emergent zero-shot capabilities and a scaling law beyond a transition threshold, with respect to dataset diversity. With its scaling performance on massive single-cell compendiums and exceptional zero-shot capabilities, scShift represents an important advance toward next-generation computational models for single-cell analysis.
2
Oh VKS, Li RW. Wise Roles and Future Visionary Endeavors of Current Emperor: Advancing Dynamic Methods for Longitudinal Microbiome Meta-Omics Data in Personalized and Precision Medicine. Adv Sci (Weinh) 2024; 11:e2400458. [PMID: 39535493 DOI: 10.1002/advs.202400458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 09/16/2024] [Indexed: 11/16/2024]
Abstract
Understanding the etiological complexity of diseases requires identifying biomarkers longitudinally associated with specific phenotypes. Advanced sequencing tools generate dynamic microbiome data, providing insights into microbial community functions and their impact on health. This review aims to explore the current roles and future visionary endeavors of dynamic methods for integrating longitudinal microbiome multi-omics data in personalized and precision medicine. This work seeks to synthesize existing research, propose best practices, and highlight innovative techniques. The development and application of advanced dynamic methods, including the unified analytical frameworks and deep learning tools in artificial intelligence, are critically examined. Aggregating data on microbes, metabolites, genes, and other entities offers profound insights into the interactions among microorganisms, host physiology, and external stimuli. Despite progress, the absence of gold standards for validating analytical protocols and data resources of various longitudinal multi-omics studies remains a significant challenge. The interdependence of workflow steps critically affects overall outcomes. This work provides a comprehensive roadmap for best practices, addressing current challenges with advanced dynamic methods. The review underscores the biological effects of clinical, experimental, and analytical protocol settings on outcomes. Establishing consensus on dynamic microbiome inter-studies and advancing reliable analytical protocols are pivotal for the future of personalized and precision medicine.
Affiliation(s)
- Vera-Khlara S Oh
  - Big Biomedical Data Integration and Statistical Analysis (DIANA) Research Center, Department of Data Science, College of Natural Sciences, Jeju National University, Jeju City, Jeju Do, 63243, South Korea
- Robert W Li
  - United States Department of Agriculture, Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD, 20705, USA
3
Wlosik J, Granjeaud S, Gorvel L, Olive D, Chretien AS. A beginner's guide to supervised analysis for mass cytometry data in cancer biology. Cytometry A 2024; 105:853-869. [PMID: 39486897 DOI: 10.1002/cyto.a.24901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/16/2024] [Accepted: 10/01/2024] [Indexed: 11/04/2024]
Abstract
Mass cytometry enables deep profiling of biological samples at single-cell resolution. This technology is more than relevant in cancer research due to high cellular heterogeneity and complexity. Downstream analysis of high-dimensional datasets increasingly relies on machine learning (ML) to extract clinically relevant information, including supervised algorithms for classification and regression purposes. In cancer research, they are used to develop predictive models that will guide clinical decision making. However, the development of supervised algorithms faces major challenges, such as sufficient validation, before being translated into the clinics. In this work, we provide a framework for the analysis of mass cytometry data with a specific focus on supervised algorithms and practical examples of their applications. We also raise awareness on key issues regarding good practices for researchers curious to implement supervised ML on their mass cytometry data. Finally, we discuss the challenges of supervised ML application to cancer research.
Affiliation(s)
- Julia Wlosik
  - Team 'Immunity and Cancer', Marseille Cancer Research Center, Inserm U1068, CNRS UMR7258, Paoli-Calmettes Institute, Aix-Marseille University UM105, Marseille, France
  - Immunomonitoring Department, Paoli-Calmettes Institute, Marseille, France
- Samuel Granjeaud
  - Systems Biology Platform, Marseille Cancer Research Center, Inserm U1068, CNRS UMR7258, Paoli-Calmettes Institute, Aix-Marseille University UM105, Marseille, France
- Laurent Gorvel
  - Team 'Immunity and Cancer', Marseille Cancer Research Center, Inserm U1068, CNRS UMR7258, Paoli-Calmettes Institute, Aix-Marseille University UM105, Marseille, France
  - Immunomonitoring Department, Paoli-Calmettes Institute, Marseille, France
- Daniel Olive
  - Team 'Immunity and Cancer', Marseille Cancer Research Center, Inserm U1068, CNRS UMR7258, Paoli-Calmettes Institute, Aix-Marseille University UM105, Marseille, France
  - Immunomonitoring Department, Paoli-Calmettes Institute, Marseille, France
- Anne-Sophie Chretien
  - Team 'Immunity and Cancer', Marseille Cancer Research Center, Inserm U1068, CNRS UMR7258, Paoli-Calmettes Institute, Aix-Marseille University UM105, Marseille, France
  - Immunomonitoring Department, Paoli-Calmettes Institute, Marseille, France
4
Rong Z, Song J, Yu Y, Mi L, Qiu M, Song Y, Hou Y. Single-cell mosaic integration and cell state transfer with auto-scaling self-attention mechanism. Brief Bioinform 2024; 25:bbae540. [PMID: 39438079 PMCID: PMC11495875 DOI: 10.1093/bib/bbae540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 09/02/2024] [Accepted: 10/10/2024] [Indexed: 10/25/2024] Open
Abstract
The integration of data from multiple modalities generated by single-cell omics technologies is crucial for accurately identifying cell states. One challenge in comprehending multi-omics data resides in mosaic integration, in which different data modalities are profiled in different subsets of cells, as it requires simultaneous batch effect removal and modality alignment. Here, we develop Multi-omics Mosaic Auto-scaling Attention Variational Inference (mmAAVI), a scalable deep generative model for single-cell mosaic integration. Leveraging auto-scaling self-attention mechanisms, mmAAVI can map arbitrary combinations of omics to the common embedding space. If well-annotated cell states exist, the model can perform semisupervised learning to utilize these existing annotations. We validated the performance of mmAAVI and five other commonly used methods on four benchmark datasets, which vary in cell numbers, omics types, and missing patterns. mmAAVI consistently demonstrated its superiority. We also validated mmAAVI's ability for cell state knowledge transfer, achieving balanced accuracies of 0.82 and 0.97 with less than 1% labeled cells between batches with completely different omics. The full package is available at https://github.com/luyiyun/mmAAVI.
Affiliation(s)
- Zhiwei Rong
  - Department of Biostatistics, School of Public Health, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
- Jiali Song
  - Department of Biostatistics, School of Public Health, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
- Yipei Yu
  - Department of Biostatistics, School of Public Health, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
- Lan Mi
  - Peking University Cancer Hospital, 52 Fucheng Rd., Haidian District, Beijing 100142, China
- ManTang Qiu
  - Department of Thoracic Surgery, Peking University People’s Hospital, No. 11 Xizhimen South Street, Xicheng District, Beijing 100044, China
- Yuqin Song
  - Peking University Cancer Hospital, 52 Fucheng Rd., Haidian District, Beijing 100142, China
- Yan Hou
  - Department of Biostatistics, School of Public Health, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
  - Peking University Cancer Hospital, 52 Fucheng Rd., Haidian District, Beijing 100142, China
  - Peking University Clinical Research Center, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
5
Fisch L, Heming M, Schulte-Mecklenbeck A, Gross CC, Zumdick S, Barkhau C, Emden D, Ernsting J, Leenings R, Sarink K, Winter NR, Dannlowski U, Wiendl H, Hörste GMZ, Hahn T. GateNet: A novel neural network architecture for automated flow cytometry gating. Comput Biol Med 2024; 179:108820. [PMID: 39002319 DOI: 10.1016/j.compbiomed.2024.108820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 06/12/2024] [Accepted: 06/25/2024] [Indexed: 07/15/2024]
Abstract
BACKGROUND AND OBJECTIVE: Flow cytometry is a widely used technique for identifying cell populations in patient-derived fluids, such as peripheral blood (PB) or cerebrospinal fluid (CSF). Despite its ubiquity in research and clinical practice, the process of gating, i.e., manually identifying cell types, is labor-intensive and error-prone. The objective of this study is to address this challenge by introducing GateNet, a neural network architecture designed for fully end-to-end automated gating without the need for correcting batch effects. METHODS: This study uses a unique dataset comprising over 8,000,000 events from N = 127 PB and CSF samples, which were manually labeled independently by four experts. Applying cross-validation, the classification performance of GateNet is compared to the human experts' performance. Additionally, GateNet is applied to a publicly available dataset to evaluate generalization. The classification performance is measured using the F1 score. RESULTS: GateNet achieves F1 scores ranging from 0.910 to 0.997, demonstrating human-level performance on samples unseen during training. In the publicly available dataset, GateNet confirms its generalization capabilities with an F1 score of 0.936. Importantly, we also show that GateNet only requires ≈10 samples to reach human-level performance. Finally, gating with GateNet only takes 15 microseconds per event utilizing graphics processing units (GPU). CONCLUSIONS: GateNet enables fully end-to-end automated gating in flow cytometry, overcoming the labor-intensive and error-prone nature of manual adjustments. The neural network achieves human-level performance on unseen samples and generalizes well to diverse datasets. Notably, its data efficiency, requiring only ≈10 samples to reach human-level performance, positions GateNet as a widely applicable tool across various domains of flow cytometry.
Affiliation(s)
- Lukas Fisch
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany
- Michael Heming
  - Department of Neurology with Institute of Translational Neurology, University and University Hospital Münster, Münster, Germany
- Andreas Schulte-Mecklenbeck
  - Department of Neurology with Institute of Translational Neurology, University and University Hospital Münster, Münster, Germany
- Catharina C Gross
  - Department of Neurology with Institute of Translational Neurology, University and University Hospital Münster, Münster, Germany
- Stefan Zumdick
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany
- Carlotta Barkhau
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany
- Daniel Emden
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany
- Jan Ernsting
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany; Institute for Geoinformatics, University of Münster, Germany; Faculty of Mathematics and Computer Science, University of Münster, Germany
- Ramona Leenings
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany
- Kelvin Sarink
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany
- Nils R Winter
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany
- Udo Dannlowski
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany
- Heinz Wiendl
  - Department of Neurology with Institute of Translational Neurology, University and University Hospital Münster, Münster, Germany
- Gerd Meyer Zu Hörste
  - Department of Neurology with Institute of Translational Neurology, University and University Hospital Münster, Münster, Germany
- Tim Hahn
  - University of Münster, Institute for Translational Psychiatry, Münster, Germany
6
Li Y, Lin Y, Hu P, Peng D, Luo H, Peng X. Single-Cell RNA-Seq Debiased Clustering via Batch Effect Disentanglement. IEEE Trans Neural Netw Learn Syst 2024; 35:11371-11381. [PMID: 37030864 DOI: 10.1109/tnnls.2023.3260003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
A variety of single-cell RNA-seq (scRNA-seq) clustering methods have achieved great success in discovering cellular phenotypes. However, clustering remains challenging when the data are confounded by batch effects arising from different experimental conditions or technologies; that is, the data partitions become biased toward these nonbiological factors. Meanwhile, the batch differences are not always much smaller than true biological variations, which hinders the combined use of batch integration and clustering methods. To overcome this challenge, we propose single-cell RNA-seq debiased clustering (SCDC), an end-to-end clustering method that is debiased with respect to batch effects by disentangling the biological and nonbiological information from scRNA-seq data during data partitioning. In six analyses, SCDC qualitatively and quantitatively outperforms both the state-of-the-art clustering and batch integration methods in handling scRNA-seq data with batch effects. Furthermore, SCDC clusters data with a running time that increases linearly with cell numbers and with fixed graphics processing unit (GPU) memory consumption, making it scalable to large datasets. The code will be released on GitHub.
7
Ma Y, Pei Y. NDMNN: A novel deep residual network based MNN method to remove batch effects from scRNA-seq data. J Bioinform Comput Biol 2024; 22:2450015. [PMID: 39036845 DOI: 10.1142/s021972002450015x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2024]
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology has generated vast amounts of data. However, these data often exhibit batch effects due to various factors such as different time points, experimental personnel, and instruments used, which can obscure the biological differences in the data itself. Based on the characteristics of scRNA-seq data, we designed a dense deep residual network model, referred to as NDnetwork. Subsequently, we combined the NDnetwork model with the MNN method to correct batch effects in scRNA-seq data, and named it the NDMNN method. Comprehensive experimental results demonstrate that the NDMNN method outperforms existing commonly used methods for correcting batch effects in scRNA-seq data. As the scale of single-cell sequencing continues to expand, we believe that NDMNN will be a valuable tool for researchers in the biological community for correcting batch effects in their studies. The source code and experimental results of the NDMNN method can be found at https://github.com/mustang-hub/NDMNN.
Affiliation(s)
- Yupeng Ma
  - Software Engineering, Tiangong University, Tianjin, P. R. China
- Yongzhen Pei
  - School of Mathematical Sciences, Tiangong University, Tianjin, P. R. China
8
Zinati Y, Takiddeen A, Emad A. GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks. Nat Commun 2024; 15:4055. [PMID: 38744843 PMCID: PMC11525796 DOI: 10.1038/s41467-024-48516-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 05/01/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.
Affiliation(s)
- Yazdan Zinati
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Abdulrahman Takiddeen
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
  - Mila, Quebec AI Institute, Montreal, QC, Canada
  - The Rosalind and Morris Goodman Cancer Institute, Montreal, QC, Canada
9
Zhang X. Highly Effective Batch Effect Correction Method for RNA-seq Count Data. bioRxiv 2024:2024.05.02.592266. [PMID: 38746101 PMCID: PMC11092589 DOI: 10.1101/2024.05.02.592266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
RNA sequencing (RNA-seq) has become a cornerstone in transcriptomics, offering detailed insights into gene expression across diverse biological conditions and sample types. However, RNA-seq data often suffer from batch effects, which are systematic non-biological differences that compromise data reliability and obscure true biological variation. To address these challenges, we introduce ComBat-ref, a refined method of batch effect correction that enhances the statistical power and reliability of differential expression analysis in RNA-seq data. Building on the foundations of ComBat-seq, ComBat-ref employs a negative binomial model to adjust count data but innovates by using a pooled dispersion parameter for entire batches and preserving count data for the reference batch. Our method demonstrated superior performance in both simulated environments and real datasets, such as the growth factor receptor network (GFRN) data and NASA GeneLab transcriptomic datasets, significantly improving sensitivity and specificity over existing methods. By effectively mitigating batch effects while maintaining high detection power, ComBat-ref proves to be a robust tool for enhancing the accuracy and interpretability of RNA-seq data analyses.
10
Liu R, Qian K, He X, Li H. Integration of scRNA-seq data by disentangled representation learning with condition domain adaptation. BMC Bioinformatics 2024; 25:116. [PMID: 38493095 PMCID: PMC10944609 DOI: 10.1186/s12859-024-05706-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 02/15/2024] [Indexed: 03/18/2024] Open
Abstract
BACKGROUND: The integration of single-cell RNA sequencing data from multiple experimental batches and diverse biological conditions holds significant importance in the study of cellular heterogeneity. RESULTS: To expedite the exploration of systematic disparities under various biological contexts, we propose an scRNA-seq integration method called scDisco, which involves a domain-adaptive decoupling representation learning strategy for the integration of dissimilar single-cell RNA data. It constructs a condition-specific domain-adaptive network founded on variational autoencoders. scDisco not only effectively reduces batch effects but also successfully disentangles biological effects and condition-specific effects, and it further augments condition-specific representations through condition-specific Domain-Specific Batch Normalization layers. This enhancement enables the identification of genes specific to particular conditions. The effectiveness and robustness of scDisco as an integration method were analyzed using both simulated and real datasets, and the results demonstrate that scDisco can yield high-quality visualizations and quantitative outcomes. Furthermore, scDisco has been validated using real datasets, affirming its proficiency in cell clustering quality, retaining batch-specific cell types and identifying condition-specific genes. CONCLUSION: scDisco is an effective integration method based on variational autoencoders, which improves analytical tasks of reducing batch effects, cell clustering, retaining batch-specific cell types and identifying condition-specific genes.
Affiliation(s)
- Renjing Liu
  - School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Kun Qian
  - School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Xinwei He
  - School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Hongwei Li
  - School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan, 430074, China
11
Monnier L, Cournède PH. A novel batch-effect correction method for scRNA-seq data based on Adversarial Information Factorization. PLoS Comput Biol 2024; 20:e1011880. [PMID: 38386700 PMCID: PMC10914288 DOI: 10.1371/journal.pcbi.1011880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 03/05/2024] [Accepted: 01/30/2024] [Indexed: 02/24/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technology offers unprecedented resolution at the level of individual cells, raising great hopes in medicine. Nevertheless, scRNA-seq data suffer from high variations due to the experimental conditions, called batch effects, preventing any aggregated downstream analysis. Adversarial Information Factorization provides a robust batch-effect correction method that does not rely on prior knowledge of the cell types or on a specific normalization strategy while being adapted to any downstream analysis task. It compares to and even outperforms state-of-the-art methods in several scenarios: low signal-to-noise ratio, batch-specific cell types with few cells, and a multi-batch dataset with imbalanced batches and batch-specific cell types. Moreover, it best preserves the relative gene expression between cell types, yielding superior differential expression analysis results. Finally, in a more complex setting of a Leukemia cohort, our method preserved most of the underlying biological information for each patient while aligning the batches, improving the clustering metrics in the aggregated dataset.
Affiliation(s)
- Lily Monnier
  - Paris-Saclay University, CentraleSupélec, Laboratory of Mathematics and Computer Science (MICS), Gif-sur-Yvette, France
- Paul-Henry Cournède
  - Paris-Saclay University, CentraleSupélec, Laboratory of Mathematics and Computer Science (MICS), Gif-sur-Yvette, France
12
Ye F, Wang J, Li J, Mei Y, Guo G. Mapping Cell Atlases at the Single-Cell Level. Adv Sci (Weinh) 2024; 11:e2305449. [PMID: 38145338 PMCID: PMC10885669 DOI: 10.1002/advs.202305449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 12/01/2023] [Indexed: 12/26/2023]
Abstract
Recent advancements in single-cell technologies have led to rapid developments in the construction of cell atlases. These atlases have the potential to provide detailed information about every cell type in different organisms, enabling the characterization of cellular diversity at the single-cell level. Global efforts in developing comprehensive cell atlases have profound implications for both basic research and clinical applications. This review provides a broad overview of the cellular diversity and dynamics across various biological systems. In addition, the incorporation of machine learning techniques into cell atlas analyses opens up exciting prospects for the field of integrative biology.
Affiliation(s)
- Fang Ye
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, Zhejiang 311121, China
- Jingjing Wang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, Zhejiang 311121, China
- Jiaqi Li
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
- Yuqing Mei
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
- Guoji Guo
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, Zhejiang 311121, China
  - Zhejiang Provincial Key Lab for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, Zhejiang 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, Zhejiang 310000, China
13
Yang Y, Wang K, Lu Z, Wang T, Wang X. Cytomulate: accurate and efficient simulation of CyTOF data. Genome Biol 2023; 24:262. [PMID: 37974276 PMCID: PMC10652542 DOI: 10.1186/s13059-023-03099-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 10/24/2023] [Indexed: 11/19/2023] Open
Abstract
Recently, many analysis tools have been devised to offer insights into data generated via cytometry by time-of-flight (CyTOF). However, objective evaluations of these methods remain absent as most evaluations are conducted against real data where the ground truth is generally unknown. In this paper, we develop Cytomulate, a reproducible and accurate simulation algorithm of CyTOF data, which could serve as a foundation for future method development and evaluation. We demonstrate that Cytomulate can capture various characteristics of CyTOF data and is better at learning overall data distributions than single-cell RNA-seq-oriented methods such as scDesign2 and Splatter, and generative models like LAMBDA.
Affiliation(s)
- Yuqiu Yang
  - Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Kaiwen Wang
  - Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA
- Zeyu Lu
  - Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Tao Wang
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Xinlei Wang
  - Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA
  - Department of Mathematics, University of Texas at Arlington, Arlington, 76019, USA
  - Center for Data Science Research and Education, College of Science, University of Texas at Arlington, Arlington, 76019, USA
14
|
Erfanian N, Heydari AA, Feriz AM, Iañez P, Derakhshani A, Ghasemigol M, Farahpour M, Razavi SM, Nasseri S, Safarpour H, Sahebkar A. Deep learning applications in single-cell genomics and transcriptomics data analysis. Biomed Pharmacother 2023; 165:115077. [PMID: 37393865 DOI: 10.1016/j.biopha.2023.115077] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/24/2023] [Revised: 06/22/2023] [Accepted: 06/23/2023] [Indexed: 07/04/2023] Open
Abstract
Traditional bulk sequencing methods are limited to measuring the average signal in a group of cells, potentially masking heterogeneity and rare populations. Single-cell resolution, however, enhances our understanding of complex biological systems and diseases, such as cancer, the immune system, and chronic diseases. At the same time, single-cell technologies generate massive amounts of data that are often high-dimensional, sparse, and complex, making analysis with traditional computational approaches difficult or infeasible. To tackle these challenges, many are turning to deep learning (DL) methods as potential alternatives to the conventional machine learning (ML) algorithms for single-cell studies. DL is a branch of ML capable of extracting high-level features from raw inputs in multiple stages. Compared to traditional ML, DL models have provided significant improvements across many domains and applications. In this work, we examine DL applications in genomics, transcriptomics, spatial transcriptomics, and multi-omics integration, and address whether DL techniques will prove to be advantageous or if the single-cell omics domain poses unique challenges. Through a systematic literature review, we have found that DL has not yet revolutionized the most pressing challenges of the single-cell omics field. However, using DL models for single-cell omics has shown promising results (in many cases outperforming the previous state-of-the-art models) in data preprocessing and downstream analysis. Although developments of DL algorithms for single-cell omics have generally been gradual, recent advances reveal that DL can offer valuable resources in fast-tracking and advancing single-cell research.
Affiliation(s)
- Nafiseh Erfanian
  - Student Research Committee, Birjand University of Medical Sciences, Birjand, Iran
- A Ali Heydari
  - Department of Applied Mathematics, University of California, Merced, CA, USA; Health Sciences Research Institute, University of California, Merced, CA, USA
- Adib Miraki Feriz
  - Student Research Committee, Birjand University of Medical Sciences, Birjand, Iran
- Pablo Iañez
  - Cellular Systems Genomics Group, Josep Carreras Research Institute, Barcelona, Spain
- Afshin Derakhshani
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada
- Mohsen Farahpour
  - Department of Electronics, Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Seyyed Mohammad Razavi
  - Department of Electronics, Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Saeed Nasseri
  - Cellular and Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Hossein Safarpour
  - Cellular and Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Amirhossein Sahebkar
  - Biotechnology Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Applied Biomedical Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Biotechnology, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
|
15
|
Zhang J, Li J, Lin L. Statistical and machine learning methods for immunoprofiling based on single-cell data. Hum Vaccin Immunother 2023:2234792. [PMID: 37485833 PMCID: PMC10373621 DOI: 10.1080/21645515.2023.2234792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2023] [Revised: 06/30/2023] [Accepted: 07/04/2023] [Indexed: 07/25/2023] Open
Abstract
Immunoprofiling has become a crucial tool for understanding the complex interactions between the immune system and diseases or interventions, such as therapies and vaccinations. Immune response biomarkers are critical for understanding those relationships and potentially developing personalized intervention strategies. Single-cell data have emerged as a promising source for identifying immune response biomarkers. In this review, we discuss the current state-of-the-art methods for immunoprofiling, including those for reducing the dimensionality of high-dimensional single-cell data and methods for clustering, classification, and prediction. We also draw attention to recent developments in data integration.
Affiliation(s)
- Jingxuan Zhang
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Jia Li
  - Department of Statistics, Pennsylvania State University, University Park, PA, USA
- Lin Lin
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
|
16
|
Fallahzadeh R, Bidoki NH, Stelzer IA, Becker M, Marić I, Chang AL, Culos A, Phongpreecha T, Xenochristou M, Francesco DD, Espinosa C, Berson E, Verdonk F, Angst MS, Gaudilliere B, Aghaeepour N. In-silico generation of high-dimensional immune response data in patients using a deep neural network. Cytometry A 2023; 103:392-404. [PMID: 36507780 PMCID: PMC10182197 DOI: 10.1002/cyto.a.24709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 10/14/2022] [Accepted: 11/29/2022] [Indexed: 12/15/2022]
Abstract
Technologies for single-cell profiling of the immune system have enabled researchers to extract rich interconnected networks of cellular abundance, phenotypical and functional cellular parameters. These studies can power machine learning approaches to understand the role of the immune system in various diseases. However, the performance of these approaches and the generalizability of the findings have been hindered by limited cohort sizes in translational studies, partially due to logistical demands and costs associated with longitudinal data collection in sufficiently large patient cohorts. An evolving challenge is the requirement for ever-increasing cohort sizes as the dimensionality of datasets grows. We propose a deep learning model derived from a novel pipeline of optimal temporal cell matching and overcomplete autoencoders that uses data from a small subset of patients to learn to forecast an entire patient's immune response in a high dimensional space from one timepoint to another. In our analysis of 1.08 million cells from patients pre- and post-surgical intervention, we demonstrate that the generated patient-specific data are qualitatively and quantitatively similar to real patient data by demonstrating fidelity, diversity, and usefulness.
Affiliation(s)
- Ramin Fallahzadeh
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Neda H. Bidoki
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Ina A. Stelzer
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
- Martin Becker
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Ivana Marić
  - Department of Pediatrics, Stanford University, Stanford, California, USA
- Alan L. Chang
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Anthony Culos
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Thanaphong Phongpreecha
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
  - Department of Pathology, Stanford University, Stanford, California, USA
- Maria Xenochristou
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Davide De Francesco
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Camilo Espinosa
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Eloise Berson
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Franck Verdonk
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
- Martin S. Angst
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
- Brice Gaudilliere
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Pediatrics, Stanford University, Stanford, California, USA
- Nima Aghaeepour
  - Department of Anesthesiology, Pain and Perioperative Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
  - Department of Pediatrics, Stanford University, Stanford, California, USA
|
17
|
Paudyal R, Shah AD, Akin O, Do RKG, Konar AS, Hatzoglou V, Mahmood U, Lee N, Wong RJ, Banerjee S, Shin J, Veeraraghavan H, Shukla-Dave A. Artificial Intelligence in CT and MR Imaging for Oncological Applications. Cancers (Basel) 2023; 15:cancers15092573. [PMID: 37174039 PMCID: PMC10177423 DOI: 10.3390/cancers15092573] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/31/2023] [Revised: 04/13/2023] [Accepted: 04/17/2023] [Indexed: 05/15/2023] Open
Abstract
Cancer care increasingly relies on imaging for patient management. The two most common cross-sectional imaging modalities in oncology are computed tomography (CT) and magnetic resonance imaging (MRI), which provide high-resolution anatomic and physiological imaging. Herewith is a summary of recent applications of rapidly advancing artificial intelligence (AI) in CT and MRI oncological imaging that addresses the benefits and challenges of the resultant opportunities with examples. Major challenges remain, such as how best to integrate AI developments into clinical radiology practice, the vigorous assessment of quantitative CT and MR imaging data accuracy, and reliability for clinical utility and research integrity in oncology. Such challenges necessitate an evaluation of the robustness of imaging biomarkers to be included in AI developments, a culture of data sharing, and the cooperation of knowledgeable academics with vendor scientists and companies operating in radiology and oncology fields. Herein, we will illustrate a few challenges and solutions of these efforts using novel methods for synthesizing different contrast modality images, auto-segmentation, and image reconstruction with examples from lung CT as well as abdomen, pelvis, and head and neck MRI. The imaging community must embrace the need for quantitative CT and MRI metrics beyond lesion size measurement. AI methods for the extraction and longitudinal tracking of imaging metrics from registered lesions and understanding the tumor environment will be invaluable for interpreting disease status and treatment efficacy. This is an exciting time to work together to move the imaging field forward with narrow AI-specific tasks. New AI developments using CT and MRI datasets will be used to improve the personalized management of cancer patients.
Affiliation(s)
- Ramesh Paudyal
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Akash D Shah
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Oguz Akin
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Richard K G Do
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Amaresha Shridhar Konar
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Vaios Hatzoglou
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Usman Mahmood
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Nancy Lee
  - Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Richard J Wong
  - Head and Neck Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Harini Veeraraghavan
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
- Amita Shukla-Dave
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York City, NY 10065, USA
|
18
|
Yan X, Zheng R, Wu F, Li M. CLAIRE: contrastive learning-based batch correction framework for better balance between batch mixing and preservation of cellular heterogeneity. Bioinformatics 2023; 39:7055295. [PMID: 36821425 PMCID: PMC9985174 DOI: 10.1093/bioinformatics/btad099] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/04/2022] [Revised: 12/27/2022] [Accepted: 02/22/2023] [Indexed: 02/24/2023] Open
Abstract
MOTIVATION: Integration of growing single-cell RNA sequencing datasets helps better understand cellular identity and function. The major challenge for integration is removing batch effects while preserving biological heterogeneities. Advances in contrastive learning have inspired several contrastive learning-based batch correction methods. However, existing contrastive learning-based methods exhibit a noticeable ad hoc trade-off between batch mixing and preservation of cellular heterogeneities (mix-heterogeneity trade-off). Therefore, a deliberate mix-heterogeneity trade-off is expected to yield considerable improvements in scRNA-seq dataset integration. RESULTS: We develop a novel contrastive learning-based batch correction framework, CLAIRE, which achieves a superior mix-heterogeneity trade-off. The key contribution of CLAIRE is the proposal of two complementary strategies, a construction strategy and a refinement strategy, to improve the appropriateness of positive pairs. The construction strategy dynamically generates positive pairs by augmenting inter-batch mutual nearest neighbors (MNN) with intra-batch k-nearest neighbors (KNN), which improves the coverage of positive pairs for the whole distribution of shared cell types between batches. The refinement strategy automatically reduces the potential false positive pairs from the construction strategy by exploiting the memory effect of deep neural networks. We demonstrate that CLAIRE possesses a superior mix-heterogeneity trade-off over existing contrastive learning-based methods. Benchmark results on six real datasets also show that CLAIRE achieves the best integration performance against eight state-of-the-art methods. Finally, comprehensive experiments are conducted to validate the effectiveness of CLAIRE. AVAILABILITY AND IMPLEMENTATION: The source code and data used in this study can be found at https://github.com/CSUBioGroup/CLAIRE-release.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
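The abstract above builds positive pairs from inter-batch mutual nearest neighbors (MNN). As a rough illustration of what an MNN pair is, the following minimal sketch finds cells in two batches that appear in each other's k-nearest-neighbor lists. It is not the CLAIRE implementation: the function name `mutual_nearest_neighbors`, the use of Euclidean distance, and the toy coordinates are all assumptions made for illustration, and the KNN augmentation and refinement steps are not shown.

```python
import numpy as np


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where cell i of batch A and cell j of
    batch B are each among the other's k nearest cross-batch neighbors.

    Hypothetical helper for illustration only; Euclidean distance is an
    assumption, not something stated in the abstract.
    """
    # Pairwise squared Euclidean distances, shape (n_a, n_b).
    diff = batch_a[:, None, :] - batch_b[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    # For each cell in A: indices of its k nearest cells in B.
    a_to_b = np.argsort(dist, axis=1)[:, :k]
    # For each cell in B: indices of its k nearest cells in A.
    b_to_a = np.argsort(dist, axis=0)[:k, :].T
    pairs = []
    for i in range(batch_a.shape[0]):
        for j in a_to_b[i]:
            # Keep the pair only if the relationship is mutual.
            if i in b_to_a[j]:
                pairs.append((i, int(j)))
    return sorted(pairs)


# Toy embeddings: two cells in batch A, three in batch B.
batch_a = np.array([[0.0, 0.0], [10.0, 10.0]])
batch_b = np.array([[0.1, 0.0], [10.0, 10.1], [4.0, 4.0]])
pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
```

In this toy case the third cell of batch B has no mutual partner, which is the kind of coverage gap that augmenting with intra-batch neighbors is described as addressing.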
Affiliation(s)
- Xuhua Yan
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ruiqing Zheng
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Fangxiang Wu
  - Division of Biomedical Engineering, Department of Computer Science, Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
19
|
Dong X, Chowdhury S, Victor U, Li X, Qian L. Semi-Supervised Deep Learning for Cell Type Identification From Single-Cell Transcriptomic Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1492-1505. [PMID: 35536811 DOI: 10.1109/tcbb.2022.3173587] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
Cell type identification from single-cell transcriptomic data is a common goal of single-cell RNA sequencing (scRNAseq) data analysis. Deep neural networks have been employed to identify cell types from scRNAseq data with high performance. However, training the identification models requires a large number of individual cells with accurate and unbiased type annotations. Unfortunately, labeling scRNAseq data is cumbersome and time-consuming, as it involves manual inspection of marker genes. To overcome this challenge, we propose a semi-supervised learning model, "SemiRNet", that uses unlabeled scRNAseq cells and a limited number of labeled scRNAseq cells to perform cell type identification. The proposed model is based on recurrent convolutional neural networks (RCNN) and includes a shared network, a supervised network and an unsupervised network. The proposed model is evaluated on two large-scale single-cell transcriptomic datasets. The proposed model achieves encouraging performance by learning from a very limited number of labeled scRNAseq cells together with a large number of unlabeled scRNAseq cells.
|
20
|
Single-cell RNA sequencing in orthopedic research. Bone Res 2023; 11:10. [PMID: 36828839 PMCID: PMC9958119 DOI: 10.1038/s41413-023-00245-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/29/2022] [Revised: 12/22/2022] [Accepted: 12/29/2022] [Indexed: 02/26/2023] Open
Abstract
Although previous RNA sequencing methods have been widely used in orthopedic research and have provided ideas for therapeutic strategies, the specific mechanisms of some orthopedic disorders, including osteoarthritis, lumbar disc herniation, rheumatoid arthritis, fractures, tendon injuries, spinal cord injury, heterotopic ossification, and osteosarcoma, require further elucidation. The emergence of the single-cell RNA sequencing (scRNA-seq) technique has introduced a new era of research on these topics, as this method provides information regarding cellular heterogeneity, new cell subtypes, functions of novel subclusters, potential molecular mechanisms, cell-fate transitions, and cell‒cell interactions that are involved in the development of orthopedic diseases. Here, we summarize the cell subpopulations, genes, and underlying mechanisms involved in the development of orthopedic diseases identified by scRNA-seq, improving our understanding of the pathology of these diseases and providing new insights into therapeutic approaches.
|
21
|
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023] Open
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps to better understand the current state of the field and to identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas S. Brettin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Austin Clyde
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Jamie Overbeek
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Rick L. Stevens
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
  - Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
22
|
Lu J, Sheng Y, Qian W, Pan M, Zhao X, Ge Q. scRNA-seq data analysis method to improve analysis performance. IET Nanobiotechnol 2023; 17:246-256. [PMID: 36727937 DOI: 10.1049/nbt2.12115] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/23/2022] [Revised: 12/28/2022] [Accepted: 12/30/2022] [Indexed: 02/03/2023] Open
Abstract
With the development of single-cell RNA sequencing (scRNA-seq) technology, we have the ability to study biological questions at the level of the individual cell transcriptome. Many analysis tools specifically suited to single-cell RNA sequencing data have now been developed. In this review, the commonly used scRNA-seq protocols are discussed. The upstream processing pipeline for scRNA-seq data, including goals and popular tools for read mapping and expression quantification, quality control, normalization, imputation, and batch effect removal, is also introduced. Finally, methods to evaluate these tools in both cellular and genetic dimensions, clustering and differential expression analysis are presented.
Affiliation(s)
- Junru Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Yuqi Sheng
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Weiheng Qian
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Min Pan
  - School of Medicine, Southeast University, Nanjing, China
- Xiangwei Zhao
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Qinyu Ge
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
|
23
|
Gan D, Li J. SCIBER: a simple method for removing batch effects from single-cell RNA-sequencing data. Bioinformatics 2023; 39:6957084. [PMID: 36548380 PMCID: PMC9848058 DOI: 10.1093/bioinformatics/btac819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 11/27/2022] [Accepted: 12/21/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Integrative analysis of multiple single-cell RNA-sequencing datasets allows for more comprehensive characterizations of cell types, but systematic technical differences between datasets, known as 'batch effects', need to be removed before integration to avoid misleading interpretation of the data. Although many batch-effect-removal methods have been developed, there is still much room for improvement: most existing methods only give dimension-reduced data instead of expression data of individual genes, are based on computationally demanding models, and are black boxes that are difficult to interpret or tune. RESULTS: Here, we present a new batch-effect-removal method called SCIBER (Single-Cell Integrator and Batch Effect Remover) and study its performance on real datasets. SCIBER matches cell clusters across batches according to the overlap of their differentially expressed genes. As a simple algorithm that scales better to data with a large number of cells and is easy to tune, SCIBER shows comparable and sometimes better accuracy in removing batch effects on real datasets compared to the state-of-the-art methods, which are much more complicated. Moreover, SCIBER outputs expression data in the original space, that is, the expression of individual genes, which can be used directly for downstream analyses. Additionally, SCIBER is a reference-based method, which assigns one of the batches as the reference batch and keeps it untouched during the process, making it especially suitable for integrating user-generated datasets with standard reference data such as the Human Cell Atlas. AVAILABILITY AND IMPLEMENTATION: SCIBER is publicly available as an R package on CRAN: https://cran.r-project.org/web/packages/SCIBER/. A vignette is included in the CRAN R package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
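The abstract above describes matching cell clusters across batches by the overlap of their differentially expressed genes, with one batch held fixed as the reference. The sketch below shows one way such a matching step could look. It is not the SCIBER package: the abstract does not say how overlap is scored, so the Jaccard index, the function names `jaccard` and `match_clusters`, the `min_overlap` threshold, and the example marker lists are all assumptions chosen for illustration.

```python
def jaccard(genes_a, genes_b):
    """Jaccard index between two gene lists (assumed overlap score)."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(ref_markers, query_markers, min_overlap=0.0):
    """Map each query-batch cluster to the reference cluster whose
    marker genes overlap most; None when no overlap exceeds the threshold.

    Hypothetical helper: the reference batch is only read, never altered.
    """
    matches = {}
    for q_name, q_genes in query_markers.items():
        best_name, best_score = None, min_overlap
        for r_name, r_genes in ref_markers.items():
            score = jaccard(q_genes, r_genes)
            if score > best_score:
                best_name, best_score = r_name, score
        matches[q_name] = best_name
    return matches


# Illustrative marker lists (not taken from the paper).
ref_markers = {
    "T": ["CD3D", "CD3E", "IL7R"],
    "B": ["CD79A", "MS4A1", "CD19"],
}
query_markers = {
    "c0": ["MS4A1", "CD79A", "CD74"],
    "c1": ["CD3E", "IL7R", "CCR7"],
    "c2": ["HBB"],
}
matches = match_clusters(ref_markers, query_markers)
```

A query cluster with no shared markers, such as `c2` here, is left unmatched rather than forced onto a reference cluster; how the actual method treats such clusters is not stated in the abstract.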
Affiliation(s)
- Dailin Gan
  - Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
- Jun Li
  - To whom correspondence should be addressed.
|
24
|
He X, Liu X, Zuo F, Shi H, Jing J. Artificial intelligence-based multi-omics analysis fuels cancer precision medicine. Semin Cancer Biol 2023; 88:187-200. [PMID: 36596352 DOI: 10.1016/j.semcancer.2022.12.009] [Citation(s) in RCA: 66] [Impact Index Per Article: 33.0] [Received: 11/17/2022] [Revised: 12/16/2022] [Accepted: 12/29/2022] [Indexed: 01/02/2023]
Abstract
With biotechnological advancements, innovative omics technologies are constantly emerging that have enabled researchers to access multi-layer information from the genome, epigenome, transcriptome, proteome, metabolome, and more. A wealth of omics technologies, including bulk and single-cell omics approaches, have made it possible to characterize different molecular layers at unprecedented scale and resolution, providing a holistic view of tumor behavior. Multi-omics analysis allows systematic interrogation of various molecular information at each biological layer while posing tricky challenges regarding how to extract valuable insights from the exponentially increasing amount of multi-omics data. Therefore, efficient algorithms are needed to reduce the dimensionality of the data while simultaneously dissecting the mysteries behind the complex biological processes of cancer. Artificial intelligence has demonstrated the ability to analyze complementary multi-modal data streams within the oncology realm. The coincident development of multi-omics technologies and artificial intelligence algorithms has fuelled the development of cancer precision medicine. Here, we present state-of-the-art omics technologies and outline a roadmap of multi-omics integration analysis using an artificial intelligence strategy. The advances made using artificial intelligence-based multi-omics approaches are described, especially concerning early cancer screening, diagnosis, response assessment, and prognosis prediction. Finally, we discuss the challenges faced in multi-omics analysis, along with tentative future trends in this field. With the increasing application of artificial intelligence in multi-omics analysis, we anticipate a paradigm shift toward precision medicine driven by artificial intelligence-based multi-omics technologies.
Affiliation(s)
- Xiujing He
  - Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Xiaowei Liu
  - Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Fengli Zuo
  - Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Hubing Shi
  - Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Jing Jing
  - Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
|
25
|
Niu J, Yang J, Guo Y, Qian K, Wang Q. Joint deep learning for batch effect removal and classification toward MALDI MS based metabolomics. BMC Bioinformatics 2022; 23:270. [PMID: 35818047 PMCID: PMC9275160 DOI: 10.1186/s12859-022-04758-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 05/30/2022] [Indexed: 12/02/2022]
Abstract
Background Metabolomics is a primary omics topic, which occupies an important position in both clinical applications and basic research for metabolic signatures and biomarkers. Unfortunately, the relevant studies are challenged by the batch effect caused by many external factors. In the last decade, deep learning has become a dominant tool in data science, such that one may train a diagnosis network on a known batch and then generalize it to a new batch. However, the batch effect inevitably hinders such efforts, as the two batches under consideration can be highly mismatched. Results We propose an end-to-end deep learning framework for joint batch effect removal and classification on metabolomics data. We first validate the proposed deep learning framework on a public CyTOF dataset as a simulated experiment. We also visually compare the t-SNE distributions and demonstrate that our method effectively removes the batch effects in latent space. Then, for a private MALDI MS dataset, we achieve the highest diagnostic accuracy, with an average increase of about 5.1 to 7.9% over state-of-the-art methods. Conclusions Both experiments show that our method performs significantly better in classification than conventional methods, benefitting from the effective removal of the batch effect. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04758-z.
Affiliation(s)
- Jingyang Niu
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
- Jing Yang
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
- Yuyu Guo
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
- Kun Qian
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
- Qian Wang
  - School of Biomedical Engineering, ShanghaiTech University, Shanghai, 201210, China
|
26
|
Wang Y, Liu T, Zhao H. ResPAN: a powerful batch correction model for scRNA-seq data through residual adversarial networks. Bioinformatics 2022; 38:3942-3949. [PMID: 35771600 PMCID: PMC9364370 DOI: 10.1093/bioinformatics/btac427] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/17/2021] [Revised: 06/15/2022] [Accepted: 06/28/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION With the advancement of technology, we can generate and access large-scale, high-dimensional and diverse genomics data, especially through single-cell RNA sequencing (scRNA-seq). However, integrative downstream analysis from multiple scRNA-seq datasets remains challenging due to batch effects. RESULTS In this article, we propose a light-structured deep learning framework called ResPAN for scRNA-seq data integration. ResPAN is based on a Wasserstein Generative Adversarial Network (WGAN) combined with random walk mutual nearest neighbor pairing and fully skip-connected autoencoders to reduce the differences among batches. We also discuss the limitations of existing methods and demonstrate the advantages of our model over seven other methods through extensive benchmarking studies on both simulated data under various scenarios and real datasets across different scales. Our model achieves leading performance on both batch correction and biological information conservation and remains scalable to datasets with over half a million cells. AVAILABILITY AND IMPLEMENTATION An open-source implementation of ResPAN and scripts to reproduce the results can be downloaded from: https://github.com/AprilYuge/ResPAN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
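The pairing step named in this abstract builds on mutual nearest neighbors (MNN). As a rough illustration only, the sketch below finds plain MNN pairs between two small batches in pure Python. It does not include the random walk extension, the WGAN, or the autoencoder that ResPAN combines with the pairing, and the function names and toy coordinates are assumptions made for this example rather than anything taken from the ResPAN code base.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest_neighbors(queries, references, k):
    """For each query point, return the set of indices of its k nearest reference points."""
    result = []
    for q in queries:
        order = sorted(range(len(references)),
                       key=lambda j: squared_distance(q, references[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    each appear among the other's k nearest neighbors across batches."""
    a_to_b = nearest_neighbors(batch_a, batch_b, k)
    b_to_a = nearest_neighbors(batch_b, batch_a, k)
    return [(i, j)
            for i in range(len(batch_a))
            for j in sorted(a_to_b[i])
            if i in b_to_a[j]]


# Hypothetical toy batches: two cells in batch A, three in batch B.
batch_a = [[0.0, 0.0], [5.0, 5.0]]
batch_b = [[0.5, 0.2], [5.3, 4.9], [9.0, 9.0]]
pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(pairs)
```

In this toy case the third cell of batch B is left unpaired: its nearest neighbor in batch A is the second cell, but that cell's own nearest neighbor in batch B is a different cell, so the relationship is not mutual.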
|
27
|
Niu J, Xu W, Wei D, Qian K, Wang Q. Deep Learning Framework for Integrating Multibatch Calibration, Classification, and Pathway Activities. Anal Chem 2022; 94:8937-8946. [PMID: 35709357 DOI: 10.1021/acs.analchem.2c00601] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
The amount of available biological data has exploded since the emergence of high-throughput technologies, which is not only revolutionizing the way we recognize molecules and diseases but also bringing novel analytical challenges to bioinformatics analysis. In recent years, deep learning has become a dominant technique in data science. However, classification accuracy is plagued by domain discrepancy. Notably, in the presence of multiple batches, domain discrepancy typically occurs between individual batches. Most pairwise adaptation approaches may be suboptimal, as they fail to eliminate external factors across multiple batches while simultaneously taking the classification task into account. We propose a joint deep learning framework for integrating batch effect removal, classification, and downstream pathway activities on biological data. To this end, we validate it on two MALDI MS-based metabolomics datasets. We achieve the highest diagnostic accuracy (ACC), with a notable ∼10% improvement over other methods. Overall, these results indicate that our approach removes batch effects more effectively than state-of-the-art methods and yields more accurate classification as well as biomarkers for smart diagnosis.
Affiliation(s)
- JingYang Niu
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- Wei Xu
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- DongMing Wei
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- Kun Qian
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- Qian Wang
  - School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China
|
28
|
Nan Y, Ser JD, Walsh S, Schönlieb C, Roberts M, Selby I, Howard K, Owen J, Neville J, Guiot J, Ernst B, Pastor A, Alberich-Bayarri A, Menzel MI, Walsh S, Vos W, Flerin N, Charbonnier JP, van Rikxoort E, Chatterjee A, Woodruff H, Lambin P, Cerdá-Alberich L, Martí-Bonmatí L, Herrera F, Yang G. Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions. An International Journal on Information Fusion 2022; 82:99-122. [PMID: 35664012 PMCID: PMC8878813 DOI: 10.1016/j.inffus.2022.01.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 10/24/2021] [Revised: 12/22/2021] [Accepted: 01/07/2022] [Indexed: 05/13/2023]
Abstract
Removing the bias and variance of multicentre data has always been a challenge in large scale digital healthcare studies, which requires the ability to integrate clinical features extracted from data acquired by different scanners and protocols to improve stability and robustness. Previous studies have described various computational approaches to fuse single modality multicentre datasets. However, these surveys rarely focused on evaluation metrics and lacked a checklist for computational data harmonisation studies. In this systematic review, we summarise the computational data harmonisation approaches for multi-modality data in the digital healthcare field, including harmonisation strategies and evaluation metrics based on different theories. In addition, a comprehensive checklist that summarises common practices for data harmonisation studies is proposed to guide researchers to report their research findings more effectively. Last but not least, flowcharts presenting possible ways for methodology and metric selection are proposed and the limitations of different methods have been surveyed for future research.
Affiliation(s)
- Yang Nan
  - National Heart and Lung Institute, Imperial College London, London, UK
- Javier Del Ser
  - Department of Communications Engineering, University of the Basque Country UPV/EHU, Bilbao 48013, Spain
  - TECNALIA, Basque Research and Technology Alliance (BRTA), Derio 48160, Spain
- Simon Walsh
  - National Heart and Lung Institute, Imperial College London, London, UK
- Carola Schönlieb
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Michael Roberts
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
  - Oncology R&D, AstraZeneca, Cambridge, UK
- Ian Selby
  - Department of Radiology, University of Cambridge, Cambridge, UK
- Kit Howard
  - Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
- John Owen
  - Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
- Jon Neville
  - Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
- Julien Guiot
  - University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
  - University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
- Benoit Ernst
  - University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
  - University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
- Marion I. Menzel
  - Technische Hochschule Ingolstadt, Ingolstadt, Germany
  - GE Healthcare GmbH, Munich, Germany
- Sean Walsh
  - Radiomics (Oncoradiomics SA), Liège, Belgium
- Wim Vos
  - Radiomics (Oncoradiomics SA), Liège, Belgium
- Nina Flerin
  - Radiomics (Oncoradiomics SA), Liège, Belgium
- Avishek Chatterjee
  - Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
- Henry Woodruff
  - Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
- Philippe Lambin
  - Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
- Leonor Cerdá-Alberich
  - Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
- Luis Martí-Bonmatí
  - Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
- Francisco Herrera
  - Department of Computer Sciences and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Guang Yang
  - National Heart and Lung Institute, Imperial College London, London, UK
  - Cardiovascular Research Centre, Royal Brompton Hospital, London, UK
  - School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK
|
29
|
Han Y, Wang D, Peng L, Huang T, He X, Wang J, Ou C. Single-cell sequencing: a promising approach for uncovering the mechanisms of tumor metastasis. J Hematol Oncol 2022; 15:59. [PMID: 35549970 PMCID: PMC9096771 DOI: 10.1186/s13045-022-01280-w] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 02/22/2022] [Accepted: 04/28/2022] [Indexed: 02/08/2023]
Abstract
Single-cell sequencing (SCS) is an emerging high-throughput technology that can be used to study the genomics, transcriptomics, and epigenetics at a single cell level. SCS is widely used in the diagnosis and treatment of various diseases, including cancer. Over the years, SCS has gradually become an effective clinical tool for the exploration of tumor metastasis mechanisms and the development of treatment strategies. Currently, SCS can be used not only to analyze metastasis-related malignant biological characteristics, such as tumor heterogeneity, drug resistance, and microenvironment, but also to construct metastasis-related cell maps for predicting and monitoring the dynamics of metastasis. SCS is also used to identify therapeutic targets related to metastasis as it provides insights into the distribution of tumor cell subsets and gene expression differences between primary and metastatic tumors. Additionally, SCS techniques in combination with artificial intelligence (AI) are used in liquid biopsy to identify circulating tumor cells (CTCs), thereby providing a novel strategy for treating tumor metastasis. In this review, we summarize the potential applications of SCS in the field of tumor metastasis and discuss the prospects and limitations of SCS to provide a theoretical basis for finding therapeutic targets and mechanisms of metastasis.
Affiliation(s)
- Yingying Han
  - Department of Pathology, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, China
- Dan Wang
  - Department of Pathology, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, China
- Lushan Peng
  - Department of Pathology, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, China
- Tao Huang
  - Department of Pathology, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, China
- Xiaoyun He
  - Departments of Ultrasound Imaging, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, China
- Junpu Wang
  - Department of Pathology, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, China
  - Department of Pathology, School of Basic Medicine, Central South University, Changsha, 410031, Hunan, China
  - Key Laboratory of Hunan Province in Neurodegenerative Disorders, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, China
- Chunlin Ou
  - Department of Pathology, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
|
30
|
Han W, Li L. Evaluating and minimizing batch effects in metabolomics. Mass Spectrometry Reviews 2022; 41:421-442. [PMID: 33238061 DOI: 10.1002/mas.21672] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 07/22/2020] [Revised: 10/27/2020] [Accepted: 10/29/2020] [Indexed: 06/11/2023]
Abstract
Determining metabolomic differences among samples of different phenotypes is a critical component of metabolomics research. With the rapid advances in analytical tools such as ultrahigh-resolution chromatography and mass spectrometry, an increasing number of metabolites can now be profiled with high quantification accuracy. The increased detectability and accuracy raise the level of stringency required to reduce or control any experimental artifacts that can interfere with the measurement of phenotype-related metabolome changes. One of these artifacts is the batch effect, which can be caused by multiple sources. In this review, we discuss the origins of batch effects, approaches to detect interbatch variations, and methods to correct unwanted data variability due to batch effects. We recognize that minimizing batch effects is currently an active research area, yet a very challenging task from both experimental and data processing perspectives. Thus, we try to be critical in describing the performance of a reported method with the hope of stimulating further studies for improving existing methods or developing new methods.
Affiliation(s)
- Wei Han
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
- Liang Li
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
|
31
|
Pedersen CB, Dam SH, Barnkob MB, Leipold MD, Purroy N, Rassenti LZ, Kipps TJ, Nguyen J, Lederer JA, Gohil SH, Wu CJ, Olsen LR. cyCombine allows for robust integration of single-cell cytometry datasets within and across technologies. Nat Commun 2022; 13:1698. [PMID: 35361793 PMCID: PMC8971492 DOI: 10.1038/s41467-022-29383-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 06/29/2021] [Accepted: 03/14/2022] [Indexed: 12/21/2022]
Abstract
Combining single-cell cytometry datasets increases the analytical flexibility and the statistical power of data analyses. However, in many cases the full potential of co-analyses is not reached due to technical variance between data from different experimental batches. Here, we present cyCombine, a method to robustly integrate cytometry data from different batches, experiments, or even different experimental techniques, such as CITE-seq, flow cytometry, and mass cytometry. We demonstrate that cyCombine maintains the biological variance and the structure of the data, while minimizing the technical variance between datasets. cyCombine does not require technical replicates across datasets, and computation time scales linearly with the number of cells, allowing for integration of massive datasets. Robust, accurate, and scalable integration of cytometry data enables integration of multiple datasets for primary data analyses and the validation of results using public datasets.
Affiliation(s)
- Christina Bligaard Pedersen
  - Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
  - Center for Genomic Medicine, Rigshospitalet-Copenhagen University Hospital, Copenhagen, Denmark
- Søren Helweg Dam
  - Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Mike Bogetofte Barnkob
  - Centre for Cellular Immunotherapy of Haematological Cancer Odense (CITCO), Department of Clinical Immunology, Odense University Hospital, University of Southern Denmark, Odense, Denmark
- Michael D Leipold
  - Human Immune Monitoring Center, Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA, USA
- Noelia Purroy
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - AstraZeneca, Waltham, MA, USA
- Laura Z Rassenti
  - Division of Hematology-Oncology, Department of Medicine, Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Thomas J Kipps
  - Division of Hematology-Oncology, Department of Medicine, Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Jennifer Nguyen
  - Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- James Arthur Lederer
  - Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Satyen Harish Gohil
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Academic Haematology, University College London, London, UK
  - Department of Haematology, University College London Hospitals NHS Trust, London, UK
- Catherine J Wu
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lars Rønn Olsen
  - Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
|
32
|
Zhang R, Luo Y, Ma J, Zhang M, Wang S. scPretrain: multi-task self-supervised learning for cell-type classification. Bioinformatics 2022; 38:1607-1614. [PMID: 34999749 DOI: 10.1093/bioinformatics/btac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/09/2021] [Revised: 12/25/2021] [Accepted: 01/04/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Rapidly generated scRNA-seq datasets enable us to understand cellular differences and the function of each individual cell at single-cell resolution. Cell-type classification, which aims at characterizing and labeling groups of cells according to their gene expression, is one of the most important steps for single-cell analysis. To facilitate the manual curation process, supervised learning methods have been used to automatically classify cells. Most of the existing supervised learning approaches only utilize annotated cells in the training step while ignoring the more abundant unannotated cells. In this article, we proposed scPretrain, a multi-task self-supervised learning approach that jointly considers annotated and unannotated cells for cell-type classification. scPretrain consists of a pre-training step and a fine-tuning step. In the pre-training step, scPretrain uses a multi-task learning framework to train a feature extraction encoder based on each dataset's pseudo-labels, where only unannotated cells are used. In the fine-tuning step, scPretrain fine-tunes this feature extraction encoder using the limited annotated cells in a new dataset. RESULTS We evaluated scPretrain on 60 diverse datasets from different technologies, species and organs, and obtained a significant improvement on both cell-type classification and cell clustering. Moreover, the representations obtained by scPretrain in the pre-training step also enhanced the performance of conventional classifiers, such as random forest, logistic regression and support-vector machines. scPretrain is able to effectively utilize the massive amount of unlabeled data and be applied to annotating increasingly generated scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION The data and code underlying this article are available in scPretrain: Multi-task self-supervised learning for cell type classification, at https://github.com/ruiyi-zhang/scPretrain and https://zenodo.org/record/5802306. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruiyi Zhang
  - School of EECS, Peking University, Beijing, China
- Yunan Luo
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jianzhu Ma
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
  - Department of Biochemistry, Purdue University, West Lafayette, IN, USA
- Ming Zhang
  - School of EECS, Peking University, Beijing, China
- Sheng Wang
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
|
33
|
Jovic D, Liang X, Zeng H, Lin L, Xu F, Luo Y. Single-cell RNA sequencing technologies and applications: A brief overview. Clin Transl Med 2022; 12:e694. [PMID: 35352511 PMCID: PMC8964935 DOI: 10.1002/ctm2.694] [Citation(s) in RCA: 384] [Impact Index Per Article: 128.0] [Received: 08/05/2021] [Revised: 12/09/2021] [Accepted: 12/20/2021] [Indexed: 12/19/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) technology has become the state-of-the-art approach for unravelling the heterogeneity and complexity of RNA transcripts within individual cells, as well as revealing the composition of different cell types and functions within highly organized tissues/organs/organisms. Since its introduction in 2009, studies based on scRNA-seq have provided massive amounts of information across different fields, leading to exciting new discoveries about the composition and interaction of cells within humans, model animals and plants. In this review, we provide a concise overview of the scRNA-seq technology and of the experimental and computational procedures for transforming biological and molecular processes into computational and statistical data. We also provide an explanation of the key technological steps in implementing the technology. We highlight a few examples of how scRNA-seq can provide unique information for better understanding health and diseases. One important application of the scRNA-seq technology is to build a better, high-resolution catalogue of cells in all living organisms, commonly known as an atlas, which is a key resource for better understanding diseases and providing solutions for treating them. While great promise has been demonstrated with the technology in all areas, we further highlight a few remaining challenges to be overcome and its great potential for transforming current protocols in disease diagnosis and treatment.
Affiliation(s)
- Dragomirka Jovic
  - Lars Bolund Institute of Regenerative Medicine, Qingdao‐Europe Advanced Institute for Life Sciences, Qingdao, China
  - BGI‐Shenzhen, Shenzhen, China
- Xue Liang
  - Lars Bolund Institute of Regenerative Medicine, Qingdao‐Europe Advanced Institute for Life Sciences, Qingdao, China
  - BGI‐Shenzhen, Shenzhen, China
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Hua Zeng
  - Nanjing University of Chinese Medicine, Nanjing, China
- Lin Lin
  - Department of Biomedicine, Aarhus University, Aarhus, Denmark
  - Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
- Fengping Xu
  - Lars Bolund Institute of Regenerative Medicine, Qingdao‐Europe Advanced Institute for Life Sciences, Qingdao, China
  - BGI‐Shenzhen, Shenzhen, China
- Yonglun Luo
  - Lars Bolund Institute of Regenerative Medicine, Qingdao‐Europe Advanced Institute for Life Sciences, Qingdao, China
  - BGI‐Shenzhen, Shenzhen, China
  - Department of Biomedicine, Aarhus University, Aarhus, Denmark
  - Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
|
34
|
Lo YC, Keyes TJ, Jager A, Sarno J, Domizi P, Majeti R, Sakamoto KM, Lacayo N, Mullighan CG, Waters J, Sahaf B, Bendall SC, Davis KL. CytofIn enables integrated analysis of public mass cytometry datasets using generalized anchors. Nat Commun 2022; 13:934. [PMID: 35177627 PMCID: PMC8854441 DOI: 10.1038/s41467-022-28484-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/05/2021] [Accepted: 01/27/2022] [Indexed: 11/09/2022]
Abstract
The increasing use of mass cytometry for analyzing clinical samples offers the possibility to perform comparative analyses across public datasets. However, challenges in batch normalization and data integration limit the comparison of datasets not intended to be analyzed together. Here, we present a data integration strategy, CytofIn, using generalized anchors to integrate mass cytometry datasets from the public domain. We show that low-variance controls, such as healthy samples and stable channels, are inherently homogeneous, robust against stimulation, and can serve as generalized anchors for batch correction. Single-cell quantification comparing mass cytometry data from 989 leukemia files pre- and post normalization with CytofIn demonstrates effective batch correction while recapitulating the gold-standard bead normalization. CytofIn integration of public cancer datasets enabled the comparison of immune features across histologies and treatments. We demonstrate the ability to integrate public datasets without necessitating identical control samples or bead standards for fast and robust analysis using CytofIn.
Affiliation(s)
- Yu-Chen Lo
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Timothy J Keyes
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
  - Medical Scientist Training Program, Stanford University School of Medicine, Stanford, CA, USA
- Astraea Jager
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Jolanda Sarno
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Pablo Domizi
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Ravindra Majeti
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Kathleen M Sakamoto
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Norman Lacayo
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Charles G Mullighan
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jeffrey Waters
  - Center for Cancer Cellular Therapy, Cancer Correlative Sciences Unit, Stanford University School of Medicine, Stanford, CA, USA
- Bita Sahaf
  - Center for Cancer Cellular Therapy, Cancer Correlative Sciences Unit, Stanford University School of Medicine, Stanford, CA, USA
- Sean C Bendall
  - Center for Cancer Cellular Therapy, Cancer Correlative Sciences Unit, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Kara L Davis
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
  - Center for Cancer Cellular Therapy, Cancer Correlative Sciences Unit, Stanford University School of Medicine, Stanford, CA, USA
|
35
|
Wang X, Wang J, Zhang H, Huang S, Yin Y. HDMC: a novel deep learning-based framework for removing batch effects in single-cell RNA-seq data. Bioinformatics 2022; 38:1295-1303. [PMID: 34864918 DOI: 10.1093/bioinformatics/btab821] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/09/2021] [Revised: 11/25/2021] [Accepted: 11/30/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION With the development of single-cell RNA sequencing (scRNA-seq) techniques, more and more large-scale gene expression datasets are becoming available. However, to analyze datasets produced by different experiments, batch effects among the datasets must be considered. Although several methods have recently been published to remove batch effects in scRNA-seq data, two problems remain challenging and not completely solved: (i) how to reduce the distribution differences between batches more accurately; and (ii) how to align samples from different batches to recover the cell type clusters. RESULTS We propose a novel deep-learning approach, a hierarchical distribution-matching framework assisted by contrastive learning, to address these two problems. First, we design a hierarchical framework for distribution matching based on a deep autoencoder. This framework employs an adversarial training strategy to match the global distribution of different batches. This provides an improved foundation for further matching the local distributions with a maximum mean discrepancy-based loss. For local matching, we divide the cells in each batch into clusters and develop a contrastive learning mechanism to simultaneously align similar cluster pairs and keep noisy pairs apart from each other. This makes it possible to obtain clusters in which all cells are of the same type (true positives) and to avoid clusters containing cells of different types (false positives). We demonstrate the effectiveness of our method on both simulated and real datasets. Results show that our new method significantly outperforms the state-of-the-art methods and is able to prevent overcorrection. AVAILABILITY AND IMPLEMENTATION The Python code to generate the results and figures in this article is available at https://github.com/zhanglabNKU/HDMC; the data underlying this article are also available at this GitHub repository. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
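For readers unfamiliar with the maximum mean discrepancy (MMD) mentioned in this abstract, the following is a minimal pure-Python sketch of the standard biased estimator of squared MMD with a Gaussian kernel. It is not the HDMC loss itself: the kernel choice, the `gamma` bandwidth, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(batch_a, batch_b, gamma=1.0):
    """Biased estimate of squared MMD between two samples:
    mean k(a, a') + mean k(b, b') - 2 * mean k(a, b)."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(batch_a, batch_a)
            + mean_kernel(batch_b, batch_b)
            - 2.0 * mean_kernel(batch_a, batch_b))


# Hypothetical toy embeddings: batch_2 is batch_1 shifted by 2 in every dimension.
batch_1 = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
batch_2 = [[2.0, 2.1], [2.2, 2.0], [2.1, 2.2]]

mmd_same = mmd_squared(batch_1, batch_1)
mmd_shifted = mmd_squared(batch_1, batch_2)
print(mmd_same, mmd_shifted)
```

A sample compared with itself gives a value of zero, while the shifted batch gives a clearly positive value. A loss built on this quantity is therefore minimized when the two batches have similar distributions in the embedding space.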
Affiliation(s)
- Xiao Wang
- College of Computer Science, Nankai University, 300350 Tianjin, China.,Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, 300350 Tianjin, China
| | - Jia Wang
- College of Computer Science, Nankai University, 300350 Tianjin, China.,Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, 300350 Tianjin, China
| | - Han Zhang
- College of Artificial Intelligence, Nankai University, 300350 Tianjin, China
| | - Shenwei Huang
- College of Computer Science, Nankai University, 300350 Tianjin, China.,Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, 300350 Tianjin, China
| | - Yanbin Yin
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
| |
|
36
|
Automated identification of cell populations in flow cytometry data with transformers. Comput Biol Med 2022; 144:105314. [DOI: 10.1016/j.compbiomed.2022.105314] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/14/2021] [Revised: 02/03/2022] [Accepted: 02/08/2022] [Indexed: 12/13/2022]
|
37
|
Kuret T, Sodin-Šemrl S, Leskošek B, Ferk P. Single Cell RNA Sequencing in Autoimmune Inflammatory Rheumatic Diseases: Current Applications, Challenges and a Step Toward Precision Medicine. Front Med (Lausanne) 2022; 8:822804. [PMID: 35118101 PMCID: PMC8804286 DOI: 10.3389/fmed.2021.822804] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/26/2021] [Accepted: 12/27/2021] [Indexed: 12/11/2022]
Abstract
Single cell RNA sequencing (scRNA-seq) represents a new large scale and high throughput technique allowing analysis of the whole transcriptome at the resolution of an individual cell. It has emerged as an imperative method in life science research, uncovering complex cellular networks and providing indices that will eventually lead to the development of more targeted and personalized therapies. The importance of scRNA-seq has been particularly highlighted through the analysis of complex biological systems, in which cellular heterogeneity is a key aspect, such as the immune system. Autoimmune inflammatory rheumatic diseases represent a group of disorders, associated with a dysregulated immune system and high patient heterogeneity in both pathophysiological and clinical aspects. This complicates the complete understanding of underlying pathological mechanisms, associated with limited therapeutic options available and their long-term inefficiency and even toxicity. There is an unmet need to investigate, in depth, the cellular and molecular mechanisms driving the pathogenesis of rheumatic diseases and drug resistance, identify novel therapeutic targets, as well as make a step forward in using stratified and informed therapeutic decisions, which could now be achieved with the use of single cell approaches. This review summarizes the current use of scRNA-seq in studying different rheumatic diseases, based on recent findings from published in vitro, in vivo, and clinical studies, as well as discusses the potential implementation of scRNA-seq in the development of precision medicine in rheumatology.
Affiliation(s)
- Tadeja Kuret
- Faculty of Medicine, Institute of Cell Biology, University of Ljubljana, Ljubljana, Slovenia
| | - Snežna Sodin-Šemrl
- Department of Rheumatology, University Medical Centre Ljubljana, Ljubljana, Slovenia
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia
| | - Brane Leskošek
- Faculty of Medicine, Institute for Biostatistics and Medical Informatics/ELIXIR-SI Center, University of Ljubljana, Ljubljana, Slovenia
| | - Polonca Ferk
- Faculty of Medicine, Institute for Biostatistics and Medical Informatics/ELIXIR-SI Center, University of Ljubljana, Ljubljana, Slovenia
- *Correspondence: Polonca Ferk
| |
|
38
|
Erfanian N, Derakhshani A, Nasseri S, Fereidouni M, Baradaran B, Jalili Tabrizi N, Brunetti O, Bernardini R, Silvestris N, Safarpour H. Immunotherapy of cancer in single-cell RNA sequencing era: A precision medicine perspective. Biomed Pharmacother 2021; 146:112558. [PMID: 34953396 DOI: 10.1016/j.biopha.2021.112558] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/05/2021] [Revised: 12/16/2021] [Accepted: 12/19/2021] [Indexed: 12/31/2022]
Abstract
Immunotherapy has revolutionized cancer treatment and brought new aspects into tumor immunology. Effective immunotherapy will require using suitable target antigens, optimizing the interaction between the antigenic peptide, the APC, and the T cell, and simultaneously inhibiting the negative regulatory processes that blunt immunotherapeutic effects and lead to resistance. Tumor heterogeneity, together with the tumor microenvironment, is the leading cause of resistance in patients. With the recent emergence of single-cell RNA sequencing technology and its combination with immunotherapy, we can now specifically evaluate how tumors respond to immunotherapy agents at single-cell resolution by detecting the transcriptional activity of immune checkpoints, screening neoantigens with high transcription levels, identifying rare cells, and examining other important processes. This review focuses on scRNA-seq, particularly on its application in cancer immunotherapy.
Affiliation(s)
- Nafiseh Erfanian
- Student Research Committee, Birjand University of Medical Sciences, Birjand, Iran
| | - Afshin Derakhshani
- Experimental Pharmacology, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
| | - Saeed Nasseri
- Cellular & Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
| | - Mohammad Fereidouni
- Cellular & Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
| | - Behzad Baradaran
- Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran; Department of Immunology, School of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Neda Jalili Tabrizi
- Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Oronzo Brunetti
- Medical Oncology Unit, IRCCS Istituto Tumori "Giovanni Paolo II" of Bari, Bari, Italy
| | - Renato Bernardini
- Department of Biomedical and Biotechnological Sciences, University of Catania, Via S. Sofia 97, Catania, Italy
| | - Nicola Silvestris
- Medical Oncology Unit, IRCCS Istituto Tumori "Giovanni Paolo II" of Bari, Bari, Italy; Department of Biomedical Sciences and Human Oncology (DIMO), University of Bari, Bari, Italy.
| | - Hossein Safarpour
- Cellular & Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran.
| |
|
39
|
Maeda Y, Wada H, Sugiyama D, Saito T, Irie T, Itahashi K, Minoura K, Suzuki S, Kojima T, Kakimi K, Nakajima J, Funakoshi T, Iida S, Oka M, Shimamura T, Doi T, Doki Y, Nakayama E, Ueda R, Nishikawa H. Depletion of central memory CD8+ T cells might impede the antitumor therapeutic effect of Mogamulizumab. Nat Commun 2021; 12:7280. [PMID: 34907192 PMCID: PMC8671535 DOI: 10.1038/s41467-021-27574-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/07/2020] [Accepted: 11/29/2021] [Indexed: 11/09/2022]
Abstract
Regulatory T (Treg) cells are important negative regulators of immune homeostasis, but in cancers they tone down the anti-tumor immune response. They are distinguished by high expression levels of the chemokine receptor CCR4, hence their targeting by the anti-CCR4 monoclonal antibody mogamulizumab holds therapeutic promise. Here we show that despite a significant reduction in peripheral effector Treg cells, clinical responses are minimal in a cohort of patients with advanced CCR4-negative solid cancer in a phase Ib study (NCT01929486). Comprehensive immune-monitoring reveals that the abundance of CCR4-expressing central memory CD8+ T cells that are known to play roles in the antitumor immune response is reduced. In long survivors, characterised by lower CCR4 expression in their central memory CD8+ T cells and/or NK cells with an exhausted phenotype, cell numbers are eventually maintained. Our study thus shows that mogamulizumab doses that are currently administered to patients in clinical studies may not differentiate between targeting effector Treg cells and central memory CD8+ T cells, and dosage refinement might be necessary to avoid depletion of effector components during immune therapy.
MESH Headings
- Aged
- Aged, 80 and over
- Antibodies, Monoclonal, Humanized/therapeutic use
- Antineoplastic Agents/therapeutic use
- CD8-Positive T-Lymphocytes/drug effects
- CD8-Positive T-Lymphocytes/metabolism
- Dose-Response Relationship, Drug
- Female
- Humans
- Immunotherapy
- Killer Cells, Natural/drug effects
- Killer Cells, Natural/metabolism
- Male
- Memory T Cells/drug effects
- Middle Aged
- Neoplasms/drug therapy
- Neoplasms/immunology
- Receptors, CCR4/antagonists & inhibitors
- Receptors, CCR4/metabolism
- T-Lymphocytes, Regulatory/drug effects
- T-Lymphocytes, Regulatory/metabolism
- Treatment Outcome
Grants
- Research Activity Start-up grant no. 15H06878, for Young Scientists (B) grant no. 17K15738 from the Ministry of Education, Culture, Sports, Science and Technology of Japan; the Projects for Cancer Research by Therapeutic Evolution [P-CREATE, no. 17cm0106322h0002]
- Scientific Research (B) grant no. 19H03729 from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
- the Development of Technology for Patient Stratification Biomarker Discovery grant [no.19ae0101074s0401] from the Japan Agency for Medical Research and Development (AMED)
- Grants-in-Aid for Scientific Research (S) grant no. 17H06162, for Challenging Exploratory Research grant no. 16K15551, from the Ministry of Education, Culture, Sports, Science and Technology of Japan; the Projects for Cancer Research by Therapeutic Evolution [P-CREATE, no. 16cm0106301h0001], the Development of Technology for Patient Stratification Biomarker Discovery grant [no.19ae0101074s0401] from the Japan Agency for Medical Research and Development (AMED), the National Cancer Center Research and Development Fund [no. 28-A-7 and 31-A-7]
Affiliation(s)
- Yuka Maeda
- Division of Cancer Immunology, Research Institute/Exploratory Oncology Research & Clinical Trial Center (EPOC), National Cancer Center, Tokyo, 104-0045/Chiba, 277-8577, Japan
| | - Hisashi Wada
- Department of Clinical Research in Tumor Immunology, Osaka University Graduate School of Medicine, Osaka, 565-0871, Japan.
| | - Daisuke Sugiyama
- Department of Immunology, Nagoya University Graduate School of Medicine, Nagoya, 466-8550, Japan
| | - Takuro Saito
- Department of Gastroenterological Surgery, Osaka University Graduate School of Medicine, Osaka, 565-0871, Japan
| | - Takuma Irie
- Division of Cancer Immunology, Research Institute/Exploratory Oncology Research & Clinical Trial Center (EPOC), National Cancer Center, Tokyo, 104-0045/Chiba, 277-8577, Japan
| | - Kota Itahashi
- Division of Cancer Immunology, Research Institute/Exploratory Oncology Research & Clinical Trial Center (EPOC), National Cancer Center, Tokyo, 104-0045/Chiba, 277-8577, Japan
| | - Kodai Minoura
- Department of Systems Biology, Nagoya University Graduate School of Medicine, Nagoya, 466-8550, Japan
| | - Susumu Suzuki
- Department of Tumor Immunology, Aichi Medical University, Aichi, 480-1195, Japan
| | - Takashi Kojima
- Department of Gastrointestinal Oncology, National Cancer Center Hospital East, Chiba, 277-8577, Japan
| | - Kazuhiro Kakimi
- Department of Immunotherapeutics, The University of Tokyo Hospital, Tokyo, 113-8655, Japan
| | - Jun Nakajima
- Department of Thoracic Surgery, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-8655, Japan
| | - Takeru Funakoshi
- Department of Dermatology, Keio University School of Medicine, Tokyo, 160-8582, Japan
| | - Shinsuke Iida
- Department of Hematology and Oncology, Nagoya City University Institute of Medical and Pharmaceutical Sciences, Nagoya, 467-8601, Japan
| | - Mikio Oka
- Department of Respiratory Medicine, Kawasaki Medical School, Okayama 701-0192, Japan
| | - Teppei Shimamura
- Department of Systems Biology, Nagoya University Graduate School of Medicine, Nagoya, 466-8550, Japan
| | - Toshihiko Doi
- Department of Gastrointestinal Oncology, National Cancer Center Hospital East, Chiba, 277-8577, Japan
| | - Yuichiro Doki
- Department of Gastroenterological Surgery, Osaka University Graduate School of Medicine, Osaka, 565-0871, Japan
| | - Eiichi Nakayama
- Faculty of Health and Welfare, Kawasaki University of Medical Welfare, Okayama, 701-0192, Japan
| | - Ryuzo Ueda
- Department of Tumor Immunology, Aichi Medical University, Aichi, 480-1195, Japan.
| | - Hiroyoshi Nishikawa
- Division of Cancer Immunology, Research Institute/Exploratory Oncology Research & Clinical Trial Center (EPOC), National Cancer Center, Tokyo, 104-0045/Chiba, 277-8577, Japan.
- Department of Immunology, Nagoya University Graduate School of Medicine, Nagoya, 466-8550, Japan.
| |
|
40
|
Bao S, Li K, Yan C, Zhang Z, Qu J, Zhou M. Deep learning-based advances and applications for single-cell RNA-sequencing data analysis. Brief Bioinform 2021; 23:6444320. [PMID: 34849562 DOI: 10.1093/bib/bbab473] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/06/2021] [Revised: 09/24/2021] [Accepted: 10/15/2021] [Indexed: 11/14/2022]
Abstract
The rapid development of single-cell RNA-sequencing (scRNA-seq) technology has raised significant computational and analytical challenges. The application of deep learning to scRNA-seq data analysis is rapidly evolving and can overcome the unique challenges in upstream (quality control and normalization) and downstream (cell-, gene- and pathway-level) analysis of scRNA-seq data. In the present study, recent advances and applications of deep learning-based methods, together with specific tools for scRNA-seq data analysis, were summarized. Moreover, the future perspectives and challenges of deep-learning techniques regarding the appropriate analysis and interpretation of scRNA-seq data were investigated. The present study aimed to provide evidence supporting the biomedical application of deep learning-based tools and may aid biologists and bioinformaticians in navigating this exciting and fast-moving area.
Affiliation(s)
- Siqi Bao
- School of Information and Communication Engineering, Hainan University, Haikou 570228, P. R. China.,School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China.,Hainan Institute of Real World Data, Haikou 570228, P. R. China
| | - Ke Li
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Congcong Yan
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Zicheng Zhang
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Jia Qu
- School of Information and Communication Engineering, Hainan University, Haikou 570228, P. R. China.,School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China.,Hainan Institute of Real World Data, Haikou 570228, P. R. China
| | - Meng Zhou
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| |
|
41
|
Tinnevelt GH, Wouters K, Postma GJ, Folcarelli R, Jansen JJ. High-throughput single cell data analysis - A tutorial. Anal Chim Acta 2021; 1185:338872. [PMID: 34711307 DOI: 10.1016/j.aca.2021.338872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2020] [Revised: 06/28/2021] [Accepted: 07/21/2021] [Indexed: 11/30/2022]
Abstract
White blood cells protect the body against disease but may also cause chronic inflammation, auto-immune diseases or leukemia. There are many different white blood cell types whose identity and function can be studied by measuring their protein expression. Therefore, high-throughput analytical instruments were developed to measure multiple proteins on millions of single cells. The rich biochemical information in these data can only be fully extracted using multivariate statistics. Here we give an overview of the most essential steps for multivariate data analysis of single cell data. We used white blood cells (immunology) as a case study, but a similar approach may be used in environmental or biotech research. The first step is analyzing the study design and subsequently formulating a research question. The three main designs are immunophenotyping (finding different cell types), cell activation and rare cell discovery. When preparing the data it is essential to consider the design and focus on the cell type of interest by removing all unwanted events. After pre-processing, the tens of thousands to millions of single cells per sample need to be converted into a cellular distribution. For immunophenotyping a clustering method such as Self-Organizing Maps is useful, and for cell activation a model that describes the covariance, such as Principal Component Analysis, is useful. In rare cell discovery it is useful to first model all common cells and remove them to find the rare cells. Finally, discriminant analysis based on the cellular distribution may highlight which cell (sub)types differ between groups.
Affiliation(s)
- Gerjen H Tinnevelt
- Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500, GL, Nijmegen, the Netherlands.
| | - Kristiaan Wouters
- Department of Internal Medicine, Laboratory of Metabolism and Vascular Medicine, P.O. Box 616 (UNS50/14), 6200, MD, Maastricht, the Netherlands
| | - Geert J Postma
- Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500, GL, Nijmegen, the Netherlands
| | - Rita Folcarelli
- Corbion, Arkelsedijk 46, 4206, AC, Gorinchem, the Netherlands
| | - Jeroen J Jansen
- Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500, GL, Nijmegen, the Netherlands
| |
|
42
|
Wang J, Zou Q, Lin C. A comparison of deep learning-based pre-processing and clustering approaches for single-cell RNA sequencing data. Brief Bioinform 2021; 23:6361043. [PMID: 34472590 DOI: 10.1093/bib/bbab345] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/26/2021] [Revised: 07/22/2021] [Accepted: 08/04/2021] [Indexed: 11/13/2022]
Abstract
The emergence of single cell RNA sequencing has facilitated the study of genomes, transcriptomes and proteomes. As single-cell RNA-seq datasets are released continuously, one of the major challenges facing traditional RNA analysis tools is the high-dimensional, high-sparsity, high-noise and large-scale nature of single-cell RNA-seq data. Deep learning technologies match the characteristics of single-cell RNA-seq data perfectly and offer unprecedented promise. Here, we give a systematic review of the most popular single-cell RNA-seq analysis methods and tools based on deep learning models, covering data preprocessing (quality control, normalization, data correction, dimensionality reduction and data visualization) and the clustering task for downstream analysis. We further quantitatively evaluate the deep model-based methods for data correction and clustering on 11 gold standard datasets. Moreover, we discuss the data preferences of these methods and their limitations, and give some suggestions and guidance for users to select appropriate methods and tools.
Affiliation(s)
- Jiacheng Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- School of Informatics, Xiamen University, Xiamen, China
| | - Chen Lin
- School of Informatics, Xiamen University, Xiamen, China
| |
|
43
|
Zou B, Zhang T, Zhou R, Jiang X, Yang H, Jin X, Bai Y. deepMNN: Deep Learning-Based Single-Cell RNA Sequencing Data Batch Correction Using Mutual Nearest Neighbors. Front Genet 2021; 12:708981. [PMID: 34447413 PMCID: PMC8383340 DOI: 10.3389/fgene.2021.708981] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/13/2021] [Accepted: 07/16/2021] [Indexed: 01/04/2023]
Abstract
It is well recognized that batch effect in single-cell RNA sequencing (scRNA-seq) data remains a big challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effect in scRNA-seq data. We first searched mutual nearest neighbor (MNN) pairs across different batches in a principal component analysis (PCA) subspace. Subsequently, a batch correction network was constructed by stacking two residual blocks and further applied for the removal of batch effects. The loss function of deepMNN was defined as the sum of a batch loss and a weighted regularization loss. The batch loss was used to compute the distance between cells in MNN pairs in the PCA subspace, while the regularization loss was used to keep the output of the network similar to the input. The experimental results showed that deepMNN can successfully remove batch effects across datasets with identical cell types, datasets with non-identical cell types, datasets with multiple batches, and large-scale datasets as well. We compared the performance of deepMNN with state-of-the-art batch correction methods, including the widely used methods of Harmony, Scanorama, and Seurat V4 as well as the recently developed deep learning-based methods of MMD-ResNet and scGen. The results demonstrated that deepMNN achieved a better or comparable performance in terms of both qualitative analysis using uniform manifold approximation and projection (UMAP) plots and quantitative metrics such as batch and cell entropies, ARI F1 score, and ASW F1 score under various scenarios. Additionally, deepMNN allowed for integrating scRNA-seq datasets with multiple batches in one step. Furthermore, deepMNN ran much faster than the other methods for large-scale datasets. These characteristics give deepMNN the potential to be a new choice for large-scale single-cell gene expression data analysis.
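The mutual nearest neighbor search and batch loss described in the abstract above can be illustrated with a small NumPy sketch. This is not the deepMNN code: the helper names `knn_indices`, `mutual_nearest_pairs`, and `batch_loss` are hypothetical, the inputs are assumed to be already projected into a PCA subspace, and the mean squared Euclidean distance is only one plausible choice for the pair distance.

```python
import numpy as np

def knn_indices(query, reference, k):
    # Indices of the k nearest reference rows for every query row (brute force).
    sq_dists = ((query[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(sq_dists, axis=1)[:, :k]

def mutual_nearest_pairs(batch_a, batch_b, k=1):
    # A pair (i, j) is kept only if j is among i's neighbors in batch_b
    # and i is among j's neighbors in batch_a.
    a_to_b = knn_indices(batch_a, batch_b, k)
    b_to_a = knn_indices(batch_b, batch_a, k)
    pairs = []
    for i in range(batch_a.shape[0]):
        for j in a_to_b[i]:
            if i in b_to_a[j]:
                pairs.append((i, int(j)))
    return pairs

def batch_loss(batch_a, batch_b, pairs):
    # Assumed form of the loss: mean squared distance between paired cells.
    idx_a = [i for i, _ in pairs]
    idx_b = [j for _, j in pairs]
    diffs = batch_a[idx_a] - batch_b[idx_b]
    return float((diffs ** 2).sum(axis=1).mean())

# Toy 2-D "PCA" coordinates (assumed data); the third cell of batch_b has no partner.
batch_a = np.array([[0.0, 0.0], [10.0, 10.0]])
batch_b = np.array([[0.5, 0.0], [10.0, 10.5], [4.0, 4.0]])

pairs = mutual_nearest_pairs(batch_a, batch_b, k=1)
loss = batch_loss(batch_a, batch_b, pairs)
print(pairs, loss)
```

The mutual requirement is what leaves the unmatched cell out of the pair list, which hints at why MNN-based losses can cope with cell types that are present in only one batch. The brute-force distance matrix is for clarity only and would not scale to large datasets.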
Affiliation(s)
- Bin Zou
- BGI-Shenzhen, Shenzhen, China
| | | | - Ruilong Zhou
- BGI-Shenzhen, Shenzhen, China.,College of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Xiaosen Jiang
- BGI-Shenzhen, Shenzhen, China.,College of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Huanming Yang
- BGI-Shenzhen, Shenzhen, China.,James D. Watson Institute of Genome Sciences, Hangzhou, China
| | - Xin Jin
- BGI-Shenzhen, Shenzhen, China.,School of Medicine, South China University of Technology, Guangzhou, China.,Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, BGI-Shenzhen, Shenzhen, China
| | | |
|
44
|
Robust integration of multiple single-cell RNA sequencing datasets using a single reference space. Nat Biotechnol 2021; 39:877-884. [PMID: 33767393 PMCID: PMC8456427 DOI: 10.1038/s41587-021-00859-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/29/2019] [Accepted: 02/16/2021] [Indexed: 01/31/2023]
Abstract
In many biological applications of single-cell RNA sequencing (scRNA-seq), an integrated analysis of data from multiple batches or studies is necessary. Current methods typically achieve integration using shared cell types or covariance correlation between datasets, which can distort biological signals. Here we introduce an algorithm that uses the gene eigenvectors from a reference dataset to establish a global frame for integration. Using simulated and real datasets, we demonstrate that this approach, called Reference Principal Component Integration (RPCI), consistently outperforms other methods by multiple metrics, with clear advantages in preserving genuine cross-sample gene expression differences in matching cell types, such as those present in cells at distinct developmental stages or in perturbated versus control studies. Moreover, RPCI maintains this robust performance when multiple datasets are integrated. Finally, we applied RPCI to scRNA-seq data for mouse gut endoderm development and revealed temporal emergence of genetic programs helping establish the anterior-posterior axis in visceral endoderm.
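The idea of using gene eigenvectors from a reference dataset as a global frame, as described in the abstract above, can be sketched as follows. This is a simplified illustration and not the RPCI algorithm: the function names, the per-dataset mean centering, and the random matrices standing in for expression data are all assumptions.

```python
import numpy as np

def reference_gene_eigenvectors(reference, n_components):
    # Rows are cells, columns are genes. The right singular vectors of the
    # centered matrix are the eigenvectors of the gene-gene covariance.
    centered = reference - reference.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T  # shape: genes x components

def project(dataset, eigenvectors):
    # Assumption: each dataset is centered by its own gene means before projection.
    centered = dataset - dataset.mean(axis=0, keepdims=True)
    return centered @ eigenvectors

# Hypothetical expression matrices sharing the same 8 genes (assumed data).
rng = np.random.default_rng(1)
reference = rng.normal(size=(50, 8))
query = rng.normal(size=(30, 8))

eigvecs = reference_gene_eigenvectors(reference, n_components=3)
reference_embedded = project(reference, eigvecs)
query_embedded = project(query, eigvecs)
print(reference_embedded.shape, query_embedded.shape)
```

Because every dataset is projected onto the same axes, coordinates from different batches are directly comparable, and no dataset other than the reference influences the frame.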
|
45
|
Chen Y, Song J, Ruan Q, Zeng X, Wu L, Cai L, Wang X, Yang C. Single-Cell Sequencing Methodologies: From Transcriptome to Multi-Dimensional Measurement. SMALL METHODS 2021; 5:e2100111. [PMID: 34927917 DOI: 10.1002/smtd.202100111] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/29/2021] [Revised: 03/26/2021] [Indexed: 06/14/2023]
Abstract
Cells are the basic building blocks of biological systems, with inherent unique molecular features and development trajectories. The study of single cells facilitates in-depth understanding of cellular diversity, disease processes, and organization of multicellular organisms. Single-cell RNA sequencing (scRNA-seq) technologies have become essential tools for the interrogation of gene expression patterns and the dynamics of single cells, allowing cellular heterogeneity to be dissected at unprecedented resolution. Nevertheless, measuring at only the transcriptome level, or in 1D, is incomplete; cellular heterogeneity is reflected in multiple dimensions, including the genome, epigenome, transcriptome, spatial, and even temporal dimensions. Hence, integrative single cell analysis is highly desired. In addition, the way sequencing data are interpreted with bioinformatic tools also plays a critical role in revealing differential gene expression. Here, a comprehensive review is provided that summarizes the cutting-edge single-cell transcriptome sequencing methodologies, including scRNA-seq, spatial and temporal transcriptome profiling, multi-omics sequencing and computational methods developed for scRNA-seq data analysis. Finally, the challenges and perspectives of this field are discussed.
Affiliation(s)
- Yingwen Chen
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Jia Song
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| | - Qingyu Ruan
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Xi Zeng
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Lingling Wu
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| | - Linfeng Cai
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Xuanqun Wang
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Chaoyong Yang
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| |
|
46
|
Detection of differentially abundant cell subpopulations in scRNA-seq data. Proc Natl Acad Sci U S A 2021; 118:2100293118. [PMID: 34001664 DOI: 10.1073/pnas.2100293118] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Indexed: 12/17/2022]
Abstract
Comprehensive and accurate comparisons of transcriptomic distributions of cells from samples taken from two different biological states, such as healthy versus diseased individuals, are an emerging challenge in single-cell RNA sequencing (scRNA-seq) analysis. Current methods for detecting differentially abundant (DA) subpopulations between samples rely heavily on initial clustering of all cells in both samples. Often, this clustering step is inadequate since the DA subpopulations may not align with a clear cluster structure, and important differences between the two biological states can be missed. Here, we introduce DA-seq, a targeted approach for identifying DA subpopulations not restricted to clusters. DA-seq is a multiscale method that quantifies a local DA measure for each cell, which is computed from its k nearest neighboring cells across a range of k values. Based on this measure, DA-seq delineates contiguous significant DA subpopulations in the transcriptomic space. We apply DA-seq to several scRNA-seq datasets and highlight its improved ability to detect differences between distinct phenotypes in severe versus mildly ill COVID-19 patients, melanomas subjected to immune checkpoint therapy comparing responders to nonresponders, embryonic development at two time points, and young versus aging brain tissue. DA-seq enabled us to detect differences between these phenotypes. Importantly, we find that DA-seq not only recovers the DA cell types as discovered in the original studies but also reveals additional DA subpopulations that were not described before. Analysis of these subpopulations yields biological insights that would otherwise be undetected using conventional computational approaches.
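The multiscale neighborhood idea in the abstract above, a per-cell measure computed from its k nearest neighbors across a range of k values, can be sketched as below. This covers only a simple neighborhood-composition score and is not the DA-seq method itself; the function name `multiscale_da_scores`, the use of the fraction of neighbors from one condition as the score, and the toy data are assumptions.

```python
import numpy as np

def multiscale_da_scores(cells, labels, k_values):
    # cells: (n_cells, n_features) coordinates; labels: 1 for condition A, 0 for condition B.
    # Returns an (n_cells, len(k_values)) matrix holding, for each k, the fraction
    # of a cell's k nearest neighbors that come from condition A.
    sq_dists = ((cells[:, None, :] - cells[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq_dists, np.inf)  # a cell is not its own neighbor
    order = np.argsort(sq_dists, axis=1)
    scores = np.empty((cells.shape[0], len(k_values)))
    for col, k in enumerate(k_values):
        neighbors = order[:, :k]
        scores[:, col] = labels[neighbors].mean(axis=1)
    return scores

# Toy data (assumed): two well-separated groups, each made up of a single condition.
rng = np.random.default_rng(2)
group_a = rng.normal(0.0, 1.0, size=(10, 2))
group_b = rng.normal(100.0, 1.0, size=(10, 2))
cells = np.vstack([group_a, group_b])
labels = np.array([1] * 10 + [0] * 10)

scores = multiscale_da_scores(cells, labels, k_values=(3, 5))
print(scores[:2], scores[-2:])
```

Scores near 1 or 0 at every scale mark regions dominated by one condition, whereas scores near the overall condition proportion indicate no local difference in abundance; a full method would go on to turn such score vectors into a per-cell DA measure and a significance call.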
|
47
|
Li L, Xiong F, Wang Y, Zhang S, Gong Z, Li X, He Y, Shi L, Wang F, Liao Q, Xiang B, Zhou M, Li X, Li Y, Li G, Zeng Z, Xiong W, Guo C. What are the applications of single-cell RNA sequencing in cancer research: a systematic review. JOURNAL OF EXPERIMENTAL & CLINICAL CANCER RESEARCH : CR 2021; 40:163. [PMID: 33975628 PMCID: PMC8111731 DOI: 10.1186/s13046-021-01955-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 12/18/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a tool for studying gene expression at the single-cell level that has been widely used due to its unprecedented high resolution. In the present review, we outline the preparation process and sequencing platforms for the scRNA-seq analysis of solid tumor specimens and discuss the main steps and methods used during data analysis, including quality control, batch-effect correction, normalization, cell cycle phase assignment, clustering, cell trajectory and pseudo-time reconstruction, differential expression analysis and gene set enrichment analysis, as well as gene regulatory network inference. Traditional bulk RNA sequencing does not address the heterogeneity within and between tumors, and since the development of the first scRNA-seq technique, this approach has been widely used in cancer research to better understand cancer cell biology and pathogenetic mechanisms. ScRNA-seq has been of great significance for the development of targeted therapy and immunotherapy. In the second part of this review, we focus on the application of scRNA-seq in solid tumors, and summarize the findings and achievements in tumor research afforded by its use. ScRNA-seq holds promise for improving our understanding of the molecular characteristics of cancer, and potentially contributing to improved diagnosis, prognosis, and therapeutics.
Affiliation(s)
- Lvyuan Li
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Fang Xiong
- Department of Stomatology, Xiangya Hospital, Central South University, Changsha, China
| | - Yumin Wang
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China.,Department of Stomatology, Xiangya Hospital, Central South University, Changsha, China
| | - Shanshan Zhang
- Department of Stomatology, Xiangya Hospital, Central South University, Changsha, China
| | - Zhaojian Gong
- Department of Oral and Maxillofacial Surgery, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Xiayu Li
- Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, The Third Xiangya Hospital, Central South University, Changsha, China
| | - Yi He
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
| | - Lei Shi
- Department of Oral and Maxillofacial Surgery, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Fuyan Wang
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Qianjin Liao
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
| | - Bo Xiang
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Ming Zhou
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Xiaoling Li
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Yong Li
- Department of Medicine, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Guiyuan Li
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Zhaoyang Zeng
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Wei Xiong
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
| | - Can Guo
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, China
|
48
|
Del Giudice M, Peirone S, Perrone S, Priante F, Varese F, Tirtei E, Fagioli F, Cereda M. Artificial Intelligence in Bulk and Single-Cell RNA-Sequencing Data to Foster Precision Oncology. Int J Mol Sci 2021; 22:ijms22094563. [PMID: 33925407 PMCID: PMC8123853 DOI: 10.3390/ijms22094563] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/20/2021] [Revised: 04/21/2021] [Accepted: 04/23/2021] [Indexed: 02/01/2023] Open
Abstract
Artificial intelligence, or the discipline of developing computational algorithms able to perform tasks that require human intelligence, offers the opportunity to improve our idea and delivery of precision medicine. Here, we provide an overview of artificial intelligence approaches for the analysis of large-scale RNA-sequencing datasets in cancer. We present the major solutions to disentangle inter- and intra-tumor heterogeneity of transcriptome profiles for an effective improvement of patient management. We outline the contributions of learning algorithms to the needs of cancer genomics, from identifying rare cancer subtypes to personalizing therapeutic treatments.
Affiliation(s)
- Marco Del Giudice
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Candiolo Cancer Institute, FPO-IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
| | - Serena Peirone
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Physics and INFN, Università degli Studi di Torino, via P.Giuria 1, 10125 Turin, Italy
| | - Sarah Perrone
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Physics, Università degli Studi di Torino, via P.Giuria 1, 10125 Turin, Italy
| | - Francesca Priante
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Physics, Università degli Studi di Torino, via P.Giuria 1, 10125 Turin, Italy
| | - Fabiola Varese
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Life Science and System Biology, Università degli Studi di Torino, via Accademia Albertina 13, 10123 Turin, Italy
| | - Elisa Tirtei
- Paediatric Onco-Haematology Division, Regina Margherita Children’s Hospital, City of Health and Science of Turin, 10126 Turin, Italy
| | - Franca Fagioli
- Paediatric Onco-Haematology Division, Regina Margherita Children’s Hospital, City of Health and Science of Turin, 10126 Turin, Italy
- Department of Public Health and Paediatric Sciences, University of Torino, 10124 Turin, Italy
| | - Matteo Cereda
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Candiolo Cancer Institute, FPO-IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Correspondence: Tel.: +39-011-993-3969
|
49
|
JSOM: Jointly-evolving self-organizing maps for alignment of biological datasets and identification of related clusters. PLoS Comput Biol 2021; 17:e1008804. [PMID: 33724985 PMCID: PMC7963045 DOI: 10.1371/journal.pcbi.1008804] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/03/2020] [Accepted: 02/15/2021] [Indexed: 11/19/2022] Open
Abstract
With the rapid advances of various single-cell technologies, an increasing number of single-cell datasets are being generated, and computational tools for aligning these datasets, which make subsequent integration or meta-analysis possible, have become critical. Typically, single-cell datasets from different technologies cannot be directly combined or concatenated, due to innate differences in the data, such as the number of measured parameters and the distributions. Even datasets generated by the same technology are often affected by the batch effect. A computational approach for aligning different datasets and hence identifying related clusters will be useful for data integration and interpretation in large-scale single-cell experiments. Our proposed algorithm, called JSOM, a variation of the self-organizing map, aligns two related datasets that contain similar clusters by constructing two maps (low-dimensional discretized representations of the datasets) that jointly evolve according to both datasets. Here we applied the JSOM algorithm to flow cytometry, mass cytometry, and single-cell RNA sequencing datasets. The resulting JSOM maps not only align the related clusters in the two datasets but also preserve the topology of the datasets so that the maps could be used for further analysis, such as clustering. Biological datasets are now generated more than ever as many data acquisition technologies have been developed over the years, especially single-cell technologies. With an increasing number of datasets available for larger-scale studies, robust computational tools that can align datasets are needed for data integration and interpretation. We present a new algorithm that can align two biological datasets and demonstrate that the algorithm can work with data generated from different data acquisition technologies.
Our proposed algorithm produces low dimensional representations of two datasets to align them in a way that preserves the topology of the respective datasets. Such aligned maps facilitate further analysis, such as clustering. The proposed algorithm showed promising results when applied to different combinations of datasets, i.e., flow cytometry to flow cytometry, flow cytometry to mass cytometry, and two different single-cell RNA sequencing technologies. Therefore, our newly developed algorithm could potentially lead to new discoveries that were once difficult to obtain.
|
50
|
Zhong PC, Shu R, Wu HW, Liu ZW, Shen XL, Hu YJ. Altered gene expression in glycolysis-cholesterol synthesis axis correlates with outcome of triple-negative breast cancer. Exp Biol Med (Maywood) 2021; 246:560-571. [PMID: 33243007 PMCID: PMC7934150 DOI: 10.1177/1535370220975206] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/16/2020] [Accepted: 10/30/2020] [Indexed: 12/31/2022] Open
Abstract
Identification of molecular subtypes of clinically resectable triple-negative breast cancer (TNBC) is of great importance to achieve better clinical outcomes. Inter- and intratumor metabolic heterogeneity improves cancer survival, and the interaction of various metabolic pathways may affect treatment outcome of TNBC. We speculated that TNBC can be categorized into prognostic metabolic subtypes according to the expression changes of glycolysis and cholesterol synthesis. The genome, transcriptome, and clinical data were downloaded from the Cancer Genome Atlas and Molecular Taxonomy of Breast Cancer International Consortium and subsequently analyzed by integrated bioinformatics methods. Four subtypes, namely, glycolytic, cholesterogenic, quiescent, and mixed, were classified according to the normalized median expressions of the genes involved in glycolysis and cholesterol synthesis. Among the four subtypes, the cholesterogenic type was correlated with the shortest median survival (log rank P = 0.044), while patients with highly expressed glycolytic genes tended to have a longer survival. Tumors with PIK3CA amplification and dynein axonemal heavy chain 2 deletion exhibited higher expressions of cholesterogenic genes than tumors with other mutant oncogenes. The expressions of mitochondrial pyruvate carrier MPC1 and MPC2 were the lowest in quiescent tumors, and MPC2 expression was higher in cholesterogenic tumors compared with glycolytic or quiescent tumors (t-test P < 0.001). Glycolytic and cholesterogenic gene expressions were related to the expressions of prognostic genes in some other types of cancers. Classification of glycolytic and cholesterogenic pathways according to metabolic characteristics provides a new understanding of previously identified subtypes of TNBC and could improve personalized treatments based on tumor metabolic profiles.
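The four-way classification mentioned in this abstract can be sketched as a simple quadrant rule. This is an illustration under stated assumptions, not the authors' procedure: the abstract says only that subtypes were assigned from normalized median expression of glycolytic and cholesterogenic genes. The function name `metabolic_subtype`, the use of per-gene z-scores as input, and the cutoff of 0 are hypothetical choices.

```python
from statistics import median


def metabolic_subtype(glycolysis_z, cholesterol_z, threshold=0.0):
    """Hypothetical quadrant rule for one tumor sample.

    glycolysis_z  : z-scored expression values of glycolytic genes
    cholesterol_z : z-scored expression values of cholesterogenic genes
    threshold     : assumed cutoff on the median z-score (not from the source)
    """
    glyc = median(glycolysis_z)
    chol = median(cholesterol_z)

    if glyc > threshold and chol > threshold:
        return "mixed"            # both pathways above the cutoff
    if glyc > threshold:
        return "glycolytic"       # only glycolysis above the cutoff
    if chol > threshold:
        return "cholesterogenic"  # only cholesterol synthesis above the cutoff
    return "quiescent"            # neither pathway above the cutoff
```

Taking the median over each pathway's genes keeps a single outlier gene from deciding the label, which is one plausible reason to summarize each pathway by its median expression.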
Affiliation(s)
- Peng-Cheng Zhong
- Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, China
| | - Rong Shu
- Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, China
| | - Hui-Wen Wu
- Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, China
| | - Zhi-Wen Liu
- Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, China
| | - Xiao-Ling Shen
- Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, China
| | - Ying-Jie Hu
- Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510405, China
|