1
Yu H, Zheng Y, Yang X. scDM: A deep generative method for cell surface protein prediction with diffusion model. J Mol Biol 2024; 436:168610. [PMID: 38754773; DOI: 10.1016/j.jmb.2024.168610] [Received: 01/03/2024] [Revised: 05/06/2024] [Accepted: 05/09/2024] [Indexed: 05/18/2024]
Abstract
Proteins are the executors of organismal functions, and the transition from RNA to protein is subject to post-transcriptional regulation; therefore, considering RNA and surface protein expression together can provide additional evidence about biological processes. Cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) can measure both RNA and protein expression in single cells, but these experiments are expensive and time-consuming. Because computational tools for predicting surface proteins are lacking, we used datasets obtained with CITE-seq to design a deep generative prediction method based on diffusion models, and we drew biological findings from its predictions. Our method, scDM, predicts protein expression values from the RNA expression values of individual cells. It uses a novel way of encoding the data into the model and learns the data distribution by gradually adding Gaussian noise and then learning to remove it step by step; predicted samples are generated through this denoising process. Comprehensive evaluation across different datasets showed that our predictions were satisfactory and further demonstrated the effectiveness of incorporating single-cell multiomics information into diffusion models for biological studies. We also found that jointly analysing predicted surface protein expression and cancer cell drug scores could open new directions for discovering therapeutic drug targets.
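The abstract describes scDM's training only at a high level (noise is added to the data and the model learns to remove it). The sketch below shows the generic closed-form forward noising step of a denoising diffusion probabilistic model, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. It is not scDM's code: the linear β schedule, the number of steps, and the helper names (`linear_beta_schedule`, `cumulative_alphas`, `q_sample`) are assumptions for illustration, and scDM's data encoding and RNA conditioning are not modelled.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced per-step noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in one step; also return the noise used.

    A denoising network would be trained to predict `noise` from x_t, t and,
    in a conditional model, the cell's RNA profile (not modelled here).
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return x_t, noise


if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    protein = [1.2, 0.0, 3.4]  # hypothetical normalized surface-protein values
    rng = random.Random(0)
    for t in (0, 499, 999):
        x_t, _ = q_sample(protein, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 5), [round(v, 2) for v in x_t])
```

As t approaches T, ᾱ_t approaches zero and x_t becomes nearly pure Gaussian noise; generation runs the learned reverse (denoising) process from that noise back to a predicted profile.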
Affiliation(s)
- Hanlei Yu
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
- Yuanjie Zheng
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
- Xinbo Yang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
2
Tao Q, Xu Y, He Y, Luo T, Li X, Han L. Benchmarking mapping algorithms for cell-type annotating in mouse brain by integrating single-nucleus RNA-seq and Stereo-seq data. Brief Bioinform 2024; 25:bbae250. [PMID: 38796691; PMCID: PMC11128029; DOI: 10.1093/bib/bbae250] [Received: 02/28/2024] [Revised: 04/17/2024] [Accepted: 05/08/2024] [Indexed: 05/28/2024]
Abstract
Limited gene capture efficiency and the spot size of spatial transcriptome (ST) data pose significant challenges for cell-type characterization. The heterogeneity and complexity of cell composition in the mammalian brain make accurate annotation of brain ST data even more challenging. Many algorithms attempt to characterize neuronal subtypes by integrating ST data with single-nucleus RNA sequencing (snRNA-seq) or single-cell RNA sequencing data. However, the accuracy of these algorithms on Stereo-seq ST data remains an open question. Here, we benchmarked 9 mapping algorithms using 10 ST datasets from four mouse brain regions at two different resolutions and 24 pseudo-ST datasets generated from snRNA-seq. Both actual and pseudo-ST data were mapped using snRNA-seq datasets from the corresponding brain regions as reference. After comparing performance across brain regions and resolutions, we conclude that robust cell-type decomposition and SpatialDWLS showed superior robustness and accuracy in cell-type annotation. Testing with publicly available cortical snRNA-seq data from another sequencing platform further validated these conclusions. Altogether, we developed a workflow for assessing which mapping algorithms suit a given ST dataset, which can improve the efficiency and accuracy of spatial data annotation.
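Pseudo-ST datasets are useful as ground truth because the cell-type composition of each synthetic spot is known. A common way to build them is to pool the counts of several snRNA-seq cells into one spot and record the cell-type proportions. The sketch below assumes that approach (random sampling of a fixed number of cells per spot); the benchmark's actual procedure may differ, for example in how cells are assigned to spots, and `make_pseudo_spot` is a hypothetical helper.

```python
import random
from collections import Counter


def make_pseudo_spot(counts, labels, n_cells, rng):
    """Pool n_cells randomly chosen cells into one synthetic spot.

    counts: per-cell lists of gene counts (cells x genes)
    labels: cell-type label for each cell
    Returns (summed gene counts, {cell_type: proportion}) so the true
    composition of the spot is known when scoring a mapping algorithm.
    """
    chosen = rng.sample(range(len(counts)), n_cells)
    n_genes = len(counts[0])
    spot = [sum(counts[i][g] for i in chosen) for g in range(n_genes)]
    composition = Counter(labels[i] for i in chosen)
    proportions = {ct: n / n_cells for ct, n in composition.items()}
    return spot, proportions


if __name__ == "__main__":
    counts = [[5, 0, 1], [3, 2, 0], [0, 4, 4], [1, 1, 1]]  # toy data
    labels = ["Exc", "Exc", "Inh", "Astro"]
    spot, props = make_pseudo_spot(counts, labels, n_cells=3, rng=random.Random(1))
    print(spot, props)
```

A mapping algorithm's predicted proportions for each pseudo-spot can then be compared directly against the recorded `proportions`.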
Affiliation(s)
- Quyuan Tao
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Hangzhou 310012, China
- Yiheng Xu
- Department of Neurobiology and Department of Neurology of Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, MOE Frontier Center of Brain Science and Brain-machine Integration, School of Brain Science and Brain Medicine, Zhejiang University, Hangzhou 310058, China
- Youzhe He
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Hangzhou 310012, China
- Ting Luo
- BGI Research, Hangzhou 310012, China
- BGI Research, Shenzhen 518103, China
- Xiaoming Li
- Department of Neurobiology and Department of Neurology of Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, MOE Frontier Center of Brain Science and Brain-machine Integration, School of Brain Science and Brain Medicine, Zhejiang University, Hangzhou 310058, China
- Research Units for Emotion and Emotion disorders, Chinese Academy of Medical Sciences, Beijing 100730, China
- Lei Han
- BGI Research, Hangzhou 310012, China
- BGI Research, Shenzhen 518103, China
3
Xu J, Huang D, Zhang X. scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer. Adv Sci (Weinh) 2024; 11:e2307835. [PMID: 38483032; PMCID: PMC11109621; DOI: 10.1002/advs.202307835] [Received: 10/18/2023] [Revised: 01/24/2024] [Indexed: 05/23/2024]
Abstract
Transformer-based models have revolutionized single-cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here, a novel single-cell multi-modal/multi-task transformer (scmFormer) is proposed to fill the existing gap in integrating single-cell proteomics with other omics data. Systematic benchmarking demonstrates that scmFormer excels at integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving both the information shared across batches and distinct biological information. In transferring cell-type labels from single-cell transcriptomics to proteomics data, scmFormer achieves a 54.5% higher average F1 score than the second-best method. Using COVID-19 datasets, scmFormer is shown to integrate over 1.48 million cells on a personal computer. scmFormer also outperforms existing methods in generating the unmeasured modality and is well suited to spatial multi-omic data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.
Affiliation(s)
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- De-Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo 315200, China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
4
Shannon CP, Lee AH, Tebbutt SJ, Singh A. A Commentary on Multi-omics Data Integration in Systems Vaccinology. J Mol Biol 2024; 436:168522. [PMID: 38458605; DOI: 10.1016/j.jmb.2024.168522] [Received: 11/01/2023] [Revised: 03/04/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Affiliation(s)
- Amy Hy Lee
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
- Scott J Tebbutt
- PROOF Centre of Excellence, Vancouver, Canada; Department of Medicine, The University of British Columbia, Vancouver, Canada; Centre for Heart Lung Innovation, Vancouver, Canada
- Amrit Singh
- Centre for Heart Lung Innovation, Vancouver, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, The University of British Columbia, Vancouver, Canada
5
Haviv D, Remšík J, Gatie M, Snopkowski C, Takizawa M, Pereira N, Bashkin J, Jovanovich S, Nawy T, Chaligne R, Boire A, Hadjantonakis AK, Pe'er D. The covariance environment defines cellular niches for spatial inference. Nat Biotechnol 2024:10.1038/s41587-024-02193-4. [PMID: 38565973; DOI: 10.1038/s41587-024-02193-4] [Received: 04/18/2023] [Accepted: 02/28/2024] [Indexed: 04/04/2024]
Abstract
A key challenge in analyzing data from high-resolution spatial profiling technologies is to suitably represent the features of cellular neighborhoods, or niches. Here we introduce the covariance environment (COVET), a representation that leverages the gene-gene covariance structure across cells in a niche to capture the multivariate nature of cellular interactions within it. We define a principled, optimal transport-based distance metric between COVET niches that scales to millions of cells. Using COVET to encode spatial context, we developed environmental variational inference (ENVI), a conditional variational autoencoder that jointly embeds spatial and single-cell RNA sequencing data into a latent space. ENVI includes two decoders: one imputes gene expression across the spatial modality, and a second projects spatial information onto single-cell data. ENVI can confer spatial context on genomics data from dissociated single cells and outperforms alternatives for imputing gene expression on diverse spatial datasets.
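COVET summarizes a cell's niche as a gene-by-gene covariance matrix computed over the cell and its spatial neighbours. The sketch below computes such a matrix in plain Python for a toy dataset. It is an illustration under stated assumptions, not the ENVI/COVET implementation: the niche is taken as the k nearest cells by Euclidean distance (including the cell itself), the covariance is normalized by k, and deviations are measured from a caller-supplied reference mean that defaults to the niche mean. The exact centering, gene selection and k used by COVET should be checked against the paper, and the optimal transport-based distance between COVET matrices is not shown.

```python
import math


def k_nearest(coords, i, k):
    """Indices of the k cells closest to cell i in space (cell i included)."""
    order = sorted(range(len(coords)), key=lambda j: math.dist(coords[i], coords[j]))
    return order[:k]


def niche_covariance(expr, coords, i, k, center=None):
    """Gene-gene covariance over the k-cell niche around cell i.

    expr: cells x genes expression values; coords: (x, y) per cell.
    center: reference mean per gene; defaults to the niche mean (assumption).
    """
    niche = k_nearest(coords, i, k)
    n_genes = len(expr[0])
    if center is None:
        center = [sum(expr[j][g] for j in niche) / len(niche) for g in range(n_genes)]
    cov = [[0.0] * n_genes for _ in range(n_genes)]
    for j in niche:
        dev = [expr[j][g] - center[g] for g in range(n_genes)]
        for a in range(n_genes):
            for b in range(n_genes):
                cov[a][b] += dev[a] * dev[b]
    return [[v / len(niche) for v in row] for row in cov]


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
    expr = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
    print(niche_covariance(expr, coords, i=0, k=3))
```

In the toy example the distant fourth cell falls outside the niche of cell 0, so its extreme expression does not affect that cell's matrix.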
Affiliation(s)
- Doron Haviv
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
- Ján Remšík
- Human Oncology & Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Mohamed Gatie
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Catherine Snopkowski
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Meril Takizawa
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tal Nawy
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ronan Chaligne
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Adrienne Boire
- Human Oncology & Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Department of Neurology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Brain Tumor Center, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Anna-Katerina Hadjantonakis
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dana Pe'er
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Howard Hughes Medical Institute, New York, NY, USA
6
Li K, Li J, Tao Y, Wang F. stDiff: a diffusion model for imputing spatial transcriptomics through single-cell transcriptomics. Brief Bioinform 2024; 25:bbae171. [PMID: 38628114; PMCID: PMC11021815; DOI: 10.1093/bib/bbae171] [Received: 01/05/2024] [Revised: 03/11/2024] [Accepted: 04/01/2024] [Indexed: 04/19/2024]
Abstract
Spatial transcriptomics (ST) has become a powerful tool for exploring the spatial organization of gene expression in tissues. Imaging-based methods, though offering superior spatial resolution at the single-cell level, are limited in either the number of imaged genes or the sensitivity of gene detection. Existing approaches for enhancing ST rely on the similarity between ST cells and reference single-cell RNA sequencing (scRNA-seq) cells. In contrast, we introduce stDiff, which leverages relationships between gene expression abundances in scRNA-seq data to enhance ST. stDiff employs a conditional diffusion model that captures these abundance relationships in scRNA-seq data through two Markov processes: one that introduces noise into transcriptomics data and another that denoises to recover them. The missing portion of ST is predicted by incorporating the original ST data into the denoising process. In our comprehensive performance evaluation across 16 datasets, using multiple clustering and similarity metrics, stDiff stands out for its ability to preserve topological structures among cells, positioning it as a robust solution for cell population identification. Moreover, stDiff's enhancement outcomes closely mirror the actual ST data within the batch space. Our model accurately reconstructs diverse spatial expression patterns, delineating distinct spatial boundaries. This highlights stDiff's capability to unify the observed and predicted segments of ST data for subsequent analysis. We anticipate that stDiff, with its innovative approach, will contribute to advancing ST imputation methodologies.
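The abstract says the missing part of an ST profile is predicted by incorporating the original ST data into the denoising process. One common way to do this in diffusion-based imputation is inpainting-style replacement: at each reverse step, the measured genes are overwritten with their observed values noised to the current step, and only the unmeasured genes keep the model's estimate. The sketch below shows that merge step under this assumption; the abstract does not state whether stDiff conditions in exactly this way, and `merge_known` is a hypothetical helper.

```python
import math
import random


def merge_known(x_t_model, observed, measured, alpha_bar_t, rng):
    """Combine the model's current sample with observed ST values.

    x_t_model: model's sample at reverse step t (all genes)
    observed: observed ST values (entries for unmeasured genes are ignored)
    measured: True where the gene was measured in the ST data
    alpha_bar_t: cumulative signal fraction at step t (1.0 = no noise)
    """
    merged = []
    for x_model, obs, is_measured in zip(x_t_model, observed, measured):
        if is_measured:
            # forward-noise the observed value to the same step as x_t
            eps = rng.gauss(0.0, 1.0)
            merged.append(math.sqrt(alpha_bar_t) * obs + math.sqrt(1.0 - alpha_bar_t) * eps)
        else:
            merged.append(x_model)  # unmeasured gene: keep the model's estimate
    return merged
```

At the final step, where ᾱ is close to 1, the measured genes come back essentially unchanged while the unmeasured genes carry the imputed values.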
Affiliation(s)
- Kongming Li
- Shanghai Key Lab of Intelligent Information Processing, Handan Street, 200433 Shanghai, China
- School of Computer Science and Technology, Fudan University, Handan Street, 200433 Shanghai, China
- Jiahao Li
- Shanghai Key Lab of Intelligent Information Processing, Handan Street, 200433 Shanghai, China
- School of Computer Science and Technology, Fudan University, Handan Street, 200433 Shanghai, China
- Yuhao Tao
- Shanghai Key Lab of Intelligent Information Processing, Handan Street, 200433 Shanghai, China
- School of Computer Science and Technology, Fudan University, Handan Street, 200433 Shanghai, China
- Fei Wang
- Shanghai Key Lab of Intelligent Information Processing, Handan Street, 200433 Shanghai, China
- School of Computer Science and Technology, Fudan University, Handan Street, 200433 Shanghai, China
7
Zhang H, Wang Y, Lian B, Wang Y, Li X, Wang T, Shang X, Yang H, Aziz A, Hu J. Scbean: a python library for single-cell multi-omics data analysis. Bioinformatics 2024; 40:btae053. [PMID: 38290765; PMCID: PMC10868338; DOI: 10.1093/bioinformatics/btae053] [Received: 10/10/2023] [Revised: 01/10/2024] [Accepted: 01/25/2024] [Indexed: 02/01/2024]
Abstract
SUMMARY Single-cell multi-omics technologies provide a unique platform for characterizing cell states and reconstructing developmental processes by simultaneously quantifying and integrating molecular signatures across various modalities, including the genome, transcriptome, epigenome and other omics layers. However, this nascent field still has an urgent unmet need for novel computational tools, which are critical for effective and efficient interrogation of functionality across omics modalities. Scbean is a user-friendly Python library designed to incorporate a diverse array of models for analyzing single-cell data, encompassing both paired and unpaired multi-omics data. The library offers uniform, straightforward interfaces for tasks such as dimensionality reduction, batch effect removal, cell label transfer from well-annotated scRNA-seq data to scATAC-seq data, and identification of spatially variable genes. Moreover, Scbean's models use GPU acceleration through TensorFlow, enabling them to handle datasets comprising millions of cells. AVAILABILITY AND IMPLEMENTATION Scbean is released on the Python Package Index (PyPI) (https://pypi.org/project/scbean/) and GitHub (https://github.com/jhu99/scbean) under the MIT license. The documentation and example code can be found at https://scbean.readthedocs.io/en/latest/.
Affiliation(s)
- Haohui Zhang
- School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Yuwei Wang
- School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Bin Lian
- School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Yiran Wang
- School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Xingyi Li
- School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Tao Wang
- School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Hui Yang
- School of Life Science, Northwestern Polytechnical University, 710072 Xi'an, Shaanxi, China
- Ahmad Aziz
- Population Health Sciences, German Center for Neurodegenerative Diseases (DZNE), 53127 Bonn, Germany
- Department of Neurology, Faculty of Medicine, University of Bonn, 53105 Bonn, Germany
- Jialu Hu
- School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Population Health Sciences, German Center for Neurodegenerative Diseases (DZNE), 53127 Bonn, Germany
8
Wang L, Nie R, Miao X, Cai Y, Wang A, Zhang H, Zhang J, Cai J. InClust+: the deep generative framework with mask modules for multimodal data integration, imputation, and cross-modal generation. BMC Bioinformatics 2024; 25:41. [PMID: 38267858; PMCID: PMC10809631; DOI: 10.1186/s12859-024-05656-2] [Received: 11/16/2023] [Accepted: 01/15/2024] [Indexed: 01/26/2024]
Abstract
BACKGROUND With the development of single-cell technology, many cell traits can be measured. Furthermore, multi-omics profiling technology can jointly measure two or more traits in a single cell simultaneously. Computational methods for multimodal data integration are needed to process these rapidly accumulating data. RESULTS Here, we present inClust+, a deep generative framework for multi-omics data. It builds on the previous inClust, which is specific to transcriptome data, and adds two mask modules designed for multimodal data processing: an input-mask module in front of the encoder and an output-mask module behind the decoder. inClust+ was first used to integrate scRNA-seq and MERFISH data from similar cell populations and to impute MERFISH data based on scRNA-seq data. Then, inClust+ was shown to be capable of integrating multimodal data with batch effects (e.g. tri-modal data with gene expression, chromatin accessibility and protein abundance). Finally, inClust+ was used to integrate an unlabeled monomodal scRNA-seq dataset with two labeled multimodal CITE-seq datasets, to transfer labels from the CITE-seq datasets to the scRNA-seq dataset, and to generate the missing protein-abundance modality in the monomodal scRNA-seq data. In these examples, inClust+ performed better than or comparably to the most recent tools for the corresponding task. CONCLUSIONS inClust+ is a suitable framework for handling multimodal data. Moreover, the successful implementation of masks in inClust+ suggests that they can be applied to other deep learning methods with similar encoder-decoder architectures to broaden the application scope of those models.
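The two mask modules can be illustrated with binary feature masks: the input mask zeroes features that a dataset did not measure before they reach the encoder, and the output mask restricts the reconstruction loss to features that were measured, so a missing modality neither feeds the encoder nor penalizes the decoder. This is a simplified sketch under that assumption; the actual masks, losses and layers of inClust+ are defined in the paper, and both function names are hypothetical.

```python
def apply_input_mask(features, measured):
    """Zero out unmeasured features before they enter the encoder."""
    return [v if m else 0.0 for v, m in zip(features, measured)]


def masked_mse(reconstruction, target, measured):
    """Mean squared error over measured features only (output-mask idea)."""
    pairs = [(r, t) for r, t, m in zip(reconstruction, target, measured) if m]
    if not pairs:
        return 0.0
    return sum((r - t) ** 2 for r, t in pairs) / len(pairs)


if __name__ == "__main__":
    # toy cell: 3 RNA features measured, 2 protein features missing
    x = [2.0, 0.5, 1.0, 7.0, 7.0]
    measured = [True, True, True, False, False]
    print(apply_input_mask(x, measured))
    print(masked_mse([2.0, 1.5, 1.0, 0.3, 0.9], x, measured))
```

Because the loss ignores the masked protein entries, the decoder is free to generate values for them, which is what cross-modal generation relies on.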
Affiliation(s)
- Lifei Wang
- Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Rui Nie
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuexia Miao
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Yankai Cai
- School of Economic and Management, China University of Geoscience, Wuhan, China
- Anqi Wang
- Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Hanwen Zhang
- Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Jiang Zhang
- School of Systems Science, Beijing Normal University, Beijing, 100875, China
- Jun Cai
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
9
He Z, Hu S, Chen Y, An S, Zhou J, Liu R, Shi J, Wang J, Dong G, Shi J, Zhao J, Ou-Yang L, Zhu Y, Bo X, Ying X. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol 2024:10.1038/s41587-023-02040-y. [PMID: 38263515; DOI: 10.1038/s41587-023-02040-y] [Received: 12/13/2022] [Accepted: 10/23/2023] [Indexed: 01/25/2024]
Abstract
Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. By evaluating its performance in trimodal and mosaic integration tasks, we demonstrate its reliability and its superiority over 19 other methods. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available on GitHub (https://github.com/labomics/midas).
Affiliation(s)
- Zhen He
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Shuofeng Hu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Yaowen Chen
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Sijing An
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jiahao Zhou
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Runyan Liu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Junfeng Shi
- School of Automation, China University of Geosciences, Wuhan, China
- Jing Wang
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Guohua Dong
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jinhui Shi
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jiaxin Zhao
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Yuan Zhu
- School of Automation, China University of Geosciences, Wuhan, China
- Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, China
- Xiaomin Ying
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
10
Guo ZH, Wang YB, Wang S, Zhang Q, Huang DS. scCorrector: a robust method for integrating multi-study single-cell data. Brief Bioinform 2024; 25:bbad525. [PMID: 38271483; PMCID: PMC10810333; DOI: 10.1093/bib/bbad525] [Received: 09/15/2023] [Revised: 11/12/2023] [Accepted: 12/19/2023] [Indexed: 01/27/2024]
Abstract
The advent of single-cell sequencing technologies has revolutionized cell biology studies. However, integrative analyses of diverse single-cell data face serious challenges, including technological noise, sample heterogeneity, and differences in modality and species. To address these problems, we propose scCorrector, a variational autoencoder-based model that integrates single-cell data from different studies and maps them into a common space. Specifically, we designed a Study Specific Adaptive Normalization for each study in the decoder to implement these features. scCorrector achieves competitive and robust performance compared with state-of-the-art methods and brings novel insights under various circumstances (e.g. various batches, multi-omics, cross-species, and developmental stages). In addition, integrating single-cell data with spatial data makes it possible to transfer information between studies, which greatly expands the narrow range of genes covered by MERFISH technology. In summary, scCorrector can efficiently integrate multi-study single-cell datasets, providing broad opportunities to tackle challenges arising from noisy resources.
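The abstract does not spell out how Study Specific Adaptive Normalization works. A common pattern behind this kind of name is conditional normalization: hidden activations are standardized and then rescaled and shifted with parameters that belong to the sample's study, so one shared decoder can reproduce study-specific characteristics. The sketch below shows that generic pattern, assuming layer-style normalization across one cell's hidden units; it is not taken from the scCorrector code, the `params` layout keyed by study is hypothetical, and in a real model the per-study scale and shift would be learned.

```python
import math


def study_adaptive_norm(hidden, study, params, eps=1e-5):
    """Standardize one cell's hidden vector, then apply its study's scale/shift.

    hidden: activations for one cell; study: study identifier
    params: {study: (gamma, beta)}, one scale and shift per hidden unit
    """
    mean = sum(hidden) / len(hidden)
    var = sum((h - mean) ** 2 for h in hidden) / len(hidden)
    gamma, beta = params[study]
    return [g * (h - mean) / math.sqrt(var + eps) + b
            for h, g, b in zip(hidden, gamma, beta)]


if __name__ == "__main__":
    params = {
        "study_A": ([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]),  # hypothetical values
        "study_B": ([2.0, 2.0, 2.0], [1.0, 1.0, 1.0]),
    }
    h = [1.0, 2.0, 3.0]
    print(study_adaptive_norm(h, "study_A", params))
    print(study_adaptive_norm(h, "study_B", params))
```

The same hidden vector decodes differently depending on the study label, while all other decoder weights stay shared across studies.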
Affiliation(s)
- Zhen-Hao Guo
- College of Electronics and Information Engineering, Tongji University, Shanghai 200000, China
- Yan-Bin Wang
- College of Computer Science and Technology, Zhejiang University 310027, China
- Siguo Wang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No.568, Ningbo, Zhejiang 315201, China
- Qinhu Zhang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No.568, Ningbo, Zhejiang 315201, China
- De-Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No.568, Ningbo, Zhejiang 315201, China
11
Makrodimitris S, Pronk B, Abdelaal T, Reinders M. An in-depth comparison of linear and non-linear joint embedding methods for bulk and single-cell multi-omics. Brief Bioinform 2023; 25:bbad416. [PMID: 38018908; PMCID: PMC10685331; DOI: 10.1093/bib/bbad416] [Received: 05/16/2023] [Revised: 10/26/2023] [Accepted: 10/30/2023] [Indexed: 11/30/2023]
Abstract
Multi-omic analyses are necessary to understand the complex biological processes taking place at the tissue and cell level, and also to make reliable predictions about, for example, disease outcome. Several linear methods create a joint embedding using paired information per sample, but neural architectures that embed paired -omics into the same non-linear manifold have recently become more popular. This work describes a head-to-head comparison of linear and non-linear joint embedding methods using both bulk and single-cell multi-modal datasets. We found that non-linear methods have a clear advantage over linear ones for missing modality imputation. Performance comparisons in the downstream tasks of survival analysis for bulk tumor data and cell type classification for single-cell data lead to the following insights. First, concatenating the principal components of each modality is a competitive baseline that is hard to beat if all modalities are available at test time. However, if only one modality is available at test time, training a predictive model on the joint space of that modality can improve performance relative to using just the unimodal principal components. Second, -omic profiles imputed by neural joint embedding methods are realistic enough to be used by a classifier trained on real data with limited performance drops. Taken together, our comparisons give hints as to which joint embedding to use for which downstream task. Overall, product-of-experts performed well in most tasks and was reasonably fast, while early integration (concatenation) of modalities performed quite poorly.
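In a product-of-experts (PoE) joint embedding, each available modality's encoder outputs a Gaussian over the shared latent space, and the joint posterior is their product, which is again Gaussian: its precision is the sum of the experts' precisions and its mean is the precision-weighted average of their means. Modalities missing at test time are simply left out of the product. The sketch below implements that standard Gaussian PoE formula with diagonal covariances and an optional N(0, 1) prior expert, a common choice in multimodal VAEs; the benchmarked implementation may differ in such details, and the function name is hypothetical.

```python
def product_of_experts(means, variances, include_prior=True):
    """Combine diagonal Gaussian experts into one Gaussian per latent dimension.

    means, variances: one list per *available* modality, each of length latent_dim.
    Returns (joint_mean, joint_variance) as lists.
    """
    latent_dim = len(means[0])
    joint_mean, joint_var = [], []
    for d in range(latent_dim):
        precisions = [1.0 / var[d] for var in variances]
        weighted = [mu[d] * p for mu, p in zip(means, precisions)]
        if include_prior:
            precisions.append(1.0)  # standard-normal prior expert: mean 0, variance 1
            weighted.append(0.0)
        total = sum(precisions)
        joint_mean.append(sum(weighted) / total)
        joint_var.append(1.0 / total)
    return joint_mean, joint_var


if __name__ == "__main__":
    rna_mu, rna_var = [1.0, -0.5], [1.0, 0.25]  # toy encoder outputs
    atac_mu, atac_var = [3.0, 0.5], [1.0, 1.0]
    print(product_of_experts([rna_mu, atac_mu], [rna_var, atac_var], include_prior=False))
    print(product_of_experts([rna_mu], [rna_var], include_prior=False))  # ATAC missing
```

Each added expert can only shrink the joint variance, so the embedding becomes more certain as more modalities are observed; with one modality the joint posterior reduces to that modality's own (prior-adjusted) posterior.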
Affiliation(s)
- Stavros Makrodimitris
- Delft Bioinformatics Lab, Delft University of Technology
- Department of Medical Oncology, Erasmus University Medical Center
- Department of Clinical Genetics, Erasmus University Medical Center
- Bram Pronk
- Delft Bioinformatics Lab, Delft University of Technology
- Tamim Abdelaal
- Delft Bioinformatics Lab, Delft University of Technology
- Department of Radiology, Leiden University Medical Center
- Leiden Computational Biology Center, Leiden University Medical Center
- Marcel Reinders
- Delft Bioinformatics Lab, Delft University of Technology
- Leiden Computational Biology Center, Leiden University Medical Center
12
Tang L, Huang ZP, Mei H, Hu Y. Insights gained from single-cell analysis of chimeric antigen receptor T-cell immunotherapy in cancer. Mil Med Res 2023; 10:52. [PMID: 37941075; PMCID: PMC10631149; DOI: 10.1186/s40779-023-00486-4] [Received: 05/05/2023] [Accepted: 10/10/2023] [Indexed: 11/10/2023]
Abstract
Advances in chimeric antigen receptor (CAR)-T cell therapy have significantly improved clinical outcomes of patients with relapsed or refractory hematologic malignancies. However, progress is still hindered because only a fraction of patients derive clinical benefit. A lack of understanding of CAR-T cell behavior in vivo at the single-cell level impedes more extensive application in clinical practice. Mounting evidence suggests that single-cell sequencing techniques can help refine receptor design, guide gene-based T cell modification, and optimize CAR-T manufacturing conditions, all of which are essential for long-term immunosurveillance and more favorable clinical outcomes. The information generated by these methods may also inform our understanding of the numerous complex factors that dictate therapeutic efficacy and toxicity. In this review, we discuss why CAR-T immunotherapy fails in clinical practice and what the field has learned since the advent of single-cell sequencing technologies. We further outline recent advances in the application of single-cell analyses to CAR-T immunotherapy. Specifically, we provide an overview of single-cell studies focusing on target antigens, CAR-transgene integration, preclinical research and clinical applications, and then discuss how these insights will affect the future of CAR-T cell therapy.
Affiliation(s)
- Lu Tang
- Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Hubei Clinical Medical Center of Cell Therapy for Neoplastic Disease, Wuhan, 430022, China
- Key Laboratory of Biological Targeted Therapy, The Ministry of Education, Wuhan, 430022, China
- Zhong-Pei Huang
- Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Hubei Clinical Medical Center of Cell Therapy for Neoplastic Disease, Wuhan, 430022, China
- Key Laboratory of Biological Targeted Therapy, The Ministry of Education, Wuhan, 430022, China
- Heng Mei
- Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Hubei Clinical Medical Center of Cell Therapy for Neoplastic Disease, Wuhan, 430022, China
- Key Laboratory of Biological Targeted Therapy, The Ministry of Education, Wuhan, 430022, China
- Hubei Key Laboratory of Biological Targeted Therapy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Yu Hu
- Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Hubei Clinical Medical Center of Cell Therapy for Neoplastic Disease, Wuhan, 430022, China
- Key Laboratory of Biological Targeted Therapy, The Ministry of Education, Wuhan, 430022, China
- Hubei Key Laboratory of Biological Targeted Therapy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
13
Zhang C, Yang Y, Tang S, Aihara K, Zhang C, Chen L. Contrastively generative self-expression model for single-cell and spatial multimodal data. Brief Bioinform 2023; 24:bbad265. [PMID: 37507114 DOI: 10.1093/bib/bbad265] [Received: 03/16/2023] [Revised: 05/27/2023] [Accepted: 07/03/2023] [Indexed: 07/30/2023]
Abstract
Advances in single-cell multi-omics technology provide an unprecedented opportunity to fully understand cellular heterogeneity. However, integrating omics data from multiple modalities is challenging because of the individual characteristics of each measurement. To address this problem, we propose a contrastive and generative deep self-expression model, called single-cell multimodal self-expressive integration (scMSI), which integrates heterogeneous multimodal data into a unified manifold space. Specifically, scMSI first uses a deep self-expressive generative model to learn an omics-specific latent representation and self-expression relationship for each modality, thereby accounting for the characteristics of the different omics data. scMSI then combines these omics-specific self-expression relationships through contrastive learning. In this way, scMSI provides a paradigm for integrating multiple omics data even when they are only weakly related, unifying representation learning and data integration in a single framework. We demonstrate that scMSI provides a cohesive solution for a variety of analysis tasks, such as integration analysis, data denoising, batch correction and spatial domain detection. We have applied scMSI to various single-cell and spatial multimodal datasets to validate its effectiveness and robustness across diverse data types and application scenarios.
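The omics-specific "self-expression relationship" can be illustrated with the generic self-expressive objective used in deep subspace clustering. This is a sketch under stated assumptions, not scMSI's actual loss: $Z \in \mathbb{R}^{d \times n}$ is a hypothetical matrix of latent embeddings for one modality (one column per cell), $C \in \mathbb{R}^{n \times n}$ is the self-expression coefficient matrix, and $\lambda > 0$ is a regularization weight. scMSI combines such a term with generative and contrastive components, and its exact penalty may differ (an $\ell_1$ penalty on $C$ is also common).

```latex
\min_{C \in \mathbb{R}^{n \times n}} \;
  \tfrac{1}{2}\,\lVert Z - Z C \rVert_F^2 \;+\; \lambda\,\lVert C \rVert_F^2
\quad \text{s.t.} \quad \operatorname{diag}(C) = 0,
\qquad
A = \tfrac{1}{2}\left(\lvert C \rvert + \lvert C \rvert^{\top}\right)
```

Each cell's embedding is reconstructed as a weighted combination of the other cells' embeddings; the zero-diagonal constraint rules out the trivial solution $C = I$. The symmetrized affinity $A$ can then be fed to a clustering step, and per-modality coefficient matrices are what a contrastive objective would align across omics.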
Affiliation(s)
- Chengming Zhang
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
- Yiwen Yang
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Shijie Tang
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- Kazuyuki Aihara
- International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
- Chuanchao Zhang
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong 519031, China
- Luonan Chen
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong 519031, China
14
Fouché A, Chadoutaud L, Delattre O, Zinovyev A. Transmorph: a unifying computational framework for modular single-cell RNA-seq data integration. NAR Genom Bioinform 2023; 5:lqad069. [PMID: 37448589 PMCID: PMC10336778 DOI: 10.1093/nargab/lqad069] [Received: 01/18/2023] [Revised: 06/02/2023] [Accepted: 07/10/2023] [Indexed: 07/15/2023]
Abstract
Data integration of single-cell RNA-seq (scRNA-seq) data is the task of embedding datasets gathered from different sources or experiments into a common representation, so that cells of similar types or states are embedded close to one another independently of their dataset of origin. Data integration is a crucial step in most scRNA-seq analysis pipelines involving multiple batches. It enables batch effect reduction and improves data visualization, clustering, label transfer, and cell type inference. Many data integration tools have been proposed during the last decade, but the surge in their number has made it difficult to pick one for a given use case. Furthermore, these tools are provided as rigid pieces of software, making them hard to adapt to specific scenarios. To address both issues at once, we introduce the transmorph framework. It allows the user to engineer powerful data integration pipelines and is supported by a rich software ecosystem. We demonstrate the usefulness of transmorph by solving a variety of practical challenges on scRNA-seq datasets, including joint dataset embedding, gene space integration, and transfer of cycle phase annotations. transmorph is provided as an open-source Python package.
Affiliation(s)
- Aziz Fouché
- To whom correspondence should be addressed. Tel: +33 156246989
- Loïc Chadoutaud
- Institut Curie, PSL Research University, 75005 Paris, France
- INSERM, 75005 Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, 75005 Paris, France
- Olivier Delattre
- INSERM U830, Equipe Labellisée LNCC, SIREDO Oncology Centre, Institut Curie, 75005 Paris, France
- Andrei Zinovyev
- Correspondence may also be addressed to Andrei Zinovyev. Tel: +33 156246989
15
Johansen N, Hu H, Quon G. Projecting RNA measurements onto single cell atlases to extract cell type-specific expression profiles using scProjection. Nat Commun 2023; 14:5192. [PMID: 37626024 PMCID: PMC10457395 DOI: 10.1038/s41467-023-40744-6] [Received: 02/25/2022] [Accepted: 08/08/2023] [Indexed: 08/27/2023]
Abstract
Multi-modal single-cell RNA assays capture RNA content as well as other data modalities, such as the spatial position of cells or their electrophysiological properties. Compared with dedicated scRNA-seq assays, however, they may unintentionally capture RNA from multiple adjacent cells, exhibit lower RNA sequencing depth, or lack genome-wide RNA measurements. We present scProjection, a method for mapping individual multi-modal RNA measurements to deeply sequenced scRNA-seq atlases to extract cell type-specific, single-cell gene expression profiles. We demonstrate several use cases of scProjection, including identifying spatial motifs from spatial transcriptome assays, distinguishing RNA contributions from neighboring cells in both spatial and multi-modal single-cell assays, and imputing the expression of unmeasured genes from gene markers. scProjection therefore combines the advantages of both multi-modal and scRNA-seq assays to yield precise multi-modal measurements of single cells.
Affiliation(s)
- Nelson Johansen
- Graduate Group in Computer Science, University of California, Davis, Davis, CA, USA
- Hongru Hu
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, Davis, CA, USA
- Gerald Quon
- Graduate Group in Computer Science, University of California, Davis, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, Davis, CA, USA
- Department of Molecular and Cellular Biology, University of California, Davis, Davis, CA, USA
16
Fouché A, Zinovyev A. Omics data integration in computational biology viewed through the prism of machine learning paradigms. Front Bioinform 2023; 3:1191961. [PMID: 37600970 PMCID: PMC10436311 DOI: 10.3389/fbinf.2023.1191961] [Received: 03/22/2023] [Accepted: 07/26/2023] [Indexed: 08/22/2023]
Abstract
Large quantities of biological data can now be acquired to characterize cell types and states, from various sources and using a wide diversity of methods, providing scientists with more and more information to answer challenging biological questions. Unfortunately, working with this amount of data comes at the price of ever-increasing data complexity. This complexity stems from the multiplication of data types and from batch effects, which hinder the joint use of all available data within common analyses. Data integration describes a set of tasks geared towards embedding several datasets of different origins or modalities into a joint representation that can then be used to carry out downstream analyses. In the last decade, dozens of methods relying on various paradigms have been proposed to tackle the different facets of the data integration problem. This review introduces the most common data types encountered in computational biology and provides systematic definitions of the data integration problems. We then present how machine learning innovations were leveraged to build effective data integration algorithms that are widely used today by computational biologists. We discuss the current state of data integration and important pitfalls to consider when working with data integration tools. Finally, we detail a set of challenges the field will have to overcome in the coming years.
Affiliation(s)
- Aziz Fouché
- Institut Curie, PSL Research University, Paris, France
- Institut National de la Santé et de la Recherche Médicale, Paris, France
- CBIO-Centre for Computational Biology, ParisTech, PSL Research University, Paris, France
- Ecole Normale Supérieure Paris-Saclay, Cachan, France