1
Zhou S, Lin N, Yu L, Su X, Liu Z, Yu X, Gao H, Lin S, Zeng Y. Single-cell multi-omics in the study of digestive system cancers. Comput Struct Biotechnol J 2024; 23:431-445. [PMID: 38223343 PMCID: PMC10787224 DOI: 10.1016/j.csbj.2023.12.007] [Received: 08/04/2023] [Revised: 12/07/2023] [Accepted: 12/07/2023] [Indexed: 01/16/2024]
Abstract
Digestive system cancers are prevalent diseases with a high mortality rate, posing a significant threat to public health and imposing a substantial economic burden. The diagnosis and treatment of digestive system cancers face common challenges in oncology, such as tumor heterogeneity and drug resistance. Single-cell sequencing (SCS) emerged in response to these needs and has developed from single-cell RNA sequencing (scRNA-seq) to the single-cell multi-omics era, represented by single-cell spatial transcriptomics (ST). This article comprehensively reviews advances in single-cell omics technology in the study of digestive system tumors. In analyzing and summarizing the research cases, we provide key details on the sequencing platform, sample information, sampling method, and main findings. We also summarize the commonly used SCS platforms and their features, as well as the advantages of combining multi-omics technologies. Finally, we discuss development trends and prospects for the application of single-cell multi-omics technology in digestive system cancer research.
Affiliation(s)
- Shuang Zhou
- The Second Clinical Medical School of Fujian Medical University, Quanzhou, Fujian Province, China
- The Clinical Center of Molecular Diagnosis and Therapy, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Nanfei Lin
- The Clinical Center of Molecular Diagnosis and Therapy, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Liying Yu
- The Clinical Center of Molecular Diagnosis and Therapy, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Xiaoshan Su
- Department of Pulmonary and Critical Care Medicine, The Second Affiliated Hospital of Fujian Medical University, Respirology Medicine Centre of Fujian Province, Quanzhou, China
- Zhenlong Liu
- Lady Davis Institute for Medical Research, Jewish General Hospital, & Division of Experimental Medicine, Department of Medicine, McGill University, Montreal, QC, Canada
- Xiaowan Yu
- Clinical Laboratory, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Hongzhi Gao
- The Clinical Center of Molecular Diagnosis and Therapy, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Shu Lin
- Centre of Neurological and Metabolic Research, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Diabetes and Metabolism Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW 2010, Australia
- Yiming Zeng
- Department of Pulmonary and Critical Care Medicine, The Second Affiliated Hospital of Fujian Medical University, Respirology Medicine Centre of Fujian Province, Quanzhou, China
- Fujian Provincial Key Laboratory of Lung Stem Cells, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Jinan Microecological Biomedicine Shandong Laboratory, Jinan, Shandong Province, China
2
He Z, Hu S, Chen Y, An S, Zhou J, Liu R, Shi J, Wang J, Dong G, Shi J, Zhao J, Ou-Yang L, Zhu Y, Bo X, Ying X. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol 2024; 42:1594-1605. [PMID: 38263515 PMCID: PMC11471558 DOI: 10.1038/s41587-023-02040-y] [Received: 12/13/2022] [Accepted: 10/23/2023] [Indexed: 01/25/2024]
Abstract
Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. We demonstrate its superiority to 19 other methods, and its reliability, by evaluating its performance in trimodal and mosaic integration tasks. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available at https://github.com/labomics/midas .
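In mosaic data each batch measures only some modalities, so any reconstruction objective must skip the modalities a batch lacks. The sketch below illustrates that general idea with a masked mean-squared-error loss. It is an assumption for illustration only, not MIDAS's objective (the abstract describes a deep probabilistic framework); the function name `masked_reconstruction_loss` and the dictionary layout are hypothetical.

```python
import numpy as np


def masked_reconstruction_loss(observed, reconstructed):
    """Mean of per-modality MSEs, counting only modalities the batch measured.

    observed: dict mapping modality name -> array (cells x features), or None
              when that modality was not measured in this batch.
    reconstructed: dict mapping modality name -> decoder output of matching shape.
    """
    per_modality = []
    for modality, x in observed.items():
        if x is None:
            # Unmeasured modality in this mosaic batch: excluded from the loss.
            continue
        x_hat = reconstructed[modality]
        per_modality.append(np.mean((x - x_hat) ** 2))
    if not per_modality:
        raise ValueError("batch has no observed modalities")
    # Averaging per modality keeps wide modalities (e.g. many genes) from dominating.
    return float(np.mean(per_modality))


# Toy batch of two cells that measured RNA and protein (ADT) but not ATAC.
observed = {
    "rna": np.array([[1.0, 2.0], [3.0, 4.0]]),
    "adt": np.array([[0.0], [1.0]]),
    "atac": None,
}
reconstructed = {
    "rna": np.array([[1.0, 2.0], [3.0, 2.0]]),
    "adt": np.array([[0.0], [0.0]]),
    "atac": np.zeros((2, 3)),
}
print(masked_reconstruction_loss(observed, reconstructed))  # 0.75
```

Averaging per modality rather than pooling all features is a design choice of this sketch; a probabilistic model would typically weight each modality through its own likelihood instead.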
Affiliation(s)
- Zhen He
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Shuofeng Hu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Yaowen Chen
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Sijing An
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jiahao Zhou
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Runyan Liu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Junfeng Shi
- School of Automation, China University of Geosciences, Wuhan, China
- Jing Wang
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Guohua Dong
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jinhui Shi
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jiaxin Zhao
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Yuan Zhu
- School of Automation, China University of Geosciences, Wuhan, China
- Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, China.
- Xiaomin Ying
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China.
3
Hu Y, Wan S, Luo Y, Li Y, Wu T, Deng W, Jiang C, Jiang S, Zhang Y, Liu N, Yang Z, Chen F, Li B, Qu K. Benchmarking algorithms for single-cell multi-omics prediction and integration. Nat Methods 2024:10.1038/s41592-024-02429-w. [PMID: 39322753 DOI: 10.1038/s41592-024-02429-w] [Received: 04/14/2023] [Accepted: 08/19/2024] [Indexed: 09/27/2024]
Abstract
The development of single-cell multi-omics technology has greatly enhanced our understanding of biology, and in parallel, numerous algorithms have been proposed to predict the protein abundance and/or chromatin accessibility of cells from single-cell transcriptomic information and to integrate various types of single-cell multi-omics data. However, few studies have systematically compared and evaluated the performance of these algorithms. Here, we present a benchmark study of 14 protein abundance/chromatin accessibility prediction algorithms and 18 single-cell multi-omics integration algorithms using 47 single-cell multi-omics datasets. Our benchmark showed that, overall, totalVI and scArches outperformed the other algorithms for predicting protein abundance, and LS_Lab was the top-performing algorithm for predicting chromatin accessibility in most cases. Seurat, MOJITOO and scAI emerged as leading algorithms for vertical integration, whereas totalVI and UINMF outperformed their counterparts in both horizontal and mosaic integration scenarios. Additionally, we provide a pipeline to help researchers select the optimal multi-omics prediction and integration algorithm.
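One simple way to turn per-dataset benchmark scores into an overall recommendation is to rank methods within each dataset and average the ranks. The sketch below does that. It is an assumed aggregation scheme for illustration, not necessarily the one used in the benchmark's selection pipeline, and the method names and scores are placeholders, not results from the study.

```python
from statistics import mean


def average_ranks(scores):
    """Rank methods within each dataset (1 = best) and average ranks across datasets.

    scores: dict dataset -> dict method -> score, where higher is better.
    Tied scores share the average of their positions. A method missing from a
    dataset is simply averaged over the datasets where it was scored.
    """
    ranks = {}
    for dataset_scores in scores.values():
        ordered = sorted(dataset_scores.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over a run of tied scores starting at position i.
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for k in range(i, j + 1):
                ranks.setdefault(ordered[k][0], []).append(tied_rank)
            i = j + 1
    return sorted(((m, mean(r)) for m, r in ranks.items()), key=lambda mr: mr[1])


# Placeholder scores (e.g. a correlation metric); not results from the benchmark.
scores = {
    "dataset_1": {"method_A": 0.9, "method_B": 0.7, "method_C": 0.8},
    "dataset_2": {"method_A": 0.6, "method_B": 0.6, "method_C": 0.9},
}
print(average_ranks(scores))
# [('method_C', 1.5), ('method_A', 1.75), ('method_B', 2.75)]
```

Mean rank is robust to metrics on different scales across datasets, at the cost of discarding how large the gaps between methods are.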
Affiliation(s)
- Yinlei Hu
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Mathematical Science, University of Science and Technology of China, Hefei, China
- Siyuan Wan
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
- Yuanhanyu Luo
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China
- National Institute of Biological Sciences, Beijing, China
- Yuanzhe Li
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
- Tong Wu
- National Institute of Biological Sciences, Beijing, China
- College of Life Sciences, Beijing Normal University, Beijing, China
- Wentao Deng
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Chen Jiang
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Shan Jiang
- National Institute of Biological Sciences, Beijing, China
- Yueping Zhang
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
- Nianping Liu
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China
- Zongcheng Yang
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Falai Chen
- School of Mathematical Science, University of Science and Technology of China, Hefei, China.
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China.
- Bin Li
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China.
- National Institute of Biological Sciences, Beijing, China.
- Kun Qu
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China.
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China.
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China.
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China.
4
Chen R, Zhou J, Chen B. Imputing abundance of over 2,500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles. Cell Syst 2024; 15:869-884.e6. [PMID: 39243755 PMCID: PMC11423933 DOI: 10.1016/j.cels.2024.08.006] [Received: 11/18/2023] [Revised: 05/23/2024] [Accepted: 08/15/2024] [Indexed: 09/09/2024]
Abstract
Cell surface proteins serve as primary drug targets and cell identity markers. Techniques such as CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) have enabled the simultaneous quantification of surface protein abundance and transcript expression within individual cells. The published data have been utilized to train machine learning models for predicting surface protein abundance solely from transcript expression. However, the small scale of proteins predicted and the poor generalization ability of these computational approaches across diverse contexts (e.g., different tissues/disease states) impede their widespread adoption. Here, we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA sequencing), a context-agnostic zero-shot deep ensemble model, which enables large-scale protein abundance prediction and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2,500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer. A record of this paper's transparent peer review process is included in the supplemental information.
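A deep ensemble trains several networks independently and combines their outputs. A common combination rule for Gaussian-output deep ensembles treats the members as an equally weighted mixture: the ensemble mean is the average of the member means, and the ensemble variance adds the spread of the member means to their average variance. The sketch below applies that rule to per-cell protein predictions. Whether SPIDER combines its members exactly this way is an assumption, and `combine_ensemble` is a hypothetical helper.

```python
import numpy as np


def combine_ensemble(means, variances):
    """Combine M Gaussian predictions into one mean and variance per entry.

    means, variances: array-likes of shape (M, cells, proteins), one slice per
    ensemble member. Members are treated as an equally weighted Gaussian mixture.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mixture_mean = means.mean(axis=0)
    # Law of total variance: mean within-member variance + spread of member means.
    mixture_var = variances.mean(axis=0) + means.var(axis=0)
    return mixture_mean, mixture_var


# Two hypothetical members predicting one protein for one cell.
mu, var = combine_ensemble([[[1.0]], [[3.0]]], [[[0.5]], [[0.5]]])
print(mu, var)  # [[2.]] [[1.5]]
```

The `means.var` term grows when members disagree, which can serve as a rough uncertainty signal for cells from contexts unlike the training data.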
Affiliation(s)
- Ruoqiao Chen
- Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI 48824, USA
- Jiayu Zhou
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Bin Chen
- Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI 48824, USA; Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA.
5
Hwang JY, Kim Y, Na K, Kim DK, Lee S, Kang SS, Baek S, Yang SM, Kim MH, Han H, Jeong SS, Lee CY, Han YJ, Sohn JO, Ye SK, Pyo KH. Exploring the Expression and Function of T Cell Surface Markers Identified through Cellular Indexing of Transcriptomes and Epitopes by Sequencing. Yonsei Med J 2024; 65:544-555. [PMID: 39193763 PMCID: PMC11359606 DOI: 10.3349/ymj.2023.0639] [Received: 02/28/2024] [Revised: 03/06/2024] [Accepted: 03/08/2024] [Indexed: 08/29/2024]
Abstract
PURPOSE By utilizing both protein and mRNA expression patterns, we can identify more detailed and diverse immune cells, providing insights into the complex immune landscape of cancer ecosystems. MATERIALS AND METHODS This study used publicly available Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) data of peripheral blood mononuclear cells (PBMCs) from the Gene Expression Omnibus database. A total of 94,674 cells were analyzed, of which 32,412 were T cells. The data contained 228 protein features and 16,262 mRNA features. The Seurat package was used for quality control and preprocessing, principal component analysis was performed, and Uniform Manifold Approximation and Projection was used to visualize the clusters. Protein and mRNA levels in the CITE-seq data were analyzed. RESULTS We observed that T cell subsets were better separated in the clusters generated at the protein level. By identifying mRNA markers highly correlated with the CD4 and CD8 proteins and cross-validating the CD26 and CD99 markers by flow cytometry, we found that CD4+ and CD8+ T cells were better discriminated in PBMCs. Weighted Nearest Neighbor clustering identified a previously unobserved T cell subset. CONCLUSION In this study, we used CITE-seq data to confirm that protein expression patterns can be used to identify cells more precisely. These findings will improve our understanding of immune cell heterogeneity and provide valuable insights into the complexity of the immune response in health and disease.
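The abstract describes identifying mRNA markers highly correlated with the CD4 and CD8 proteins. A minimal way to do this on matched CITE-seq matrices is to compute, across cells, the Pearson correlation between one protein (ADT) feature and every gene, then keep the top genes. The sketch assumes both inputs are already normalized and share the same cell order; the function name and the toy genes are hypothetical, and since the study used the Seurat package for preprocessing, this Python version is only an illustration of the computation.

```python
import numpy as np


def top_correlated_genes(rna, protein, gene_names, k=5):
    """Rank genes by Pearson correlation with one protein across cells.

    rna: array (cells x genes) of normalized expression.
    protein: array (cells,) of normalized abundance for a single ADT feature.
    """
    rna = np.asarray(rna, dtype=float)
    protein = np.asarray(protein, dtype=float)
    rna_c = rna - rna.mean(axis=0)
    prot_c = protein - protein.mean()
    denom = np.sqrt((rna_c ** 2).sum(axis=0) * (prot_c ** 2).sum())
    # Genes (or a protein) with zero variance get correlation 0 instead of NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.where(denom > 0, (rna_c * prot_c[:, None]).sum(axis=0) / denom, 0.0)
    order = np.argsort(-r)[:k]
    return [(gene_names[i], float(r[i])) for i in order]


# Four toy cells: one gene rises with the protein, one falls, one is constant.
protein = np.array([0.0, 1.0, 2.0, 3.0])
rna = np.array([[0, 3, 1], [2, 2, 1], [4, 1, 1], [6, 0, 1]])
print(top_correlated_genes(rna, protein, ["g_up", "g_down", "g_flat"], k=3))
# [('g_up', 1.0), ('g_flat', 0.0), ('g_down', -1.0)]
```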
Affiliation(s)
- Joon Yeon Hwang
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Youngtaek Kim
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Kwangmin Na
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Dong Kwon Kim
- Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Korea
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea
- Seul Lee
- Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Korea
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea
- Seong-San Kang
- JEUK Institute for Cancer Research, JEUK Co., Ltd., Gumi, Korea
- Sujeong Baek
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Seung Min Yang
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Mi Hyun Kim
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Heekyung Han
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Seong Su Jeong
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Chai Young Lee
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Yu Jin Han
- Department of Research Support, Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Jie-Ohn Sohn
- Wide River Institute of Immunology, Seoul National University, Hongcheon, Korea
- Sang-Kyu Ye
- Wide River Institute of Immunology, Seoul National University, Hongcheon, Korea
- Department of Pharmacology and Biomedical Sciences, Seoul National University College of Medicine, Seoul, Korea
- Kyoung-Ho Pyo
- Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Korea
- Yonsei New Il Han Institute for Integrative Lung Cancer Research, Yonsei University College of Medicine, Seoul, Korea
- Division of Medical Oncology, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, Korea.
6
Olawade DB, Teke J, Fapohunda O, Weerasinghe K, Usman SO, Ige AO, Clement David-Olawade A. Leveraging artificial intelligence in vaccine development: A narrative review. J Microbiol Methods 2024; 224:106998. [PMID: 39019262 DOI: 10.1016/j.mimet.2024.106998] [Received: 06/10/2024] [Revised: 07/12/2024] [Accepted: 07/12/2024] [Indexed: 07/19/2024]
Abstract
Vaccine development stands as a cornerstone of public health efforts, pivotal in curbing infectious diseases and reducing global morbidity and mortality. However, traditional vaccine development methods are often time-consuming, costly, and inefficient. The advent of artificial intelligence (AI) has ushered in a new era in vaccine design, offering unprecedented opportunities to expedite the process. This narrative review explores the role of AI in vaccine development, focusing on antigen selection, epitope prediction, adjuvant identification, and optimization strategies. AI algorithms, including machine learning and deep learning, leverage genomic data, protein structures, and immune system interactions to predict antigenic epitopes, assess immunogenicity, and prioritize antigens for experimentation. Furthermore, AI-driven approaches facilitate the rational design of immunogens and the identification of novel adjuvant candidates with optimal safety and efficacy profiles. Challenges such as data heterogeneity, model interpretability, and regulatory considerations must be addressed to realize the full potential of AI in vaccine development. Integrating emerging technologies, such as single-cell omics and synthetic biology, promises to enhance vaccine design precision and scalability. This review underscores the transformative impact of AI on vaccine development and highlights the need for interdisciplinary collaborations and regulatory harmonization to accelerate the delivery of safe and effective vaccines against infectious diseases.
Affiliation(s)
- David B Olawade
- Department of Allied and Public Health, School of Health, Sport and Bioscience, University of East London, London, United Kingdom; Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, United Kingdom.
- Jennifer Teke
- Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, United Kingdom; Faculty of Medicine, Health and Social Care, Canterbury Christ Church University, United Kingdom
- Kusal Weerasinghe
- Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, United Kingdom
- Sunday O Usman
- Department of Systems and Industrial Engineering, University of Arizona, USA
- Abimbola O Ige
- Department of Chemistry, Faculty of Science, University of Ibadan, Ibadan, Nigeria
7
Bashore AC, Xue C, Kim E, Yan H, Zhu LY, Pan H, Kissner M, Ross LS, Zhang H, Li M, Reilly MP. Monocyte Single-Cell Multimodal Profiling in Cardiovascular Disease Risk States. Circ Res 2024; 135:685-700. [PMID: 39105287 PMCID: PMC11430373 DOI: 10.1161/circresaha.124.324457] [Received: 02/17/2024] [Revised: 07/11/2024] [Accepted: 07/28/2024] [Indexed: 08/07/2024]
Abstract
BACKGROUND Monocytes are a critical innate immune system cell type that serves homeostatic and immunoregulatory functions. They have been identified historically by the cell surface expression of CD14 and CD16. However, recent single-cell studies have revealed that they are much more heterogeneous than previously realized. METHODS We utilized cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) and single-cell RNA sequencing to describe the comprehensive transcriptional and phenotypic landscape of 437,126 monocytes. RESULTS This high-dimensional multimodal approach identified vast phenotypic diversity and functionally distinct subsets, including IFN-responsive monocytes, MHCIIhi (major histocompatibility complex class II) monocytes, and monocyte-platelet aggregates, as well as nonclassical monocytes and several subpopulations of classical monocytes. Using flow cytometry, we validated the existence of MHCII+CD275+ MHCIIhi monocytes, CD42b+ monocyte-platelet aggregates, CD16+CD99- nonclassical monocytes, and CD99+ classical monocytes. Each subpopulation exhibited unique characteristics, developmental trajectories, transcriptional regulation, and tissue distribution. In addition, alterations associated with cardiovascular disease risk factors, including race, smoking, and hyperlipidemia, were identified. Moreover, the effect of hyperlipidemia was recapitulated in mouse models of elevated cholesterol. CONCLUSIONS This integrative and cross-species comparative analysis provides a new perspective on monocyte alterations in pathological conditions and offers insights into monocyte-driven mechanisms in cardiovascular disease and the potential for therapies targeting monocyte subpopulations.
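The abstract notes that monocytes have historically been identified by surface CD14 and CD16. That conventional scheme sorts cells into classical (CD14 high, CD16 negative), intermediate (CD14 high, CD16 positive) and nonclassical (CD14 low, CD16 high) subsets. The sketch below encodes it as a two-threshold gate. The cut-off values are hypothetical placeholders (real gates are set per experiment), and a single threshold per marker is a simplification of actual flow cytometry or ADT gating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonocyteGate:
    # Hypothetical cut-offs on normalized marker intensity; real gates are set per experiment.
    cd14_high: float = 1.0
    cd16_pos: float = 1.0


def classify_monocyte(cd14, cd16, gate=MonocyteGate()):
    """Return the conventional CD14/CD16 subset label for a single cell."""
    if cd14 >= gate.cd14_high:
        # CD14-high cells split on CD16 into classical and intermediate.
        return "intermediate" if cd16 >= gate.cd16_pos else "classical"
    if cd16 >= gate.cd16_pos:
        # CD14-low, CD16-high cells are nonclassical.
        return "nonclassical"
    return "unassigned"


cells = [(2.0, 0.1), (2.0, 1.5), (0.3, 2.0), (0.2, 0.1)]
print([classify_monocyte(cd14, cd16) for cd14, cd16 in cells])
# ['classical', 'intermediate', 'nonclassical', 'unassigned']
```

The study's point is that this two-marker scheme misses much of the heterogeneity its multimodal data reveal, so a gate like this is a baseline rather than a full classifier.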
Affiliation(s)
- Alexander C Bashore
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.)
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.), Columbia University Irving Medical Center, New York
- Chenyi Xue
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.)
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.), Columbia University Irving Medical Center, New York
- Eunyoung Kim
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.)
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.), Columbia University Irving Medical Center, New York
- Hanying Yan
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia (H.Y., M.L.)
- Lucie Y Zhu
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.)
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.), Columbia University Irving Medical Center, New York
- Huize Pan
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN (H.P.)
- Michael Kissner
- Columbia Stem Cell Initiative, Department of Genetics and Development (M.K.), Columbia University Irving Medical Center, New York
- Leila S Ross
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.)
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.), Columbia University Irving Medical Center, New York
- Hanrui Zhang
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.)
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.), Columbia University Irving Medical Center, New York
- Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia (H.Y., M.L.)
- Muredach P Reilly
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.)
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine (A.C.B., C.X., E.K., L.Y.Z., L.S.R., H.Z., M.P.R.), Columbia University Irving Medical Center, New York
- Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York (M.P.R.)
8
Jeong Y, Ronen J, Kopp W, Lutsik P, Akalin A. scMaui: a widely applicable deep learning framework for single-cell multiomics integration in the presence of batch effects and missing data. BMC Bioinformatics 2024; 25:257. [PMID: 39107690 PMCID: PMC11304929 DOI: 10.1186/s12859-024-05880-w] [Received: 08/25/2023] [Accepted: 07/23/2024] [Indexed: 08/10/2024]
Abstract
The recent advances in high-throughput single-cell sequencing have created an urgent demand for computational models that can address the high complexity of single-cell multiomics data. Carefully designed single-cell multiomics integration models are required to avoid bias toward a specific modality and to overcome sparsity. Batch effects obscuring biological signals must also be taken into account. Here, we introduce a new single-cell multiomics integration model, Single-cell Multiomics Autoencoder Integration (scMaui), based on variational product-of-experts autoencoders and adversarial learning. scMaui calculates a joint representation of multiple marginal distributions using a product-of-experts approach, which is especially effective for missing values in the modalities. Furthermore, it overcomes limitations of previous variational autoencoder (VAE)-based integration methods with regard to batch-effect correction and the restricted range of applicable assays. It handles multiple batch effects independently, accepting both discrete and continuous values, and provides varied reconstruction loss functions to cover all possible assays and preprocessing pipelines. We demonstrate that scMaui outperforms other methods in many tasks. Further downstream analyses also demonstrate its potential for identifying relations between assays and discovering hidden subpopulations.
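The product-of-experts idea mentioned for scMaui has a closed form when each modality's encoder outputs a diagonal Gaussian: the joint distribution is Gaussian, its precision is the sum of the experts' precisions, and its mean is the precision-weighted average of their means. A missing modality is handled by leaving its expert out. The sketch below also includes a standard-normal prior expert, a common convention in product-of-experts VAEs; whether scMaui does the same, and the helper name `product_of_experts`, are assumptions.

```python
import numpy as np


def product_of_experts(mus, logvars, include_prior=True):
    """Fuse diagonal-Gaussian experts into one joint Gaussian.

    mus, logvars: lists of arrays of shape (latent_dim,), one entry per
    *observed* modality. A missing modality is handled by omitting its entry.
    """
    if not mus:
        raise ValueError("need at least one observed modality")
    mus = [np.asarray(m, dtype=float) for m in mus]
    precisions = [np.exp(-np.asarray(lv, dtype=float)) for lv in logvars]
    if include_prior:
        # Standard-normal prior expert: mean 0, precision 1.
        mus.append(np.zeros_like(mus[0]))
        precisions.append(np.ones_like(mus[0]))
    total_precision = np.sum(precisions, axis=0)
    # Precision-weighted average of the expert means.
    joint_mu = np.sum([m * p for m, p in zip(mus, precisions)], axis=0) / total_precision
    joint_var = 1.0 / total_precision
    return joint_mu, joint_var


# Two modalities with unit variance (log-variance 0) and means 2 and 4.
mu, var = product_of_experts([np.array([2.0]), np.array([4.0])], [np.zeros(1), np.zeros(1)])
# Precisions 1 + 1 + 1 (prior) = 3, so mu = (2 + 4 + 0) / 3 = 2.0 and var = 1/3.

# Same cell with the second modality missing: only its expert is dropped.
mu_one, var_one = product_of_experts([np.array([2.0])], [np.zeros(1)])
# Precisions 1 + 1 = 2, so mu = 1.0 and var = 0.5.
```

Because precisions add, a confident expert (small variance) dominates the joint estimate, while an absent modality simply contributes nothing.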
Affiliation(s)
- Yunhee Jeong
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, Germany
- Faculty of Mathematics and Informatics, Heidelberg University, Im Neuenheimer Feld 205, Heidelberg, Germany
- Jonathan Ronen
- Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Inceptive Nucleics, Inc., Palo Alto, CA, USA
- Wolfgang Kopp
- Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Roche Diagnostics GmbH, Penzberg, Germany
- Pavlo Lutsik
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, Germany.
- Department of Oncology, Catholic University (KU) Leuven, Leuven, Belgium.
| | - Altuna Akalin
- Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany.
9
Chen R, Zhou J, Chen B. Imputing abundance of over 2500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles. bioRxiv 2024:2024.07.31.605432. [PMID: 39131290 PMCID: PMC11312525 DOI: 10.1101/2024.07.31.605432] [Indexed: 08/13/2024]
Abstract
Cell surface proteins serve as primary drug targets and cell identity markers. The emergence of techniques like CITE-seq has enabled simultaneous quantification of surface protein abundance and transcript expression for multimodal data analysis within individual cells. The published data have been used to train machine learning models that predict surface protein abundance solely from transcript expression. However, the small number of proteins these computational approaches predict and their poor generalization across diverse contexts, such as different tissues or disease states, impede their widespread adoption. Here we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA-seq), a context-agnostic zero-shot deep ensemble model that enables large-scale prediction of cell surface protein abundance and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer.
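The ensemble principle behind a model like SPIDER can be sketched as: train several predictors independently, average their outputs, and read the spread across members as a rough uncertainty estimate. In the minimal NumPy sketch below, closed-form ridge regressors fitted on bootstrap resamples stand in for the neural networks of a real deep ensemble; that substitution, the helper names, and the synthetic RNA-to-protein data are assumptions for illustration only, not SPIDER's implementation.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0                            # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def ensemble_predict(X_train, y_train, X_new, n_members=5, seed=0):
    """Fit members on bootstrap resamples; return mean and std of their predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        idx = rng.integers(0, X_train.shape[0], size=X_train.shape[0])
        w = fit_ridge(X_train[idx], y_train[idx])
        preds.append(predict(w, X_new))
    preds = np.stack(preds)                          # (n_members, n_cells, n_proteins)
    return preds.mean(axis=0), preds.std(axis=0)


# Synthetic stand-in for RNA features (X) -> surface protein abundance (Y).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_W = rng.normal(size=(10, 3))
Y = X @ true_W + 0.1 * rng.normal(size=(200, 3))
mean, std = ensemble_predict(X[:150], Y[:150], X[150:])
print(mean.shape, std.shape)
```

In an actual deep ensemble each member would be a separately initialized neural network; the averaging step and the use of between-member spread stay the same.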
Affiliation(s)
- Ruoqiao Chen
- Department of Pharmacology and Toxicology, Michigan State University, MI, USA
- Jiayu Zhou
- Department of Computer Science and Engineering, Michigan State University, MI, USA
- Bin Chen
- Department of Pharmacology and Toxicology, Michigan State University, MI, USA
- Department of Computer Science and Engineering, Michigan State University, MI, USA
- Department of Pediatrics and Human Development, Michigan State University, MI, USA
10
Zheng Y, Caron DP, Kim JY, Jun SH, Tian Y, Florian M, Stuart KD, Sims PA, Gottardo R. ADTnorm: Robust Integration of Single-cell Protein Measurement across CITE-seq Datasets. Res Sq 2024:rs.3.rs-4572811. [PMID: 39041028 PMCID: PMC11261982 DOI: 10.21203/rs.3.rs-4572811/v1] [Indexed: 07/24/2024]
Abstract
CITE-seq enables paired measurement of surface protein and mRNA expression in single cells using antibodies conjugated to oligonucleotide tags. Because surface protein molecules are present at high copy numbers, sequencing antibody-derived tags (ADTs) allows robust protein detection and improves cell-type identification. However, variability in antibody staining leads to batch effects in ADT expression, obscuring biological variation, reducing interpretability, and obstructing cross-study analyses. Here, we present ADTnorm (https://github.com/yezhengSTAT/ADTnorm), a normalization and integration method designed explicitly for ADT abundance. Benchmarking against 14 existing scaling and normalization methods, we show that ADTnorm accurately aligns the negative- and positive-expression populations of surface protein markers across 13 public datasets, effectively removing technical variation across batches and improving cell-type separation. ADTnorm enables efficient integration of public CITE-seq datasets, each with a unique experimental design, paving the way for atlas-level analyses. Beyond normalization, ADTnorm includes built-in utilities to aid automated threshold-gating and the assessment of antibody staining quality for titration optimization and antibody panel selection. Applying ADTnorm to a published COVID-19 CITE-seq dataset enabled the identification of previously undetected disease-associated markers, illustrating its broad utility in biological applications.
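The core idea of aligning each marker's negative and positive peaks across batches can be illustrated with a landmark mapping: each batch's negative and positive peak locations are mapped onto shared target locations. With exactly two landmarks, a piecewise-linear landmark mapping reduces to one affine transform per batch. The sketch below assumes the peak locations are already known and uses arbitrary target values; it is a simplified illustration, not ADTnorm's own peak detection or landmark registration, and `align_to_landmarks` is a hypothetical name.

```python
import numpy as np


def align_to_landmarks(values, neg_peak, pos_peak, target_neg=1.0, target_pos=5.0):
    """Affinely map one batch's ADT values so its peaks land on shared targets.

    With two landmarks (negative and positive peak) the landmark mapping is a
    single affine transform, which also extrapolates linearly beyond the peaks.
    """
    if pos_peak <= neg_peak:
        raise ValueError("positive peak must lie above the negative peak")
    scale = (target_pos - target_neg) / (pos_peak - neg_peak)
    return target_neg + (np.asarray(values, dtype=float) - neg_peak) * scale


# Two hypothetical batches of one marker whose staining intensity differs.
batch_a = np.array([0.5, 1.0, 3.0, 4.0])   # peaks assumed at 1.0 (neg) and 4.0 (pos)
batch_b = np.array([2.0, 3.0, 7.0, 9.0])   # peaks assumed at 3.0 (neg) and 9.0 (pos)
aligned_a = align_to_landmarks(batch_a, neg_peak=1.0, pos_peak=4.0)
aligned_b = align_to_landmarks(batch_b, neg_peak=3.0, pos_peak=9.0)
print(aligned_a, aligned_b)
```

After alignment, the negative populations of both batches sit at the same location and the positive populations sit at another shared location, which is what makes downstream gating comparable across batches.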
Affiliation(s)
- Ye Zheng
- Basic Science Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Daniel P. Caron
- Department of Microbiology and Immunology, Columbia University, New York, NY 10032, USA
- Ju Yeong Kim
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Seong-Hwan Jun
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Yuan Tian
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Mair Florian
- Department of Biology, ETH Zürich, Zürich 8093, Switzerland
- Kenneth D. Stuart
- Center for Global Infectious Disease Research, Seattle Children’s Research Institute, Seattle, WA, United States
- Peter A. Sims
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Raphael Gottardo
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Biomedical Data Science Center, Lausanne University Hospital and University of Lausanne, Lausanne 1005, Switzerland
11
Yu H, Zheng Y, Yang X. scDM: A deep generative method for cell surface protein prediction with diffusion model. J Mol Biol 2024; 436:168610. [PMID: 38754773 DOI: 10.1016/j.jmb.2024.168610] [Received: 01/03/2024] [Revised: 05/06/2024] [Accepted: 05/09/2024] [Indexed: 05/18/2024]
Abstract
Proteins are the executors of organismal functions, and the transition from RNA to protein is subject to post-transcriptional regulation; therefore, considering RNA and surface protein expression simultaneously can provide additional evidence about biological processes. Cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) can measure both RNA and protein expression in single cells, but these experiments are expensive and time-consuming. Because computational tools for predicting surface proteins are lacking, we used CITE-seq datasets to design a deep generative prediction method based on diffusion models and derived biological findings from its predictions. Our method, scDM, predicts protein expression values from the RNA expression values of individual cells. It uses a novel scheme for encoding the data into the model and learns the data distribution by progressively adding Gaussian noise to the data and learning to remove it step by step, generating predicted samples through gradual denoising. Comprehensive evaluation across different datasets showed that our predictions were satisfactory and further demonstrated the effectiveness of incorporating information from single-cell multiomics data into diffusion models for biological studies. We also found that jointly analysing predicted surface protein expression and cancer cell drug scores could open new directions for discovering therapeutic drug targets.
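The noising step that diffusion models rely on can be sketched with the standard DDPM forward process, in which a clean sample x_0 is corrupted in closed form to x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I), and a network is trained to predict ε. The linear beta schedule, the 1000 steps, the helper names, and the synthetic data below are common defaults assumed for illustration; scDM's data encoding and its denoising network are not reproduced.

```python
import numpy as np


def linear_beta_schedule(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t and their cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form and return the noise that was used."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, eps


def denoising_loss(eps_pred, eps):
    """Simplified DDPM training objective: MSE between true and predicted noise."""
    return float(np.mean((eps_pred - eps) ** 2))


rng = np.random.default_rng(0)
betas, alpha_bars = linear_beta_schedule()
x0 = rng.standard_normal((4, 20))        # e.g. 4 cells x 20 proteins (synthetic)
for t in (10, 500, 999):
    xt, _ = q_sample(x0, t, alpha_bars, rng)
    r = np.corrcoef(xt.ravel(), x0.ravel())[0, 1]
    # Early steps stay close to the data; by the last step the signal is nearly gone.
    print(f"t={t}: alpha_bar={alpha_bars[t]:.5f}, corr(x_t, x_0)={r:.3f}")
```

Generation runs this process in reverse, starting from pure noise and repeatedly subtracting the network's noise estimate; in a conditional predictor the network would additionally receive the cell's RNA profile.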
Affiliation(s)
- Hanlei Yu
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
- Yuanjie Zheng
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
- Xinbo Yang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
12
Gardner AL, Jost TA, Brock A. Computational identification of surface markers for isolating distinct subpopulations from heterogeneous cancer cell populations. bioRxiv 2024:2024.05.28.596337. [PMID: 38854060 PMCID: PMC11160629 DOI: 10.1101/2024.05.28.596337] [Indexed: 06/11/2024]
Abstract
Intratumor heterogeneity reduces treatment efficacy and complicates our understanding of tumor progression. There is a pressing need to understand the functions of heterogeneous tumor cell subpopulations within a tumor, yet in vitro systems for studying these processes are limited. With the advent of single-cell RNA sequencing (scRNA-seq), it has become clear that some cancer cell line models include distinct subpopulations. Heterogeneous cell lines offer a unique opportunity to study the dynamics and evolution of genetically similar cancer cell subpopulations in controlled experimental settings. Here, we present clusterCleaver, a computational package that uses metrics of statistical distance to identify candidate surface markers maximally unique to transcriptomic subpopulations in scRNA-seq data, which may then be used for isolation by FACS. clusterCleaver was experimentally validated using the MDA-MB-231 and MDA-MB-436 breast cancer cell lines. ESAM and BST2/tetherin were experimentally confirmed as surface markers that identify and separate major transcriptomic subpopulations within MDA-MB-231 and MDA-MB-436 cells, respectively. clusterCleaver is a computationally efficient and experimentally validated workflow for identifying and enriching distinct subpopulations within cell lines, paving the way for studies on the coexistence of cancer cell subpopulations in well-defined in vitro systems.
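Ranking candidate markers by a statistical distance can be illustrated with one such metric, the one-dimensional Wasserstein (earth mover's) distance between a gene's expression distributions in two clusters. The NumPy sketch below computes it from empirical CDFs and ranks genes by it. Which distance metrics clusterCleaver actually uses, beyond "metrics of statistical distance", is not stated here, and the gene names, helper names, and synthetic data are hypothetical.

```python
import numpy as np


def wasserstein_1d(u, v):
    """1-D Wasserstein distance between two empirical samples.

    Integrates |F_u(x) - F_v(x)| over x, where F_u and F_v are the empirical
    CDFs, evaluated on the intervals between consecutive pooled sample values.
    """
    u, v = np.sort(np.asarray(u, float)), np.sort(np.asarray(v, float))
    pooled = np.sort(np.concatenate([u, v]))
    deltas = np.diff(pooled)
    cdf_u = np.searchsorted(u, pooled[:-1], side="right") / u.size
    cdf_v = np.searchsorted(v, pooled[:-1], side="right") / v.size
    return float(np.sum(np.abs(cdf_u - cdf_v) * deltas))


def rank_markers(expr, genes, labels, cluster_a, cluster_b):
    """Rank genes by the distance between two clusters' expression (cells x genes)."""
    in_a, in_b = labels == cluster_a, labels == cluster_b
    scores = {g: wasserstein_1d(expr[in_a, j], expr[in_b, j])
              for j, g in enumerate(genes)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


rng = np.random.default_rng(0)
labels = np.array([0] * 100 + [1] * 100)
genes = ["GENE_A", "GENE_B", "GENE_C"]      # hypothetical surface-marker genes
expr = rng.normal(size=(200, 3))
expr[labels == 1, 1] += 3.0                 # GENE_B separates the two clusters
print(rank_markers(expr, genes, labels, 0, 1)[0])
```

A distance on whole distributions, rather than a difference in means, favours genes whose expression barely overlaps between subpopulations, which is what a FACS gate needs.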
Affiliation(s)
- Andrea L. Gardner
- Department of Biomedical Engineering, The University of Texas at Austin
- Tyler A. Jost
- Department of Biomedical Engineering, The University of Texas at Austin
- Amy Brock
- Department of Biomedical Engineering, The University of Texas at Austin
13
Xu J, Huang D, Zhang X. scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer. Adv Sci (Weinh) 2024; 11:e2307835. [PMID: 38483032 PMCID: PMC11109621 DOI: 10.1002/advs.202307835] [Received: 10/18/2023] [Revised: 01/24/2024] [Indexed: 05/23/2024]
Abstract
Transformer-based models have revolutionized single-cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here, a novel single-cell multimodal/multi-task transformer (scmFormer) is proposed to fill the existing gap in integrating single-cell proteomics with other omics data. Systematic benchmarking demonstrates that scmFormer excels in integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving shared information across batches and distinct biological information. scmFormer achieves a 54.5% higher average F1 score than the second-best method in transferring cell-type labels from single-cell transcriptomics to proteomics data. Using COVID-19 datasets, scmFormer is shown to integrate over 1.48 million cells on a personal computer. Moreover, scmFormer outperforms existing methods in generating the unmeasured modality and is well suited for spatial multi-omic data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.
Affiliation(s)
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- De-Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo 315200, China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
14
Cao Y, Zhao X, Tang S, Jiang Q, Li S, Li S, Chen S. scButterfly: a versatile single-cell cross-modality translation method via dual-aligned variational autoencoders. Nat Commun 2024; 15:2973. [PMID: 38582890 PMCID: PMC10998864 DOI: 10.1038/s41467-024-47418-x] [Received: 09/23/2023] [Accepted: 03/28/2024] [Indexed: 04/08/2024]
Abstract
Recent advances in simultaneously profiling multi-omics modalities within individual cells have enabled the interrogation of cellular heterogeneity and molecular hierarchy. However, technical limitations lead to highly noisy multimodal data and substantial costs. Although computational methods have been proposed to translate single-cell data across modalities, their broad application remains impeded by formidable challenges. Here, we propose scButterfly, a versatile single-cell cross-modality translation method based on dual-aligned variational autoencoders and data augmentation schemes. Through comprehensive experiments on multiple datasets, we provide compelling evidence that scButterfly outperforms baseline methods in preserving cellular heterogeneity while translating datasets from various contexts and in revealing cell type-specific biological insights. In addition, we demonstrate the extensive applications of scButterfly for integrative multi-omics analysis of single-modality data, data enhancement of poor-quality single-cell multi-omics data, and automatic cell type annotation of scATAC-seq data. Moreover, scButterfly can be generalized to unpaired data training, perturbation-response analysis, and consecutive translation.
Affiliation(s)
- Yichuan Cao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Xiamiao Zhao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Qun Jiang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
- Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Siyu Li
- School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
- Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
15
Bashore AC, Yan H, Xue C, Zhu LY, Kim E, Mawson T, Coronel J, Chung A, Sachs N, Ho S, Ross LS, Kissner M, Passegué E, Bauer RC, Maegdefessel L, Li M, Reilly MP. High-Dimensional Single-Cell Multimodal Landscape of Human Carotid Atherosclerosis. Arterioscler Thromb Vasc Biol 2024; 44:930-945. [PMID: 38385291 PMCID: PMC10978277 DOI: 10.1161/atvbaha.123.320524] [Received: 12/02/2023] [Accepted: 02/06/2024] [Indexed: 02/23/2024]
Abstract
BACKGROUND Atherosclerotic plaques are complex tissues composed of a heterogeneous mixture of cells. However, our understanding of the comprehensive transcriptional and phenotypic landscape of the cells within these lesions is limited. METHODS To characterize the landscape of human carotid atherosclerosis in greater detail, we combined cellular indexing of transcriptomes and epitopes by sequencing and single-cell RNA sequencing to classify all cell types within lesions (n=21; 13 symptomatic) and achieve a comprehensive multimodal understanding of the cellular identities of atherosclerosis and their association with clinical pathophysiology. RESULTS We identified 25 cell populations, each with a unique multiomic signature, including macrophages, T cells, NK (natural killer) cells, mast cells, B cells, plasma cells, neutrophils, dendritic cells, endothelial cells, fibroblasts, and smooth muscle cells (SMCs). Among the macrophages, we identified 2 proinflammatory subsets enriched in IL-1B (interleukin-1B) or C1Q expression, 2 TREM2-positive foam cell subsets (1 expressing inflammatory genes), and subpopulations with a proliferative gene signature and an SMC-specific gene signature with upregulated fibrotic pathways. Further characterization revealed various subsets of SMCs and fibroblasts, including SMC-derived foam cells. These foamy SMCs were localized in the deep intima of coronary atherosclerotic lesions. Using cellular indexing of transcriptomes and epitopes by sequencing data, we developed a flow cytometry panel based on the cell surface proteins CD29, CD142, and CD90 to isolate SMC-derived cells from lesions. Lastly, we observed reduced proportions of efferocytotic macrophages, classically activated endothelial cells, and contractile and modulated SMC-derived cells, while inflammatory SMCs were enriched in plaques of clinically symptomatic versus asymptomatic patients.
CONCLUSIONS Our multimodal atlas of cell populations within atherosclerosis provides novel insights into the diversity, phenotype, location, isolation, and clinical relevance of the unique cellular composition of human carotid atherosclerosis. These findings facilitate both the mapping of cardiovascular disease susceptibility loci to specific cell types and the identification of novel molecular and cellular therapeutic targets for the treatment of the disease.
Affiliation(s)
- Alexander C Bashore
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Hanying Yan
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia (H.Y., M.L.)
- Chenyi Xue
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Lucie Y Zhu
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Eunyoung Kim
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Thomas Mawson
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Johana Coronel
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Allen Chung
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Nadja Sachs
- Department of Vascular and Endovascular Surgery, Technical University Munich, Germany (N.S., L.M.)
- Sebastian Ho
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Leila S Ross
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Michael Kissner
- Columbia Stem Cell Initiative, Department of Genetics and Development (M.K., E.P.), Columbia University Irving Medical Center, New York, NY
- Emmanuelle Passegué
- Columbia Stem Cell Initiative, Department of Genetics and Development (M.K., E.P.), Columbia University Irving Medical Center, New York, NY
- Robert C Bauer
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Lars Maegdefessel
- Department of Vascular and Endovascular Surgery, Technical University Munich, Germany (N.S., L.M.)
- German Center for Cardiovascular Research, Partner Site Munich Heart Alliance (L.M.)
- Department of Medicine, Karolinska Institute, Stockholm, Sweden (L.M.)
- Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia (H.Y., M.L.)
- Muredach P Reilly
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY (A.C.B., C.X., L.Y.Z., E.K., T.M., J.C., A.C., S.H., L.S.R., R.C.B., M.P.R.)
- Irving Institute for Clinical and Translational Research (M.P.R.), Columbia University Irving Medical Center, New York, NY
16
Hanhart D, Gossi F, Rapsomaniki MA, Kruithof-de Julio M, Chouvardas P. ScLinear predicts protein abundance at single-cell resolution. Commun Biol 2024; 7:267. [PMID: 38438709 PMCID: PMC10912329 DOI: 10.1038/s42003-024-05958-4] [Received: 08/04/2023] [Accepted: 02/22/2024] [Indexed: 03/06/2024]
Abstract
Single-cell multi-omics has transformed biomedical research and presents exciting machine learning opportunities. We present scLinear, a linear regression-based approach that predicts single-cell protein abundance from RNA expression. ScLinear is vastly more efficient than state-of-the-art methods without compromising accuracy. ScLinear is interpretable and generalizes accurately to unseen single-cell and spatial transcriptomics data. Importantly, we offer a critical view on using complex algorithms while ignoring simpler, faster, and more efficient approaches.
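Predicting each protein's abundance as a linear function of a cell's RNA profile can be sketched with ordinary least squares. In the sketch below, RNA expression is first reduced to a few principal components via SVD and a linear map with an intercept is then fitted to the protein matrix; the dimensionality-reduction step, the number of components, the helper name, and the synthetic data are assumptions for illustration and not scLinear's exact preprocessing or model.

```python
import numpy as np


def fit_linear_rna_to_protein(rna, protein, n_components=10):
    """Fit protein ~ PCA(rna) by least squares; return a prediction function.

    rna:     cells x genes (assumed already normalized and log-transformed)
    protein: cells x proteins (assumed already normalized)
    """
    mean = rna.mean(axis=0)
    # Principal axes from the SVD of the centred training matrix.
    _, _, vt = np.linalg.svd(rna - mean, full_matrices=False)
    components = vt[:n_components].T                  # genes x n_components

    def design(x):
        scores = (x - mean) @ components
        return np.hstack([scores, np.ones((x.shape[0], 1))])  # add intercept

    coef, *_ = np.linalg.lstsq(design(rna), protein, rcond=None)
    return lambda new_rna: design(new_rna) @ coef


# Synthetic data in which protein levels are linear in a low-rank RNA signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))
rna = latent @ rng.normal(size=(5, 50)) + 0.05 * rng.normal(size=(300, 50))
protein = latent @ rng.normal(size=(5, 4)) + 0.05 * rng.normal(size=(300, 4))

predict = fit_linear_rna_to_protein(rna[:200], protein[:200])
pred = predict(rna[200:])
r = np.corrcoef(pred.ravel(), protein[200:].ravel())[0, 1]
print(f"held-out Pearson r = {r:.3f}")
```

The fitted coefficients map each principal component to each protein directly, which is the kind of interpretability and speed the abstract contrasts with deep models.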
Affiliation(s)
- Daniel Hanhart
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, 3008, Bern, Switzerland
- Federico Gossi
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, 3008, Bern, Switzerland
- Marianna Kruithof-de Julio
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, 3008, Bern, Switzerland
- Department of Urology, Inselspital, Bern University Hospital, University of Bern, 3010, Bern, Switzerland
- Panagiotis Chouvardas
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, 3008, Bern, Switzerland
- Department of Urology, Inselspital, Bern University Hospital, University of Bern, 3010, Bern, Switzerland
17
Emami N, Ferdousi R. HormoNet: a deep learning approach for hormone-drug interaction prediction. BMC Bioinformatics 2024; 25:87. [PMID: 38418979 PMCID: PMC10903040 DOI: 10.1186/s12859-024-05708-7] [Received: 12/05/2023] [Accepted: 02/16/2024] [Indexed: 03/02/2024]
Abstract
Several lines of experimental evidence have shown that human endogenous hormones can interact with drugs in many ways and affect drug efficacy. Hormone-drug interactions (HDIs) are important for drug treatment and precision medicine; therefore, it is essential to understand hormone-drug associations. Here, we present HormoNet to predict HDI pairs and their risk level by integrating features derived from hormone and drug target proteins. To the best of our knowledge, this is one of the first attempts to employ a deep learning approach for HDI prediction. Amino acid composition and pseudo amino acid composition were applied to represent target information using 30 physicochemical and conformational properties of the proteins. To handle class imbalance in the data, we applied the synthetic minority over-sampling technique (SMOTE). Additionally, we constructed novel datasets for HDI prediction and the risk level of the interactions. HormoNet achieved high performance on our constructed hormone-drug benchmark datasets. The results provide insights into the relationship between hormones and drugs, and indicate the potential benefit of reducing interaction risk levels when designing more effective therapies for patients undergoing drug treatment. Our benchmark datasets and the source code for HormoNet are available at https://github.com/EmamiNeda/HormoNet.
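Two of the building blocks named above can be sketched briefly: amino acid composition (AAC), which represents a protein sequence by the fraction of each of the 20 standard amino acids, and SMOTE-style oversampling, which creates a synthetic minority-class sample by interpolating between a minority sample and one of its nearest minority-class neighbours. The NumPy sketch below is illustrative only; the pseudo amino acid composition with 30 physicochemical properties and the deep network are not reproduced, and the helper names and toy sequences are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in a protein sequence (20-dim vector)."""
    seq = seq.upper()
    counts = np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total else counts


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic rows by interpolating toward k nearest neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances among minority-class samples.
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        j = neighbours[i, rng.integers(k)]
        gap = rng.random()                       # uniform in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)


features = np.array([amino_acid_composition(s) for s in
                     ["MKTAYIAKQR", "MKLLVAGAAR", "MSTNPKPQRK", "MAAKWGGRRL"]])
new_rows = smote(features, n_new=5)
print(features.shape, new_rows.shape)
```

Because each synthetic row lies on a segment between two real minority samples, oversampling enlarges the rare class without copying rows verbatim.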
Affiliation(s)
- Neda Emami
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
18
Shen X, Li X. Deep-learning methods for unveiling large-scale single-cell transcriptomes. Cancer Biol Med 2024; 20:j.issn.2095-3941.2023.0436. [PMID: 38318925 PMCID: PMC10845931 DOI: 10.20892/j.issn.2095-3941.2023.0436] [Received: 11/10/2023] [Accepted: 12/20/2023] [Indexed: 02/07/2024]
Affiliation(s)
- Xilin Shen
- Tianjin Cancer Institute, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin Medical University, Tianjin 300060, China
- Xiangchun Li
- Tianjin Cancer Institute, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin Medical University, Tianjin 300060, China
19
Wang X, Wu X, Hong N, Jin W. Progress in single-cell multimodal sequencing and multi-omics data integration. Biophys Rev 2024; 16:13-28. [PMID: 38495443 PMCID: PMC10937857 DOI: 10.1007/s12551-023-01092-3] [Received: 05/18/2023] [Accepted: 06/27/2023] [Indexed: 03/19/2024]
Abstract
With the rapid advance of single-cell sequencing technology, cell heterogeneity in various biological processes has been dissected at different omics levels. However, single-cell mono-omics fragments information and cannot capture complete cell states. In the past several years, a variety of single-cell multimodal omics technologies have been developed to jointly profile multiple molecular modalities, including the genome, transcriptome, epigenome, and proteome, from the same single cell. With single-cell multimodal omics data available, we can simultaneously investigate the effects of genomic mutation or epigenetic modification on transcription and translation, and reveal potential mechanisms underlying disease pathogenesis. Driven by massive single-cell omics data, methods for integrating single-cell multi-omics data have developed rapidly. In the future, integrating the massive single-cell multi-omics data in public databases will make it possible to construct a multi-omics cell atlas, enabling a comprehensive understanding of cell state and gene regulation at single-cell resolution. In this review, we summarize experimental methods for generating single-cell multimodal omics data and computational methods for multi-omics data integration. We also discuss the future development of this field.
Affiliation(s)
- Xuefei Wang
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Xinchao Wu
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Ni Hong
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Wenfei Jin
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
20
Wang L, Nie R, Miao X, Cai Y, Wang A, Zhang H, Zhang J, Cai J. InClust+: the deep generative framework with mask modules for multimodal data integration, imputation, and cross-modal generation. BMC Bioinformatics 2024; 25:41. [PMID: 38267858 PMCID: PMC10809631 DOI: 10.1186/s12859-024-05656-2] [Received: 11/16/2023] [Accepted: 01/15/2024] [Indexed: 01/26/2024]
Abstract
BACKGROUND With the development of single-cell technology, many cell traits can be measured. Furthermore, multi-omics profiling technology can jointly measure two or more traits in a single cell simultaneously. To process the rapidly accumulating data of various types, computational methods for multimodal data integration are needed. RESULTS Here, we present inClust+, a deep generative framework for multi-omics data. It is built on the earlier inClust, which is specific to transcriptome data, and is augmented with two mask modules designed for multimodal data processing: an input-mask module in front of the encoder and an output-mask module behind the decoder. InClust+ was first used to integrate scRNA-seq and MERFISH data from similar cell populations, and to impute MERFISH data based on scRNA-seq data. Then, inClust+ was shown to be capable of integrating multimodal data (e.g. tri-modal data with gene expression, chromatin accessibility and protein abundance) with batch effects. Finally, inClust+ was used to integrate an unlabeled monomodal scRNA-seq dataset and two labeled multimodal CITE-seq datasets, transfer labels from the CITE-seq datasets to the scRNA-seq dataset, and generate the missing protein-abundance modality in the monomodal scRNA-seq data. In the above examples, inClust+ performed better than or comparably to the most recent tools for the corresponding task. CONCLUSIONS inClust+ is a suitable framework for handling multimodal data. Meanwhile, the successful implementation of masks in inClust+ means that they can be applied to other deep learning methods with a similar encoder-decoder architecture to broaden the application scope of these models.
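One way to picture input and output masks in an encoder-decoder is: an input mask zeroes out the features of modalities a cell was not measured in, and an output mask restricts the reconstruction loss to features that were actually observed, so one model can be trained on cells with different modality combinations. The NumPy sketch below illustrates only this masked-loss idea with a hypothetical random linear encoder and decoder; the exact design of inClust+'s mask modules and its generative model are not reproduced, and all names are hypothetical.

```python
import numpy as np


def modality_mask(n_cells, blocks, measured):
    """Build a cells x features 0/1 mask from per-cell modality availability.

    blocks:   dict modality -> slice of feature columns
    measured: dict modality -> boolean array (n_cells,) of availability
    """
    n_features = max(s.stop for s in blocks.values())
    mask = np.zeros((n_cells, n_features))
    for name, cols in blocks.items():
        mask[measured[name], cols] = 1.0
    return mask


def masked_reconstruction_loss(x, x_hat, output_mask):
    """Mean squared error over observed entries only."""
    sq = output_mask * (x - x_hat) ** 2
    return float(sq.sum() / output_mask.sum())


rng = np.random.default_rng(0)
blocks = {"rna": slice(0, 6), "protein": slice(6, 9)}
measured = {"rna": np.array([True, True, True]),
            "protein": np.array([True, False, True])}   # cell 2 lacks protein
x = rng.normal(size=(3, 9))
mask = modality_mask(3, blocks, measured)

# Hypothetical linear encoder/decoder; the input mask hides unmeasured features.
enc = rng.normal(size=(9, 2))
dec = rng.normal(size=(2, 9))
x_hat = ((x * mask) @ enc) @ dec
print(masked_reconstruction_loss(x, x_hat, mask))
```

Since unobserved entries never reach the encoder and never enter the loss, their placeholder values cannot influence training, while the decoder still produces a full output row that can serve as an imputed modality.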
Affiliation(s)
- Lifei Wang
- Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Rui Nie
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuexia Miao
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Yankai Cai
- School of Economics and Management, China University of Geosciences, Wuhan, China
- Anqi Wang
- Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Hanwen Zhang
- Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Jiang Zhang
- School of Systems Science, Beijing Normal University, Beijing, 100875, China
- Jun Cai
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China

21
Zhou S, Li Y, Wu W, Li L. scMMT: a multi-use deep learning approach for cell annotation, protein prediction and embedding in single-cell RNA-seq data. Brief Bioinform 2024; 25:bbad523. [PMID: 38300515 PMCID: PMC10833085 DOI: 10.1093/bib/bbad523] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 12/19/2023] [Indexed: 02/02/2024]
Abstract
Accurate cell type annotation in single-cell RNA-sequencing data is essential for advancing biological and medical research, particularly for understanding disease progression and tumor microenvironments. However, existing methods rely on a single feature extraction approach, adapt poorly to immune cell types that have similar molecular profiles but distinct functions, and fail to account for the impact of cell label noise on model accuracy, all of which compromise annotation precision. To address these challenges, we developed a supervised approach called scMMT. We proposed a novel feature extraction technique to uncover more informative features. Additionally, we constructed a multi-task learning framework based on the GradNorm method that improves recognition of challenging immune cells and reduces the impact of label noise by letting the cell type annotation and protein prediction tasks reinforce each other. Furthermore, we introduced logarithmic weighting and label smoothing mechanisms to improve recognition of rare cell types and prevent model overconfidence. In comprehensive evaluations on multiple public datasets, scMMT demonstrated state-of-the-art performance in cell type annotation, rare cell identification, resistance to dropout and label noise, protein expression prediction and low-dimensional embedding.
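The label smoothing and logarithmic class weighting mentioned above can be sketched as a weighted cross-entropy loss. Label smoothing here is the standard form, mixing the one-hot target with a uniform distribution: q = (1 - eps) * one_hot + eps / K. The weighting formula w_c = 1 + log(N / n_c) is our assumption for illustration; the abstract does not give scMMT's exact expression, and GradNorm balancing of the two tasks is not shown. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def smoothed_targets(label: int, num_classes: int, eps: float = 0.1) -> List[float]:
    """Standard label smoothing: (1 - eps) * one_hot + eps / K."""
    return [(1.0 - eps) * (1.0 if c == label else 0.0) + eps / num_classes
            for c in range(num_classes)]


def log_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Assumed log-inverse-frequency weights, w_c = 1 + log(N / n_c).

    Rare classes receive larger weights, while the logarithm keeps those
    weights from growing as fast as plain inverse frequency would.
    """
    counts = Counter(labels)
    n = len(labels)
    return {c: 1.0 + math.log(n / k) for c, k in counts.items()}


def weighted_smoothed_ce(probs: Sequence[float], label: int,
                         weights: Dict[int, float], eps: float = 0.1) -> float:
    """Cross-entropy against smoothed targets, scaled by the true class's weight."""
    q = smoothed_targets(label, len(probs), eps)
    ce = -sum(qc * math.log(max(pc, 1e-12)) for qc, pc in zip(q, probs))
    return weights[label] * ce


# Usage: class 1 is rare, so its errors cost more than those of class 0.
weights = log_class_weights([0, 0, 0, 1])
loss_common = weighted_smoothed_ce([0.6, 0.4], 0, weights)
loss_rare = weighted_smoothed_ce([0.4, 0.6], 1, weights)
```

Setting `eps=0.0` recovers ordinary (weighted) cross-entropy; a positive `eps` keeps the model from pushing predicted probabilities all the way to 1, which is the overconfidence the abstract refers to.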
Affiliation(s)
- Songqi Zhou
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Yang Li
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Chongqing Research Institute of Big Data, Peking University, Chongqing, China
- Wenyuan Wu
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Li Li
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China

22
Mullan KA, de Vrij N, Valkiers S, Meysman P. Current annotation strategies for T cell phenotyping of single-cell RNA-seq data. Front Immunol 2023; 14:1306169. [PMID: 38187377 PMCID: PMC10768068 DOI: 10.3389/fimmu.2023.1306169] [Received: 10/03/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a popular technique for interrogating the diversity and dynamic nature of cellular gene expression and offers numerous advantages in immunology. For example, in contrast to bulk RNA sequencing, scRNA-seq can discern cellular subtypes within a population, which is important for heterogeneous populations such as T cells. Moreover, recent advances allow the highly diverse T-cell receptor (TCR) sequence to be captured in parallel with gene expression. However, scRNA-seq data analysis is still hampered by the lack of a gold-standard cell phenotype annotation. This problem is particularly evident for T cells because of the heterogeneity of both their gene expression and their TCRs. While current cell phenotype annotation tools can distinguish major cell populations from each other, labelling T-cell subtypes remains problematic. In this review, we identify the common automated strategies for annotating T cells and their subpopulations, and describe what crucial information is still missing from these tools.
Affiliation(s)
- Kerry A. Mullan
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS) Consortium, University of Antwerp, Antwerp, Belgium
- Nicky de Vrij
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS) Consortium, University of Antwerp, Antwerp, Belgium
- Clinical Immunology Unit, Department of Clinical Sciences, Institute for Tropical Medicine, Antwerp, Belgium
- Sebastiaan Valkiers
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS) Consortium, University of Antwerp, Antwerp, Belgium
- Pieter Meysman
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS) Consortium, University of Antwerp, Antwerp, Belgium

23
Yang Y, Lin YT, Li G, Zhong Y, Xu Q, Cai JJ. Interpretable modeling of time-resolved single-cell gene-protein expression with CrossmodalNet. Brief Bioinform 2023; 24:bbad342. [PMID: 37798250 DOI: 10.1093/bib/bbad342] [Received: 05/23/2023] [Revised: 08/15/2023] [Accepted: 09/07/2023] [Indexed: 10/07/2023]
Abstract
Cell-surface proteins play a critical role in cell function and are primary targets for therapeutics. CITE-seq is a single-cell technique that enables simultaneous measurement of gene and surface protein expression. It is powerful but costly and technically challenging. Computational methods have been developed to predict surface protein expression from gene expression information, such as single-cell RNA sequencing (scRNA-seq) data. Existing methods, however, are computationally demanding and lack the interpretability needed to reveal underlying biological processes. We propose CrossmodalNet, an interpretable machine learning model, to predict surface protein expression from scRNA-seq data. Our model, with a customized adaptive loss, accurately predicts surface protein abundances. When samples from multiple time points are given, our model encodes temporal information into an easy-to-interpret time embedding to make time-point-specific predictions, and it can uncover noise-free causal gene-protein relationships. Using three publicly available time-resolved CITE-seq datasets, we validate the performance of our model by comparing it with benchmarking methods and evaluate its interpretability. Together, we show that our method accurately and interpretably profiles surface protein expression using scRNA-seq data, thereby expanding the capacity of CITE-seq experiments for investigating molecular mechanisms involving surface proteins.
Affiliation(s)
- Yongjian Yang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Yu-Te Lin
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Guanxun Li
- Department of Statistics, Texas A&M University, College Station, TX, USA
- Yan Zhong
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, China
- Qian Xu
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- James J Cai
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA

24
Erfanian N, Heydari AA, Feriz AM, Iañez P, Derakhshani A, Ghasemigol M, Farahpour M, Razavi SM, Nasseri S, Safarpour H, Sahebkar A. Deep learning applications in single-cell genomics and transcriptomics data analysis. Biomed Pharmacother 2023; 165:115077. [PMID: 37393865 DOI: 10.1016/j.biopha.2023.115077] [Received: 04/24/2023] [Revised: 06/22/2023] [Accepted: 06/23/2023] [Indexed: 07/04/2023]
Abstract
Traditional bulk sequencing methods measure only the average signal across a group of cells, potentially masking heterogeneity and rare populations. Single-cell resolution, in contrast, enhances our understanding of complex biological systems and diseases, such as cancer, the immune system, and chronic diseases. However, single-cell technologies generate massive amounts of data that are often high-dimensional, sparse, and complex, making analysis with traditional computational approaches difficult or infeasible. To tackle these challenges, many researchers are turning to deep learning (DL) methods as potential alternatives to conventional machine learning (ML) algorithms for single-cell studies. DL is a branch of ML capable of extracting high-level features from raw inputs in multiple stages. Compared with traditional ML, DL models have provided significant improvements across many domains and applications. In this work, we examine DL applications in genomics, transcriptomics, spatial transcriptomics, and multi-omics integration, and address whether DL techniques will prove advantageous or whether the single-cell omics domain poses unique challenges. Through a systematic literature review, we found that DL has not yet revolutionized the most pressing challenges of the single-cell omics field. Nevertheless, DL models for single-cell omics have shown promising results (in many cases outperforming the previous state-of-the-art models) in data preprocessing and downstream analysis. Although DL algorithms for single-cell omics have generally developed gradually, recent advances show that DL can offer valuable resources for fast-tracking and advancing single-cell research.
Affiliation(s)
- Nafiseh Erfanian
- Student Research Committee, Birjand University of Medical Sciences, Birjand, Iran
- A Ali Heydari
- Department of Applied Mathematics, University of California, Merced, CA, USA
- Health Sciences Research Institute, University of California, Merced, CA, USA
- Adib Miraki Feriz
- Student Research Committee, Birjand University of Medical Sciences, Birjand, Iran
- Pablo Iañez
- Cellular Systems Genomics Group, Josep Carreras Research Institute, Barcelona, Spain
- Afshin Derakhshani
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada
- Mohsen Farahpour
- Department of Electronics, Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Seyyed Mohammad Razavi
- Department of Electronics, Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Saeed Nasseri
- Cellular and Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Hossein Safarpour
- Cellular and Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Amirhossein Sahebkar
- Biotechnology Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
- Applied Biomedical Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Department of Biotechnology, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran

25
Wang L, Nie R, Zhang Z, Gu W, Wang S, Wang A, Zhang J, Cai J. A deep generative framework with embedded vector arithmetic and classifier for sample generation, label transfer, and clustering of single-cell data. Cell Rep Methods 2023; 3:100558. [PMID: 37671019 PMCID: PMC10475846 DOI: 10.1016/j.crmeth.2023.100558] [Received: 01/25/2023] [Revised: 05/31/2023] [Accepted: 07/20/2023] [Indexed: 09/07/2023]
Abstract
Multiple-source single-cell datasets have accumulated quickly, and computational methods are needed to integrate them and decompose them into meaningful components. Here, we present inClust (integrated clustering), a flexible deep generative framework that enables embedding of auxiliary information, latent space vector arithmetic, and clustering. All functional parts are relatively modular, independent in implementation but interrelated at runtime, resulting in an all-in-one general framework that can work in supervised, semi-supervised, or unsupervised mode. We show that inClust outperforms most data integration methods on benchmark datasets. We then demonstrate the capability of inClust in conditional out-of-distribution generation in supervised mode, label transfer in semi-supervised mode, and spatial domain identification in unsupervised mode. In these examples, inClust accurately expressed the effect of each covariate, distinguished query-specific cell types, and segmented spatial domains, respectively. The results support inClust as an excellent general framework for multi-task harmonization and data decomposition.
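Latent space vector arithmetic, in its generic form, can be sketched as follows: estimate a condition effect as the difference between the mean latent codes of two groups of cells, then add that difference to a cell's latent code and decode the result. This is the general technique only, under the assumption of an already trained encoder and decoder (not shown); it is not inClust's exact mechanism for embedding covariates, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def shift_condition(z: Sequence[float],
                    source_group: Sequence[Sequence[float]],
                    target_group: Sequence[Sequence[float]]) -> Vector:
    """Move a latent code from the source condition toward the target condition.

    z_new = z + (mean(target_group) - mean(source_group)); a decoder would
    then map z_new back to expression space to generate the unseen sample.
    """
    delta = [t - s for t, s in zip(mean_vector(target_group), mean_vector(source_group))]
    return [zi + di for zi, di in zip(z, delta)]


# Usage with toy 2-D latent codes for "control" and "stimulated" cells.
control = [[0.0, 0.0], [2.0, 0.0]]
stimulated = [[1.0, 3.0], [3.0, 3.0]]
predicted = shift_condition([1.0, 1.0], control, stimulated)
```

The same arithmetic underlies conditional out-of-distribution generation: a cell type never observed under a condition can still be assigned a predicted latent code for that condition.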
Affiliation(s)
- Lifei Wang
- Shulan (Hangzhou) Hospital Affiliated with Shulan International Medical College, Zhejiang Shuren University, Hangzhou, Zhejiang 310015, China
- Rui Nie
- China National Center for Bioinformation, Beijing 100101, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- University of the Chinese Academy of Sciences, Beijing 100049, China
- Zhang Zhang
- School of Systems Science, Beijing Normal University, Beijing 100875, China
- Weiwei Gu
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Shuo Wang
- School of Systems Science, Beijing Normal University, Beijing 100875, China
- Computer Engineering and Networks Lab, ETH Zurich, 8092 Zurich, Switzerland
- Anqi Wang
- Shulan (Hangzhou) Hospital Affiliated with Shulan International Medical College, Zhejiang Shuren University, Hangzhou, Zhejiang 310015, China
- Jiang Zhang
- School of Systems Science, Beijing Normal University, Beijing 100875, China
- Jun Cai
- China National Center for Bioinformation, Beijing 100101, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- University of the Chinese Academy of Sciences, Beijing 100049, China

26
Javaid A, Frost HR. STREAK: A supervised cell surface receptor abundance estimation strategy for single cell RNA-sequencing data using feature selection and thresholded gene set scoring. PLoS Comput Biol 2023; 19:e1011413. [PMID: 37603589 PMCID: PMC10470905 DOI: 10.1371/journal.pcbi.1011413] [Received: 12/06/2022] [Revised: 08/31/2023] [Accepted: 08/07/2023] [Indexed: 08/23/2023]
Abstract
The accurate estimation of cell surface receptor abundance for single cell transcriptomics data is important for the tasks of cell type and phenotype categorization and cell-cell interaction quantification. We previously developed an unsupervised receptor abundance estimation technique named SPECK (Surface Protein abundance Estimation using CKmeans-based clustered thresholding) to address the challenges associated with accurate abundance estimation. In that paper, we concluded that SPECK results in improved concordance with Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) data relative to comparative unsupervised abundance estimation techniques using only single-cell RNA-sequencing (scRNA-seq) data. In this paper, we outline a new supervised receptor abundance estimation method called STREAK (gene Set Testing-based Receptor abundance Estimation using Adjusted distances and cKmeans thresholding) that leverages associations learned from joint scRNA-seq/CITE-seq training data and a thresholded gene set scoring mechanism to estimate receptor abundance for scRNA-seq target data. We evaluate STREAK relative to both unsupervised and supervised receptor abundance estimation techniques using two evaluation approaches on six joint scRNA-seq/CITE-seq datasets that represent four human and mouse tissue types. We conclude that STREAK outperforms other abundance estimation strategies and provides a more biologically interpretable and transparent statistical model.
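The supervised idea behind STREAK, learning a receptor-specific gene set from joint scRNA-seq/CITE-seq training data and then scoring target cells with it, can be sketched as below. The selection rule (keep the k genes most positively correlated with the protein across training cells) and the correlation-weighted mean score are our assumptions for illustration, not the published STREAK procedure, and the final Ckmeans-based thresholding step is omitted. All names are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def learn_gene_set(train_rna: Dict[str, Sequence[float]],
                   train_protein: Sequence[float],
                   k: int) -> Dict[str, float]:
    """Assumed feature selection: top-k genes by positive correlation with the protein."""
    cors = {g: pearson(expr, train_protein) for g, expr in train_rna.items()}
    top = sorted((g for g in cors if cors[g] > 0), key=lambda g: cors[g], reverse=True)[:k]
    return {g: cors[g] for g in top}


def score_cell(cell_rna: Dict[str, float], gene_weights: Dict[str, float]) -> float:
    """Correlation-weighted mean expression of the selected genes in one target cell."""
    total = sum(gene_weights.values())
    if total == 0:
        return 0.0
    return sum(w * cell_rna.get(g, 0.0) for g, w in gene_weights.items()) / total


# Usage: genes A and C track the protein in training cells; gene B is anti-correlated.
train_rna = {"A": [0, 1, 2, 3], "B": [3, 2, 1, 0], "C": [0, 1, 2, 4]}
protein = [0, 1, 2, 3]
weights = learn_gene_set(train_rna, protein, k=2)
abundance = score_cell({"A": 2.0, "B": 5.0, "C": 3.0}, weights)
```

Because the learned weights are explicit per-gene correlations, a score of this kind stays easy to inspect, which matches the interpretability argument made in the abstract.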
Affiliation(s)
- Azka Javaid
- Department of Biomedical Data Science, Dartmouth College, Hanover, New Hampshire, United States of America
- Hildreth Robert Frost
- Department of Biomedical Data Science, Dartmouth College, Hanover, New Hampshire, United States of America

27
Ashuach T, Gabitto MI, Koodli RV, Saldi GA, Jordan MI, Yosef N. MultiVI: deep generative model for the integration of multimodal data. Nat Methods 2023; 20:1222-1231. [PMID: 37386189 PMCID: PMC10406609 DOI: 10.1038/s41592-023-01909-9] [Received: 11/15/2021] [Accepted: 05/10/2023] [Indexed: 07/01/2023]
Abstract
Jointly profiling the transcriptome, chromatin accessibility and other molecular properties of single cells offers a powerful way to study cellular diversity. Here we present MultiVI, a probabilistic model to analyze such multiomic data and leverage it to enhance single-modality datasets. MultiVI creates a joint representation that allows an analysis of all modalities included in the multiomic input data, even for cells for which one or more modalities are missing. It is available at scvi-tools.org.
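How a joint representation can still exist when a modality is missing is sketched below. We assume each modality has its own encoder producing a latent vector, and that the joint code is the average of whichever modality-specific codes are available; this is a simplified illustration of the idea rather than the scvi-tools `MULTIVI` implementation, which also trains the encoders so that their codes for the same cell agree. The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def joint_latent(modality_latents: Sequence[Optional[Sequence[float]]]) -> List[float]:
    """Average the latent vectors of the modalities observed for one cell.

    A missing modality is passed as None and skipped, so a cell profiled
    with only RNA (or only ATAC) still lands in the shared latent space.
    """
    present = [z for z in modality_latents if z is not None]
    if not present:
        raise ValueError("at least one modality must be observed")
    dim = len(present[0])
    return [sum(z[i] for z in present) / len(present) for i in range(dim)]


# Usage: a paired multiome cell and an RNA-only cell.
paired = joint_latent([[0.0, 2.0], [2.0, 4.0]])   # RNA code, ATAC code
rna_only = joint_latent([[1.0, 1.0], None])        # ATAC not measured
```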
Affiliation(s)
- Tal Ashuach
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Mariano I Gabitto
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
- Allen Institute for Brain Science, Seattle, WA, USA
- Rohan V Koodli
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Michael I Jordan
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
- Nir Yosef
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel

28
Bashore AC, Yan H, Xue C, Zhu LY, Kim E, Mawson T, Coronel J, Chung A, Ho S, Ross LS, Kissner M, Passegué E, Bauer RC, Maegdefessel L, Li M, Reilly MP. High-Dimensional Single-Cell Multimodal Landscape of Human Carotid Atherosclerosis. medRxiv 2023:2023.07.13.23292633. [PMID: 37502836 PMCID: PMC10370238 DOI: 10.1101/2023.07.13.23292633] [Indexed: 07/29/2023]
Abstract
Background Atherosclerotic plaques are complex tissues composed of a heterogeneous mixture of cells. However, we have limited understanding of the comprehensive transcriptional and phenotypical landscape of the cells within these lesions. Methods To characterize the landscape of human carotid atherosclerosis in greater detail, we combined cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) and single-cell RNA sequencing (scRNA-seq) to classify all cell types within lesions (n=21; 13 symptomatic) and achieve a comprehensive multimodal understanding of the cellular identities of atherosclerosis and their association with clinical pathophysiology. Results We identified 25 distinct cell populations, each with a unique multi-omic signature, including macrophages, T cells, NK cells, mast cells, B cells, plasma cells, neutrophils, dendritic cells, endothelial cells, fibroblasts, and smooth muscle cells (SMCs). Within the macrophage populations, we identified two proinflammatory subsets enriched in IL1B or C1Q expression, two distinct TREM2-positive foam cell subsets (one of which also expressed inflammatory genes), subpopulations displaying a proliferative gene expression signature, and one subpopulation expressing SMC-specific genes and showing upregulated fibrotic pathways. An in-depth characterization uncovered several subsets of SMCs and fibroblasts, including an SMC-derived foam cell. We localized this foamy SMC to the deep intima of coronary atherosclerotic lesions. Using CITE-seq data, we also developed the first flow cytometry panel, based on the cell surface proteins CD29, CD142, and CD90, to isolate SMC-derived cells from lesions. Last, we found that the proportions of efferocytotic macrophages, classically activated endothelial cells, and contractile and modulated SMC-derived cell types were reduced, and inflammatory SMCs were enriched, in plaques of clinically symptomatic vs. asymptomatic patients.
Conclusions Our multimodal atlas of cell populations within atherosclerosis provides novel insights into the diversity, phenotype, location, isolation, and clinical relevance of the unique cellular composition of human carotid atherosclerosis. This facilitates both the mapping of cardiovascular disease susceptibility loci to specific cell types as well as the identification of novel molecular and cellular therapeutic targets for treatment of the disease.
Affiliation(s)
- Alexander C Bashore
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Hanying Yan
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Chenyi Xue
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Lucie Y Zhu
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Eunyoung Kim
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Thomas Mawson
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Johana Coronel
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Allen Chung
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Sebastian Ho
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Leila S Ross
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Michael Kissner
- Columbia Stem Cell Initiative, Department of Genetics and Development, Columbia University Irving Medical Center, New York
- Emmanuelle Passegué
- Columbia Stem Cell Initiative, Department of Genetics and Development, Columbia University Irving Medical Center, New York
- Robert C Bauer
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Lars Maegdefessel
- Department of Vascular and Endovascular Surgery, Technical University Munich, Germany
- German Center for Cardiovascular Research (DZHK), partner site Munich Heart Alliance
- Karolinska Institutet, Department of Medicine
- Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Muredach P Reilly
- Division of Cardiology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York
- Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York, NY

29
Javaid A, Frost HR. SPECK: an unsupervised learning approach for cell surface receptor abundance estimation for single-cell RNA-sequencing data. Bioinform Adv 2023; 3:vbad073. [PMID: 37359727 PMCID: PMC10290233 DOI: 10.1093/bioadv/vbad073] [Received: 10/15/2022] [Revised: 05/23/2023] [Accepted: 06/12/2023] [Indexed: 06/28/2023]
Abstract
Summary The rapid development of single-cell transcriptomics has revolutionized the study of complex tissues. Single-cell RNA-sequencing (scRNA-seq) can profile tens of thousands of dissociated cells from a tissue sample, enabling researchers to identify cell types, phenotypes and interactions that control tissue structure and function. A key requirement of these applications is the accurate estimation of cell surface protein abundance. Although technologies to directly quantify surface proteins are available, these data are uncommon and limited to proteins with available antibodies. While supervised methods trained on Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) data can provide the best performance, such training data are limited by available antibodies and may not exist for the tissue under investigation. In the absence of protein measurements, researchers must estimate receptor abundance from scRNA-seq data. We therefore developed a new unsupervised method for receptor abundance estimation using scRNA-seq data called SPECK (Surface Protein abundance Estimation using CKmeans-based clustered thresholding) and primarily evaluated its performance against unsupervised approaches for at least 25 human receptors and multiple tissue types. This analysis reveals that techniques based on a thresholded reduced rank reconstruction of scRNA-seq data are effective for receptor abundance estimation, with SPECK providing the best overall performance. Availability and implementation SPECK is freely available at https://CRAN.R-project.org/package=SPECK. Supplementary information Supplementary data are available at Bioinformatics Advances online.
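The "clustered thresholding" step can be sketched in a simplified form. We assume the reduced rank reconstruction (for example, a truncated SVD of the normalized expression matrix) has already produced one abundance vector per receptor, and we fix the number of clusters at two; the name suggests SPECK relies on Ckmeans-style optimal one-dimensional k-means (as in the Ckmeans.1d.dp R package), which may choose the number of clusters automatically. With two clusters, an exhaustive search over split points of the sorted values finds the optimal partition, because optimal 1-D k-means clusters are contiguous in sorted order. Values in the low cluster are then set to zero. The function names are hypothetical.

```python
from typing import List, Sequence


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def threshold_abundance(values: Sequence[float]) -> List[float]:
    """Zero out the low cluster of an (already reconstructed) abundance vector.

    Tries every split of the sorted values, keeps the one with the smallest
    total within-cluster SSE (exact 1-D 2-means), and zeroes values below it.
    """
    xs = sorted(values)
    if len(xs) < 2 or xs[0] == xs[-1]:
        return list(values)  # nothing to split
    best_cost, best_cut = float("inf"), xs[-1]
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # equal values stay in the same cluster
        cost = _sse(xs[:i]) + _sse(xs[i:])
        if cost < best_cost:
            best_cost, best_cut = cost, xs[i]
    return [v if v >= best_cut else 0.0 for v in values]


# Usage: low reconstructed values (likely noise) are set to zero.
estimates = threshold_abundance([0.1, 0.2, 0.15, 5.0, 5.2])
```

This brute-force version is quadratic in the number of cells and is meant only to show the logic; a dynamic-programming solver is the practical choice for real datasets.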
Affiliation(s)
- Azka Javaid
- Department of Biomedical Data Science, Dartmouth College, Hanover, NH 03755, USA
- H Robert Frost
- Department of Biomedical Data Science, Dartmouth College, Hanover, NH 03755, USA

30
Pregizer S, Vreven T, Mathur M, Robinson LN. Multi-omic single cell sequencing: Overview and opportunities for kidney disease therapeutic development. Front Mol Biosci 2023; 10:1176856. [PMID: 37091871 PMCID: PMC10113659 DOI: 10.3389/fmolb.2023.1176856] [Received: 03/01/2023] [Accepted: 03/21/2023] [Indexed: 04/09/2023]
Abstract
Single cell sequencing technologies have rapidly advanced in the last decade and are increasingly applied to gain unprecedented insights by deconstructing complex biology to its fundamental unit, the individual cell. First developed for measurement of gene expression, single cell sequencing approaches have evolved to allow simultaneous profiling of multiple additional features, including chromatin accessibility within the nucleus and protein expression at the cell surface. These multi-omic approaches can now further be applied to cells in situ, capturing the spatial context within which their biology occurs. To extract insights from these complex datasets, new computational tools have facilitated the integration of information across different data types and the use of machine learning approaches. Here, we summarize current experimental and computational methods for generation and integration of single cell multi-omic datasets. We focus on opportunities for multi-omic single cell sequencing to augment therapeutic development for kidney disease, including applications for biomarkers, disease stratification and target identification.

31
Wan H, Chen L, Deng M. scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception. Genomics Proteomics Bioinformatics 2022; 20:939-958. [PMID: 36608843 PMCID: PMC10025768 DOI: 10.1016/j.gpb.2022.12.008] [Received: 08/21/2022] [Revised: 11/30/2022] [Accepted: 12/11/2022] [Indexed: 01/05/2023]
Abstract
Current cell-type annotation tools for single-cell RNA sequencing (scRNA-seq) data mainly use well-annotated source data to help identify cell types in target data. However, because of privacy concerns, their requirement for raw source data cannot always be satisfied, in which case feature alignment between source and target data cannot be achieved explicitly. Additionally, these methods can rarely discover novel cell types, and users often select a subjective threshold to detect novel cells. We propose a universal annotation framework for scRNA-seq data called scEMAIL, which automatically detects novel cell types without accessing source data during adaptation. For new cell-type identification, a novel cell-type perception module is designed with three steps. First, an expert ensemble system measures the uncertainty of each cell from three complementary aspects. Second, based on this measurement, bimodality tests are applied to detect the presence of new cell types. Third, once their presence is confirmed, an adaptive threshold obtained via manifold mixup partitions target cells into "known" and "unknown" groups. Model adaptation is then conducted to alleviate the batch effect: we gather multi-order neighborhood messages globally and impose local affinity regularizations on "known" cells. These constraints mitigate misclassifications by the source model using reliable self-supervised information from neighbors. scEMAIL is accurate and robust under various scenarios in both simulated and real data. It can also be applied to challenging single-cell ATAC-seq data without loss of performance. The source code of scEMAIL can be accessed at https://github.com/aster-ww/scEMAIL and https://ngdc.cncb.ac.cn/biocode/tools/BT007335/releases/v1.0.
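The second step, testing whether per-cell uncertainty scores are bimodal, can be illustrated with Sarle's bimodality coefficient. This is a swapped-in, easy-to-compute stand-in; the abstract does not say which bimodality tests scEMAIL uses. The sketch uses plug-in moment estimates of skewness g and excess kurtosis k with the usual finite-sample correction, BC = (g^2 + 1) / (k + 3(n - 1)^2 / ((n - 2)(n - 3))); some implementations use bias-adjusted g and k instead. Values above roughly 5/9 are conventionally read as a hint of two modes, which here would suggest a group of high-uncertainty (possibly novel) cells.

```python
from typing import Sequence

BC_THRESHOLD = 5.0 / 9.0  # conventional cut-off; larger values hint at two modes


def bimodality_coefficient(x: Sequence[float]) -> float:
    """Sarle's bimodality coefficient from plug-in moment estimates."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("constant input has no defined skewness")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g = m3 / m2 ** 1.5           # sample skewness
    k = m4 / m2 ** 2 - 3.0       # sample excess kurtosis
    return (g ** 2 + 1.0) / (k + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Usage: uncertainty scores split into a confident group and an uncertain group.
two_groups = [0.0] * 50 + [1.0] * 50
one_group = [0.0] * 80 + [-1.0] * 10 + [1.0] * 10
novel_suspected = bimodality_coefficient(two_groups) > BC_THRESHOLD
```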
Affiliation(s)
- Hui Wan
- School of Mathematical Sciences, Peking University, Beijing 100871, China
- Liang Chen
- Huawei Technologies Co., Ltd., Beijing 100080, China
- Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing 100871, China
- Center for Statistical Science, Peking University, Beijing 100871, China
- Center for Quantitative Biology, Peking University, Beijing 100871, China