1
Subedi S, Sumida TS, Park YP. A scalable approach to topic modelling in single-cell data by approximate pseudobulk projection. Life Sci Alliance 2024; 7:e202402713. [PMID: 39107066 PMCID: PMC11303850 DOI: 10.26508/lsa.202402713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 07/29/2024] [Accepted: 07/30/2024] [Indexed: 08/09/2024] Open
Abstract
Probabilistic topic modelling has become essential in many types of single-cell data analysis. Based on probabilistic topic assignments in each cell, we identify the latent representation of cellular states. A dictionary matrix, consisting of topic-specific gene frequency vectors, provides interpretable bases to be compared with known cell type-specific marker genes and other pathway annotations. However, fitting a topic model on a large number of cells would require heavy computational resources: specialized computing units, computing time and memory. Here, we present a scalable approximation method customized for single-cell RNA-seq data analysis, termed ASAP, short for Annotating a Single-cell data matrix by Approximate Pseudobulk estimation. Our approach is more accurate than existing methods but requires orders of magnitude less computing time and much less memory. We also show that our approach is widely applicable for atlas-scale data analysis; our method seamlessly integrates single-cell and bulk data in joint analysis, not requiring additional preprocessing or feature selection steps.
Affiliation(s)
- Sishir Subedi
- Bioinformatics Graduate Program, University of British Columbia (https://ror.org/03rmrcq20), Vancouver, Canada
- BC Cancer Research, Vancouver, Canada
- Tomokazu S Sumida
- Neurology, Program for Neuroinflammation, Yale School of Medicine, New Haven, CT, USA
- Yongjin P Park
- BC Cancer Research, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Department of Statistics, University of British Columbia, Vancouver, Canada
2
He Z, Hu S, Chen Y, An S, Zhou J, Liu R, Shi J, Wang J, Dong G, Shi J, Zhao J, Ou-Yang L, Zhu Y, Bo X, Ying X. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol 2024; 42:1594-1605. [PMID: 38263515 PMCID: PMC11471558 DOI: 10.1038/s41587-023-02040-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 10/23/2023] [Indexed: 01/25/2024]
Abstract
Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. We demonstrate its superiority to 19 other methods and reliability by evaluating its performance in trimodal and mosaic integration tasks. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available at https://github.com/labomics/midas.
Affiliation(s)
- Zhen He
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Shuofeng Hu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Yaowen Chen
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Sijing An
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jiahao Zhou
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Runyan Liu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Junfeng Shi
- School of Automation, China University of Geosciences, Wuhan, China
- Jing Wang
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Guohua Dong
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jinhui Shi
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jiaxin Zhao
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Yuan Zhu
- School of Automation, China University of Geosciences, Wuhan, China
- Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, China.
- Xiaomin Ying
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China.
3
Hu Y, Wan S, Luo Y, Li Y, Wu T, Deng W, Jiang C, Jiang S, Zhang Y, Liu N, Yang Z, Chen F, Li B, Qu K. Benchmarking algorithms for single-cell multi-omics prediction and integration. Nat Methods 2024:10.1038/s41592-024-02429-w. [PMID: 39322753 DOI: 10.1038/s41592-024-02429-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 08/19/2024] [Indexed: 09/27/2024]
Abstract
The development of single-cell multi-omics technology has greatly enhanced our understanding of biology, and in parallel, numerous algorithms have been proposed to predict the protein abundance and/or chromatin accessibility of cells from single-cell transcriptomic information and to integrate various types of single-cell multi-omics data. However, few studies have systematically compared and evaluated the performance of these algorithms. Here, we present a benchmark study of 14 protein abundance/chromatin accessibility prediction algorithms and 18 single-cell multi-omics integration algorithms using 47 single-cell multi-omics datasets. Our benchmark study showed that, overall, totalVI and scArches outperformed the other algorithms for predicting protein abundance, and LS_Lab was the top-performing algorithm for the prediction of chromatin accessibility in most cases. Seurat, MOJITOO and scAI emerged as leading algorithms for vertical integration, whereas totalVI and UINMF outperformed their counterparts in both horizontal and mosaic integration scenarios. Additionally, we provide a pipeline to assist researchers in selecting the optimal multi-omics prediction and integration algorithm.
Affiliation(s)
- Yinlei Hu
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Mathematical Science, University of Science and Technology of China, Hefei, China
- Siyuan Wan
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
- Yuanhanyu Luo
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China
- National Institute of Biological Sciences, Beijing, China
- Yuanzhe Li
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
- Tong Wu
- National Institute of Biological Sciences, Beijing, China
- College of Life Sciences, Beijing Normal University, Beijing, China
- Wentao Deng
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Chen Jiang
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Shan Jiang
- National Institute of Biological Sciences, Beijing, China
- Yueping Zhang
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
- Nianping Liu
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China
- Zongcheng Yang
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Falai Chen
- School of Mathematical Science, University of Science and Technology of China, Hefei, China.
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China.
- Bin Li
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China.
- National Institute of Biological Sciences, Beijing, China.
- Kun Qu
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China.
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China.
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China.
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China.
4
Kobel CM, Merkesvik J, Burgos IMT, Lai W, Øyås O, Pope PB, Hvidsten TR, Aho VTE. Integrating host and microbiome biology using holo-omics. Mol Omics 2024; 20:438-452. [PMID: 38963125 DOI: 10.1039/d4mo00017j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2024]
Abstract
Holo-omics is the use of omics data to study a host and its inherent microbiomes - a biological system known as a "holobiont". A microbiome that exists in such a space often encounters habitat stability and in return provides metabolic capacities that can benefit its host. Here we present an overview of beneficial host-microbiome systems and propose and discuss several methodological frameworks that can be used to investigate the intricacies of the many as yet undefined host-microbiome interactions that influence holobiont homeostasis. While this is an emerging field, we anticipate that ongoing methodological advancements will enhance the biological resolution that is necessary to improve our understanding of host-microbiome interplay to make meaningful interpretations and biotechnological applications.
Affiliation(s)
- Carl M Kobel
- Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
- Jenny Merkesvik
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Wanxin Lai
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Ove Øyås
- Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
- Phillip B Pope
- Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, Queensland, Australia
- Torgeir R Hvidsten
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Velma T E Aho
- Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
5
Qian Y, Zou Q, Zhao M, Liu Y, Guo F, Ding Y. scRNMF: An imputation method for single-cell RNA-seq data by robust and non-negative matrix factorization. PLoS Comput Biol 2024; 20:e1012339. [PMID: 39116191 PMCID: PMC11338450 DOI: 10.1371/journal.pcbi.1012339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 08/21/2024] [Accepted: 07/19/2024] [Indexed: 08/10/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool in genomics research, enabling the analysis of gene expression at the individual cell level. However, scRNA-seq data often suffer from a high rate of dropouts, where certain genes fail to be detected in specific cells due to technical limitations. This missing data can introduce biases and hinder downstream analysis. To overcome this challenge, the development of effective imputation methods has become crucial in the field of scRNA-seq data analysis. Here, we propose an imputation method based on robust and non-negative matrix factorization (scRNMF). Unlike other matrix factorization algorithms, scRNMF integrates two loss functions: L2 loss and C-loss. The L2 loss function is highly sensitive to outliers, which can introduce substantial errors. We utilize the C-loss function when dealing with zero values in the raw data. The primary advantage of the C-loss function is that it imposes a smaller penalty for larger errors, which results in more robust factorization when handling outliers. Various datasets of different sizes and zero rates are used to evaluate the performance of scRNMF against other state-of-the-art methods. Our method demonstrates its power and stability as a tool for imputation of scRNA-seq data.
Affiliation(s)
- Yuqing Qian
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Mengyuan Zhao
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yi Liu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
6
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
7
Verhey TB, Seo H, Gillmor A, Thoppey-Manoharan V, Schriemer D, Morrissy S. mosaicMPI: a framework for modular data integration across cohorts and -omics modalities. Nucleic Acids Res 2024; 52:e53. [PMID: 38813827 PMCID: PMC11229337 DOI: 10.1093/nar/gkae442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 04/26/2024] [Accepted: 05/10/2024] [Indexed: 05/31/2024] Open
Abstract
Advances in molecular profiling have facilitated generation of large multi-modal datasets that can potentially reveal critical axes of biological variation underlying complex diseases. Distilling biological meaning, however, requires computational strategies that can perform mosaic integration across diverse cohorts and datatypes. Here, we present mosaicMPI, a framework for discovery of low to high-resolution molecular programs representing both cell types and states, and integration within and across datasets into a network representing biological themes. Using existing datasets in glioblastoma, we demonstrate that this approach robustly integrates single cell and bulk programs across multiple platforms. Clinical and molecular annotations from cohorts are statistically propagated onto this network of programs, yielding a richly characterized landscape of biological themes. This enables deep understanding of individual tumor samples, systematic exploration of relationships between modalities, and generation of a reference map onto which new datasets can rapidly be mapped. mosaicMPI is available at https://github.com/MorrissyLab/mosaicMPI.
Affiliation(s)
- Theodore B Verhey
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
- Heewon Seo
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Aaron Gillmor
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Varsha Thoppey-Manoharan
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- David Schriemer
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Sorana Morrissy
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
8
Rautenstrauch P, Ohler U. Liam tackles complex multimodal single-cell data integration challenges. Nucleic Acids Res 2024; 52:e52. [PMID: 38842910 PMCID: PMC11229356 DOI: 10.1093/nar/gkae409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 03/08/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024] Open
Abstract
Multi-omics characterization of single cells holds outstanding potential for profiling the dynamics and relations of gene regulatory states of thousands of cells. How to integrate multimodal data is an open problem, especially when aiming to combine data from multiple sources or conditions containing both biological and technical variation. We introduce liam, a flexible model for the simultaneous horizontal and vertical integration of paired single-cell multimodal data and mosaic integration of paired with unimodal data. Liam learns a joint low-dimensional representation of the measured modalities, which proves beneficial when the information content or quality of the modalities differ. Its integration accounts for complex batch effects using a tunable combination of conditional and adversarial training, which can be optimized using replicate information while retaining selected biological variation. We demonstrate liam's superior performance on multiple paired multimodal data types, including Multiome and CITE-seq data, and in mosaic integration scenarios. Our detailed benchmarking experiments illustrate the complexities and challenges remaining for integration and the meaningful assessment of its success.
Affiliation(s)
- Pia Rautenstrauch
- Humboldt-Universität zu Berlin, Department of Computer Science, 10099 Berlin, Germany
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
- Uwe Ohler
- Humboldt-Universität zu Berlin, Department of Computer Science, 10099 Berlin, Germany
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
- Humboldt-Universität zu Berlin, Department of Biology, 10099 Berlin, Germany
9
Drost F, An Y, Bonafonte-Pardàs I, Dratva LM, Lindeboom RGH, Haniffa M, Teichmann SA, Theis F, Lotfollahi M, Schubert B. Multi-modal generative modeling for joint analysis of single-cell T cell receptor and gene expression data. Nat Commun 2024; 15:5577. [PMID: 38956082 PMCID: PMC11220149 DOI: 10.1038/s41467-024-49806-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 05/23/2024] [Indexed: 07/04/2024] Open
Abstract
Recent advances in single-cell immune profiling have enabled the simultaneous measurement of transcriptome and T cell receptor (TCR) sequences, offering great potential for studying immune responses at the cellular level. However, integrating these diverse modalities across datasets is challenging due to their unique data characteristics and technical variations. Here, to address this, we develop the multimodal generative model mvTCR to fuse modality-specific information across transcriptome and TCR into a shared representation. Our analysis demonstrates the added value of multimodal over unimodal approaches to capture antigen specificity. Notably, we use mvTCR to distinguish T cell subpopulations binding to SARS-CoV-2 antigens from bystander cells. Furthermore, when combined with reference mapping approaches, mvTCR can map newly generated datasets to extensive T cell references, facilitating knowledge transfer. In summary, we envision mvTCR to enable a scalable analysis of multimodal immune profiling data and advance our understanding of immune responses.
Affiliation(s)
- Felix Drost
- Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354, Freising, Germany
| | - Yang An
- Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
| | - Irene Bonafonte-Pardàs
- Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
| | - Lisa M Dratva
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Rik G H Lindeboom
- The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
| | - Muzlifah Haniffa
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Physics, Cavendish Laboratory, University of Cambridge, 19 JJ Thomson Avenue, Cambridge, UK
| | - Fabian Theis
- Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354, Freising, Germany
- School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
| | - Mohammad Lotfollahi
- Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany.
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.
| | - Benjamin Schubert
- Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany.
- School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany.
|
10
|
Chen S, Zhu B, Huang S, Hickey JW, Lin KZ, Snyder M, Greenleaf WJ, Nolan GP, Zhang NR, Ma Z. Integration of spatial and single-cell data across modalities with weakly linked features. Nat Biotechnol 2024; 42:1096-1106. [PMID: 37679544 DOI: 10.1038/s41587-023-01935-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/01/2023] [Accepted: 08/02/2023] [Indexed: 09/09/2023]
Abstract
Although single-cell and spatial sequencing methods enable simultaneous measurement of more than one biological modality, no technology can capture all modalities within the same cell. For current data integration methods, the feasibility of cross-modal integration relies on the existence of highly correlated, a priori 'linked' features. We describe matching X-modality via fuzzy smoothed embedding (MaxFuse), a cross-modal data integration method that, through iterative coembedding, data smoothing and cell matching, uses all information in each modality to obtain high-quality integration even when features are weakly linked. MaxFuse is modality-agnostic and demonstrates high robustness and accuracy in the weak linkage scenario, achieving 20~70% relative improvement over existing methods under key evaluation metrics on benchmarking datasets. A prototypical example of weak linkage is the integration of spatial proteomic data with single-cell sequencing data. On two example analyses of this type, MaxFuse enabled the spatial consolidation of proteomic, transcriptomic and epigenomic information at single-cell resolution on the same tissue section.
Affiliation(s)
- Shuxiao Chen
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
| | - Bokai Zhu
- Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Sijia Huang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
| | - John W Hickey
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Kevin Z Lin
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Michael Snyder
- Department of Genetics, Stanford University, Stanford, CA, USA
| | | | - Garry P Nolan
- Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA.
- Department of Pathology, Stanford University, Stanford, CA, USA.
| | - Nancy R Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA.
| | - Zongming Ma
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA.
|
11
|
Curion F, Theis FJ. Machine learning integrative approaches to advance computational immunology. Genome Med 2024; 16:80. [PMID: 38862979 PMCID: PMC11165829 DOI: 10.1186/s13073-024-01350-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 05/23/2024] [Indexed: 06/13/2024] Open
Abstract
The study of immunology, traditionally reliant on proteomics to evaluate individual immune cells, has been revolutionized by single-cell RNA sequencing. Computational immunologists play a crucial role in analysing these datasets, moving beyond traditional protein marker identification to encompass a more detailed view of cellular phenotypes and their functional roles. Recent technological advancements allow the simultaneous measurements of multiple cellular components-transcriptome, proteome, chromatin, epigenetic modifications and metabolites-within single cells, including in spatial contexts within tissues. This has led to the generation of complex multiscale datasets that can include multimodal measurements from the same cells or a mix of paired and unpaired modalities. Modern machine learning (ML) techniques allow for the integration of multiple "omics" data without the need for extensive independent modelling of each modality. This review focuses on recent advancements in ML integrative approaches applied to immunological studies. We highlight the importance of these methods in creating a unified representation of multiscale data collections, particularly for single-cell and spatial profiling technologies. Finally, we discuss the challenges of these holistic approaches and how they will be instrumental in the development of a common coordinate framework for multiscale studies, thereby accelerating research and enabling discoveries in the computational immunology field.
Affiliation(s)
- Fabiola Curion
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
|
12
|
Ghazanfar S, Guibentif C, Marioni JC. Stabilized mosaic single-cell data integration using unshared features. Nat Biotechnol 2024; 42:284-292. [PMID: 37231260 PMCID: PMC10869270 DOI: 10.1038/s41587-023-01766-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 02/10/2022] [Accepted: 03/28/2023] [Indexed: 05/27/2023]
Abstract
Currently available single-cell omics technologies capture many unique features with different biological information content. Data integration aims to place cells, captured with different technologies, onto a common embedding to facilitate downstream analytical tasks. Current horizontal data integration techniques use a set of common features, thereby ignoring non-overlapping features and losing information. Here we introduce StabMap, a mosaic data integration technique that stabilizes mapping of single-cell data by exploiting the non-overlapping features. StabMap first infers a mosaic data topology based on shared features, then projects all cells onto supervised or unsupervised reference coordinates by traversing shortest paths along the topology. We show that StabMap performs well in various simulation contexts, facilitates 'multi-hop' mosaic data integration where some datasets do not share any features and enables the use of spatial gene expression features for mapping dissociated single-cell data onto a spatial transcriptomic reference.
Affiliation(s)
- Shila Ghazanfar
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- School of Mathematics and Statistics, The University of Sydney, Camperdown, New South Wales, Australia.
- Charles Perkins Centre, The University of Sydney, Camperdown, New South Wales, Australia.
| | - Carolina Guibentif
- Sahlgrenska Center for Cancer Research, Inst. Biomedicine, Dept. Microbiology and Immunology, University of Gothenburg, Gothenburg, Sweden
| | - John C Marioni
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
|
13
|
Han X, Wang B, Situ C, Qi Y, Zhu H, Li Y, Guo X. scapGNN: A graph neural network-based framework for active pathway and gene module inference from single-cell multi-omics data. PLoS Biol 2023; 21:e3002369. [PMID: 37956172 PMCID: PMC10681325 DOI: 10.1371/journal.pbio.3002369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 11/27/2023] [Accepted: 10/07/2023] [Indexed: 11/15/2023] Open
Abstract
Although advances in single-cell technologies have enabled the characterization of multiple omics profiles in individual cells, extracting functional and mechanistic insights from such information remains a major challenge. Here, we present scapGNN, a graph neural network (GNN)-based framework that creatively transforms sparse single-cell profile data into the stable gene-cell association network for inferring single-cell pathway activity scores and identifying cell phenotype-associated gene modules from single-cell multi-omics data. Systematic benchmarking demonstrated that scapGNN was more accurate, robust, and scalable than state-of-the-art methods in various downstream single-cell analyses such as cell denoising, batch effect removal, cell clustering, cell trajectory inference, and pathway or gene module identification. scapGNN was developed as a systematic R package that can be flexibly extended and enhanced for existing analysis processes. It provides a new analytical platform for studying single cells at the pathway and network levels.
Affiliation(s)
- Xudong Han
- State Key Laboratory of Reproductive Medicine and Offspring Health, School of Medicine, Southeast University, Nanjing, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
| | - Bing Wang
- State Key Laboratory of Reproductive Medicine and Offspring Health, School of Medicine, Southeast University, Nanjing, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
| | - Chenghao Situ
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
| | - Yaling Qi
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
| | - Hui Zhu
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
| | - Yan Li
- Department of Clinical Laboratory, Sir Run Run Hospital, Nanjing Medical University, Nanjing, China
| | - Xuejiang Guo
- State Key Laboratory of Reproductive Medicine and Offspring Health, School of Medicine, Southeast University, Nanjing, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
|
14
|
Song Y, Miao Z, Brazma A, Papatheodorou I. Benchmarking strategies for cross-species integration of single-cell RNA sequencing data. Nat Commun 2023; 14:6495. [PMID: 37838716 PMCID: PMC10576752 DOI: 10.1038/s41467-023-41855-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 11/11/2022] [Accepted: 09/21/2023] [Indexed: 10/16/2023] Open
Abstract
The growing number of available single-cell gene expression datasets from different species creates opportunities to explore evolutionary relationships between cell types across species. Cross-species integration of single-cell RNA-sequencing data has been particularly informative in this context. However, in order to do so robustly it is essential to have rigorous benchmarking and appropriate guidelines to ensure that integration results truly reflect biology. Here, we benchmark 28 combinations of gene homology mapping methods and data integration algorithms in a variety of biological settings. We examine the capability of each strategy to perform species-mixing of known homologous cell types and to preserve biological heterogeneity using 9 established metrics. We also develop a new biology conservation metric to address the maintenance of cell type distinguishability. Overall, scANVI, scVI and SeuratV4 methods achieve a balance between species-mixing and biology conservation. For evolutionarily distant species, including in-paralogs is beneficial. SAMap outperforms when integrating whole-body atlases between species with challenging gene homology annotation. We provide our freely available cross-species integration and assessment pipeline to help analyse new data and develop new algorithms.
Affiliation(s)
- Yuyao Song
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom.
| | - Zhichao Miao
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom
- Guangzhou Laboratory, Guangzhou International Bio Island, Guangzhou, 510005, China
| | - Alvis Brazma
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom
| | - Irene Papatheodorou
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom.
|
15
|
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of single cell multi-omics and spatial data guided by gene regulatory networks and cell-cell interactions. RESEARCH SQUARE 2023:rs.3.rs-3301625. [PMID: 37790516 PMCID: PMC10543280 DOI: 10.21203/rs.3.rs-3301625/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing simulators typically focus on modeling one or two specific biological factors or mechanisms that affect the output data, which limits their capacity to simulate the complexity and multi-modality in real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations while accounting for the relationships between modalities. scMultiSim jointly models various biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, while also incorporating technical noise. Moreover, it allows users to adjust each factor's effect easily. We validated scMultiSim's simulated biological effects and demonstrated its applications by benchmarking a wide range of computational tasks, including multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference and CCI inference using spatially resolved gene expression data, many of which had not been benchmarked before due to the lack of proper tools. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li
- Georgia Institute of Technology, Atlanta, USA
| | - Ziqi Zhang
- Georgia Institute of Technology, Atlanta, USA
| | | | - Xi Chen
- Southern University of Science and Technology, Shenzhen, China
|
16
|
Doha ZO, Wang X, Calistri NL, Eng J, Daniel CJ, Ternes L, Kim EN, Pelz C, Munks M, Betts C, Kwon S, Bucher E, Li X, Waugh T, Tatarova Z, Blumberg D, Ko A, Kirchberger N, Pietenpol JA, Sanders ME, Langer EM, Dai MS, Mills G, Chin K, Chang YH, Coussens LM, Gray JW, Heiser LM, Sears RC. MYC Deregulation and PTEN Loss Model Tumor and Stromal Heterogeneity of Aggressive Triple-Negative Breast Cancer. Nat Commun 2023; 14:5665. [PMID: 37704631 PMCID: PMC10499828 DOI: 10.1038/s41467-023-40841-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 08/14/2023] [Indexed: 09/15/2023] Open
Abstract
Triple-negative breast cancer (TNBC) patients have a poor prognosis and few treatment options. Mouse models of TNBC are important for the development of new therapies; however, few mouse models represent the complexity of TNBC. Here, we develop a female TNBC murine model by mimicking two common TNBC mutations with high co-occurrence: amplification of the oncogene MYC and deletion of the tumor suppressor PTEN. This Myc;Ptenfl model develops heterogeneous triple-negative mammary tumors that display histological and molecular features commonly found in human TNBC. Our research involves deep molecular and spatial analyses on Myc;Ptenfl tumors including bulk and single-cell RNA-sequencing, and multiplex tissue-imaging. Through comparison with human TNBC, we demonstrate that this genetic mouse model develops mammary tumors with differential survival and therapeutic responses that closely resemble the inter- and intra-tumoral and microenvironmental heterogeneity of human TNBC, providing a pre-clinical tool for assessing the spectrum of patient TNBC biology and drug response.
Affiliation(s)
- Zinab O Doha
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Department of medical laboratory technology, Taibah University, Al-Madinah al-Munawwarah, Saudi Arabia
| | - Xiaoyan Wang
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
| | - Nicholas L Calistri
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Jennifer Eng
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- OHSU Center for Spatial Systems Biomedicine, Oregon Health & Science University, Portland, OR, USA
| | - Colin J Daniel
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
| | - Luke Ternes
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Eun Na Kim
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Carl Pelz
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Brenden-Colson Center for Pancreatic Care, Oregon Health & Science University, Portland, OR, USA
| | - Michael Munks
- Brenden-Colson Center for Pancreatic Care, Oregon Health & Science University, Portland, OR, USA
- Department of Molecular Microbiology and Immunology, Oregon Health and Science University, Portland, OR, USA
| | - Courtney Betts
- Department of Cell, Developmental & Cancer Biology, Oregon Health and Science University, Portland, OR, USA
| | - Sunjong Kwon
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- OHSU Center for Spatial Systems Biomedicine, Oregon Health & Science University, Portland, OR, USA
| | - Elmar Bucher
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- OHSU Center for Spatial Systems Biomedicine, Oregon Health & Science University, Portland, OR, USA
| | - Xi Li
- Division of Oncologic Sciences, Oregon Health and Science University, Portland, OR, USA
| | - Trent Waugh
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
| | - Zuzana Tatarova
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- OHSU Center for Spatial Systems Biomedicine, Oregon Health & Science University, Portland, OR, USA
| | - Dylan Blumberg
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- OHSU Center for Spatial Systems Biomedicine, Oregon Health & Science University, Portland, OR, USA
| | - Aaron Ko
- Department of Molecular Microbiology and Immunology, Oregon Health and Science University, Portland, OR, USA
| | - Nell Kirchberger
- Department of Cell, Developmental & Cancer Biology, Oregon Health and Science University, Portland, OR, USA
| | - Jennifer A Pietenpol
- Department of Biochemistry, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Melinda E Sanders
- Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Ellen M Langer
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
| | - Mu-Shui Dai
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
| | - Gordon Mills
- Brenden-Colson Center for Pancreatic Care, Oregon Health & Science University, Portland, OR, USA
- Division of Oncologic Sciences, Oregon Health and Science University, Portland, OR, USA
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
| | - Koei Chin
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- OHSU Center for Spatial Systems Biomedicine, Oregon Health & Science University, Portland, OR, USA
- Brenden-Colson Center for Pancreatic Care, Oregon Health & Science University, Portland, OR, USA
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
| | - Young Hwan Chang
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
| | - Lisa M Coussens
- Brenden-Colson Center for Pancreatic Care, Oregon Health & Science University, Portland, OR, USA
- Department of Cell, Developmental & Cancer Biology, Oregon Health and Science University, Portland, OR, USA
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
| | - Joe W Gray
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- OHSU Center for Spatial Systems Biomedicine, Oregon Health & Science University, Portland, OR, USA
- Brenden-Colson Center for Pancreatic Care, Oregon Health & Science University, Portland, OR, USA
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
| | - Laura M Heiser
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- OHSU Center for Spatial Systems Biomedicine, Oregon Health & Science University, Portland, OR, USA
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
| | - Rosalie C Sears
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA.
- Brenden-Colson Center for Pancreatic Care, Oregon Health & Science University, Portland, OR, USA.
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA.
|
17
|
Zhang Y, Khalilitousi M(S, Park YP. Unraveling dynamically encoded latent transcriptomic patterns in pancreatic cancer cells by topic modeling. CELL GENOMICS 2023; 3:100388. [PMID: 37719139 PMCID: PMC10504675 DOI: 10.1016/j.xgen.2023.100388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 05/27/2023] [Accepted: 07/31/2023] [Indexed: 09/19/2023]
Abstract
Building a comprehensive topic model has become an important research tool in single-cell genomics. With a topic model, we can decompose and ascertain distinctive cell topics shared across multiple cells, and the gene programs implicated by each topic can later serve as a predictive model in translational studies. Here, we present a Bayesian topic model that can uncover short-term RNA velocity patterns from a plethora of spliced and unspliced single-cell RNA-sequencing (RNA-seq) counts. We showed that modeling both types of RNA counts can improve robustness in statistical estimation and can reveal new aspects of dynamic changes that can be missed in static analysis. We showcase that our modeling framework can be used to identify statistically significant dynamic gene programs in pancreatic cancer data. Our results discovered that seven dynamic gene programs (topics) are highly correlated with cancer prognosis and generally enrich immune cell types and pathways.
Affiliation(s)
- Yichen Zhang
- Department of Statistics, The University of British Columbia, Vancouver, BC, Canada
| | | | - Yongjin P. Park
- Department of Statistics, The University of British Columbia, Vancouver, BC, Canada
- Department of Pathology and Laboratory Medicine, The University of British Columbia, Vancouver, BC, Canada
- Department of Molecular Oncology, BC Cancer Research, Part of Provincial Health Care Authority, Vancouver, BC, Canada
|
18
|
Cheng M, Jiang Y, Xu J, Mentis AFA, Wang S, Zheng H, Sahu SK, Liu L, Xu X. Spatially resolved transcriptomics: a comprehensive review of their technological advances, applications, and challenges. J Genet Genomics 2023; 50:625-640. [PMID: 36990426 DOI: 10.1016/j.jgg.2023.03.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 10/05/2022] [Revised: 03/11/2023] [Accepted: 03/16/2023] [Indexed: 03/29/2023]
Abstract
The ability to explore life kingdoms is largely driven by innovations and breakthroughs in technology, from the invention of the microscope 350 years ago to the recent emergence of single-cell sequencing, by which the scientific community has been able to visualize life at an unprecedented resolution. Most recently, the Spatially Resolved Transcriptomics (SRT) technologies have filled the gap in probing the spatial or even three-dimensional organization of the molecular foundation behind the molecular mysteries of life, including the origin of different cellular populations developed from totipotent cells and human diseases. In this review, we introduce recent progress and challenges in SRT from the perspectives of technologies and bioinformatic tools, as well as the representative SRT applications. With the currently fast-moving progress of the SRT technologies and promising results from early adopted research projects, we can foresee the bright future of such new tools in understanding life at the most profound analytical level.
Affiliation(s)
| | - Yujia Jiang
- BGI-Hangzhou, Hangzhou, Zhejiang 310012, China
| | | | | | - Shuai Wang
- BGI-Hangzhou, Hangzhou, Zhejiang 310012, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | | | - Sunil Kumar Sahu
- BGI-Shenzhen, Shenzhen, Guangdong 518103, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, Guangdong 518083, China
| | - Longqi Liu
- BGI-Hangzhou, Hangzhou, Zhejiang 310012, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Xun Xu
- BGI-Hangzhou, Hangzhou, Zhejiang 310012, China; BGI-Shenzhen, Shenzhen, Guangdong 518103, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen, Guangdong 518120, China.
|
19
|
Tan Y, Huang J, Li D, Zou C, Liu D, Qin B. Single-cell RNA sequencing in dissecting microenvironment of age-related macular degeneration: Challenges and perspectives. Ageing Res Rev 2023; 90:102030. [PMID: 37549871 DOI: 10.1016/j.arr.2023.102030] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/16/2022] [Revised: 04/29/2023] [Accepted: 08/04/2023] [Indexed: 08/09/2023]
Abstract
Age-related macular degeneration (AMD) is the leading cause of blindness in individuals over the age of 50 years, yet its etiology and pathogenesis remain largely unclear. Single-cell RNA sequencing (scRNA-seq) technologies were recently developed and have a number of advantages over conventional bulk RNA sequencing techniques in uncovering the heterogeneity of complex microenvironments containing numerous cell types and cell communications during various biological processes. In this review, we summarize the latest discovered cellular components and regulatory mechanisms during AMD development revealed by scRNA-seq. In addition, we discuss the main challenges and future directions in exploring the pathophysiology of AMD equipped with single-cell technologies. Our review underscores the importance of multimodal single-cell platforms (such as single-cell spatiotemporal multi-omics and single-cell exosome omics) as new approaches for basic and clinical AMD research in identifying biomarkers and characterizing cellular responses to drug treatment and environmental stimulation.
Affiliation(s)
- Yao Tan
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
| | - Jianguo Huang
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
| | - Deshuang Li
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
| | - Chang Zou
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518000, Guangdong, China.
| | - Dongcheng Liu
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China.
| | - Bo Qin
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China; Aier School of Ophthalmology, Central South University, Changsha, China.
| |
|
20
|
Flynn E, Almonte-Loya A, Fragiadakis GK. Single-Cell Multiomics. Annu Rev Biomed Data Sci 2023; 6:313-337. [PMID: 37159875 PMCID: PMC11146013 DOI: 10.1146/annurev-biodatasci-020422-050645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Single-cell RNA sequencing methods have led to improved understanding of the heterogeneity and transcriptomic states present in complex biological systems. Recently, the development of novel single-cell technologies for assaying additional modalities, specifically genomic, epigenomic, proteomic, and spatial data, allows for unprecedented insight into cellular biology. While certain technologies collect multiple measurements from the same cells simultaneously, even when modalities are separately assayed in different cells, we can apply novel computational methods to integrate these data. The application of computational integration methods to multimodal paired and unpaired data results in rich information about the identities of the cells present and the interactions between different levels of biology, such as between genetic variation and transcription. In this review, we both discuss the single-cell technologies for measuring these modalities and describe and characterize a variety of computational integration methods for combining the resulting data to leverage multimodal information toward greater biological insight.
Affiliation(s)
- Emily Flynn
- CoLabs, University of California, San Francisco, California, USA
| | - Ana Almonte-Loya
- CoLabs, University of California, San Francisco, California, USA
- Biomedical Informatics Program, University of California, San Francisco, California, USA
| | - Gabriela K Fragiadakis
- CoLabs, University of California, San Francisco, California, USA
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
| |
|
21
|
Fouché A, Zinovyev A. Omics data integration in computational biology viewed through the prism of machine learning paradigms. Frontiers in Bioinformatics 2023; 3:1191961. [PMID: 37600970 PMCID: PMC10436311 DOI: 10.3389/fbinf.2023.1191961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 07/26/2023] [Indexed: 08/22/2023] Open
Abstract
Important quantities of biological data can today be acquired to characterize cell types and states, from various sources and using a wide diversity of methods, providing scientists with more and more information to answer challenging biological questions. Unfortunately, working with this amount of data comes at the price of ever-increasing data complexity. This is caused by the multiplication of data types and batch effects, which hinders the joint usage of all available data within common analyses. Data integration describes a set of tasks geared towards embedding several datasets of different origins or modalities into a joint representation that can then be used to carry out downstream analyses. In the last decade, dozens of methods have been proposed to tackle the different facets of the data integration problem, relying on various paradigms. This review introduces the most common data types encountered in computational biology and provides systematic definitions of the data integration problems. We then present how machine learning innovations were leveraged to build effective data integration algorithms, that are widely used today by computational biologists. We discuss the current state of data integration and important pitfalls to consider when working with data integration tools. We eventually detail a set of challenges the field will have to overcome in the coming years.
Affiliation(s)
- Aziz Fouché
- Institut Curie, PSL Research University, Paris, France
- Institut National de la Santé et de la Recherche Médicale, Paris, France
- CBIO-Centre for Computational Biology, ParisTech, PSL Research University, Paris, France
- Ecole Normale Supérieure Paris-Saclay, Cachan, France
| | | |
|
22
|
Chen C, Wang J, Pan D, Wang X, Xu Y, Yan J, Wang L, Yang X, Yang M, Liu G. Applications of multi-omics analysis in human diseases. MedComm (Beijing) 2023; 4:e315. [PMID: 37533767 PMCID: PMC10390758 DOI: 10.1002/mco2.315] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 02/22/2023] [Revised: 05/25/2023] [Accepted: 05/31/2023] [Indexed: 08/04/2023] Open
Abstract
Multi-omics usually refers to the crossover application of multiple high-throughput screening technologies represented by genomics, transcriptomics, single-cell transcriptomics, proteomics and metabolomics, spatial transcriptomics, and so on, which play a great role in promoting the study of human diseases. Most of the current reviews focus on describing the development of multi-omics technologies, data integration, and application to a particular disease; however, few of them provide a comprehensive and systematic introduction of multi-omics. This review outlines the existing technical categories of multi-omics, cautions for experimental design, focuses on the integrated analysis methods of multi-omics, especially the approach of machine learning and deep learning in multi-omics data integration and the corresponding tools, and the application of multi-omics in medical researches (e.g., cancer, neurodegenerative diseases, aging, and drug target discovery) as well as the corresponding open-source analysis tools and databases, and finally, discusses the challenges and future directions of multi-omics integration and application in precision medicine. With the development of high-throughput technologies and data integration algorithms, as important directions of multi-omics for future disease research, single-cell multi-omics and spatial multi-omics also provided a detailed introduction. This review will provide important guidance for researchers, especially who are just entering into multi-omics medical research.
Affiliation(s)
- Chongyang Chen
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Co‐innovation Center of Neurodegeneration, Nantong University, Nantong, China
| | - Jing Wang
- Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
| | - Donghui Pan
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Xinyu Wang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Yuping Xu
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Junjie Yan
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Lizhen Wang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Xifei Yang
- Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
| | - Min Yang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Gong‐Ping Liu
- Co‐innovation Center of Neurodegeneration, Nantong University, Nantong, China
- Department of Pathophysiology, School of Basic Medicine, Key Laboratory of Ministry of Education of China and Hubei Province for Neurological Disorders, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| |
|
23
|
Ashuach T, Gabitto MI, Koodli RV, Saldi GA, Jordan MI, Yosef N. MultiVI: deep generative model for the integration of multimodal data. Nat Methods 2023; 20:1222-1231. [PMID: 37386189 PMCID: PMC10406609 DOI: 10.1038/s41592-023-01909-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 11/15/2021] [Accepted: 05/10/2023] [Indexed: 07/01/2023]
Abstract
Jointly profiling the transcriptome, chromatin accessibility and other molecular properties of single cells offers a powerful way to study cellular diversity. Here we present MultiVI, a probabilistic model to analyze such multiomic data and leverage it to enhance single-modality datasets. MultiVI creates a joint representation that allows an analysis of all modalities included in the multiomic input data, even for cells for which one or more modalities are missing. It is available at scvi-tools.org.
Affiliation(s)
- Tal Ashuach
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
| | - Mariano I Gabitto
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA.
- Allen Institute for Brain Science, Seattle, WA, USA.
| | - Rohan V Koodli
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
| | | | - Michael I Jordan
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
| | - Nir Yosef
- Center for Computational Biology, University of California, Berkeley, CA, USA.
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel.
| |
|
24
|
Maitra C, Seal DB, Das V, De RK. Unsupervised neural network for single cell Multi-omics INTegration (UMINT): an application to health and disease. Front Mol Biosci 2023; 10:1184748. [PMID: 37293552 PMCID: PMC10244650 DOI: 10.3389/fmolb.2023.1184748] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/12/2023] [Accepted: 05/04/2023] [Indexed: 06/10/2023] Open
Abstract
Multi-omics studies have enabled us to understand the mechanistic drivers behind complex disease states and progressions, thereby providing novel and actionable biological insights into health status. However, integrating data from multiple modalities is challenging due to high dimensionality and diverse nature of data, and noise associated with each platform. Sparsity in data, non-overlapping features and technical batch effects make the task of learning more complicated. Conventional machine learning (ML) tools are not quite effective against such data integration hazards due to their simplistic nature with less capacity. In addition, existing methods for single cell multi-omics integration are computationally expensive. Therefore, in this work, we have introduced a novel Unsupervised neural network for single cell Multi-omics INTegration (UMINT). UMINT serves as a promising model for integrating variable number of single cell omics layers with high dimensions. It has a light-weight architecture with substantially reduced number of parameters. The proposed model is capable of learning a latent low-dimensional embedding that can extract useful features from the data facilitating further downstream analyses. UMINT has been applied to integrate healthy and disease CITE-seq (paired RNA and surface proteins) datasets including a rare disease Mucosa-Associated Lymphoid Tissue (MALT) tumor. It has been benchmarked against existing state-of-the-art methods for single cell multi-omics integration. Furthermore, UMINT is capable of integrating paired single cell gene expression and ATAC-seq (Transposase-Accessible Chromatin) assays as well.
Affiliation(s)
- Chayan Maitra
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
| | | | | | - Rajat K. De
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
| |
|
25
|
Wang S, Zhang Y, Lin X, Su L, Xiao G, Zhu W, Shi Y. Learning matrix factorization with scalable distance metric and regularizer. Neural Netw 2023; 161:254-266. [PMID: 36774864 DOI: 10.1016/j.neunet.2023.01.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Revised: 01/17/2023] [Accepted: 01/24/2023] [Indexed: 02/05/2023]
Abstract
Matrix factorization has always been an encouraging field, which attempts to extract discriminative features from high-dimensional data. However, it suffers from negative generalization ability and high computational complexity when handling large-scale data. In this paper, we propose a learnable deep matrix factorization via the projected gradient descent method, which learns multi-layer low-rank factors from scalable metric distances and flexible regularizers. Accordingly, solving a constrained matrix factorization problem is equivalently transformed into training a neural network with an appropriate activation function induced from the projection onto a feasible set. Distinct from other neural networks, the proposed method activates the connected weights not just the hidden layers. As a result, it is proved that the proposed method can learn several existing well-known matrix factorizations, including singular value decomposition, convex, nonnegative and semi-nonnegative matrix factorizations. Finally, comprehensive experiments demonstrate the superiority of the proposed method against other state-of-the-arts.
Affiliation(s)
- Shiping Wang
- College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Guangdong Provincial Key Laboratory of Big Data Computing, The Chinese University of Hong Kong, Shenzhen 518172, China.
| | - Yunhe Zhang
- College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China.
| | - Xincan Lin
- College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China.
| | - Lichao Su
- College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China.
| | - Guobao Xiao
- College of Computer and Control Engineering, Minjiang University, Fuzhou 350108, China.
| | - William Zhu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Yiqing Shi
- College of Photonic and Electronic Engineering, Fujian Normal University, Fuzhou 350117, China.
| |
|
26
|
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of multi-modality single cell data guided by cell-cell interactions and gene regulatory networks. Research Square 2023:rs.3.rs-2675530. [PMID: 36993284 PMCID: PMC10055660 DOI: 10.21203/rs.3.rs-2675530/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing simulators typically focus on modeling one or two specific biological factors or mechanisms that affect the output data, which limits their capacity to simulate the complexity and multi-modality in real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations while accounting for the relationships between modalities. scMultiSim jointly models various biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, while also incorporating technical noises. Moreover, it allows users to adjust each factor's effect easily. We validated scMultiSim's simulated biological effects and demonstrated its applications by benchmarking a wide range of computational tasks, including cell clustering and trajectory inference, multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference and CCI inference using spatially resolved gene expression data. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li
- Georgia Institute of Technology, Atlanta, USA
| | - Ziqi Zhang
- Georgia Institute of Technology, Atlanta, USA
| | | | - Xi Chen
- Southern University of Science and Technology, China
| | | |
|
27
|
Allesøe RL, Lundgaard AT, Hernández Medina R, Aguayo-Orozco A, Johansen J, Nissen JN, Brorsson C, Mazzoni G, Niu L, Biel JH, Brasas V, Webel H, Benros ME, Pedersen AG, Chmura PJ, Jacobsen UP, Mari A, Koivula R, Mahajan A, Vinuela A, Tajes JF, Sharma S, Haid M, Hong MG, Musholt PB, De Masi F, Vogt J, Pedersen HK, Gudmundsdottir V, Jones A, Kennedy G, Bell J, Thomas EL, Frost G, Thomsen H, Hansen E, Hansen TH, Vestergaard H, Muilwijk M, Blom MT, 't Hart LM, Pattou F, Raverdy V, Brage S, Kokkola T, Heggie A, McEvoy D, Mourby M, Kaye J, Hattersley A, McDonald T, Ridderstråle M, Walker M, Forgie I, Giordano GN, Pavo I, Ruetten H, Pedersen O, Hansen T, Dermitzakis E, Franks PW, Schwenk JM, Adamski J, McCarthy MI, Pearson E, Banasik K, Rasmussen S, Brunak S, Thomas CE, Haussler R, Beulens J, Rutters F, Nijpels G, van Oort S, Groeneveld L, Elders P, Giorgino T, Rodriquez M, Nice R, Perry M, Bianzano S, Graefe-Mody U, Hennige A, Grempler R, Baum P, Stærfeldt HH, Shah N, Teare H, Ehrhardt B, Tillner J, Dings C, Lehr T, Scherer N, Sihinevich I, Cabrelli L, Loftus H, Bizzotto R, Tura A, Dekkers K, van Leeuwen N, Groop L, Slieker R, Ramisch A, Jennison C, McVittie I, Frau F, Steckel-Hamann B, Adragni K, Thomas M, Pasdar NA, Fitipaldi H, Kurbasic A, Mutie P, Pomares-Millan H, Bonnefond A, Canouil M, Caiazzo R, Verkindt H, Holl R, Kuulasmaa T, Deshmukh H, Cederberg H, Laakso M, Vangipurapu J, Dale M, Thorand B, Nicolay C, Fritsche A, Hill A, Hudson M, Thorne C, Allin K, Arumugam M, Jonsson A, Engelbrechtsen L, Forman A, Dutta A, Sondertoft N, Fan Y, Gough S, Robertson N, McRobert N, Wesolowska-Andersen A, Brown A, Davtian D, Dawed A, Donnelly L, Palmer C, White M, Ferrer J, Whitcher B, Artati A, Prehn C, Adam J, Grallert H, Gupta R, Sackett PW, Nilsson B, Tsirigos K, Eriksen R, Jablonka B, Uhlen M, Gassenhuber J, Baltauss T, de Preville N, Klintenberg M, Abdalla M. Discovery of drug-omics associations in type 2 diabetes with generative deep-learning models. 
Nat Biotechnol 2023; 41:399-408. [PMID: 36593394 PMCID: PMC10017515 DOI: 10.1038/s41587-022-01520-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 03/15/2022] [Accepted: 09/20/2022] [Indexed: 01/03/2023]
Abstract
The application of multiple omics technologies in biomedical cohorts has the potential to reveal patient-level disease characteristics and individualized response to treatment. However, the scale and heterogeneous nature of multi-modal data makes integration and inference a non-trivial task. We developed a deep-learning-based framework, multi-omics variational autoencoders (MOVE), to integrate such data and applied it to a cohort of 789 people with newly diagnosed type 2 diabetes with deep multi-omics phenotyping from the DIRECT consortium. Using in silico perturbations, we identified drug-omics associations across the multi-modal datasets for the 20 most prevalent drugs given to people with type 2 diabetes with substantially higher sensitivity than univariate statistical tests. From these, we among others, identified novel associations between metformin and the gut microbiota as well as opposite molecular responses for the two statins, simvastatin and atorvastatin. We used the associations to quantify drug-drug similarities, assess the degree of polypharmacy and conclude that drug effects are distributed across the multi-omics modalities.
Affiliation(s)
- Rosa Lundbye Allesøe
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark.,Copenhagen Research Centre for Mental Health, Mental Health Centre Copenhagen, Copenhagen University Hospital, Copenhagen, Denmark
| | - Agnete Troen Lundgaard
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Ricardo Hernández Medina
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Alejandro Aguayo-Orozco
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Joachim Johansen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Jakob Nybo Nissen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Caroline Brorsson
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Gianluca Mazzoni
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Lili Niu
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Jorge Hernansanz Biel
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Valentas Brasas
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Henry Webel
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Michael Eriksen Benros
- Copenhagen Research Centre for Mental Health, Mental Health Centre Copenhagen, Copenhagen University Hospital, Copenhagen, Denmark.,Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Anders Gorm Pedersen
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Piotr Jaroslaw Chmura
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Ulrik Plesner Jacobsen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Andrea Mari
- C.N.R. Institute of Neuroscience, Padova, Italy
| | - Robert Koivula
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Anubha Mahajan
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Ana Vinuela
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.,Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle, UK
| | | | - Sapna Sharma
- Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Bavaria, Germany.,Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Bavaria, Germany.,Chair of Food Chemistry and Molecular and Sensory Science, Technical University of Munich, Freising, Germany
| | - Mark Haid
- Metabolomics and Proteomics Core, Helmholtz Zentrum Muenchen, German Research Center for Environmental Health, Neuherberg, Germany
| | - Mun-Gwan Hong
- Affinity Proteomics, Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Solna, Sweden
| | - Petra B Musholt
- Research and Development Global Development, Translational Medicine and Clinical Pharmacology, Sanofi-Aventis Deutschland, Frankfurt, Germany
| | - Federico De Masi
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Josef Vogt
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Helle Krogh Pedersen
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark.,Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Valborg Gudmundsdottir
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.,Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Angus Jones
- University of Exeter Medical School, Exeter, UK
| | - Gwen Kennedy
- The Immunoassay Biomarker Core Laboratory, School of Medicine, University of Dundee, Dundee, UK
| | - Jimmy Bell
- Research Centre for Optimal Health, Department of Life Sciences, University of Westminster, London, UK
| | - E Louise Thomas
- Research Centre for Optimal Health, Department of Life Sciences, University of Westminster, London, UK
| | - Gary Frost
- Section for Nutrition Research, Faculty of Medicine, Imperial College London, London, UK
| | - Henrik Thomsen
- Department of Radiology, Copenhagen University Hospital Herlev-Gentofte, Herlev, Denmark
| | - Elizaveta Hansen
- Department of Radiology, Copenhagen University Hospital Herlev-Gentofte, Herlev, Denmark
| | - Tue Haldor Hansen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Henrik Vestergaard
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Mirthe Muilwijk
- Department of Epidemiology and Data Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Marieke T Blom
- Department of General Practice, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Leen M 't Hart
- Department of Epidemiology and Data Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.,Department of Biomedical Data Science, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, the Netherlands.,Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
| | - Francois Pattou
- Inserm, Univ Lille, CHU Lille, Lille Pasteur Institute, EGID, Lille, France
| | - Violeta Raverdy
- Inserm, Univ Lille, CHU Lille, Lille Pasteur Institute, EGID, Lille, France
| | - Soren Brage
- MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge, UK
| | - Tarja Kokkola
- Department of Medicine, University of Eastern Finland, Kuopio, Finland
| | - Alison Heggie
- Institute of Cellular Medicine, Newcastle University, Newcastle, UK
| | - Donna McEvoy
- Diabetes Research Network, Royal Victoria Infirmary, Newcastle, UK
| | - Miranda Mourby
- Centre for Health, Law and Emerging Technologies (HeLEX), Faculty of Law, University of Oxford, Oxford, UK
| | - Jane Kaye
- Centre for Health, Law and Emerging Technologies (HeLEX), Faculty of Law, University of Oxford, Oxford, UK
| | | | | | - Martin Ridderstråle
- Lund University Diabetes Centre, Department of Clinical Sciences, Lund University, Malmö, Sweden
| | - Mark Walker
- Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle, UK
| | - Ian Forgie
- Division of Population Health & Genomics, School of Medicine, University of Dundee, Dundee, UK
| | - Giuseppe N Giordano
- Genetic and Molecular Epidemiology Unit, Lund University Diabetes Centre, Department of Clinical Sciences, CRC, Lund University, SUS, Malmö, Sweden
| | - Imre Pavo
- Eli Lilly Regional Operations, Vienna, Austria
| | - Hartmut Ruetten
- Research and Development Global Development, Translational Medicine and Clinical Pharmacology, Sanofi-Aventis Deutschland, Frankfurt, Germany
| | - Oluf Pedersen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Torben Hansen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Emmanouil Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
| | - Paul W Franks
- Lund University Diabetes Centre, Department of Clinical Sciences, Lund University, Malmö, Sweden; Harvard T.H. Chan School of Public Health, Boston, MA, USA; OCDEM, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
| | - Jochen M Schwenk
- Affinity Proteomics, Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Solna, Sweden
| | - Jerzy Adamski
- Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
| | - Mark I McCarthy
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK; Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK; Genentech, South San Francisco, CA, USA
| | - Ewan Pearson
- Division of Population Health & Genomics, School of Medicine, University of Dundee, Dundee, UK
| | - Karina Banasik
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
|
28
|
Ryu Y, Han GH, Jung E, Hwang D. Integration of Single-Cell RNA-Seq Datasets: A Review of Computational Methods. Mol Cells 2023; 46:106-119. [PMID: 36859475 PMCID: PMC9982060 DOI: 10.14348/molcells.2023.0009] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 01/10/2023] [Revised: 01/19/2023] [Accepted: 01/19/2023] [Indexed: 03/03/2023]
Abstract
With the increasing number of single-cell RNA sequencing (scRNA-seq) datasets in public repositories, integrative analysis of multiple scRNA-seq datasets has become commonplace. Batch effects among different datasets are inevitable because of differences in cell isolation and handling protocols, library preparation technology, and sequencing platforms. To remove these batch effects for effective integration of multiple scRNA-seq datasets, a number of methodologies have been developed based on diverse concepts and approaches. These methods have proven useful for examining whether cellular features, such as cell subpopulations and marker genes, identified from a certain dataset are consistently present, or whether their condition-dependent variations, such as increases in cell subpopulations in particular disease-related conditions, are consistently observed in different datasets generated under similar or distinct conditions. In this review, we summarize the concepts and approaches of the integration methods and their pros and cons as reported in the previous literature.
Affiliation(s)
- Yeonjae Ryu
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| | - Geun Hee Han
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| | - Eunsoo Jung
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| | - Daehee Hwang
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| |
|
29
|
Zhang Z, Sun H, Mariappan R, Chen X, Chen X, Jain MS, Efremova M, Teichmann SA, Rajan V, Zhang X. scMoMaT jointly performs single cell mosaic integration and multi-modal bio-marker detection. Nat Commun 2023; 14:384. [PMID: 36693837 PMCID: PMC9873790 DOI: 10.1038/s41467-023-36066-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 09/07/2022] [Accepted: 01/13/2023] [Indexed: 01/26/2023]
Abstract
Single cell data integration methods aim to integrate cells across data batches and modalities, and data integration tasks can be categorized into horizontal, vertical, diagonal, and mosaic integration, where mosaic integration is the most general and challenging case with few methods developed. We propose scMoMaT, a method that is able to integrate single cell multi-omics data under the mosaic integration scenario using matrix tri-factorization. During integration, scMoMaT is also able to uncover the cluster-specific bio-markers across modalities. These multi-modal bio-markers are used to interpret and annotate the clusters to cell types. Moreover, scMoMaT can integrate cell batches with unequal cell type compositions. Applying scMoMaT to multiple real and simulated datasets demonstrated these features of scMoMaT and showed that scMoMaT has superior performance compared to existing methods. Specifically, we show that the integrated cell embedding combined with learned bio-markers leads to cell type annotations of higher quality or resolution compared to their original annotations.
Affiliation(s)
- Ziqi Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Haoran Sun
- School of Mathematics, Georgia Institute of Technology, Atlanta, GA, USA
| | - Ragunathan Mariappan
- Department of Information Systems and Analytics, National University of Singapore, Singapore, Singapore
| | - Xi Chen
- Department of Biology, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Xinyu Chen
- Bioengineering Program, Georgia Institute of Technology, Atlanta, GA, USA
| | - Vaibhav Rajan
- Department of Information Systems and Analytics, National University of Singapore, Singapore, Singapore
| | - Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA.
| |
|
30
|
Du JH, Cai Z, Roeder K. Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT. Proc Natl Acad Sci U S A 2022; 119:e2214414119. [PMID: 36459654 PMCID: PMC9894175 DOI: 10.1073/pnas.2214414119] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/22/2022] [Accepted: 11/03/2022] [Indexed: 12/04/2022]
Abstract
Recent advances in single-cell technologies enable joint profiling of multiple omics. These profiles can reveal the complex interplay of different regulatory layers in single cells; still, new challenges arise when integrating datasets with some features shared across experiments and others exclusive to a single source; combining information across these sources is called mosaic integration. The difficulties lie in imputing missing molecular layers to build a self-consistent atlas, finding a common latent space, and transferring learning to new data sources robustly. Existing mosaic integration approaches based on matrix factorization cannot efficiently adapt to nonlinear embeddings for the latent cell space and are not designed for accurate imputation of missing molecular layers. By contrast, we propose a probabilistic variational autoencoder model, scVAEIT, to integrate and impute multimodal datasets with mosaic measurements. A key advance is the use of a missing mask for learning the conditional distribution of unobserved modalities and features, which makes scVAEIT flexible to combine different panels of measurements from multimodal datasets accurately and in an end-to-end manner. Imputing the masked features serves as a supervised learning procedure while preventing overfitting by regularization. Focusing on gene expression, protein abundance, and chromatin accessibility, we validate that scVAEIT robustly imputes the missing modalities and features of cells biologically different from the training data. scVAEIT also adjusts for batch effects while maintaining the biological variation, which provides better latent representations for the integrated datasets. We demonstrate that scVAEIT significantly improves integration and imputation across unseen cell types, different technologies, and different tissues.
Affiliation(s)
- Jin-Hong Du
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
| | - Zhanrui Cai
- Department of Statistics, Iowa State University, Ames, IA 50011
| | - Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
| |
|
31
|
Cao Y, Fu L, Wu J, Peng Q, Nie Q, Zhang J, Xie X. Integrated analysis of multimodal single-cell data with structural similarity. Nucleic Acids Res 2022; 50:e121. [PMID: 36130281 PMCID: PMC9757079 DOI: 10.1093/nar/gkac781] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 04/07/2022] [Revised: 08/15/2022] [Accepted: 09/02/2022] [Indexed: 12/24/2022]
Abstract
Multimodal single-cell sequencing technologies provide unprecedented information on cellular heterogeneity from multiple layers of genomic readouts. However, joint analysis of two modalities without properly handling the noise often leads to overfitting of one modality by the other and worse clustering results than vanilla single-modality analysis. How to efficiently utilize the extra information from single-cell multi-omics to delineate cell states and identify meaningful signals remains a significant computational challenge. In this work, we propose a deep learning framework, named SAILERX, for efficient, robust, and flexible analysis of multi-modal single-cell data. SAILERX consists of a variational autoencoder with invariant representation learning to correct technical noise from the sequencing process, and a multimodal data alignment mechanism to integrate information from different modalities. Instead of performing hard alignment by projecting both modalities to a shared latent space, SAILERX encourages the local structures of the two modalities, measured by pairwise similarities, to be similar. This strategy is more robust against overfitting to noise, which facilitates various downstream analyses such as clustering, imputation, and marker gene detection. Furthermore, the invariant representation learning part enables SAILERX to perform integrative analysis on both multi- and single-modal datasets, making it an applicable and scalable tool for more general scenarios.
Affiliation(s)
- Jie Wu
- Department of Biological Chemistry, University of California, Irvine, CA 92697, USA
| | - Qinke Peng
- Systems Engineering Institute, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
| | - Qing Nie
- Department of Mathematics, University of California, Irvine, CA 92697, USA; Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA; NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, CA 92697, USA
| | - Jing Zhang
- To whom correspondence should be addressed. Tel: +1 949 824 9979;
| | - Xiaohui Xie
- Correspondence may also be addressed to Xiaohui Xie. Tel: +1 949 824 9289;
| |
|
32
|
Predicting Algorithm of Tissue Cell Ratio Based on Deep Learning Using Single-Cell RNA Sequencing. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12125790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
Background: Understanding the proportion of cell types in heterogeneous tissue samples is important in bioinformatics. Inferring these proportions from bulk RNA sequencing data is challenging because most traditional algorithms for predicting tissue cell ratios rely heavily on standardized cell-type-specific gene expression profiles and do not consider tissue heterogeneity. As a result, their prediction accuracy is limited and they lack robustness, so new approaches are urgently needed. Methods: In this study, we introduced Autoptcr, an algorithm that automatically predicts tissue cell ratios. The algorithm uses data simulated from single-cell RNA sequencing (scRNA-seq) for model training, using convolutional neural networks (CNNs) to extract intrinsic relationships between genes and predict the cell proportions of tissues. Results: We trained the algorithm using simulated bulk samples and made predictions using real bulk PBMC data. Compared with existing advanced algorithms, Autoptcr achieved the highest Pearson correlation coefficient between actual and predicted values, reaching 0.903. Tested on a bulk sample, its Lin's concordance correlation coefficient was 41% higher than that of CSx. The algorithm can infer tissue cell proportions directly from tissue gene expression data. Conclusions: The Autoptcr algorithm uses simulated scRNA-seq data for training to address the problem of reliance on cell-type-specific gene expression profiles. It also has high prediction accuracy and strong noise resistance for the tissue cell ratio. This work is expected to provide new research ideas for the prediction of tissue cell proportions.
|
33
|
Abstract
Motivation: The advent of multi-modal single-cell sequencing techniques has shed new light on molecular mechanisms by simultaneously inspecting transcriptomes, epigenomes and proteomes of the same cell. However, to date, the existing computational approaches for integration of multimodal single-cell data are either computationally expensive, require the delineation of parameters or can only be applied to particular modalities. Results: Here we present a single-cell multi-modal integration method, named Multi-mOdal Joint IntegraTion of cOmpOnents (MOJITOO). MOJITOO uses canonical correlation analysis for a fast and parameter-free detection of a shared representation of cells from multimodal single-cell data. Moreover, estimated canonical components can be used for interpretation, i.e. association of modality-specific molecular features with the latent space. We evaluate MOJITOO using bi- and tri-modal single-cell datasets and show that MOJITOO outperforms existing methods regarding computational requirements, preservation of original latent spaces and clustering. Availability and implementation: The software, code and data for benchmarking are available at https://github.com/CostaLab/MOJITOO and https://doi.org/10.5281/zenodo.6348128. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mingbo Cheng
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074 Aachen, Germany
| | - Zhijian Li
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074 Aachen, Germany
| | - Ivan G Costa
- To whom correspondence should be addressed. E-mail:
| |
|