1
Rautenstrauch P, Ohler U. Liam tackles complex multimodal single-cell data integration challenges. Nucleic Acids Res 2024; 52:e52. [PMID: 38842910 PMCID: PMC11229356 DOI: 10.1093/nar/gkae409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 03/08/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024] Open
Abstract
Multi-omics characterization of single cells holds outstanding potential for profiling the dynamics and relations of gene regulatory states of thousands of cells. How to integrate multimodal data is an open problem, especially when aiming to combine data from multiple sources or conditions containing both biological and technical variation. We introduce liam, a flexible model for the simultaneous horizontal and vertical integration of paired single-cell multimodal data and mosaic integration of paired with unimodal data. Liam learns a joint low-dimensional representation of the measured modalities, which proves beneficial when the information content or quality of the modalities differ. Its integration accounts for complex batch effects using a tunable combination of conditional and adversarial training, which can be optimized using replicate information while retaining selected biological variation. We demonstrate liam's superior performance on multiple paired multimodal data types, including Multiome and CITE-seq data, and in mosaic integration scenarios. Our detailed benchmarking experiments illustrate the complexities and challenges remaining for integration and the meaningful assessment of its success.
Affiliation(s)
- Pia Rautenstrauch
  - Humboldt-Universität zu Berlin, Department of Computer Science, 10099 Berlin, Germany
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
- Uwe Ohler
  - Humboldt-Universität zu Berlin, Department of Computer Science, 10099 Berlin, Germany
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
  - Humboldt-Universität zu Berlin, Department of Biology, 10099 Berlin, Germany
2
Drost F, An Y, Bonafonte-Pardàs I, Dratva LM, Lindeboom RGH, Haniffa M, Teichmann SA, Theis F, Lotfollahi M, Schubert B. Multi-modal generative modeling for joint analysis of single-cell T cell receptor and gene expression data. Nat Commun 2024; 15:5577. [PMID: 38956082 PMCID: PMC11220149 DOI: 10.1038/s41467-024-49806-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 05/23/2024] [Indexed: 07/04/2024] Open
Abstract
Recent advances in single-cell immune profiling have enabled the simultaneous measurement of transcriptome and T cell receptor (TCR) sequences, offering great potential for studying immune responses at the cellular level. However, integrating these diverse modalities across datasets is challenging due to their unique data characteristics and technical variations. Here, to address this, we develop the multimodal generative model mvTCR to fuse modality-specific information across transcriptome and TCR into a shared representation. Our analysis demonstrates the added value of multimodal over unimodal approaches to capture antigen specificity. Notably, we use mvTCR to distinguish T cell subpopulations binding to SARS-CoV-2 antigens from bystander cells. Furthermore, when combined with reference mapping approaches, mvTCR can map newly generated datasets to extensive T cell references, facilitating knowledge transfer. In summary, we envision mvTCR to enable a scalable analysis of multimodal immune profiling data and advance our understanding of immune responses.
Affiliation(s)
- Felix Drost
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354, Freising, Germany
- Yang An
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
- Irene Bonafonte-Pardàs
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Lisa M Dratva
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Rik G H Lindeboom
  - The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Muzlifah Haniffa
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Sarah A Teichmann
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Department of Physics, Cavendish Laboratory, University of Cambridge, 19 JJ Thomson Avenue, Cambridge, UK
- Fabian Theis
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354, Freising, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
- Mohammad Lotfollahi
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Benjamin Schubert
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
3
Long Y, Ang KS, Sethi R, Liao S, Heng Y, van Olst L, Ye S, Zhong C, Xu H, Zhang D, Kwok I, Husna N, Jian M, Ng LG, Chen A, Gascoigne NRJ, Gate D, Fan R, Xu X, Chen J. Deciphering spatial domains from spatial multi-omics with SpatialGlue. Nat Methods 2024:10.1038/s41592-024-02316-4. [PMID: 38907114 DOI: 10.1038/s41592-024-02316-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 05/20/2024] [Indexed: 06/23/2024]
Abstract
Advances in spatial omics technologies now allow multiple types of data to be acquired from the same tissue slice. To realize the full potential of such data, we need spatially informed methods for data integration. Here, we introduce SpatialGlue, a graph neural network model with a dual-attention mechanism that deciphers spatial domains by intra-omics integration of spatial location and omics measurement followed by cross-omics integration. We demonstrated SpatialGlue on data acquired from different tissue types using different technologies, including spatial epigenome-transcriptome and transcriptome-proteome modalities. Compared to other methods, SpatialGlue captured more anatomical details and more accurately resolved spatial domains such as the cortex layers of the brain. Our method also identified cell types like spleen macrophage subsets located at three different zones that were not available in the original data annotations. SpatialGlue scales well with data size and can be used to integrate three modalities. Our spatial multi-omics analysis tool combines the information from complementary omics modalities to obtain a holistic view of cellular and tissue properties.
Affiliation(s)
- Yahui Long
  - Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Kok Siong Ang
  - Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Raman Sethi
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Sha Liao
  - BGI-Shenzhen, Shenzhen, China
  - BGI Research-Southwest, BGI, Chongqing, China
- Yang Heng
  - BGI-Shenzhen, Shenzhen, China
  - BGI Research-Southwest, BGI, Chongqing, China
- Lynn van Olst
  - The Ken & Ruth Davee Department of Neurology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Shuchen Ye
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Chengwei Zhong
  - Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Hang Xu
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Di Zhang
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Immanuel Kwok
  - Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Nazihah Husna
  - Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Immunology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Min Jian
  - BGI-Shenzhen, Shenzhen, China
  - BGI Research Asia-Pacific, BGI, Singapore, Singapore
- Lai Guan Ng
  - Shanghai Immune Therapy Institute, Shanghai Jiao Tong University School of Medicine Affiliated Renji Hospital, Shanghai, China
- Ao Chen
  - BGI-Shenzhen, Shenzhen, China
  - BGI Research-Southwest, BGI, Chongqing, China
  - JFL-BGI STOmics Center, Jinfeng Laboratory, Chongqing, China
- Nicholas R J Gascoigne
  - Immunology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Cancer Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- David Gate
  - The Ken & Ruth Davee Department of Neurology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Rong Fan
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Xun Xu
  - BGI-Shenzhen, Shenzhen, China
- Jinmiao Chen
  - Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Immunology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Center for Computational Biology and Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore
4
Zhou M, Zhang H, Bai Z, Mann-Krzisnik D, Wang F, Li Y. Protocol to perform integrative analysis of high-dimensional single-cell multimodal data using an interpretable deep learning technique. STAR Protoc 2024; 5:103066. [PMID: 38748882 PMCID: PMC11109308 DOI: 10.1016/j.xpro.2024.103066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 11/21/2023] [Accepted: 04/24/2024] [Indexed: 05/25/2024] Open
Abstract
The advent of single-cell multi-omics sequencing technology makes it possible for researchers to leverage multiple modalities for individual cells. Here, we present a protocol to perform integrative analysis of high-dimensional single-cell multimodal data using an interpretable deep learning technique called moETM. We describe steps for data preprocessing, multi-omics integration, inclusion of prior pathway knowledge, and cross-omics imputation. As a demonstration, we used the single-cell multi-omics data collected from bone marrow mononuclear cells (GSE194122) as in our original study. For complete details on the use and execution of this protocol, please refer to Zhou et al. (ref. 1).
Affiliation(s)
- Manqi Zhou
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY 10021, USA
- Hao Zhang
  - Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10021, USA
- Zilong Bai
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY 10021, USA; Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10021, USA
- Fei Wang
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY 10021, USA; Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10021, USA
- Yue Li
  - Quantitative Life Science, McGill University, Montréal, QC H3A 0G4, Canada; School of Computer Science, McGill University, Montréal, QC H3A 0G4, Canada; Mila - Quebec AI Institute, Montréal, QC H2S 3H1, Canada
5
Wagle MM, Long S, Chen C, Liu C, Yang P. Interpretable deep learning in single-cell omics. Bioinformatics 2024; 40:btae374. [PMID: 38889275 PMCID: PMC11211213 DOI: 10.1093/bioinformatics/btae374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 05/11/2024] [Accepted: 06/12/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION: Single-cell omics technologies have enabled the quantification of molecular profiles in individual cells at an unparalleled resolution. Deep learning, a rapidly evolving sub-field of machine learning, has attracted significant interest in single-cell omics research due to its remarkable success in analysing heterogeneous high-dimensional single-cell omics data. Nevertheless, the inherent multi-layer nonlinear architecture of deep learning models often makes them 'black boxes', as the reasoning behind their predictions is often unknown and not transparent to the user. This has stimulated an increasing body of research addressing the lack of interpretability in deep learning models, especially in single-cell omics data analyses, where the identification and understanding of molecular regulators are crucial for interpreting model predictions and directing downstream experimental validations. RESULTS: In this work, we introduce the basics of single-cell omics technologies and the concept of interpretable deep learning. This is followed by a review of the recent interpretable deep learning models applied to various areas of single-cell omics research. Lastly, we highlight the current limitations and discuss potential future directions.
Affiliation(s)
- Manoj M Wagle
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Siqu Long
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Carissa Chen
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Chunlei Liu
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Pengyi Yang
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW 2006, Australia
6
Giansanti V, Giannese F, Botrugno OA, Gandolfi G, Balestrieri C, Antoniotti M, Tonon G, Cittaro D. Scalable integration of multiomic single-cell data using generative adversarial networks. Bioinformatics 2024; 40:btae300. [PMID: 38696763 DOI: 10.1093/bioinformatics/btae300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 03/22/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION: Single-cell profiling has become a common practice to investigate the complexity of tissues, organs, and organisms. Recent technological advances are expanding our capabilities to profile various molecular layers beyond the transcriptome such as, but not limited to, the genome, the epigenome, and the proteome. Depending on the experimental procedure, these data can be obtained from separate assays or from the very same cells. Yet, integration of more than two assays is currently not supported by the majority of the computational frameworks available. RESULTS: Here we propose a Multi-Omic data integration framework based on Wasserstein Generative Adversarial Networks suitable for the analysis of paired or unpaired data with a high number of modalities (>2). At the core of our strategy is a single network trained on all modalities together, limiting the computational burden when many molecular layers are evaluated. AVAILABILITY AND IMPLEMENTATION: Source code of our framework is available at https://github.com/vgiansanti/MOWGAN.
Affiliation(s)
- Valentina Giansanti
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, 20125, Italy
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Francesca Giannese
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Oronza A Botrugno
  - Functional Genomics of Cancer Unit, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
  - Università Vita-Salute San Raffaele, Milan, 20132, Italy
- Giorgia Gandolfi
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Chiara Balestrieri
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
  - Experimental Hematology Unit, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, 20125, Italy
  - Bicocca Bioinformatics Biostatistics and Bioimaging Centre-B4, Università degli Studi di Milano-Bicocca, Milan, 20125, Italy
  - Istituto di Bioimmagini e Fisiologia Molecolare, Consiglio Nazionale delle Ricerche (CNR), Milan, 20090, Italy
- Giovanni Tonon
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
  - Functional Genomics of Cancer Unit, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
  - Università Vita-Salute San Raffaele, Milan, 20132, Italy
- Davide Cittaro
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
7
Xu J, Huang D, Zhang X. scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer. Adv Sci (Weinh) 2024; 11:e2307835. [PMID: 38483032 PMCID: PMC11109621 DOI: 10.1002/advs.202307835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/24/2024] [Indexed: 05/23/2024]
Abstract
Transformer-based models have revolutionized single-cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here, a novel single-cell multi-modal/multi-task transformer (scmFormer) is proposed to fill the existing gap in integrating single-cell proteomics with other omics data. Through systematic benchmarking, it is demonstrated that scmFormer excels in integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving shared information across batches and distinct biological information. scmFormer achieves a 54.5% higher average F1 score than the second-best method in transferring cell-type labels from single-cell transcriptomics to proteomics data. Using COVID-19 datasets, it is shown that scmFormer successfully integrates over 1.48 million cells on a personal computer. Moreover, scmFormer is shown to perform better than existing methods at generating the unmeasured modality and to be well suited to spatial multi-omic data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.
Affiliation(s)
- Jing Xu
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- De-Shuang Huang
  - Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo 315200, China
- Xiujun Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
8
Ding J, Liu R, Wen H, Tang W, Li Z, Venegas J, Su R, Molho D, Jin W, Wang Y, Lu Q, Li L, Zuo W, Chang Y, Xie Y, Tang J. DANCE: a deep learning library and benchmark platform for single-cell analysis. Genome Biol 2024; 25:72. [PMID: 38504331 PMCID: PMC10949782 DOI: 10.1186/s13059-024-03211-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024] Open
Abstract
DANCE is the first standard, generic, and extensible benchmark platform for accessing and evaluating computational methods across the spectrum of benchmark datasets for numerous single-cell analysis tasks. Currently, DANCE supports 3 modules and 8 popular tasks with 32 state-of-the-art methods on 21 benchmark datasets. Users can easily reproduce the results of supported algorithms across major benchmark datasets with minimal effort, for example by running a single command. In addition, DANCE provides an ecosystem of deep learning architectures and tools for researchers to facilitate their own model development. DANCE is an open-source Python package that welcomes all kinds of contributions.
Affiliation(s)
- Jiayuan Ding
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
- Renming Liu
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Hongzhi Wen
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
- Wenzhuo Tang
  - Department of Statistics and Probability, Michigan State University, East Lansing, USA
- Zhaoheng Li
  - Department of Biostatistics, University of Washington, Seattle, USA
- Julian Venegas
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Runze Su
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
  - Department of Statistics and Probability, Michigan State University, East Lansing, USA
- Dylan Molho
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Wei Jin
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
- Yixin Wang
  - Department of Bioengineering, Stanford University, Palo Alto, USA
- Qiaolin Lu
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Lingxiao Li
  - Department of Computer Science, Boston University, Boston, USA
- Wangyang Zuo
  - Department of Computer Science, Zhejiang University of Technology, Zhejiang, China
- Yi Chang
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Yuying Xie
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
  - Department of Statistics and Probability, Michigan State University, East Lansing, USA
- Jiliang Tang
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
9
Wang L, Nie R, Miao X, Cai Y, Wang A, Zhang H, Zhang J, Cai J. InClust+: the deep generative framework with mask modules for multimodal data integration, imputation, and cross-modal generation. BMC Bioinformatics 2024; 25:41. [PMID: 38267858 PMCID: PMC10809631 DOI: 10.1186/s12859-024-05656-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 01/15/2024] [Indexed: 01/26/2024] Open
Abstract
BACKGROUND: With the development of single-cell technology, many cell traits can be measured. Furthermore, multi-omics profiling technology can jointly measure two or more traits in a single cell simultaneously. To process the rapidly accumulating data, computational methods for multimodal data integration are needed. RESULTS: Here, we present inClust+, a deep generative framework for multi-omics data. It builds on the earlier inClust, which is specific to transcriptome data, and is augmented with two mask modules designed for multimodal data processing: an input-mask module in front of the encoder and an output-mask module behind the decoder. InClust+ was first used to integrate scRNA-seq and MERFISH data from similar cell populations, and to impute MERFISH data based on scRNA-seq data. Then, inClust+ was shown to be capable of integrating multimodal data (e.g. tri-modal data with gene expression, chromatin accessibility and protein abundance) with batch effects. Finally, inClust+ was used to integrate an unlabeled monomodal scRNA-seq dataset and two labeled multimodal CITE-seq datasets, transfer labels from the CITE-seq datasets to the scRNA-seq dataset, and generate the missing modality of protein abundance in the monomodal scRNA-seq data. In the above examples, the performance of inClust+ is better than or comparable to the most recent tools in the corresponding task. CONCLUSIONS: inClust+ is a suitable framework for handling multimodal data. Meanwhile, the successful implementation of masks in inClust+ means that they can be applied to other deep learning methods with a similar encoder-decoder architecture to broaden the application scope of these models.
Affiliation(s)
- Lifei Wang
  - Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Rui Nie
  - China National Center for Bioinformation, Beijing, China
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuexia Miao
  - China National Center for Bioinformation, Beijing, China
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Yankai Cai
  - School of Economic and Management, China University of Geoscience, Wuhan, China
- Anqi Wang
  - Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Hanwen Zhang
  - Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Jiang Zhang
  - School of Systems Science, Beijing Normal University, Beijing, 100875, China
- Jun Cai
  - China National Center for Bioinformation, Beijing, China
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
10
Hu T, Allam M, Cai S, Henderson W, Yueh B, Garipcan A, Ievlev AV, Afkarian M, Beyaz S, Coskun AF. Single-cell spatial metabolomics with cell-type specific protein profiling for tissue systems biology. Nat Commun 2023; 14:8260. [PMID: 38086839 PMCID: PMC10716522 DOI: 10.1038/s41467-023-43917-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/06/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
Metabolic reprogramming in cancer and immune cells occurs to support their increasing energy needs in biological tissues. Here we propose the Single Cell Spatially resolved Metabolic (scSpaMet) framework for joint protein-metabolite profiling of single immune and cancer cells in male human tissues by incorporating untargeted spatial metabolomics and targeted multiplexed protein imaging in a single pipeline. We used scSpaMet to profile cell types and spatial metabolomic maps of 19507, 31156, and 8215 single cells in human lung cancer, tonsil, and endometrium tissues, respectively. The scSpaMet analysis revealed cell type-dependent metabolite profiles and local metabolite competition of neighboring single cells in human tissues. Deep learning-based joint embedding revealed unique metabolite states within cell types. Trajectory inference showed metabolic patterns along cell differentiation paths. Here we show scSpaMet's ability to quantify and visualize cell-type-specific and spatially resolved metabolic-protein mapping as an emerging tool for systems-level understanding of tissue biology.
Affiliation(s)
- Thomas Hu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Mayar Allam
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Shuangyi Cai
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Walter Henderson
- Institute for Electronics and Nanotechnology, Georgia Institute of Technology, Atlanta, GA, USA
| | - Brian Yueh
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Anton V Ievlev
- Oak Ridge National Laboratory, Center for Nanophase Materials Sciences, Oak Ridge, TN, USA
| | - Maryam Afkarian
- Division of Nephrology, Department of Internal Medicine, University of California, Davis, CA, USA
| | - Semir Beyaz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Ahmet F Coskun
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA.
- Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA.
- Winship Cancer Institute, Emory University, Atlanta, GA, USA.
- Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA, USA.
11
Tang W, Wen H, Liu R, Ding J, Jin W, Xie Y, Liu H, Tang J. Single-Cell Multimodal Prediction via Transformers. arXiv 2023:arXiv:2303.00233v3. [PMID: 37645040 PMCID: PMC10462176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
The recent development of multimodal single-cell technology has made it possible to acquire multiple omics data from individual cells, thereby enabling a deeper understanding of cellular states and dynamics. Nevertheless, the proliferation of multimodal single-cell data also introduces tremendous challenges in modeling the complex interactions among different modalities. Recently proposed methods focus on constructing static interaction graphs and applying graph neural networks (GNNs) to learn from multimodal data. However, such static graphs can be suboptimal as they do not take advantage of the downstream task information; meanwhile, GNNs also have inherent limitations when many layers are stacked. To tackle these issues, in this work, we investigate how to leverage transformers for multimodal single-cell data in an end-to-end manner while exploiting downstream task information. In particular, we propose scMoFormer, a framework which can readily incorporate external domain knowledge and model the interactions within each modality and across modalities. Extensive experiments demonstrate that scMoFormer achieves superior performance on various benchmark datasets. Remarkably, scMoFormer won a Kaggle silver medal with the rank of 24/1221 (Top 2%) without ensemble in a NeurIPS 2022 competition. Our implementation is publicly available on GitHub.
12
Huang L, Song M, Shen H, Hong H, Gong P, Deng HW, Zhang C. Deep Learning Methods for Omics Data Imputation. Biology 2023; 12:1313. [PMID: 37887023 PMCID: PMC10604785 DOI: 10.3390/biology12101313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 09/28/2023] [Accepted: 10/02/2023] [Indexed: 10/28/2023]
Abstract
One common problem in omics data analysis is missing values, which can arise due to various reasons, such as poor tissue quality and insufficient sample volumes. Instead of discarding missing values and related data, imputation approaches offer an alternative means of handling missing data. However, the imputation of missing omics data is a non-trivial task. Difficulties mainly come from high dimensionality, non-linear or non-monotonic relationships within features, technical variations introduced by sampling methods, sample heterogeneity, and the non-random missingness mechanism. Several advanced imputation methods, including deep learning-based methods, have been proposed to address these challenges. Due to its capability of modeling complex patterns and relationships in large and high-dimensional datasets, many researchers have adopted deep learning models to impute missing omics data. This review provides a comprehensive overview of the currently available deep learning-based methods for omics imputation from the perspective of deep generative model architectures such as autoencoder, variational autoencoder, generative adversarial networks, and Transformer, with an emphasis on multi-omics data imputation. In addition, this review also discusses the opportunities that deep learning brings and the challenges that it might face in this field.
Affiliation(s)
- Lei Huang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Meng Song
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Hui Shen
- Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180, USA
- Hong-Wen Deng
- Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
13
Zhang C, Yang Y, Tang S, Aihara K, Zhang C, Chen L. Contrastively generative self-expression model for single-cell and spatial multimodal data. Brief Bioinform 2023; 24:bbad265. [PMID: 37507114 DOI: 10.1093/bib/bbad265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 05/27/2023] [Accepted: 07/03/2023] [Indexed: 07/30/2023] Open
Abstract
Advances in single-cell multi-omics technology provide an unprecedented opportunity to fully understand cellular heterogeneity. However, integrating omics data from multiple modalities is challenging due to the individual characteristics of each measurement. Here, to solve such a problem, we propose a contrastive and generative deep self-expression model, called single-cell multimodal self-expressive integration (scMSI), which integrates the heterogeneous multimodal data into a unified manifold space. Specifically, scMSI first learns each omics-specific latent representation and self-expression relationship to consider the characteristics of different omics data by deep self-expressive generative model. Then, scMSI combines these omics-specific self-expression relations through contrastive learning. In such a way, scMSI provides a paradigm to integrate multiple omics data even with weak relation, which effectively achieves the representation learning and data integration into a unified framework. We demonstrate that scMSI provides a cohesive solution for a variety of analysis tasks, such as integration analysis, data denoising, batch correction and spatial domain detection. We have applied scMSI on various single-cell and spatial multimodal datasets to validate its high effectiveness and robustness in diverse data types and application scenarios.
Affiliation(s)
- Chengming Zhang
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
- Yiwen Yang
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Shijie Tang
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- Kazuyuki Aihara
- International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
- Chuanchao Zhang
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong 519031, China
- Luonan Chen
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong 519031, China
14
Flynn E, Almonte-Loya A, Fragiadakis GK. Single-Cell Multiomics. Annu Rev Biomed Data Sci 2023; 6:313-337. [PMID: 37159875 PMCID: PMC11146013 DOI: 10.1146/annurev-biodatasci-020422-050645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Single-cell RNA sequencing methods have led to improved understanding of the heterogeneity and transcriptomic states present in complex biological systems. Recently, the development of novel single-cell technologies for assaying additional modalities, specifically genomic, epigenomic, proteomic, and spatial data, allows for unprecedented insight into cellular biology. While certain technologies collect multiple measurements from the same cells simultaneously, even when modalities are separately assayed in different cells, we can apply novel computational methods to integrate these data. The application of computational integration methods to multimodal paired and unpaired data results in rich information about the identities of the cells present and the interactions between different levels of biology, such as between genetic variation and transcription. In this review, we both discuss the single-cell technologies for measuring these modalities and describe and characterize a variety of computational integration methods for combining the resulting data to leverage multimodal information toward greater biological insight.
Affiliation(s)
- Emily Flynn
- CoLabs, University of California, San Francisco, California, USA
- Ana Almonte-Loya
- CoLabs, University of California, San Francisco, California, USA
- Biomedical Informatics Program, University of California, San Francisco, California, USA
- Gabriela K Fragiadakis
- CoLabs, University of California, San Francisco, California, USA
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
15
Fouché A, Zinovyev A. Omics data integration in computational biology viewed through the prism of machine learning paradigms. Frontiers in Bioinformatics 2023; 3:1191961. [PMID: 37600970 PMCID: PMC10436311 DOI: 10.3389/fbinf.2023.1191961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 07/26/2023] [Indexed: 08/22/2023] Open
Abstract
Important quantities of biological data can today be acquired to characterize cell types and states, from various sources and using a wide diversity of methods, providing scientists with more and more information to answer challenging biological questions. Unfortunately, working with this amount of data comes at the price of ever-increasing data complexity. This is caused by the multiplication of data types and batch effects, which hinders the joint usage of all available data within common analyses. Data integration describes a set of tasks geared towards embedding several datasets of different origins or modalities into a joint representation that can then be used to carry out downstream analyses. In the last decade, dozens of methods have been proposed to tackle the different facets of the data integration problem, relying on various paradigms. This review introduces the most common data types encountered in computational biology and provides systematic definitions of the data integration problems. We then present how machine learning innovations were leveraged to build effective data integration algorithms, that are widely used today by computational biologists. We discuss the current state of data integration and important pitfalls to consider when working with data integration tools. We eventually detail a set of challenges the field will have to overcome in the coming years.
Affiliation(s)
- Aziz Fouché
- Institut Curie, PSL Research University, Paris, France
- Institut National de la Santé et de la Recherche Médicale, Paris, France
- CBIO-Centre for Computational Biology, ParisTech, PSL Research University, Paris, France
- Ecole Normale Supérieure Paris-Saclay, Cachan, France
16
Ashuach T, Gabitto MI, Koodli RV, Saldi GA, Jordan MI, Yosef N. MultiVI: deep generative model for the integration of multimodal data. Nat Methods 2023; 20:1222-1231. [PMID: 37386189 PMCID: PMC10406609 DOI: 10.1038/s41592-023-01909-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 11/15/2021] [Accepted: 05/10/2023] [Indexed: 07/01/2023]
Abstract
Jointly profiling the transcriptome, chromatin accessibility and other molecular properties of single cells offers a powerful way to study cellular diversity. Here we present MultiVI, a probabilistic model to analyze such multiomic data and leverage it to enhance single-modality datasets. MultiVI creates a joint representation that allows an analysis of all modalities included in the multiomic input data, even for cells for which one or more modalities are missing. It is available at scvi-tools.org.
Affiliation(s)
- Tal Ashuach
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Mariano I Gabitto
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
- Allen Institute for Brain Science, Seattle, WA, USA
- Rohan V Koodli
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Michael I Jordan
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
- Nir Yosef
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
17
Zhang J, Li J, Lin L. Statistical and machine learning methods for immunoprofiling based on single-cell data. Hum Vaccin Immunother 2023:2234792. [PMID: 37485833 PMCID: PMC10373621 DOI: 10.1080/21645515.2023.2234792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 06/30/2023] [Accepted: 07/04/2023] [Indexed: 07/25/2023] Open
Abstract
Immunoprofiling has become a crucial tool for understanding the complex interactions between the immune system and diseases or interventions, such as therapies and vaccinations. Immune response biomarkers are critical for understanding those relationships and potentially developing personalized intervention strategies. Single-cell data have emerged as a promising source for identifying immune response biomarkers. In this review, we discuss the current state-of-the-art methods for immunoprofiling, including those for reducing the dimensionality of high-dimensional single-cell data and methods for clustering, classification, and prediction. We also draw attention to recent developments in data integration.
Affiliation(s)
- Jingxuan Zhang
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Jia Li
- Department of Statistics, Pennsylvania State University, University Park, PA, USA
- Lin Lin
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
18
Wani SA, Khan SA, Quadri SMK. scJVAE: A novel method for integrative analysis of multimodal single-cell data. Comput Biol Med 2023; 158:106865. [PMID: 37030268 DOI: 10.1016/j.compbiomed.2023.106865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 02/22/2023] [Accepted: 03/30/2023] [Indexed: 04/07/2023]
Abstract
The study of cellular decision-making can be approached comprehensively using multimodal single-cell omics technology. Recent advances in multimodal single-cell technology have enabled simultaneous profiling of more than one modality from the same cell, providing more significant insights into cell characteristics. However, learning the joint representation of multimodal single-cell data is challenging due to batch effects. Here we present a novel method, scJVAE (single-cell Joint Variational AutoEncoder), for batch effect removal and joint representation of multimodal single-cell data. The scJVAE integrates and learns joint embedding of paired scRNA-seq and scATAC-seq data modalities. We evaluate and demonstrate the ability of scJVAE to remove batch effects using various datasets with paired gene expression and open chromatin. We also consider scJVAE for downstream analysis, such as lower dimensional representation, cell-type clustering, and time and memory requirement. We find scJVAE a robust and scalable method outperforming existing state-of-the-art batch effect removal and integration methods.
Affiliation(s)
- Shahid Ahmad Wani
- Department of Computer Science, Jamia Millia Islamia, New Delhi, 110025, India
- Sumeer Ahmad Khan
- Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- S M K Quadri
- Department of Computer Science, Jamia Millia Islamia, New Delhi, 110025, India
19
Lan M, Zhang S, Gao L. Efficient Generation of Paired Single-Cell Multiomics Profiles by Deep Learning. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023:e2301169. [PMID: 37114830 PMCID: PMC10375161 DOI: 10.1002/advs.202301169] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/20/2023] [Revised: 04/08/2023] [Indexed: 06/19/2023]
Abstract
Recent advances in single-cell sequencing technology have made it possible to measure multiple paired omics simultaneously in a single cell such as cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) and single-nucleus chromatin accessibility and mRNA expression sequencing (SNARE-seq). However, the widespread application of these single-cell multiomics profiling technologies has been limited by their experimental complexity, noise in nature, and high cost. In addition, single-omics sequencing technologies have generated tremendous and high-quality single-cell datasets but have yet to be fully utilized. Here, single-cell multiomics generation (scMOG), a deep learning-based framework to generate single-cell assay for transposase-accessible chromatin (ATAC) data in silico is developed from experimentally available single-cell RNA-seq measurements and vice versa. The results demonstrate that scMOG can accurately perform cross-omics generation between RNA and ATAC, and generate paired multiomics data with biological meanings when one omics is experimentally unavailable and out of training datasets. The generated ATAC, either alone or in combination with measured RNA, exhibits equivalent or superior performance to that of the experimentally measured counterparts throughout multiple downstream analyses. scMOG is also applied to human lymphoma data, which proves to be more effective in identifying tumor samples than the experimentally measured ATAC data. Finally, the performance of scMOG is investigated in other omics such as proteomics and it still shows robust performance on surface protein generation.
Affiliation(s)
- Meng Lan
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Shixiong Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
20
Shi ZD, Pang K, Wu ZX, Dong Y, Hao L, Qin JX, Wang W, Chen ZS, Han CH. Tumor cell plasticity in targeted therapy-induced resistance: mechanisms and new strategies. Signal Transduct Target Ther 2023; 8:113. [PMID: 36906600 PMCID: PMC10008648 DOI: 10.1038/s41392-023-01383-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Received: 06/30/2022] [Revised: 12/07/2022] [Accepted: 02/20/2023] [Indexed: 03/13/2023] Open
Abstract
Despite the success of targeted therapies in cancer treatment, therapy-induced resistance remains a major obstacle to a complete cure. Tumor cells evade treatments and relapse via phenotypic switching driven by intrinsic or induced cell plasticity. Several reversible mechanisms have been proposed to circumvent tumor cell plasticity, including epigenetic modifications, regulation of transcription factors, activation or suppression of key signaling pathways, as well as modification of the tumor environment. Epithelial-to-mesenchymal transition, tumor cell and cancer stem cell formation also serve as roads towards tumor cell plasticity. Corresponding treatment strategies have recently been developed that either target plasticity-related mechanisms or employ combination treatments. In this review, we delineate the formation of tumor cell plasticity and its manipulation of tumor evasion from targeted therapy. We discuss the non-genetic mechanisms of targeted drug-induced tumor cell plasticity in various types of tumors and provide insights into the contribution of tumor cell plasticity to acquired drug resistance. New therapeutic strategies such as inhibition or reversal of tumor cell plasticity are also presented. We also discuss the multitude of clinical trials that are ongoing worldwide with the intention of improving clinical outcomes. These advances provide a direction for developing novel therapeutic strategies and combination therapy regimens that target tumor cell plasticity.
Affiliation(s)
- Zhen-Duo Shi
- Department of Urology, Xuzhou Clinical School of Xuzhou Medical University, Jiangsu, China
- Department of Urology, Xuzhou Central Hospital, Xuzhou, Jiangsu, China
- School of Life Sciences, Jiangsu Normal University, Jiangsu, China
- Department of Urology, Heilongjiang Provincial Hospital, Heilongjiang, China
- Kun Pang
- Department of Urology, Xuzhou Clinical School of Xuzhou Medical University, Jiangsu, China
- Department of Urology, Xuzhou Central Hospital, Xuzhou, Jiangsu, China
- Zhuo-Xun Wu
- Department of Pharmaceutical Sciences, College of Pharmacy and Health Sciences, St. John's University, Queens, NY, 11439, USA
- Yang Dong
- Department of Urology, Xuzhou Clinical School of Xuzhou Medical University, Jiangsu, China
- Department of Urology, Xuzhou Central Hospital, Xuzhou, Jiangsu, China
- Lin Hao
- Department of Urology, Xuzhou Clinical School of Xuzhou Medical University, Jiangsu, China
- Department of Urology, Xuzhou Central Hospital, Xuzhou, Jiangsu, China
- Jia-Xin Qin
- Department of Urology, Xuzhou Clinical School of Xuzhou Medical University, Jiangsu, China
- Department of Urology, Xuzhou Central Hospital, Xuzhou, Jiangsu, China
- Wei Wang
- Department of Medical College, Southeast University, Nanjing, China
- Zhe-Sheng Chen
- Department of Pharmaceutical Sciences, College of Pharmacy and Health Sciences, St. John's University, Queens, NY, 11439, USA
- Cong-Hui Han
- Department of Urology, Xuzhou Clinical School of Xuzhou Medical University, Jiangsu, China
- Department of Urology, Xuzhou Central Hospital, Xuzhou, Jiangsu, China
- School of Life Sciences, Jiangsu Normal University, Jiangsu, China
- Department of Urology, Heilongjiang Provincial Hospital, Heilongjiang, China
21
Carrion J, Nandakumar R, Shi X, Gu H, Kim Y, Raskind WH, Peter B, Dinu V. A data-fusion approach to identifying developmental dyslexia from multi-omics datasets. bioRxiv: the preprint server for biology 2023:2023.02.27.530280. [PMID: 36909570 PMCID: PMC10002702 DOI: 10.1101/2023.02.27.530280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
This exploratory study tested and validated the use of data fusion and machine learning techniques to probe high-throughput omics and clinical data with a goal of exploring the etiology of developmental dyslexia (DD). Developmental dyslexia is the leading learning disability in school-aged children, affecting roughly 5-10% of the US population. The complex biological and neurological phenotype of this life-altering disability complicates its diagnosis. Phenome, exome, and metabolome data were collected, allowing us to fully explore this system from a behavioral, cellular, and molecular point of view. This study provides a proof of concept showing that data fusion and ensemble learning techniques can outperform traditional machine learning techniques when provided with small and complex multi-omics and clinical datasets. Heterogeneous stacking classifiers consisting of single-omic experts/models achieved an accuracy of 86%, F1 score of 0.89, and AUC value of 0.83. Ensemble methods also provided a ranked list of important features that suggests exome single nucleotide polymorphisms found in the thalamus and cerebellum could be potential biomarkers for developmental dyslexia and heavily influenced the classification of DD within our machine learning models.
Affiliation(s)
- Jackson Carrion
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
- Rohit Nandakumar
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
- Xiaojian Shi
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
- Cellular and Molecular Physiology Department, Yale School of Medicine, New Haven, CT 06510
- Haiwei Gu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
- Center for Translational Science, Florida International University, Port St. Lucie, FL 34987
- Yookyung Kim
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
- Wendy H Raskind
- Department of Medicine/Medical Genetics, University of Washington, Seattle, WA 98105
- Beate Peter
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
- Valentin Dinu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
22
Multimodal sensor fusion in the latent representation space. Sci Rep 2023; 13:2005. [PMID: 36737463 PMCID: PMC9898225 DOI: 10.1038/s41598-022-24754-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 11/21/2022] [Indexed: 02/05/2023] Open
Abstract
A new method for multimodal sensor fusion is introduced. The technique relies on a two-stage process. In the first stage, a multimodal generative model is constructed from unlabelled training data. In the second stage, the generative model serves as a reconstruction prior and the search manifold for the sensor fusion tasks. The method also handles cases where observations are accessed only via subsampling i.e. compressed sensing. We demonstrate the effectiveness and excellent performance on a range of multimodal fusion experiments such as multisensory classification, denoising, and recovery from subsampled observations.
23
Subedi S, Park YP. Single-cell pair-wise relationships untangled by composite embedding model. iScience 2023; 26:106025. [PMID: 36824286 PMCID: PMC9941206 DOI: 10.1016/j.isci.2023.106025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 11/24/2022] [Accepted: 01/17/2023] [Indexed: 01/25/2023] Open
Abstract
In multicellular organisms, cell identity and functions are primed and refined through interactions with other surrounding cells. Here, we propose a scalable machine learning method, termed SPRUCE, which is designed to systematically ascertain common cell-cell communication patterns embedded in single-cell RNA-seq data. We applied our approach to investigate tumor microenvironments consolidating multiple breast cancer datasets and found seven frequently observed interaction signatures and underlying gene-gene interaction networks. Our results imply that part of tumor heterogeneity, especially within the same subtype, is better understood by differential interaction patterns rather than the static expression of known marker genes.
Affiliation(s)
- Sishir Subedi
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
- BC Cancer Research, Part of Provincial Health Care Authority, Vancouver, BC, Canada
- Yongjin P. Park
- BC Cancer Research, Part of Provincial Health Care Authority, Vancouver, BC, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Department of Statistics, University of British Columbia, Vancouver, BC, Canada
- Corresponding author
24
Wang Y, Lian B, Zhang H, Zhong Y, He J, Wu F, Reinert K, Shang X, Yang H, Hu J. A multi-view latent variable model reveals cellular heterogeneity in complex tissues for paired multimodal single-cell data. Bioinformatics 2023; 39:btad005. [PMID: 36622018 PMCID: PMC9857983 DOI: 10.1093/bioinformatics/btad005] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/03/2022] [Revised: 12/27/2022] [Accepted: 01/06/2023] [Indexed: 01/10/2023] Open
Abstract
MOTIVATION: Single-cell multimodal assays allow us to simultaneously measure two different molecular features of the same cell, enabling new insights into cellular heterogeneity, cell development and diseases. However, most existing methods suffer from inaccurate dimensionality reduction for the joint-modality data, hindering their discovery of novel or rare cell subpopulations. RESULTS: Here, we present VIMCCA, a computational framework based on variational-assisted multi-view canonical correlation analysis to integrate paired multimodal single-cell data. Our statistical model uses a common latent variable to explain the shared source of variance in two different data modalities. Our approach jointly learns an inference model and two modality-specific non-linear models by leveraging variational inference and deep learning. We apply VIMCCA to four paired multimodal datasets sequenced by different protocols and compare it with 10 existing state-of-the-art algorithms. Results demonstrate that VIMCCA facilitates integrating various types of joint-modality data, thus leading to more reliable and accurate downstream analysis. VIMCCA improves our ability to identify novel or rare cell subtypes compared to existing widely used methods. It can also facilitate inferring cell lineage based on joint-modality profiles. AVAILABILITY AND IMPLEMENTATION: The VIMCCA algorithm has been implemented in our toolkit package scbean (≥0.5.0), and its code has been archived at https://github.com/jhu99/scbean under MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuwei Wang
- School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Bin Lian
- School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Haohui Zhang
- School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Yuanke Zhong
- School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Jie He
- Department of Biostatistics, School of Public Health, Peking University Health Science Center, Beijing 100191, China
- Fashuai Wu
- Department of Orthopaedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Knut Reinert
- Institut für Informatik, Freie Universität Berlin, 14195 Berlin, Germany
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Hui Yang
- School of Life Science, Northwestern Polytechnical University, Shaanxi 710072, China
- Jialu Hu
- School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
25
Lin X, Tian T, Wei Z, Hakonarson H. Clustering of single-cell multi-omics data with a multimodal deep learning method. Nat Commun 2022; 13:7705. [PMID: 36513636 PMCID: PMC9748135 DOI: 10.1038/s41467-022-35031-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/28/2021] [Accepted: 11/16/2022] [Indexed: 12/15/2022]
Abstract
Single-cell multimodal sequencing technologies have been developed to simultaneously profile different modalities of data in the same cell. They provide a unique opportunity to jointly analyze multimodal data at the single-cell level for the identification of distinct cell types. A correct clustering result is essential for downstream studies of complex biological function. However, combining different data sources for clustering analysis of single-cell multimodal data remains a statistical and computational challenge. Here, we develop a novel multimodal deep learning method, scMDC, for single-cell multi-omics data clustering analysis. scMDC is an end-to-end deep model that explicitly characterizes different data sources and jointly learns latent features of deep embedding for clustering analysis. Extensive simulation and real-data experiments reveal that scMDC outperforms existing single-cell single-modal and multimodal clustering methods on different single-cell multimodal datasets. The linear scalability of running time makes scMDC a promising method for analyzing large multimodal datasets.
Affiliation(s)
- Xiang Lin
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Tian Tian
- Center of Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Hakon Hakonarson
- Center of Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Division of Human Genetics, Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
26
Yuan M, Chen L, Deng M. Clustering single-cell multi-omics data with MoClust. Bioinformatics 2022; 39:6831092. [PMID: 36383167 PMCID: PMC9805570 DOI: 10.1093/bioinformatics/btac736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Revised: 11/09/2022] [Accepted: 11/14/2022] [Indexed: 11/17/2022]
Abstract
MOTIVATION: Single-cell multi-omics sequencing techniques have developed rapidly in the past few years. Clustering analysis of single-cell multi-omics data may give us novel perspectives for dissecting cellular heterogeneity. However, multi-omics data are inherently high-dimensional and highly sparse and contain doublets. Moreover, representations of different omics, even from the same cell, follow diverse distributions. Without proper distribution alignment techniques, clustering methods yield less separable clusters that are easily affected by less informative omics data. RESULTS: We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module that can identify and filter out doublets is introduced in the pretraining stage to improve data quality. Omics-specific autoencoders are introduced to characterize the multi-omics data. A contrastive-learning approach to distribution alignment is adopted to adaptively fuse omics representations into an omics-invariant representation. This novel alignment boosts the compactness and separability of clusters while accurately weighting the contribution of each omics to the clustering objective. Extensive experiments on both simulated and real multi-omics datasets demonstrated the powerful alignment, doublet detection and clustering abilities of MoClust. AVAILABILITY AND IMPLEMENTATION: An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Musu Yuan
- Center for Quantitative Biology, Peking University, Beijing 100871, China
- Liang Chen
- To whom correspondence should be addressed.
27
Zhang R, Meng-Papaxanthos L, Vert JP, Noble WS. Multimodal Single-Cell Translation and Alignment with Semi-Supervised Learning. J Comput Biol 2022; 29:1198-1212. [PMID: 36251758 PMCID: PMC9700358 DOI: 10.1089/cmb.2022.0264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Single-cell multi-omics technologies enable comprehensive interrogation of cellular regulation, yet most single-cell assays measure only one type of activity (such as transcription, chromatin accessibility, DNA methylation, or 3D chromatin architecture) for each cell. To enable a multimodal view for individual cells, we propose Polarbear, a semi-supervised machine learning framework that facilitates missing modality profile prediction and single-cell cross-modality alignment. Polarbear learns to translate between modalities by using data from co-assay measurements coupled with the large quantity of single-assay data available in public databases. This semi-supervised scheme mitigates issues related to low cell quantities and high sparsity in co-assay data. Polarbear first pre-trains a beta-variational autoencoder for each modality using both co-assay and single-assay profiles to learn robust representations of individual cells, and it then uses the co-assay labels to train a translator between these cell representations. This semi-supervised framework enables us to predict missing modality profiles and match single cells across modalities with improved accuracy compared with fully supervised methods, thus facilitating multimodal data integration.
Affiliation(s)
- Ran Zhang
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA
28
Brombacher E, Hackenberg M, Kreutz C, Binder H, Treppner M. The performance of deep generative models for learning joint embeddings of single-cell multi-omics data. Front Mol Biosci 2022; 9:962644. [PMID: 36387277 PMCID: PMC9643784 DOI: 10.3389/fmolb.2022.962644] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/06/2022] [Accepted: 10/12/2022] [Indexed: 11/07/2023]
Abstract
Recent extensions of single-cell studies to multiple data modalities raise new questions regarding experimental design. For example, the challenge of sparsity in single-omics data might be partly resolved by compensating for missing information across modalities. In particular, deep learning approaches, such as deep generative models (DGMs), can potentially uncover complex patterns via a joint embedding. Yet, this also raises the question of sample size requirements for identifying such patterns from single-cell multi-omics data. Here, we empirically examine the quality of DGM-based integrations for varying sample sizes. We first review the existing literature and give a short overview of deep learning methods for multi-omics integration. Next, we consider eight popular tools in more detail and examine their robustness to different cell numbers, covering two of the most common multi-omics types currently favored. Specifically, we use data featuring simultaneous gene expression measurements at the RNA level and protein abundance measurements for cell surface proteins (CITE-seq), as well as data where chromatin accessibility and RNA expression are measured in thousands of cells (10x Multiome). We examine the ability of the methods to learn joint embeddings based on biological and technical metrics. Finally, we provide recommendations for the design of multi-omics experiments and discuss potential future developments.
Affiliation(s)
- Eva Brombacher
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, Germany
- Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg, Germany
- Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg, Germany
- Faculty of Biology, University of Freiburg, Freiburg, Germany
- Maren Hackenberg
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, Germany
- Clemens Kreutz
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, Germany
- Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg, Germany
- Harald Binder
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, Germany
- Martin Treppner
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, Germany
29
Hernández Medina R, Kutuzova S, Nielsen KN, Johansen J, Hansen LH, Nielsen M, Rasmussen S. Machine learning and deep learning applications in microbiome research. ISME COMMUNICATIONS 2022; 2:98. [PMID: 37938690 PMCID: PMC9723725 DOI: 10.1038/s43705-022-00182-9] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 03/24/2022] [Revised: 09/12/2022] [Accepted: 09/16/2022] [Indexed: 05/27/2023]
Abstract
The many microbial communities around us form interactive and dynamic ecosystems called microbiomes. Though concealed from the naked eye, microbiomes govern and influence macroscopic systems including human health, plant resilience, and biogeochemical cycling. Such feats have attracted interest from the scientific community, which has recently turned to machine learning and deep learning methods to interrogate the microbiome and elucidate the relationships between its composition and function. Here, we provide an overview of how the latest microbiome studies harness the inductive prowess of artificial intelligence methods. We start by highlighting that microbiome data - being compositional, sparse, and high-dimensional - necessitates special treatment. We then introduce traditional and novel methods and discuss their strengths and applications. Finally, we discuss the outlook of machine and deep learning pipelines, focusing on bottlenecks and considerations to address them.
Affiliation(s)
- Ricardo Hernández Medina
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Svetlana Kutuzova
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark
- Knud Nor Nielsen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
- Joachim Johansen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Lars Hestbjerg Hansen
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
- Mads Nielsen
- Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark
- Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
30
Lynch AW, Theodoris CV, Long HW, Brown M, Liu XS, Meyer CA. MIRA: joint regulatory modeling of multimodal expression and chromatin accessibility in single cells. Nat Methods 2022; 19:1097-1108. [PMID: 36068320 PMCID: PMC9517733 DOI: 10.1038/s41592-022-01595-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/23/2021] [Accepted: 07/26/2022] [Indexed: 02/06/2023]
Abstract
Rigorously comparing gene expression and chromatin accessibility in the same single cells could illuminate the logic of how coupling or decoupling of these mechanisms regulates fate commitment. Here we present MIRA, probabilistic multimodal models for integrated regulatory analysis, a comprehensive methodology that systematically contrasts transcription and accessibility to infer the regulatory circuitry driving cells along cell state trajectories. MIRA leverages topic modeling of cell states and regulatory potential modeling of individual gene loci. MIRA thereby represents cell states in an efficient and interpretable latent space, infers high-fidelity cell state trees, determines key regulators of fate decisions at branch points and exposes the variable influence of local accessibility on transcription at distinct loci. Applied to epidermal differentiation and embryonic brain development from two different multimodal platforms, MIRA revealed that early developmental genes were tightly regulated by local chromatin landscape whereas terminal fate genes were titrated without requiring extensive chromatin remodeling.
Affiliation(s)
- Allen W Lynch
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Christina V Theodoris
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Harvard Medical School Genetics Training Program, Boston, MA, USA
- Henry W Long
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, USA
- Myles Brown
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, USA
- X Shirley Liu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Clifford A Meyer
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
31
Yuan M, Chen L, Deng M. Clustering CITE-seq data with a canonical correlation-based deep learning method. Front Genet 2022; 13:977968. [PMID: 36072672 PMCID: PMC9441595 DOI: 10.3389/fgene.2022.977968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Accepted: 07/22/2022] [Indexed: 12/03/2022]
Abstract
Single-cell multiomics sequencing techniques have developed rapidly in the past few years. Among these techniques, single-cell cellular indexing of transcriptomes and epitopes (CITE-seq) allows simultaneous quantification of gene expression and surface proteins. Clustering CITE-seq data has great potential to provide a more comprehensive and in-depth view of cell states and interactions. However, CITE-seq data inherit the properties of scRNA-seq data, being noisy, high-dimensional, and highly sparse. Moreover, representations of RNA and surface protein are sometimes weakly correlated and contribute divergently to the clustering objective. To overcome these obstacles and find a combined representation well suited for clustering, we propose scCTClust for clustering analysis of multiomics data, especially CITE-seq data. Two omics-specific neural networks are introduced to extract cluster information from omics data. A deep canonical correlation method is adopted to find the maximally correlated representations of the two omics. A novel decentralized clustering method is applied to the linear combination of the latent representations of the two omics. The fusion weights, which account for the contribution of each omics to clustering, are adaptively updated during training. Extensive experiments on both simulated and real CITE-seq data sets demonstrated the power of scCTClust. We also applied scCTClust to transcriptome–epigenome data to illustrate its potential for generalization.
Affiliation(s)
- Musu Yuan
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- *Correspondence: Musu Yuan
- Liang Chen
- Department of Probability and Statistics, School of Mathematical Sciences, Peking University, Beijing, China
- Minghua Deng
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Department of Probability and Statistics, School of Mathematical Sciences, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
32
Mora A, Rakar J, Cobeta IM, Salmani BY, Starkenberg A, Thor S, Bodén M. Variational autoencoding of gene landscapes during mouse CNS development uncovers layered roles of Polycomb Repressor Complex 2. Nucleic Acids Res 2022; 50:1280-1296. [PMID: 35048973 PMCID: PMC8860581 DOI: 10.1093/nar/gkac006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2021] [Revised: 12/22/2021] [Accepted: 01/05/2022] [Indexed: 12/13/2022]
Abstract
A prominent aspect of most, if not all, central nervous systems (CNSs) is that anterior regions (brain) are larger than posterior ones (spinal cord). Studies in Drosophila and mouse have revealed that Polycomb Repressor Complex 2 (PRC2), a protein complex responsible for applying key repressive histone modifications, acts by several mechanisms to promote anterior CNS expansion. However, it is unclear what the full spectrum of PRC2 action is during embryonic CNS development and how PRC2 intersects with the epigenetic landscape. We removed PRC2 function from the developing mouse CNS, by mutating the key gene Eed, and generated spatio-temporal transcriptomic data. To decode the role of PRC2, we developed a method that incorporates standard statistical analyses with probabilistic deep learning to integrate the transcriptomic response to PRC2 inactivation with epigenetic data. This multi-variate analysis corroborates the central involvement of PRC2 in anterior CNS expansion, and also identifies several unanticipated cohorts of genes, such as proliferation and immune response genes. Furthermore, the analysis reveals specific profiles of regulation via PRC2 upon these gene cohorts. These findings uncover a differential logic for the role of PRC2 upon functionally distinct gene cohorts that drive CNS anterior expansion. To support the analysis of emerging multi-modal datasets, we provide a novel bioinformatics package that integrates transcriptomic and epigenetic datasets to identify regulatory underpinnings of heterogeneous biological processes.
Affiliation(s)
- Ariane Mora
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
- Jonathan Rakar
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Ignacio Monedero Cobeta
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Department of Physiology, Universidad Autonoma de Madrid, Madrid, Spain
- Behzad Yaghmaeian Salmani
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Department of Cell and Molecular Biology, Karolinska Institute, SE-171 65 Stockholm, Sweden
- Annika Starkenberg
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Stefan Thor
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- School of Biomedical Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Mikael Bodén
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
33
Gong B, Zhou Y, Purdom E. Cobolt: integrative analysis of multimodal single-cell sequencing data. Genome Biol 2021; 22:351. [PMID: 34963480 PMCID: PMC8715620 DOI: 10.1186/s13059-021-02556-z] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 07/15/2021] [Accepted: 11/22/2021] [Indexed: 11/29/2022]
Abstract
A growing number of single-cell sequencing platforms enable joint profiling of multiple omics from the same cells. We present Cobolt, a novel method that not only allows for analyzing data from joint-modality platforms but also provides a coherent framework for integrating multiple datasets measured on different modalities. We demonstrate its performance on multi-modality data of gene expression and chromatin accessibility and illustrate the integration abilities of Cobolt by jointly analyzing these multi-modality data with single-cell RNA-seq and ATAC-seq datasets.
Affiliation(s)
- Boying Gong
- Division of Biostatistics, University of California, Berkeley, Berkeley, CA, USA
- Yun Zhou
- Division of Biostatistics, University of California, Berkeley, Berkeley, CA, USA
- Elizabeth Purdom
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA