1
Jeong Y, Ronen J, Kopp W, Lutsik P, Akalin A. scMaui: a widely applicable deep learning framework for single-cell multiomics integration in the presence of batch effects and missing data. BMC Bioinformatics 2024; 25:257. [PMID: 39107690 PMCID: PMC11304929 DOI: 10.1186/s12859-024-05880-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 07/23/2024] [Indexed: 08/10/2024] Open
Abstract
Recent advances in high-throughput single-cell sequencing have created an urgent demand for computational models that can address the high complexity of single-cell multiomics data. Carefully designed single-cell multiomics integration models are required to avoid biases towards a specific modality and to overcome sparsity. Batch effects that obscure biological signals must also be taken into account. Here, we introduce a new single-cell multiomics integration model, Single-cell Multiomics Autoencoder Integration (scMaui), based on variational product-of-experts autoencoders and adversarial learning. scMaui calculates a joint representation of multiple marginal distributions using a product-of-experts approach, which is especially effective when values are missing in some modalities. Furthermore, it overcomes limitations of previous VAE-based integration methods with regard to batch effect correction and the restricted set of applicable assays. It handles multiple batch effects independently, accepting both discrete and continuous values, and provides varied reconstruction loss functions to cover all possible assays and preprocessing pipelines. We demonstrate that scMaui achieves superior performance in many tasks compared to other methods. Further downstream analyses also demonstrate its potential in identifying relations between assays and discovering hidden subpopulations.
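Illustrative note: the product-of-experts idea mentioned in the abstract can be sketched for one-dimensional Gaussian experts as below. This is a minimal sketch under stated assumptions, not scMaui's implementation: the function name `product_of_experts`, the `(mean, variance)` tuple layout, the standard-normal prior expert, and the use of `None` to mark a missing modality are all hypothetical choices made for this example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: one expert = (mean, variance) of a 1-D Gaussian.
Gaussian = Tuple[float, float]


def product_of_experts(
    experts: Sequence[Optional[Gaussian]],
    prior: Gaussian = (0.0, 1.0),
) -> Gaussian:
    """Multiply 1-D Gaussian experts; None marks a missing modality."""
    # Start from the prior expert so the result stays defined even when
    # every modality is missing.
    prior_mean, prior_var = prior
    precision = 1.0 / prior_var
    weighted_mean = prior_mean / prior_var
    for expert in experts:
        if expert is None:
            continue  # a missing modality simply contributes no factor
        mean, var = expert
        precision += 1.0 / var        # precisions add under a product
        weighted_mean += mean / var   # means are precision-weighted
    joint_var = 1.0 / precision
    return weighted_mean * joint_var, joint_var


# Two modalities observed, then the same cell with the second one missing.
both = product_of_experts([(2.0, 1.0), (4.0, 0.5)])
one_missing = product_of_experts([(2.0, 1.0), None])
```

Because a product of Gaussians has a precision equal to the sum of the individual precisions, dropping a missing expert only widens the joint distribution; no imputation step is needed in this simplified setting.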
Affiliation(s)
- Yunhee Jeong
  - Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, Germany
  - Faculty of Mathematics and Informatics, Heidelberg University, Im Neuenheimer Feld 205, Heidelberg, Germany
- Jonathan Ronen
  - Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
  - Inceptive Nucleics, Inc., Palo Alto, CA, USA
- Wolfgang Kopp
  - Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
  - Roche Diagnostics GmbH, Penzberg, Germany
- Pavlo Lutsik
  - Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, Germany
  - Department of Oncology, Catholic University (KU) Leuven, Leuven, Belgium
- Altuna Akalin
  - Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
2
Curion F, Rich-Griffin C, Agarwal D, Ouologuem S, Rue-Albrecht K, May L, Garcia GEL, Heumos L, Thomas T, Lason W, Sims D, Theis FJ, Dendrou CA. Panpipes: a pipeline for multiomic single-cell and spatial transcriptomic data analysis. Genome Biol 2024; 25:181. [PMID: 38978088 PMCID: PMC11229213 DOI: 10.1186/s13059-024-03322-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 06/25/2024] [Indexed: 07/10/2024] Open
Abstract
Single-cell multiomic analysis of the epigenome, transcriptome, and proteome allows for comprehensive characterization of the molecular circuitry that underpins cell identity and state. However, the holistic interpretation of such datasets presents a challenge given a paucity of approaches for systematic, joint evaluation of different modalities. Here, we present Panpipes, a set of computational workflows designed to automate multimodal single-cell and spatial transcriptomic analyses by incorporating widely-used Python-based tools to perform quality control, preprocessing, integration, clustering, and reference mapping at scale. Panpipes allows reliable and customizable analysis and evaluation of individual and integrated modalities, thereby empowering decision-making before downstream investigations.
Affiliation(s)
- Fabiola Curion
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Charlotte Rich-Griffin
  - Nuffield Department of Medicine, Centre for Human Genetics, University of Oxford, Oxford, UK
- Devika Agarwal
  - Nuffield Department of Medicine, Centre for Human Genetics, University of Oxford, Oxford, UK
  - Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
- Sarah Ouologuem
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Germany
- Kevin Rue-Albrecht
  - MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Lilly May
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Germany
- Giulia E L Garcia
  - Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
  - Doctoral Training Centre, University of Oxford, Oxford, UK
- Lukas Heumos
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Germany
  - Comprehensive Pneumology Center With the CPC-M bioArchive, Helmholtz Zentrum Munich, Member of the German Center for Lung Research (DZL), Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Tom Thomas
  - Nuffield Department of Medicine, Centre for Human Genetics, University of Oxford, Oxford, UK
  - Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
  - Nuffield Department of Medicine, Translational Gastroenterology Unit, University of Oxford, Oxford, UK
- Wojciech Lason
  - Nuffield Department of Medicine, Respiratory Medicine Unit, Experimental Medicine Division, University of Oxford, John Radcliffe Hospital, Oxford, UK
- David Sims
  - MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Fabian J Theis
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Calliope A Dendrou
  - Nuffield Department of Medicine, Centre for Human Genetics, University of Oxford, Oxford, UK
  - Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
  - NIHR Oxford Biomedical Research Centre, Oxford, UK
3
Rivero-Garcia I, Torres M, Sánchez-Cabo F. Deep generative models in single-cell omics. Comput Biol Med 2024; 176:108561. [PMID: 38749321 DOI: 10.1016/j.compbiomed.2024.108561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 04/30/2024] [Accepted: 05/05/2024] [Indexed: 05/31/2024]
Abstract
Deep Generative Models (DGMs) are becoming instrumental for inferring the probability distributions inherent to complex processes, such as those underlying most questions in biomedical research. For many years, there was a lack of mathematical methods that would allow this inference in the scarce-data scenario of biomedical research. The advent of single-cell omics has finally squared the so-called "skinny matrix", making it possible to apply mathematical methods already used extensively in other areas. Moreover, it is now possible to integrate data at different molecular levels across thousands or even millions of samples, thanks to the number of single-cell atlases being collaboratively generated. Additionally, DGMs have proven useful in other frequent tasks in single-cell analysis pipelines, from dimensionality reduction and cell type annotation to RNA velocity inference. In spite of their promise, DGMs need to be used with caution in biomedical research, with special attention paid to using them to answer the right questions and to defining appropriate error metrics and validation checkpoints that confirm not only their correct use but also their relevance. All in all, DGMs provide an exciting tool that opens a bright future for the integrative analysis of single-cell omics to understand health and disease.
Affiliation(s)
- Inés Rivero-Garcia
  - Universidad Politécnica de Madrid, Madrid, 28040, Spain; Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain
- Miguel Torres
  - Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain
- Fátima Sánchez-Cabo
  - Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain
4
Shannon CP, Lee AH, Tebbutt SJ, Singh A. A Commentary on Multi-omics Data Integration in Systems Vaccinology. J Mol Biol 2024; 436:168522. [PMID: 38458605 DOI: 10.1016/j.jmb.2024.168522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 03/04/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Affiliation(s)
- Amy Hy Lee
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
- Scott J Tebbutt
  - PROOF Centre of Excellence, Vancouver, Canada; Department of Medicine, The University of British Columbia, Vancouver, Canada; Centre for Heart Lung Innovation, Vancouver, Canada
- Amrit Singh
  - Centre for Heart Lung Innovation, Vancouver, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, The University of British Columbia, Vancouver, Canada
5
Salimi A, Jang JH, Lee JY. Leveraging attention-enhanced variational autoencoders: Novel approach for investigating latent space of aptamer sequences. Int J Biol Macromol 2024; 255:127884. [PMID: 37926303 DOI: 10.1016/j.ijbiomac.2023.127884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 10/27/2023] [Accepted: 11/02/2023] [Indexed: 11/07/2023]
Abstract
Aptamers are increasingly recognized as potent alternatives to antibodies for diagnostic and therapeutic applications. The application of deep learning, particularly attention-based models, to aptamer (DNA/RNA) sequences is an innovative field. Ongoing advancements in aptamer sequencing technologies, coupled with machine learning algorithms, have resulted in novel developments. Further research is required to investigate the full potential of deep learning models and to address the challenges associated with sequence generation, such as the large search space of possible sequences. In this study, we propose a workflow that integrates an attention mechanism within a generative variational autoencoder framework to generate novel sequences by expanding latent memory. The generated sequences show 100% novelty compared with the dataset, and approximately 88% of them show negative values for the minimum free energy, which may indicate the likelihood of an RNA sequence folding into a functional structure. Because the field of aptamer discovery is affected by data scarcity, advanced strategies that facilitate the generation of diverse and superior sequences are needed. Our workflow can yield novel aptamers, so investigations such as the present study can help address this challenge. Our research is anticipated to facilitate further discoveries and advancements in the aptamer field.
Affiliation(s)
- Abbas Salimi
  - Department of Chemistry, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Jee Hwan Jang
  - School of Materials Science and Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea; Ucaretron Inc., No. 3508, 40, Simin-daero 365 beon-gil, Dongan-gu, Anyang-si, Gyeonggi-do, Republic of Korea
- Jin Yong Lee
  - Department of Chemistry, Sungkyunkwan University, Suwon 16419, Republic of Korea
6
Makrodimitris S, Pronk B, Abdelaal T, Reinders M. An in-depth comparison of linear and non-linear joint embedding methods for bulk and single-cell multi-omics. Brief Bioinform 2023; 25:bbad416. [PMID: 38018908 PMCID: PMC10685331 DOI: 10.1093/bib/bbad416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 10/26/2023] [Accepted: 10/30/2023] [Indexed: 11/30/2023] Open
Abstract
Multi-omic analyses are necessary to understand the complex biological processes taking place at the tissue and cell level, but also to make reliable predictions about, for example, disease outcome. Several linear methods exist that create a joint embedding using paired information per sample, but recently there has been a rise in the popularity of neural architectures that embed paired -omics into the same non-linear manifold. This work describes a head-to-head comparison of linear and non-linear joint embedding methods using both bulk and single-cell multi-modal datasets. We found that non-linear methods have a clear advantage over linear ones for missing modality imputation. Performance comparisons in the downstream tasks of survival analysis for bulk tumor data and cell type classification for single-cell data lead to the following insights. First, concatenating the principal components of each modality is a competitive baseline and hard to beat if all modalities are available at test time. However, if only one modality is available at test time, training a predictive model on the joint space of that modality can lead to performance improvements over just using the unimodal principal components. Second, -omic profiles imputed by neural joint embedding methods are realistic enough to be used by a classifier trained on real data with limited performance drops. Taken together, our comparisons give hints as to which joint embedding to use for which downstream task. Overall, product-of-experts performed well in most tasks and was reasonably fast, while early integration (concatenation) of modalities did quite poorly.
Affiliation(s)
- Stavros Makrodimitris
  - Delft Bioinformatics Lab, Delft University of Technology
  - Department of Medical Oncology, Erasmus University Medical Center
  - Department of Clinical Genetics, Erasmus University Medical Center
- Bram Pronk
  - Delft Bioinformatics Lab, Delft University of Technology
- Tamim Abdelaal
  - Delft Bioinformatics Lab, Delft University of Technology
  - Department of Radiology, Leiden University Medical Center
  - Leiden Computational Biology Center, Leiden University Medical Center
- Marcel Reinders
  - Delft Bioinformatics Lab, Delft University of Technology
  - Leiden Computational Biology Center, Leiden University Medical Center
7
Ritter U. In situ veritas: combining omics and multiplex imaging can facilitate the detection and characterization of cell-cell interactions in tissues. Front Med (Lausanne) 2023; 10:1155057. [PMID: 37332762 PMCID: PMC10270289 DOI: 10.3389/fmed.2023.1155057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 04/25/2023] [Indexed: 06/20/2023] Open
Affiliation(s)
- Uwe Ritter
  - Chair for Immunology, University of Regensburg, Regensburg, Germany
  - Department for Immunology, Leibniz Institute for Immunotherapy (LIT), Regensburg, Germany