1. Yao X, Jiang X, Luo H, Liang H, Ye X, Wei Y, Cong S. MOCAT: multi-omics integration with auxiliary classifiers enhanced autoencoder. BioData Min 2024; 17:9. [PMID: 38444019 PMCID: PMC10916109 DOI: 10.1186/s13040-024-00360-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 02/29/2024] [Indexed: 03/07/2024] Open
Abstract
BACKGROUND: Integrating multi-omics data is emerging as a critical approach in enhancing our understanding of complex diseases. Innovative computational methods capable of managing high-dimensional and heterogeneous datasets are required to unlock the full potential of such rich and diverse data.
METHODS: We propose a Multi-Omics integration framework with auxiliary Classifiers-enhanced AuToencoders (MOCAT) to utilize intra- and inter-omics information comprehensively. Additionally, attention mechanisms with confidence learning are incorporated for enhanced feature representation and trustworthy prediction.
RESULTS: Extensive experiments were conducted on four benchmark datasets to evaluate the effectiveness of our proposed model, including BRCA, ROSMAP, LGG, and KIPAN. Our model significantly improved most evaluation measurements and consistently surpassed the state-of-the-art methods. Ablation studies showed that the auxiliary classifiers significantly boosted classification accuracy in the ROSMAP and LGG datasets. Moreover, the attention mechanisms and confidence evaluation block contributed to improvements in the predictive accuracy and generalizability of our model.
CONCLUSIONS: The proposed framework exhibits superior performance in disease classification and biomarker discovery, establishing itself as a robust and versatile tool for analyzing multi-layer biological data. This study highlights the significance of elaborately designed deep learning methodologies in dissecting complex disease phenotypes and improving the accuracy of disease predictions.
Affiliation(s)
- Xiaohui Yao
  - Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Xiaohan Jiang
  - Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China
- Haoran Luo
  - Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Hong Liang
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Xiufen Ye
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Yanhui Wei
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Shan Cong
  - Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
2. Kazwini NE, Sanguinetti G. SHARE-Topic: Bayesian interpretable modeling of single-cell multi-omic data. Genome Biol 2024; 25:55. [PMID: 38395871 PMCID: PMC10885556 DOI: 10.1186/s13059-024-03180-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 01/31/2024] [Indexed: 02/25/2024] Open
Abstract
Multi-omic single-cell technologies, which simultaneously measure the transcriptional and epigenomic state of the same cell, enable understanding epigenetic mechanisms of gene regulation. However, noisy and sparse data pose fundamental statistical challenges to extract biological knowledge from complex datasets. SHARE-Topic, a Bayesian generative model of multi-omic single cell data using topic models, aims to address these challenges. SHARE-Topic identifies common patterns of co-variation between different omic layers, providing interpretable explanations for the data complexity. Tested on data from different technological platforms, SHARE-Topic provides low dimensional representations recapitulating known biology and defines associations between genes and distal regulators in individual cells.
Affiliation(s)
- Nour El Kazwini
  - Theoretical and Scientific Data Science, Scuola Internazionale Superiore di Studi Avanzati, Trieste, Italy
- Guido Sanguinetti
  - Theoretical and Scientific Data Science, Scuola Internazionale Superiore di Studi Avanzati, Trieste, Italy
3. Sun H, Qu H, Duan K, Du W. scMGCN: A Multi-View Graph Convolutional Network for Cell Type Identification in scRNA-seq Data. Int J Mol Sci 2024; 25:2234. [PMID: 38396909 PMCID: PMC10889820 DOI: 10.3390/ijms25042234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/07/2024] [Accepted: 02/09/2024] [Indexed: 02/25/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) data reveal the complexity and diversity of cellular ecosystems and molecular interactions in various biomedical research. Hence, identifying cell types from large-scale scRNA-seq data using existing annotations is challenging and requires stable and interpretable methods. However, the current cell type identification methods have limited performance, mainly due to the intrinsic heterogeneity among cell populations and extrinsic differences between datasets. Here, we present a robust graph artificial intelligence model, a multi-view graph convolutional network model (scMGCN) that integrates multiple graph structures from raw scRNA-seq data and applies graph convolutional networks with attention mechanisms to learn cell embeddings and predict cell labels. We evaluate our model on single-dataset, cross-species, and cross-platform experiments and compare it with other state-of-the-art methods. Our results show that scMGCN outperforms the other methods regarding stability, accuracy, and robustness to batch effects. Our main contributions are as follows: Firstly, we introduce multi-view learning and multiple graph construction methods to capture comprehensive cellular information from scRNA-seq data. Secondly, we construct a scMGCN that combines graph convolutional networks with attention mechanisms to extract shared, high-order information from cells. Finally, we demonstrate the effectiveness and superiority of the scMGCN on various datasets.
Affiliation(s)
- Wei Du
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; (H.S.); (H.Q.); (K.D.)
4. Chong D, Jones NC, Schittenhelm RB, Anderson A, Casillas-Espinosa PM. Multi-omics Integration and Epilepsy: Towards a Better Understanding of Biological Mechanisms. Prog Neurobiol 2023:102480. [PMID: 37286031 DOI: 10.1016/j.pneurobio.2023.102480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 06/09/2023]
Abstract
The epilepsies are a group of complex neurological disorders characterised by recurrent seizures. Approximately 30% of patients fail to respond to anti-seizure medications, despite the recent introduction of many new drugs. The molecular processes underlying epilepsy development are not well understood and this knowledge gap impedes efforts to identify effective targets and develop novel therapies against epilepsy. Omics studies allow a comprehensive characterisation of a class of molecules. Omics-based biomarkers have led to clinically validated diagnostic and prognostic tests for personalised oncology, and more recently for non-cancer diseases. We believe that, in epilepsy, the full potential of multi-omics research is yet to be realised and we envisage that this review will serve as a guide to researchers planning to undertake omics-based mechanistic studies.
Affiliation(s)
- Debbie Chong
  - Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia
- Nigel C Jones
  - Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
- Ralf B Schittenhelm
  - Monash Proteomics & Metabolomics Facility and Monash Biomedicine Discovery Institute, Monash University, Clayton, Victoria, 3800, Australia
- Alison Anderson
  - Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
- Pablo M Casillas-Espinosa
  - Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
5. Bärthel S, Falcomatà C, Rad R, Theis FJ, Saur D. Single-cell profiling to explore pancreatic cancer heterogeneity, plasticity and response to therapy. Nature Cancer 2023; 4:454-467. [PMID: 36959420 DOI: 10.1038/s43018-023-00526-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/26/2022] [Accepted: 02/08/2023] [Indexed: 03/25/2023]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer entity characterized by a heterogeneous genetic landscape and an immunosuppressive tumor microenvironment. Recent advances in high-resolution single-cell sequencing and spatial transcriptomics technologies have enabled an in-depth characterization of both malignant and host cell types and increased our understanding of the heterogeneity and plasticity of PDAC in the steady state and under therapeutic perturbation. In this Review we outline single-cell analyses in PDAC, discuss their implications on our understanding of the disease and present future perspectives of multimodal approaches to elucidate its biology and response to therapy at the single-cell level.
Affiliation(s)
- Stefanie Bärthel
  - Division of Translational Cancer Research, German Cancer Research Center and German Cancer Consortium, Heidelberg, Germany
  - Institute of Experimental Cancer Therapy, Klinikum Rechts der Isar, School of Medicine, Technische Universität München, Munich, Germany
  - Center for Translational Cancer Research (TranslaTUM), School of Medicine, Technische Universität München, Munich, Germany
- Chiara Falcomatà
  - Division of Translational Cancer Research, German Cancer Research Center and German Cancer Consortium, Heidelberg, Germany
  - Institute of Experimental Cancer Therapy, Klinikum Rechts der Isar, School of Medicine, Technische Universität München, Munich, Germany
  - Center for Translational Cancer Research (TranslaTUM), School of Medicine, Technische Universität München, Munich, Germany
  - Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Roland Rad
  - Center for Translational Cancer Research (TranslaTUM), School of Medicine, Technische Universität München, Munich, Germany
  - Institute of Molecular Oncology and Functional Genomics, School of Medicine, Technische Universität München, Munich, Germany
  - German Cancer Consortium Partner Site Munich, Munich, Germany
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany
  - School of Computation, Information and Technology (CIT), Technische Universität München, Munich, Germany
- Dieter Saur
  - Division of Translational Cancer Research, German Cancer Research Center and German Cancer Consortium, Heidelberg, Germany
  - Institute of Experimental Cancer Therapy, Klinikum Rechts der Isar, School of Medicine, Technische Universität München, Munich, Germany
  - Center for Translational Cancer Research (TranslaTUM), School of Medicine, Technische Universität München, Munich, Germany
6. Zhang W, Lin Z. iPoLNG-An unsupervised model for the integrative analysis of single-cell multiomics data. Front Genet 2023; 14:998504. [PMID: 36865385 PMCID: PMC9972291 DOI: 10.3389/fgene.2023.998504] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/20/2022] [Accepted: 01/24/2023] [Indexed: 02/09/2023] Open
Abstract
Single-cell multiomics technologies, where the transcriptomic and epigenomic profiles are simultaneously measured in the same set of single cells, pose significant challenges for effective integrative analysis. Here, we propose an unsupervised generative model, iPoLNG, for the effective and scalable integration of single-cell multiomics data. iPoLNG reconstructs low-dimensional representations of the cells and features using computationally efficient stochastic variational inference by modelling the discrete counts in single-cell multiomics data with latent factors. The low-dimensional representation of cells enables the identification of distinct cell types, and the feature by factor loading matrices help characterize cell-type specific markers and provide rich biological insights on the functional pathway enrichment analysis. iPoLNG is also able to handle the setting of partial information where certain modality of the cells is missing. Taking advantage of GPU and probabilistic programming, iPoLNG is scalable to large datasets and it takes less than 15 min to implement on datasets with 20,000 cells.
Affiliation(s)
- Wenyu Zhang
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
7. Zeng P, Ma Y, Lin Z. scAWMV: an adaptively weighted multi-view learning framework for the integrative analysis of parallel scRNA-seq and scATAC-seq data. Bioinformatics 2022; 39:6831091. [PMID: 36383176 PMCID: PMC9805575 DOI: 10.1093/bioinformatics/btac739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2022] [Revised: 10/16/2022] [Accepted: 11/15/2022] [Indexed: 11/17/2022] Open
Abstract
MOTIVATION: Technological advances have enabled us to profile single-cell multi-omics data from the same cells, providing us with an unprecedented opportunity to understand the cellular phenotype and links to its genotype. The available protocols and multi-omics datasets [including parallel single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) data profiled from the same cell] are growing rapidly. However, such data are highly sparse and tend to have a high level of noise, making data analysis challenging. Methods that integrate the multi-omics data can potentially improve the capacity to reveal cellular heterogeneity.
RESULTS: We propose an adaptively weighted multi-view learning (scAWMV) method for the integrative analysis of parallel scRNA-seq and scATAC-seq data profiled from the same cell. scAWMV considers both the difference in importance across different modalities in multi-omics data and the biological connection of the features in the scRNA-seq and scATAC-seq data. It generates biologically meaningful low-dimensional representations for the transcriptomic and epigenomic profiles via unsupervised learning. Application to four real datasets demonstrates that our framework scAWMV is an efficient method to dissect cellular heterogeneity for single-cell multi-omics data.
AVAILABILITY AND IMPLEMENTATION: The software and datasets are available at https://github.com/pengchengzeng/scAWMV.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pengcheng Zeng
  - Institute of Mathematical Sciences, ShanghaiTech University, Shanghai 201210, China
- Yuanyuan Ma
  - School of Computer and Information Engineering, Anyang Normal University, Henan 455000, China
8. Chapelle N, Fantou A, Marron T, Kenigsberg E, Merad M, Martin JC. Single-cell profiling to transform immunotherapy usage and target discovery in immune-mediated inflammatory diseases. Front Immunol 2022; 13:1006944. [DOI: 10.3389/fimmu.2022.1006944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 10/24/2022] [Indexed: 11/09/2022] Open
Abstract
Immunotherapy drugs are transforming the clinical care landscape of major human diseases from cancer, to inflammatory diseases, cardiovascular diseases, neurodegenerative diseases and even aging. In polygenic immune-mediated inflammatory diseases (IMIDs), the clinical benefits of immunotherapy have nevertheless remained limited to a subset of patients. Yet the identification of new actionable molecular candidates has remained challenging, and the use of standard of care imaging and/or histological diagnostic assays has failed to stratify potential responders from non-responders to biotherapies already available. We argue that these limitations partly stem from a poor understanding of disease pathophysiology and insufficient characterization of the roles assumed by candidate targets during disease initiation, progression and treatment. By transforming the resolution and scale of tissue cell mapping, high-resolution profiling strategies offer unprecedented opportunities to the understanding of immunopathogenic events in human IMID lesions. Here we discuss the potential for single-cell technologies to reveal relevant pathogenic cellular programs in IMIDs and to enhance patient stratification to guide biotherapy eligibility and clinical trial design.
9. Brombacher E, Hackenberg M, Kreutz C, Binder H, Treppner M. The performance of deep generative models for learning joint embeddings of single-cell multi-omics data. Front Mol Biosci 2022; 9:962644. [PMID: 36387277 PMCID: PMC9643784 DOI: 10.3389/fmolb.2022.962644] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/06/2022] [Accepted: 10/12/2022] [Indexed: 11/07/2023] Open
Abstract
Recent extensions of single-cell studies to multiple data modalities raise new questions regarding experimental design. For example, the challenge of sparsity in single-omics data might be partly resolved by compensating for missing information across modalities. In particular, deep learning approaches, such as deep generative models (DGMs), can potentially uncover complex patterns via a joint embedding. Yet, this also raises the question of sample size requirements for identifying such patterns from single-cell multi-omics data. Here, we empirically examine the quality of DGM-based integrations for varying sample sizes. We first review the existing literature and give a short overview of deep learning methods for multi-omics integration. Next, we consider eight popular tools in more detail and examine their robustness to different cell numbers, covering two of the most common multi-omics types currently favored. Specifically, we use data featuring simultaneous gene expression measurements at the RNA level and protein abundance measurements for cell surface proteins (CITE-seq), as well as data where chromatin accessibility and RNA expression are measured in thousands of cells (10x Multiome). We examine the ability of the methods to learn joint embeddings based on biological and technical metrics. Finally, we provide recommendations for the design of multi-omics experiments and discuss potential future developments.
Affiliation(s)
- Eva Brombacher
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis and Modeling University of Freiburg, Freiburg, Germany
  - Spemann Graduate School of Biology and Medicine (SGBM) University of Freiburg, Freiburg, Germany
  - Centre for Integrative Biological Signaling Studies (CIBSS) University of Freiburg, Freiburg, Germany
  - Faculty of Biology University of Freiburg, Freiburg, Germany
- Maren Hackenberg
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis and Modeling University of Freiburg, Freiburg, Germany
- Clemens Kreutz
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis and Modeling University of Freiburg, Freiburg, Germany
  - Centre for Integrative Biological Signaling Studies (CIBSS) University of Freiburg, Freiburg, Germany
- Harald Binder
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis and Modeling University of Freiburg, Freiburg, Germany
- Martin Treppner
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis and Modeling University of Freiburg, Freiburg, Germany
10. Stanojevic S, Li Y, Ristivojevic A, Garmire LX. Computational Methods for Single-cell Multi-omics Integration and Alignment. Genomics, Proteomics & Bioinformatics 2022; 20:836-849. [PMID: 36581065 PMCID: PMC10025765 DOI: 10.1016/j.gpb.2022.11.013] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/11/2022] [Revised: 08/09/2022] [Accepted: 11/04/2022] [Indexed: 12/27/2022]
Abstract
Recently developed technologies to generate single-cell genomic data have made a revolutionary impact in the field of biology. Multi-omics assays offer even greater opportunities to understand cellular states and biological processes. The problem of integrating different omics data with very different dimensionality and statistical properties remains, however, quite challenging. A growing body of computational tools is being developed for this task, leveraging ideas ranging from machine translation to the theory of networks, and represents another frontier on the interface of biology and data science. Our goal in this review is to provide a comprehensive, up-to-date survey of computational techniques for the integration of single-cell multi-omics data, while making the concepts behind each algorithm approachable to a non-expert audience.
Affiliation(s)
- Stefan Stanojevic
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yijun Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Lana X Garmire
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
11. Wang C, Fan X. Single-cell multi-omics sequencing and its applications in studying the nervous system. Biophysics Reports 2022; 8:136-149. [PMID: 37288245 PMCID: PMC10189649 DOI: 10.52601/bpr.2021.210031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2021] [Accepted: 09/04/2021] [Indexed: 11/05/2022] Open
Abstract
Single-cell sequencing has become one of the most powerful and popular techniques in dissecting molecular heterogeneity and modeling the cellular architecture of a biological system. During the past twenty years, the throughput of single-cell sequencing has increased from hundreds of cells to over tens of thousands of cells in parallel. Moreover, this technology has been developed from sequencing transcriptome to measure different omics such as DNA methylome, chromatin accessibility, and so on. Currently, multi-omics which can analyze different omics in the same cell is rapidly advancing. This work advances the study of many biosystems, including the nervous system. Here, we review current single-cell multi-omics sequencing techniques and describe how they improve our understanding of the nervous system. Finally, we discuss the open scientific questions in neural research that may be answered through further improvement of single-cell multi-omics sequencing technology.
Affiliation(s)
- Chaoyang Wang
  - Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
- Xiaoying Fan
  - Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
  - The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510700, China
12. Mircea M, Semrau S. How a cell decides its own fate: a single-cell view of molecular mechanisms and dynamics of cell-type specification. Biochem Soc Trans 2021; 49:2509-2525. [PMID: 34854897 PMCID: PMC8786291 DOI: 10.1042/bst20210135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/13/2021] [Revised: 11/06/2021] [Accepted: 11/08/2021] [Indexed: 12/13/2022]
Abstract
On its path from a fertilized egg to one of the many cell types in a multicellular organism, a cell turns the blank canvas of its early embryonic state into a molecular profile fine-tuned to achieve a vital organismal function. This remarkable transformation emerges from the interplay between dynamically changing external signals, the cell's internal, variable state, and tremendously complex molecular machinery that we are only beginning to understand. Recently developed single-cell omics techniques have started to provide an unprecedented, comprehensive view of the molecular changes during cell-type specification and promise to reveal the underlying gene regulatory mechanism. The exponentially increasing amount of quantitative molecular data being created at the moment is slated to inform predictive, mathematical models. Such models can suggest novel ways to manipulate cell types experimentally, which has important biomedical applications. This review is meant to give the reader a starting point to participate in this exciting phase of molecular developmental biology. We first introduce some of the principal molecular players involved in cell-type specification and discuss the important organizing ability of biomolecular condensates, which has been discovered recently. We then review some of the most important single-cell omics methods and relevant findings they produced. We devote special attention to the dynamics of the molecular changes and discuss methods to measure them, most importantly lineage tracing. Finally, we introduce a conceptual framework that connects all molecular agents in a mathematical model and helps us make sense of the experimental data.
Affiliation(s)
- Maria Mircea
  - Leiden Institute of Physics, Leiden University, Leiden, The Netherlands
- Stefan Semrau
  - Leiden Institute of Physics, Leiden University, Leiden, The Netherlands
13. Rautenstrauch P, Vlot AHC, Saran S, Ohler U. Intricacies of single-cell multi-omics data integration. Trends Genet 2021; 38:128-139. [PMID: 34561102 DOI: 10.1016/j.tig.2021.08.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/06/2021] [Revised: 08/20/2021] [Accepted: 08/23/2021] [Indexed: 02/06/2023]
Abstract
A wealth of single-cell protocols makes it possible to characterize different molecular layers at unprecedented resolution. Integrating the resulting multimodal single-cell data to find cell-to-cell correspondences remains a challenge. We argue that data integration needs to happen at a meaningful biological level of abstraction and that it is necessary to consider the inherent discrepancies between modalities to strike a balance between biological discovery and noise removal. A survey of current methods reveals that a distinction between technical and biological origins of presumed unwanted variation between datasets is not yet commonly considered. The increasing availability of paired multimodal data will aid the development of improved methods by providing a ground truth on cell-to-cell matches.
Affiliation(s)
- Pia Rautenstrauch
  - The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany; Department of Computer Science, Humboldt Universität zu Berlin, 10117 Berlin, Germany
- Anna Hendrika Cornelia Vlot
  - The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany; Department of Computer Science, Humboldt Universität zu Berlin, 10117 Berlin, Germany
- Sepideh Saran
  - The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany
- Uwe Ohler
  - The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany; Department of Computer Science, Humboldt Universität zu Berlin, 10117 Berlin, Germany; Department of Biology, Humboldt Universität zu Berlin, 10117 Berlin, Germany
14. Chen Y, Zhang Y, Li JYH, Ouyang Z. LISA2: Learning Complex Single-Cell Trajectory and Expression Trends. Front Genet 2021; 12:681206. [PMID: 34512717 PMCID: PMC8428276 DOI: 10.3389/fgene.2021.681206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2021] [Accepted: 06/01/2021] [Indexed: 12/20/2022] Open
Abstract
Single-cell transcriptional and epigenomics profiles have been applied in a variety of tissues and diseases for discovering new cell types, differentiation trajectories, and gene regulatory networks. Many methods such as Monocle 2/3, URD, and STREAM have been developed for tree-based trajectory building. Here, we propose a fast and flexible trajectory learning method, LISA2, for single-cell data analysis. This new method has two distinctive features: (1) LISA2 utilizes specified leaves and root to reduce the complexity for building the developmental trajectory, especially for some special cases such as rare cell populations and adjacent terminal cell states; and (2) LISA2 is applicable for both transcriptomics and epigenomics data. LISA2 visualizes complex trajectories using 3D Landmark ISOmetric feature MAPping (L-ISOMAP). We apply LISA2 to simulation and real datasets in cerebellum, diencephalon, and hematopoietic stem cells including both single-cell transcriptomics data and single-cell assay for transposase-accessible chromatin data. LISA2 is efficient in estimating single-cell trajectory and expression trends for different kinds of molecular state of cells.
Affiliation(s)
- Yang Chen
  - Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, United States
- Yuping Zhang
  - Department of Statistics, University of Connecticut, Storrs, CT, United States
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, United States
- James Y. H. Li
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, United States
  - Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut, Farmington, CT, United States
- Zhengqing Ouyang
  - Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, United States
|
15
|
Serr I, Drost F, Schubert B, Daniel C. Antigen-Specific Treg Therapy in Type 1 Diabetes - Challenges and Opportunities. Front Immunol 2021; 12:712870. [PMID: 34367177 PMCID: PMC8341764 DOI: 10.3389/fimmu.2021.712870] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/21/2021] [Accepted: 07/06/2021] [Indexed: 01/16/2023] Open
Abstract
Regulatory T cells (Tregs) are key mediators of peripheral self-tolerance and alterations in their frequencies, stability, and function have been linked to autoimmunity. The antigen-specific induction of Tregs is a long-envisioned goal for the treatment of autoimmune diseases given reduced side effects compared to general immunosuppressive therapies. However, the translation of antigen-specific Treg inducing therapies for the treatment or prevention of autoimmune diseases into the clinic remains challenging. In this mini review, we will discuss promising results for antigen-specific Treg therapies in allergy and specific challenges for such therapies in autoimmune diseases, with a focus on type 1 diabetes (T1D). We will furthermore discuss opportunities for antigen-specific Treg therapies in T1D, including combinatorial strategies and tissue-specific Treg targeting. Specifically, we will highlight recent advances in miRNA-targeting as a means to foster Tregs in autoimmunity. Additionally, we will discuss advances and perspectives of computational strategies for the detailed analysis of tissue-specific Tregs on the single-cell level.
Affiliation(s)
- Isabelle Serr
  - Group Immune Tolerance in Type 1 Diabetes, Helmholtz Diabetes Center at Helmholtz Zentrum München, Institute of Diabetes Research, Munich, Germany
  - Deutsches Zentrum für Diabetesforschung (DZD), Neuherberg, Germany
- Felix Drost
  - School of Life Sciences Weihenstephan, Technische Universität München, Garching bei München, Germany
- Benjamin Schubert
  - Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - Department of Mathematics, Technische Universität München, Garching bei München, Germany
- Carolin Daniel
  - Group Immune Tolerance in Type 1 Diabetes, Helmholtz Diabetes Center at Helmholtz Zentrum München, Institute of Diabetes Research, Munich, Germany
  - Deutsches Zentrum für Diabetesforschung (DZD), Neuherberg, Germany
  - Division of Clinical Pharmacology, Department of Medicine IV, Ludwig-Maximilians-Universität München, Munich, Germany
|
16
|
Song Q, Su J, Zhang W. scGCN is a graph convolutional networks algorithm for knowledge transfer in single cell omics. Nat Commun 2021; 12:3826. [PMID: 34158507 PMCID: PMC8219725 DOI: 10.1038/s41467-021-24172-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 09/13/2020] [Accepted: 06/07/2021] [Indexed: 12/20/2022] Open
Abstract
Single-cell omics is the fastest-growing type of genomics data in the literature and public genomics repositories. Leveraging the growing repository of labeled datasets and transferring labels from existing datasets to newly generated datasets will empower the exploration of single-cell omics data. However, the current label transfer methods have limited performance, largely due to the intrinsic heterogeneity among cell populations and extrinsic differences between datasets. Here, we present a robust graph artificial intelligence model, single-cell Graph Convolutional Network (scGCN), to achieve effective knowledge transfer across disparate datasets. Through benchmarking with other label transfer methods on a total of 30 single cell omics datasets, scGCN consistently demonstrates superior accuracy on leveraging cells from different tissues, platforms, and species, as well as cells profiled at different molecular layers. scGCN is implemented as an integrated workflow as a python software, which is available at https://github.com/QSong-github/scGCN .
Affiliation(s)
- Qianqian Song
  - Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Wake Forest Baptist Medical Center, Winston Salem, NC, USA
  - Department of Cancer Biology, Wake Forest School of Medicine, Winston Salem, NC, USA
- Jing Su
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA.
  - Section on Gerontology and Geriatric Medicine, Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA.
- Wei Zhang
  - Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Wake Forest Baptist Medical Center, Winston Salem, NC, USA.
  - Department of Cancer Biology, Wake Forest School of Medicine, Winston Salem, NC, USA.
|
17
|
Computational principles and challenges in single-cell data integration. Nat Biotechnol 2021; 39:1202-1215. [PMID: 33941931 DOI: 10.1038/s41587-021-00895-7] [Citation(s) in RCA: 158] [Impact Index Per Article: 52.7] [Received: 11/06/2020] [Accepted: 03/16/2021] [Indexed: 02/07/2023]
Abstract
The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term 'data integration' has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods.
|
18
|
Li Y, Ma L, Wu D, Chen G. Advances in bulk and single-cell multi-omics approaches for systems biology and precision medicine. Brief Bioinform 2021; 22:6189773. [PMID: 33778867 DOI: 10.1093/bib/bbab024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/11/2020] [Revised: 12/31/2020] [Accepted: 01/20/2021] [Indexed: 12/13/2022] Open
Abstract
Multi-omics allows the systematic understanding of the information flow across different omics layers, while single omics can mainly reflect one aspect of the biological system. The advancement of bulk and single-cell sequencing technologies and related computational methods for multi-omics largely facilitated the development of system biology and precision medicine. Single-cell approaches have the advantage of dissecting cellular dynamics and heterogeneity, whereas traditional bulk technologies are limited to individual/population-level investigation. In this review, we first summarize the technologies for producing bulk and single-cell multi-omics data. Then, we survey the computational approaches for integrative analysis of bulk and single-cell multimodal data, respectively. Moreover, the databases and data storage for multi-omics, as well as the tools for visualizing multimodal data are summarized. We also outline the integration between bulk and single-cell data, and discuss the applications of multi-omics in precision medicine. Finally, we present the challenges and perspectives for multi-omics development.
Affiliation(s)
- Lu Ma
  - China Normal University, China
|
19
|
Planell N, Lagani V, Sebastian-Leon P, van der Kloet F, Ewing E, Karathanasis N, Urdangarin A, Arozarena I, Jagodic M, Tsamardinos I, Tarazona S, Conesa A, Tegner J, Gomez-Cabrero D. STATegra: Multi-Omics Data Integration - A Conceptual Scheme With a Bioinformatics Pipeline. Front Genet 2021; 12:620453. [PMID: 33747045 PMCID: PMC7970106 DOI: 10.3389/fgene.2021.620453] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/22/2020] [Accepted: 01/20/2021] [Indexed: 12/13/2022] Open
Abstract
Technologies for profiling samples using different omics platforms have been at the forefront since the human genome project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. Therefore, it is an unmet need to conceptualize how to integrate such data and implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), intended to be as generic as possible for multi-omics analysis, combining available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. While we have previously combined those integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two The Cancer Genome Atlas (TCGA) case studies. For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework (beyond the individual tools) to identify features and pathways compared to single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package.
Affiliation(s)
- Nuria Planell
  - Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
- Vincenzo Lagani
  - Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia
  - Gnosis Data Analysis P.C., Heraklion, Greece
- Patricia Sebastian-Leon
  - Department of Genomic and Systems Reproductive Medicine, IVI-RMA (Instituto Valenciano de Infertilidad – Reproductive Medicine Associates) IVI Foundation, Valencia, Spain
- Frans van der Kloet
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
- Ewoud Ewing
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Nestoras Karathanasis
  - Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Greece
  - Computational Medicine Center, Thomas Jefferson University, Philadelphia, PA, United States
- Arantxa Urdangarin
  - Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
- Imanol Arozarena
  - Cancer Signalling Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), Health Research Institute of Navarre (IdiSNA), Pamplona, Spain
- Maja Jagodic
  - Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Ioannis Tsamardinos
  - Gnosis Data Analysis P.C., Heraklion, Greece
  - Computer Science Department, University of Crete, Heraklion, Greece
- Sonia Tarazona
  - Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, València, Spain
- Ana Conesa
  - Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, United States
  - Genetics Institute, University of Florida, Gainesville, FL, United States
- Jesper Tegner
  - Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
  - Science for Life Laboratory, Solna, Sweden
- David Gomez-Cabrero
  - Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
  - Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
  - Mucosal & Salivary Biology Division, King’s College London Dental Institute, London, United Kingdom
|
20
|
Martins MCM, Mafra V, Monte-Bello CC, Caldana C. The Contribution of Metabolomics to Systems Biology: Current Applications Bridging Genotype and Phenotype in Plant Science. Advances in Experimental Medicine and Biology 2021; 1346:91-105. [DOI: 10.1007/978-3-030-80352-0_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
21
|
Measuring evolutionary cancer dynamics from genome sequencing, one patient at a time. Stat Appl Genet Mol Biol 2020. [DOI: 10.1515/sagmb-2020-0075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
Abstract
Cancers progress through the accumulation of somatic mutations which accrue during tumour evolution, allowing some cells to proliferate in an uncontrolled fashion. This growth process is intimately related to latent evolutionary forces moulding the genetic and epigenetic composition of tumour subpopulations. Understanding cancer therefore requires an understanding of these selective pressures. The widespread adoption of next-generation sequencing technologies opens up the possibility of measuring molecular profiles of cancers at multiple resolutions, across one or multiple patients. In this review we discuss how cancer genome sequencing data from a single tumour can be used to understand these evolutionary forces, overviewing mathematical models and inferential methods adopted in the field of Cancer Evolution.
|
22
|
Fiorentino J, Torres-Padilla ME, Scialdone A. Measuring and Modeling Single-Cell Heterogeneity and Fate Decision in Mouse Embryos. Annu Rev Genet 2020; 54:167-187. [PMID: 32867543 DOI: 10.1146/annurev-genet-021920-110200] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022]
Abstract
Cellular heterogeneity is a property of any living system; however, its relationship with cellular fate decision remains an open question. Recent technological advances have enabled valuable insights, especially in complex systems such as the mouse embryo. In this review, we discuss recent studies that characterize cellular heterogeneity at different levels during mouse development, from the two-cell stage up to gastrulation. In addition to key experimental findings, we review mathematical modeling approaches that help researchers interpret these findings. Disentangling the role of heterogeneity in cell fate decision will likely rely on the refined integration of experiments, large-scale omics data, and mathematical modeling, complemented by the use of synthetic embryos and gastruloids as promising in vitro models.
Affiliation(s)
- Jonathan Fiorentino
  - Institute of Epigenetics and Stem Cells (IES), Helmholtz Zentrum München, D-81377 München, Germany; Institute of Functional Epigenetics (IFE) and Institute of Computational Biology (ICB), Helmholtz Zentrum München, D-85764 Neuherberg, Germany
- Maria-Elena Torres-Padilla
  - Institute of Epigenetics and Stem Cells (IES), Helmholtz Zentrum München, D-81377 München, Germany; Faculty of Biology, Ludwig-Maximilians Universität, D-82152 Planegg-Martinsried, Germany
- Antonio Scialdone
  - Institute of Epigenetics and Stem Cells (IES), Helmholtz Zentrum München, D-81377 München, Germany; Institute of Functional Epigenetics (IFE) and Institute of Computational Biology (ICB), Helmholtz Zentrum München, D-85764 Neuherberg, Germany
|
23
|
Zuo C, Chen L. Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data. Brief Bioinform 2020; 22:5985290. [PMID: 33200787 PMCID: PMC8293818 DOI: 10.1093/bib/bbaa287] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 07/13/2020] [Revised: 08/30/2020] [Accepted: 09/30/2020] [Indexed: 12/31/2022] Open
Abstract
Simultaneous profiling transcriptomic and chromatin accessibility information in the same individual cells offers an unprecedented resolution to understand cell states. However, computationally effective methods for the integration of these inherent sparse and heterogeneous data are lacking. Here, we present a single-cell multimodal variational autoencoder model, which combines three types of joint-learning strategies with a probabilistic Gaussian Mixture Model to learn the joint latent features that accurately represent these multilayer profiles. Studies on both simulated datasets and real datasets demonstrate that it has more preferable capability (i) dissecting cellular heterogeneity in the joint-learning space, (ii) denoising and imputing data and (iii) constructing the association between multilayer omics data, which can be used for understanding transcriptional regulatory mechanisms.
Affiliation(s)
- Chunman Zuo
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- Luonan Chen
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China; Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
|
24
|
Luo G, Gao Q, Zhang S, Yan B. Probing infectious disease by single-cell RNA sequencing: Progresses and perspectives. Comput Struct Biotechnol J 2020; 18:2962-2971. [PMID: 33106757 PMCID: PMC7577221 DOI: 10.1016/j.csbj.2020.10.016] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/03/2020] [Revised: 10/12/2020] [Accepted: 10/13/2020] [Indexed: 02/07/2023] Open
Abstract
The increasing application of single-cell RNA sequencing (scRNA-seq) technology in life science and biomedical research has significantly increased our understanding of the cellular heterogeneities in immunology, oncology and developmental biology. This review will summarize the development of various scRNA-seq technologies; primarily discussing the application of scRNA-seq on infectious diseases, and exploring the current development, challenges, and potential applications of scRNA-seq technology in the future.
Key Words
- 3C, Chromosome Conformation Capture
- ACE2, Angiotensin-Converting Enzyme 2
- ARDS, acute respiratory distress syndrome
- ATAC-seq, Assay for Transposase-Accessible Chromatin using sequencing
- BCR, B cell receptor
- CEL-seq, Cell Expression by Linear amplification and Sequencing
- CLU, clusterin
- COVID-19, corona virus disease 2019
- CRISPR, Clustered Regularly Interspaced Short Palindromic Repeats
- CytoSeq, gene expression cytometry
- DENV, dengue virus
- FACS, fluorescence-activated cell sorting
- GNLY, granulysin
- GO analysis, Gene Ontology analysis
- HIV, Human Immunodeficiency Virus
- IAV, Influenza A virus
- IGHV/HD/HJ/HC, Immune globulin heavy V/D/J/C/ region
- IGLV/LJ/LC, Immune globulin light V/J/C/ region
- ILC, Innate Lymphoid Cell
- Infectious diseases
- LIGER, Linked Inference of Genomics Experimental Relationships
- MAGIC, Markov Affinity-based Graph Imputation of Cells
- MARS-seq, Massively parallel single-cell RNA sequencing
- MATCHER, Manifold Alignment To CHaracterize Experimental Relationships
- MCMV, mouse cytomegalovirus
- MERFISH, Multiplexed, Error Robust Fluorescent In Situ Hybridization
- MLV, Moloney Murine Leukemia Virus
- MOFA, Multi-Omics Factor Analysis
- MOI, multiplicity of infection
- PBMCs, peripheral blood mononuclear cells
- PLAC8, placenta-associated 8
- SARS-CoV-2, severe acute respiratory syndrome coronavirus 2
- SAVER, Single-cell Analysis Via Expression Recovery
- SPLiT-seq, split pool ligation-based transcriptome sequencing
- STARTRAC, Single T-cell Analysis by RNA sequencing and TCR TRACking
- STRT-seq, Single-cell Tagged Reverse Transcription sequencing
- Single-cell RNA sequencing
- TCR, T cell receptor
- TSLP, thymic stromal lymphopoietin
- UMAP, Uniform Manifold Approximation and Projection
- UMI, Unique Molecular Identifier
- mcSCRB-seq, molecular crowding single-cell RNA barcoding and sequencing
- pDCs, plasmacytoid dendritic cells
- scRNA-seq, single cell RNA sequencing technology
- sci-RNA-seq, single-cell combinatorial indexing RNA sequencing
- seqFISH, sequential Fluorescent In Situ Hybridization
- smart-seq, switching mechanism at 5′ end of the RNA transcript sequencing
- t-SNE, t-Distributed stochastic neighbor embedding
Affiliation(s)
- Geyang Luo
  - Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
  - Shanghai Public Health Clinical Center and Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), Shanghai Medical College and School of Basic Medical Sciences, Fudan University, Shanghai, China
- Qian Gao
  - Shanghai Public Health Clinical Center and Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), Shanghai Medical College and School of Basic Medical Sciences, Fudan University, Shanghai, China
- Shuye Zhang
  - Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
- Bo Yan
  - Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
|
25
|
Song M, Greenbaum J, Luttrell J, Zhou W, Wu C, Shen H, Gong P, Zhang C, Deng HW. A Review of Integrative Imputation for Multi-Omics Datasets. Front Genet 2020; 11:570255. [PMID: 33193667 PMCID: PMC7594632 DOI: 10.3389/fgene.2020.570255] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 06/07/2020] [Accepted: 09/16/2020] [Indexed: 01/05/2023] Open
Abstract
Multi-omics studies, which explore the interactions between multiple types of biological factors, have significant advantages over single-omics analysis for their ability to provide a more holistic view of biological processes, uncover the causal and functional mechanisms for complex diseases, and facilitate new discoveries in precision medicine. However, omics datasets often contain missing values, and in multi-omics study designs it is common for individuals to be represented for some omics layers but not all. Since most statistical analyses cannot be applied directly to the incomplete datasets, imputation is typically performed to infer the missing values. Integrative imputation techniques which make use of the correlations and shared information among multi-omics datasets are expected to outperform approaches that rely on single-omics information alone, resulting in more accurate results for the subsequent downstream analyses. In this review, we provide an overview of the currently available imputation methods for handling missing values in bioinformatics data with an emphasis on multi-omics imputation. In addition, we also provide a perspective on how deep learning methods might be developed for the integrative imputation of multi-omics datasets.
Affiliation(s)
- Meng Song
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Jonathan Greenbaum
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Joseph Luttrell
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Weihua Zhou
  - College of Computing, Michigan Technological University, Houghton, MI, United States
- Chong Wu
  - Department of Statistics, Florida State University, Tallahassee, FL, United States
- Hui Shen
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Ping Gong
  - Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States
- Chaoyang Zhang
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Hong-Wen Deng
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
|
26
|
Ma A, McDermaid A, Xu J, Chang Y, Ma Q. Integrative Methods and Practical Challenges for Single-Cell Multi-omics. Trends Biotechnol 2020; 38:1007-1022. [PMID: 32818441 PMCID: PMC7442857 DOI: 10.1016/j.tibtech.2020.02.013] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Received: 12/19/2019] [Revised: 02/27/2020] [Accepted: 02/28/2020] [Indexed: 12/19/2022]
Abstract
Fast-developing single-cell multimodal omics (scMulti-omics) technologies enable the measurement of multiple modalities, such as DNA methylation, chromatin accessibility, RNA expression, protein abundance, gene perturbation, and spatial information, from the same cell. scMulti-omics can comprehensively explore and identify cell characteristics, while also presenting challenges to the development of computational methods and tools for integrative analyses. Here, we review these integrative methods and summarize the existing tools for studying a variety of scMulti-omics data. The various functionalities and practical challenges in using the available tools in the public domain are explored through several case studies. Finally, we identify remaining challenges and future trends in scMulti-omics modeling and analyses.
Affiliation(s)
- Anjun Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
- Adam McDermaid
  - Imagenetics, Sanford Health, Sioux Falls, SD 57104, USA; Department of Internal Medicine, University of South Dakota, Vermillion, SD 57069, USA
- Jennifer Xu
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Yuzhou Chang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
- Qin Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA.
|
27
|
Philpott M, Cribbs AP, Brown T, Brown T, Oppermann U. Advances and challenges in epigenomic single-cell sequencing applications. Curr Opin Chem Biol 2020; 57:17-26. [PMID: 32304986 DOI: 10.1016/j.cbpa.2020.01.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/30/2019] [Accepted: 01/22/2020] [Indexed: 12/15/2022]
Abstract
Understanding multicellular physiology and pathobiology requires analysis of the relationship between genotype, chromatin organisation and phenotype. In the multi-omics era, many methods exist to investigate biological processes across the genome, transcriptome, epigenome, proteome and metabolome. Until recently, this was only possible for populations of cells or complex tissues, creating an averaging effect that may obscure direct correlations between multiple layers of data. Single-cell sequencing methods have removed this averaging effect, but computational integration after profiling distinct modalities separately may still not completely reflect underlying biology. Multiplexed assays resolving multiple modalities in the same cell are required to overcome these shortcomings and have the potential to deliver unprecedented understanding of biology and disease.
Affiliation(s)
- Martin Philpott
  - Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, NIHR Oxford BRU, University of Oxford, OX3 7LD, UK
- Adam P Cribbs
  - Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, NIHR Oxford BRU, University of Oxford, OX3 7LD, UK
- Tom Brown
  - ATDBio, Oxford Science Park, Robert Robinson Ave, Oxford, OX4 4GA, UK
- Tom Brown
  - Department of Chemistry, University of Oxford, Oxford, OX1 3TF, UK
- Udo Oppermann
  - Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, NIHR Oxford BRU, University of Oxford, OX3 7LD, UK.
|
28
|
Abstract
The molecular mechanisms and functions in complex biological systems currently remain elusive. Recent high-throughput techniques, such as next-generation sequencing, have generated a wide variety of multiomics datasets that enable the identification of biological functions and mechanisms via multiple facets. However, integrating these large-scale multiomics data and discovering functional insights are, nevertheless, challenging tasks. To address these challenges, machine learning has been broadly applied to analyze multiomics. This review introduces multiview learning, an emerging machine learning field, and envisions its potentially powerful applications to multiomics. In particular, multiview learning is more effective than previous integrative methods for learning data's heterogeneity and revealing cross-talk patterns. Although it has been applied to various contexts, such as computer vision and speech recognition, multiview learning has not yet been widely applied to biological data, specifically multiomics data. Therefore, this paper firstly reviews recent multiview learning methods and unifies them in a framework called multiview empirical risk minimization (MV-ERM). We further discuss the potential applications of each method to multiomics, including genomics, transcriptomics, and epigenomics, in an aim to discover the functional and mechanistic interpretations across omics. Secondly, we explore possible applications to different biological systems, including human diseases (e.g., brain disorders and cancers), plants, and single-cell analysis, and discuss both the benefits and caveats of using multiview learning to discover the molecular mechanisms and functions of these systems.
Affiliation(s)
- Nam D. Nguyen
- Department of Computer Science, Stony Brook University, Stony Brook, New York, United States of America
| | - Daifeng Wang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| |
|
29
|
Jin S, Zhang L, Nie Q. scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles. Genome Biol 2020; 21:25. [PMID: 32014031 PMCID: PMC6996200 DOI: 10.1186/s13059-020-1932-8] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 09/22/2019] [Accepted: 01/10/2020] [Indexed: 02/06/2023] Open
Abstract
Simultaneous measurements of transcriptomic and epigenomic profiles in the same individual cells provide an unprecedented opportunity to understand cell fates. However, effective approaches for the integrative analysis of such data are lacking. Here, we present a single-cell aggregation and integration (scAI) method to deconvolute cellular heterogeneity from parallel transcriptomic and epigenomic profiles. Through iterative learning, scAI aggregates sparse epigenomic signals in similar cells learned in an unsupervised manner, allowing coherent fusion with transcriptomic measurements. Simulation studies and applications to three real datasets demonstrate its capability of dissecting cellular heterogeneity within both transcriptomic and epigenomic layers and understanding transcriptional regulatory mechanisms.
Affiliation(s)
- Suoqin Jin
- Department of Mathematics, University of California, Irvine, CA 92697 USA
| | - Lihua Zhang
- Department of Mathematics, University of California, Irvine, CA 92697 USA
- The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, CA 92697 USA
| | - Qing Nie
- Department of Mathematics, University of California, Irvine, CA 92697 USA
- The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, CA 92697 USA
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697 USA
| |
|
30
|
|
31
|
Nguyen ND, Blaby IK, Wang D. ManiNetCluster: a novel manifold learning approach to reveal the functional links between gene networks. BMC Genomics 2019; 20:1003. [PMID: 31888454 PMCID: PMC6936142 DOI: 10.1186/s12864-019-6329-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND The coordination of genomic functions is a critical and complex process across biological systems such as phenotypes or states (e.g., time, disease, organism, environmental perturbation). Understanding how the complexity of genomic function relates to these states remains a challenge. To address this, we have developed a novel computational method, ManiNetCluster, which simultaneously aligns and clusters gene networks (e.g., co-expression) to systematically reveal the links of genomic function between different conditions. Specifically, ManiNetCluster employs manifold learning to uncover and match local and non-linear structures among networks, and identifies cross-network functional links. RESULTS We demonstrated that ManiNetCluster better aligns the orthologous genes from their developmental expression profiles across model organisms than state-of-the-art methods (p-value < 2.2×10⁻¹⁶). This indicates the potential non-linear interactions of evolutionarily conserved genes across species in development. Furthermore, we applied ManiNetCluster to time series transcriptome data measured in the green alga Chlamydomonas reinhardtii to discover the genomic functions linking various metabolic processes between the light and dark periods of a diurnally cycling culture. We identified a number of genes putatively regulating processes across each lighting regime. CONCLUSIONS ManiNetCluster provides a novel computational tool to uncover the genes linking various functions from different networks, providing new insight on how gene functions coordinate across different conditions. ManiNetCluster is publicly available as an R package at https://github.com/daifengwanglab/ManiNetCluster.
Affiliation(s)
- Nam D Nguyen
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
| | - Ian K Blaby
- Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
- US Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, 4720, CA, USA
| | - Daifeng Wang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, 53726, WI, USA
- Waisman Center, University of Wisconsin-Madison, Madison, 53705, WI, USA
| |
|
32
|
Abstract
The recent maturation of single-cell RNA sequencing (scRNA-seq) technologies has coincided with transformative new methods to profile genetic, epigenetic, spatial, proteomic and lineage information in individual cells. This provides unique opportunities, alongside computational challenges, for integrative methods that can jointly learn across multiple types of data. Integrated analysis can discover relationships across cellular modalities, learn a holistic representation of the cell state, and enable the pooling of data sets produced across individuals and technologies. In this Review, we discuss the recent advances in the collection and integration of different data types at single-cell resolution with a focus on the integration of gene expression data with other types of single-cell measurement.
|
33
|
Zampieri G, Vijayakumar S, Yaneske E, Angione C. Machine and deep learning meet genome-scale metabolic modeling. PLoS Comput Biol 2019; 15:e1007084. [PMID: 31295267 PMCID: PMC6622478 DOI: 10.1371/journal.pcbi.1007084] [Citation(s) in RCA: 150] [Impact Index Per Article: 30.0] [Indexed: 12/25/2022] Open
Abstract
Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool to investigate large-scale relationships between genotype, phenotype, and environment. The development and application of these methodological frameworks have occurred independently for the most part, whereas the potential of their integration for biological, biomedical, and biotechnological research is less known. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We overlap systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach merging experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information in an otherwise biologically-agnostic learning process.
Affiliation(s)
- Guido Zampieri
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
| | - Supreeta Vijayakumar
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
| | - Elisabeth Yaneske
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
| | - Claudio Angione
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
- Healthcare Innovation Centre, Teesside University, Middlesbrough, United Kingdom
| |
|
34
|
Schiller HB, Montoro DT, Simon LM, Rawlins EL, Meyer KB, Strunz M, Vieira Braga FA, Timens W, Koppelman GH, Budinger GRS, Burgess JK, Waghray A, van den Berge M, Theis FJ, Regev A, Kaminski N, Rajagopal J, Teichmann SA, Misharin AV, Nawijn MC. The Human Lung Cell Atlas: A High-Resolution Reference Map of the Human Lung in Health and Disease. Am J Respir Cell Mol Biol 2019; 61:31-41. [PMID: 30995076 PMCID: PMC6604220 DOI: 10.1165/rcmb.2018-0416tr] [Citation(s) in RCA: 128] [Impact Index Per Article: 25.6] [Received: 12/21/2018] [Accepted: 04/17/2019] [Indexed: 12/13/2022] Open
Abstract
Lung disease accounts for every sixth death globally. Profiling the molecular state of all lung cell types in health and disease is currently revolutionizing the identification of disease mechanisms and will aid the design of novel diagnostic and personalized therapeutic regimens. Recent progress in high-throughput techniques for single-cell genomic and transcriptomic analyses has opened up new possibilities to study individual cells within a tissue, classify these into cell types, and characterize variations in their molecular profiles as a function of genetics, environment, cell-cell interactions, developmental processes, aging, or disease. Integration of these cell state definitions with spatial information allows the in-depth molecular description of cellular neighborhoods and tissue microenvironments, including the tissue resident structural and immune cells, the tissue matrix, and the microbiome. The Human Cell Atlas consortium aims to characterize all cells in the healthy human body and has prioritized lung tissue as one of the flagship projects. Here, we present the rationale, the approach, and the expected impact of a Human Lung Cell Atlas.
Affiliation(s)
- Herbert B. Schiller
- Helmholtz Zentrum München, Institute of Lung Biology and Disease, Group Systems Medicine of Chronic Lung Disease, Member of the German Center for Lung Research (DZL), Munich, Germany
| | - Daniel T. Montoro
- Harvard Stem Cell Institute, Cambridge, Massachusetts
- Center for Regenerative Medicine, Massachusetts General Hospital, Boston, Massachusetts
| | - Lukas M. Simon
- Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
| | - Emma L. Rawlins
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, United Kingdom
| | | | - Maximilian Strunz
- Helmholtz Zentrum München, Institute of Lung Biology and Disease, Group Systems Medicine of Chronic Lung Disease, Member of the German Center for Lung Research (DZL), Munich, Germany
| | | | - Wim Timens
- Department of Pathology and Medical Biology
- Groningen Research Institute for Asthma and COPD at the University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Gerard H. Koppelman
- Department of Pediatric Pulmonology and Pediatric Allergology, Beatrix Children’s Hospital
- Groningen Research Institute for Asthma and COPD at the University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - G. R. Scott Budinger
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Chicago, Illinois
| | - Janette K. Burgess
- Department of Pathology and Medical Biology
- Groningen Research Institute for Asthma and COPD at the University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Avinash Waghray
- Harvard Stem Cell Institute, Cambridge, Massachusetts
- Center for Regenerative Medicine, Massachusetts General Hospital, Boston, Massachusetts
| | - Maarten van den Berge
- Department of Pulmonology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Groningen Research Institute for Asthma and COPD at the University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Fabian J. Theis
- Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Department of Mathematics, Technische Universität München, Munich, Germany
| | - Aviv Regev
- Klarman Cell Observatory, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts
- Department of Biology, Howard Hughes Medical Institute and Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts
| | - Naftali Kaminski
- Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, Connecticut
| | - Jayaraj Rajagopal
- Harvard Stem Cell Institute, Cambridge, Massachusetts
- Center for Regenerative Medicine, Massachusetts General Hospital, Boston, Massachusetts
| | | | - Alexander V. Misharin
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Chicago, Illinois
| | - Martijn C. Nawijn
- Department of Pathology and Medical Biology
- Groningen Research Institute for Asthma and COPD at the University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| |
|
35
|
Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol 2019; 15:e8746. [PMID: 31217225 PMCID: PMC6582955 DOI: 10.15252/msb.20188746] [Citation(s) in RCA: 953] [Impact Index Per Article: 190.6] [Received: 11/16/2018] [Revised: 03/15/2019] [Accepted: 04/03/2019] [Indexed: 12/21/2022] Open
Abstract
Single-cell RNA-seq has enabled gene expression to be studied at an unprecedented resolution. The promise of this technology is attracting a growing user base for single-cell analysis methods. As more analysis tools are becoming available, it is becoming increasingly difficult to navigate this landscape and produce an up-to-date workflow to analyse one's data. Here, we detail the steps of a typical single-cell RNA-seq analysis, including pre-processing (quality control, normalization, data correction, feature selection, and dimensionality reduction) and cell- and gene-level downstream analysis. We formulate current best-practice recommendations for these steps based on independent comparison studies. We have integrated these best-practice recommendations into a workflow, which we apply to a public dataset to further illustrate how these steps work in practice. Our documented case study can be found at https://www.github.com/theislab/single-cell-tutorial. This review will serve as a workflow tutorial for new entrants into the field, and help established users update their analysis pipelines.
Affiliation(s)
- Malte D Luecken
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Department of Mathematics, Technische Universität München, Garching bei München, Germany
| |
|
36
|
Tritschler S, Büttner M, Fischer DS, Lange M, Bergen V, Lickert H, Theis FJ. Concepts and limitations for learning developmental trajectories from single cell genomics. Development 2019; 146. [DOI: 10.1242/dev.170506] [Citation(s) in RCA: 118] [Impact Index Per Article: 23.6] [Indexed: 08/30/2023]
Abstract
Single cell genomics has become a popular approach to uncover the cellular heterogeneity of progenitor and terminally differentiated cell types with great precision. This approach can also delineate lineage hierarchies and identify molecular programmes of cell-fate acquisition and segregation. Nowadays, tens of thousands of cells are routinely sequenced in single cell-based methods and even more are expected to be analysed in the future. However, interpretation of the resulting data is challenging and requires computational models at multiple levels of abstraction. In contrast to other applications of single cell sequencing, where clustering approaches dominate, developmental systems are generally modelled using continuous structures, trajectories and trees. These trajectory models carry the promise of elucidating mechanisms of development, disease and stimulation response at very high molecular resolution. However, their reliable analysis and biological interpretation requires an understanding of their underlying assumptions and limitations. Here, we review the basic concepts of such computational approaches and discuss the characteristics of developmental processes that can be learnt from trajectory models.
Affiliation(s)
- Sophie Tritschler
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85353 Freising, Germany
| | - Maren Büttner
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Mathematics, Technische Universität München, 85748 Garching, Germany
| | - David S. Fischer
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85353 Freising, Germany
| | - Marius Lange
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Mathematics, Technische Universität München, 85748 Garching, Germany
| | - Volker Bergen
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Mathematics, Technische Universität München, 85748 Garching, Germany
| | - Heiko Lickert
- Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- German Center for Diabetes Research, 85764 Neuherberg, Germany
- Institute of Stem Cell Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Mathematics, Technische Universität München, 85748 Garching, Germany
| |
|
37
|
Hawe JS, Theis FJ, Heinig M. Inferring Interaction Networks From Multi-Omics Data. Front Genet 2019; 10:535. [PMID: 31249591 PMCID: PMC6582773 DOI: 10.3389/fgene.2019.00535] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 04/01/2019] [Accepted: 05/16/2019] [Indexed: 01/24/2023] Open
Abstract
A major goal in systems biology is a comprehensive description of the entirety of all complex interactions between different types of biomolecules (also referred to as the interactome) and how these interactions give rise to higher, cellular and organism level functions or diseases. Numerous efforts have been undertaken to define such interactomes experimentally, for example yeast-two-hybrid based protein-protein interaction networks or ChIP-seq based protein-DNA interactions for individual proteins. To complement these direct measurements, genome-scale quantitative multi-omics data (transcriptomics, proteomics, metabolomics, etc.) enable researchers to predict novel functional interactions between molecular species. Moreover, these data make it possible to distinguish relevant functional from non-functional interactions in specific biological contexts. However, integration of multi-omics data is not straightforward due to their heterogeneity. Numerous methods for the inference of interaction networks from homogeneous functional data exist, but with the advent of large-scale paired multi-omics data a new class of methods for inferring comprehensive networks across different molecular species began to emerge. Here we review state-of-the-art techniques for inferring the topology of interaction networks from functional multi-omics data, encompassing graphical models with multiple node types and quantitative-trait-loci (QTL) based approaches. In addition, we will discuss Bayesian aspects of network inference, which allow for leveraging already established biological information such as known protein-protein or protein-DNA interactions, to guide the inference process.
Affiliation(s)
- Johann S. Hawe
- Institute of Computational Biology, HelmholtzZentrum München, Munich, Germany
- Department of Informatics, Technische Universität München, Munich, Germany
| | - Fabian J. Theis
- Institute of Computational Biology, HelmholtzZentrum München, Munich, Germany
- Department of Mathematics, Technische Universität München, Munich, Germany
| | - Matthias Heinig
- Institute of Computational Biology, HelmholtzZentrum München, Munich, Germany
- Department of Informatics, Technische Universität München, Munich, Germany
| |
|
38
|
Multi-omics at single-cell resolution: comparison of experimental and data fusion approaches. Curr Opin Biotechnol 2018; 55:159-166. [PMID: 30368064 DOI: 10.1016/j.copbio.2018.09.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/01/2018] [Revised: 09/21/2018] [Accepted: 09/27/2018] [Indexed: 12/22/2022]
Abstract
Biological samples are inherently heterogeneous and complex. Tackling this complexity requires innovative technological and analytical solutions. Recent advances in high-throughput single-cell isolation and nucleic acid barcoding methods are rapidly changing the technological landscape of biological sciences and now make it possible to measure the (epi)genomic, transcriptomic, or proteomic state of individual cells. In addition, a few experimental approaches enable multi-omics measurements of the same cell. However, merging omics data collected from different experiments remains a considerable challenge. Although several strategies for merging transcriptomics datasets have recently been introduced, cell-to-cell variability and heterogeneity remain among the confounding factors limiting data fusion and integration. Here, we focus our discussion on the latest single-cell technological and analytical solutions to achieve high data dimensionality and resolution. Obtaining datasets with a wealth of multi-omics information will undoubtedly provide new avenues for researchers to unravel the complexity of biological samples encountered in modern biological research and molecular diagnostics.
|
39
|
Finotello F, Eduati F. Multi-Omics Profiling of the Tumor Microenvironment: Paving the Way to Precision Immuno-Oncology. Front Oncol 2018; 8:430. [PMID: 30345255 PMCID: PMC6182075 DOI: 10.3389/fonc.2018.00430] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 07/11/2018] [Accepted: 09/13/2018] [Indexed: 12/20/2022] Open
Abstract
The tumor microenvironment (TME) is a multifaceted ecosystem characterized by profound cellular heterogeneity, dynamicity, and complex intercellular cross-talk. The striking responses obtained with immune checkpoint blockers, i.e., antibodies targeting immune-cell regulators to boost antitumor immunity, have demonstrated the enormous potential of anticancer treatments that target TME components other than tumor cells. However, as checkpoint blockade is currently beneficial only to a limited fraction of patients, there is an urgent need to understand the mechanisms orchestrating the immune response in the TME to guide the rational design of more effective anticancer therapies. In this Mini Review, we give an overview of the methodologies that allow studying the heterogeneity of the TME from multi-omics data generated from bulk samples, single cells, or images of tumor-tissue slides. These include approaches for the characterization of the different cell phenotypes and for the reconstruction of their spatial organization and inter-cellular cross-talk. We discuss how this broader vision of the cellular heterogeneity and plasticity of tumors, which is emerging thanks to these methodologies, offers the opportunity to rationally design precision immuno-oncology treatments. These developments are fundamental to overcome the current limitations of targeted agents and checkpoint blockers and to bring long-term clinical benefits to a larger fraction of cancer patients.
Affiliation(s)
- Francesca Finotello
- Biocenter, Division for Bioinformatics, Medical University of Innsbruck, Innsbruck, Austria
| | - Federica Eduati
- Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
| |
|
40
|
Argelaguet R, Velten B, Arnol D, Dietrich S, Zenz T, Marioni JC, Buettner F, Huber W, Stegle O. Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol 2018; 14:e8124. [PMID: 29925568 PMCID: PMC6010767 DOI: 10.15252/msb.20178124] [Citation(s) in RCA: 488] [Impact Index Per Article: 81.3] [Received: 11/27/2017] [Revised: 05/28/2018] [Accepted: 05/29/2018] [Indexed: 12/19/2022] Open
Abstract
Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
Affiliation(s)
- Ricard Argelaguet
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
| | - Britta Velten
- European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Damien Arnol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
| | | | - Thorsten Zenz
- Heidelberg University Hospital, Heidelberg, Germany
- German Cancer Research Center (dkfz) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Germany & Hematology, University Hospital Zurich and University of Zurich, Zurich, Switzerland
| | - John C Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
| | - Florian Buettner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
| | - Wolfgang Huber
- European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| |
|