51
Curry AR, Ooi L, Matosin N. How spatial omics approaches can be used to map the biological impacts of stress in psychiatric disorders: a perspective, overview and technical guide. Stress 2024; 27:2351394. [PMID: 38752853 DOI: 10.1080/10253890.2024.2351394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 04/29/2024] [Indexed: 05/21/2024] Open
Abstract
Exposure to significant levels of stress and trauma throughout life is a leading risk factor for the development of major psychiatric disorders. Despite this, we do not have a comprehensive understanding of the mechanisms that explain how stress raises psychiatric disorder risk. Stress in humans is complex and produces variable molecular outcomes depending on the stress type, timing, and duration. Deciphering how stress increases disorder risk has consequently been challenging to address with the traditional single-target experimental approaches primarily utilized to date. Importantly, the molecular processes that occur following stress are not fully understood but are needed to find novel treatment targets. Sequencing-based omics technologies, allowing for an unbiased investigation of physiological changes induced by stress, are rapidly accelerating our knowledge of the molecular sequelae of stress at a single-cell resolution. Spatial multi-omics technologies are now also emerging, allowing for simultaneous analysis of functional molecular layers, from epigenome to proteome, with anatomical context. The technology has immense potential to transform our understanding of how disorders develop, which we believe will significantly propel our understanding of how specific risk factors, such as stress, contribute to disease course. Here, we provide our perspective on how we believe these technologies will transform our understanding of the neurobiology of stress, and also provide a technical guide to assist molecular psychiatry and stress researchers who wish to implement spatial omics approaches in their own research. Finally, we identify potential future directions using multi-omics technology in stress research.
Affiliation(s)
- Amber R Curry
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW, Australia
  - Molecular Horizons, School of Chemistry and Molecular Bioscience, Faculty of Science Medicine and Health, University of Wollongong, Wollongong, NSW, Australia
- Lezanne Ooi
  - Molecular Horizons, School of Chemistry and Molecular Bioscience, Faculty of Science Medicine and Health, University of Wollongong, Wollongong, NSW, Australia
- Natalie Matosin
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW, Australia
  - Molecular Horizons, School of Chemistry and Molecular Bioscience, Faculty of Science Medicine and Health, University of Wollongong, Wollongong, NSW, Australia
52
Bergman DR, Norton KA, Jain HV, Jackson T. Connecting Agent-Based Models with High-Dimensional Parameter Spaces to Multidimensional Data Using SMoRe ParS: A Surrogate Modeling Approach. Bull Math Biol 2023; 86:11. [PMID: 38159216 PMCID: PMC10757706 DOI: 10.1007/s11538-023-01240-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 11/22/2023] [Indexed: 01/03/2024]
Abstract
Across a broad range of disciplines, agent-based models (ABMs) are increasingly utilized for replicating, predicting, and understanding complex systems and their emergent behavior. In the biological and biomedical sciences, researchers employ ABMs to elucidate complex cellular and molecular interactions across multiple scales under varying conditions. Data generated at these multiple scales, however, presents a computational challenge for robust analysis with ABMs. Indeed, calibrating ABMs remains an open topic of research due to their own high-dimensional parameter spaces. In response to these challenges, we extend and validate our novel methodology, Surrogate Modeling for Reconstructing Parameter Surfaces (SMoRe ParS), arriving at a computationally efficient framework for connecting high dimensional ABM parameter spaces with multidimensional data. Specifically, we modify SMoRe ParS to initially confine high dimensional ABM parameter spaces using unidimensional data, namely, single time-course information of in vitro cancer cell growth assays. Subsequently, we broaden the scope of our approach to encompass more complex ABMs and constrain parameter spaces using multidimensional data. We explore this extension with in vitro cancer cell inhibition assays involving the chemotherapeutic agent oxaliplatin. For each scenario, we validate and evaluate the effectiveness of our approach by comparing how well ABM simulations match the experimental data when using SMoRe ParS-inferred parameters versus parameters inferred by a commonly used direct method. In so doing, we show that our approach of using an explicitly formulated surrogate model as an interlocutor between the ABM and the experimental data effectively calibrates the ABM parameter space to multidimensional data. Our method thus provides a robust and scalable strategy for leveraging multidimensional data to inform multiscale ABMs and explore the uncertainty in their parameters.
Affiliation(s)
- Daniel R Bergman
  - Department of Mathematics, University of Michigan, 530 Church Street, Ann Arbor, MI, 48109, USA
- Kerri-Ann Norton
  - Computational Biology Laboratory, Computer Science Program, Bard College, 30 Campus Road, Annandale-on-Hudson, NY, 12504, USA
- Harsh Vardhan Jain
  - Department of Mathematics & Statistics, University of Minnesota Duluth, 1117 University Drive, Duluth, MN, 55812, USA
- Trachette Jackson
  - Department of Mathematics, University of Michigan, 530 Church Street, Ann Arbor, MI, 48109, USA
53
Lee AS, Ayers LJ, Kosicki M, Chan WM, Fozo LN, Pratt BM, Collins TE, Zhao B, Rose MF, Sanchis-Juan A, Fu JM, Wong I, Zhao X, Tenney AP, Lee C, Laricchia KM, Barry BJ, Bradford VR, Lek M, MacArthur DG, Lee EA, Talkowski ME, Brand H, Pennacchio LA, Engle EC. A cell type-aware framework for nominating non-coding variants in Mendelian regulatory disorders. medRxiv 2023:2023.12.22.23300468. [PMID: 38234731 PMCID: PMC10793524 DOI: 10.1101/2023.12.22.23300468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Unsolved Mendelian cases often lack obvious pathogenic coding variants, suggesting potential non-coding etiologies. Here, we present a single cell multi-omic framework integrating embryonic mouse chromatin accessibility, histone modification, and gene expression assays to discover cranial motor neuron (cMN) cis-regulatory elements and subsequently nominate candidate non-coding variants in the congenital cranial dysinnervation disorders (CCDDs), a set of Mendelian disorders altering cMN development. We generated single cell epigenomic profiles for ~86,000 cMNs and related cell types, identifying ~250,000 accessible regulatory elements with cognate gene predictions for ~145,000 putative enhancers. Seventy-five percent of elements (44 of 59) validated in an in vivo transgenic reporter assay, demonstrating that single cell accessibility is a strong predictor of enhancer activity. Applying our cMN atlas to 899 whole genome sequences from 270 genetically unsolved CCDD pedigrees, we achieved significant reduction in our variant search space and nominated candidate variants predicted to regulate known CCDD disease genes MAFB, PHOX2A, CHN1, and EBF3 - as well as new candidates in recurrently mutated enhancers through peak- and gene-centric allelic aggregation. This work provides novel non-coding variant discoveries of relevance to CCDDs and a generalizable framework for nominating non-coding variants of potentially high functional impact in other Mendelian disorders.
Affiliation(s)
- Arthur S. Lee
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
  - Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Lauren J. Ayers
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Michael Kosicki
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA
- Wai-Man Chan
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Howard Hughes Medical Institute, Chevy Chase, MD
- Lydia N. Fozo
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Brandon M. Pratt
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Thomas E. Collins
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Boxun Zhao
  - Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Matthew F. Rose
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Department of Pathology, Boston Children's Hospital, Boston, MA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA
  - Medical Genetics Training Program, Harvard Medical School, Boston, MA
- Alba Sanchis-Juan
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Jack M. Fu
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Isaac Wong
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Xuefang Zhao
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Alan P. Tenney
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Cassia Lee
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Harvard College, Cambridge, MA
- Kristen M. Laricchia
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Brenda J. Barry
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Howard Hughes Medical Institute, Chevy Chase, MD
- Victoria R. Bradford
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Monkol Lek
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Daniel G. MacArthur
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, NSW, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, VIC, Australia
- Eunjung Alice Lee
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
  - Department of Genetics, Harvard Medical School, Boston, MA
- Michael E. Talkowski
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Harrison Brand
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
  - Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, MA
- Len A. Pennacchio
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA
- Elizabeth C. Engle
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
  - Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Howard Hughes Medical Institute, Chevy Chase, MD
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
  - Medical Genetics Training Program, Harvard Medical School, Boston, MA
  - Department of Ophthalmology, Boston Children's Hospital and Harvard Medical School, Boston, MA
54
Guo ZH, Wu Y, Wang S, Zhang Q, Shi JM, Wang YB, Chen ZH. scInterpreter: a knowledge-regularized generative model for interpretably integrating scRNA-seq data. BMC Bioinformatics 2023; 24:481. [PMID: 38104057 PMCID: PMC10724984 DOI: 10.1186/s12859-023-05579-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 11/23/2023] [Indexed: 12/19/2023] Open
Abstract
BACKGROUND: The rapid emergence of single-cell RNA-seq (scRNA-seq) data presents remarkable opportunities for broad investigations through integration analyses. However, most integration models are black boxes that lack interpretability or are hard to train. RESULTS: To address the above issues, we propose scInterpreter, a deep learning-based interpretable model. scInterpreter substantially outperforms other state-of-the-art (SOTA) models in multiple benchmark datasets. In addition, scInterpreter is extensible and can integrate and annotate atlas scRNA-seq data. We evaluated the robustness of scInterpreter in a variety of situations. Through comparison experiments, we found that with a knowledge prior, the training process can be significantly accelerated. Finally, we conducted interpretability analysis for each dimension (pathway) of cell representation in the embedding space. CONCLUSIONS: The results showed that the cell representations obtained by scInterpreter are full of biological significance. Through weight sorting, we found several new genes related to pathways in the PBMC dataset. In general, scInterpreter is an effective and interpretable integration tool. It is expected that scInterpreter will bring great convenience to the study of single-cell transcriptomics.
Affiliation(s)
- Zhen-Hao Guo
  - College of Electronics and Information Engineering, Tongji University, Shanghai, 200000, China
  - Department of Clinical Anesthesiology, Faculty of Anesthesiology, Second Military Medical University / Naval Medical University, Shanghai, 200433, China
- Yan Wu
  - College of Electronics and Information Engineering, Tongji University, Shanghai, 200000, China
- Siguo Wang
  - EIT Institute for Advanced Study, Ningbo, 315201, Zhejiang, China
- Qinhu Zhang
  - EIT Institute for Advanced Study, Ningbo, 315201, Zhejiang, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning, 530007, China
- Jin-Ming Shi
  - Department of Endocrinology, Aviation General Hospital, Beijing, 100000, China
- Yan-Bin Wang
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, Zhejiang, China
- Zhan-Heng Chen
  - Department of Clinical Anesthesiology, Faculty of Anesthesiology, Second Military Medical University / Naval Medical University, Shanghai, 200433, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning, 530007, China
55
Shree A, Pavan MK, Zafar H. scDREAMER for atlas-level integration of single-cell datasets using deep generative model paired with adversarial classifier. Nat Commun 2023; 14:7781. [PMID: 38012145 PMCID: PMC10682386 DOI: 10.1038/s41467-023-43590-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Accepted: 11/14/2023] [Indexed: 11/29/2023] Open
Abstract
Integration of heterogeneous single-cell sequencing datasets generated across multiple tissue locations, time, and conditions is essential for a comprehensive understanding of the cellular states and expression programs underlying complex biological systems. Here, we present scDREAMER (https://github.com/Zafar-Lab/scDREAMER), a data-integration framework that employs deep generative models and adversarial training for both unsupervised and supervised (scDREAMER-Sup) integration of multiple batches. Using six real benchmarking datasets, we demonstrate that scDREAMER can overcome critical challenges including skewed cell type distribution among batches, nested batch-effects, large numbers of batches and conservation of development trajectory across batches. Our experiments also show that scDREAMER and scDREAMER-Sup outperform state-of-the-art unsupervised and supervised integration methods respectively in batch-correction and conservation of biological variation. Using a dataset of 1 million cells, we demonstrate that scDREAMER is scalable and can perform atlas-level cross-species (e.g., human and mouse) integration while being faster than other deep-learning-based methods.
Affiliation(s)
- Ajita Shree
  - Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India
- Musale Krushna Pavan
  - Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India
- Hamim Zafar
  - Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India
  - Department of Biological Sciences and Bioengineering, Indian Institute of Technology Kanpur, Kanpur, India
  - Mehta Family Centre for Engineering in Medicine, Indian Institute of Technology Kanpur, Kanpur, India
56
Nazaret A, Fan JL, Lavallée VP, Cornish AE, Kiseliovas V, Masilionis I, Chun J, Bowman RL, Eisman SE, Wang J, Shi L, Levine RL, Mazutis L, Blei D, Pe'er D, Azizi E. Deep generative model deciphers derailed trajectories in acute myeloid leukemia. bioRxiv 2023:2023.11.11.566719. [PMID: 38014231 PMCID: PMC10680623 DOI: 10.1101/2023.11.11.566719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Single-cell genomics has the potential to map cell states and their dynamics in an unbiased way in response to perturbations like disease. However, elucidating the cell-state transitions from healthy to disease requires analyzing data from perturbed samples jointly with unperturbed reference samples. Existing methods for integrating and jointly visualizing single-cell datasets from distinct contexts tend to remove key biological differences or do not correctly harmonize shared mechanisms. We present Decipher, a model that combines variational autoencoders with deep exponential families to reconstruct derailed trajectories ( https://github.com/azizilab/decipher ). Decipher jointly represents normal and perturbed single-cell RNA-seq datasets, revealing shared and disrupted dynamics. It further introduces a novel approach to visualize data, without the need for methods such as UMAP or TSNE. We demonstrate Decipher on data from acute myeloid leukemia patient bone marrow specimens, showing that it successfully characterizes the divergence from normal hematopoiesis and identifies transcriptional programs that become disrupted in each patient when they acquire NPM1 driver mutations.
57
Alexandrov T, Saez‐Rodriguez J, Saka SK. Enablers and challenges of spatial omics, a melting pot of technologies. Mol Syst Biol 2023; 19:e10571. [PMID: 37842805 PMCID: PMC10632737 DOI: 10.15252/msb.202110571] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/16/2021] [Revised: 07/31/2023] [Accepted: 08/03/2023] [Indexed: 10/17/2023] Open
Abstract
Spatial omics has emerged as a rapidly growing and fruitful field with hundreds of publications presenting novel methods for obtaining spatially resolved information for any omics data type on spatial scales ranging from subcellular to organismal. From a technology development perspective, spatial omics is a highly interdisciplinary field that integrates imaging and omics, spatial and molecular analyses, sequencing and mass spectrometry, and image analysis and bioinformatics. The emergence of this field has not only opened a window into spatial biology, but also created multiple novel opportunities, questions, and challenges for method developers. Here, we provide the perspective of technology developers on what makes the spatial omics field unique. After providing a brief overview of the state of the art, we discuss technological enablers and challenges and present our vision about the future applications and impact of this melting pot.
Affiliation(s)
- Theodore Alexandrov
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - BioInnovation Institute, Copenhagen, Denmark
- Julio Saez‐Rodriguez
  - Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Sinem K Saka
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
58
Yampolskaya M, Herriges MJ, Ikonomou L, Kotton DN, Mehta P. scTOP: physics-inspired order parameters for cellular identification and visualization. Development 2023; 150:dev201873. [PMID: 37756586 PMCID: PMC10629677 DOI: 10.1242/dev.201873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
Advances in single-cell RNA sequencing provide an unprecedented window into cellular identity. The abundance of data requires new theoretical and computational frameworks to analyze the dynamics of differentiation and integrate knowledge from cell atlases. We present 'single-cell Type Order Parameters' (scTOP): a statistical, physics-inspired approach for quantifying cell identity given a reference basis of cell types. scTOP can accurately classify cells, visualize developmental trajectories and assess the fidelity of engineered cells. Importantly, scTOP does this without feature selection, statistical fitting or dimensional reduction (e.g. uniform manifold approximation and projection, principal component analysis, etc.). We illustrate the power of scTOP using human and mouse datasets. By reanalyzing mouse lung data, we characterize a transient hybrid alveolar type 1/alveolar type 2 cell population. Visualizations of lineage tracing hematopoiesis data using scTOP confirm that a single clone can give rise to multiple mature cell types. We assess the transcriptional similarity between endogenous and donor-derived cells in the context of murine pulmonary cell transplantation. Our results suggest that physics-inspired order parameters can be an important tool for understanding differentiation and characterizing engineered cells. scTOP is available as an easy-to-use Python package.
Affiliation(s)
- Michael J. Herriges
  - Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
  - The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Laertis Ikonomou
  - Department of Oral Biology, University at Buffalo, The State University of New York, Buffalo, NY 14215, USA
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University at Buffalo, The State University of New York, Buffalo, NY 14215, USA
- Darrell N. Kotton
  - Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
  - The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Pankaj Mehta
  - Department of Physics, Boston University, Boston, MA 02215, USA
  - Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
  - Faculty of Computing and Data Science, Boston University, Boston, MA 02215, USA
  - Biological Design Center, Boston University, Boston, MA 02215, USA
59
Qian W, Yang Z. Identification of cell-type-specific genes in multimodal single-cell data using deep neural network algorithm. Comput Biol Med 2023; 166:107498. [PMID: 37738895 DOI: 10.1016/j.compbiomed.2023.107498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 08/15/2023] [Accepted: 09/15/2023] [Indexed: 09/24/2023]
Abstract
The emergence of single-cell RNA sequencing (scRNA-seq) technology makes it possible to measure DNA, RNA, and protein in a single cell. Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) is a powerful multimodal single-cell research innovation, allowing researchers to capture RNA and surface protein expression on the same cells. Currently, identification of cell-type-specific genes in CITE-seq data is still challenging. In this study, we obtained a set of CITE-seq datasets from the Kaggle database, which included the sequencing dataset of seven cell types during bone marrow stem cell differentiation. We used Student's t-test to analyze these transcribed RNAs and picked out 133 significantly differentially expressed genes (DEGs) among all cell types. Functional enrichment revealed that these DEGs were strongly associated with blood-related diseases, providing important insights into the cellular heterogeneity within bone marrow stem cells. The relation between RNA and protein levels was modeled with a deep neural network (DNN), which achieved a high prediction score of 0.867. Based on their coefficients in the DNN model, three genes (LGALS1, CENPV, TRIM24) were identified as cell-type-specific genes in erythrocyte progenitors. Our work provides a novel perspective on the differentiation of stem cells in the bone marrow and valuable insights for further research in this field.
Affiliation(s)
- Weiye Qian
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, PR China
- Zhiyuan Yang
  - School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, PR China
60
De Donno C, Hediyeh-Zadeh S, Moinfar AA, Wagenstetter M, Zappia L, Lotfollahi M, Theis FJ. Population-level integration of single-cell datasets enables multi-scale analysis across samples. Nat Methods 2023; 20:1683-1692. [PMID: 37813989 PMCID: PMC10630133 DOI: 10.1038/s41592-023-02035-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/08/2022] [Accepted: 09/05/2023] [Indexed: 10/11/2023]
Abstract
The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli on population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variations using sample embeddings revealing genes associated with batch effects and biological effects. scPoli is further applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration facilitating atlas use but also interpretation by means of multi-scale analyses.
Affiliation(s)
- Carlo De Donno
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Amir Ali Moinfar
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
- Marco Wagenstetter
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Luke Zappia
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
- Mohammad Lotfollahi
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
  - School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
61
Patruno L, Milite S, Bergamin R, Calonaci N, D’Onofrio A, Anselmi F, Antoniotti M, Graudenzi A, Caravagna G. A Bayesian method to infer copy number clones from single-cell RNA and ATAC sequencing. PLoS Comput Biol 2023; 19:e1011557. [PMID: 37917660 PMCID: PMC10645363 DOI: 10.1371/journal.pcbi.1011557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 11/14/2023] [Accepted: 09/30/2023] [Indexed: 11/04/2023] Open
Abstract
Single-cell RNA and ATAC sequencing technologies enable the examination of gene expression and chromatin accessibility in individual cells, providing insights into cellular phenotypes. In cancer research, it is important to consistently analyze these states within an evolutionary context on genetic clones. Here we present CONGAS+, a Bayesian model to map single-cell RNA and ATAC profiles onto the latent space of copy number clones. CONGAS+ clusters cells into tumour subclones with similar ploidy, making it straightforward to compare their expression and chromatin profiles. The framework, implemented on GPU and tested on real and simulated data, scales seamlessly to the analysis of thousands of cells, demonstrating better performance than single-molecule models, and supporting new multi-omics assays. In prostate cancer, lymphoma and basal cell carcinoma, CONGAS+ successfully identifies complex subclonal architectures while providing a coherent mapping between ATAC and RNA, facilitating the study of genotype-phenotype maps and their connection to genomic instability.
Affiliation(s)
- Lucrezia Patruno
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Salvatore Milite
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
- Centre for Computational Biology, Human Technopole, Milan, Italy
| | - Riccardo Bergamin
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Nicola Calonaci
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Alberto D’Onofrio
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Fabio Anselmi
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| | - Marco Antoniotti
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- B4—Bicocca Bioinformatics Biostatistics and Bioimaging Centre, Università degli Studi di Milano-Bicocca, Milan, Italy
| | - Alex Graudenzi
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- B4—Bicocca Bioinformatics Biostatistics and Bioimaging Centre, Università degli Studi di Milano-Bicocca, Milan, Italy
| | - Giulio Caravagna
- Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
| |
|
62
|
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 66] [Impact Index Per Article: 66.0] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Affiliation(s)
- Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Lorna Wessels
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Sophia Müller-Dott
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Rémi Trimbour
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
| | - Ricardo O Ramirez Flores
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | | | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
| |
|
63
|
Lee MYY, Kaestner KH, Li M. Benchmarking algorithms for joint integration of unpaired and paired single-cell RNA-seq and ATAC-seq data. Genome Biol 2023; 24:244. [PMID: 37875977 PMCID: PMC10594700 DOI: 10.1186/s13059-023-03073-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/24/2023] [Accepted: 09/25/2023] [Indexed: 10/26/2023] Open
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) measures gene expression in single cells, while single-nucleus ATAC-sequencing (snATAC-seq) quantifies chromatin accessibility in single nuclei. These two data types provide complementary information for deciphering cell types and states. However, when analyzed individually, they sometimes produce conflicting results regarding cell type/state assignment. The power is compromised since the two modalities reflect the same underlying biology. Recently, it has become possible to measure both gene expression and chromatin accessibility from the same nucleus. Such paired data enable the direct modeling of the relationships between the two modalities. Given the availability of the vast amount of single-modality data, it is desirable to integrate the paired and unpaired single-modality datasets to gain a comprehensive view of the cellular complexity.
RESULTS: We benchmark nine existing single-cell multi-omic data integration methods. Specifically, we evaluate to what extent the multiome data provide additional guidance for analyzing the existing single-modality data, and whether these methods uncover peak-gene associations from single-modality data. Our results indicate that multiome data are helpful for annotating single-modality data. However, we emphasize that the availability of an adequate number of nuclei in the multiome dataset is crucial for achieving accurate cell type annotation. Insufficient representation of nuclei may compromise the reliability of the annotations. Additionally, when generating a multiome dataset, the number of cells is more important than sequencing depth for cell type annotation.
CONCLUSIONS: Seurat v4 is the best currently available platform for integrating scRNA-seq, snATAC-seq, and multiome data even in the presence of complex batch effects.
Affiliation(s)
- Michelle Y Y Lee
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Klaus H Kaestner
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
64
|
Oliva M, Lister R. Exploring the identity of individual plant cells in space and time. The New Phytologist 2023; 240:61-67. [PMID: 37483019 PMCID: PMC10952157 DOI: 10.1111/nph.19153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 06/17/2023] [Indexed: 07/25/2023]
Abstract
In recent years, single-cell genomics, coupled to imaging techniques, have become the state-of-the-art approach for characterising biological systems. In plant sciences, a variety of tissues and species have been profiled, providing an enormous quantity of data on cell identity at an unprecedented resolution, but what biological insights can be gained from such data sets? Using recently published studies in plant sciences, we will highlight how single-cell technologies have enabled a better comprehension of tissue organisation, cell fate dynamics in development or in response to various stimuli, as well as identifying key transcriptional regulators of cell identity. We discuss the limitations and technical hurdles to overcome, as well as future directions, and the promising use of single-cell omics to understand, predict, and manipulate plant development and physiology.
Affiliation(s)
- Marina Oliva
- ARC Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, University of Western Australia, Perth, WA 6009, Australia
| | - Ryan Lister
- ARC Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, University of Western Australia, Perth, WA 6009, Australia
- The Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, The University of Western Australia, Perth, WA 6009, Australia
| |
|
65
|
Li W, Xiang B, Yang F, Rong Y, Yin Y, Yao J, Zhang H. scMHNN: a novel hypergraph neural network for integrative analysis of single-cell epigenomic, transcriptomic and proteomic data. Brief Bioinform 2023; 24:bbad391. [PMID: 37930028 DOI: 10.1093/bib/bbad391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 09/09/2023] [Accepted: 10/11/2023] [Indexed: 11/07/2023] Open
Abstract
Technological advances have now made it possible to simultaneously profile epigenomic, transcriptomic and proteomic changes at the single-cell level, allowing a more unified view of cellular phenotypes and heterogeneities. However, current computational tools for single-cell multi-omics data integration are mainly tailored for bi-modality data, so new tools are urgently needed to integrate tri-modality data with complex associations. To this end, we develop scMHNN to integrate single-cell multi-omics data based on a hypergraph neural network. After modeling the complex data associations among various modalities, scMHNN performs message passing on the multi-omics hypergraph, which can capture the high-order data relationships and integrate the multiple heterogeneous features. Subsequently, scMHNN learns discriminative cell representations via a dual-contrastive loss in a self-supervised manner. Based on the pretrained hypergraph encoder, we further introduce the pre-training and fine-tuning paradigm, which allows more accurate cell-type annotation with only a small number of labeled cells as reference. Benchmarking results on real and simulated single-cell tri-modality datasets indicate that scMHNN outperforms other competing methods on both cell clustering and cell-type annotation tasks. In addition, we also demonstrate that scMHNN facilitates various downstream tasks, such as cell marker detection and enrichment analysis.
Affiliation(s)
- Wei Li
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350 Tianjin, China
- AI Lab, Tencent, Gaoxin 9th South Road, 518000 Shenzhen, China
| | - Bin Xiang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Yueyang Road, 200031 Shanghai, China
| | - Fan Yang
- AI Lab, Tencent, Gaoxin 9th South Road, 518000 Shenzhen, China
| | - Yu Rong
- AI Lab, Tencent, Gaoxin 9th South Road, 518000 Shenzhen, China
| | - Yanbin Yin
- Department of Food Science and Technology, University of Nebraska - Lincoln, 1400 R Street, 68588 Nebraska, USA
| | - Jianhua Yao
- AI Lab, Tencent, Gaoxin 9th South Road, 518000 Shenzhen, China
| | - Han Zhang
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350 Tianjin, China
| |
|
66
|
Zhang C, Yang Y, Tang S, Aihara K, Zhang C, Chen L. Contrastively generative self-expression model for single-cell and spatial multimodal data. Brief Bioinform 2023; 24:bbad265. [PMID: 37507114 DOI: 10.1093/bib/bbad265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 05/27/2023] [Accepted: 07/03/2023] [Indexed: 07/30/2023] Open
Abstract
Advances in single-cell multi-omics technology provide an unprecedented opportunity to fully understand cellular heterogeneity. However, integrating omics data from multiple modalities is challenging due to the individual characteristics of each measurement. Here, to solve such a problem, we propose a contrastive and generative deep self-expression model, called single-cell multimodal self-expressive integration (scMSI), which integrates the heterogeneous multimodal data into a unified manifold space. Specifically, scMSI first learns each omics-specific latent representation and self-expression relationship, accounting for the characteristics of the different omics data through a deep self-expressive generative model. Then, scMSI combines these omics-specific self-expression relations through contrastive learning. In this way, scMSI provides a paradigm for integrating multiple omics data even when they are only weakly related, combining representation learning and data integration in a unified framework. We demonstrate that scMSI provides a cohesive solution for a variety of analysis tasks, such as integration analysis, data denoising, batch correction and spatial domain detection. We have applied scMSI to various single-cell and spatial multimodal datasets to validate its effectiveness and robustness across diverse data types and application scenarios.
Affiliation(s)
- Chengming Zhang
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
| | - Yiwen Yang
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
| | - Shijie Tang
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
| | - Kazuyuki Aihara
- International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
| | - Chuanchao Zhang
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong 519031, China
| | - Luonan Chen
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong 519031, China
| |
|
67
|
Athaya T, Ripan RC, Li X, Hu H. Multimodal deep learning approaches for single-cell multi-omics data integration. Brief Bioinform 2023; 24:bbad313. [PMID: 37651607 PMCID: PMC10516349 DOI: 10.1093/bib/bbad313] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/08/2023] [Revised: 06/23/2023] [Accepted: 07/18/2023] [Indexed: 09/02/2023] Open
Abstract
Integrating single-cell multi-omics data is a challenging task that has led to new insights into complex cellular systems. Various computational methods have been proposed to effectively integrate these rapidly accumulating datasets, including deep learning. However, despite the proven success of deep learning in integrating multi-omics data and its better performance over classical computational methods, there has been no systematic study of its application to single-cell multi-omics data integration. To fill this gap, we conducted a literature review to explore the use of multimodal deep learning techniques in single-cell multi-omics data integration, taking into account recent studies from multiple perspectives. Specifically, we first summarized different modalities found in single-cell multi-omics data. We then reviewed current deep learning techniques for processing multimodal data and categorized deep learning-based integration methods for single-cell multi-omics data according to data modality, deep learning architecture, fusion strategy, key tasks and downstream analysis. Finally, we provided insights into using these deep learning models to integrate multi-omics data and better understand single-cell biological mechanisms.
Affiliation(s)
- Tasbiraha Athaya
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
| | - Rony Chowdhury Ripan
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
| | - Xiaoman Li
- Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, Florida, United States of America
| | - Haiyan Hu
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
| |
|
68
|
Chari T, Gorin G, Pachter L. Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data. bioRxiv 2023:2023.09.17.558131. [PMID: 37745403 PMCID: PMC10516047 DOI: 10.1101/2023.09.17.558131] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 09/26/2023]
Abstract
Multimodal, single-cell genomics technologies enable simultaneous capture of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell types, with applications ranging from inferring kinetic differences between cells, to the role of stochasticity in driving heterogeneity. However, current methods for determining cell types or 'clusters' present in multimodal data often rely on ad hoc or independent treatment of modalities, and assumptions ignoring inherent properties of the count data. To enable interpretable and consistent cell cluster determination from multimodal data, we present meK-Means (mechanistic K-Means) which integrates modalities and learns underlying, shared biophysical states through a unifying model of transcription. In particular, we demonstrate how meK-Means can be used to cluster cells from unspliced and spliced mRNA count modalities. By utilizing the causal, physical relationships underlying these modalities, we identify shared transcriptional kinetics across cells, which induce the observed gene expression profiles, and provide an alternative definition for 'clusters' through the governing parameters of cellular processes.
Affiliation(s)
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
| | - Gennady Gorin
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
| |
|
69
|
Xue L, Wu Y, Lin Y. Dissecting and improving gene regulatory network inference using single-cell transcriptome data. Genome Res 2023; 33:1609-1621. [PMID: 37580132 PMCID: PMC10620053 DOI: 10.1101/gr.277488.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 08/07/2023] [Indexed: 08/16/2023]
Abstract
Single-cell transcriptome data has been widely used to reconstruct gene regulatory networks (GRNs) controlling critical biological processes such as development and differentiation. Although a growing list of algorithms has been developed to infer GRNs using such data, achieving an inference accuracy consistently higher than random guessing has remained challenging. To address this, it is essential to delineate how the accuracy of regulatory inference is limited. Here, we systematically characterized factors limiting the accuracy of inferred GRNs and demonstrated that using pre-mRNA information can help improve regulatory inference compared to the typically used information (i.e., mature mRNA). Using kinetic modeling and simulated single-cell data sets, we showed that target genes' mature mRNA levels often fail to accurately report upstream regulatory activities because of gene-level and network-level factors, which can be improved by using pre-mRNA levels. We tested this finding on public single-cell RNA-seq data sets using intronic reads as proxies of pre-mRNA levels and can indeed achieve a higher inference accuracy compared to using exonic reads (corresponding to mature mRNAs). Using experimental data sets, we further validated findings from the simulated data sets and identified factors such as transcription factor activity dynamics influencing the accuracy of pre-mRNA-based inference. This work delineates the fundamental limitations of gene regulatory inference and helps improve GRN inference using single-cell RNA-seq data.
Affiliation(s)
- Lingfeng Xue
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
| | - Yan Wu
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
- The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
| | - Yihan Lin
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
- The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
| |
|
70
|
Tan Y, Huang J, Li D, Zou C, Liu D, Qin B. Single-cell RNA sequencing in dissecting microenvironment of age-related macular degeneration: Challenges and perspectives. Ageing Res Rev 2023; 90:102030. [PMID: 37549871 DOI: 10.1016/j.arr.2023.102030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 04/29/2023] [Accepted: 08/04/2023] [Indexed: 08/09/2023]
Abstract
Age-related macular degeneration (AMD) is the leading cause of blindness in individuals over the age of 50 years, yet its etiology and pathogenesis remain largely unclear. Single-cell RNA sequencing (scRNA-seq) technologies have recently been developed and have a number of advantages over conventional bulk RNA sequencing techniques in uncovering the heterogeneity of complex microenvironments containing numerous cell types and cell communications during various biological processes. In this review, we summarize the most recently discovered cellular components and regulatory mechanisms during AMD development revealed by scRNA-seq. In addition, we discuss the main challenges and future directions in exploring the pathophysiology of AMD with single-cell technologies. Our review underscores the importance of multimodal single-cell platforms (such as single-cell spatiotemporal multi-omics and single-cell exosome omics) as new approaches for basic and clinical AMD research in identifying biomarkers and characterizing cellular responses to drug treatment and environmental stimulation.
Affiliation(s)
- Yao Tan
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
| | - Jianguo Huang
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
| | - Deshuang Li
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
| | - Chang Zou
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen 518000, Guangdong, China.
| | - Dongcheng Liu
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China.
| | - Bo Qin
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China; Aier School of Ophthalmology, Central South University, Changsha, China.
| |
|
71
|
Fouché A, Chadoutaud L, Delattre O, Zinovyev A. Transmorph: a unifying computational framework for modular single-cell RNA-seq data integration. NAR Genom Bioinform 2023; 5:lqad069. [PMID: 37448589 PMCID: PMC10336778 DOI: 10.1093/nargab/lqad069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 06/02/2023] [Accepted: 07/10/2023] [Indexed: 07/15/2023] Open
Abstract
Data integration of single-cell RNA-seq (scRNA-seq) data describes the task of embedding datasets gathered from different sources or experiments into a common representation so that cells with similar types or states are embedded close to one another independently of their dataset of origin. Data integration is a crucial step in most scRNA-seq data analysis pipelines involving multiple batches. It improves data visualization, batch effect reduction, clustering, label transfer, and cell type inference. Many data integration tools have been proposed during the last decade, but a surge in the number of these methods has made it difficult to pick one for a given use case. Furthermore, these tools are provided as rigid pieces of software, making it hard to adapt them to various specific scenarios. In order to address both of these issues at once, we introduce the transmorph framework. It allows the user to engineer powerful data integration pipelines and is supported by a rich software ecosystem. We demonstrate the usefulness of transmorph by solving a variety of practical challenges on scRNA-seq datasets including joint dataset embedding, gene space integration, and transfer of cycle phase annotations. transmorph is provided as an open source Python package.
Affiliation(s)
- Aziz Fouché
- To whom correspondence should be addressed. Tel: +33 156246989;
| | - Loïc Chadoutaud
- Institut Curie, PSL Research University, 75005 Paris, France
- INSERM, 75005 Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, 75005 Paris, France
| | - Olivier Delattre
- INSERM U830, Equipe Labellisée LNCC, SIREDO Oncology Centre, Institut Curie, 75005 Paris, France
| | - Andrei Zinovyev
- Correspondence may also be addressed to Andrei Zinovyev. Tel: +33 156246989;
| |
|
72
|
Zhou M, Zhang H, Bai Z, Mann-Krzisnik D, Wang F, Li Y. Single-cell multi-omics topic embedding reveals cell-type-specific and COVID-19 severity-related immune signatures. Cell Reports Methods 2023; 3:100563. [PMID: 37671028 PMCID: PMC10475851 DOI: 10.1016/j.crmeth.2023.100563] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/20/2023] [Revised: 03/31/2023] [Accepted: 07/28/2023] [Indexed: 09/07/2023]
Abstract
The advent of single-cell multi-omics sequencing technology makes it possible for researchers to leverage multiple modalities for individual cells and explore cell heterogeneity. However, the high-dimensional, discrete, and sparse nature of the data makes the downstream analysis particularly challenging. Here, we propose an interpretable deep learning method called moETM to perform integrative analysis of high-dimensional single-cell multimodal data. moETM integrates multiple omics data via a product-of-experts in the encoder and employs multiple linear decoders to learn the multi-omics signatures. moETM demonstrates superior performance compared with six state-of-the-art methods on seven publicly available datasets. By applying moETM to the scRNA + scATAC data, we identified sequence motifs corresponding to the transcription factors regulating immune gene signatures. Applying moETM to CITE-seq data from COVID-19 patients revealed not only known immune cell-type-specific signatures but also composite multi-omics biomarkers of critical conditions due to COVID-19, thus providing insights from both biological and clinical perspectives.
Affiliation(s)
- Manqi Zhou
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY 10021, USA
| | - Hao Zhang
- Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10021, USA
| | - Zilong Bai
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY 10021, USA
- Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10021, USA
| | | | - Fei Wang
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY 10021, USA
- Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10021, USA
| | - Yue Li
- Quantitative Life Science, McGill University, Montréal, QC H3A 0G4, Canada
- School of Computer Science, McGill University, Montréal, QC H3A 0G4, Canada
- Mila – Quebec AI Institute, Montréal, QC H2S 3H1, Canada
| |
Collapse
|
73
|
Gunawan I, Vafaee F, Meijering E, Lock JG. An introduction to representation learning for single-cell data analysis. Cell Reports Methods 2023; 3:100547. [PMID: 37671013 PMCID: PMC10475795 DOI: 10.1016/j.crmeth.2023.100547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]
Abstract
Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
Affiliation(s)
- Ihuan Gunawan
  - School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
  - School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- Fatemeh Vafaee
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
  - UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Erik Meijering
  - School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- John George Lock
  - School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
  - UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
  - Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
74
Wang RH, Wang J, Li SC. Probabilistic tensor decomposition extracts better latent embeddings from single-cell multiomic data. Nucleic Acids Res 2023; 51:e81. [PMID: 37403780 PMCID: PMC10450184 DOI: 10.1093/nar/gkad570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 06/01/2023] [Accepted: 06/23/2023] [Indexed: 07/06/2023] Open
Abstract
Single-cell sequencing technology enables the simultaneous capture of multiomic data from multiple cells. The captured data can be represented by tensors, i.e., higher-order generalizations of matrices. However, the existing analysis tools often treat the data as a collection of second-order matrices, discarding the correspondences among the features. Consequently, we propose a probabilistic tensor decomposition framework, SCOIT, to extract embeddings from single-cell multiomic data. SCOIT incorporates various distributions, including Gaussian, Poisson, and negative binomial distributions, to deal with sparse, noisy, and heterogeneous single-cell data. Our framework can decompose a multiomic tensor into a cell embedding matrix, a gene embedding matrix, and an omic embedding matrix, allowing for various downstream analyses. We applied SCOIT to eight single-cell multiomic datasets from different sequencing protocols. With cell embeddings, SCOIT achieves superior performance for cell clustering compared to nine state-of-the-art tools under various metrics, demonstrating its ability to dissect cellular heterogeneity. With the gene embeddings, SCOIT enables cross-omics gene expression analysis and integrative gene regulatory network study. Furthermore, the embeddings allow cross-omics imputation simultaneously, outperforming current imputation methods with the Pearson correlation coefficient increased by 3.38-39.26%; moreover, SCOIT accommodates the scenario in which subsets of the cells have only one omic profile available.
Affiliation(s)
- Ruo Han Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong
- Jianping Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong
- Shuai Cheng Li
  - Department of Computer Science, City University of Hong Kong, Hong Kong
75
Swapna LS, Huang M, Li Y. GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes. Genome Biol 2023; 24:190. [PMID: 37596691 PMCID: PMC10436670 DOI: 10.1186/s13059-023-03034-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 08/09/2023] [Indexed: 08/20/2023] Open
Abstract
Cell-type composition is an important indicator of health. We present Guided Topic Model for deconvolution (GTM-decon) to automatically infer cell-type-specific gene topic distributions from single-cell RNA-seq data for deconvolving bulk transcriptomes. GTM-decon performs competitively on deconvolving simulated and real bulk data compared with the state-of-the-art methods. Moreover, as demonstrated in deconvolving disease transcriptomes, GTM-decon can infer multiple cell-type-specific gene topic distributions per cell type, which captures sub-cell-type variations. GTM-decon can also use phenotype labels from single-cell or bulk data to infer phenotype-specific gene distributions. In a nested-guided design, GTM-decon identified cell-type-specific differentially expressed genes from bulk breast cancer transcriptomes.
Affiliation(s)
- Michael Huang
  - School of Computer Science, McGill University, Montreal, QC, Canada
- Yue Li
  - School of Computer Science, McGill University, Montreal, QC, Canada
76
Flynn E, Almonte-Loya A, Fragiadakis GK. Single-Cell Multiomics. Annu Rev Biomed Data Sci 2023; 6:313-337. [PMID: 37159875 PMCID: PMC11146013 DOI: 10.1146/annurev-biodatasci-020422-050645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Single-cell RNA sequencing methods have led to improved understanding of the heterogeneity and transcriptomic states present in complex biological systems. Recently, the development of novel single-cell technologies for assaying additional modalities, specifically genomic, epigenomic, proteomic, and spatial data, allows for unprecedented insight into cellular biology. While certain technologies collect multiple measurements from the same cells simultaneously, even when modalities are separately assayed in different cells, we can apply novel computational methods to integrate these data. The application of computational integration methods to multimodal paired and unpaired data results in rich information about the identities of the cells present and the interactions between different levels of biology, such as between genetic variation and transcription. In this review, we both discuss the single-cell technologies for measuring these modalities and describe and characterize a variety of computational integration methods for combining the resulting data to leverage multimodal information toward greater biological insight.
Affiliation(s)
- Emily Flynn
  - CoLabs, University of California, San Francisco, California, USA
- Ana Almonte-Loya
  - CoLabs, University of California, San Francisco, California, USA
  - Biomedical Informatics Program, University of California, San Francisco, California, USA
- Gabriela K Fragiadakis
  - CoLabs, University of California, San Francisco, California, USA
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
77
Fouché A, Zinovyev A. Omics data integration in computational biology viewed through the prism of machine learning paradigms. Frontiers in Bioinformatics 2023; 3:1191961. [PMID: 37600970 PMCID: PMC10436311 DOI: 10.3389/fbinf.2023.1191961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 07/26/2023] [Indexed: 08/22/2023] Open
Abstract
Large quantities of biological data can today be acquired to characterize cell types and states, from various sources and using a wide diversity of methods, providing scientists with more and more information to answer challenging biological questions. Unfortunately, working with this amount of data comes at the price of ever-increasing data complexity. This is caused by the multiplication of data types and batch effects, which hinders the joint usage of all available data within common analyses. Data integration describes a set of tasks geared towards embedding several datasets of different origins or modalities into a joint representation that can then be used to carry out downstream analyses. In the last decade, dozens of methods have been proposed to tackle the different facets of the data integration problem, relying on various paradigms. This review introduces the most common data types encountered in computational biology and provides systematic definitions of the data integration problems. We then present how machine learning innovations were leveraged to build effective data integration algorithms that are widely used today by computational biologists. We discuss the current state of data integration and important pitfalls to consider when working with data integration tools. Finally, we detail a set of challenges the field will have to overcome in the coming years.
Affiliation(s)
- Aziz Fouché
  - Institut Curie, PSL Research University, Paris, France
  - Institut National de la Santé et de la Recherche Médicale, Paris, France
  - CBIO-Centre for Computational Biology, ParisTech, PSL Research University, Paris, France
  - Ecole Normale Supérieure Paris-Saclay, Cachan, France
78
Chen K, Xu M, Lu F, He Y. Development of Matrix Metalloproteinases-Mediated Extracellular Matrix Remodeling in Regenerative Medicine: A Mini Review. Tissue Eng Regen Med 2023; 20:661-670. [PMID: 37160567 PMCID: PMC10352474 DOI: 10.1007/s13770-023-00536-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Revised: 02/25/2023] [Accepted: 03/03/2023] [Indexed: 05/11/2023] Open
Abstract
Extracellular matrix (ECM) components confer biomechanical properties, maintain cell phenotype and mediate tissue homeostasis. ECM remodeling is complex and plays a key role in both physiological and pathological processes. Matrix metalloproteinases (MMPs) are a group of enzymes responsible for ECM degradation and have been accepted as key regulators of ECM remodeling. In this mini-review, we summarize MMP categories, functions and targeted substrates. We then discuss the current understanding of the role of MMP-mediated events, including inflammatory reactions, angiogenesis and cellular activities, in ECM remodeling in the context of regenerative medicine.
Affiliation(s)
- Kaiqi Chen
  - Department of Plastic and Cosmetic Surgery, Nanfang Hospital, Southern Medical University, 1838 Guangzhou North Road, Guangzhou, 510515, Guangdong, People's Republic of China
- Mimi Xu
  - Department of Plastic and Cosmetic Surgery, Nanfang Hospital, Southern Medical University, 1838 Guangzhou North Road, Guangzhou, 510515, Guangdong, People's Republic of China
- Feng Lu
  - Department of Plastic and Cosmetic Surgery, Nanfang Hospital, Southern Medical University, 1838 Guangzhou North Road, Guangzhou, 510515, Guangdong, People's Republic of China
- Yunfan He
  - Department of Plastic and Cosmetic Surgery, Nanfang Hospital, Southern Medical University, 1838 Guangzhou North Road, Guangzhou, 510515, Guangdong, People's Republic of China
79
Vandereyken K, Sifrim A, Thienpont B, Voet T. Methods and applications for single-cell and spatial multi-omics. Nat Rev Genet 2023; 24:494-515. [PMID: 36864178 PMCID: PMC9979144 DOI: 10.1038/s41576-023-00580-2] [Citation(s) in RCA: 228] [Impact Index Per Article: 228.0] [Accepted: 01/20/2023] [Indexed: 03/04/2023]
Abstract
The joint analysis of the genome, epigenome, transcriptome, proteome and/or metabolome from single cells is transforming our understanding of cell biology in health and disease. In less than a decade, the field has seen tremendous technological revolutions that enable crucial new insights into the interplay between intracellular and intercellular molecular mechanisms that govern development, physiology and pathogenesis. In this Review, we highlight advances in the fast-developing field of single-cell and spatial multi-omics technologies (also known as multimodal omics approaches), and the computational strategies needed to integrate information across these molecular layers. We demonstrate their impact on fundamental cell biology and translational research, discuss current challenges and provide an outlook to the future.
Affiliation(s)
- Katy Vandereyken
  - KU Leuven Institute for Single Cell Omics (LISCO), University of Leuven, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, University of Leuven, KU Leuven, Leuven, Belgium
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Alejandro Sifrim
  - KU Leuven Institute for Single Cell Omics (LISCO), University of Leuven, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, University of Leuven, KU Leuven, Leuven, Belgium
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Bernard Thienpont
  - KU Leuven Institute for Single Cell Omics (LISCO), University of Leuven, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, University of Leuven, KU Leuven, Leuven, Belgium
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Thierry Voet
  - KU Leuven Institute for Single Cell Omics (LISCO), University of Leuven, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, University of Leuven, KU Leuven, Leuven, Belgium
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
80
Cuomo ASE, Nathan A, Raychaudhuri S, MacArthur DG, Powell JE. Single-cell genomics meets human genetics. Nat Rev Genet 2023; 24:535-549. [PMID: 37085594 PMCID: PMC10784789 DOI: 10.1038/s41576-023-00599-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Accepted: 03/29/2023] [Indexed: 04/23/2023]
Abstract
Single-cell genomic technologies are revealing the cellular composition, identities and states in tissues at unprecedented resolution. They have now scaled to the point that it is possible to query samples at the population level, across thousands of individuals. Combining single-cell information with genotype data at this scale provides opportunities to link genetic variation to the cellular processes underpinning key aspects of human biology and disease. This strategy has potential implications for disease diagnosis, risk prediction and development of therapeutic solutions. But, effectively integrating large-scale single-cell genomic data, genetic variation and additional phenotypic data will require advances in data generation and analysis methods. As single-cell genetics begins to emerge as a field in its own right, we review its current state and the challenges and opportunities ahead.
Affiliation(s)
- Anna S E Cuomo
  - Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Aparna Nathan
  - Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Soumya Raychaudhuri
  - Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Joseph E Powell
  - Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia
  - UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, New South Wales, Australia
81
Yan X, Zheng R, Chen J, Li M. scNCL: transferring labels from scRNA-seq to scATAC-seq data with neighborhood contrastive regularization. Bioinformatics 2023; 39:btad505. [PMID: 37584660 PMCID: PMC10457667 DOI: 10.1093/bioinformatics/btad505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 07/17/2023] [Accepted: 08/12/2023] [Indexed: 08/17/2023] Open
Abstract
Motivation: scATAC-seq has enabled chromatin accessibility landscape profiling at the single-cell level, providing opportunities for determining cell-type-specific regulation codes. However, high dimension, extreme sparsity, and large scale of scATAC-seq data have posed great challenges to cell-type identification. Thus, there has been a growing interest in leveraging the well-annotated scRNA-seq data to help annotate scATAC-seq data. However, substantial computational obstacles remain to transfer information from scRNA-seq to scATAC-seq, especially for their heterogeneous features.
Results: We propose a new transfer learning method, scNCL, which utilizes prior knowledge and contrastive learning to tackle the problem of heterogeneous features. Briefly, scNCL transforms scATAC-seq features into a gene activity matrix based on prior knowledge. Since feature transformation can cause information loss, scNCL introduces neighborhood contrastive learning to preserve the neighborhood structure of scATAC-seq cells in raw feature space. To learn transferable latent features, scNCL uses a feature projection loss and an alignment loss to harmonize embeddings between scRNA-seq and scATAC-seq. Experiments on various datasets demonstrated that scNCL not only realizes accurate and robust label transfer for common types, but also achieves reliable detection of novel types. scNCL is also computationally efficient and scalable to million-scale datasets. Moreover, we prove scNCL can help refine cell-type annotations in existing scATAC-seq atlases.
Availability and implementation: The source code and data used in this paper can be found at https://github.com/CSUBioGroup/scNCL-release.
Affiliation(s)
- Xuhua Yan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jinmiao Chen
  - Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore 138648, Singapore
  - Immunology Translational Research Program, Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore (NUS), Singapore 117545, Singapore
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
82
Ashuach T, Gabitto MI, Koodli RV, Saldi GA, Jordan MI, Yosef N. MultiVI: deep generative model for the integration of multimodal data. Nat Methods 2023; 20:1222-1231. [PMID: 37386189 PMCID: PMC10406609 DOI: 10.1038/s41592-023-01909-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 11/15/2021] [Accepted: 05/10/2023] [Indexed: 07/01/2023]
Abstract
Jointly profiling the transcriptome, chromatin accessibility and other molecular properties of single cells offers a powerful way to study cellular diversity. Here we present MultiVI, a probabilistic model to analyze such multiomic data and leverage it to enhance single-modality datasets. MultiVI creates a joint representation that allows an analysis of all modalities included in the multiomic input data, even for cells for which one or more modalities are missing. It is available at scvi-tools.org.
Affiliation(s)
- Tal Ashuach
  - Center for Computational Biology, University of California, Berkeley, CA, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Mariano I Gabitto
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
  - Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
  - Allen Institute for Brain Science, Seattle, WA, USA
- Rohan V Koodli
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Michael I Jordan
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
  - Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
- Nir Yosef
  - Center for Computational Biology, University of California, Berkeley, CA, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
  - Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
83
Heumos L, Schaar AC, Lance C, Litinetskaya A, Drost F, Zappia L, Lücken MD, Strobl DC, Henao J, Curion F, Schiller HB, Theis FJ. Best practices for single-cell analysis across modalities. Nat Rev Genet 2023; 24:550-572. [PMID: 37002403 PMCID: PMC10066026 DOI: 10.1038/s41576-023-00586-w] [Citation(s) in RCA: 168] [Impact Index Per Article: 168.0] [Accepted: 02/14/2023] [Indexed: 04/03/2023]
Abstract
Recent advances in single-cell technologies have enabled high-throughput molecular profiling of cells across modalities and locations. Single-cell transcriptomics data can now be complemented by chromatin accessibility, surface protein expression, adaptive immune receptor repertoire profiling and spatial information. The increasing availability of single-cell data across modalities has motivated the development of novel computational methods to help analysts derive biological insights. As the field grows, it becomes increasingly difficult to navigate the vast landscape of tools and analysis steps. Here, we summarize independent benchmarking studies of unimodal and multimodal single-cell analysis across modalities to suggest comprehensive best-practice workflows for the most common analysis steps. Where independent benchmarks are not available, we review and contrast popular methods. Our article serves as an entry point for novices in the field of single-cell (multi-)omic analysis and guides advanced users to the most recent best practices.
Affiliation(s)
- Lukas Heumos
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - Institute of Lung Health and Immunity and Comprehensive Pneumology Center, Helmholtz Munich; Member of the German Center for Lung Research (DZL), Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Anna C Schaar
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Center for Machine Learning, Technical University of Munich, Garching, Germany
- Christopher Lance
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - Department of Paediatrics, Dr von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Anastasia Litinetskaya
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Felix Drost
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Luke Zappia
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Malte D Lücken
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - Institute of Lung Health and Immunity, Helmholtz Munich, Munich, Germany
- Daniel C Strobl
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
  - Institute of Clinical Chemistry and Pathobiochemistry, School of Medicine, Technical University of Munich, Munich, Germany
  - TranslaTUM, Center for Translational Cancer Research, Technical University of Munich, Munich, Germany
- Juan Henao
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- Fabiola Curion
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Herbert B Schiller
  - Institute of Lung Health and Immunity and Comprehensive Pneumology Center, Helmholtz Munich; Member of the German Center for Lung Research (DZL), Munich, Germany
- Fabian J Theis
  - Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Center for Machine Learning, Technical University of Munich, Garching, Germany
84
Limeta A, Gatto F, Herrgård MJ, Ji B, Nielsen J. Leveraging high-resolution omics data for predicting responses and adverse events to immune checkpoint inhibitors. Comput Struct Biotechnol J 2023; 21:3912-3919. [PMID: 37602228 PMCID: PMC10432706 DOI: 10.1016/j.csbj.2023.07.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 07/17/2023] [Accepted: 07/22/2023] [Indexed: 08/22/2023] Open
Abstract
A long-standing goal of personalized and precision medicine is to enable accurate prediction of the outcomes of a given treatment regimen for patients harboring a disease. Currently, many clinical trials fail to meet their endpoints due to underlying factors in the patient population that contribute to either poor responses to the drug of interest or to treatment-related adverse events. Identifying these factors beforehand and correcting for them can lead to an increased success of clinical trials. Comprehensive and large-scale data gathering efforts in biomedicine by omics profiling of healthy and diseased individuals have led to a treasure trove of host, disease and environmental factors that contribute to the effectiveness of drugs aiming to treat disease. With increasing omics data, artificial intelligence allows an in-depth analysis of big data and offers a wide range of applications for real-world clinical use, including improved patient selection and identification of actionable targets for companion therapeutics for improved translatability across more patients. As a blueprint for complex drug-disease-host interactions, we here discuss the challenges of utilizing omics data for predicting responses and adverse events in cancer immunotherapy with immune checkpoint inhibitors (ICIs). The omics-based methodologies for improving patient outcomes as in the ICI case have also been applied across a wide range of complex disease settings, exemplifying the use of omics for in-depth disease profiling and clinical use.
Affiliation(s)
- Angelo Limeta
  - Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Francesco Gatto
  - Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
  - Department of Oncology-Pathology, Karolinska Institute, 171 64 Stockholm, Sweden
- Boyang Ji
  - BioInnovation Institute, 2200 Copenhagen N, Denmark
- Jens Nielsen
  - Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
  - BioInnovation Institute, 2200 Copenhagen N, Denmark
85
Amitay Y, Bussi Y, Feinstein B, Bagon S, Milo I, Keren L. CellSighter: a neural network to classify cells in highly multiplexed images. Nat Commun 2023; 14:4302. [PMID: 37463931 PMCID: PMC10354029 DOI: 10.1038/s41467-023-40066-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/08/2022] [Accepted: 07/07/2023] [Indexed: 07/20/2023] Open
Abstract
Multiplexed imaging enables measurement of multiple proteins in situ, offering an unprecedented opportunity to chart various cell types and states in tissues. However, cell classification, the task of identifying the type of individual cells, remains challenging, labor-intensive, and limiting to throughput. Here, we present CellSighter, a deep-learning based pipeline to accelerate cell classification in multiplexed images. Given a small training set of expert-labeled images, CellSighter outputs the label probabilities for all cells in new images. CellSighter achieves over 80% accuracy for major cell types across imaging platforms, which approaches inter-observer concordance. Ablation studies and simulations show that CellSighter is able to generalize its training data and learn features of protein expression levels, as well as spatial features such as subcellular expression patterns. CellSighter's design reduces overfitting, and it can be trained with only thousands or even hundreds of labeled examples. CellSighter also outputs a prediction confidence, allowing downstream experts control over the results. Altogether, CellSighter drastically reduces hands-on time for cell classification in multiplexed images, while improving accuracy and consistency across datasets.
Affiliation(s)
- Yael Amitay
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
  - Department of Mathematics and Computer Science, Weizmann Institute of Science, Rehovot, Israel
- Yuval Bussi
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
  - Department of Mathematics and Computer Science, Weizmann Institute of Science, Rehovot, Israel
- Ben Feinstein
  - Department of Mathematics and Computer Science, Weizmann Institute of Science, Rehovot, Israel
- Shai Bagon
  - Department of Mathematics and Computer Science, Weizmann Institute of Science, Rehovot, Israel
- Idan Milo
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Leeat Keren
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel

86
Tang W, Jørgensen ACS, Marguerat S, Thomas P, Shahrezaei V. Modelling capture efficiency of single-cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics. Bioinformatics 2023; 39:btad395. [PMID: 37354494 PMCID: PMC10318389 DOI: 10.1093/bioinformatics/btad395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 05/18/2023] [Accepted: 06/22/2023] [Indexed: 06/26/2023] Open
Abstract
MOTIVATION: Gene expression is characterized by stochastic bursts of transcription that occur at brief and random periods of promoter activity. The kinetics of gene expression burstiness differs across the genome and is dependent on the promoter sequence, among other factors. Single-cell RNA sequencing (scRNA-seq) has made it possible to quantify the cell-to-cell variability in transcription at a global genome-wide level. However, scRNA-seq data are prone to technical variability, including low and variable capture efficiency of transcripts from individual cells. RESULTS: Here, we propose a novel mathematical theory for the observed variability in scRNA-seq data. Our method captures burst kinetics and variability in both the cell size and capture efficiency, which allows us to propose several likelihood-based and simulation-based methods for the inference of burst kinetics from scRNA-seq data. Using both synthetic and real data, we show that the simulation-based methods provide an accurate, robust and flexible tool for inferring burst kinetics from scRNA-seq data. In particular, in a supervised manner, a simulation-based inference method based on neural networks proves to be accurate and useful when applied to both allele and nonallele-specific scRNA-seq data. AVAILABILITY AND IMPLEMENTATION: The code for Neural Network and Approximate Bayesian Computation inference is available at https://github.com/WT215/nnRNA and https://github.com/WT215/Julia_ABC, respectively.
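To illustrate why capture efficiency matters for burst inference, the following is a toy sketch and not the authors' model or code. It assumes a gamma-Poisson (negative binomial) approximation of bursty expression and binomial thinning for transcript capture; the function names and the parameter values (burst frequency 2, burst size 10, capture efficiency 0.3) are hypothetical and chosen only for illustration.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson count with Knuth's multiplication method (fine for modest rates)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_cell(burst_freq, burst_size, capture_eff, rng):
    """Return (true mRNA count, captured count) for one simulated cell."""
    # Assumption: bursty expression approximated by a gamma-Poisson mixture
    # with shape = burst frequency and scale = mean burst size.
    rate = rng.gammavariate(burst_freq, burst_size)
    true_count = sample_poisson(rate, rng)
    # Assumption: each transcript is captured independently with probability capture_eff.
    observed = sum(rng.random() < capture_eff for _ in range(true_count))
    return true_count, observed

def mean_and_fano(values):
    """Sample mean and Fano factor (variance divided by mean)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var / mean

CAPTURE_EFF = 0.3  # hypothetical capture efficiency
rng = random.Random(0)
pairs = [simulate_cell(2.0, 10.0, CAPTURE_EFF, rng) for _ in range(5000)]
true_mean, true_fano = mean_and_fano([t for t, _ in pairs])
obs_mean, obs_fano = mean_and_fano([o for _, o in pairs])

# Under this toy model, Fano - 1 estimates the burst size. Applied to captured
# counts it is shrunk by the capture efficiency unless that is corrected for.
naive_burst_size = obs_fano - 1.0
corrected_burst_size = naive_burst_size / CAPTURE_EFF
print(round(naive_burst_size, 2), round(corrected_burst_size, 2))
```

In this sketch the moment estimate from captured counts recovers roughly the capture efficiency times the burst size, which is the kind of bias that modelling capture explicitly is meant to remove.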
Affiliation(s)
- Wenhao Tang
  - Department of Mathematics, Imperial College London, London SW7 2BX, United Kingdom
- Andreas Christ Sølvsten Jørgensen
  - Department of Mathematics, Imperial College London, London SW7 2BX, United Kingdom
  - I-X Centre for AI in Science, Imperial College London, White City Campus, London W12 0BZ, United Kingdom
- Samuel Marguerat
  - MRC London Institute of Medical Sciences (LMS), London W12 0NN, United Kingdom
  - Institute of Clinical Sciences (ICS), Faculty of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Philipp Thomas
  - Department of Mathematics, Imperial College London, London SW7 2BX, United Kingdom
- Vahid Shahrezaei
  - Department of Mathematics, Imperial College London, London SW7 2BX, United Kingdom

87
Zhou M, Zhang H, Baii Z, Mann-Krzisnik D, Wang F, Li Y. Single-cell multi-omic topic embedding reveals cell-type-specific and COVID-19 severity-related immune signatures. bioRxiv 2023:2023.01.31.526312. [PMID: 36778483 PMCID: PMC9915637 DOI: 10.1101/2023.01.31.526312] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023]
Abstract
The advent of single-cell multi-omics sequencing technology makes it possible for researchers to leverage multiple modalities for individual cells and explore cell heterogeneity. However, the high-dimensional, discrete, and sparse nature of the data makes the downstream analysis particularly challenging. Most of the existing computational methods for single-cell data analysis are either limited to a single modality or lack flexibility and interpretability. In this study, we propose an interpretable deep learning method called multi-omic embedded topic model (moETM) to effectively perform integrative analysis of high-dimensional single-cell multimodal data. moETM integrates multiple omics data via a product-of-experts in the encoder for efficient variational inference and then employs multiple linear decoders to learn the multi-omic signatures of the gene regulatory programs. Through comprehensive experiments on public single-cell transcriptome and chromatin accessibility data (i.e., scRNA+scATAC), as well as scRNA and proteomic data (i.e., CITE-seq), moETM demonstrates superior performance compared with six state-of-the-art single-cell data analysis methods on seven publicly available datasets. By applying moETM to the scRNA+scATAC data in human bone marrow mononuclear cells (BMMCs), we identified sequence motifs corresponding to the transcription factors that regulate immune gene signatures. Applying moETM analysis to CITE-seq data from the COVID-19 patients revealed not only known immune cell-type-specific signatures but also composite multi-omic biomarkers of critical conditions due to COVID-19, thus providing insights from both biological and clinical perspectives.
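The product-of-experts step mentioned in the abstract can be sketched for the common case of Gaussian experts, where the product of per-modality Gaussians is again Gaussian with precision-weighted parameters. This is a minimal illustration for a single latent dimension; the function name and the example values are hypothetical, and the abstract does not say whether moETM also includes a prior expert or how it parameterizes variances.

```python
def product_of_experts(means, variances):
    """Combine one-dimensional Gaussian experts into a single Gaussian.

    The product of Gaussians N(m_i, v_i) is proportional to a Gaussian whose
    precision is the sum of the expert precisions and whose mean is the
    precision-weighted average of the expert means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    joint_mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return joint_mean, 1.0 / total_precision

# Hypothetical per-modality posteriors for one latent dimension of one cell.
rna_mu, rna_var = 0.0, 1.0     # expert from the RNA encoder
atac_mu, atac_var = 2.0, 1.0   # expert from the ATAC encoder

joint_mu, joint_var = product_of_experts([rna_mu, atac_mu], [rna_var, atac_var])
print(joint_mu, joint_var)

# A more confident expert (smaller variance) pulls the joint mean toward itself.
skewed_mu, skewed_var = product_of_experts([0.0, 2.0], [1.0, 0.25])
print(skewed_mu, skewed_var)
```

With equally confident experts the joint mean sits midway between them and the joint variance is halved; an expert with smaller variance dominates the combination.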
Affiliation(s)
- Manqi Zhou
  - Department of Computational Biology, Cornell University
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine
- Hao Zhang
  - Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine
- Zilong Baii
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine
  - Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine
- Fei Wang
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine
  - Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine
- Yue Li
  - Quantitative Life Science, McGill University
  - School of Computer Science, McGill University
  - Mila - Quebec AI Institute

88
Zhang L, Parvin R, Chen M, Hu D, Fan Q, Ye F. High-throughput microfluidic droplets in biomolecular analytical system: A review. Biosens Bioelectron 2023; 228:115213. [PMID: 36906989 DOI: 10.1016/j.bios.2023.115213] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/21/2022] [Revised: 02/13/2023] [Accepted: 03/06/2023] [Indexed: 03/11/2023]
Abstract
Droplet microfluidic technology has revolutionized biomolecular analytical research, as it can preserve the genotype-to-phenotype linkage and help reveal heterogeneity. Massive numbers of uniform picolitre droplets divide a solution finely enough that a single cell or single molecule in each droplet can be visualized, barcoded, and analyzed. The droplet assays can then yield rich genomic data, offer high sensitivity, and screen and sort from a large number of combinations or phenotypes. Based on these unique advantages, this review focuses on up-to-date research concerning diverse screening applications utilizing droplet microfluidic technology. The emerging progress of droplet microfluidic technology is first introduced, including efficient and scaled-up droplet encapsulation and prevalent batch operations. Then the new implementations of droplet-based digital detection assays and single-cell multi-omics sequencing are briefly examined, along with related applications such as drug susceptibility testing, multiplexing for cancer subtype identification, virus-host interactions, and multimodal and spatiotemporal analysis. We also focus on droplet-based large-scale combinatorial screening for desired phenotypes, with an emphasis on sorting for immune cells, antibodies, enzymatic properties, and proteins produced by directed evolution methods. Finally, some challenges, deployment and future perspectives of droplet microfluidics technology in practice are also discussed.
Affiliation(s)
- Lexiang Zhang
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, 325000, China
  - Zhejiang Engineering Research Center for Tissue Repair Materials, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, 325000, China
- Rokshana Parvin
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, 325000, China
  - Zhejiang Engineering Research Center for Tissue Repair Materials, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, 325000, China
- Mingshuo Chen
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, 325000, China
  - Zhejiang Engineering Research Center for Tissue Repair Materials, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, 325000, China
- Dingmeng Hu
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, 325000, China
  - Zhejiang Engineering Research Center for Tissue Repair Materials, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, 325000, China
- Qihui Fan
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, 325000, China
  - Zhejiang Engineering Research Center for Tissue Repair Materials, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, 325000, China
  - Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, 100190, China
- Fangfu Ye
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, 325000, China
  - Zhejiang Engineering Research Center for Tissue Repair Materials, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, 325000, China
  - Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, 100190, China

89
Tang X, Zhang J, He Y, Zhang X, Lin Z, Partarrieu S, Hanna EB, Ren Z, Shen H, Yang Y, Wang X, Li N, Ding J, Liu J. Explainable multi-task learning for multi-modality biological data analysis. Nat Commun 2023; 14:2546. [PMID: 37137905 PMCID: PMC10156823 DOI: 10.1038/s41467-023-37477-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/25/2022] [Accepted: 03/17/2023] [Indexed: 05/05/2023] Open
Abstract
Current biotechnologies can simultaneously measure multiple high-dimensional modalities (e.g., RNA, DNA accessibility, and protein) from the same cells. A combination of different analytical tasks (e.g., multi-modal integration and cross-modal analysis) is required to comprehensively understand such data, inferring how gene regulation drives biological diversity and functions. However, current analytical methods are designed to perform a single task, only providing a partial picture of the multi-modal data. Here, we present UnitedNet, an explainable multi-task deep neural network capable of integrating different tasks to analyze single-cell multi-modality data. Applied to various multi-modality datasets (e.g., Patch-seq, multiome ATAC + gene expression, and spatial transcriptomics), UnitedNet demonstrates similar or better accuracy in multi-modal integration and cross-modal prediction compared with state-of-the-art methods. Moreover, by dissecting the trained UnitedNet with the explainable machine learning algorithm, we can directly quantify the relationship between gene expression and other modalities with cell-type specificity. UnitedNet is a comprehensive end-to-end framework that could be broadly applicable to single-cell multi-modality biology. This framework has the potential to facilitate the discovery of cell-type-specific regulation kinetics across transcriptomics and other modalities.
Affiliation(s)
- Xin Tang
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, 02134, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Jiawei Zhang
  - School of Statistics, University of Minnesota Twin Cities, Minneapolis, MN, 55455, USA
- Yichun He
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, 02134, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Xinhe Zhang
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, 02134, USA
- Zuwan Lin
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, 02138, USA
- Sebastian Partarrieu
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, 02134, USA
- Emma Bou Hanna
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, 02134, USA
- Zhaolin Ren
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, 02134, USA
- Hao Shen
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, 02134, USA
- Yuhong Yang
  - School of Statistics, University of Minnesota Twin Cities, Minneapolis, MN, 55455, USA
- Xiao Wang
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Department of Chemistry, MIT, Cambridge, MA, 02139, USA
- Na Li
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, 02134, USA
- Jie Ding
  - School of Statistics, University of Minnesota Twin Cities, Minneapolis, MN, 55455, USA
- Jia Liu
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, 02134, USA

90
Gossi F, Pati P, Chouvardas P, Martinelli AL, Kruithof-de Julio M, Rapsomaniki MA. Matching single cells across modalities with contrastive learning and optimal transport. Brief Bioinform 2023; 24:7147026. [PMID: 37122067 DOI: 10.1093/bib/bbad130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 02/25/2023] [Accepted: 03/14/2023] [Indexed: 05/02/2023] Open
Abstract
Understanding the interactions between the biomolecules that govern cellular behaviors remains an emergent question in biology. Recent advances in single-cell technologies have enabled the simultaneous quantification of multiple biomolecules in the same cell, opening new avenues for understanding cellular complexity and heterogeneity. Still, the resulting multimodal single-cell datasets present unique challenges arising from the high dimensionality and multiple sources of acquisition noise. Computational methods able to match cells across different modalities offer an appealing alternative towards this goal. In this work, we propose MatchCLOT, a novel method for modality matching inspired by recent promising developments in contrastive learning and optimal transport. MatchCLOT uses contrastive learning to learn a common representation between two modalities and applies entropic optimal transport as an approximate maximum weight bipartite matching algorithm. Our model obtains state-of-the-art performance on two curated benchmarking datasets and an independent test dataset, improving the top scoring method by 26.1% while preserving the underlying biological structure of the multimodal data. Importantly, MatchCLOT offers high gains in computational time and memory that, in contrast to existing methods, allows it to scale well with the number of cells. As single-cell datasets become increasingly large, MatchCLOT offers an accurate and efficient solution to the problem of modality matching.
Affiliation(s)
- Federico Gossi
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zürich, Switzerland
- Pushpak Pati
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Panagiotis Chouvardas
  - Department for BioMedical Research, Urology Research Laboratory, University of Bern, Murtenstrasse 24, 3008 Bern, Switzerland
- Adriano Luca Martinelli
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - Institute of Molecular Systems Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zürich, Switzerland
- Marianna Kruithof-de Julio
  - Department for BioMedical Research, Urology Research Laboratory, University of Bern, Murtenstrasse 24, 3008 Bern, Switzerland
  - Department of Urology, Inselspital, Bern University Hospital, Freiburgstrasse 15, 3010 Bern, Switzerland

91
Qian Z, Qin J, Lai Y, Zhang C, Zhang X. Large-Scale Integration of Single-Cell RNA-Seq Data Reveals Astrocyte Diversity and Transcriptomic Modules across Six Central Nervous System Disorders. Biomolecules 2023; 13:692. [PMID: 37189441 PMCID: PMC10135484 DOI: 10.3390/biom13040692] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/25/2023] [Revised: 03/30/2023] [Accepted: 04/12/2023] [Indexed: 05/17/2023] Open
Abstract
The dysfunction of astrocytes in response to environmental factors contributes to many neurological diseases by impacting neuroinflammation responses, glutamate and ion homeostasis, and cholesterol and sphingolipid metabolism, which calls for comprehensive and high-resolution analysis. However, single-cell transcriptome analyses of astrocytes have been hampered by the sparseness of human brain specimens. Here, we demonstrate how large-scale integration of multi-omics data, including single-cell and spatial transcriptomic and proteomic data, overcomes these limitations. We created a single-cell transcriptomic dataset of human brains by integration, consensus annotation, and analyzing 302 publicly available single-cell RNA-sequencing (scRNA-seq) datasets, highlighting the power to resolve previously unidentifiable astrocyte subpopulations. The resulting dataset includes nearly one million cells that span a wide variety of diseases, including Alzheimer's disease (AD), Parkinson's disease (PD), Huntington's disease (HD), multiple sclerosis (MS), epilepsy (Epi), and chronic traumatic encephalopathy (CTE). We profiled the astrocytes at three levels, subtype compositions, regulatory modules, and cell-cell communications, and comprehensively depicted the heterogeneity of pathological astrocytes. We constructed seven transcriptomic modules that are involved in the onset and progress of disease development, such as the M2 ECM and M4 stress modules. We validated that the M2 ECM module could furnish potential markers for AD early diagnosis at both the transcriptome and protein levels. In order to accomplish a high-resolution, local identification of astrocyte subtypes, we also carried out a spatial transcriptome analysis of mouse brains using the integrated dataset as a reference. We found that astrocyte subtypes are regionally heterogeneous. 
We identified dynamic cell-cell interactions in different disorders and found that astrocytes participate in key signaling pathways, such as NRG3-ERBB4, in epilepsy. Our work supports the utility of large-scale integration of single-cell transcriptomic data, which offers new insights into underlying multiple CNS disease mechanisms where astrocytes are involved.
Affiliation(s)
- Zhenwei Qian
  - School of Basic Medical Sciences, Beijing Key Laboratory of Neural Regeneration and Repair, Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing 100069, China
- Jinglin Qin
  - School of Basic Medical Sciences, Beijing Key Laboratory of Neural Regeneration and Repair, Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing 100069, China
- Yiwen Lai
  - School of Basic Medical Sciences, Beijing Key Laboratory of Neural Regeneration and Repair, Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing 100069, China
- Chen Zhang
  - School of Basic Medical Sciences, Beijing Key Laboratory of Neural Regeneration and Repair, Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing 100069, China
  - Chinese Institute for Brain Research, Beijing 102206, China
  - State Key Laboratory of Translational Medicine and Innovative Drug Development, Nanjing 210000, China
- Xiannian Zhang
  - School of Basic Medical Sciences, Beijing Key Laboratory of Neural Regeneration and Repair, Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing 100069, China

92
Oder B, Chatzidimitriou A, Langerak AW, Rosenquist R, Österholm C. Recent revelations and future directions using single-cell technologies in chronic lymphocytic leukemia. Front Oncol 2023; 13:1143811. [PMID: 37091144 PMCID: PMC10117666 DOI: 10.3389/fonc.2023.1143811] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/13/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023] Open
Abstract
Chronic lymphocytic leukemia (CLL) is a clinically and biologically heterogeneous disease with varying outcomes. In the last decade, the application of next-generation sequencing technologies has allowed extensive mapping of disease-specific genomic, epigenomic, immunogenetic, and transcriptomic signatures linked to CLL pathogenesis. These technologies have improved our understanding of the impact of tumor heterogeneity and evolution on disease outcome, although they have mostly been performed on bulk preparations of nucleic acids. As a further development, new technologies have emerged in recent years that allow high-resolution mapping at the single-cell level. These include single-cell RNA sequencing for assessment of the transcriptome, both of leukemic and non-malignant cells in the tumor microenvironment; immunogenetic profiling of B and T cell receptor rearrangements; single-cell sequencing methods for investigation of methylation and chromatin accessibility across the genome; and targeted single-cell DNA sequencing for analysis of copy-number alterations and single nucleotide variants. In addition, concomitant profiling of cellular subpopulations, based on protein expression, can also be obtained by various antibody-based approaches. In this review, we discuss different single-cell sequencing technologies and how they have been applied so far to study CLL onset and progression, also in response to treatment. This latter aspect is particularly relevant considering that we are moving away from chemoimmunotherapy to targeted therapies, with a potentially distinct impact on clonal dynamics. We also discuss new possibilities, such as integrative multi-omics analysis, as well as inherent limitations of the different single-cell technologies, from sample preparation to data interpretation using available bioinformatic pipelines. Finally, we discuss future directions in this rapidly evolving field.
Affiliation(s)
- Blaž Oder
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Anastasia Chatzidimitriou
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Anton W. Langerak
  - Department of Immunology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, Netherlands
- Richard Rosenquist
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Cecilia Österholm
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
  - *Correspondence: Cecilia Österholm,

93
Pregizer S, Vreven T, Mathur M, Robinson LN. Multi-omic single cell sequencing: Overview and opportunities for kidney disease therapeutic development. Front Mol Biosci 2023; 10:1176856. [PMID: 37091871 PMCID: PMC10113659 DOI: 10.3389/fmolb.2023.1176856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 03/21/2023] [Indexed: 04/09/2023] Open
Abstract
Single cell sequencing technologies have rapidly advanced in the last decade and are increasingly applied to gain unprecedented insights by deconstructing complex biology to its fundamental unit, the individual cell. First developed for measurement of gene expression, single cell sequencing approaches have evolved to allow simultaneous profiling of multiple additional features, including chromatin accessibility within the nucleus and protein expression at the cell surface. These multi-omic approaches can now further be applied to cells in situ, capturing the spatial context within which their biology occurs. To extract insights from these complex datasets, new computational tools have facilitated the integration of information across different data types and the use of machine learning approaches. Here, we summarize current experimental and computational methods for generation and integration of single cell multi-omic datasets. We focus on opportunities for multi-omic single cell sequencing to augment therapeutic development for kidney disease, including applications for biomarkers, disease stratification and target identification.

94
Bärthel S, Falcomatà C, Rad R, Theis FJ, Saur D. Single-cell profiling to explore pancreatic cancer heterogeneity, plasticity and response to therapy. Nat Cancer 2023; 4:454-467. [PMID: 36959420 DOI: 10.1038/s43018-023-00526-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/26/2022] [Accepted: 02/08/2023] [Indexed: 03/25/2023]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer entity characterized by a heterogeneous genetic landscape and an immunosuppressive tumor microenvironment. Recent advances in high-resolution single-cell sequencing and spatial transcriptomics technologies have enabled an in-depth characterization of both malignant and host cell types and increased our understanding of the heterogeneity and plasticity of PDAC in the steady state and under therapeutic perturbation. In this Review we outline single-cell analyses in PDAC, discuss their implications on our understanding of the disease and present future perspectives of multimodal approaches to elucidate its biology and response to therapy at the single-cell level.
Affiliation(s)
- Stefanie Bärthel
  - Division of Translational Cancer Research, German Cancer Research Center and German Cancer Consortium, Heidelberg, Germany
  - Institute of Experimental Cancer Therapy, Klinikum Rechts der Isar, School of Medicine, Technische Universität München, Munich, Germany
  - Center for Translational Cancer Research (TranslaTUM), School of Medicine, Technische Universität München, Munich, Germany
- Chiara Falcomatà
  - Division of Translational Cancer Research, German Cancer Research Center and German Cancer Consortium, Heidelberg, Germany
  - Institute of Experimental Cancer Therapy, Klinikum Rechts der Isar, School of Medicine, Technische Universität München, Munich, Germany
  - Center for Translational Cancer Research (TranslaTUM), School of Medicine, Technische Universität München, Munich, Germany
  - Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Roland Rad
  - Center for Translational Cancer Research (TranslaTUM), School of Medicine, Technische Universität München, Munich, Germany
  - Institute of Molecular Oncology and Functional Genomics, School of Medicine, Technische Universität München, Munich, Germany
  - German Cancer Consortium Partner Site Munich, Munich, Germany
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany
  - School of Computation, Information and Technology (CIT), Technische Universität München, Munich, Germany
- Dieter Saur
  - Division of Translational Cancer Research, German Cancer Research Center and German Cancer Consortium, Heidelberg, Germany
  - Institute of Experimental Cancer Therapy, Klinikum Rechts der Isar, School of Medicine, Technische Universität München, Munich, Germany
  - Center for Translational Cancer Research (TranslaTUM), School of Medicine, Technische Universität München, Munich, Germany

95
Nguyen HCT, Baik B, Yoon S, Park T, Nam D. Benchmarking integration of single-cell differential expression. Nat Commun 2023; 14:1570. [PMID: 36944632 PMCID: PMC10030080 DOI: 10.1038/s41467-023-37126-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Accepted: 03/03/2023] [Indexed: 03/23/2023] Open
Abstract
Integration of single-cell RNA sequencing data between different samples has been a major challenge for analyzing cell populations. However, strategies to integrate differential expression analysis of single-cell data remain underinvestigated. Here, we benchmark 46 workflows for differential expression analysis of single-cell data with multiple batches. We show that batch effects, sequencing depth and data sparsity substantially impact their performance. Notably, we find that the use of batch-corrected data rarely improves the analysis for sparse data, whereas batch covariate modeling improves the analysis for substantial batch effects. We show that for low-depth data, single-cell techniques based on zero-inflation models degrade performance, whereas the analysis of uncorrected data using limmatrend, the Wilcoxon test and the fixed effects model performs well. We suggest several high-performance methods under different conditions based on various simulation and real data analyses. Additionally, we demonstrate that differential expression analysis for a specific cell type outperforms that of large-scale bulk sample data in prioritizing disease-related genes.
Affiliation(s)
- Hai C T Nguyen
- Department of Biological Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Bukyung Baik
- Department of Biological Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Sora Yoon
- Department of Biological Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Taesung Park
- Department of Statistics, Seoul National University, Seoul, 08826, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Dougu Nam
- Department of Biological Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea.
- Department of Mathematical Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea.
96
Schäfer JA, Sutandy FXR, Münch C. Omics-based approaches for the systematic profiling of mitochondrial biology. Mol Cell 2023; 83:911-926. [PMID: 36931258 DOI: 10.1016/j.molcel.2023.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 02/06/2023] [Accepted: 02/14/2023] [Indexed: 03/18/2023]
Abstract
Mitochondria are essential for cellular functions such as metabolism and apoptosis. They dynamically adapt to the changing environmental demands by adjusting their protein, nucleic acid, metabolite, and lipid contents. In addition, the mitochondrial components are modulated on different levels in response to changes, including abundance, activity, and interaction. A wide range of omics-based approaches has been developed to be able to explore mitochondrial adaptation and how mitochondrial function is compromised in disease contexts. Here, we provide an overview of the omics methods that allow us to systematically investigate the different aspects of mitochondrial biology. In addition, we show examples of how these methods have provided new biological insights. The emerging use of these toolboxes provides a more comprehensive understanding of the processes underlying mitochondrial function.
Affiliation(s)
- Jasmin Adriana Schäfer
- Institute of Biochemistry II, Goethe University Frankfurt, Theodor-Stern-Kai 7, Haus 75, 60590 Frankfurt am Main, Germany
- F X Reymond Sutandy
- Institute of Biochemistry II, Goethe University Frankfurt, Theodor-Stern-Kai 7, Haus 75, 60590 Frankfurt am Main, Germany
- Christian Münch
- Institute of Biochemistry II, Goethe University Frankfurt, Theodor-Stern-Kai 7, Haus 75, 60590 Frankfurt am Main, Germany.
97
Balusu S, Praschberger R, Lauwers E, De Strooper B, Verstreken P. Neurodegeneration cell per cell. Neuron 2023; 111:767-786. [PMID: 36787752 DOI: 10.1016/j.neuron.2023.01.016] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 09/08/2022] [Revised: 10/12/2022] [Accepted: 01/18/2023] [Indexed: 02/16/2023]
Abstract
The clinical definition of neurodegenerative diseases is based on symptoms that reflect terminal damage of specific brain regions. This is misleading as it tells little about the initial disease processes. Circuitry failures that underlie the clinical symptomatology are themselves preceded by clinically mostly silent, slowly progressing multicellular processes that trigger or are triggered by the accumulation of abnormally folded proteins such as Aβ, Tau, TDP-43, and α-synuclein, among others. Methodological advances in single-cell omics, combined with complex genetics and novel ways to model complex cellular interactions using induced pluripotent stem (iPS) cells, make it possible to analyze the early cellular phase of neurodegenerative disorders. This will revolutionize the way we study those diseases and will translate into novel diagnostics and cell-specific therapeutic targets, stopping these disorders in their early track before they cause difficult-to-reverse damage to the brain.
Affiliation(s)
- Sriram Balusu
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium; KU Leuven Department of Neurosciences, Leuven Brain Institute, Leuven, Belgium
- Roman Praschberger
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium; KU Leuven Department of Neurosciences, Leuven Brain Institute, Leuven, Belgium
- Elsa Lauwers
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium; KU Leuven Department of Neurosciences, Leuven Brain Institute, Leuven, Belgium
- Bart De Strooper
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium; KU Leuven Department of Neurosciences, Leuven Brain Institute, Leuven, Belgium; UK Dementia Research Institute, London, UK.
- Patrik Verstreken
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium; KU Leuven Department of Neurosciences, Leuven Brain Institute, Leuven, Belgium.
98
Single-cell proteomics enabled by next-generation sequencing or mass spectrometry. Nat Methods 2023; 20:363-374. [PMID: 36864196 DOI: 10.1038/s41592-023-01791-5] [Citation(s) in RCA: 66] [Impact Index Per Article: 66.0] [Received: 11/10/2021] [Accepted: 01/24/2023] [Indexed: 03/04/2023]
Abstract
In the last decade, single-cell RNA sequencing routinely performed on large numbers of single cells has greatly advanced our understanding of the underlying heterogeneity of complex biological systems. Technological advances have also enabled protein measurements, further contributing to the elucidation of cell types and states present in complex tissues. Recently, there have been independent advances in mass spectrometric techniques bringing us one step closer to characterizing single-cell proteomes. Here we discuss the challenges of detecting proteins in single cells by both mass spectrometry and sequencing-based methods. We review the state of the art for these techniques and propose that there is a space for technological advancements and complementary approaches that maximize the advantages of both classes of technologies.
99
Foltz SM, Greene CS, Taroni JN. Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously. Commun Biol 2023; 6:222. [PMID: 36841852 PMCID: PMC9968332 DOI: 10.1038/s42003-023-04588-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/02/2017] [Accepted: 02/13/2023] [Indexed: 02/27/2023] Open
Abstract
Large compendia of gene expression data have proven valuable for the discovery of novel biological relationships. Historically, most available RNA assays were run on microarray, while RNA-seq is now the platform of choice for many new experiments. The data structure and distributions between the platforms differ, making it challenging to combine them directly. Here we perform supervised and unsupervised machine learning evaluations to assess which existing normalization methods are best suited for combining microarray and RNA-seq data. We find that quantile and Training Distribution Matching normalization allow for supervised and unsupervised model training on microarray and RNA-seq data simultaneously. Nonparanormal normalization and z-scores are also appropriate for some applications, including pathway analysis with Pathway-Level Information Extractor (PLIER). We demonstrate that it is possible to perform effective cross-platform normalization using existing methods to combine microarray and RNA-seq data for machine learning applications.
Affiliation(s)
- Steven M Foltz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Wynnewood, PA, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA.
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA.
- Jaclyn N Taroni
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Wynnewood, PA, USA.
100
Ma A, Wang X, Li J, Wang C, Xiao T, Liu Y, Cheng H, Wang J, Li Y, Chang Y, Li J, Wang D, Jiang Y, Su L, Xin G, Gu S, Li Z, Liu B, Xu D, Ma Q. Single-cell biological network inference using a heterogeneous graph transformer. Nat Commun 2023; 14:964. [PMID: 36810839 PMCID: PMC9944243 DOI: 10.1038/s41467-023-36559-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 10/16/2022] [Accepted: 02/06/2023] [Indexed: 02/23/2023] Open
Abstract
Single-cell multi-omics (scMulti-omics) allows the quantification of multiple modalities simultaneously to capture the intricacy of complex molecular mechanisms and cellular heterogeneity. Existing tools cannot effectively infer the active biological networks in diverse cell types and the response of these networks to external stimuli. Here we present DeepMAPS for biological network inference from scMulti-omics. It models scMulti-omics in a heterogeneous graph and learns relations among cells and genes within both local and global contexts in a robust manner using a multi-head graph transformer. Benchmarking results indicate DeepMAPS performs better than existing tools in cell clustering and biological network construction. It also showcases competitive capability in deriving cell-type-specific biological networks in lung tumor leukocyte CITE-seq data and matched diffuse small lymphocytic lymphoma scRNA-seq and scATAC-seq data. In addition, we deploy a DeepMAPS webserver equipped with multiple functionalities and visualizations to improve the usability and reproducibility of scMulti-omics data analysis.
Affiliation(s)
- Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Xiaoying Wang
- School of Mathematics, Shandong University, Jinan, Shandong, China
- Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, China
- Cankun Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Tong Xiao
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Yuntao Liu
- School of Mathematics, Shandong University, Jinan, Shandong, China
- Hao Cheng
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Juexin Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Yang Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Yuzhou Chang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Jinpu Li
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, MO, USA
- Duolin Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Yuexu Jiang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Li Su
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, MO, USA
- Gang Xin
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Shaopeng Gu
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, China.
- Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA.
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA.
- Institute for Data Science and Informatics, University of Missouri, Columbia, MO, USA.
- Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA.
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA.