1
|
Zhao F, Ma X, Yao B, Lu Q, Chen L. scaDA: A novel statistical method for differential analysis of single-cell chromatin accessibility sequencing data. PLoS Comput Biol 2024; 20:e1011854. [PMID: 39093856 DOI: 10.1371/journal.pcbi.1011854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 07/17/2024] [Indexed: 08/04/2024] Open
Abstract
Single-cell ATAC-seq sequencing data (scATAC-seq) has been widely used to investigate chromatin accessibility on the single-cell level. One important application of scATAC-seq data analysis is differential chromatin accessibility (DA) analysis. However, the data characteristics of scATAC-seq such as excessive zeros and large variability of chromatin accessibility across cells impose a unique challenge for DA analysis. Existing statistical methods focus on detecting the mean difference of the chromatin accessible regions while overlooking the distribution difference. Motivated by real data exploration that distribution difference exists among cell types, we introduce a novel composite statistical test named "scaDA", which is based on zero-inflated negative binomial model (ZINB), for performing differential distribution analysis of chromatin accessibility by jointly testing the abundance, prevalence and dispersion simultaneously. Benefiting from both dispersion shrinkage and iterative refinement of mean and prevalence parameter estimates, scaDA demonstrates its superiority to both ZINB-based likelihood ratio tests and published methods by achieving the highest power and best FDR control in a comprehensive simulation study. In addition to demonstrating the highest power in three real sc-multiome data analyses, scaDA successfully identifies differentially accessible regions in microglia from sc-multiome data for an Alzheimer's disease (AD) study that are most enriched in GO terms related to neurogenesis and the clinical phenotype of AD, and AD-associated GWAS SNPs.
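For readers unfamiliar with the zero-inflated negative binomial (ZINB) model named in this abstract, the following is a minimal sketch of its log-likelihood. The three parameters correspond to the abundance (mean), prevalence (one minus the zero-inflation probability) and dispersion that the abstract says are tested jointly. The function and parameter names (`zinb_logpmf`, `zinb_loglik`, `mu`, `phi`, `pi0`) are illustrative assumptions, and this is not the scaDA implementation: it omits dispersion shrinkage, iterative refinement and the composite test itself.

```python
import math


def zinb_logpmf(k, mu, phi, pi0):
    """Log-probability of count k under a zero-inflated negative binomial.

    mu  : mean of the negative binomial component (abundance), mu > 0
    phi : dispersion, so the NB variance is mu + phi * mu**2, phi > 0
    pi0 : probability of a structural zero, 0 <= pi0 < 1
          (1 - pi0 plays the role of the prevalence)
    """
    r = 1.0 / phi  # NB "size" parameter
    log_nb = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    if k == 0:
        # A zero is either structural or sampled from the NB component.
        return math.log(pi0 + (1.0 - pi0) * math.exp(log_nb))
    return math.log(1.0 - pi0) + log_nb


def zinb_loglik(counts, mu, phi, pi0):
    """Log-likelihood of independent counts sharing one parameter set."""
    return sum(zinb_logpmf(k, mu, phi, pi0) for k in counts)
```

Under these assumptions, a likelihood-ratio style comparison between two cell groups would evaluate `zinb_loglik` once with parameters shared across groups and once with group-specific parameters; how scaDA estimates those parameters is described in the paper, not here.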
Affiliation(s)
- Fengdi Zhao
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| | - Xin Ma
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| | - Bing Yao
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
| | - Qing Lu
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| | - Li Chen
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| |
|
2
|
Volteras D, Shahrezaei V, Thomas P. Global transcription regulation revealed from dynamical correlations in time-resolved single-cell RNA sequencing. Cell Syst 2024:S2405-4712(24)00201-1. [PMID: 39121860 DOI: 10.1016/j.cels.2024.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 02/29/2024] [Accepted: 07/11/2024] [Indexed: 08/12/2024]
Abstract
Single-cell transcriptomics reveals significant variations in transcriptional activity across cells. Yet, it remains challenging to identify mechanisms of transcription dynamics from static snapshots. It is thus still unknown what drives global transcription dynamics in single cells. We present a stochastic model of gene expression with cell size- and cell cycle-dependent rates in growing and dividing cells that harnesses temporal dimensions of single-cell RNA sequencing through metabolic labeling protocols and cell cycle reporters. We develop a parallel and highly scalable approximate Bayesian computation method that corrects for technical variation and accurately quantifies absolute burst frequency, burst size, and degradation rate along the cell cycle at a transcriptome-wide scale. Using Bayesian model selection, we reveal scaling between transcription rates and cell size and unveil waves of gene regulation across the cell cycle-dependent transcriptome. Our study shows that stochastic modeling of dynamical correlations identifies global mechanisms of transcription regulation. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Dimitris Volteras
- Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Vahid Shahrezaei
- Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, SW7 2AZ, UK.
| | - Philipp Thomas
- Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, SW7 2AZ, UK.
| |
|
3
|
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. SCIENCE CHINA. LIFE SCIENCES 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
| | - Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
| | - Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
| | - Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
| | - Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
| | - Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
| | - Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
| | - Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
| | - Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
| | - Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
| | - Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
| | - Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
| |
|
4
|
Stuart T. Progress in multifactorial single-cell chromatin profiling methods. Biochem Soc Trans 2024:BST20231471. [PMID: 39023855 DOI: 10.1042/bst20231471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 07/01/2024] [Accepted: 07/08/2024] [Indexed: 07/20/2024]
Abstract
Chromatin states play a key role in shaping overall cellular states and fates. Building a complete picture of the functional state of chromatin in cells requires the co-detection of several distinct biochemical aspects. These span DNA methylation, chromatin accessibility, chromosomal conformation, histone posttranslational modifications, and more. While this certainly presents a challenging task, over the past few years many new and creative methods have been developed that now enable co-assay of these different aspects of chromatin at single cell resolution. This field is entering an exciting phase, where a confluence of technological improvements, decreased sequencing costs, and computational innovation are presenting new opportunities to dissect the diversity of chromatin states present in tissues, and how these states may influence gene regulation. In this review, I discuss the spectrum of current experimental approaches for multifactorial chromatin profiling, highlight some of the experimental and analytical challenges, as well as some areas for further innovation.
Affiliation(s)
- Tim Stuart
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
| |
|
5
|
Li MM, Huang Y, Sumathipala M, Liang MQ, Valdeolivas A, Ananthakrishnan AN, Liao K, Marbach D, Zitnik M. Contextual AI models for single-cell protein biology. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.18.549602. [PMID: 37503080 PMCID: PMC10370131 DOI: 10.1101/2023.07.18.549602] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/29/2023]
Abstract
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here, we introduce Pinnacle, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multi-organ single-cell atlas, Pinnacle learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. Pinnacle's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. Pinnacle outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases, and pinpoints cell type contexts with higher predictive capability than context-free models. Pinnacle's ability to adjust its outputs based on the context in which it operates paves way for large-scale context-specific predictions in biology.
Affiliation(s)
- Michelle M. Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Yepeng Huang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Marissa Sumathipala
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Man Qing Liang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Alberto Valdeolivas
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Ashwin N. Ananthakrishnan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
| | - Katherine Liao
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA, USA
| | - Daniel Marbach
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
| |
|
6
|
Verhey TB, Seo H, Gillmor A, Thoppey-Manoharan V, Schriemer D, Morrissy S. mosaicMPI: a framework for modular data integration across cohorts and -omics modalities. Nucleic Acids Res 2024; 52:e53. [PMID: 38813827 PMCID: PMC11229337 DOI: 10.1093/nar/gkae442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 04/26/2024] [Accepted: 05/10/2024] [Indexed: 05/31/2024] Open
Abstract
Advances in molecular profiling have facilitated generation of large multi-modal datasets that can potentially reveal critical axes of biological variation underlying complex diseases. Distilling biological meaning, however, requires computational strategies that can perform mosaic integration across diverse cohorts and datatypes. Here, we present mosaicMPI, a framework for discovery of low to high-resolution molecular programs representing both cell types and states, and integration within and across datasets into a network representing biological themes. Using existing datasets in glioblastoma, we demonstrate that this approach robustly integrates single cell and bulk programs across multiple platforms. Clinical and molecular annotations from cohorts are statistically propagated onto this network of programs, yielding a richly characterized landscape of biological themes. This enables deep understanding of individual tumor samples, systematic exploration of relationships between modalities, and generation of a reference map onto which new datasets can rapidly be mapped. mosaicMPI is available at https://github.com/MorrissyLab/mosaicMPI.
Affiliation(s)
- Theodore B Verhey
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer institute, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
| | - Heewon Seo
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer institute, University of Calgary, Calgary, Alberta, Canada
| | - Aaron Gillmor
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer institute, University of Calgary, Calgary, Alberta, Canada
| | - Varsha Thoppey-Manoharan
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer institute, University of Calgary, Calgary, Alberta, Canada
| | - David Schriemer
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer institute, University of Calgary, Calgary, Alberta, Canada
| | - Sorana Morrissy
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer institute, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
| |
|
7
|
Rautenstrauch P, Ohler U. Liam tackles complex multimodal single-cell data integration challenges. Nucleic Acids Res 2024; 52:e52. [PMID: 38842910 PMCID: PMC11229356 DOI: 10.1093/nar/gkae409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 03/08/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024] Open
Abstract
Multi-omics characterization of single cells holds outstanding potential for profiling the dynamics and relations of gene regulatory states of thousands of cells. How to integrate multimodal data is an open problem, especially when aiming to combine data from multiple sources or conditions containing both biological and technical variation. We introduce liam, a flexible model for the simultaneous horizontal and vertical integration of paired single-cell multimodal data and mosaic integration of paired with unimodal data. Liam learns a joint low-dimensional representation of the measured modalities, which proves beneficial when the information content or quality of the modalities differ. Its integration accounts for complex batch effects using a tunable combination of conditional and adversarial training, which can be optimized using replicate information while retaining selected biological variation. We demonstrate liam's superior performance on multiple paired multimodal data types, including Multiome and CITE-seq data, and in mosaic integration scenarios. Our detailed benchmarking experiments illustrate the complexities and challenges remaining for integration and the meaningful assessment of its success.
Affiliation(s)
- Pia Rautenstrauch
- Humboldt-Universität zu Berlin, Department of Computer Science, 10099 Berlin, Germany
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
| | - Uwe Ohler
- Humboldt-Universität zu Berlin, Department of Computer Science, 10099 Berlin, Germany
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
- Humboldt-Universität zu Berlin, Department of Biology, 10099 Berlin, Germany
| |
|
8
|
Gondal MN, Shah SUR, Chinnaiyan AM, Cieslik M. A systematic overview of single-cell transcriptomics databases, their use cases, and limitations. FRONTIERS IN BIOINFORMATICS 2024; 4:1417428. [PMID: 39040140 PMCID: PMC11260681 DOI: 10.3389/fbinf.2024.1417428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Accepted: 06/11/2024] [Indexed: 07/24/2024] Open
Abstract
Rapid advancements in high-throughput single-cell RNA-seq (scRNA-seq) technologies and experimental protocols have led to the generation of vast amounts of transcriptomic data that populates several online databases and repositories. Here, we systematically examined large-scale scRNA-seq databases, categorizing them based on their scope and purpose such as general, tissue-specific databases, disease-specific databases, cancer-focused databases, and cell type-focused databases. Next, we discuss the technical and methodological challenges associated with curating large-scale scRNA-seq databases, along with current computational solutions. We argue that understanding scRNA-seq databases, including their limitations and assumptions, is crucial for effectively utilizing this data to make robust discoveries and identify novel biological insights. Such platforms can help bridge the gap between computational and wet lab scientists through user-friendly web-based interfaces needed for democratizing access to single-cell data. These platforms would facilitate interdisciplinary research, enabling researchers from various disciplines to collaborate effectively. This review underscores the importance of leveraging computational approaches to unravel the complexities of single-cell data and offers a promising direction for future research in the field.
Affiliation(s)
- Mahnoor N. Gondal
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI, United States
| | - Saad Ur Rehman Shah
- Gies College of Business, University of Illinois Business College, Champaign, MI, United States
| | - Arul M. Chinnaiyan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI, United States
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States
- Department of Urology, University of Michigan, Ann Arbor, MI, United States
- Howard Hughes Medical Institute, Ann Arbor, MI, United States
- University of Michigan Rogel Cancer Center, Ann Arbor, MI, United States
| | - Marcin Cieslik
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI, United States
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States
- University of Michigan Rogel Cancer Center, Ann Arbor, MI, United States
| |
|
9
|
Tasca P, van den Berg BM, Rabelink TJ, Wang G, Heijs B, van Kooten C, de Vries APJ, Kers J. Application of spatial-omics to the classification of kidney biopsy samples in transplantation. Nat Rev Nephrol 2024:10.1038/s41581-024-00861-x. [PMID: 38965417 DOI: 10.1038/s41581-024-00861-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/06/2024] [Indexed: 07/06/2024]
Abstract
Improvement of long-term outcomes through targeted treatment is a primary concern in kidney transplant medicine. Currently, the validation of a rejection diagnosis and subsequent treatment depends on the histological assessment of allograft biopsy samples, according to the Banff classification system. However, the lack of (early) disease-specific tissue markers hinders accurate diagnosis and thus timely intervention. This challenge mainly results from an incomplete understanding of the pathophysiological processes underlying late allograft failure. Integration of large-scale multimodal approaches for investigating allograft biopsy samples might offer new insights into this pathophysiology, which are necessary for the identification of novel therapeutic targets and the development of tailored immunotherapeutic interventions. Several omics technologies - including transcriptomic, proteomic, lipidomic and metabolomic tools (and multimodal data analysis strategies) - can be applied to allograft biopsy investigation. However, despite their successful application in research settings and their potential clinical value, several barriers limit the broad implementation of many of these tools into clinical practice. Among spatial-omics technologies, mass spectrometry imaging, which is under-represented in the transplant field, has the potential to enable multi-omics investigations that might expand the insights gained with current clinical analysis technologies.
Affiliation(s)
- Paola Tasca
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, the Netherlands
- Leiden Transplant Center, Leiden University Medical Center, Leiden, the Netherlands
- Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
| | - Bernard M van den Berg
- Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands
| | - Ton J Rabelink
- Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands
- The Novo Nordisk Foundation Center for Stem Cell Medicine (Renew), Leiden University Medical Center, Leiden, the Netherlands
| | - Gangqi Wang
- Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands
- The Novo Nordisk Foundation Center for Stem Cell Medicine (Renew), Leiden University Medical Center, Leiden, the Netherlands
| | - Bram Heijs
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, the Netherlands
- Bruker Daltonics GmbH & Co. KG, Bremen, Germany
| | - Cees van Kooten
- Leiden Transplant Center, Leiden University Medical Center, Leiden, the Netherlands
- Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands
| | - Aiko P J de Vries
- Leiden Transplant Center, Leiden University Medical Center, Leiden, the Netherlands.
- Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands.
| | - Jesper Kers
- Leiden Transplant Center, Leiden University Medical Center, Leiden, the Netherlands
- Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
- Department of Pathology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, the Netherlands
- Center for Analytical Sciences Amsterdam, Van't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam, the Netherlands
| |
|
10
|
Yu Y, Hou W, Liu Y, Wang H, Dong L, Mai Y, Chen Q, Li Z, Sun S, Yang J, Cao Z, Zhang P, Zi Y, Liu R, Gao J, Zhang N, Li J, Ren L, Jiang H, Shang J, Zhu S, Wang X, Qing T, Bao D, Li B, Li B, Suo C, Pi Y, Wang X, Dai F, Scherer A, Mattila P, Han J, Zhang L, Jiang H, Thierry-Mieg D, Thierry-Mieg J, Xiao W, Hong H, Tong W, Wang J, Li J, Fang X, Jin L, Xu J, Qian F, Zhang R, Shi L, Zheng Y. Quartet RNA reference materials improve the quality of transcriptomic data through ratio-based profiling. Nat Biotechnol 2024; 42:1118-1132. [PMID: 37679545 PMCID: PMC11251996 DOI: 10.1038/s41587-023-01867-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/28/2022] [Accepted: 06/15/2023] [Indexed: 09/09/2023]
Abstract
Certified RNA reference materials are indispensable for assessing the reliability of RNA sequencing to detect intrinsically small biological differences in clinical settings, such as molecular subtyping of diseases. As part of the Quartet Project for quality control and data integration of multi-omics profiling, we established four RNA reference materials derived from immortalized B-lymphoblastoid cell lines from four members of a monozygotic twin family. Additionally, we constructed ratio-based transcriptome-wide reference datasets between two samples, providing cross-platform and cross-laboratory 'ground truth'. Investigation of the intrinsically subtle biological differences among the Quartet samples enables sensitive assessment of cross-batch integration of transcriptomic measurements at the ratio level. The Quartet RNA reference materials, combined with the ratio-based reference datasets, can serve as unique resources for assessing and improving the quality of transcriptomic data in clinical and biological settings.
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Haiyan Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | | | - Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Zhihui Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Shanyue Sun
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, China
| | - Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peipei Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yi Zi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ruimei Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jian Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, China
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - He Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Sibo Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Xiaolin Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Tao Qing
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ding Bao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Bingying Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Bin Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Chen Suo
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yan Pi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Xia Wang
- National Institute of Metrology, Beijing, China
| | | | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, The Netherlands
| | - Pirkko Mattila
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, The Netherlands
| | | | - Lijun Zhang
- Nanjing Vazyme Biotech Co. Ltd., Nanjing, China
| | | | - Danielle Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Jean Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- National Center of Gerontology, Beijing, China
| | - Xiang Fang
- National Institute of Metrology, Beijing, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA.
| | - Feng Qian
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China.
- Shanghai Public Health Clinical Center, Fudan University, Shanghai, China.
| | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China.
- National Center of Gerontology, Beijing, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China.
- International Human Phenome Institutes, Shanghai, China.
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China.
|
11
|
Zheng Y, Liu Y, Yang J, Dong L, Zhang R, Tian S, Yu Y, Ren L, Hou W, Zhu F, Mai Y, Han J, Zhang L, Jiang H, Lin L, Lou J, Li R, Lin J, Liu H, Kong Z, Wang D, Dai F, Bao D, Cao Z, Chen Q, Chen Q, Chen X, Gao Y, Jiang H, Li B, Li B, Li J, Liu R, Qing T, Shang E, Shang J, Sun S, Wang H, Wang X, Zhang N, Zhang P, Zhang R, Zhu S, Scherer A, Wang J, Wang J, Huo Y, Liu G, Cao C, Shao L, Xu J, Hong H, Xiao W, Liang X, Lu D, Jin L, Tong W, Ding C, Li J, Fang X, Shi L. Multi-omics data integration using ratio-based quantitative profiling with Quartet reference materials. Nat Biotechnol 2024; 42:1133-1149. [PMID: 37679543 PMCID: PMC11252085 DOI: 10.1038/s41587-023-01934-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/25/2022] [Accepted: 07/31/2023] [Indexed: 09/09/2023]
Abstract
Characterization and integration of the genome, epigenome, transcriptome, proteome and metabolome of different datasets is difficult owing to a lack of ground truth. Here we develop and characterize suites of publicly available multi-omics reference materials of matched DNA, RNA, protein and metabolites derived from immortalized cell lines from a family quartet of parents and monozygotic twin daughters. These references provide built-in truth defined by relationships among the family members and the information flow from DNA to RNA to protein. We demonstrate how using a ratio-based profiling approach that scales the absolute feature values of a study sample relative to those of a concurrently measured common reference sample produces reproducible and comparable data suitable for integration across batches, labs, platforms and omics types. Our study identifies reference-free 'absolute' feature quantification as the root cause of irreproducibility in multi-omics measurement and data integration and establishes the advantages of ratio-based multi-omics profiling with common reference materials.
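The ratio-based idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `ratio_profile`, the log2 transform, the optional pseudocount, and the toy numbers are all assumptions made for illustration. It only shows why dividing a study sample by a common reference measured in the same batch cancels a batch-wide multiplicative shift.

```python
import numpy as np


def ratio_profile(study, reference, pseudocount=0.0):
    """Log2 ratio of study-sample features to a common reference
    sample measured in the same batch (hypothetical helper)."""
    study = np.asarray(study, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.log2((study + pseudocount) / (reference + pseudocount))


# Toy data (invented): the same study and reference samples measured in
# two batches, where batch B carries a 4x multiplicative scale shift.
batch_a_study = np.array([100.0, 200.0, 50.0])
batch_a_ref = np.array([50.0, 200.0, 100.0])
scale = 4.0
batch_b_study = batch_a_study * scale
batch_b_ref = batch_a_ref * scale

ratios_a = ratio_profile(batch_a_study, batch_a_ref)
ratios_b = ratio_profile(batch_b_study, batch_b_ref)

# Absolute values disagree across batches, but the ratios agree.
print(np.allclose(batch_a_study, batch_b_study))
print(np.allclose(ratios_a, ratios_b))
```

In this toy case the batch effect is purely multiplicative, so it cancels exactly; real batch effects are messier, and a nonzero pseudocount (useful for zero counts) makes the cancellation only approximate.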
Affiliation(s)
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Sha Tian
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Feng Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | | | | | | | - Ling Lin
- Zhangjiang Center for Translational Medicine, Shanghai Biotecan Medical Diagnostics Co. Ltd., Shanghai, China
| | - Jingwei Lou
- Zhangjiang Center for Translational Medicine, Shanghai Biotecan Medical Diagnostics Co. Ltd., Shanghai, China
| | - Ruiqiang Li
- Novogene Bioinformatics Institute, Beijing, China
| | - Jingchao Lin
- Metabo-Profile Biotechnology (Shanghai) Co. Ltd., Shanghai, China
| | | | | | - Depeng Wang
- Nextomics Biosciences Institute, Wuhan, China
| | | | - Ding Bao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Qiaochu Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Xingdong Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - He Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Bin Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Bingying Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, China
| | - Ruimei Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Tao Qing
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Erfei Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Shanyue Sun
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Haiyan Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Xiaolin Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peipei Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ruolan Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Sibo Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Jiucun Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jing Wang
- National Institute of Metrology, Beijing, China
| | - Yinbo Huo
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Gang Liu
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Chengming Cao
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Li Shao
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Xiaozhen Liang
- Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China
| | - Daru Lu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Weida Tong
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Chen Ding
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China.
| | - Xiang Fang
- National Institute of Metrology, Beijing, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
- International Human Phenome Institutes (Shanghai), Shanghai, China.
|
12
|
Chen S, Zhu B, Huang S, Hickey JW, Lin KZ, Snyder M, Greenleaf WJ, Nolan GP, Zhang NR, Ma Z. Integration of spatial and single-cell data across modalities with weakly linked features. Nat Biotechnol 2024; 42:1096-1106. [PMID: 37679544 DOI: 10.1038/s41587-023-01935-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/01/2023] [Accepted: 08/02/2023] [Indexed: 09/09/2023]
Abstract
Although single-cell and spatial sequencing methods enable simultaneous measurement of more than one biological modality, no technology can capture all modalities within the same cell. For current data integration methods, the feasibility of cross-modal integration relies on the existence of highly correlated, a priori 'linked' features. We describe matching X-modality via fuzzy smoothed embedding (MaxFuse), a cross-modal data integration method that, through iterative coembedding, data smoothing and cell matching, uses all information in each modality to obtain high-quality integration even when features are weakly linked. MaxFuse is modality-agnostic and demonstrates high robustness and accuracy in the weak linkage scenario, achieving 20~70% relative improvement over existing methods under key evaluation metrics on benchmarking datasets. A prototypical example of weak linkage is the integration of spatial proteomic data with single-cell sequencing data. On two example analyses of this type, MaxFuse enabled the spatial consolidation of proteomic, transcriptomic and epigenomic information at single-cell resolution on the same tissue section.
Affiliation(s)
- Shuxiao Chen
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
| | - Bokai Zhu
- Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Sijia Huang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
| | - John W Hickey
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Kevin Z Lin
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Michael Snyder
- Department of Genetics, Stanford University, Stanford, CA, USA
| | | | - Garry P Nolan
- Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA.
- Department of Pathology, Stanford University, Stanford, CA, USA.
| | - Nancy R Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA.
| | - Zongming Ma
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA.
|
13
|
Ma Y, Zhou X. Accurate and efficient integrative reference-informed spatial domain detection for spatial transcriptomics. Nat Methods 2024; 21:1231-1244. [PMID: 38844627 DOI: 10.1038/s41592-024-02284-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 04/18/2024] [Indexed: 06/23/2024]
Abstract
Spatially resolved transcriptomics (SRT) studies are becoming increasingly common and large, offering unprecedented opportunities in mapping complex tissue structures and functions. Here we present integrative and reference-informed tissue segmentation (IRIS), a computational method designed to characterize tissue spatial organization in SRT studies through accurately and efficiently detecting spatial domains. IRIS uniquely leverages single-cell RNA sequencing data for reference-informed detection of biologically interpretable spatial domains, integrating multiple SRT slices while explicitly considering correlations both within and across slices. We demonstrate the advantages of IRIS through in-depth analysis of six SRT datasets encompassing diverse technologies, tissues, species and resolutions. In these applications, IRIS achieves substantial accuracy gains (39-1,083%) and speed improvements (4.6-666.0-fold) in moderate-sized datasets, while representing the only method applicable for large datasets including Stereo-seq and 10x Xenium. As a result, IRIS reveals intricate brain structures, uncovers tumor microenvironment heterogeneity and detects structural changes in diabetes-affected testis, all with exceptional speed and accuracy.
Affiliation(s)
- Ying Ma
- Department of Biostatistics, Brown University, Providence, RI, USA
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
|
14
|
Chen R, Nie P, Wang J, Wang GZ. Deciphering brain cellular and behavioral mechanisms: Insights from single-cell and spatial RNA sequencing. Wiley Interdisciplinary Reviews. RNA 2024; 15:e1865. [PMID: 38972934 DOI: 10.1002/wrna.1865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 05/05/2024] [Accepted: 05/14/2024] [Indexed: 07/09/2024]
Abstract
The brain is a complex computing system composed of a multitude of interacting neurons. The computational outputs of this system determine the behavior and perception of every individual. Each brain cell expresses thousands of genes that dictate the cell's function and physiological properties. Therefore, deciphering the molecular expression of each cell is of great significance for understanding its characteristics and role in brain function. Additionally, the positional information of each cell can provide crucial insights into their involvement in local brain circuits. In this review, we briefly overview the principles of single-cell RNA sequencing and spatial transcriptomics, the potential issues and challenges in their data processing, and their applications in brain research. We further outline several promising directions in neuroscience that could be integrated with single-cell RNA sequencing, including neurodevelopment, the identification of novel brain microstructures, cognition and behavior, neuronal cell positioning, molecules and cells related to advanced brain functions, sleep-wake cycles/circadian rhythms, and computational modeling of brain function. We believe that the deep integration of these directions with single-cell and spatial RNA sequencing can contribute significantly to understanding the roles of individual cells or cell types in these specific functions, thereby making important contributions to addressing critical questions in those fields. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA in Disease and Development > RNA in Development; RNA in Disease and Development > RNA in Disease.
Affiliation(s)
- Renrui Chen
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Pengxing Nie
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Jing Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Guang-Zhong Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
|
15
|
MaxFuse enables data integration across weakly linked spatial and single-cell modalities. Nat Biotechnol 2024; 42:1036-1037. [PMID: 37679547 DOI: 10.1038/s41587-023-01943-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
|
16
|
Liu J, Ma J, Wen J, Zhou X. A Cell Cycle-Aware Network for Data Integration and Label Transferring of Single-Cell RNA-Seq and ATAC-Seq. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024:e2401815. [PMID: 38887194 DOI: 10.1002/advs.202401815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/22/2024] [Indexed: 06/20/2024]
Abstract
In recent years, the integration of single-cell multi-omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms from a non-single omics perspective, but it still suffers many challenges, such as omics-variance, sparsity, cell heterogeneity, and confounding factors. As it is known, the cell cycle is regarded as a confounder when analyzing other factors in single-cell RNA-seq data, but it is not clear how it will work on the integrated single-cell multi-omics data. Here, a cell cycle-aware network (CCAN) is developed to remove cell cycle effects from the integrated single-cell multi-omics data while keeping the cell type-specific variations. This is the first computational model to study the cell-cycle effects in the integration of single-cell multi-omics data. Validations on several benchmark datasets show the outstanding performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects of scRNA-seq datasets from different protocols, integrating paired and unpaired scRNA-seq and scATAC-seq data, accurately transferring cell type labels from scRNA-seq to scATAC-seq data, and characterizing the differentiation process from hematopoietic stem cells to different lineages in the integration of differentiation data.
Affiliation(s)
- Jiajia Liu
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Jian Ma
- Department of Electronic Information and Computer Engineering, The Engineering & Technical College of Chengdu University of Technology, Leshan, Sichuan, 614000, China
| | - Jianguo Wen
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Xiaobo Zhou
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
|
17
|
Curion F, Theis FJ. Machine learning integrative approaches to advance computational immunology. Genome Med 2024; 16:80. [PMID: 38862979 PMCID: PMC11165829 DOI: 10.1186/s13073-024-01350-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 05/23/2024] [Indexed: 06/13/2024] Open
Abstract
The study of immunology, traditionally reliant on proteomics to evaluate individual immune cells, has been revolutionized by single-cell RNA sequencing. Computational immunologists play a crucial role in analysing these datasets, moving beyond traditional protein marker identification to encompass a more detailed view of cellular phenotypes and their functional roles. Recent technological advancements allow the simultaneous measurements of multiple cellular components (transcriptome, proteome, chromatin, epigenetic modifications and metabolites) within single cells, including in spatial contexts within tissues. This has led to the generation of complex multiscale datasets that can include multimodal measurements from the same cells or a mix of paired and unpaired modalities. Modern machine learning (ML) techniques allow for the integration of multiple "omics" data without the need for extensive independent modelling of each modality. This review focuses on recent advancements in ML integrative approaches applied to immunological studies. We highlight the importance of these methods in creating a unified representation of multiscale data collections, particularly for single-cell and spatial profiling technologies. Finally, we discuss the challenges of these holistic approaches and how they will be instrumental in the development of a common coordinate framework for multiscale studies, thereby accelerating research and enabling discoveries in the computational immunology field.
Affiliation(s)
- Fabiola Curion
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
| |
|
18
|
Gerniers A, Nijssen S, Dupont P. scCross: efficient search for rare subpopulations across multiple single-cell samples. Bioinformatics 2024; 40:btae371. [PMID: 38889273 PMCID: PMC11256925 DOI: 10.1093/bioinformatics/btae371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/21/2024] [Accepted: 06/11/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION Identifying rare cell types is an important task to capture the heterogeneity of single-cell data, such as scRNA-seq. The widespread availability of such data makes it possible to aggregate multiple samples, corresponding for example to different donors, into the same study. Yet, such aggregated data is often subject to batch effects between samples. Clustering it therefore generally requires the use of data integration methods, which can lead to overcorrection, making the identification of rare cells difficult. We present scCross, a biclustering method identifying rare subpopulations of cells present across multiple single-cell samples. It jointly identifies a group of cells with specific marker genes by relying on a global sum criterion, computed over the entire subpopulation of cells, rather than pairwise comparisons between individual cells. This proves robust with respect to the high variability of scRNA-seq data, in particular batch effects. RESULTS We show through several case studies that scCross is able to identify rare subpopulations across multiple samples without performing prior data integration. Namely, it identifies a cilium subpopulation with potential new ciliary genes from lung cancer cells, which is not detected by typical alternatives. It also highlights rare subpopulations in human pancreas samples sequenced with different protocols, despite visible shifts in expression levels between batches. We further show that scCross outperforms typical alternatives at identifying a target rare cell type in a controlled experiment with artificially created batch effects. This shows the ability of scCross to efficiently identify rare cell subpopulations characterized by specific genes despite the presence of batch effects. AVAILABILITY AND IMPLEMENTATION The R and Scala implementation of scCross is freely available on GitHub, at https://github.com/agerniers/scCross/.
A snapshot of the code and the data underlying this article are available on Zenodo, at https://zenodo.org/doi/10.5281/zenodo.10471063.
Affiliation(s)
- Alexander Gerniers
- ICTEAM/INGI/Artificial Intelligence and Algorithms Group, UCLouvain, Louvain-la-Neuve 1348, Belgium
| | - Siegfried Nijssen
- ICTEAM/INGI/Artificial Intelligence and Algorithms Group, UCLouvain, Louvain-la-Neuve 1348, Belgium
| | - Pierre Dupont
- ICTEAM/INGI/Artificial Intelligence and Algorithms Group, UCLouvain, Louvain-la-Neuve 1348, Belgium
| |
|
19
|
Boldrini M, Xiao Y, Sing T, Zhu C, Jabbi M, Pantazopoulos H, Gürsoy G, Martinowich K, Punzi G, Vallender EJ, Zody M, Berretta S, Hyde TM, Kleinman JE, Marenco S, Roussos P, Lewis DA, Turecki G, Lehner T, Mann JJ. Omics Approaches to Investigate the Pathogenesis of Suicide. Biol Psychiatry 2024:S0006-3223(24)01352-0. [PMID: 38821194 DOI: 10.1016/j.biopsych.2024.05.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/17/2024] [Accepted: 05/23/2024] [Indexed: 06/02/2024]
Abstract
Suicide is the second leading cause of death in U.S. adolescents and young adults and is generally associated with a psychiatric disorder. Suicidal behavior has a complex etiology and pathogenesis. Moderate heritability suggests genetic causes. Associations between childhood and recent life adversity indicate contributions from epigenetic factors. Genomic contributions to suicide pathogenesis remain largely unknown. This article is based on a workshop held to design strategies to identify molecular drivers of suicide neurobiology that would be putative new treatment targets. The panel determined that while bulk tissue studies provide comprehensive information, single-nucleus approaches that identify cell type-specific changes are needed. While single-nuclei techniques lack information on cytoplasm, processes, spines, and synapses, spatial multiomic technologies on intact tissue detect cell alterations specific to brain tissue layers and subregions. Because suicide has genetic and environmental drivers, multiomic approaches that combine cell type-specific epigenome, transcriptome, and proteome provide a more complete picture of pathogenesis. To determine the direction of effect of suicide risk gene variants on RNA and protein expression and how these interact with epigenetic marks, single-nuclei and spatial multiomics quantitative trait loci maps should be integrated with whole-genome sequencing and genome-wide association databases. The workshop concluded with a recommendation for the formation of an international suicide biology consortium that will bring together brain banks and investigators with expertise in cutting-edge omics technologies to delineate the biology of suicide and identify novel potential treatment targets to be tested in cellular and animal models for drug and biomarker discovery to guide suicide prevention.
Affiliation(s)
- Maura Boldrini
- Department of Psychiatry, Columbia University, New York, New York; Division of Molecular Imaging and Neuropathology, New York State Psychiatric Institute, New York, New York.
| | - Yang Xiao
- Department of Biomedical Engineering, Columbia University, New York, New York
| | - Tarjinder Sing
- Department of Psychiatry, Columbia University, New York, New York; Division of Molecular Imaging and Neuropathology, New York State Psychiatric Institute, New York, New York; New York Genome Center, New York, New York
| | - Chenxu Zhu
- New York Genome Center, New York, New York; Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York
| | - Mbemba Jabbi
- Department of Psychiatry and Behavioral Sciences, Mulva Clinics for the Neurosciences, Dell Medical School, The University of Texas at Austin, Austin, Texas
| | - Harry Pantazopoulos
- Department of Psychiatry and Human Behavior, University of Mississippi Medical Center, Jackson, Mississippi
| | - Gamze Gürsoy
- New York Genome Center, New York, New York; Departments of Biomedical Informatics and Computer Science, Columbia University, New York, New York
| | - Keri Martinowich
- Lieber Institute for Brain Development, Department of Psychiatry and Behavioral Sciences, Baltimore, Maryland
| | - Giovanna Punzi
- Lieber Institute for Brain Development, Department of Psychiatry and Behavioral Sciences, Baltimore, Maryland
| | - Eric J Vallender
- Department of Psychiatry and Human Behavior, University of Mississippi Medical Center, Jackson, Mississippi
| | | | - Sabina Berretta
- Department of Psychiatry, Harvard Brain Tissue Resource Center, Harvard Medical School, McLean Hospital, Belmont, Massachusetts
| | - Thomas M Hyde
- Lieber Institute for Brain Development, Department of Psychiatry and Behavioral Sciences, Baltimore, Maryland
| | - Joel E Kleinman
- Lieber Institute for Brain Development, Department of Psychiatry and Behavioral Sciences, Baltimore, Maryland
| | - Stefano Marenco
- Human Brain Collection Core, National Institute of Mental Health's (NIMH) Division of Intramural Research Programs, Bethesda, Maryland
| | - Panagiotis Roussos
- Center for Precision Medicine and Translational Therapeutics, Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, New York
| | - David A Lewis
- Departments of Psychiatry and Neuroscience, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Gustavo Turecki
- Department of Psychiatry, Douglas Institute, McGill University, Montréal, Québec, Canada
| | | | - J John Mann
- Department of Psychiatry, Columbia University, New York, New York; Division of Molecular Imaging and Neuropathology, New York State Psychiatric Institute, New York, New York
| |
|
20
|
Rodosthenous T, Shahrezaei V, Evangelou M. Multi-view data visualisation via manifold learning. PeerJ Comput Sci 2024; 10:e1993. [PMID: 38855253 PMCID: PMC11157621 DOI: 10.7717/peerj-cs.1993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 03/25/2024] [Indexed: 06/11/2024]
Abstract
Non-linear dimensionality reduction can be performed by manifold learning approaches, such as stochastic neighbour embedding (SNE), locally linear embedding (LLE) and isometric feature mapping (ISOMAP). These methods aim to produce two or three latent embeddings, primarily to visualise the data in intelligible representations. This manuscript proposes extensions of Student's t-distributed SNE (t-SNE), LLE and ISOMAP, for dimensionality reduction and visualisation of multi-view data. Multi-view data refers to multiple types of data generated from the same samples. The proposed multi-view approaches provide more comprehensible projections of the samples compared to the ones obtained by visualising each data-view separately. Commonly, visualisation is used for identifying underlying patterns within the samples. By incorporating the obtained low-dimensional embeddings from the multi-view manifold approaches into the K-means clustering algorithm, it is shown that clusters of the samples are accurately identified. Through extensive comparisons of novel and existing multi-view manifold learning algorithms on real and synthetic data, the proposed multi-view extension of t-SNE, named multi-SNE, is found to have the best performance, quantified both qualitatively and quantitatively by assessing the clusterings obtained. The applicability of multi-SNE is illustrated by its implementation in the newly developed and challenging multi-omics single-cell data. The aim is to visualise and identify cell heterogeneity and cell types in biological tissues relevant to health and disease. In this application, multi-SNE provides an improved performance over single-view manifold learning approaches and a promising solution for unified clustering of multi-omics single-cell data.
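The clustering step mentioned in this abstract, in which low-dimensional embeddings are passed to K-means, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' multi-SNE code: the `embedding` values are hypothetical, the `kmeans` helper is written here for illustration, and initialising centroids from the first `k` points is a simplifying assumption made only to keep the example deterministic.

```python
def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm on a list of equal-length tuples."""
    # Naive, deterministic initialisation (an assumption for this sketch).
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D embedding: two well-separated groups of three samples each.
embedding = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
             (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(embedding, k=2)
```

In practice the embedding would come from a manifold learning method and the initialisation would usually be randomised or use a scheme such as k-means++.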
Affiliation(s)
| | - Vahid Shahrezaei
- Department of Mathematics, Imperial College London, London, United Kingdom
| | - Marina Evangelou
- Department of Mathematics, Imperial College London, London, United Kingdom
| |
|
21
|
Lotfollahi M, Hao Y, Theis FJ, Satija R. The future of rapid and automated single-cell data analysis using reference mapping. Cell 2024; 187:2343-2358. [PMID: 38729109 PMCID: PMC11184658 DOI: 10.1016/j.cell.2024.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 03/05/2024] [Accepted: 03/08/2024] [Indexed: 05/12/2024]
Abstract
As the number of single-cell datasets continues to grow rapidly, workflows that map new data to well-curated reference atlases offer enormous promise for the biological community. In this perspective, we discuss key computational challenges and opportunities for single-cell reference-mapping algorithms. We discuss how mapping algorithms will enable the integration of diverse datasets across disease states, molecular modalities, genetic perturbations, and diverse species and will eventually replace manual and laborious unsupervised clustering pipelines.
Affiliation(s)
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Yuhan Hao
- Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK; Department of Mathematics, Technical University of Munich, Garching, Germany.
| | - Rahul Satija
- Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA.
| |
|
22
|
Giansanti V, Giannese F, Botrugno OA, Gandolfi G, Balestrieri C, Antoniotti M, Tonon G, Cittaro D. Scalable integration of multiomic single-cell data using generative adversarial networks. Bioinformatics 2024; 40:btae300. [PMID: 38696763 DOI: 10.1093/bioinformatics/btae300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 03/22/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION Single-cell profiling has become a common practice to investigate the complexity of tissues, organs, and organisms. Recent technological advances are expanding our capabilities to profile various molecular layers beyond the transcriptome such as, but not limited to, the genome, the epigenome, and the proteome. Depending on the experimental procedure, these data can be obtained from separate assays or the very same cells. Yet, integration of more than two assays is currently not supported by the majority of the computational frameworks available. RESULTS We here propose a Multi-Omic data integration framework based on Wasserstein Generative Adversarial Networks suitable for the analysis of paired or unpaired data with a high number of modalities (>2). At the core of our strategy is a single network trained on all modalities together, limiting the computational burden when many molecular layers are evaluated. AVAILABILITY AND IMPLEMENTATION Source code of our framework is available at https://github.com/vgiansanti/MOWGAN.
Affiliation(s)
- Valentina Giansanti
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, 20125, Italy
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
| | - Francesca Giannese
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
| | - Oronza A Botrugno
- Functional Genomics of Cancer Unit, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Università Vita-Salute San Raffaele, Milan, 20132, Italy
| | - Giorgia Gandolfi
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
| | - Chiara Balestrieri
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Experimental Hematology Unit, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
| | - Marco Antoniotti
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, 20125, Italy
- Bicocca Bioinformatics Biostatistics and Bioimaging Centre-B4, Università degli Studi di Milano-Bicocca, Milan, 20125, Italy
- Istituto di Bioimmagini e Fisiologia Molecolare, Consiglio Nazionale delle Ricerche (CNR), Milan, 20090, Italy
| | - Giovanni Tonon
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Functional Genomics of Cancer Unit, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Università Vita-Salute San Raffaele, Milan, 20132, Italy
| | - Davide Cittaro
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
| |
|
23
|
Xu J, Huang D, Zhang X. scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer. Adv Sci (Weinh) 2024; 11:e2307835. [PMID: 38483032 PMCID: PMC11109621 DOI: 10.1002/advs.202307835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/24/2024] [Indexed: 05/23/2024]
Abstract
Transformer-based models have revolutionized single-cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here a novel single-cell multi-modal/multi-task transformer (scmFormer) is proposed to fill the existing gap in integrating single-cell proteomics with other omics data. Through systematic benchmarking, it is demonstrated that scmFormer excels in integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving shared information across batches and distinct biological information. scmFormer achieves a 54.5% higher average F1 score than the second-best method in transferring cell-type labels from single-cell transcriptomics to proteomics data. Using COVID-19 datasets, it is shown that scmFormer successfully integrates over 1.48 million cells on a personal computer. Moreover, it is also shown that scmFormer performs better than existing methods at generating the unmeasured modality and is well-suited for spatial multi-omic data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.
Affiliation(s)
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - De‐Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, 315200, China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China
| |
|
24
|
Cui X, Chen X, Li Z, Gao Z, Chen S, Jiang R. Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity. Nat Comput Sci 2024; 4:346-359. [PMID: 38730185 DOI: 10.1038/s43588-024-00625-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 04/05/2024] [Indexed: 05/12/2024]
Abstract
Single-cell epigenomic data has been growing continuously at an unprecedented pace, but its characteristics such as high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models, especially variational autoencoders, have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE's capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively.
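The idea behind a discrete latent embedding in a vector-quantized autoencoder is that each continuous latent vector is replaced by its nearest entry in a learned codebook. The sketch below shows only that nearest-codebook lookup, in plain Python. It is not CASTLE's implementation: the `quantize` helper, the `codebook` and the `latents` are hypothetical values chosen for illustration, and in a trained model both the encoder outputs and the codebook would be learned.

```python
def quantize(latents, codebook):
    """Map each continuous latent vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda j: sq_dist(z, codebook[j]))
        for z in latents
    ]


# Hypothetical codebook with three 2-D code vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
# Hypothetical encoder outputs for three cells.
latents = [(0.1, -0.2), (0.9, 1.2), (-0.8, 0.7)]

codes = quantize(latents, codebook)        # discrete code index per cell
quantized = [codebook[j] for j in codes]   # vectors a decoder would receive
```

The integer `codes` are what make the representation discrete: cells that share a code share the same quantized latent vector.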
Affiliation(s)
- Xuejian Cui
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Zhen Li
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Zijing Gao
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China.
| |
|
25
|
Hu P, Rychik J, Zhao J, Bai H, Bauer A, Yu W, Rand EB, Dodds KM, Goldberg DJ, Tan K, Wilkins BJ, Pei L. Single-cell multiomics guided mechanistic understanding of Fontan-associated liver disease. Sci Transl Med 2024; 16:eadk6213. [PMID: 38657025 PMCID: PMC11103255 DOI: 10.1126/scitranslmed.adk6213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 04/02/2024] [Indexed: 04/26/2024]
Abstract
The Fontan operation is the current standard of care for single-ventricle congenital heart disease. Individuals with a Fontan circulation (FC) exhibit central venous hypertension and face life-threatening complications of hepatic fibrosis, known as Fontan-associated liver disease (FALD). The fundamental biology and mechanisms of FALD are little understood. Here, we generated a transcriptomic and epigenomic atlas of human FALD at single-cell resolution using multiomic snRNA-ATAC-seq. We found profound cell type-specific transcriptomic and epigenomic changes in FC livers. Central hepatocytes (cHep) exhibited the most substantial changes, featuring profound metabolic reprogramming. These cHep changes preceded substantial activation of hepatic stellate cells and liver fibrosis, suggesting cHep as a potential first "responder" in the pathogenesis of FALD. We also identified a network of ligand-receptor pairs that transmit signals from cHep to hepatic stellate cells, which may promote their activation and liver fibrosis. We further experimentally demonstrated that activins A and B promote fibrotic activation in vitro and identified mechanisms of activin A's transcriptional activation in FALD. Together, our single-cell transcriptomic and epigenomic atlas revealed mechanistic insights into the pathogenesis of FALD and may aid identification of potential therapeutic targets.
Affiliation(s)
- Po Hu
- Center for Mitochondrial and Epigenomic Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Cardiovascular Institute, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| | - Jack Rychik
- Department of Pediatrics, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
| | - Juanjuan Zhao
- Center for Mitochondrial and Epigenomic Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Cardiovascular Institute, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
| | - Huajun Bai
- Center for Mitochondrial and Epigenomic Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Cardiovascular Institute, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
| | - Aidan Bauer
- Center for Mitochondrial and Epigenomic Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Cardiovascular Institute, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| | - Wenbao Yu
- Center for Childhood Cancer Research, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| | - Elizabeth B. Rand
- Department of Pediatrics, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
| | - Kathryn M. Dodds
- Department of Pediatrics, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- School of Nursing, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| | - David J. Goldberg
- Department of Pediatrics, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
| | - Kai Tan
- Center for Childhood Cancer Research, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| | - Benjamin J. Wilkins
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
| | - Liming Pei
- Center for Mitochondrial and Epigenomic Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Cardiovascular Institute, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia; Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
- Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
- Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA 19104, USA
| |
|
26
|
Cao Y, Zhao X, Tang S, Jiang Q, Li S, Li S, Chen S. scButterfly: a versatile single-cell cross-modality translation method via dual-aligned variational autoencoders. Nat Commun 2024; 15:2973. [PMID: 38582890 PMCID: PMC10998864 DOI: 10.1038/s41467-024-47418-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 03/28/2024] [Indexed: 04/08/2024] Open
Abstract
Recent advancements for simultaneously profiling multi-omics modalities within individual cells have enabled the interrogation of cellular heterogeneity and molecular hierarchy. However, technical limitations lead to highly noisy multi-modal data and substantial costs. Although computational methods have been proposed to translate single-cell data across modalities, broad applications of the methods still remain impeded by formidable challenges. Here, we propose scButterfly, a versatile single-cell cross-modality translation method based on dual-aligned variational autoencoders and data augmentation schemes. With comprehensive experiments on multiple datasets, we provide compelling evidence of scButterfly's superiority over baseline methods in preserving cellular heterogeneity while translating datasets of various contexts and in revealing cell type-specific biological insights. Besides, we demonstrate the extensive applications of scButterfly for integrative multi-omics analysis of single-modality data, data enhancement of poor-quality single-cell multi-omics, and automatic cell type annotation of scATAC-seq data. Moreover, scButterfly can be generalized to unpaired data training, perturbation-response analysis, and consecutive translation.
Affiliation(s)
- Yichuan Cao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Xiamiao Zhao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Qun Jiang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
| | - Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Siyu Li
- School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
| |
|
27
|
Koca MB, Sevilgen FE. Integration of single-cell proteomic datasets through distinctive proteins in cell clusters. Proteomics 2024; 24:e2300282. [PMID: 38135888 DOI: 10.1002/pmic.202300282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/01/2023] [Accepted: 12/04/2023] [Indexed: 12/24/2023]
Abstract
The use of mass spectrometry and antibody-based sequencing technologies at the single-cell level has led to an increase in single-cell proteomic datasets. Integrating these datasets is crucial to eliminate the batch effect that often arises due to their limited sequencing molecules. Although methods for horizontally integrating high-dimensional single-cell transcriptomic datasets can also be applied to single-cell proteomic datasets, a specialized approach explicitly tailored for low-dimensional proteomic datasets may enhance the integration process. Here, we introduce SCPRO-HI, an algorithm for the horizontal integration of antibody-based single-cell proteomic datasets. It utilizes a hierarchical cell anchoring technique to match cells based on the similarity of distinctive proteins for constituting cell clusters. A novel variational auto-encoder model is employed for correcting batch effects on the protein abundances, eliminating the need for mapping them into a new domain. Moreover, we propose a technique for extending the algorithm to high-dimensional datasets. The performance of the SCPRO-HI algorithm is evaluated using simulated and real-world single-cell proteomic datasets. The findings demonstrate our algorithm outperforms state-of-the-art methods, achieving a 75% higher silhouette score while preserving HVPs 13% better. Furthermore, the algorithm shows competitive performance in transcriptomic datasets, suggesting potential for integrating high-dimensional mass-spectrometry-based proteomic datasets.
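The silhouette score cited in this abstract measures how well each cell sits inside its assigned cluster: for a point with mean intra-cluster distance a and smallest mean distance b to any other cluster, the coefficient is (b - a) / max(a, b), and the score is the mean over all points. The sketch below computes it in plain Python on hypothetical toy points. It illustrates the standard definition only and is not the evaluation code used for SCPRO-HI; the `silhouette_score` helper and the data are assumptions made for this example.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct labels."""
    clusters = set(labels)
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        same = [math.dist(p, q)
                for j, (q, other_lab) in enumerate(zip(points, labels))
                if other_lab == lab and j != i]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q)
                for q, other_lab in zip(points, labels) if other_lab == other)
            / labels.count(other)
            for other in clusters if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight pairs of points far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the geometry
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate points that lie closer to another cluster than to their own.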
Affiliation(s)
- Mehmet Burak Koca
- Computer Engineering Department, Gebze Technical University, Kocaeli, Türkiye
- Fatih Erdoğan Sevilgen
- Institute for Data Science and Artificial Intelligence, Boğaziçi University, İstanbul, Türkiye
|
28
|
Li Z, Brittan M, Mills NL. A Multimodal Omics Framework to Empower Target Discovery for Cardiovascular Regeneration. Cardiovasc Drugs Ther 2024; 38:223-236. [PMID: 37421484 PMCID: PMC10959818 DOI: 10.1007/s10557-023-07484-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/19/2023] [Indexed: 07/10/2023]
Abstract
Ischaemic heart disease is a global healthcare challenge with high morbidity and mortality. Early revascularisation in acute myocardial infarction has improved survival; however, limited regenerative capacity and microvascular dysfunction often lead to impaired function and the development of heart failure. New mechanistic insights are required to identify robust targets for the development of novel strategies to promote regeneration. Single-cell RNA sequencing (scRNA-seq) has enabled profiling and analysis of the transcriptomes of individual cells at high resolution. Applications of scRNA-seq have generated single-cell atlases for multiple species, revealed distinct cellular compositions for different regions of the heart, and defined multiple mechanisms involved in myocardial injury-induced regeneration. In this review, we summarise findings from studies of healthy and injured hearts in multiple species and spanning different developmental stages. Based on this transformative technology, we propose a multi-species, multi-omics, meta-analysis framework to drive the discovery of new targets to promote cardiovascular regeneration.
Affiliation(s)
- Ziwen Li
- BHF Centre for Cardiovascular Science, The Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK.
- Mairi Brittan
- BHF Centre for Cardiovascular Science, The Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK
- Nicholas L Mills
- BHF Centre for Cardiovascular Science, The Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK
|
29
|
Guo X, Ning J, Chen Y, Liu G, Zhao L, Fan Y, Sun S. Recent advances in differential expression analysis for single-cell RNA-seq and spatially resolved transcriptomic studies. Brief Funct Genomics 2024; 23:95-109. [PMID: 37022699 DOI: 10.1093/bfgp/elad011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 12/09/2022] [Accepted: 03/10/2023] [Indexed: 04/07/2023] Open
Abstract
Differential expression (DE) analysis is a necessary step in the analysis of single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data. Unlike traditional bulk RNA-seq, DE analysis for scRNA-seq or SRT data has unique characteristics that may contribute to the difficulty of detecting DE genes. However, the plethora of DE tools that work with various assumptions makes it difficult to choose an appropriate one. Furthermore, a comprehensive review on detecting DE genes for scRNA-seq data or SRT data from multi-condition, multi-sample experimental designs is lacking. To bridge such a gap, here, we first focus on the challenges of DE detection, then highlight potential opportunities that facilitate further progress in scRNA-seq or SRT analysis, and finally provide insights and guidance in selecting appropriate DE tools or developing new computational DE methods.
Affiliation(s)
- Xiya Guo
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Jin Ning
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yuanze Chen
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Guoliang Liu
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Liyan Zhao
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yue Fan
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Shiquan Sun
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
|
30
|
Danishuddin, Khan S, Kim JJ. Spatial transcriptomics data and analytical methods: An updated perspective. Drug Discov Today 2024; 29:103889. [PMID: 38244672 DOI: 10.1016/j.drudis.2024.103889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 01/01/2024] [Accepted: 01/15/2024] [Indexed: 01/22/2024]
Abstract
Spatial transcriptomics (ST) is a newly emerging field that integrates high-resolution imaging and transcriptomic data to enable the high-throughput analysis of the spatial localization of transcripts in diverse biological systems. The rapid progress in this field necessitates the development of innovative computational methods to effectively tackle the distinct challenges posed by the analysis of ST data. These platforms, integrating AI techniques, offer a promising avenue for understanding disease mechanisms and expediting drug discovery. Despite significant advances in the development of ST data analysis techniques, there is an ongoing need to enhance these models for increased biological relevance. In this review, we briefly discuss the ST-related databases and current deep-learning-based models for spatial transcriptome data analyses and highlight their roles and future perspectives in biomedical applications.
Affiliation(s)
- Danishuddin
- Department of Biotechnology, Yeungnam University, Gyeongsan, Gyeongbuk 38541, Korea.
- Shawez Khan
- National Center for Cancer Immune Therapy (CCIT-DK), Department of Oncology, Copenhagen University Hospital, Herlev, Denmark
- Jong Joo Kim
- Department of Biotechnology, Yeungnam University, Gyeongsan, Gyeongbuk 38541, Korea.
|
31
|
Imbalanced single-cell data integration leads to loss of biological information. Nat Biotechnol 2024:10.1038/s41587-023-02114-x. [PMID: 38429429 DOI: 10.1038/s41587-023-02114-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2024]
|
32
|
Maan H, Zhang L, Yu C, Geuenich MJ, Campbell KR, Wang B. Characterizing the impacts of dataset imbalance on single-cell data integration. Nat Biotechnol 2024:10.1038/s41587-023-02097-9. [PMID: 38429430 DOI: 10.1038/s41587-023-02097-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 12/13/2023] [Indexed: 03/03/2024]
Abstract
Computational methods for integrating single-cell transcriptomic data from multiple samples and conditions do not generally account for imbalances in the cell types measured in different datasets. In this study, we examined how differences in the cell types present, the number of cells per cell type and the cell type proportions across samples affect downstream analyses after integration. The Iniquitate pipeline assesses the robustness of integration results after perturbing the degree of imbalance between datasets. Benchmarking of five state-of-the-art single-cell RNA sequencing integration techniques in 2,600 integration experiments indicates that sample imbalance has substantial impacts on downstream analyses and the biological interpretation of integration results. Imbalance perturbation led to statistically significant variation in unsupervised clustering, cell type classification, differential expression and marker gene annotation, query-to-reference mapping and trajectory inference. We quantified the impacts of imbalance through newly introduced properties: aggregate cell type support and minimum cell type center distance. To better characterize and mitigate impacts of imbalance, we introduce balanced clustering metrics and imbalanced integration guidelines for integration method users.
Affiliation(s)
- Hassaan Maan
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada.
- Vector Institute, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Lin Zhang
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Chengxin Yu
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada
- Michael J Geuenich
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada
- Kieran R Campbell
- Vector Institute, Toronto, Ontario, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada.
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada.
- Bo Wang
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada.
- Vector Institute, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada.
|
33
|
Navikas V, Kowal J, Rodriguez D, Rivest F, Brajkovic S, Cassano M, Dupouy D. Semi-automated approaches for interrogating spatial heterogeneity of tissue samples. Sci Rep 2024; 14:5025. [PMID: 38424144 PMCID: PMC10904364 DOI: 10.1038/s41598-024-55387-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 02/22/2024] [Indexed: 03/02/2024] Open
Abstract
Tissues are spatially orchestrated ecosystems composed of heterogeneous cell populations and non-cellular elements. Tissue components' interactions shape the biological processes that govern homeostasis and disease, thus comprehensive insights into tissues' composition are crucial for understanding their biology. Recently, advancements in the spatial biology field enabled the in-depth analyses of tissue architecture at single-cell resolution, while preserving the structural context. The increasing number of biomarkers analyzed, together with whole tissue imaging, generate datasets approaching several hundreds of gigabytes in size, which are rich sources of valuable knowledge but require investments in infrastructure and resources for extracting quantitative information. The analysis of multiplex whole-tissue images requires extensive training and experience in data analysis. Here, we showcase how a set of open-source tools can allow semi-automated image data extraction to study the spatial composition of tissues with a focus on tumor microenvironment (TME). With the use of Lunaphore COMET platform, we interrogated lung cancer specimens where we examined the expression of 20 biomarkers. Subsequently, the tissue composition was interrogated using an in-house optimized nuclei detection algorithm followed by a newly developed image artifact exclusion approach. Thereafter, the data was processed using several publicly available tools, highlighting the compatibility of COMET-derived data with currently available image analysis frameworks. In summary, we showcased an innovative semi-automated workflow that highlights the ease of adoption of multiplex imaging to explore TME composition at single-cell resolution using a simple slide in, data out approach. Our workflow is easily transferrable to various cohorts of specimens to provide a toolset for spatial cellular dissection of the tissue composition.
Affiliation(s)
- Joanna Kowal
- Lunaphore Technologies SA, Tolochenaz, Switzerland
- Diego Dupouy
- Lunaphore Technologies SA, Tolochenaz, Switzerland.
|
34
|
Majd H, Cesiulis A, Samuel RM, Richter MN, Elder N, Guyer RA, Hao MM, Stamp LA, Goldstein AM, Fattahi F. A call for a unified and multimodal definition of cellular identity in the enteric nervous system. bioRxiv 2024:2024.01.15.575794. [PMID: 38293133 PMCID: PMC10827084 DOI: 10.1101/2024.01.15.575794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
The enteric nervous system (ENS) is a tantalizing frontier in neuroscience. With the recent emergence of single cell transcriptomic technologies, this rare and poorly understood tissue has begun to be better characterized in recent years. A precise functional mapping of enteric neuron diversity is critical for understanding ENS biology and enteric neuropathies. Nonetheless, this pursuit has faced considerable technical challenges. By leveraging different methods to compare available primary mouse and human ENS datasets, we underscore the urgent need for careful identity annotation, achieved through the harmonization and advancements of wet lab and computational techniques. We took different approaches including differential gene expression, module scoring, co-expression and correlation analysis, unbiased biological function hierarchical clustering, data integration and label transfer to compare and contrast functional annotations of several independently reported ENS datasets. These analyses highlight substantial discrepancies stemming from an overreliance on transcriptomics data without adequate validation in tissues. To achieve a comprehensive understanding of enteric neuron identity and their functional context, it is imperative to expand tissue sources and incorporate innovative technologies such as multiplexed imaging, electrophysiology, spatial transcriptomics, as well as comprehensive profiling of epigenome, proteome, and metabolome. Harnessing human pluripotent stem cell (hPSC) models provides unique opportunities for delineating lineage trees of the human ENS, and offers unparalleled advantages, including their scalability and compatibility with genetic manipulation and unbiased screens. We encourage a paradigm shift in our comprehension of cellular complexity and function in the ENS by calling for large-scale collaborative efforts and research investments.
Affiliation(s)
- Homa Majd
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
- Andrius Cesiulis
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
- Ryan M Samuel
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
- Mikayla N Richter
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
- Nicholas Elder
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
- Richard A Guyer
- Department of Pediatric Surgery, Massachusetts General Hospital, Boston, MA, USA
- Marlene M. Hao
- Department of Anatomy and Physiology, the University of Melbourne, Parkville, VIC, Australia
- Lincon A. Stamp
- Department of Anatomy and Physiology, the University of Melbourne, Parkville, VIC, Australia
- Allan M Goldstein
- Department of Pediatric Surgery, Massachusetts General Hospital, Boston, MA, USA
- Faranak Fattahi
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
- Program in Craniofacial Biology, University of California, San Francisco, California, USA
- Lead contact
|
35
|
Liu J, Ma J, Wen J, Zhou X. A Cell Cycle-aware Network for Data Integration and Label Transferring of Single-cell RNA-seq and ATAC-seq. bioRxiv 2024:2024.01.31.578213. [PMID: 38352302 PMCID: PMC10862874 DOI: 10.1101/2024.01.31.578213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
In recent years, the integration of single-cell multi-omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms from a non-single omics perspective, but it still suffers many challenges, such as omics-variance, sparsity, cell heterogeneity and confounding factors. As we know, cell cycle is regarded as a confounder when analyzing other factors in single-cell RNA-seq data, but it's not clear how it will work on the integrated single-cell multi-omics data. Here, we developed a Cell Cycle-Aware Network (CCAN) to remove cell cycle effects from the integrated single-cell multi-omics data while keeping the cell type-specific variations. This is the first computational model to study the cell-cycle effects in the integration of single-cell multi-omics data. Validations on several benchmark datasets show the outstanding performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects of scRNA-seq datasets from different protocols, integrating paired and unpaired scRNA-seq and scATAC-seq data, accurately transferring cell type labels from scRNA-seq to scATAC-seq data, and characterizing the differentiation process from hematopoietic stem cells to different lineages in the integration of differentiation data.
|
36
|
Song D, Wang Q, Yan G, Liu T, Sun T, Li JJ. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat Biotechnol 2024; 42:247-252. [PMID: 37169966 PMCID: PMC11182337 DOI: 10.1038/s41587-023-01772-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 09/20/2022] [Accepted: 03/30/2023] [Indexed: 05/13/2023]
Abstract
We present a statistical simulator, scDesign3, to generate realistic single-cell and spatial omics data, including various cell states, experimental designs and feature modalities, by learning interpretable parameters from real data. Using a unified probabilistic model for single-cell and spatial omics data, scDesign3 infers biologically meaningful parameters; assesses the goodness-of-fit of inferred cell clusters, trajectories and spatial locations; and generates in silico negative and positive controls for benchmarking computational tools.
Affiliation(s)
- Dongyuan Song
- Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, USA
- Qingyang Wang
- Department of Statistics, University of California, Los Angeles, CA, USA
- Guanao Yan
- Department of Statistics, University of California, Los Angeles, CA, USA
- Tianyang Liu
- Department of Statistics, University of California, Los Angeles, CA, USA
- Tianyi Sun
- Department of Statistics, University of California, Los Angeles, CA, USA
- Jingyi Jessica Li
- Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, USA.
- Department of Statistics, University of California, Los Angeles, CA, USA.
- Department of Human Genetics, University of California, Los Angeles, CA, USA.
- Department of Computational Medicine, University of California, Los Angeles, CA, USA.
- Department of Biostatistics, University of California, Los Angeles, CA, USA.
- Radcliffe Institute for Advanced Study, Harvard University, Cambridge, MA, USA.
|
37
|
Ghazanfar S, Guibentif C, Marioni JC. Stabilized mosaic single-cell data integration using unshared features. Nat Biotechnol 2024; 42:284-292. [PMID: 37231260 PMCID: PMC10869270 DOI: 10.1038/s41587-023-01766-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 02/10/2022] [Accepted: 03/28/2023] [Indexed: 05/27/2023]
Abstract
Currently available single-cell omics technologies capture many unique features with different biological information content. Data integration aims to place cells, captured with different technologies, onto a common embedding to facilitate downstream analytical tasks. Current horizontal data integration techniques use a set of common features, thereby ignoring non-overlapping features and losing information. Here we introduce StabMap, a mosaic data integration technique that stabilizes mapping of single-cell data by exploiting the non-overlapping features. StabMap first infers a mosaic data topology based on shared features, then projects all cells onto supervised or unsupervised reference coordinates by traversing shortest paths along the topology. We show that StabMap performs well in various simulation contexts, facilitates 'multi-hop' mosaic data integration where some datasets do not share any features and enables the use of spatial gene expression features for mapping dissociated single-cell data onto a spatial transcriptomic reference.
Affiliation(s)
- Shila Ghazanfar
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- School of Mathematics and Statistics, The University of Sydney, Camperdown, New South Wales, Australia.
- Charles Perkins Centre, The University of Sydney, Camperdown, New South Wales, Australia.
- Carolina Guibentif
- Sahlgrenska Center for Cancer Research, Inst. Biomedicine, Dept. Microbiology and Immunology, University of Gothenburg, Gothenburg, Sweden
- John C Marioni
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
|
38
|
Andreatta M, Hérault L, Gueguen P, Gfeller D, Berenstein AJ, Carmona SJ. Semi-supervised integration of single-cell transcriptomics data. Nat Commun 2024; 15:872. [PMID: 38287014 PMCID: PMC10825117 DOI: 10.1038/s41467-024-45240-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 01/16/2024] [Indexed: 01/31/2024] Open
Abstract
Batch effects in single-cell RNA-seq data pose a significant challenge for comparative analyses across samples, individuals, and conditions. Although batch effect correction methods are routinely applied, data integration often leads to overcorrection and can result in the loss of biological variability. In this work we present STACAS, a batch correction method for scRNA-seq that leverages prior knowledge on cell types to preserve biological variability upon integration. Through an open-source benchmark, we show that semi-supervised STACAS outperforms state-of-the-art unsupervised methods, as well as supervised methods such as scANVI and scGen. STACAS scales well to large datasets and is robust to incomplete and imprecise input cell type labels, which are commonly encountered in real-life integration tasks. We argue that the incorporation of prior cell type information should be a common practice in single-cell data integration, and we provide a flexible framework for semi-supervised batch effect correction.
Affiliation(s)
- Massimo Andreatta
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Léonard Hérault
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Paul Gueguen
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- David Gfeller
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Ariel J Berenstein
- Laboratorio de Biología Molecular, División Patología, Instituto Multidisciplinario de Investigaciones en Patologías Pediátricas (IMIPP), CONICET-GCBA, Buenos Aires, C1425EFD, Argentina
- Santiago J Carmona
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland.
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
|
39
|
Zhao F, Ma X, Yao B, Chen L. scaDA: A Novel Statistical Method for Differential Analysis of Single-Cell Chromatin Accessibility Sequencing Data. bioRxiv 2024:2024.01.21.576570. [PMID: 38328112 PMCID: PMC10849518 DOI: 10.1101/2024.01.21.576570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Single-cell ATAC-seq sequencing data (scATAC-seq) has been widely used to investigate chromatin accessibility on the single-cell level. One important application of scATAC-seq data analysis is differential chromatin accessibility analysis. However, the data characteristics of scATAC-seq such as excessive zeros and large variability of chromatin accessibility across cells impose a unique challenge for DA analysis. Existing statistical methods focus on detecting the mean difference of the chromatin accessible regions while overlooking the distribution difference. Motivated by real data exploration that distribution difference exists among cell types, we introduce a novel composite statistical test named "scaDA", which is based on zero-inflated negative binomial model (ZINB), for performing differential distribution analysis of chromatin accessibility by jointly testing the abundance, prevalence and dispersion simultaneously. Benefiting from both dispersion shrinkage and iterative refinement of mean and prevalence parameter estimates, scaDA demonstrates its superiority to both ZINB-based likelihood ratio tests and published methods by achieving the highest power and best FDR control in a comprehensive simulation study. In addition to demonstrating the highest power in three real sc-multiome data analyses, scaDA successfully identifies differentially accessible regions in microglia from sc-multiome data for an Alzheimer's disease (AD) study, regions which are most enriched in GO terms related to neurogenesis, the clinical phenotype of AD, and SNPs identified in AD-associated GWAS.
Affiliation(s)
- Fengdi Zhao
- Department of Biostatistics, University of Florida, Gainesville, FL, USA
- Xin Ma
- Department of Biostatistics, University of Florida, Gainesville, FL, USA
- Bing Yao
- Department of Human Genetics, Emory University, Atlanta, GA, USA
- Li Chen
- Department of Biostatistics, University of Florida, Gainesville, FL, USA
|
40
|
Wang L, Nie R, Miao X, Cai Y, Wang A, Zhang H, Zhang J, Cai J. InClust+: the deep generative framework with mask modules for multimodal data integration, imputation, and cross-modal generation. BMC Bioinformatics 2024; 25:41. [PMID: 38267858 PMCID: PMC10809631 DOI: 10.1186/s12859-024-05656-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 01/15/2024] [Indexed: 01/26/2024] Open
Abstract
BACKGROUND With the development of single-cell technology, many cell traits can be measured. Furthermore, the multi-omics profiling technology could jointly measure two or more traits in a single cell simultaneously. In order to process the various data accumulated rapidly, computational methods for multimodal data integration are needed. RESULTS Here, we present inClust+, a deep generative framework for the multi-omics. It's built on previous inClust that is specific for transcriptome data, and augmented with two mask modules designed for multimodal data processing: an input-mask module in front of the encoder and an output-mask module behind the decoder. InClust+ was first used to integrate scRNA-seq and MERFISH data from similar cell populations, and to impute MERFISH data based on scRNA-seq data. Then, inClust+ was shown to have the capability to integrate the multimodal data (e.g. tri-modal data with gene expression, chromatin accessibility and protein abundance) with batch effect. Finally, inClust+ was used to integrate an unlabeled monomodal scRNA-seq dataset and two labeled multimodal CITE-seq datasets, transfer labels from CITE-seq datasets to scRNA-seq dataset, and generate the missing modality of protein abundance in monomodal scRNA-seq data. In the above examples, the performance of inClust+ is better than or comparable to the most recent tools in the corresponding task. CONCLUSIONS The inClust+ is a suitable framework for handling multimodal data. Meanwhile, the successful implementation of mask in inClust+ means that it can be applied to other deep learning methods with similar encoder-decoder architecture to broaden the application scope of these models.
Affiliation(s)
- Lifei Wang
  - Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China.
- Rui Nie
  - China National Center for Bioinformation, Beijing, China
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuexia Miao
  - China National Center for Bioinformation, Beijing, China
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Yankai Cai
  - School of Economics and Management, China University of Geosciences, Wuhan, China
- Anqi Wang
  - Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Hanwen Zhang
  - Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Jiang Zhang
  - School of Systems Science, Beijing Normal University, Beijing, 100875, China.
- Jun Cai
  - China National Center for Bioinformation, Beijing, China.
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China.
  - University of Chinese Academy of Sciences, Beijing, 100049, China.
41
He Z, Hu S, Chen Y, An S, Zhou J, Liu R, Shi J, Wang J, Dong G, Shi J, Zhao J, Ou-Yang L, Zhu Y, Bo X, Ying X. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol 2024:10.1038/s41587-023-02040-y. [PMID: 38263515 DOI: 10.1038/s41587-023-02040-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 10/23/2023] [Indexed: 01/25/2024]
Abstract
Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. We demonstrate its superiority to 19 other methods and reliability by evaluating its performance in trimodal and mosaic integration tasks. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available at https://github.com/labomics/midas.
Affiliation(s)
- Zhen He
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Shuofeng Hu
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Yaowen Chen
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Sijing An
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jiahao Zhou
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
  - College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Runyan Liu
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Junfeng Shi
  - School of Automation, China University of Geosciences, Wuhan, China
- Jing Wang
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Guohua Dong
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jinhui Shi
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Jiaxin Zhao
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Le Ou-Yang
  - College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Yuan Zhu
  - School of Automation, China University of Geosciences, Wuhan, China
- Xiaochen Bo
  - Institute of Health Service and Transfusion Medicine, Beijing, China.
- Xiaomin Ying
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China.
42
Park Y, Muttray NP, Hauschild AC. Species-agnostic transfer learning for cross-species transcriptomics data integration without gene orthology. Brief Bioinform 2024; 25:bbae004. [PMID: 38305455 PMCID: PMC10835749 DOI: 10.1093/bib/bbae004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/24/2023] [Accepted: 12/10/2023] [Indexed: 02/03/2024] Open
Abstract
Model organisms such as mice and zebrafish play a crucial role in biomedical research, where novel hypotheses are often developed or validated in them. However, due to biological differences between species, translating these findings into human applications remains challenging. Moreover, commonly used orthologous gene information is often incomplete and entails a significant information loss during gene-ID conversion. To address these issues, we present a novel methodology for species-agnostic transfer learning with heterogeneous domain adaptation. We extended the cross-domain structure-preserving projection toward out-of-sample prediction. Our approach not only allows knowledge integration and translation across various species without relying on gene orthology but also identifies similar GO terms among the most influential genes composing the latent space for integration. Subsequently, during the alignment of latent spaces, each composed of species-specific genes, it is possible to identify functional annotations of genes missing from public orthology databases. We evaluated our approach with four different single-cell sequencing datasets focusing on cell-type prediction and compared it against related machine-learning approaches. In summary, the developed model outperforms related methods working without prior knowledge when predicting unseen cell types based on other species' data. The results demonstrate that our novel approach allows knowledge transfer beyond species barriers without depending on known gene orthology, instead utilizing the entire gene sets.
Affiliation(s)
- Youngjun Park
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  - International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany
- Nils P Muttray
  - Applied Statistics, Georg-August-Universität Göttingen, Göttingen, Germany
- Anne-Christin Hauschild
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  - Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany
43
Kiessling P, Kuppe C. Spatial multi-omics: novel tools to study the complexity of cardiovascular diseases. Genome Med 2024; 16:14. [PMID: 38238823 PMCID: PMC10795303 DOI: 10.1186/s13073-024-01282-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 01/02/2024] [Indexed: 01/22/2024] Open
Abstract
Spatial multi-omic studies have emerged as a promising approach to comprehensively analyze cells in tissues, enabling the joint analysis of multiple data modalities like transcriptome, epigenome, proteome, and metabolome in parallel or even in the same tissue section. This review focuses on the recent advancements in spatial multi-omics technologies, including novel data modalities and computational approaches. We discuss the advancements in low-resolution and high-resolution spatial multi-omics methods, which can resolve up to 10,000 individual molecules at the subcellular level. By applying and integrating these techniques, researchers have recently gained valuable insights into the molecular circuits and mechanisms which govern cell biology along the cardiovascular disease spectrum. We provide an overview of current data analysis approaches, with a focus on data integration of multi-omic datasets, highlighting strengths and weaknesses of various computational pipelines. These tools play a crucial role in analyzing and interpreting spatial multi-omics datasets, facilitating the discovery of new findings, and enhancing translational cardiovascular research. Despite nontrivial challenges, such as the need for standardization of experimental setups and data analysis and for improved computational tools, the application of spatial multi-omics holds tremendous potential for revolutionizing our understanding of human disease processes and the identification of novel biomarkers and therapeutic targets. Exciting opportunities lie ahead for the spatial multi-omics field and will likely contribute to the advancement of personalized medicine for cardiovascular diseases.
Affiliation(s)
- Paul Kiessling
  - Department of Nephrology, Rheumatology, and Clinical Immunology, University Hospital RWTH Aachen, Aachen, Germany
- Christoph Kuppe
  - Department of Nephrology, Rheumatology, and Clinical Immunology, University Hospital RWTH Aachen, Aachen, Germany.
44
Aragones DG, Palomino-Segura M, Sicilia J, Crainiciuc G, Ballesteros I, Sánchez-Cabo F, Hidalgo A, Calvo GF. Variable selection for nonlinear dimensionality reduction of biological datasets through bootstrapping of correlation networks. Comput Biol Med 2024; 168:107827. [PMID: 38086138 DOI: 10.1016/j.compbiomed.2023.107827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 11/15/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024]
Abstract
Identifying the most relevant variables or features in massive datasets for dimensionality reduction can lead to improved and more informative display, faster computation times, and more explainable models of complex systems. Despite significant advances and available algorithms, this task generally remains challenging, especially in unsupervised settings. In this work, we propose a method that constructs correlation networks using all intervening variables and then selects the most informative ones based on network bootstrapping. The method can be applied in both supervised and unsupervised scenarios. We demonstrate its functionality by applying Uniform Manifold Approximation and Projection for dimensionality reduction to several high-dimensional biological datasets, derived from 4D live imaging recordings of hundreds of morpho-kinetic variables, describing the dynamics of thousands of individual leukocytes at sites of prominent inflammation. We compare our method with other standard ones in the field, such as Principal Component Analysis and Elastic Net, showing that it outperforms them. The proposed method can be employed in a wide range of applications, encompassing data analysis and machine learning.
Affiliation(s)
- David G Aragones
  - Department of Mathematics & MOLAB-Mathematical Oncology Laboratory, Universidad de Castilla-La Mancha, Ciudad Real, Spain
- Miguel Palomino-Segura
  - Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain; Immunophysiology Research Group, Instituto Universitario de Investigación Biosanitaria de Extremadura (INUBE), Badajoz, Spain; Department of Physiology, Faculty of Sciences, University of Extremadura, Badajoz, Spain
- Jon Sicilia
  - Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
- Georgiana Crainiciuc
  - Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
- Iván Ballesteros
  - Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
- Fátima Sánchez-Cabo
  - Bioinformatics Unit, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
- Andrés Hidalgo
  - Vascular Biology and Therapeutics Program and Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA
- Gabriel F Calvo
  - Department of Mathematics & MOLAB-Mathematical Oncology Laboratory, Universidad de Castilla-La Mancha, Ciudad Real, Spain.
45
Curry AR, Ooi L, Matosin N. How spatial omics approaches can be used to map the biological impacts of stress in psychiatric disorders: a perspective, overview and technical guide. Stress 2024; 27:2351394. [PMID: 38752853 DOI: 10.1080/10253890.2024.2351394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 04/29/2024] [Indexed: 05/21/2024] Open
Abstract
Exposure to significant levels of stress and trauma throughout life is a leading risk factor for the development of major psychiatric disorders. Despite this, we do not have a comprehensive understanding of the mechanisms that explain how stress raises psychiatric disorder risk. Stress in humans is complex and produces variable molecular outcomes depending on the stress type, timing, and duration. Deciphering how stress increases disorder risk has consequently been challenging to address with the traditional single-target experimental approaches primarily utilized to date. Importantly, the molecular processes that occur following stress are not fully understood, yet understanding them is needed to find novel treatment targets. Sequencing-based omics technologies, allowing for an unbiased investigation of physiological changes induced by stress, are rapidly accelerating our knowledge of the molecular sequelae of stress at single-cell resolution. Spatial multi-omics technologies are now also emerging, allowing for simultaneous analysis of functional molecular layers, from epigenome to proteome, with anatomical context. The technology has immense potential to transform our understanding of how disorders develop, which we believe will significantly propel our understanding of how specific risk factors, such as stress, contribute to disease course. Here, we provide our perspective on how we believe these technologies will transform our understanding of the neurobiology of stress, and also provide a technical guide to assist molecular psychiatry and stress researchers who wish to implement spatial omics approaches in their own research. Finally, we identify potential future directions using multi-omics technology in stress research.
Affiliation(s)
- Amber R Curry
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW, Australia
  - Molecular Horizons, School of Chemistry and Molecular Bioscience, Faculty of Science Medicine and Health, University of Wollongong, Wollongong, NSW, Australia
- Lezanne Ooi
  - Molecular Horizons, School of Chemistry and Molecular Bioscience, Faculty of Science Medicine and Health, University of Wollongong, Wollongong, NSW, Australia
- Natalie Matosin
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW, Australia
  - Molecular Horizons, School of Chemistry and Molecular Bioscience, Faculty of Science Medicine and Health, University of Wollongong, Wollongong, NSW, Australia
46
Bergman DR, Norton KA, Jain HV, Jackson T. Connecting Agent-Based Models with High-Dimensional Parameter Spaces to Multidimensional Data Using SMoRe ParS: A Surrogate Modeling Approach. Bull Math Biol 2023; 86:11. [PMID: 38159216 PMCID: PMC10757706 DOI: 10.1007/s11538-023-01240-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 11/22/2023] [Indexed: 01/03/2024]
Abstract
Across a broad range of disciplines, agent-based models (ABMs) are increasingly utilized for replicating, predicting, and understanding complex systems and their emergent behavior. In the biological and biomedical sciences, researchers employ ABMs to elucidate complex cellular and molecular interactions across multiple scales under varying conditions. Data generated at these multiple scales, however, presents a computational challenge for robust analysis with ABMs. Indeed, calibrating ABMs remains an open topic of research due to their own high-dimensional parameter spaces. In response to these challenges, we extend and validate our novel methodology, Surrogate Modeling for Reconstructing Parameter Surfaces (SMoRe ParS), arriving at a computationally efficient framework for connecting high dimensional ABM parameter spaces with multidimensional data. Specifically, we modify SMoRe ParS to initially confine high dimensional ABM parameter spaces using unidimensional data, namely, single time-course information of in vitro cancer cell growth assays. Subsequently, we broaden the scope of our approach to encompass more complex ABMs and constrain parameter spaces using multidimensional data. We explore this extension with in vitro cancer cell inhibition assays involving the chemotherapeutic agent oxaliplatin. For each scenario, we validate and evaluate the effectiveness of our approach by comparing how well ABM simulations match the experimental data when using SMoRe ParS-inferred parameters versus parameters inferred by a commonly used direct method. In so doing, we show that our approach of using an explicitly formulated surrogate model as an interlocutor between the ABM and the experimental data effectively calibrates the ABM parameter space to multidimensional data. Our method thus provides a robust and scalable strategy for leveraging multidimensional data to inform multiscale ABMs and explore the uncertainty in their parameters.
Affiliation(s)
- Daniel R Bergman
  - Department of Mathematics, University of Michigan, 530 Church Street, Ann Arbor, MI, 48109, USA
- Kerri-Ann Norton
  - Computational Biology Laboratory, Computer Science Program, Bard College, 30 Campus Road, Annandale-on-Hudson, NY, 12504, USA
- Harsh Vardhan Jain
  - Department of Mathematics & Statistics, University of Minnesota Duluth, 1117 University Drive, Duluth, MN, 55812, USA
- Trachette Jackson
  - Department of Mathematics, University of Michigan, 530 Church Street, Ann Arbor, MI, 48109, USA.
47
Lee AS, Ayers LJ, Kosicki M, Chan WM, Fozo LN, Pratt BM, Collins TE, Zhao B, Rose MF, Sanchis-Juan A, Fu JM, Wong I, Zhao X, Tenney AP, Lee C, Laricchia KM, Barry BJ, Bradford VR, Lek M, MacArthur DG, Lee EA, Talkowski ME, Brand H, Pennacchio LA, Engle EC. A cell type-aware framework for nominating non-coding variants in Mendelian regulatory disorders. medRxiv 2023:2023.12.22.23300468. [PMID: 38234731 PMCID: PMC10793524 DOI: 10.1101/2023.12.22.23300468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Unsolved Mendelian cases often lack obvious pathogenic coding variants, suggesting potential non-coding etiologies. Here, we present a single cell multi-omic framework integrating embryonic mouse chromatin accessibility, histone modification, and gene expression assays to discover cranial motor neuron (cMN) cis-regulatory elements and subsequently nominate candidate non-coding variants in the congenital cranial dysinnervation disorders (CCDDs), a set of Mendelian disorders altering cMN development. We generated single cell epigenomic profiles for ~86,000 cMNs and related cell types, identifying ~250,000 accessible regulatory elements with cognate gene predictions for ~145,000 putative enhancers. Seventy-five percent of elements (44 of 59) validated in an in vivo transgenic reporter assay, demonstrating that single cell accessibility is a strong predictor of enhancer activity. Applying our cMN atlas to 899 whole genome sequences from 270 genetically unsolved CCDD pedigrees, we achieved significant reduction in our variant search space and nominated candidate variants predicted to regulate known CCDD disease genes MAFB, PHOX2A, CHN1, and EBF3 - as well as new candidates in recurrently mutated enhancers through peak- and gene-centric allelic aggregation. This work provides novel non-coding variant discoveries of relevance to CCDDs and a generalizable framework for nominating non-coding variants of potentially high functional impact in other Mendelian disorders.
Affiliation(s)
- Arthur S Lee
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
  - Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Lauren J Ayers
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Michael Kosicki
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA
- Wai-Man Chan
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Howard Hughes Medical Institute, Chevy Chase, MD
- Lydia N Fozo
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Brandon M Pratt
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Thomas E Collins
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Boxun Zhao
  - Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Matthew F Rose
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Department of Pathology, Boston Children's Hospital, Boston, MA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA
  - Medical Genetics Training Program, Harvard Medical School, Boston, MA
- Alba Sanchis-Juan
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Jack M Fu
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Isaac Wong
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Xuefang Zhao
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Alan P Tenney
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Cassia Lee
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Harvard College, Cambridge, MA
- Kristen M Laricchia
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Brenda J Barry
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Howard Hughes Medical Institute, Chevy Chase, MD
- Victoria R Bradford
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Monkol Lek
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Daniel G MacArthur
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, NSW, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, VIC, Australia
- Eunjung Alice Lee
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
  - Department of Genetics, Harvard Medical School, Boston, MA
- Michael E Talkowski
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Harrison Brand
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
  - Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, MA
- Len A Pennacchio
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA
- Elizabeth C Engle
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
  - Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
  - Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
  - Howard Hughes Medical Institute, Chevy Chase, MD
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
  - Medical Genetics Training Program, Harvard Medical School, Boston, MA
  - Department of Ophthalmology, Boston Children's Hospital and Harvard Medical School, Boston, MA
48
Guo ZH, Wu Y, Wang S, Zhang Q, Shi JM, Wang YB, Chen ZH. scInterpreter: a knowledge-regularized generative model for interpretably integrating scRNA-seq data. BMC Bioinformatics 2023; 24:481. [PMID: 38104057 PMCID: PMC10724984 DOI: 10.1186/s12859-023-05579-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 11/23/2023] [Indexed: 12/19/2023] Open
Abstract
BACKGROUND: The rapid emergence of single-cell RNA-seq (scRNA-seq) data presents remarkable opportunities for broad investigations through integration analyses. However, most integration models are black boxes that lack interpretability or are hard to train. RESULTS: To address these issues, we propose scInterpreter, a deep learning-based interpretable model. scInterpreter substantially outperforms other state-of-the-art (SOTA) models on multiple benchmark datasets. In addition, scInterpreter is extensible and can integrate and annotate atlas-scale scRNA-seq data. We evaluated the robustness of scInterpreter in a variety of situations. Through comparison experiments, we found that with a knowledge prior, the training process can be significantly accelerated. Finally, we conducted interpretability analysis for each dimension (pathway) of the cell representation in the embedding space. CONCLUSIONS: The results showed that the cell representations obtained by scInterpreter are biologically meaningful. Through weight sorting, we found several new genes related to pathways in the PBMC dataset. In general, scInterpreter is an effective and interpretable integration tool. It is expected that scInterpreter will bring great convenience to the study of single-cell transcriptomics.
Affiliation(s)
- Zhen-Hao Guo
  - College of Electronics and Information Engineering, Tongji University, Shanghai, 200000, China
  - Department of Clinical Anesthesiology, Faculty of Anesthesiology, Second Military Medical University / Naval Medical University, Shanghai, 200433, China
- Yan Wu
  - College of Electronics and Information Engineering, Tongji University, Shanghai, 200000, China.
- Siguo Wang
  - EIT Institute for Advanced Study, Ningbo, 315201, Zhejiang, China
- Qinhu Zhang
  - EIT Institute for Advanced Study, Ningbo, 315201, Zhejiang, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning, 530007, China
- Jin-Ming Shi
  - Department of Endocrinology, Aviation General Hospital, Beijing, 100000, China
- Yan-Bin Wang
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, Zhejiang, China
- Zhan-Heng Chen
  - Department of Clinical Anesthesiology, Faculty of Anesthesiology, Second Military Medical University / Naval Medical University, Shanghai, 200433, China.
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning, 530007, China.
49
Shree A, Pavan MK, Zafar H. scDREAMER for atlas-level integration of single-cell datasets using deep generative model paired with adversarial classifier. Nat Commun 2023; 14:7781. [PMID: 38012145 PMCID: PMC10682386 DOI: 10.1038/s41467-023-43590-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Accepted: 11/14/2023] [Indexed: 11/29/2023] Open
Abstract
Integration of heterogeneous single-cell sequencing datasets generated across multiple tissue locations, time points, and conditions is essential for a comprehensive understanding of the cellular states and expression programs underlying complex biological systems. Here, we present scDREAMER (https://github.com/Zafar-Lab/scDREAMER), a data-integration framework that employs deep generative models and adversarial training for both unsupervised and supervised (scDREAMER-Sup) integration of multiple batches. Using six real benchmarking datasets, we demonstrate that scDREAMER can overcome critical challenges including skewed cell type distribution among batches, nested batch effects, a large number of batches, and conservation of development trajectory across batches. Our experiments also show that scDREAMER and scDREAMER-Sup outperform state-of-the-art unsupervised and supervised integration methods, respectively, in batch correction and conservation of biological variation. Using a dataset of 1 million cells, we demonstrate that scDREAMER is scalable and can perform atlas-level cross-species (e.g., human and mouse) integration while being faster than other deep-learning-based methods.
Affiliation(s)
- Ajita Shree
- Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India
- Musale Krushna Pavan
- Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India
- Hamim Zafar
- Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India.
- Department of Biological Sciences and Bioengineering, Indian Institute of Technology Kanpur, Kanpur, India.
- Mehta Family Centre for Engineering in Medicine, Indian Institute of Technology Kanpur, Kanpur, India.
|
50
|
Nazaret A, Fan JL, Lavallée VP, Cornish AE, Kiseliovas V, Masilionis I, Chun J, Bowman RL, Eisman SE, Wang J, Shi L, Levine RL, Mazutis L, Blei D, Pe'er D, Azizi E. Deep generative model deciphers derailed trajectories in acute myeloid leukemia. bioRxiv 2023:2023.11.11.566719. [PMID: 38014231 PMCID: PMC10680623 DOI: 10.1101/2023.11.11.566719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Single-cell genomics has the potential to map cell states and their dynamics in an unbiased way in response to perturbations such as disease. However, elucidating the cell-state transitions from healthy to diseased requires analyzing data from perturbed samples jointly with unperturbed reference samples. Existing methods for integrating and jointly visualizing single-cell datasets from distinct contexts tend to remove key biological differences or do not correctly harmonize shared mechanisms. We present Decipher, a model that combines variational autoencoders with deep exponential families to reconstruct derailed trajectories (https://github.com/azizilab/decipher). Decipher jointly represents normal and perturbed single-cell RNA-seq datasets, revealing shared and disrupted dynamics. It further introduces a novel approach to visualizing data without the need for methods such as UMAP or t-SNE. We demonstrate Decipher on data from acute myeloid leukemia patient bone marrow specimens, showing that it successfully characterizes the divergence from normal hematopoiesis and identifies transcriptional programs that become disrupted in each patient when they acquire NPM1 driver mutations.
|