1
Mangelinck A, Molitor E, Marchiq I, Alaoui L, Bouaziz M, Andrade-Pereira R, Darville H, Becht E, Lefebvre C. The combined use of scRNA-seq and network propagation highlights key features of pan-cancer Tumor-Infiltrating T cells. PLoS One 2024; 19:e0315980. [PMID: 39729479 DOI: 10.1371/journal.pone.0315980] [Received: 06/17/2024] [Accepted: 12/03/2024] [Indexed: 12/29/2024] Open
Abstract
Improving the selectivity and effectiveness of drugs represents a crucial issue for future therapeutic developments in immuno-oncology. Traditional bulk transcriptomics faces limitations in this context for the early phase of target discovery, as the resulting gene expression levels represent the average measure from multiple cell populations. Alternatively, single-cell RNA sequencing can resolve the transcriptomes of individual cell populations, facilitating the identification of specific targets. Here, we generated Tumor-Infiltrating regulatory T cells (TI-Tregs) and exhausted T cells (Tex) gene signatures from a single-cell RNA-seq pan-cancer T cell atlas. To overcome the noise and sparsity inherent to single-cell transcriptomics, we then propagated the gene signatures by diffusion in a protein-protein interaction network using the Patrimony high-throughput computing platform. This methodology enabled the refining of signatures by rescoring genes based on their biological connectivity and shed light not only on processes characteristic of TI-Treg and Tex development and functions but also on their immunometabolic specificities. The combined use of single-cell transcriptomics and network propagation may thus represent an innovative and effective methodology for the characterization of cell populations of interest and eventually the development of new therapeutic strategies in immuno-oncology.
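The idea of rescoring a gene signature by diffusion over an interaction network can be illustrated with a generic random walk with restart. This is only a minimal sketch under stated assumptions: the abstract does not specify the propagation algorithm or the Patrimony platform's interface, and the toy network, gene names, `propagate` function, and `restart` value below are all hypothetical.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours
               (assumed symmetric, every node has at least one neighbour).
    seeds:     dict mapping signature genes to non-negative initial scores.
    Returns a dict of propagated scores that sum to 1.
    """
    nodes = sorted(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    total = sum(seeds.get(n, 0.0) for n in nodes)
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}  # normalised seed vector
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m passes an equal share of its score to n.
            inflow = sum(p[m] / degree[m] for m in adjacency[n])
            nxt[n] = (1 - restart) * inflow + restart * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a chain geneA - geneB - geneC - geneD.
toy_network = {
    "geneA": {"geneB"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneB", "geneD"},
    "geneD": {"geneC"},
}
# Only geneA belongs to the (hypothetical) input signature.
scores = propagate(toy_network, {"geneA": 1.0})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

In this sketch, genes that are absent from the input signature still receive a score that decays with their network distance from the seed, which is the sense in which propagation can rescore genes by connectivity.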
Affiliation(s)
- Elodie Molitor
  - Lincoln, Research & Development, Boulogne-Billancourt, France
- Lamine Alaoui
  - Servier, Research & Development, Gif-sur-Yvette, France
- Etienne Becht
  - Servier, Research & Development, Gif-sur-Yvette, France
2
Fu Z, Jiang S, Sun Y, Zheng S, Zong L, Li P. Cut&tag: a powerful epigenetic tool for chromatin profiling. Epigenetics 2024; 19:2293411. [PMID: 38105608 PMCID: PMC10730171 DOI: 10.1080/15592294.2023.2293411] [Received: 09/07/2023] [Accepted: 12/05/2023] [Indexed: 12/19/2023] Open
Abstract
Analysis of transcription factors and chromatin modifications at the genome-wide level provides insights into gene regulatory processes, such as transcription, cell differentiation and cellular response. Chromatin immunoprecipitation is the most popular and powerful approach for mapping chromatin, and other enzyme-tethering techniques have recently become available for living cells. Among these, Cleavage Under Targets and Tagmentation (CUT&Tag) is a relatively novel chromatin profiling method that has rapidly gained popularity in the field of epigenetics since 2019. It has also been widely adapted to map chromatin modifications and TFs in different species, illustrating the association of these chromatin epitopes with various physiological and pathological processes. Scalable single-cell CUT&Tag can be combined with distinct platforms to distinguish cellular identity, epigenetic features and even spatial chromatin profiling. In addition, CUT&Tag has been developed as a strategy for joint profiling of the epigenome, transcriptome or proteome on the same sample. In this review, we will mainly consolidate the applications of CUT&Tag and its derivatives on different platforms, give a detailed explanation of the pros and cons of this technique as well as the potential development trends and applications in the future.
Affiliation(s)
- Zhijun Fu
  - BGI Tech Solutions Co, Ltd. BGI-Shenzhen, Shenzhen, China
- Sanjie Jiang
  - BGI Tech Solutions Co, Ltd. BGI-Shenzhen, Shenzhen, China
- Yiwen Sun
  - BGI Tech Solutions Co, Ltd. BGI-Shenzhen, Shenzhen, China
- Shanqiao Zheng
  - BGI Tech Solutions Co, Ltd. BGI-Shenzhen, Shenzhen, China
- Liang Zong
  - BGI Tech Solutions Co, Ltd. BGI-Wuhan, Wuhan, China
- Peipei Li
  - BGI Tech Solutions Co, Ltd. BGI-Shenzhen, Shenzhen, China
3
Shi M, Li X. Addressing scalability and managing sparsity and dropout events in single-cell representation identification with ZIGACL. Brief Bioinform 2024; 26:bbae703. [PMID: 39775477 PMCID: PMC11705091 DOI: 10.1093/bib/bbae703] [Received: 07/30/2024] [Revised: 11/06/2024] [Accepted: 12/23/2024] [Indexed: 01/11/2025] Open
Abstract
Despite significant advancements in single-cell representation learning, scalability and managing sparsity and dropout events continue to challenge the field as scRNA-seq datasets expand. While current computational tools struggle to maintain both efficiency and accuracy, the accurate connection of these dropout events to specific biological functions usually requires additional, complex experiments, often hampered by potential inaccuracies in cell-type annotation. To tackle these challenges, the Zero-Inflated Graph Attention Collaborative Learning (ZIGACL) method has been developed. This innovative approach combines a Zero-Inflated Negative Binomial model with a Graph Attention Network, leveraging mutual information from neighboring cells to enhance dimensionality reduction and apply dynamic adjustments to the learning process through a co-supervised deep graph clustering model. ZIGACL's integration of denoising and topological embedding significantly improves clustering accuracy and ensures similar cells are grouped closely in the latent space. Comparative analyses across nine real scRNA-seq datasets have shown that ZIGACL significantly enhances single-cell data analysis by offering superior clustering performance and improved stability in cell representations, effectively addressing scalability and managing sparsity and dropout events, thereby advancing our understanding of cellular heterogeneity.
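To make the zero-inflated negative binomial (ZINB) component concrete, the sketch below evaluates a ZINB log-probability for a single count. It is not ZIGACL's implementation: the abstract gives no formulas, so the mean/dispersion parametrisation and the function names `nb_log_pmf` and `zinb_log_pmf` are assumptions, and the graph attention and clustering parts of the method are not covered.

```python
import math


def nb_log_pmf(k, mu, theta):
    """Log-probability of count k under a negative binomial distribution
    with mean mu > 0 and dispersion (inverse overdispersion) theta > 0."""
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(k, mu, theta, pi):
    """Log-probability of count k under a zero-inflated negative binomial.

    pi in (0, 1) is the extra probability of an excess zero (for example a
    dropout event); with probability 1 - pi the count follows the NB above.
    """
    nb = nb_log_pmf(k, mu, theta)
    if k == 0:
        # A zero can arise from the inflation component or from the NB itself.
        return math.log(pi + (1 - pi) * math.exp(nb))
    return math.log(1 - pi) + nb


# Hypothetical parameters for one gene in one cell.
mu, theta, pi = 2.0, 1.5, 0.3
print("P(x = 0) under NB:  ", math.exp(nb_log_pmf(0, mu, theta)))
print("P(x = 0) under ZINB:", math.exp(zinb_log_pmf(0, mu, theta, pi)))
```

The zero-inflation term raises the probability of observing a zero relative to the plain negative binomial, which is how such a model can accommodate the excess zeros seen in sparse scRNA-seq counts.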
Affiliation(s)
- Mingguang Shi
  - School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui, China
- Xuefeng Li
  - School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui, China
4
Ma X, Lin L, Zhao Q, Iqbal M. TriTan: an efficient triple nonnegative matrix factorization method for integrative analysis of single-cell multiomics data. Brief Bioinform 2024; 26:bbae615. [PMID: 39581871 PMCID: PMC11586128 DOI: 10.1093/bib/bbae615] [Received: 06/12/2024] [Revised: 10/15/2024] [Accepted: 11/19/2024] [Indexed: 11/26/2024] Open
Abstract
Single-cell multiomics have opened up tremendous opportunities for understanding gene regulatory networks underlying cell states by simultaneously profiling transcriptomes, epigenomes, and proteomes of the same cell. However, existing computational methods for integrative analysis of these high-dimensional multiomics data are either computationally expensive or limited in interpretation. These limitations pose challenges in the implementation of these methods in large-scale studies and hinder a more in-depth understanding of the underlying regulatory mechanisms. Here, we propose TriTan (Triple inTegrative fast non-negative matrix factorization), an efficient joint factorization method for single-cell multiomics data. TriTan implements a highly efficient factorization algorithm, greatly improving its computational performance. The three-matrix factorization produced by TriTan helps in clustering cells, identifying signature features for each cell type, and uncovering feature associations across omics, which facilitates the identification of domains of regulatory chromatin and the prediction of cell-type-specific regulatory networks. We applied TriTan to single-cell multiomics data obtained from different technologies and benchmarked it against state-of-the-art methods, where it shows highly competitive performance. Furthermore, we showed a range of downstream analyses conducted utilizing TriTan outputs, highlighting its capacity to facilitate interpretation in biological discovery.
Affiliation(s)
- Xin Ma
  - Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Lijing Lin
  - Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Qian Zhao
  - Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Mudassar Iqbal
  - Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
5
Tasca P, van den Berg BM, Rabelink TJ, Wang G, Heijs B, van Kooten C, de Vries APJ, Kers J. Application of spatial-omics to the classification of kidney biopsy samples in transplantation. Nat Rev Nephrol 2024; 20:755-766. [PMID: 38965417 DOI: 10.1038/s41581-024-00861-x] [Accepted: 06/06/2024] [Indexed: 07/06/2024]
Abstract
Improvement of long-term outcomes through targeted treatment is a primary concern in kidney transplant medicine. Currently, the validation of a rejection diagnosis and subsequent treatment depends on the histological assessment of allograft biopsy samples, according to the Banff classification system. However, the lack of (early) disease-specific tissue markers hinders accurate diagnosis and thus timely intervention. This challenge mainly results from an incomplete understanding of the pathophysiological processes underlying late allograft failure. Integration of large-scale multimodal approaches for investigating allograft biopsy samples might offer new insights into this pathophysiology, which are necessary for the identification of novel therapeutic targets and the development of tailored immunotherapeutic interventions. Several omics technologies - including transcriptomic, proteomic, lipidomic and metabolomic tools (and multimodal data analysis strategies) - can be applied to allograft biopsy investigation. However, despite their successful application in research settings and their potential clinical value, several barriers limit the broad implementation of many of these tools into clinical practice. Among spatial-omics technologies, mass spectrometry imaging, which is under-represented in the transplant field, has the potential to enable multi-omics investigations that might expand the insights gained with current clinical analysis technologies.
Affiliation(s)
- Paola Tasca
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, the Netherlands
  - Leiden Transplant Center, Leiden University Medical Center, Leiden, the Netherlands
  - Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
- Bernard M van den Berg
  - Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands
- Ton J Rabelink
  - Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands
  - The Novo Nordisk Foundation Center for Stem Cell Medicine (Renew), Leiden University Medical Center, Leiden, the Netherlands
- Gangqi Wang
  - Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands
  - The Novo Nordisk Foundation Center for Stem Cell Medicine (Renew), Leiden University Medical Center, Leiden, the Netherlands
- Bram Heijs
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, the Netherlands
  - Bruker Daltonics GmbH & Co. KG, Bremen, Germany
- Cees van Kooten
  - Leiden Transplant Center, Leiden University Medical Center, Leiden, the Netherlands
  - Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands
- Aiko P J de Vries
  - Leiden Transplant Center, Leiden University Medical Center, Leiden, the Netherlands
  - Department of Internal Medicine, Division of Nephrology, Einthoven Laboratory of Vascular and Regenerative Medicine, Leiden University Medical Center, Leiden, the Netherlands
- Jesper Kers
  - Leiden Transplant Center, Leiden University Medical Center, Leiden, the Netherlands
  - Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
  - Department of Pathology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, the Netherlands
  - Center for Analytical Sciences Amsterdam, Van't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam, the Netherlands
6
Krishnan SN, Ji S, Elhossiny AM, Rao A, Frankel TL, Rao A. Proximogram-A multi-omics network-based framework to capture tissue heterogeneity integrating single-cell omics and spatial profiling. Comput Biol Med 2024; 182:109082. [PMID: 39255657 DOI: 10.1016/j.compbiomed.2024.109082] [Received: 02/05/2024] [Revised: 08/26/2024] [Accepted: 08/27/2024] [Indexed: 09/12/2024]
Abstract
The increasing availability of patient-derived multimodal biological data for various diseases has opened up avenues for finding the optimal methods for jointly leveraging the information extracted in a customizable and scalable manner. Here, we propose the Proximogram, a graph-based representation that provides a joint construct for embedding independently obtained omics and spatial data. To evaluate the representation, we generated proximograms from 2 distinct biological sources, namely, multiplexed immunofluorescence images and single-cell RNA-seq data obtained from patients across two pancreatic diseases that include normal and chronic Pancreatitis (CP) and pancreatic ductal adenocarcinoma (PDAC). The generated proximograms were used as inputs to 2 distinct graph deep-learning models. The improved classification results over simpler spatial-data-based input graphs point to the increased discriminatory power obtained by integrating structural information from single-cell ligand-receptor signaling data and the spatial architecture of cells in each disease class, which can help point to markers of high diagnostic significance.
Affiliation(s)
- Santhoshi N Krishnan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Sunjong Ji
  - Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA
- Ahmed M Elhossiny
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Arvind Rao
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA; Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA; Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
7
Maity D, Sivakumar N, Kamat P, Zamponi N, Min C, Du W, Jayatilaka H, Johnston A, Starich B, Agrawal A, Riley D, Venturutti L, Melnick A, Cerchietti L, Walston J, Phillip JM. Profiling Dynamic Patterns of Single-Cell Motility. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2400918. [PMID: 39136147 PMCID: PMC11481225 DOI: 10.1002/advs.202400918] [Received: 01/24/2024] [Revised: 06/21/2024] [Indexed: 10/17/2024]
Abstract
Cell motility plays an essential role in many biological processes as cells move and interact within their local microenvironments. Current methods for quantifying cell motility typically involve tracking individual cells over time, but the results are often presented as averaged values across cell populations. While informative, these ensemble approaches have limitations in assessing cellular heterogeneity and identifying generalizable patterns of single-cell behaviors, at baseline and in response to perturbations. In this study, CaMI is introduced, a computational framework designed to leverage the single-cell nature of motility data. CaMI identifies and classifies distinct spatio-temporal behaviors of individual cells, enabling robust classification of single-cell motility patterns in a large dataset (n = 74 253 cells). This framework allows quantification of spatial and temporal heterogeneities, determination of single-cell motility behaviors across various biological conditions and provides a visualization scheme for direct interpretation of dynamic cell behaviors. Importantly, CaMI reveals insights that conventional cell motility analyses may overlook, showcasing its utility in uncovering robust biological insights. Together, a multivariate framework is presented to classify emergent patterns of single-cell motility, emphasizing the critical role of cellular heterogeneity in shaping cell behaviors across populations.
Affiliation(s)
- Debonil Maity
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
- Nikita Sivakumar
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
- Pratik Kamat
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
- Nahuel Zamponi
  - Department of Medicine, Division of Hematology and Medical Oncology, Weill Cornell Medicine, New York 10065, USA
- Chanhong Min
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
- Wenxuan Du
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
- Hasini Jayatilaka
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
- Adrian Johnston
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
- Bartholomew Starich
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
- Anshika Agrawal
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
- Deanna Riley
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
- Leandro Venturutti
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Centre for Lymphoid Cancer, British Columbia Cancer Research Institute, Vancouver, British Columbia V6T 1Z4, Canada
- Ari Melnick
  - Department of Medicine, Division of Hematology and Medical Oncology, Weill Cornell Medicine, New York 10065, USA
- Leandro Cerchietti
  - Department of Medicine, Division of Hematology and Medical Oncology, Weill Cornell Medicine, New York 10065, USA
- Jeremy Walston
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
  - Department of Medicine, Geriatrics and Gerontology, Johns Hopkins School of Medicine, Baltimore, MD 21224, USA
- Jude M. Phillip
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
  - Institute for Nanobiotechnology, Johns Hopkins University, Baltimore, MD 21212, USA
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21212, USA
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD 21287, USA
8
Mazumder S, Bhattacharya D, Lahiri D, Nag M. Milletomics: a metabolomics centered integrated omics approach toward genetic progression. Funct Integr Genomics 2024; 24:149. [PMID: 39218822 DOI: 10.1007/s10142-024-01430-y] [Received: 07/05/2024] [Revised: 07/25/2024] [Accepted: 08/20/2024] [Indexed: 09/04/2024]
Abstract
Producing alternative staple foods like millet will be essential to feeding ten billion people by 2050. The increased demand for millet is driving researchers to improve its genetic variation. Millets contain protein, dietary fiber, phenolic substances, and flavonoid components. Its climate resilience makes millet an appealing crop for agronomic sustainability. Integrative omics technologies could potentially identify and develop millets with desirable phenotypes that may have high agronomic value. Millets' salinity and drought tolerance have been enhanced using transcriptomics. In foxtail, finger, and pearl millet, proteomics has discovered salt-tolerant protein, phytohormone-focused protein, and drought tolerance. Metabolomics studies have revealed that certain metabolic pathways, including those involving lignin, flavonoids, phenylpropanoid, and lysophospholipids, are critical for many processes, including seed germination, photosynthesis, energy metabolism, and the synthesis of bioactive chemicals necessary for drought tolerance. Metabolomics integration with other omics revealed metabolome engineering and trait-specific metabolite creation. Integrated metabolomics and ionomics are still in the development stage, but they could potentially assist in comprehending the pathway of ionomers to control nutrient levels and biofortify millet. Epigenomic analysis has shown alterations in DNA methylation patterns and chromatin structure in foxtail and pearl millets in response to abiotic stress. Whole-genome sequencing utilizing next-generation sequencing is the most proficient method for finding stress-induced phytoconstituent genes. New genome sequencing enables novel biotechnological interventions including genome-wide association, mutation-based research, and other omics approaches. Millets can be bred more effectively by employing next-generation sequencing and genotyping by sequencing, which may mitigate climate change. Millet marker-assisted breeding has advanced with high-throughput markers and combined genotyping technologies.
Affiliation(s)
- Saikat Mazumder
  - Department of Biotechnology, Institute of Engineering and Management, University of Engineering and Management, Kolkata, West Bengal, India
  - Department of Food Technology, Guru Nanak Institute of Technology, Kolkata, West Bengal, India
- Debasmita Bhattacharya
  - Department of Basic Science and Humanities, Institute of Engineering and Management, Kolkata University of Engineering and Management, Kolkata, West Bengal, India
- Dibyajit Lahiri
  - Department of Biotechnology, Institute of Engineering and Management, University of Engineering and Management, Kolkata, West Bengal, India
- Moupriya Nag
  - Department of Biotechnology, Institute of Engineering and Management, University of Engineering and Management, Kolkata, West Bengal, India
9
Liu J, Ma J, Wen J, Zhou X. A Cell Cycle-Aware Network for Data Integration and Label Transferring of Single-Cell RNA-Seq and ATAC-Seq. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2401815. [PMID: 38887194 PMCID: PMC11336957 DOI: 10.1002/advs.202401815] [Received: 02/20/2024] [Revised: 04/22/2024] [Indexed: 06/20/2024]
Abstract
In recent years, the integration of single-cell multi-omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms from a non-single omics perspective, but it still suffers many challenges, such as omics-variance, sparsity, cell heterogeneity, and confounding factors. As it is known, the cell cycle is regarded as a confounder when analyzing other factors in single-cell RNA-seq data, but it is not clear how it will work on the integrated single-cell multi-omics data. Here, a cell cycle-aware network (CCAN) is developed to remove cell cycle effects from the integrated single-cell multi-omics data while keeping the cell type-specific variations. This is the first computational model to study the cell-cycle effects in the integration of single-cell multi-omics data. Validations on several benchmark datasets show the outstanding performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects of scRNA-seq datasets from different protocols, integrating paired and unpaired scRNA-seq and scATAC-seq data, accurately transferring cell type labels from scRNA-seq to scATAC-seq data, and characterizing the differentiation process from hematopoietic stem cells to different lineages in the integration of differentiation data.
Affiliation(s)
- Jiajia Liu
  - Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Jian Ma
  - Department of Electronic Information and Computer Engineering, The Engineering & Technical College of Chengdu University of Technology, Leshan, Sichuan 614000, China
- Jianguo Wen
  - Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Xiaobo Zhou
  - Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
10
Chen K, Han Y, Wang Y, Zhou D, Wu F, Cai W, Zheng S, Xiao Q, Zhang H, Li W. scMoresDB: A comprehensive database of single-cell multi-omics data for human respiratory system. iScience 2024; 27:109567. [PMID: 38617561 PMCID: PMC11015448 DOI: 10.1016/j.isci.2024.109567] [Received: 09/07/2023] [Revised: 11/26/2023] [Accepted: 03/22/2024] [Indexed: 04/16/2024] Open
Abstract
The human respiratory system is a complex and important system that can suffer a variety of diseases. Single-cell sequencing technologies, applied in many respiratory disease studies, have enhanced our ability in characterizing molecular and phenotypic features at a single-cell resolution. The exponentially increasing data from these studies have consequently led to difficulties in data sharing and analysis. Here, we present scMoresDB, a single-cell multi-omics database platform with extensive omics types tailored for human respiratory diseases. scMoresDB re-analyzes single-cell multi-omics datasets, providing a user-friendly interface with cross-omics search capabilities, interactive visualizations, and analytical tools for comprehensive data sharing and integrative analysis. Our example applications highlight the potential significance of BSG receptor in SARS-CoV-2 infection as well as the involvement of HHIP and TGFB2 in the development and progression of chronic obstructive pulmonary disease. scMoresDB significantly increases accessibility and utility of single-cell data relevant to human respiratory system and associated diseases.
Affiliation(s)
- Kang Chen
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
- Yutong Han
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
- Yanni Wang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
- Dingli Zhou
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
- Fanjie Wu
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
- Wenhao Cai
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
- Shikang Zheng
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
- Qinyuan Xiao
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
- Haiyue Zhang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
- Weizhong Li
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
  - Key Laboratory of Tropical Disease Control of Ministry of Education, Sun Yat-Sen University, Guangzhou 510080, Guangdong Province, China
  - Center for Precision Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China
11
Wang H, Wang Q, Miao Q, Ma X. Joint learning of data recovering and graph contrastive denoising for incomplete multi-view clustering. Information Fusion 2024; 104:102155. [DOI: 10.1016/j.inffus.2023.102155] [Indexed: 01/11/2025]
12
Bai X, Duren Z, Wan L, Xia LC. Joint inference of clonal structure using single-cell genome and transcriptome sequencing data. NAR Genom Bioinform 2024; 6:lqae017. [PMID: 38486887 PMCID: PMC10939367 DOI: 10.1093/nargab/lqae017] [Received: 06/21/2023] [Revised: 11/19/2023] [Accepted: 01/29/2024] [Indexed: 03/17/2024] Open
Abstract
Recent advances in high-throughput single-cell genome (scDNA) and transcriptome (scRNA) sequencing technologies have enabled cell-resolved investigation of tissue clones. However, it remains challenging to cluster and couple single cells for heterogeneous scRNA and scDNA data generated from the same specimen. In this study, we present a computational framework called CCNMF, which employs a novel Coupled-Clone Non-negative Matrix Factorization technique to jointly infer clonal structure for matched scDNA and scRNA data. CCNMF couples multi-omics single cells by linking copy number and gene expression profiles through their general concordance. It successfully resolved the underlying coexisting clones with high correlations between the clonal genome and transcriptome from the same specimen. We validated that CCNMF can achieve high accuracy and robustness using both simulated benchmarks and real-world applications, including an ovarian cancer cell line mixture, a gastric cancer cell line, and a primary gastric cancer. In summary, CCNMF provides a powerful tool for integrating multi-omics single-cell data, enabling simultaneous resolution of genomic and transcriptomic clonal architecture. This computational framework facilitates the understanding of how cellular gene expression changes in conjunction with clonal genome alterations, shedding light on the cellular genomic differences between subclones that contribute to tumor evolution.
Affiliation(s)
- Xiangqi Bai
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Zhana Duren
  - Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, SC 29646, USA
- Lin Wan
  - NCMIS, LSC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Li C Xia
  - Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou, 510006, China

13
Mathur S, Singh D, Ranjan R. Recent advances in plant translational genomics for crop improvement. Advances in Protein Chemistry and Structural Biology 2024; 139:335-382. [PMID: 38448140 DOI: 10.1016/bs.apcsb.2023.11.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
The growing population, climate change, and limited agricultural resources put enormous pressure on agricultural systems. Crop yields are plateauing, and extreme weather events and urbanization threaten the livelihood of farmers. It is imperative that immediate attention is paid to addressing the increasing food demand, ensuring resilience against emerging threats, and meeting the demand for more nutritious, safer food. Under uncertain conditions, it is essential to expand genetic diversity and discover novel crop varieties or variations to achieve higher and more stable yields. Genomics plays a significant role in developing abundant and nutrient-dense food crops. As an alternative to traditional breeding approaches, translational genomics can improve breeding programs in a more efficient and precise manner by translating genomic concepts into practical tools. Crop breeding based on genomics offers potential solutions to overcome the limitations of conventional breeding methods, including improved crop varieties that provide more nutritional value and are protected from biotic and abiotic stresses. Genetic markers, such as SNPs and ESTs, contribute to the discovery of QTLs controlling agronomic traits and stress tolerance. In order to meet the growing demand for food, there is a need to incorporate QTLs into breeding programs using marker-assisted selection/breeding and transgenic technologies. This chapter primarily focuses on recent advances in translational genomics for crop improvement and on various omics techniques, including transcriptomics, metagenomics, pangenomics, and single-cell omics. Numerous genome editing techniques, including CRISPR-Cas technology, and their applications in crop improvement are also discussed.
Affiliation(s)
- Shivangi Mathur
  - Plant Molecular Biology Laboratory, Department of Botany, Faculty of Science, Dayalbagh Educational Institute, Agra, India
- Deeksha Singh
  - Plant Molecular Biology Laboratory, Department of Botany, Faculty of Science, Dayalbagh Educational Institute, Agra, India
- Rajiv Ranjan
  - Plant Molecular Biology Laboratory, Department of Botany, Faculty of Science, Dayalbagh Educational Institute, Agra, India

14
Ali M, Yang T, He H, Zhang Y. Plant biotechnology research with single-cell transcriptome: recent advancements and prospects. Plant Cell Reports 2024; 43:75. [PMID: 38381195 DOI: 10.1007/s00299-024-03168-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/05/2024] [Indexed: 02/22/2024]
Abstract
KEY MESSAGE: Single-cell transcriptomic techniques have emerged as powerful tools in plant biology, offering high-resolution insights into gene expression at the individual cell level. This review highlights the rapid expansion of single-cell technologies in plants, their potential in understanding plant development, and their role in advancing plant biotechnology research.
Single-cell techniques have emerged as powerful tools to enhance our understanding of biological systems, providing high-resolution transcriptomic analysis at the single-cell level. In plant biology, the adoption of single-cell transcriptomics has seen rapid expansion of available technologies and applications. This review article focuses on the latest advancements in the field of single-cell transcriptomics in plants and discusses the potential role of these approaches in plant development and in expediting plant biotechnology research in the near future. Furthermore, the inherent challenges and limitations of single-cell technology are critically examined, with a view to overcoming them and enhancing our knowledge and understanding.
Affiliation(s)
- Muhammad Ali
  - School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
  - Peking University-Institute of Advanced Agricultural Sciences, Weifang, China
- Tianxia Yang
  - School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding (MOE), China Agricultural University, Beijing, China
- Hai He
  - School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
- Yu Zhang
  - School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China

15
Liu J, Ma J, Wen J, Zhou X. A Cell Cycle-aware Network for Data Integration and Label Transferring of Single-cell RNA-seq and ATAC-seq. bioRxiv: The Preprint Server for Biology 2024:2024.01.31.578213. [PMID: 38352302 PMCID: PMC10862874 DOI: 10.1101/2024.01.31.578213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
In recent years, the integration of single-cell multi-omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms from a multi-omics perspective, but it still faces many challenges, such as omics variance, sparsity, cell heterogeneity and confounding factors. The cell cycle is regarded as a confounder when analyzing other factors in single-cell RNA-seq data, but it is not clear how it affects integrated single-cell multi-omics data. Here, we developed a Cell Cycle-Aware Network (CCAN) to remove cell cycle effects from the integrated single-cell multi-omics data while keeping the cell type-specific variations. This is the first computational model to study the cell-cycle effects in the integration of single-cell multi-omics data. Validations on several benchmark datasets show the outstanding performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects of scRNA-seq datasets from different protocols, integrating paired and unpaired scRNA-seq and scATAC-seq data, accurately transferring cell type labels from scRNA-seq to scATAC-seq data, and characterizing the differentiation process from hematopoietic stem cells to different lineages in the integration of differentiation data.

16
Song D, Wang Q, Yan G, Liu T, Sun T, Li JJ. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat Biotechnol 2024; 42:247-252. [PMID: 37169966 PMCID: PMC11182337 DOI: 10.1038/s41587-023-01772-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 09/20/2022] [Accepted: 03/30/2023] [Indexed: 05/13/2023]
Abstract
We present a statistical simulator, scDesign3, to generate realistic single-cell and spatial omics data, including various cell states, experimental designs and feature modalities, by learning interpretable parameters from real data. Using a unified probabilistic model for single-cell and spatial omics data, scDesign3 infers biologically meaningful parameters; assesses the goodness-of-fit of inferred cell clusters, trajectories and spatial locations; and generates in silico negative and positive controls for benchmarking computational tools.
Affiliation(s)
- Dongyuan Song
  - Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, USA
- Qingyang Wang
  - Department of Statistics, University of California, Los Angeles, CA, USA
- Guanao Yan
  - Department of Statistics, University of California, Los Angeles, CA, USA
- Tianyang Liu
  - Department of Statistics, University of California, Los Angeles, CA, USA
- Tianyi Sun
  - Department of Statistics, University of California, Los Angeles, CA, USA
- Jingyi Jessica Li
  - Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, USA
  - Department of Statistics, University of California, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
  - Department of Computational Medicine, University of California, Los Angeles, CA, USA
  - Department of Biostatistics, University of California, Los Angeles, CA, USA
  - Radcliffe Institute for Advanced Study, Harvard University, Cambridge, MA, USA

17
Bawa G, Liu Z, Yu X, Tran LSP, Sun X. Introducing single cell stereo-sequencing technology to transform the plant transcriptome landscape. Trends in Plant Science 2024; 29:249-265. [PMID: 37914553 DOI: 10.1016/j.tplants.2023.10.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/10/2022] [Revised: 10/01/2023] [Accepted: 10/02/2023] [Indexed: 11/03/2023]
Abstract
Advances in single cell RNA-sequencing (scRNA-seq) have helped detect transcriptional heterogeneities in biological samples. However, scRNA-seq cannot currently provide high-resolution spatial transcriptome information or identify subcellular organs in biological samples. These limitations have led to the development of spatially enhanced-resolution omics-sequencing (Stereo-seq), which combines spatial information with single cell transcriptomics to address the challenges of scRNA-seq alone. In this review, we discuss the advantages of Stereo-seq technology. We anticipate that the application of such an integrated approach in plant research will advance our understanding of biological processes in the plant transcriptomics era. We conclude with an outlook on how such integration will enhance crop improvement.
Affiliation(s)
- George Bawa
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Key Laboratory of Plant Stress Biology, School of Life Sciences, Henan University, 85 Minglun Street, Kaifeng 475001, PR China
- Zhixin Liu
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Key Laboratory of Plant Stress Biology, School of Life Sciences, Henan University, 85 Minglun Street, Kaifeng 475001, PR China
- Xiaole Yu
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Key Laboratory of Plant Stress Biology, School of Life Sciences, Henan University, 85 Minglun Street, Kaifeng 475001, PR China
- Lam-Son Phan Tran
  - Institute of Genomics for Crop Abiotic Stress Tolerance, Department of Plant and Soil Science, Texas Tech University, Lubbock, TX 79409, USA
- Xuwu Sun
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Key Laboratory of Plant Stress Biology, School of Life Sciences, Henan University, 85 Minglun Street, Kaifeng 475001, PR China

18
Wang L, Nie R, Miao X, Cai Y, Wang A, Zhang H, Zhang J, Cai J. InClust+: the deep generative framework with mask modules for multimodal data integration, imputation, and cross-modal generation. BMC Bioinformatics 2024; 25:41. [PMID: 38267858 PMCID: PMC10809631 DOI: 10.1186/s12859-024-05656-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 01/15/2024] [Indexed: 01/26/2024] Open
Abstract
BACKGROUND: With the development of single-cell technology, many cell traits can be measured. Furthermore, multi-omics profiling technology can jointly measure two or more traits in a single cell. Computational methods for multimodal data integration are needed to process the rapidly accumulating data. RESULTS: Here, we present inClust+, a deep generative framework for multi-omics data. It builds on the earlier inClust, which is specific to transcriptome data, and is augmented with two mask modules designed for multimodal data processing: an input-mask module in front of the encoder and an output-mask module behind the decoder. InClust+ was first used to integrate scRNA-seq and MERFISH data from similar cell populations, and to impute MERFISH data based on scRNA-seq data. Then, inClust+ was shown to be capable of integrating multimodal data (e.g. tri-modal data with gene expression, chromatin accessibility and protein abundance) with batch effects. Finally, inClust+ was used to integrate an unlabeled monomodal scRNA-seq dataset and two labeled multimodal CITE-seq datasets, transfer labels from the CITE-seq datasets to the scRNA-seq dataset, and generate the missing modality of protein abundance in the monomodal scRNA-seq data. In the above examples, the performance of inClust+ is better than or comparable to the most recent tools in the corresponding task. CONCLUSIONS: inClust+ is a suitable framework for handling multimodal data. Moreover, the successful implementation of the mask modules in inClust+ means that they can be applied to other deep learning methods with a similar encoder-decoder architecture to broaden the application scope of these models.
Affiliation(s)
- Lifei Wang
  - Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Rui Nie
  - China National Center for Bioinformation, Beijing, China
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuexia Miao
  - China National Center for Bioinformation, Beijing, China
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Yankai Cai
  - School of Economic and Management, China University of Geoscience, Wuhan, China
- Anqi Wang
  - Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Hanwen Zhang
  - Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- Jiang Zhang
  - School of Systems Science, Beijing Normal University, Beijing, 100875, China
- Jun Cai
  - China National Center for Bioinformation, Beijing, China
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China

19
Zhou S, Li Y, Wu W, Li L. scMMT: a multi-use deep learning approach for cell annotation, protein prediction and embedding in single-cell RNA-seq data. Brief Bioinform 2024; 25:bbad523. [PMID: 38300515 PMCID: PMC10833085 DOI: 10.1093/bib/bbad523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 12/19/2023] [Indexed: 02/02/2024] Open
Abstract
Accurate cell type annotation in single-cell RNA-sequencing data is essential for advancing biological and medical research, particularly in understanding disease progression and tumor microenvironments. However, existing methods are constrained by single feature extraction approaches, lack of adaptability to immune cell types with similar molecular profiles but distinct functions and a failure to account for the impact of cell label noise on model accuracy, all of which compromise the precision of annotation. To address these challenges, we developed a supervised approach called scMMT. We proposed a novel feature extraction technique to uncover more valuable information. Additionally, we constructed a multi-task learning framework based on the GradNorm method to enhance the recognition of challenging immune cells and reduce the impact of label noise by facilitating mutual reinforcement between cell type annotation and protein prediction tasks. Furthermore, we introduced logarithmic weighting and label smoothing mechanisms to enhance the recognition ability of rare cell types and prevent model overconfidence. Through comprehensive evaluations on multiple public datasets, scMMT has demonstrated state-of-the-art performance in various aspects including cell type annotation, rare cell identification, dropout and label noise resistance, protein expression prediction and low-dimensional embedding representation.
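The label smoothing mentioned in this abstract can be illustrated with a minimal sketch. It shows only the standard textbook form of label smoothing (mixing a one-hot target with a uniform distribution); the function name, the `epsilon` value, and the class count are assumptions for illustration and are not taken from scMMT itself.

```python
def smooth_labels(label: int, num_classes: int, epsilon: float = 0.1) -> list[float]:
    """Return a smoothed target distribution for one cell.

    Standard label smoothing: the true class receives (1 - epsilon) plus its
    share of the uniform mass, every class receives epsilon / num_classes.
    The epsilon default is an illustrative assumption, not scMMT's setting.
    """
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    uniform_share = epsilon / num_classes
    target = [uniform_share] * num_classes
    target[label] += 1.0 - epsilon
    return target


# Hypothetical example: a cell annotated as class 2 out of 4 cell types.
target = smooth_labels(label=2, num_classes=4, epsilon=0.1)
```

Because the smoothed target never assigns probability 1 to a single class, a classifier trained against it is discouraged from becoming overconfident, which is the motivation the abstract gives.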
Affiliation(s)
- Songqi Zhou
  - Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
  - Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Yang Li
  - Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
  - Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
  - Chongqing Research Institute of Big Data, Peking University, Chongqing, China
- Wenyuan Wu
  - Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
  - Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Li Li
  - Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
  - Chongqing School, University of Chinese Academy of Sciences, Chongqing, China

20
Sturgill D, Wang L, Arda HE. PancrESS - a meta-analysis resource for understanding cell-type specific expression in the human pancreas. BMC Genomics 2024; 25:76. [PMID: 38238687 PMCID: PMC10797729 DOI: 10.1186/s12864-024-09964-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 01/03/2024] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND: The human pancreas is composed of specialized cell types producing hormones and enzymes critical to human health. These specialized functions are the result of cell type-specific transcriptional programs which manifest in cell-specific gene expression. Understanding these programs is essential to developing therapies for pancreatic disorders. Transcription in the human pancreas has been widely studied by single-cell RNA technologies; however, the diversity of protocols and analysis methods hinders their interpretability in the aggregate. RESULTS: In this work, we perform a meta-analysis of pancreatic single-cell RNA sequencing data. We present a database for reference transcriptome abundances and cell-type specificity metrics. This database facilitates the identification and definition of marker genes within the pancreas. Additionally, we introduce a versatile tool, freely available as an R package, that should permit integration into existing workflows. Our tool accepts count data files generated by widely used single-cell gene expression platforms in their original format, eliminating an additional pre-formatting step. Although we designed it to calculate expression specificity of pancreas cell types, our tool is agnostic to the biological source of count data, extending its applicability to other biological systems. CONCLUSIONS: Our findings enhance the current understanding of expression specificity within the pancreas, surpassing previous work in terms of scope and detail. Furthermore, our database and tool enable researchers to perform similar calculations in diverse biological systems, expanding the applicability of marker gene identification and facilitating comparative analyses.
Affiliation(s)
- David Sturgill
  - Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA
- Li Wang
  - Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA
- H Efsun Arda
  - Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA

21
Xiong H, Wang Q, Li CC, He A. Single-cell joint profiling of multiple epigenetic proteins and gene transcription. Science Advances 2024; 10:eadi3664. [PMID: 38170774 PMCID: PMC10796078 DOI: 10.1126/sciadv.adi3664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 12/01/2023] [Indexed: 01/05/2024]
Abstract
Sculpting the epigenome with a combination of histone modifications and transcription factor occupancy determines gene transcription and cell fate specification. Here, we first develop uCoTarget, utilizing a split-pool barcoding strategy for realizing ultrahigh-throughput single-cell joint profiling of multiple epigenetic proteins. Through extensive optimization for sensitivity and multimodality resolution, we demonstrate that uCoTarget enables simultaneous detection of five histone modifications (H3K27ac, H3K4me3, H3K4me1, H3K36me3, and H3K27me3) in 19,860 single cells. We applied uCoTarget to the in vitro generation of hematopoietic stem/progenitor cells (HSPCs) from human embryonic stem cells, presenting multimodal epigenomic profiles in 26,418 single cells. uCoTarget reveals establishment of pairing of HSPC enhancers (H3K27ac) and promoters (H3K4me3) and RUNX1 engagement priming for H3K27ac activation along the HSPC path. We then develop uCoTargetX, an expansion of uCoTarget to simultaneously measure transcriptome and multiple epigenome targets. Together, our methods enable generalizable, versatile multimodal profiles for reconstructing comprehensive epigenome and transcriptome landscapes and analyzing the regulatory interplay at single-cell level.
Affiliation(s)
- Haiqing Xiong
  - State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China
- Qianhao Wang
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, College of Future Technology, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Chen C. Li
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, College of Future Technology, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Aibin He
  - Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, College of Future Technology, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
  - Key laboratory of Carcinogenesis and Translational Research of Ministry of Education of China, Peking University Cancer Hospital & Institute, Peking University, Beijing 100142, China

22
Mir BA, Rehman MU, Tayara H, Chong KT. Improving Enhancer Identification with a Multi-Classifier Stacked Ensemble Model. J Mol Biol 2023; 435:168314. [PMID: 37852600 DOI: 10.1016/j.jmb.2023.168314] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/13/2023] [Revised: 10/06/2023] [Accepted: 10/11/2023] [Indexed: 10/20/2023]
Abstract
Enhancers are DNA regions that control the expression of genes. They can be found upstream or downstream of a gene, or even inside a gene's intron region, but are normally located far from the genes they control. By integrating experimental and computational approaches, it is possible to uncover enhancers within DNA sequences, which possess regulatory properties. Experimental techniques such as ChIP-seq and ATAC-seq can identify genomic regions that are associated with transcription factors or accessible to regulatory proteins. Computational techniques, on the other hand, can predict enhancers based on sequence features and epigenetic modifications. In our study, we have developed a multi-classifier stacked ensemble (MCSE-enhancer) model that can accurately identify enhancers. We utilized feature descriptors from various physicochemical properties as input for our six baseline classifiers and built a stacked classifier, which outperformed previous enhancer classification techniques in terms of accuracy, specificity, sensitivity, and Matthews correlation coefficient. Our model achieved an accuracy of 81.5%, representing a 2-3% improvement over existing models.
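For readers less familiar with the four evaluation metrics named in this abstract, the sketch below computes them from a binary confusion matrix using their standard definitions. The function names and the example counts are hypothetical and are not results from the MCSE-enhancer paper.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity and MCC from confusion counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Matthews correlation coefficient: a balanced measure in [-1, 1].
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity (recall): fraction of true enhancers that are recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of non-enhancers correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Unlike accuracy, the Matthews correlation coefficient stays informative when the two classes are imbalanced, which is one reason it is often reported alongside sensitivity and specificity in sequence classification work.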
Affiliation(s)
- Bilal Ahmad Mir
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Mobeen Ur Rehman
  - Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Hilal Tayara
  - School of international Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
  - Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea

23
Olson RH, Cohen Kalafut N, Wang D. MANGEM: A web app for multimodal analysis of neuronal gene expression, electrophysiology, and morphology. Patterns (New York, N.Y.) 2023; 4:100847. [PMID: 38035195 PMCID: PMC10682747 DOI: 10.1016/j.patter.2023.100847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 08/07/2023] [Accepted: 09/01/2023] [Indexed: 12/02/2023]
Abstract
Single-cell techniques like Patch-seq have enabled the acquisition of multimodal data from individual neuronal cells, offering systematic insights into neuronal functions. However, these data can be heterogeneous and noisy. To address this, machine learning methods have been used to align cells from different modalities onto a low-dimensional latent space, revealing multimodal cell clusters. The use of those methods can be challenging without computational expertise or suitable computing infrastructure for computationally expensive methods. To address this, we developed a cloud-based web application, MANGEM (multimodal analysis of neuronal gene expression, electrophysiology, and morphology). MANGEM provides a step-by-step accessible and user-friendly interface to machine learning alignment methods of neuronal multimodal data. It can run asynchronously for large-scale data alignment, provide users with various downstream analyses of aligned cells, and visualize the analytic results. We demonstrated the usage of MANGEM by aligning multimodal data of neuronal cells in the mouse visual cortex.
Affiliation(s)
- Noah Cohen Kalafut
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Daifeng Wang
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA

24
Tangherloni A, Riva SG, Myers B, Buffa FM, Cazzaniga P. MAGNETO: Cell type marker panel generator from single-cell transcriptomic data. J Biomed Inform 2023; 147:104510. [PMID: 37797704 DOI: 10.1016/j.jbi.2023.104510] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2023] [Revised: 09/12/2023] [Accepted: 09/29/2023] [Indexed: 10/07/2023]
Abstract
Single-cell RNA sequencing experiments produce data useful to identify different cell types, including uncharacterized and rare ones. This enables us to study the specific functional roles of these cells in different microenvironments and contexts. After identifying a (novel) cell type of interest, it is essential to build succinct marker panels, composed of a few genes referring to cell surface proteins and clusters of differentiation molecules, able to discriminate the desired cells from the other cell populations. In this work, we propose a fully-automatic framework called MAGNETO, which can help construct optimal marker panels starting from a single-cell gene expression matrix and a cell type identity for each cell. MAGNETO builds effective marker panels solving a tailored bi-objective optimization problem, where the first objective regards the identification of the genes able to isolate a specific cell type, while the second conflicting objective concerns the minimization of the total number of genes included in the panel. Our results on three public datasets show that MAGNETO can identify marker panels that identify the cell populations of interest better than state-of-the-art approaches. Finally, by fine-tuning MAGNETO, our results demonstrate that it is possible to obtain marker panels with different specificity levels.
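The trade-off between the two conflicting objectives described in this abstract (how well a panel isolates a cell type versus how many genes it contains) can be sketched with a simple Pareto filter. This illustrates only the general idea of bi-objective dominance; it is not MAGNETO's optimizer, and the gene names and scores are hypothetical.

```python
Panel = tuple[tuple[str, ...], float]  # (genes in the panel, discrimination score)


def pareto_front(candidates: list[Panel]) -> list[Panel]:
    """Keep panels that no other panel beats on both objectives.

    Objective 1: maximize the discrimination score.
    Objective 2: minimize the number of genes in the panel.
    """
    front = []
    for genes, score in candidates:
        dominated = any(
            # The other panel is at least as good on both objectives...
            len(other_genes) <= len(genes) and other_score >= score
            # ...and strictly better on at least one of them.
            and (len(other_genes) < len(genes) or other_score > score)
            for other_genes, other_score in candidates
        )
        if not dominated:
            front.append((genes, score))
    return front


# Hypothetical candidate panels with made-up scores.
candidates = [
    (("GENE_A",), 0.70),
    (("GENE_A", "GENE_B"), 0.90),
    (("GENE_A", "GENE_C"), 0.80),
    (("GENE_A", "GENE_B", "GENE_C"), 0.90),
]
front = pareto_front(candidates)
```

In this toy example the one-gene and the best two-gene panel survive, since the remaining panels are either larger for the same score or weaker at the same size. Picking different points on such a front is one way to think about panels with different specificity levels.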
Affiliation(s)
- Andrea Tangherloni
  - Department of Computing Sciences, Bocconi University, Via Guglielmo Röntgen 1, Milan, 20136, Italy
  - Bocconi Institute for Data Science and Analytics, Bocconi University, Via Guglielmo Röntgen 1, Milan, 20136, Italy
  - Department of Human and Social Sciences, University of Bergamo, Piazzale S. Agostino 2, Bergamo, 24129, Italy
- Simone G Riva
  - Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Headley Way, Oxford, OX3 9DS, United Kingdom
- Brynelle Myers
  - Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, United Kingdom
- Francesca M Buffa
  - Department of Computing Sciences, Bocconi University, Via Guglielmo Röntgen 1, Milan, 20136, Italy
  - Bocconi Institute for Data Science and Analytics, Bocconi University, Via Guglielmo Röntgen 1, Milan, 20136, Italy
  - Department of Oncology, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, United Kingdom
- Paolo Cazzaniga
  - Department of Human and Social Sciences, University of Bergamo, Piazzale S. Agostino 2, Bergamo, 24129, Italy
  - Bicocca Bioinformatics, Biostatistics, and Bioimaging Centre - B4, Via Follereau 3, Vedano al Lambro, 20854, Italy

25
Tisi A, Palaniappan S, Maccarrone M. Advanced Omics Techniques for Understanding Cochlear Genome, Epigenome, and Transcriptome in Health and Disease. Biomolecules 2023; 13:1534. [PMID: 37892216 PMCID: PMC10605747 DOI: 10.3390/biom13101534] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/24/2023] [Revised: 10/10/2023] [Accepted: 10/13/2023] [Indexed: 10/29/2023] Open
Abstract
Advanced genomics, transcriptomics, and epigenomics techniques are providing unprecedented insights into the understanding of the molecular underpinnings of the central nervous system, including the neuro-sensory cochlea of the inner ear. Here, we report for the first time a comprehensive and updated overview of the most advanced omics techniques for the study of nucleic acids and their applications in cochlear research. We describe the available in vitro and in vivo models for hearing research and the principles of genomics, transcriptomics, and epigenomics, alongside their most advanced technologies (like single-cell omics and spatial omics), which allow for the investigation of the molecular events that occur at a single-cell resolution while retaining the spatial information.
Affiliation(s)
- Annamaria Tisi
  - Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, 67100 L’Aquila, Italy
- Sakthimala Palaniappan
  - Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, 67100 L’Aquila, Italy
- Mauro Maccarrone
  - Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, 67100 L’Aquila, Italy
  - Laboratory of Lipid Neurochemistry, European Center for Brain Research (CERC), Santa Lucia Foundation IRCCS, 00143 Rome, Italy
26
|
Khatib TO, Amanso AM, Knippler CM, Pedro B, Summerbell ER, Zohbi NM, Konen JM, Mouw JK, Marcus AI. A live-cell platform to isolate phenotypically defined subpopulations for spatial multi-omic profiling. PLoS One 2023; 18:e0292554. [PMID: 37819930 PMCID: PMC10566726 DOI: 10.1371/journal.pone.0292554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 09/22/2023] [Indexed: 10/13/2023] Open
Abstract
Numerous techniques have been employed to deconstruct the heterogeneity observed in normal and diseased cellular populations, including single cell RNA sequencing, in situ hybridization, and flow cytometry. While these approaches have revolutionized our understanding of heterogeneity, in isolation they cannot correlate phenotypic information within a physiologically relevant live-cell state with molecular profiles. This inability to integrate a live-cell phenotype (such as invasiveness, cell:cell interactions, and changes in spatial positioning) with multi-omic data creates a gap in understanding cellular heterogeneity. We sought to address this gap by employing lab technologies to design a detailed protocol, termed Spatiotemporal Genomic and Cellular Analysis (SaGA), for the precise imaging-based selection, isolation, and expansion of phenotypically distinct live cells. This protocol requires cells expressing a photoconvertible fluorescent protein and employs live cell confocal microscopy to photoconvert a user-defined single cell or set of cells displaying a phenotype of interest. The total population is then extracted from its microenvironment, and the optically highlighted cells are isolated using fluorescence activated cell sorting. SaGA-isolated cells can then be subjected to multi-omics analysis or cellular propagation for in vitro or in vivo studies. This protocol can be applied to a variety of conditions, creating protocol flexibility for user-specific research interests. The SaGA technique can be accomplished in one workday by non-specialists and results in phenotypically defined cellular subpopulations for integration with multi-omics techniques. We envision this approach providing multi-dimensional datasets exploring the relationship between live cell phenotypes and multi-omic heterogeneity within normal and diseased cellular populations.
Affiliation(s)
- Tala O. Khatib
  - Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, United States of America
  - Winship Cancer Institute of Emory University, Atlanta, Georgia, United States of America
  - Graduate Program in Biochemistry, Cell, and Developmental Biology, Emory University, Atlanta, Georgia, United States of America
- Angelica M. Amanso
  - Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, United States of America
  - Winship Cancer Institute of Emory University, Atlanta, Georgia, United States of America
- Christina M. Knippler
  - Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, United States of America
  - Winship Cancer Institute of Emory University, Atlanta, Georgia, United States of America
- Brian Pedro
  - Department of Pathology, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
- Emily R. Summerbell
  - Office of Intramural Training and Education, The National Institutes of Health, Bethesda, Maryland, United States of America
- Najdat M. Zohbi
  - Graduate Medical Education, Piedmont Macon Medical, Macon, Georgia, United States of America
- Jessica M. Konen
  - Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, United States of America
  - Winship Cancer Institute of Emory University, Atlanta, Georgia, United States of America
- Janna K. Mouw
  - Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, United States of America
  - Winship Cancer Institute of Emory University, Atlanta, Georgia, United States of America
- Adam I. Marcus
  - Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, United States of America
  - Winship Cancer Institute of Emory University, Atlanta, Georgia, United States of America
  - Graduate Program in Biochemistry, Cell, and Developmental Biology, Emory University, Atlanta, Georgia, United States of America
27
|
Zhang J, Ahmad M, Gao H. Application of single-cell multi-omics approaches in horticulture research. Molecular Horticulture 2023; 3:18. [PMID: 37789394 PMCID: PMC10521458 DOI: 10.1186/s43897-023-00067-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 09/15/2023] [Indexed: 10/05/2023]
Abstract
Cell heterogeneity shapes the morphology and function of various tissues and organs in multicellular organisms. Elucidation of the differences among cells and the mechanism of intercellular regulation is essential for an in-depth understanding of the developmental process. In recent years, the rapid development of high-throughput single-cell transcriptome sequencing technologies has influenced the study of plant developmental biology. Additionally, the accuracy and sensitivity of tools used to study the epigenome and metabolome have significantly increased, thus enabling multi-omics analysis at single-cell resolution. Here, we summarize the currently available single-cell multi-omics approaches and their recent applications in plant research, review the single-cell based studies in fruit, vegetable, and ornamental crops, and discuss the potential of such approaches in future horticulture research.
Affiliation(s)
- Jun Zhang
  - Joint Center for Single Cell Biology, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Mayra Ahmad
  - Joint Center for Single Cell Biology, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Hongbo Gao
  - Joint Center for Single Cell Biology, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
28
|
Murai T, Matsuda S. Integrated Multimodal Omics and Dietary Approaches for the Management of Neurodegeneration. Epigenomes 2023; 7:20. [PMID: 37754272 PMCID: PMC10529483 DOI: 10.3390/epigenomes7030020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 08/26/2023] [Accepted: 08/31/2023] [Indexed: 09/28/2023] Open
Abstract
Neurodegenerative diseases, such as Alzheimer's disease and Parkinson's disease, are caused by a combination of multiple events that damage neuronal function. A well-characterized biomarker of neurodegeneration is the accumulation of proteinaceous aggregates in the brain. However, the gradually worsening symptoms of neurodegenerative diseases are unlikely to result solely from a mutation in a single gene, but rather from a multi-step process involving epigenetic changes. Recently, it has been suggested that a fraction of epigenetic alterations may be correlated to neurodegeneration in the brain. Unlike DNA mutations, epigenetic alterations are reversible, and therefore raise the possibilities for therapeutic intervention, including dietary modifications. Additionally, reactive oxygen species (ROS) may contribute to the pathogenesis of Alzheimer's disease and Parkinson's disease through epigenetic alteration. Given that the antioxidant properties of plant-derived phytochemicals are likely to exhibit pleiotropic effects against ROS-mediated epigenetic alteration, dietary intervention may be promising for the management of neurodegeneration in these diseases. In this review, the state-of-the-art applications using single-cell multimodal omics approaches, including epigenetics, and dietary approaches for the identification of novel biomarkers and therapeutic approaches for the treatment of neurodegenerative diseases are discussed.
Affiliation(s)
- Toshiyuki Murai
  - Graduate School of Medicine, Osaka University, 2-2 Yamada-oka, Suita 565-0871, Japan
- Satoru Matsuda
  - Department of Food Science and Nutrition, Nara Women’s University, Kita-Uoya Nishimachi, Nara 630-8506, Japan
29
|
Xue L, Wu Y, Lin Y. Dissecting and improving gene regulatory network inference using single-cell transcriptome data. Genome Res 2023; 33:1609-1621. [PMID: 37580132 PMCID: PMC10620053 DOI: 10.1101/gr.277488.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 08/07/2023] [Indexed: 08/16/2023]
Abstract
Single-cell transcriptome data has been widely used to reconstruct gene regulatory networks (GRNs) controlling critical biological processes such as development and differentiation. Although a growing list of algorithms has been developed to infer GRNs using such data, achieving an inference accuracy consistently higher than random guessing has remained challenging. To address this, it is essential to delineate how the accuracy of regulatory inference is limited. Here, we systematically characterized factors limiting the accuracy of inferred GRNs and demonstrated that using pre-mRNA information can help improve regulatory inference compared to the typically used information (i.e., mature mRNA). Using kinetic modeling and simulated single-cell data sets, we showed that target genes' mature mRNA levels often fail to accurately report upstream regulatory activities because of gene-level and network-level factors, which can be improved by using pre-mRNA levels. We tested this finding on public single-cell RNA-seq data sets using intronic reads as proxies of pre-mRNA levels and can indeed achieve a higher inference accuracy compared to using exonic reads (corresponding to mature mRNAs). Using experimental data sets, we further validated findings from the simulated data sets and identified factors such as transcription factor activity dynamics influencing the accuracy of pre-mRNA-based inference. This work delineates the fundamental limitations of gene regulatory inference and helps improve GRN inference using single-cell RNA-seq data.
Affiliation(s)
- Lingfeng Xue
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
- Yan Wu
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
  - The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
- Yihan Lin
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
  - The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
30
|
Dong H, Du Z, Ma H, Zhou Z, Yang H, Wang Z. Prediction of distinct populations of innate lymphoid cells by transcriptional profiles. Front Genet 2023; 14:1227452. [PMID: 37719706 PMCID: PMC10500302 DOI: 10.3389/fgene.2023.1227452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 08/02/2023] [Indexed: 09/19/2023] Open
Abstract
Innate lymphoid cells (ILCs) are a unique type of lymphocyte that differ from adaptive lymphocytes in that they lack antigen receptors; they primarily reside in tissues and are closely associated with fibers. Despite their plasticity and heterogeneity, identifying ILCs in peripheral blood can be difficult due to their small numbers. Accurately and rapidly identifying ILCs is critical for studying homeostasis and inflammation. To address this challenge, we collect single-cell RNA-seq data from 647 patients, including 26,087 transcripts. Background screening, Lasso analysis, and principal component analysis (PCA) are used to select features. Finally, we employ a deep neural network to classify lymphocytes. Our method achieved the highest accuracy compared to other approaches. Furthermore, we identified four genes that play a vital role in lymphocyte development. Adding these gene transcripts into the model, we were able to increase the model's AUC. In summary, our study demonstrates the effectiveness of using single-cell transcriptomic analysis combined with machine learning techniques to accurately identify innate lymphoid cells and advance our understanding of their development and function in the body.
Affiliation(s)
- Haiyao Dong
  - Department of Thoracic Surgery, China Medical University, Shenyang, China
  - Department of Thoracic Surgery, The People’s Hospital of Liaoning Province, Shenyang, China
- Zhenguang Du
  - Department of No. 3 Oncology, The People’s Hospital of Liaoning Province, Shenyang, China
- Haoming Ma
  - College of Software, Northeastern University, Shenyang, China
- Zhicheng Zhou
  - Department of No. 3 Oncology, The People’s Hospital of Liaoning Province, Shenyang, China
- Haitao Yang
  - Department of Thoracic Surgery, The People’s Hospital of Liaoning Province, Shenyang, China
- Zhenyuan Wang
  - Department of Thoracic Surgery, China Medical University, Shenyang, China
  - Department of Thoracic Surgery, The People’s Hospital of Liaoning Province, Shenyang, China
31
|
SoRelle ED, Reinoso-Vizcaino NM, Dai J, Barry AP, Chan C, Luftig MA. Epstein-Barr virus evades restrictive host chromatin closure by subverting B cell activation and germinal center regulatory loci. Cell Rep 2023; 42:112958. [PMID: 37561629 PMCID: PMC10559315 DOI: 10.1016/j.celrep.2023.112958] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/13/2022] [Revised: 06/02/2023] [Accepted: 07/25/2023] [Indexed: 08/12/2023] Open
Abstract
Chromatin accessibility fundamentally governs gene expression and biological response programs that can be manipulated by pathogens. Here we capture dynamic chromatin landscapes of individual B cells during Epstein-Barr virus (EBV) infection. EBV+ cells that exhibit arrest via antiviral sensing and proliferation-linked DNA damage experience global accessibility reduction. Proliferative EBV+ cells develop expression-linked architectures and motif accessibility profiles resembling in vivo germinal center (GC) phenotypes. Remarkably, EBV elicits dark zone (DZ), light zone (LZ), and post-GC B cell chromatin features despite BCL6 downregulation. Integration of single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq), single-cell RNA sequencing (scRNA-seq), and chromatin immunoprecipitation sequencing (ChIP-seq) data enables genome-wide cis-regulatory predictions implicating EBV nuclear antigens (EBNAs) in phenotype-specific control of GC B cell activation, survival, and immune evasion. Knockouts validate bioinformatically identified regulators (MEF2C and NFE2L2) of EBV-induced GC phenotypes and EBNA-associated loci that regulate gene expression (CD274/PD-L1). These data and methods can inform high-resolution investigations of EBV-host interactions, B cell fates, and virus-mediated lymphomagenesis.
Affiliation(s)
- Elliott D SoRelle
  - Department of Molecular Genetics and Microbiology, Duke Center for Virology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Nicolás M Reinoso-Vizcaino
  - Department of Molecular Genetics and Microbiology, Duke Center for Virology, Duke University School of Medicine, Durham, NC 27710, USA
- Joanne Dai
  - Department of Molecular Genetics and Microbiology, Duke Center for Virology, Duke University School of Medicine, Durham, NC 27710, USA
- Ashley P Barry
  - Department of Molecular Genetics and Microbiology, Duke Center for Virology, Duke University School of Medicine, Durham, NC 27710, USA
- Cliburn Chan
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Micah A Luftig
  - Department of Molecular Genetics and Microbiology, Duke Center for Virology, Duke University School of Medicine, Durham, NC 27710, USA
32
|
Itai Y, Rappoport N, Shamir R. Integration of gene expression and DNA methylation data across different experiments. Nucleic Acids Res 2023; 51:7762-7776. [PMID: 37395437 PMCID: PMC10450176 DOI: 10.1093/nar/gkad566] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/16/2022] [Revised: 06/04/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023] Open
Abstract
Integrative analysis of multi-omic datasets has proven to be extremely valuable in cancer research and precision medicine. However, obtaining multimodal data from the same samples is often difficult. Integrating multiple datasets of different omics remains a challenge, with only a few available algorithms developed to solve it. Here, we present INTEND (IntegratioN of Transcriptomic and EpigeNomic Data), a novel algorithm for integrating gene expression and DNA methylation datasets covering disjoint sets of samples. To enable integration, INTEND learns a predictive model between the two omics by training on multi-omic data measured on the same set of samples. In comprehensive testing on 11 TCGA (The Cancer Genome Atlas) cancer datasets spanning 4329 patients, INTEND achieves significantly superior results compared with four state-of-the-art integration algorithms. We also demonstrate INTEND's ability to uncover connections between DNA methylation and the regulation of gene expression in the joint analysis of two lung adenocarcinoma single-omic datasets from different sources. INTEND's data-driven approach makes it a valuable multi-omic data integration tool. The code for INTEND is available at https://github.com/Shamir-Lab/INTEND.
Affiliation(s)
- Yonatan Itai
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Nimrod Rappoport
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Ron Shamir
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
33
|
Yeh CH, Chen ZG, Liou CY, Chen MJ. Homogeneous Space Construction and Projection for Single-Cell Expression Prediction Based on Deep Learning. Bioengineering (Basel) 2023; 10:996. [PMID: 37760098 PMCID: PMC10525719 DOI: 10.3390/bioengineering10090996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 08/16/2023] [Accepted: 08/18/2023] [Indexed: 09/29/2023] Open
Abstract
Predicting cellular responses to perturbations is an unsolved problem in biology. Traditional approaches assume that different cell types respond similarly to perturbations. However, this assumption does not take into account the context of genome interactions in different cell types, which leads to compromised prediction quality. More recently, deep learning models used to discover gene-gene relationships can yield more accurate predictions of cellular responses. The huge difference in biological information between different cell types makes it difficult for deep learning models to encode data into a continuous low-dimensional feature space, which means that the features captured by the latent space may not be continuous. Therefore, the mapping relationship between the two conditional spaces learned by the model can only be applied where the real reference data resides, leading to the wrong mapping of the predicted target cells because they are not in the same domain as the reference data. In this paper, we propose an information-navigated variational autoencoder (INVAE), a deep neural network for cell perturbation response prediction. INVAE filters out information that is not conducive to predictive performance. For the remaining information, INVAE constructs a homogeneous space of control conditions, and finds the mapping relationship between the control condition space and the perturbation condition space. By embedding the target unit into the control space and then mapping it to the perturbation space, we can predict the perturbed state of the target unit. Comparing our proposed method with three other state-of-the-art methods on three real datasets, experimental results show that INVAE outperforms existing methods in cell state prediction after perturbation. Furthermore, we demonstrate that filtering out useless information not only improves prediction accuracy but also reveals similarities in how genes in different cell types are regulated following perturbation.
Affiliation(s)
- Chia-Hung Yeh
  - Department of Electrical Engineering, National Taiwan Normal University, Taipei 10610, Taiwan
  - Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Ze-Guang Chen
  - Department of Electrical Engineering, National Taiwan Normal University, Taipei 10610, Taiwan
- Cheng-Yue Liou
  - Department of Electrical Engineering, National Taiwan Normal University, Taipei 10610, Taiwan
- Mei-Juan Chen
  - Department of Electrical Engineering, National Dong Hwa University, Hualien 97401, Taiwan
34
|
Swapna LS, Huang M, Li Y. GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes. Genome Biol 2023; 24:190. [PMID: 37596691 PMCID: PMC10436670 DOI: 10.1186/s13059-023-03034-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 08/09/2023] [Indexed: 08/20/2023] Open
Abstract
Cell-type composition is an important indicator of health. We present Guided Topic Model for deconvolution (GTM-decon) to automatically infer cell-type-specific gene topic distributions from single-cell RNA-seq data for deconvolving bulk transcriptomes. GTM-decon performs competitively on deconvolving simulated and real bulk data compared with the state-of-the-art methods. Moreover, as demonstrated in deconvolving disease transcriptomes, GTM-decon can infer multiple cell-type-specific gene topic distributions per cell type, which captures sub-cell-type variations. GTM-decon can also use phenotype labels from single-cell or bulk data to infer phenotype-specific gene distributions. In a nested-guided design, GTM-decon identified cell-type-specific differentially expressed genes from bulk breast cancer transcriptomes.
Affiliation(s)
- Michael Huang
  - School of Computer Science, McGill University, Montreal, QC, Canada
- Yue Li
  - School of Computer Science, McGill University, Montreal, QC, Canada
35
|
Abstract
Single-cell RNA sequencing methods have led to improved understanding of the heterogeneity and transcriptomic states present in complex biological systems. Recently, the development of novel single-cell technologies for assaying additional modalities, specifically genomic, epigenomic, proteomic, and spatial data, allows for unprecedented insight into cellular biology. While certain technologies collect multiple measurements from the same cells simultaneously, even when modalities are separately assayed in different cells, we can apply novel computational methods to integrate these data. The application of computational integration methods to multimodal paired and unpaired data results in rich information about the identities of the cells present and the interactions between different levels of biology, such as between genetic variation and transcription. In this review, we both discuss the single-cell technologies for measuring these modalities and describe and characterize a variety of computational integration methods for combining the resulting data to leverage multimodal information toward greater biological insight.
Affiliation(s)
- Emily Flynn
  - CoLabs, University of California, San Francisco, California, USA
- Ana Almonte-Loya
  - CoLabs, University of California, San Francisco, California, USA
  - Biomedical Informatics Program, University of California, San Francisco, California, USA
- Gabriela K Fragiadakis
  - CoLabs, University of California, San Francisco, California, USA
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
36
|
Vandereyken K, Sifrim A, Thienpont B, Voet T. Methods and applications for single-cell and spatial multi-omics. Nat Rev Genet 2023; 24:494-515. [PMID: 36864178 PMCID: PMC9979144 DOI: 10.1038/s41576-023-00580-2] [Citation(s) in RCA: 349] [Impact Index Per Article: 174.5] [Accepted: 01/20/2023] [Indexed: 03/04/2023]
Abstract
The joint analysis of the genome, epigenome, transcriptome, proteome and/or metabolome from single cells is transforming our understanding of cell biology in health and disease. In less than a decade, the field has seen tremendous technological revolutions that enable crucial new insights into the interplay between intracellular and intercellular molecular mechanisms that govern development, physiology and pathogenesis. In this Review, we highlight advances in the fast-developing field of single-cell and spatial multi-omics technologies (also known as multimodal omics approaches), and the computational strategies needed to integrate information across these molecular layers. We demonstrate their impact on fundamental cell biology and translational research, discuss current challenges and provide an outlook to the future.
Affiliation(s)
- Katy Vandereyken
  - KU Leuven Institute for Single Cell Omics (LISCO), University of Leuven, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, University of Leuven, KU Leuven, Leuven, Belgium
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Alejandro Sifrim
  - KU Leuven Institute for Single Cell Omics (LISCO), University of Leuven, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, University of Leuven, KU Leuven, Leuven, Belgium
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Bernard Thienpont
  - KU Leuven Institute for Single Cell Omics (LISCO), University of Leuven, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, University of Leuven, KU Leuven, Leuven, Belgium
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Thierry Voet
  - KU Leuven Institute for Single Cell Omics (LISCO), University of Leuven, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, University of Leuven, KU Leuven, Leuven, Belgium
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
37
|
Ritter U. In situ veritas: combining omics and multiplex imaging can facilitate the detection and characterization of cell-cell interactions in tissues. Front Med (Lausanne) 2023; 10:1155057. [PMID: 37332762 PMCID: PMC10270289 DOI: 10.3389/fmed.2023.1155057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 04/25/2023] [Indexed: 06/20/2023] Open
Affiliation(s)
- Uwe Ritter
  - Chair for Immunology, University of Regensburg, Regensburg, Germany
  - Department for Immunology, Leibniz Institute for Immunotherapy (LIT), Regensburg, Germany
38
|
Murai T, Matsuda S. Fatty Acid Metabolites and the Tumor Microenvironment as Potent Regulators of Cancer Stem Cell Signaling. Metabolites 2023; 13:709. [PMID: 37367867 DOI: 10.3390/metabo13060709] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Revised: 05/22/2023] [Accepted: 05/29/2023] [Indexed: 06/28/2023] Open
Abstract
Individual cancer cells are not equal but are organized into a cellular hierarchy in which only a rare few leukemia cells can self-renew in a manner reminiscent of the characteristic stem cell properties. The PI3K/AKT pathway functions in a variety of cancers and plays a critical role in the survival and proliferation of healthy cells under physiologic conditions. In addition, cancer stem cells might exhibit a variety of metabolic reprogramming phenotypes that cannot be completely attributed to the intrinsic heterogeneity of cancer. Given the heterogeneity of cancer stem cells, new strategies with single-cell resolution will become a powerful tool to eradicate the aggressive cell population harboring cancer stem cell phenotypes. Here, this article will provide an overview of the most important signaling pathways of cancer stem cells regarding their relevance to the tumor microenvironment and fatty acid metabolism, suggesting valuable strategies among cancer immunotherapies to inhibit the recurrence of tumors.
Affiliation(s)
- Toshiyuki Murai
  - Graduate School of Medicine, Osaka University, 2-2 Yamada-oka, Suita 565-0871, Japan
- Satoru Matsuda
  - Department of Food Science and Nutrition, Nara Women's University, Kita-Uoya Nishimachi, Nara 630-8506, Japan
| |
Collapse
|
39
|
Liu C, Huang H, Yang P. Multi-task learning from multimodal single-cell omics with Matilda. Nucleic Acids Res 2023; 51:e45. [PMID: 36912104 PMCID: PMC10164589 DOI: 10.1093/nar/gkad157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 01/28/2023] [Accepted: 02/21/2023] [Indexed: 03/14/2023] Open
Abstract
Multimodal single-cell omics technologies enable multiple molecular programs to be simultaneously profiled at a global scale in individual cells, creating opportunities to study biological systems at a resolution that was previously inaccessible. However, the analysis of multimodal single-cell omics data is challenging due to the lack of methods that can integrate across multiple data modalities generated from such technologies. Here, we present Matilda, a multi-task learning method for integrative analysis of multimodal single-cell omics data. By leveraging the interrelationship among tasks, Matilda learns to perform data simulation, dimension reduction, cell type classification, and feature selection in a single unified framework. We compare Matilda with other state-of-the-art methods on datasets generated from some of the most popular multimodal single-cell omics technologies. Our results demonstrate the utility of Matilda for addressing multiple key tasks on integrative multimodal single-cell omics data analysis. Matilda is implemented in PyTorch and is freely available from https://github.com/PYangLab/Matilda.
Affiliation(s)
- Chunlei Liu
- Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Hao Huang
- Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Pengyi Yang
- Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
|
40
|
Wani SA, Khan SA, Quadri SMK. scJVAE: A novel method for integrative analysis of multimodal single-cell data. Comput Biol Med 2023; 158:106865. [PMID: 37030268 DOI: 10.1016/j.compbiomed.2023.106865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 02/22/2023] [Accepted: 03/30/2023] [Indexed: 04/07/2023]
Abstract
The study of cellular decision-making can be approached comprehensively using multimodal single-cell omics technology. Recent advances in multimodal single-cell technology have enabled simultaneous profiling of more than one modality from the same cell, providing more significant insights into cell characteristics. However, learning the joint representation of multimodal single-cell data is challenging due to batch effects. Here we present a novel method, scJVAE (single-cell Joint Variational AutoEncoder), for batch effect removal and joint representation of multimodal single-cell data. The scJVAE integrates and learns joint embedding of paired scRNA-seq and scATAC-seq data modalities. We evaluate and demonstrate the ability of scJVAE to remove batch effects using various datasets with paired gene expression and open chromatin. We also consider scJVAE for downstream analysis, such as lower dimensional representation, cell-type clustering, and time and memory requirement. We find scJVAE a robust and scalable method outperforming existing state-of-the-art batch effect removal and integration methods.
Affiliation(s)
- Shahid Ahmad Wani
- Department of Computer Science, Jamia Millia Islamia, New Delhi, 110025, India
- Sumeer Ahmad Khan
- Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- S M K Quadri
- Department of Computer Science, Jamia Millia Islamia, New Delhi, 110025, India
|
41
|
Miranda AMA, Janbandhu V, Maatz H, Kanemaru K, Cranley J, Teichmann SA, Hübner N, Schneider MD, Harvey RP, Noseda M. Single-cell transcriptomics for the assessment of cardiac disease. Nat Rev Cardiol 2023; 20:289-308. [PMID: 36539452 DOI: 10.1038/s41569-022-00805-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Accepted: 11/03/2022] [Indexed: 12/24/2022]
Abstract
Cardiovascular disease is the leading cause of death globally. An advanced understanding of cardiovascular disease mechanisms is required to improve therapeutic strategies and patient risk stratification. State-of-the-art, large-scale, single-cell and single-nucleus transcriptomics facilitate the exploration of the cardiac cellular landscape at an unprecedented level, beyond its descriptive features, and can further our understanding of the mechanisms of disease and guide functional studies. In this Review, we provide an overview of the technical challenges in the experimental design of single-cell and single-nucleus transcriptomics studies, as well as a discussion of the type of inferences that can be made from the data derived from these studies. Furthermore, we describe novel findings derived from transcriptomics studies for each major cardiac cell type in both health and disease, and from development to adulthood. This Review also provides a guide to interpreting the exhaustive list of newly identified cardiac cell types and states, and highlights the consensus and discordances in annotation, indicating an urgent need for standardization. We describe advanced applications such as integration of single-cell data with spatial transcriptomics to map genes and cells on tissue and define cellular microenvironments that regulate homeostasis and disease progression. Finally, we discuss current and future translational and clinical implications of novel transcriptomics approaches, and provide an outlook of how these technologies will change the way we diagnose and treat heart disease.
Affiliation(s)
- Vaibhao Janbandhu
- Victor Chang Cardiac Research Institute, Sydney, NSW, Australia
- School of Clinical Medicine, Faculty of Medicine, UNSW Sydney, Sydney, NSW, Australia
- Henrike Maatz
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Kazumasa Kanemaru
- Cellular Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- James Cranley
- Cellular Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Sarah A Teichmann
- Cellular Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Norbert Hübner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Charité-Universitätsmedizin Berlin, Berlin, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Berlin, Berlin, Germany
- Richard P Harvey
- Victor Chang Cardiac Research Institute, Sydney, NSW, Australia
- School of Clinical Medicine, Faculty of Medicine, UNSW Sydney, Sydney, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW, Australia
- Michela Noseda
- National Heart and Lung Institute, Imperial College London, London, UK
|
42
|
Gossi F, Pati P, Chouvardas P, Martinelli AL, Kruithof-de Julio M, Rapsomaniki MA. Matching single cells across modalities with contrastive learning and optimal transport. Brief Bioinform 2023; 24:7147026. [PMID: 37122067 DOI: 10.1093/bib/bbad130] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/09/2023] [Revised: 02/25/2023] [Accepted: 03/14/2023] [Indexed: 05/02/2023] Open
Abstract
Understanding the interactions between the biomolecules that govern cellular behaviors remains an emergent question in biology. Recent advances in single-cell technologies have enabled the simultaneous quantification of multiple biomolecules in the same cell, opening new avenues for understanding cellular complexity and heterogeneity. Still, the resulting multimodal single-cell datasets present unique challenges arising from the high dimensionality and multiple sources of acquisition noise. Computational methods able to match cells across different modalities offer an appealing alternative towards this goal. In this work, we propose MatchCLOT, a novel method for modality matching inspired by recent promising developments in contrastive learning and optimal transport. MatchCLOT uses contrastive learning to learn a common representation between two modalities and applies entropic optimal transport as an approximate maximum weight bipartite matching algorithm. Our model obtains state-of-the-art performance on two curated benchmarking datasets and an independent test dataset, improving the top scoring method by 26.1% while preserving the underlying biological structure of the multimodal data. Importantly, MatchCLOT offers high gains in computational time and memory that, in contrast to existing methods, allows it to scale well with the number of cells. As single-cell datasets become increasingly large, MatchCLOT offers an accurate and efficient solution to the problem of modality matching.
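The matching step described in this abstract, entropic optimal transport used as an approximate bipartite matching, can be illustrated with a few Sinkhorn iterations. The following is a minimal sketch and not the MatchCLOT implementation: the function names, the toy similarity matrix, and the `epsilon` and `n_iters` values are assumptions made for illustration, and the similarity scores stand in for whatever a contrastively trained model would produce.

```python
import math

def sinkhorn_plan(similarity, epsilon=0.5, n_iters=200):
    """Entropic optimal transport between two equally sized sets of cells.

    similarity[i][j] is a (hypothetical) similarity between cell i in
    modality A and cell j in modality B. Returns a soft matching whose
    rows and columns each sum to approximately 1.
    """
    n = len(similarity)
    top = max(max(row) for row in similarity)
    # Gibbs kernel with cost = -similarity; subtracting the global maximum
    # only rescales the kernel and keeps exp() numerically safe.
    kernel = [[math.exp((s - top) / epsilon) for s in row] for row in similarity]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(n_iters):
        # Alternately rescale rows and columns to unit mass.
        u = [1.0 / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [1.0 / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(n)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(n)]

def hard_matches(plan):
    """For each cell in modality A, pick the modality-B cell with the most mass."""
    return [max(range(len(row)), key=row.__getitem__) for row in plan]

# Toy similarity matrix (illustrative values only).
toy_similarity = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.0, 0.3, 0.7],
]
toy_plan = sinkhorn_plan(toy_similarity)
toy_matches = hard_matches(toy_plan)
```

A smaller `epsilon` pushes the soft plan closer to a hard one-to-one assignment but slows the convergence of the iterations, which is the usual trade-off with entropic regularization.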
Affiliation(s)
- Federico Gossi
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zürich, Switzerland
- Pushpak Pati
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Panagiotis Chouvardas
- Department for BioMedical Research, Urology Research Laboratory, University of Bern, Murtenstrasse 24, 3008 Bern, Switzerland
- Adriano Luca Martinelli
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Institute of Molecular Systems Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zürich, Switzerland
- Marianna Kruithof-de Julio
- Department for BioMedical Research, Urology Research Laboratory, University of Bern, Murtenstrasse 24, 3008 Bern, Switzerland
- Department of Urology, Inselspital, Bern University Hospital, Freiburgstrasse 15, 3010 Bern, Switzerland
|
43
|
Olson RH, Kalafut NC, Wang D. MANGEM: a web app for Multimodal Analysis of Neuronal Gene expression, Electrophysiology and Morphology. bioRxiv 2023:2023.04.03.535322. [PMID: 37066386 PMCID: PMC10104012 DOI: 10.1101/2023.04.03.535322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Single-cell techniques have enabled the acquisition of multi-modal data, particularly for neurons, to characterize cellular functions. Patch-seq, for example, combines patch-clamp recording, cell imaging, and single-cell RNA-seq to obtain electrophysiology, morphology, and gene expression data from a single neuron. While these multi-modal data offer potential insights into neuronal functions, they can be heterogeneous and noisy. To address this, machine-learning methods have been used to align cells from different modalities onto a low-dimensional latent space, revealing multi-modal cell clusters. However, the use of those methods can be challenging for biologists and neuroscientists without computational expertise and also requires suitable computing infrastructure for computationally expensive methods. To address these issues, we developed a cloud-based web application, MANGEM (Multimodal Analysis of Neuronal Gene expression, Electrophysiology, and Morphology) at https://ctc.waisman.wisc.edu/mangem. MANGEM provides a step-by-step accessible and user-friendly interface to machine-learning alignment methods of neuronal multi-modal data while enabling real-time visualization of characteristics of raw and aligned cells. It can be run asynchronously for large-scale data alignment, provides users with various downstream analyses of aligned cells and visualizes the analytic results such as identifying multi-modal cell clusters of cells and detecting correlated genes with electrophysiological and morphological features. We demonstrated the usage of MANGEM by aligning Patch-seq multimodal data of neuronal cells in the mouse visual cortex.
Affiliation(s)
- Noah Cohen Kalafut
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705 USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53706 USA
- Daifeng Wang
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705 USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53706 USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706 USA
|
44
|
Zhang L, Lin L, Li J. Multi-view clustering by CPS-merge analysis with application to multimodal single-cell data. PLoS Comput Biol 2023; 19:e1011044. [PMID: 37068097 PMCID: PMC10138214 DOI: 10.1371/journal.pcbi.1011044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 04/27/2023] [Accepted: 03/22/2023] [Indexed: 04/18/2023] Open
Abstract
Multi-view data can be generated from diverse sources, by different technologies, and in multiple modalities. In various fields, integrating information from multi-view data has pushed the frontier of discovery. In this paper, we develop a new approach for multi-view clustering, which overcomes the limitations of existing methods such as the need of pooling data across views, restrictions on the clustering algorithms allowed within each view, and the disregard for complementary information between views. Our new method, called CPS-merge analysis, merges clusters formed by the Cartesian product of single-view cluster labels, guided by the principle of maximizing clustering stability as evaluated by CPS analysis. In addition, we introduce measures to quantify the contribution of each view to the formation of any cluster. CPS-merge analysis can be easily incorporated into an existing clustering pipeline because it only requires single-view cluster labels instead of the original data. We can thus readily apply advanced single-view clustering algorithms. Importantly, our approach accounts for both consensus and complementary effects between different views, whereas existing ensemble methods focus on finding a consensus for multiple clustering results, implying that results from different views are variations of one clustering structure. Through experiments on single-cell datasets, we demonstrate that our approach frequently outperforms other state-of-the-art methods.
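The starting point named in this abstract, clusters formed by the Cartesian product of single-view cluster labels, is easy to show on toy data. The sketch below covers only that first step; the stability-guided merging (CPS analysis) is not implemented here, and the function name and the example labels are assumptions for illustration.

```python
from collections import Counter

def product_clusters(labels_view_a, labels_view_b):
    """Give each sample the pair of its single-view cluster labels.

    Only labels are needed, not the original data, which is why this step
    can sit on top of any single-view clustering pipeline.
    """
    if len(labels_view_a) != len(labels_view_b):
        raise ValueError("both views must label the same samples")
    return list(zip(labels_view_a, labels_view_b))

# Hypothetical labels for six cells from two views (e.g. RNA and protein).
rna_labels = [0, 0, 0, 1, 1, 1]
protein_labels = ["x", "x", "y", "y", "y", "y"]

combined = product_clusters(rna_labels, protein_labels)
sizes = Counter(combined)
# Small product clusters such as (0, "y") are natural candidates for
# merging into a neighbour; the merging criterion itself is left out.
```

The product labelling keeps complementary information (cells that agree in one view but differ in the other end up in distinct product clusters), which a pure consensus of the two clusterings would discard.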
Affiliation(s)
- Lixiang Zhang
- Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Lin Lin
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Jia Li
- Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
|
45
|
Khatib TO, Amanso AM, Pedro B, Knippler CM, Summerbell ER, Zohbi NM, Konen JM, Mouw JK, Marcus AI. A live-cell platform to isolate phenotypically defined subpopulations for spatial multi-omic profiling. bioRxiv 2023:2023.02.28.530493. [PMID: 36909653 PMCID: PMC10002729 DOI: 10.1101/2023.02.28.530493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Numerous techniques have been employed to deconstruct the heterogeneity observed in normal and diseased cellular populations, including single cell RNA sequencing, in situ hybridization, and flow cytometry. While these approaches have revolutionized our understanding of heterogeneity, in isolation they cannot correlate phenotypic information within a physiologically relevant live-cell state, with molecular profiles. This inability to integrate a historical live-cell phenotype, such as invasiveness, cell:cell interactions, and changes in spatial positioning, with multi-omic data, creates a gap in understanding cellular heterogeneity. We sought to address this gap by employing lab technologies to design a detailed protocol, termed Spatiotemporal Genomics and Cellular Analysis (SaGA), for the precise imaging-based selection, isolation, and expansion of phenotypically distinct live-cells. We begin with cells stably expressing a photoconvertible fluorescent protein and employ live cell confocal microscopy to photoconvert a user-defined single cell or set of cells displaying a phenotype of interest. The total population is then extracted from its microenvironment, and the optically highlighted cells are isolated using fluorescence activated cell sorting. SaGA-isolated cells can then be subjected to multi-omics analysis or cellular propagation for in vitro or in vivo studies. This protocol can be applied to a variety of conditions, creating protocol flexibility for user-specific research interests. The SaGA technique can be accomplished in one workday by non-specialists and results in a phenotypically defined cellular subpopulation for integration with multi-omics techniques. We envision this approach providing multi-dimensional datasets exploring the relationship between live-cell phenotype and multi-omic heterogeneity within normal and diseased cellular populations.
Affiliation(s)
- Tala O Khatib
- Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, USA
- Winship Cancer Institute of Emory University, Atlanta, Georgia, USA
- Graduate Program in Biochemistry, Cell, and Developmental Biology, Emory University, Atlanta, Georgia, USA
- These authors contributed equally
- Angelica M Amanso
- Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, USA
- Winship Cancer Institute of Emory University, Atlanta, Georgia, USA
- These authors contributed equally
- Brian Pedro
- Department of Pathology, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Christina M Knippler
- Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, USA
- Winship Cancer Institute of Emory University, Atlanta, Georgia, USA
- Emily R Summerbell
- Office of Intramural Training and Education, The National Institutes of Health, Bethesda, Maryland, USA
- Najdat M Zohbi
- Graduate Medical Education, Piedmont Macon Medical, Macon, Georgia, USA
- Jessica M Konen
- Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, USA
- Winship Cancer Institute of Emory University, Atlanta, Georgia, USA
- Janna K Mouw
- Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, USA
- Winship Cancer Institute of Emory University, Atlanta, Georgia, USA
- Adam I Marcus
- Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, USA
- Winship Cancer Institute of Emory University, Atlanta, Georgia, USA
- Graduate Program in Biochemistry, Cell, and Developmental Biology, Emory University, Atlanta, Georgia, USA
|
46
|
Dall’Olio L, Bolognesi M, Borghesi S, Cattoretti G, Castellani G. BRAQUE: Bayesian Reduction for Amplified Quantization in UMAP Embedding. Entropy (Basel) 2023; 25:354. [PMID: 36832720 PMCID: PMC9955093 DOI: 10.3390/e25020354] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/09/2022] [Revised: 02/01/2023] [Accepted: 02/10/2023] [Indexed: 06/09/2023]
Abstract
Single-cell biology has revolutionized the way we understand biological processes. In this paper, we provide a more tailored approach to clustering and analyzing spatial single-cell data coming from immunofluorescence imaging techniques. We propose Bayesian Reduction for Amplified Quantization in UMAP Embedding (BRAQUE) as an integrative novel approach, from data preprocessing to phenotype classification. BRAQUE starts with an innovative preprocessing step, named Lognormal Shrinkage, which enhances input fragmentation by fitting a lognormal mixture model and shrinking each component towards its median, in order to help the subsequent clustering step find more separated and clearer clusters. Then, BRAQUE's pipeline consists of a dimensionality reduction step performed using UMAP, and a clustering step performed using HDBSCAN on the UMAP embedding. In the end, clusters are assigned to a cell type by experts, using effect size measures to rank markers and identify characterizing markers (Tier 1) and possibly characterizing markers (Tier 2). The total number of cell types in one lymph node detectable with these technologies is unknown and difficult to predict or estimate. Therefore, with BRAQUE, we achieved a higher granularity than other similar algorithms such as PhenoGraph, following the idea that merging similar clusters is easier than splitting unclear ones into clear subclusters.
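The idea behind the preprocessing step, shrinking each lognormal component towards its median so that components separate more cleanly, can be sketched as follows. This is an illustration under stated assumptions and not the published Lognormal Shrinkage procedure: the component assignments are taken as given (standing in for a fitted mixture model), and the `strength` parameter and function name are hypothetical.

```python
import math
from statistics import median

def shrink_toward_component_median(values, components, strength=0.5):
    """Pull each positive value towards the median of its mixture component.

    Shrinkage is done in log space, where a lognormal component is
    symmetric around the log of its median. strength=0 leaves the data
    unchanged; strength=1 collapses every component onto its median.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    logs = [math.log(v) for v in values]
    grouped = {}
    for log_value, comp in zip(logs, components):
        grouped.setdefault(comp, []).append(log_value)
    centers = {comp: median(group) for comp, group in grouped.items()}
    return [
        math.exp(log_value + strength * (centers[comp] - log_value))
        for log_value, comp in zip(logs, components)
    ]

# Hypothetical marker intensities with two components (dim and bright cells).
intensities = [1.0, 2.0, 4.0, 100.0, 200.0, 400.0]
assigned = [0, 0, 0, 1, 1, 1]
shrunk = shrink_toward_component_median(intensities, assigned, strength=0.5)
```

After shrinkage the spread within each component is reduced while the gap between component medians is preserved, which is the property a downstream density-based clustering would benefit from.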
Affiliation(s)
- Lorenzo Dall’Olio
- Department of Physics and Astronomy, University of Bologna, 40127 Bologna, Italy
- Maddalena Bolognesi
- Department of Medicine and Surgery, University of Milano Bicocca, 20900 Monza, Italy
- Simone Borghesi
- Department of Mathematics and Applications, University of Milano Bicocca, 20126 Milan, Italy
- Giorgio Cattoretti
- Department of Medicine and Surgery, University of Milano Bicocca, 20900 Monza, Italy
- Gastone Castellani
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40127 Bologna, Italy
|
47
|
Kong S, Li R, Tian Y, Zhang Y, Lu Y, Ou Q, Gao P, Li K, Zhang Y. Single-cell omics: A new direction for functional genetic research in human diseases and animal models. Front Genet 2023; 13:1100016. [PMID: 36685871 PMCID: PMC9846559 DOI: 10.3389/fgene.2022.1100016] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/16/2022] [Accepted: 12/16/2022] [Indexed: 01/06/2023] Open
Abstract
Over the past decade, with the development of high-throughput single-cell sequencing technology, single-cell omics has emerged as a powerful tool to understand the molecular basis of cellular mechanisms and refine our knowledge of diverse cell states. It can reveal the heterogeneity at different genetic layers and elucidate their associations by multiple omics analysis, providing a more comprehensive genetic map of biological regulatory networks. In the post-GWAS era, the molecular biological mechanisms influencing human diseases will be further elucidated by single-cell omics. This review mainly summarizes the development and trends of single-cell omics, covering single-cell omics technologies, single-cell multi-omics technologies, multiple omics data integration methods, and applications in various human organs and diseases, classic laboratory cell lines, and animal disease models. The review offers some perspectives for elucidating human diseases and constructing animal models.
Affiliation(s)
- Siyuan Kong
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China; College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Rongrong Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yunhan Tian
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China; College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, China
- College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, China
- Yaqiu Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yuhui Lu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Qiaoer Ou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Peiwen Gao
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Kui Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China; College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yubo Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China; College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Animal Functional Genomics Group, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- College of Life Science and Engineering, Foshan University, Foshan, China
|
48
|
Ikonomou L, Yampolskaya M, Mehta P. Multipotent Embryonic Lung Progenitors: Foundational Units of In Vitro and In Vivo Lung Organogenesis. Adv Exp Med Biol 2023; 1413:49-70. [PMID: 37195526 PMCID: PMC10351616 DOI: 10.1007/978-3-031-26625-6_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2023]
Abstract
Transient, tissue-specific, embryonic progenitors are important cell populations in vertebrate development. In the course of respiratory system development, multipotent mesenchymal and epithelial progenitors drive the diversification of fates that results in the plethora of cell types that compose the airways and alveolar space of the adult lungs. Use of mouse genetic models, including lineage tracing and loss-of-function studies, has elucidated signaling pathways that guide proliferation and differentiation of embryonic lung progenitors as well as transcription factors that underlie lung progenitor identity. Furthermore, pluripotent stem cell-derived and ex vivo expanded respiratory progenitors offer novel, tractable, high-fidelity systems that allow for mechanistic studies of cell fate decisions and developmental processes. As our understanding of embryonic progenitor biology deepens, we move closer to the goal of in vitro lung organogenesis and resulting applications in developmental biology and medicine.
Affiliation(s)
- Laertis Ikonomou
- Department of Oral Biology, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Cell, Gene and Tissue Engineering Center, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Pankaj Mehta
- Department of Physics, Boston University, Boston, MA, USA
- Faculty of Computing and Data Science, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
|
49
|
Cao K, Gong Q, Hong Y, Wan L. A unified computational framework for single-cell data integration with optimal transport. Nat Commun 2022; 13:7419. [PMID: 36456571 PMCID: PMC9715710 DOI: 10.1038/s41467-022-35094-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/15/2022] [Accepted: 11/18/2022] [Indexed: 12/05/2022] Open
Abstract
Single-cell data integration can provide a comprehensive molecular view of cells. However, how to integrate heterogeneous single-cell multi-omics as well as spatially resolved transcriptomic data remains a major challenge. Here we introduce uniPort, a unified single-cell data integration framework that combines a coupled variational autoencoder (coupled-VAE) and minibatch unbalanced optimal transport (Minibatch-UOT). It leverages both highly variable common and dataset-specific genes for integration to handle the heterogeneity across datasets, and it is scalable to large-scale datasets. uniPort jointly embeds heterogeneous single-cell multi-omics datasets into a shared latent space. It can further construct a reference atlas for gene imputation across datasets. Meanwhile, uniPort provides a flexible label transfer framework to deconvolute heterogeneous spatial transcriptomic data using an optimal transport plan, instead of embedding latent space. We demonstrate the capability of uniPort by applying it to integrate a variety of datasets, including single-cell transcriptomics, chromatin accessibility, and spatially resolved transcriptomic data.
Affiliation(s)
- Kai Cao
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Qiyu Gong
- Shanghai Institute of Immunology, Faculty of Basic Medicine, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yiguang Hong
- Department of Control Science and Engineering, Tongji University, Shanghai, China
| | - Lin Wan
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| |
|
50
|
Jafari E, Johnson T, Wang Y, Liu Y, Huang K, Wang Y. AIscEA: unsupervised integration of single-cell gene expression and chromatin accessibility via their biological consistency. Bioinformatics 2022; 38:5236-5244. [PMID: 36250795 PMCID: PMC9710555 DOI: 10.1093/bioinformatics/btac683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 10/07/2022] [Accepted: 10/14/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: The integrative analysis of single-cell gene expression and chromatin accessibility measurements is essential for revealing gene regulation, but it is one of the key challenges in computational biology. Gene expression and chromatin accessibility are measurements from different modalities, and no common features can be directly used to guide integration. Current state-of-the-art methods lack practical solutions for finding heterogeneous clusters. However, previous methods might not generate reliable results when cluster heterogeneity exists. More importantly, current methods lack an effective way to select hyper-parameters under an unsupervised setting. Therefore, applying computational methods to integrate single-cell gene expression and chromatin accessibility measurements remains difficult. RESULTS: We introduce AIscEA (Alignment-based Integration of single-cell gene Expression and chromatin Accessibility), a computational method that integrates single-cell gene expression and chromatin accessibility measurements using their biological consistency. AIscEA first defines a ranked similarity score to quantify the biological consistency between cell clusters across measurements. AIscEA then uses the ranked similarity score and a novel permutation test to identify cluster alignment across measurements. AIscEA further utilizes graph alignment for the aligned cell clusters to align the cells across measurements. We compared AIscEA with the competing methods on several benchmark datasets and demonstrated that AIscEA is highly robust to the choice of hyper-parameters and can better handle the cluster heterogeneity problem. Furthermore, AIscEA significantly outperforms the state-of-the-art methods when integrating real-world SNARE-seq and scMultiome-seq datasets in terms of integration accuracy.
AVAILABILITY AND IMPLEMENTATION: AIscEA is available at https://figshare.com/articles/software/AIscEA_zip/21291135 on FigShare as well as https://github.com/elhaam/AIscEA on GitHub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Elham Jafari
- Computer Science Department, Indiana University, Bloomington, IN 47408, USA
| | - Travis Johnson
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Yue Wang
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Yunlong Liu
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Kun Huang
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Yijie Wang
- Computer Science Department, Indiana University, Bloomington, IN 47408, USA
| |
|