1
Zeng T, Spence JP, Mostafavi H, Pritchard JK. Bayesian estimation of gene constraint from an evolutionary model with gene features. Nat Genet 2024:10.1038/s41588-024-01820-9. [PMID: 38977852 DOI: 10.1038/s41588-024-01820-9] [Received: 06/02/2023] [Accepted: 05/29/2024] [Indexed: 07/10/2024]
Abstract
Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraints for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. Here we developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease and other phenotypes, especially for short genes. Our estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve the estimation of many gene-level properties, such as rare variant burden or gene expression differences.
Affiliation(s)
- Tony Zeng
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Hakhamanesh Mostafavi
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Population Health, New York University, New York, NY, USA
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Biology, Stanford University, Stanford, CA, USA
2
Tian T, Zhang J, Lin X, Wei Z, Hakonarson H. Dependency-aware deep generative models for multitasking analysis of spatial omics data. Nat Methods 2024:10.1038/s41592-024-02257-y. [PMID: 38783067 DOI: 10.1038/s41592-024-02257-y] [Received: 01/02/2023] [Accepted: 03/25/2024] [Indexed: 05/25/2024]
Abstract
Spatially resolved transcriptomics (SRT) technologies have significantly advanced biomedical research, but their data analysis remains challenging due to the discrete nature of the data and the high levels of noise, compounded by complex spatial dependencies. Here, we propose spaVAE, a dependency-aware, deep generative spatial variational autoencoder model that probabilistically characterizes count data while capturing spatial correlations. spaVAE introduces a hybrid embedding combining a Gaussian process prior with a Gaussian prior to explicitly capture spatial correlations among spots. It then optimizes the parameters of deep neural networks to approximate the distributions underlying the SRT data. With the approximated distributions, spaVAE can contribute to several analytical tasks that are essential for SRT data analysis, including dimensionality reduction, visualization, clustering, batch integration, denoising, differential expression, spatial interpolation, resolution enhancement and identification of spatially variable genes. Moreover, we have extended spaVAE to spaPeakVAE and spaMultiVAE to characterize spatial ATAC-seq (assay for transposase-accessible chromatin using sequencing) data and spatial multi-omics data, respectively.
Affiliation(s)
- Tian Tian
  - School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, Hubei, China
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jie Zhang
  - National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China
- Xiang Lin
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Zhi Wei
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Hakon Hakonarson
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
3
Duhan L, Kumari D, Naime M, Parmar VS, Chhillar AK, Dangi M, Pasrija R. Single-cell transcriptomics: background, technologies, applications, and challenges. Mol Biol Rep 2024; 51:600. [PMID: 38689046 DOI: 10.1007/s11033-024-09553-y] [Received: 02/09/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]
Abstract
Single-cell sequencing was developed as a high-throughput tool to elucidate unusual and transient cell states that are barely visible in the bulk. This technology reveals the evolutionary status of cells and differences between populations, helps to identify unique cell subtypes and states, reveals regulatory relationships between genes, targets and molecular mechanisms in disease processes, tumor heterogeneity, the state of the immune environment, etc. However, the high cost and technical limitations of single-cell sequencing initially prevented its widespread application, but with advances in research, numerous new single-cell sequencing techniques have been discovered, lowering the cost barrier. Many single-cell sequencing platforms and bioinformatics methods have recently become commercially available, allowing researchers to make fascinating observations. They are now increasingly being used in various industries. Several protocols have been discovered in this context and each technique has unique characteristics, capabilities and challenges. This review presents the latest advancements in single-cell transcriptomics technologies. This includes single-cell transcriptomics approaches, workflows and statistical approaches to data processing, as well as the potential advances, applications, opportunities and challenges of single-cell transcriptomics technology. You will also get an overview of the entry points for spatial transcriptomics and multi-omics.
Affiliation(s)
- Lucky Duhan
  - Department of Biochemistry, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Deepika Kumari
  - Department of Biochemistry, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Mohammad Naime
  - Central Research Institute of Unani Medicine (Under Central Council for Research in Unani Medicine, Ministry of Ayush, Govt of India), Uttar Pradesh, Lucknow, India
- Virinder S Parmar
  - CUNY-Graduate Center and Departments of Chemistry, Nanoscience Program, City College & Medgar Evers College, The City University of New York, 1638 Bedford Avenue, Brooklyn, NY, 11225, USA
  - Institute of Click Chemistry Research and Studies, Amity University, Noida, Uttar Pradesh, 201303, India
- Anil K Chhillar
  - Centre for Biotechnology, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Mehak Dangi
  - Centre for Bioinformatics, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Ritu Pasrija
  - Department of Biochemistry, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
4
Lin KZ, Qiu Y, Roeder K. eSVD-DE: cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. BMC Bioinformatics 2024; 25:113. [PMID: 38486150 PMCID: PMC10941434 DOI: 10.1186/s12859-024-05724-7] [Received: 12/06/2023] [Accepted: 02/28/2024] [Indexed: 03/17/2024] Open
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes.
RESULTS: We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression.
CONCLUSIONS: eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Yixuan Qiu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
5
Lin KZ, Qiu Y, Roeder K. eSVD-DE: Cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. bioRxiv 2024:2023.11.22.568369. [PMID: 38045428 PMCID: PMC10690270 DOI: 10.1101/2023.11.22.568369] [Indexed: 12/05/2023]
Abstract
Background: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes.
Results: We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression.
Conclusions: eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
  - Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Yixuan Qiu
  - School of Statistics & Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
  - Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
6
Piran Z, Nitzan M. SiFT: uncovering hidden biological processes by probabilistic filtering of single-cell data. Nat Commun 2024; 15:760. [PMID: 38278815 PMCID: PMC10817921 DOI: 10.1038/s41467-024-44757-7] [Received: 10/23/2023] [Accepted: 01/03/2024] [Indexed: 01/28/2024] Open
Abstract
Cellular populations simultaneously encode multiple biological attributes, including spatial configuration, temporal trajectories, and cell-cell interactions. Some of these signals may be overshadowed by others and harder to recover, despite the great progress made to computationally reconstruct biological processes from single-cell data. To address this, we present SiFT, a kernel-based projection method for filtering biological signals in single-cell data, thus uncovering underlying biological processes. SiFT applies to a wide range of tasks, from the removal of unwanted variation in the data to revealing hidden biological structures. We demonstrate how SiFT enhances the liver circadian signal by filtering spatial zonation, recovers regenerative cell subpopulations in spatially-resolved liver data, and exposes COVID-19 disease-related cells, pathways, and dynamics by filtering healthy reference signals. SiFT performs the correction at the gene expression level, can scale to large datasets, and compares favorably to state-of-the-art methods.
Affiliation(s)
- Zoe Piran
  - School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
- Mor Nitzan
  - School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
  - Racah Institute of Physics, The Hebrew University, Jerusalem, Israel
  - Faculty of Medicine, The Hebrew University, Jerusalem, Israel
7
Zhang R, Yang M, Schreiber J, O'Day DR, Turner JMA, Shendure J, Disteche CM, Deng X, Noble WS. Cross-species imputation and comparison of single-cell transcriptomic profiles. bioRxiv 2023:2023.10.19.563173. [PMID: 37905060 PMCID: PMC10614954 DOI: 10.1101/2023.10.19.563173] [Indexed: 11/02/2023]
Abstract
Cross-species comparison and prediction of gene expression profiles are important to understand regulatory changes during evolution and to transfer knowledge learned from model organisms to humans. Single-cell RNA-seq (scRNA-seq) profiles enable us to capture gene expression profiles with respect to variations among individual cells; however, cross-species comparison of scRNA-seq profiles is challenging because of data sparsity, batch effects, and the lack of one-to-one cell matching across species. Moreover, single-cell profiles are challenging to obtain in certain biological contexts, limiting the scope of hypothesis generation. Here we developed Icebear, a neural network framework that decomposes single-cell measurements into factors representing cell identity, species, and batch factors. Icebear enables accurate prediction of single-cell gene expression profiles across species, thereby providing high-resolution cell type and disease profiles in under-characterized contexts. Icebear also facilitates direct cross-species comparison of single-cell expression profiles for conserved genes that are located on the X chromosome in eutherian mammals but on autosomes in chicken. This comparison, for the first time, revealed evolutionary and diverse adaptations of X-chromosome upregulation in mammals.
Affiliation(s)
- Ran Zhang
  - Department of Genome Sciences, University of Washington
  - eScience Institute, University of Washington
- Mu Yang
  - Department of Biomedical Informatics and Medical Education, University of Washington
- Diana R O'Day
  - Brotman Baty Institute for Precision Medicine, University of Washington
- Jay Shendure
  - Department of Genome Sciences, University of Washington
  - Brotman Baty Institute for Precision Medicine, University of Washington
  - Howard Hughes Medical Institute
  - Allen Center for Cell Lineage Tracing
- Christine M Disteche
  - Department of Laboratory Medicine and Pathology, University of Washington
  - Department of Medicine, University of Washington
- Xinxian Deng
  - Department of Laboratory Medicine and Pathology, University of Washington
- William Stafford Noble
  - Department of Genome Sciences, University of Washington
  - Paul G. Allen School of Computer Science and Engineering, University of Washington