Hao Y, Stuart T, Kowalski MH, Choudhary S, Hoffman P, Hartman A, Srivastava A, Molla G, Madad S, Fernandez-Granda C, Satija R. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol 2024; 42:293-304. [PMID: 37231261 PMCID: PMC10928517 DOI: 10.1038/s41587-023-01767-y] [Citation(s) in RCA: 168] [Impact Index Per Article: 168.0] [Received: 03/08/2022] [Accepted: 03/28/2023] [Indexed: 05/27/2023]
Abstract
Mapping single-cell sequencing profiles to comprehensive reference datasets provides a powerful alternative to unsupervised analysis. However, most reference datasets are constructed from single-cell RNA-sequencing data and cannot be used to annotate datasets that do not measure gene expression. Here we introduce 'bridge integration', a method to integrate single-cell datasets across modalities using a multiomic dataset as a molecular bridge. Each cell in the multiomic dataset constitutes an element in a 'dictionary', which is used to reconstruct unimodal datasets and transform them into a shared space. Our procedure accurately integrates transcriptomic data with independent single-cell measurements of chromatin accessibility, histone modifications, DNA methylation and protein levels. Moreover, we demonstrate how dictionary learning can be combined with sketching techniques to improve computational scalability and harmonize 8.6 million human immune cell profiles from sequencing and mass cytometry experiments. Our approach, implemented in version 5 of our Seurat toolkit (http://www.satijalab.org/seurat), broadens the utility of single-cell reference datasets and facilitates comparisons across diverse molecular modalities.
Affiliation(s)
- Yuhan Hao
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - New York Genome Center, New York, NY, USA
- Tim Stuart
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - New York Genome Center, New York, NY, USA
- Madeline H Kowalski
  - New York Genome Center, New York, NY, USA
  - Institute for System Genetics, NYU Langone Medical Center, New York, NY, USA
- Saket Choudhary
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - New York Genome Center, New York, NY, USA
- Paul Hoffman
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Austin Hartman
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Avi Srivastava
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - New York Genome Center, New York, NY, USA
- Shaista Madad
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - New York Genome Center, New York, NY, USA
- Carlos Fernandez-Granda
  - Center for Data Science, New York University, New York, NY, USA
  - Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Rahul Satija
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - New York Genome Center, New York, NY, USA