1
Hsieh KL, Chu Y, Li X, Pilié PG, Dai Y. scEMB: Learning context representation of genes based on large-scale single-cell transcriptomics. bioRxiv: The Preprint Server for Biology 2024:2024.09.24.614685. [PMID: 39386549 PMCID: PMC11463607 DOI: 10.1101/2024.09.24.614685] [Indexed: 10/12/2024]
Abstract
Background: The rapid advancement of single-cell transcriptomic technologies has led to the curation of millions of cellular profiles, providing unprecedented insights into cellular heterogeneity across various tissues and developmental stages. This growing wealth of data presents an opportunity to uncover complex gene-gene relationships, yet also poses significant computational challenges.
Results: We present scEMB, a transformer-based deep learning model developed to capture context-aware gene embeddings from large-scale single-cell transcriptomics data. Trained on over 30 million single-cell transcriptomes, scEMB utilizes an innovative binning strategy that integrates data across multiple platforms, effectively preserving both gene expression hierarchies and cell-type specificity. In downstream tasks such as batch integration, clustering, and cell type annotation, scEMB demonstrates superior performance compared to existing models such as scGPT and Geneformer. Notably, scEMB excels at in silico correlation analysis, accurately predicting gene perturbation effects in CRISPR-edited datasets and in microglia state transition, and identifying a few known Alzheimer's disease (AD) risk genes in its top gene list. Additionally, scEMB offers robust fine-tuning capabilities for domain-specific applications, making it a versatile tool for tackling diverse biological problems such as therapeutic target discovery and disease modeling.
Conclusions: scEMB represents a powerful tool for extracting biologically meaningful insights from complex gene expression data. Its ability to model in silico perturbation effects and conduct correlation analyses in the embedding space highlights its potential to accelerate discoveries in precision medicine and therapeutic development.
Affiliation(s)
- Kang-Lin Hsieh
  - Department of Genitourinary Medical Oncology, Division of Cancer Medicine, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Yan Chu
  - Department of Radiation Physics, Division of Radiation Oncology, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Xiaoyang Li
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Patrick G. Pilié
  - Department of Genitourinary Medical Oncology, Division of Cancer Medicine, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Yulin Dai
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
2
Dai Y, Hsu YC, Fernandes BS, Zhang K, Li X, Enduru N, Liu A, Manuel AM, Jiang X, Zhao Z. Disentangling Accelerated Cognitive Decline from the Normal Aging Process and Unraveling Its Genetic Components: A Neuroimaging-Based Deep Learning Approach. J Alzheimers Dis 2024; 97:1807-1827. [PMID: 38306043 DOI: 10.3233/jad-231020] [Indexed: 02/03/2024]
Abstract
Background: The progressive cognitive decline, an integral component of Alzheimer's disease (AD), unfolds in tandem with the natural aging process. Neuroimaging features have demonstrated the capacity to distinguish cognitive decline changes stemming from typical brain aging and AD between different chronological points.
Objective: To disentangle the normal aging effect from the AD-related accelerated cognitive decline and unravel its genetic components using a neuroimaging-based deep learning approach.
Methods: We developed a deep-learning framework based on a dual-loss Siamese ResNet network to extract fine-grained information from the longitudinal structural magnetic resonance imaging (MRI) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. We then conducted genome-wide association studies (GWAS) and post-GWAS analyses to reveal the genetic basis of AD-related accelerated cognitive decline.
Results: We used our model to process data from 1,313 individuals, training it on 414 cognitively normal people and predicting cognitive assessment for all participants. In our analysis of accelerated cognitive decline GWAS, we identified two genome-wide significant loci: APOE locus (chromosome 19 p13.32) and rs144614292 (chromosome 11 p15.1). Variant rs144614292 (G > T) has not been reported in previous AD GWA studies. It is within the intronic region of NELL1, which is expressed in neurons and plays a role in controlling cell growth and differentiation. The cell-type-specific enrichment analysis and functional enrichment of GWAS signals highlighted the microglia and immune-response pathways.
Conclusions: Our deep learning model effectively extracted relevant neuroimaging features and predicted individual cognitive decline. We reported a novel variant (rs144614292) within the NELL1 gene.
Affiliation(s)
- Yulin Dai
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Yu-Chun Hsu
  - Center for Secure Artificial Intelligence for Healthcare, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brisa S Fernandes
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Kai Zhang
  - Center for Secure Artificial Intelligence for Healthcare, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Xiaoyang Li
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Nitesh Enduru
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Andi Liu
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Astrid M Manuel
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Xiaoqian Jiang
  - Center for Secure Artificial Intelligence for Healthcare, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Zhongming Zhao
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA