1
Kim JP, Cho M, Kim C, Lee H, Jang B, Jung SH, Kim Y, Koh IG, Kim S, Shin D, Lee EH, Lee JY, Park Y, Jang H, Kim BH, Ham H, Kim B, Kim Y, Cho AH, Raj T, Kim HJ, Na DL, Seo SW, An JY, Won HH. Whole-genome sequencing analyses suggest novel genetic factors associated with Alzheimer's disease and a cumulative effects model for risk liability. Nat Commun 2025; 16:4870. [PMID: 40419521 DOI: 10.1038/s41467-025-59949-y] [Received: 05/26/2024] [Accepted: 05/08/2025] [Indexed: 05/28/2025]
Abstract
Genome-wide association studies (GWAS) of Alzheimer's disease (AD) have predominantly focused on identifying common variants in Europeans. Here, we performed whole-genome sequencing (WGS) of 1,559 individuals from a Korean AD cohort to identify genetic variants and biomarkers associated with AD. Our GWAS analysis identified a previously unreported common-variant locus (APCDD1) associated with AD. We extended the WGS analysis to explore less-characterized genetic factors contributing to AD risk and identified rare noncoding variants, located in cis-regulatory elements specific to excitatory neurons, that were associated with cognitive impairment. Moreover, structural variation analysis showed that short tandem repeat expansion was associated with an increased risk of AD, and a copy number variant at the HPSE2 locus showed borderline statistical significance. APOE ε4 carriers with a high polygenic burden or structural variants exhibited severe cognitive impairment and increased amyloid beta levels, suggesting a cumulative effects model of AD risk.
Affiliation(s)
- Jun Pyo Kim
  - Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
- Minyoung Cho
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
- Chanhee Kim
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Hyunwoo Lee
  - Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
- Beomjin Jang
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sang-Hyuk Jung
  - Department of Medical Informatics, Kangwon National University College of Medicine, Chuncheon, Republic of Korea
- Yujin Kim
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- In Gyeong Koh
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Seoyeon Kim
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Daeun Shin
  - Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
- Eun Hye Lee
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
  - Indiana Alzheimer Disease Research Center, Indiana University School of Medicine, Indianapolis, IN, USA
- YoungChan Park
  - Division of Bio Bigdata, Department of Precision Medicine, Korea National Institute of Health, Cheongju, Republic of Korea
- Hyemin Jang
  - Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
  - Department of Neurology, Seoul National University Hospital, Seoul National University School of Medicine, Seoul, Republic of Korea
- Bo-Hyun Kim
  - Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Hongki Ham
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Beomsu Kim
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
- Yujin Kim
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
- A-Hyun Cho
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
- Towfique Raj
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Hee Jin Kim
  - Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
- Duk L Na
  - Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
- Sang Won Seo
  - Alzheimer's Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
  - Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
- Joon-Yong An
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, Republic of Korea
- Hong-Hee Won
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
  - Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea
2
Lee H, Kim W, Kwon N, Kim C, Kim S, An JY. Lessons from national biobank projects utilizing whole-genome sequencing for population-scale genomics. Genomics Inform 2025; 23:8. [PMID: 40050991 PMCID: PMC11887102 DOI: 10.1186/s44342-025-00040-9] [Received: 12/15/2024] [Accepted: 01/27/2025] [Indexed: 03/09/2025]
Abstract
Large-scale national biobank projects utilizing whole-genome sequencing have emerged as transformative resources for understanding human genetic variation and its relationship to health and disease. These initiatives, which include the UK Biobank, All of Us Research Program, Singapore's PRECISE, Biobank Japan, and the National Project of Bio-Big Data of Korea, are generating unprecedented volumes of high-resolution genomic data integrated with comprehensive phenotypic, environmental, and clinical information. This review examines the methodologies, contributions, and challenges of major WGS-based national genome projects worldwide. We first discuss the landscape of national biobank initiatives, highlighting their distinct approaches to data collection, participant recruitment, and phenotype characterization. We then introduce recent technological advances that enable efficient processing and analysis of large-scale WGS data, including improvements in variant calling algorithms, innovative methods for creating multi-sample VCFs, optimized data storage formats, and cloud-based computing solutions. The review synthesizes key discoveries from these projects, particularly in identifying expression quantitative trait loci and rare variants associated with complex diseases. Our review introduces the latest findings from the National Project of Bio-Big Data of Korea, which has advanced our understanding of population-specific genetic variation and rare diseases in Korean and East Asian populations. Finally, we discuss future directions and challenges in maximizing the impact of these resources on precision medicine and global health equity. This comprehensive examination demonstrates how large-scale national genome projects are revolutionizing genetic research and healthcare delivery while highlighting the importance of continued investment in diverse, population-specific genomic resources.
Affiliation(s)
- Hyeji Lee
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
- Wooheon Kim
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, 02841, Republic of Korea
- Nahyeon Kwon
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
- Chanhee Kim
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
- Sungmin Kim
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - Division of Genome Science, Department of Precision Medicine, National Institute of Health, Cheongju, 28159, Republic of Korea
- Joon-Yong An
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, 02841, Republic of Korea
3
Li Z, Wang M, Li S, Shi F. MIRACN: a residual convolutional neural network for predicting cell line specific functional regulatory variants. Brief Bioinform 2025; 26:bbaf196. [PMID: 40273430 PMCID: PMC12021264 DOI: 10.1093/bib/bbaf196] [Received: 11/17/2024] [Revised: 03/28/2025] [Accepted: 04/01/2025] [Indexed: 04/26/2025]
Abstract
In the post-genome-wide association study era, interpreting noncoding variants remains a significant challenge because of their complexity and the limited understanding of their functions. Here, we developed MIRACN, a novel residual convolutional neural network designed to predict cell line-specific functional regulatory variants. Using a large dataset from massively parallel reporter assays (MPRAs) and a multitask learning strategy, MIRACN was trained across seven distinct cell lines and achieved superior performance compared with existing methods, especially in predicting cell type specificity. Comparative evaluations on an independent MPRA test dataset demonstrated that MIRACN not only outperformed existing methods in identifying regulatory variants but also provided valuable insights into their cellular context-specific regulatory mechanisms. MIRACN not only scores functional variants but also pinpoints the specific cell line in which these variants exert their function. This capability improves the resolution of current research on the functionality of noncoding variants and paves the way for more precise diagnostic and therapeutic strategies.
Affiliation(s)
- Zeyin Li
  - School of Information Engineering, Ningxia University, No. 489, Helanshan West Road, Xixia District, Yinchuan, Ningxia 750021, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, No. 489, Helanshan West Road, Xixia District, Yinchuan, Ningxia 750021, China
- Min Wang
  - School of Information Engineering, Ningxia University, No. 489, Helanshan West Road, Xixia District, Yinchuan, Ningxia 750021, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, No. 489, Helanshan West Road, Xixia District, Yinchuan, Ningxia 750021, China
- Songge Li
  - School of Information Engineering, Ningxia University, No. 489, Helanshan West Road, Xixia District, Yinchuan, Ningxia 750021, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, No. 489, Helanshan West Road, Xixia District, Yinchuan, Ningxia 750021, China
- Fangyuan Shi
  - School of Information Engineering, Ningxia University, No. 489, Helanshan West Road, Xixia District, Yinchuan, Ningxia 750021, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, No. 489, Helanshan West Road, Xixia District, Yinchuan, Ningxia 750021, China
  - Ningxia Key Laboratory of Artificial Intelligence and Information Security for Channeling Computing Resources from the East to the West, Ningxia University, No. 489, Helanshan West Road, Xixia District, Yinchuan, Ningxia 750021, China
4
Marderstein AR, Kundu S, Padhi EM, Deshpande S, Wang A, Robb E, Sun Y, Yun CM, Pomales-Matos D, Xie Y, Nachun D, Jessa S, Kundaje A, Montgomery SB. Mapping the regulatory effects of common and rare non-coding variants across cellular and developmental contexts in the brain and heart. bioRxiv 2025:2025.02.18.638922. [PMID: 40027628 PMCID: PMC11870466 DOI: 10.1101/2025.02.18.638922] [Indexed: 03/05/2025]
Abstract
Whole-genome sequencing has identified over a billion non-coding variants in humans, while GWAS has revealed the non-coding genome as a significant contributor to disease. However, prioritizing causal common and rare non-coding variants in human disease, and understanding how selective pressures have shaped the non-coding genome, remain significant challenges. Here, we predicted the effects of 15 million variants with deep learning models trained on single-cell ATAC-seq across 132 cellular contexts in adult and fetal brain and heart, producing nearly two billion context-specific predictions. Using these predictions, we distinguish candidate causal variants underlying human traits and diseases and their context-specific effects. While common variant effects are more cell-type-specific, rare variants exert more cell-type-shared regulatory effects, with selective pressures particularly targeting variants affecting fetal brain neurons. To prioritize de novo mutations with extreme regulatory effects, we developed FLARE, a context-specific functional genomic model of constraint. FLARE outperformed other methods in prioritizing case mutations from autism-affected families near syndromic autism-associated genes; for example, it identified mutation outliers near CNTNAP2 that would be missed by alternative approaches. Overall, our findings demonstrate the potential of integrating single-cell maps with population genetics and deep learning-based variant effect prediction to elucidate mechanisms of development and disease, ultimately supporting the notion that genetic contributions to neurodevelopmental disorders are predominantly rare.
Affiliation(s)
- Andrew R. Marderstein
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Soumya Kundu
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Evin M. Padhi
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Salil Deshpande
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA
- Austin Wang
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Esther Robb
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Ying Sun
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Chang M. Yun
  - Department of Chemical Engineering, Stanford University, Stanford, CA, USA
- Yilin Xie
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Daniel Nachun
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Selin Jessa
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Anshul Kundaje
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Stephen B. Montgomery
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
5
Kim S, Lee J, Koh IG, Ji J, Kim HJ, Kim E, Park J, Park JE, An JY. An integrative single-cell atlas for exploring the cellular and temporal specificity of genes related to neurological disorders during human brain development. Exp Mol Med 2024; 56:2271-2282. [PMID: 39363111 PMCID: PMC11541755 DOI: 10.1038/s12276-024-01328-6] [Received: 04/09/2024] [Revised: 07/17/2024] [Accepted: 07/18/2024] [Indexed: 10/05/2024]
Abstract
Single-cell technologies have expanded our knowledge of the human brain by enabling an extensive transcriptomic census across diverse brain regions. Nevertheless, the cellular and temporal specificity of neurological disorders remains unclear because of developmental variation. To address this gap, we characterized the dynamics of disorder risk gene expression during development by integrating multiple single-cell RNA sequencing datasets. We constructed a comprehensive single-cell atlas of the developing human brain, encompassing 393,060 single cells across diverse developmental stages. Temporal analysis revealed distinct expression patterns of disorder risk genes, including those associated with autism, highlighting their temporal regulation in different neuronal and glial lineages. We identified distinct neuronal lineages that diverged across developmental stages, each exhibiting temporal-specific expression patterns of disorder-related genes. Lineages of nonneuronal cells, defined by molecular profiles, also showed temporal-specific expression, indicating a link between cellular maturation and disorder risk. Furthermore, we explored the regulatory mechanisms involved in early brain development and found enrichment patterns in fetal cell types associated with neuronal disorders, indicating the influence of the prenatal stage on disease determination. Our findings facilitate unbiased comparisons of cell type–disorder associations and provide insight into dynamic alterations in risk genes during development, paving the way for a deeper understanding of neurological disorders.
Affiliation(s)
- Seoyeon Kim
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Jihae Lee
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, Republic of Korea
- In Gyeong Koh
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Jungeun Ji
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
- Hyun Jung Kim
  - Department of Biomedical Sciences, College of Medicine, Korea University, Seoul, Republic of Korea
  - Department of Anatomy, College of Medicine, Korea University, Seoul, Republic of Korea
- Eunha Kim
  - Department of Neuroscience, College of Medicine, Korea University, Seoul, Republic of Korea
  - BK21 Graduate Program, Department of Biomedical Sciences, College of Medicine, Korea University, Seoul, Republic of Korea
- Jihwan Park
  - School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
- Jong-Eun Park
  - Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Joon-Yong An
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, Republic of Korea
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, Republic of Korea