1. Riera-Escamilla A, Nagirnaja L. Utility of exome sequencing in primary spermatogenic disorders: From research to diagnostics. Andrology 2024. [PMID: 39300832 DOI: 10.1111/andr.13753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 07/31/2024] [Accepted: 08/23/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND Primary spermatogenic disorders represent a severe form of male infertility whereby sperm production is impaired due to testicular dysfunction, leading to reduced quality or quantity of spermatozoa. Gene-centered research has certainly demonstrated the importance of the genetic factor in the etiology of both poor sperm morphology or motility and reduced sperm count. In the last decade, next-generation sequencing has expanded the research to whole exome which has transformed our understanding of male infertility genetics, but uncertainty persists in its diagnostic yield, especially in large unrelated populations. OBJECTIVE To evaluate the utility of exome sequencing in detecting genetic factors contributing to various traits of primary spermatogenic disorders, which is a crucial step before interpreting the diagnostic yield of the platform. MATERIALS AND METHODS We manually curated 415 manuscripts and included 19 research studies that predominantly performed whole exome sequencing in cohorts of unrelated cases with primary spermatogenic defects. RESULTS The detection rate, defined as the fraction of cases with an identifiable genetic cause, typically remained below 25% for quantitative defects of spermatozoa, whereas improved rates were observed for traits of abnormal sperm morphology/motility and in populations enriched with consanguineous families. Unlike the quantitative defects, the genetic architecture of the qualitative issues of spermatozoa featured a small number of recurrent genes describing a large fraction of studied cases. These observations were also in line with the lower biological complexity of the pathways affected by the reported genes. DISCUSSION AND CONCLUSIONS This review demonstrates the variability in detection rates of exome sequencing across semen phenotypes, which may have an impact on the expectations of the diagnostic yield in the clinical setting.
Affiliation(s)
- Antoni Riera-Escamilla
- Division of Genetics, Oregon National Primate Research Center, Oregon Health & Science University, Beaverton, Oregon, USA
- Liina Nagirnaja
- Division of Genetics, Oregon National Primate Research Center, Oregon Health & Science University, Beaverton, Oregon, USA
2. Ofer D, Linial M. Automated annotation of disease subtypes. J Biomed Inform 2024; 154:104650. [PMID: 38701887 DOI: 10.1016/j.jbi.2024.104650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 03/28/2024] [Accepted: 04/29/2024] [Indexed: 05/05/2024]
Abstract
BACKGROUND Distinguishing diseases into distinct subtypes is crucial for study and effective treatment strategies. The Open Targets Platform (OT) integrates biomedical, genetic, and biochemical datasets to empower disease ontologies, classifications, and potential gene targets. Nevertheless, many disease annotations are incomplete, requiring laborious expert medical input. This challenge is especially pronounced for rare and orphan diseases, where resources are scarce. METHODS We present a machine learning approach to identifying diseases with potential subtypes, using the approximately 23,000 diseases documented in OT. We derive novel features for predicting diseases with subtypes using direct evidence. Machine learning models were applied to analyze feature importance and evaluate predictive performance for discovering both known and novel disease subtypes. RESULTS Our model achieves a high (89.4%) ROC AUC (Area Under the Receiver Operating Characteristic Curve) in identifying known disease subtypes. We integrated pre-trained deep-learning language models and showed their benefits. Moreover, we identify 515 disease candidates predicted to possess previously unannotated subtypes. CONCLUSIONS Our models can partition diseases into distinct subtypes. This methodology enables a robust, scalable approach for improving knowledge-based annotations and a comprehensive assessment of disease ontology tiers. Our candidates are attractive targets for further study and personalized medicine, potentially aiding in the unveiling of new therapeutic indications for sought-after targets.
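The ROC AUC reported in this abstract can be read as the probability that a randomly chosen positive example (here, a disease with known subtypes) receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that rank-based definition, not the authors' evaluation code; the function name and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = disease has subtypes, 0 = it does not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for illustration; a sorting-based implementation would be preferable at the scale of roughly 23,000 diseases.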
Affiliation(s)
- Dan Ofer
- Department of Biological Chemistry, The Life Science Institute, The Hebrew University of Jerusalem, Israel
- Michal Linial
- Department of Biological Chemistry, The Life Science Institute, The Hebrew University of Jerusalem, Israel
3. Elman JA, Schork NJ, Rangan AV. Exploring the genetic heterogeneity of Alzheimer's disease: Evidence for genetic subtypes. medRxiv 2024:2023.05.02.23289347. [PMID: 37205553 PMCID: PMC10187457 DOI: 10.1101/2023.05.02.23289347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Background Alzheimer's disease (AD) exhibits considerable phenotypic heterogeneity, suggesting the potential existence of subtypes. AD is under substantial genetic influence, thus identifying systematic variation in genetic risk may provide insights into disease origins. Objective We investigated genetic heterogeneity in AD risk through a multi-step analysis. Methods We performed principal component analysis (PCA) on AD-associated variants in the UK Biobank (AD cases=2,739, controls=5,478) to assess structured genetic heterogeneity. Subsequently, a biclustering algorithm searched for distinct disease-specific genetic signatures among subsets of cases. Replication tests were conducted using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (AD cases=500, controls=470). We categorized a separate set of ADNI individuals with mild cognitive impairment (MCI; n=399) into genetic subtypes and examined cognitive, amyloid, and tau trajectories. Results PCA revealed three distinct clusters ("constellations") driven primarily by different correlation patterns in a region of strong LD surrounding the MAPT locus. Constellations contained a mixture of cases and controls, reflecting disease-relevant but not disease-specific structure. We found two disease-specific biclusters among AD cases. Pathway analysis linked bicluster-associated variants to neuron morphogenesis and outgrowth. Disease-relevant and disease-specific structure replicated in ADNI, and bicluster 2 exhibited increased CSF p-tau and cognitive decline over time. Conclusions This study unveils a hierarchical structure of AD genetic risk. Disease-relevant constellations may represent haplotype structure that does not increase risk directly but may alter the relative importance of other genetic risk factors. Biclusters may represent distinct AD genetic subtypes. This structure is replicable and relates to differential pathological accumulation and cognitive decline over time.
Affiliation(s)
- Jeremy A. Elman
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Center for Behavior Genetics of Aging, University of California San Diego, La Jolla, CA, USA
- Nicholas J. Schork
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- The Translational Genomics Research Institute, Quantitative Medicine and Systems Biology, Phoenix, AZ, USA
- Aaditya V. Rangan
- Department of Mathematics, New York University, New York, New York, USA
4. LaBianca S, Brikell I, Helenius D, Loughnan R, Mefford J, Palmer CE, Walker R, Gådin JR, Krebs M, Appadurai V, Vaez M, Agerbo E, Pedersen MG, Børglum AD, Hougaard DM, Mors O, Nordentoft M, Mortensen PB, Kendler KS, Jernigan TL, Geschwind DH, Ingason A, Dahl AW, Zaitlen N, Dalsgaard S, Werge TM, Schork AJ. Polygenic profiles define aspects of clinical heterogeneity in attention deficit hyperactivity disorder. Nat Genet 2024; 56:234-244. [PMID: 38036780 PMCID: PMC11439085 DOI: 10.1038/s41588-023-01593-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/10/2021] [Accepted: 10/25/2023] [Indexed: 12/02/2023]
Abstract
Attention deficit hyperactivity disorder (ADHD) is a complex disorder that manifests variability in long-term outcomes and clinical presentations. The genetic contributions to such heterogeneity are not well understood. Here we show several genetic links to clinical heterogeneity in ADHD in a case-only study of 14,084 diagnosed individuals. First, we identify one genome-wide significant locus by comparing cases with ADHD and autism spectrum disorder (ASD) to cases with ADHD but not ASD. Second, we show that cases with ASD and ADHD, substance use disorder and ADHD, or first diagnosed with ADHD in adulthood have unique polygenic score (PGS) profiles that distinguish them from complementary case subgroups and controls. Finally, a PGS for an ASD diagnosis in ADHD cases predicted cognitive performance in an independent developmental cohort. Our approach uncovered evidence of genetic heterogeneity in ADHD, helping us to understand its etiology and providing a model for studies of other disorders.
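A polygenic score (PGS) is conventionally a weighted sum of an individual's allele dosages, with per-variant weights taken from an association study; a PGS profile is then the set of such scores across several traits. The sketch below illustrates that general construction only, not the pipeline used in the study. The variant IDs, trait names, weights, and the choice to treat missing variants as dosage 0 are all assumptions made for illustration.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies of the effect allele).

    Variants absent from `dosages` are treated as dosage 0 (an assumption).
    """
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


# Hypothetical per-variant effect sizes for two traits.
weights_by_trait = {
    "trait_1": {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20},
    "trait_2": {"rs_a": -0.30, "rs_c": 0.40},
}

# Hypothetical genotype of one individual, as effect-allele counts.
individual = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

# The PGS profile: one score per trait for this individual.
profile = {
    trait: polygenic_score(individual, weights)
    for trait, weights in weights_by_trait.items()
}
print(profile)
```

Comparing such profiles between case subgroups, rather than a single score between cases and controls, is the kind of analysis the abstract describes.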
Affiliation(s)
- Sonja LaBianca
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Isabell Brikell
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- National Centre for Register-based Research, Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
- Dorte Helenius
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Robert Loughnan
- Department of Cognitive Science, University of California, San Diego, La Jolla, CA, USA
- Center for Population Neuroscience and Genetics, Laureate Institute for Brain Research, Tulsa, OK, USA
- Joel Mefford
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- Clare E Palmer
- Center for Human Development, University of California, San Diego, La Jolla, CA, USA
- Rebecca Walker
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jesper R Gådin
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Morten Krebs
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Vivek Appadurai
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Morteza Vaez
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Esben Agerbo
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- National Centre for Register-based Research, Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
- Centre for Integrated Register-based Research, Aarhus University, Aarhus, Denmark
- Marianne Giørtz Pedersen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- National Centre for Register-based Research, Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
- Centre for Integrated Register-based Research, Aarhus University, Aarhus, Denmark
- Anders D Børglum
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Department of Biomedicine - Human Genetics, Aarhus University, Aarhus, Denmark
- Centre for Integrative Sequencing, Aarhus University, Aarhus, Denmark
- David M Hougaard
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Ole Mors
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Psychosis Research Unit, Aarhus University Hospital - Psychiatry, Aarhus, Denmark
- Merete Nordentoft
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Copenhagen Mental Health Center, Mental Health Services Capital Region of Denmark Copenhagen, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Preben Bo Mortensen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- National Centre for Register-based Research, Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
- Centre for Integrated Register-based Research, Aarhus University, Aarhus, Denmark
- Kenneth S Kendler
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Terry L Jernigan
- Department of Cognitive Science, University of California, San Diego, La Jolla, CA, USA
- Center for Human Development, University of California, San Diego, La Jolla, CA, USA
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA
- Department of Radiology, University of California, San Diego, La Jolla, CA, USA
- Daniel H Geschwind
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Andrés Ingason
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Andrew W Dahl
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
- Noah Zaitlen
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- Søren Dalsgaard
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- National Centre for Register-based Research, Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
- Centre for Integrated Register-based Research, Aarhus University, Aarhus, Denmark
- Thomas M Werge
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Andrew J Schork
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark
- Neurogenomics Division, The Translational Genomics Research Institute, Phoenix, AZ, USA
5. Elman JA, Schork NJ, Rangan AV. Exploring the Genetic Heterogeneity of Alzheimer's Disease: Evidence for Genetic Subtypes. J Alzheimers Dis 2024; 100:1209-1226. [PMID: 38995775 DOI: 10.3233/jad-231252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/14/2024]
Abstract
Background Alzheimer's disease (AD) exhibits considerable phenotypic heterogeneity, suggesting the potential existence of subtypes. AD is under substantial genetic influence, thus identifying systematic variation in genetic risk may provide insights into disease origins. Objective We investigated genetic heterogeneity in AD risk through a multi-step analysis. Methods We performed principal component analysis (PCA) on AD-associated variants in the UK Biobank (AD cases = 2,739, controls = 5,478) to assess structured genetic heterogeneity. Subsequently, a biclustering algorithm searched for distinct disease-specific genetic signatures among subsets of cases. Replication tests were conducted using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (AD cases = 500, controls = 470). We categorized a separate set of ADNI individuals with mild cognitive impairment (MCI; n = 399) into genetic subtypes and examined cognitive, amyloid, and tau trajectories. Results PCA revealed three distinct clusters ("constellations") driven primarily by different correlation patterns in a region of strong LD surrounding the MAPT locus. Constellations contained a mixture of cases and controls, reflecting disease-relevant but not disease-specific structure. We found two disease-specific biclusters among AD cases. Pathway analysis linked bicluster-associated variants to neuron morphogenesis and outgrowth. Disease-relevant and disease-specific structure replicated in ADNI, and bicluster 2 exhibited increased cerebrospinal fluid p-tau and cognitive decline over time. Conclusions This study unveils a hierarchical structure of AD genetic risk. Disease-relevant constellations may represent haplotype structure that does not increase risk directly but may alter the relative importance of other genetic risk factors. Biclusters may represent distinct AD genetic subtypes. This structure is replicable and relates to differential pathological accumulation and cognitive decline over time.
Affiliation(s)
- Jeremy A Elman
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Center for Behavior Genetics of Aging, University of California San Diego, La Jolla, CA, USA
- Nicholas J Schork
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- The Translational Genomics Research Institute, Quantitative Medicine and Systems Biology, Phoenix, AZ, USA
- Aaditya V Rangan
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
6. Tang D, Freudenberg J, Dahl A. Factorizing polygenic epistasis improves prediction and uncovers biological pathways in complex traits. Am J Hum Genet 2023; 110:1875-1887. [PMID: 37922884 PMCID: PMC10645564 DOI: 10.1016/j.ajhg.2023.10.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/05/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 11/07/2023] Open
Abstract
Epistasis is central in many domains of biology, but it has not yet been proven useful for understanding the etiology of complex traits. This is partly because complex-trait epistasis involves polygenic interactions that are poorly captured in current models. To address this gap, we developed a model called Epistasis Factor Analysis (EFA). EFA assumes that polygenic epistasis can be factorized into interactions between a few epistasis factors (EFs), which represent latent polygenic components of the observed complex trait. The statistical goals of EFA are to improve polygenic prediction and to increase power to detect epistasis, while the biological goal is to unravel genetic effects into more-homogeneous units. We mathematically characterize EFA and use simulations to show that EFA outperforms current epistasis models when its assumptions approximately hold. Applied to predicting yeast growth rates, EFA outperforms the additive model for several traits with large epistasis heritability and uniformly outperforms the standard epistasis model. We replicate these prediction improvements in a second dataset. We then apply EFA to four previously characterized traits in the UK Biobank and find statistically significant epistasis in all four, including two that are robust to scale transformation. Moreover, we find that the inferred EFs partly recover pre-defined biological pathways for two of the traits. Our results demonstrate that more realistic models can identify biologically and statistically meaningful epistasis in complex traits, indicating that epistasis has potential for precision medicine and characterizing the biology underlying GWAS results.
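One way to picture the factorization this abstract describes is as a trait model in which a few latent polygenic components contribute additively and also interact pairwise. The form below is an illustrative sketch under that reading, not the authors' exact specification; every symbol is an assumption introduced here: $y$ is the trait vector, $G$ the genotype matrix, $\beta_k$ the variant weights defining the $k$-th epistasis factor, $\lambda_{kl}$ a scalar interaction strength, $\odot$ the elementwise product, and $\varepsilon$ residual noise.

```latex
y \;=\; \sum_{k=1}^{K} G\beta_k
\;+\; \sum_{1 \le k < l \le K} \lambda_{kl}\,\bigl(G\beta_k\bigr) \odot \bigl(G\beta_l\bigr)
\;+\; \varepsilon
```

Under this reading, epistasis is carried by $K(K-1)/2$ interaction scalars between factors instead of a separate coefficient for every pair of variants, which is what makes the polygenic interaction tractable to estimate.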
Affiliation(s)
- David Tang
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
- Jerome Freudenberg
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Andy Dahl
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
7. Chung J, Sahelijo N, Maruyama T, Hu J, Panitch R, Xia W, Mez J, Stein TD, Saykin AJ, Takeyama H, Farrer LA, Crane PK, Nho K, Jun GR. Alzheimer's disease heterogeneity explained by polygenic risk scores derived from brain transcriptomic profiles. Alzheimers Dement 2023; 19:5173-5184. [PMID: 37166019 PMCID: PMC10638468 DOI: 10.1002/alz.13069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 03/03/2023] [Accepted: 03/08/2023] [Indexed: 05/12/2023]
Abstract
INTRODUCTION Alzheimer's disease (AD) is heterogeneous, both clinically and neuropathologically. We investigated whether polygenic risk scores (PRSs) integrated with transcriptome profiles from AD brains can explain AD clinical heterogeneity. METHODS We conducted co-expression network analysis and identified gene sets (modules) that were preserved in three AD transcriptome datasets and associated with AD-related neuropathological traits including neuritic plaques (NPs) and neurofibrillary tangles (NFTs). We computed the module-based PRSs (mbPRSs) for each module and tested associations with mbPRSs for cognitive test scores, cognitively defined AD subgroups, and brain imaging data. RESULTS Of the modules significantly associated with NPs and/or NFTs, the mbPRSs from two modules (M6 and M9) showed distinct associations with language and visuospatial functioning, respectively. They matched clinical subtypes and brain atrophy at specific regions. DISCUSSION Our findings demonstrate that polygenic profiling based on co-expressed gene sets can explain heterogeneity in AD patients, enabling genetically informed patient stratification and precision medicine in AD. HIGHLIGHTS Co-expression gene-network analysis in Alzheimer's disease (AD) brains identified gene sets (modules) associated with AD heterogeneity. AD-associated modules were selected when genes in each module were enriched for neuritic plaques and neurofibrillary tangles. Polygenic risk scores from two selected modules were linked to the matching cognitively defined AD subgroups (language and visuospatial subgroups). Polygenic risk scores from the two modules were associated with cognitive performance in language and visuospatial domains and the associations were confirmed in regional-specific brain atrophy data.
Affiliation(s)
- Jaeyoon Chung
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Nathan Sahelijo
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Toru Maruyama
- Department of Life Science and Medical Bioscience, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Junming Hu
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Rebecca Panitch
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Weiming Xia
- Department of Pharmacology & Experimental Therapeutics, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Department of Veterans Affairs Medical Center, Bedford, MA 01730, USA
- Jesse Mez
- Department of Neurology, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Thor D. Stein
- Department of Veterans Affairs Medical Center, Bedford, MA 01730, USA
- Department of Pathology & Laboratory Medicine, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Boston VA Healthcare Center, Boston, MA 02130, USA
- Andrew J. Saykin
- Department of Radiology and Imaging Sciences and Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Haruko Takeyama
- Department of Life Science and Medical Bioscience, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Japan, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Research Organization for Nano and Life Innovations, Waseda University, 513, Wasedatsurumaki-cho, Shinjuku-ku, Tokyo 162-0041, Japan
- Institute for Advanced Research of Biosystem Dynamics, Waseda Research Institute for Science and Engineering, Graduate School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Lindsay A. Farrer
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Department of Neurology, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Department of Ophthalmology, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Department of Biostatistics, Boston University School of Public Health, 715 Albany Street, Boston, MA 02118, USA
- Department of Epidemiology, Boston University School of Public Health, 715 Albany Street, Boston, MA 02118, USA
- Paul K. Crane
- Department of Medicine, University of Washington, Seattle, WA, USA
- Kwangsik Nho
- Department of Radiology and Imaging Sciences and Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Gyungah R. Jun
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Department of Ophthalmology, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Department of Biostatistics, Boston University School of Public Health, 715 Albany Street, Boston, MA 02118, USA
8. Espuela-Ortiz A, Martin-Gonzalez E, Poza-Guedes P, González-Pérez R, Herrera-Luis E. Genomics of Treatable Traits in Asthma. Genes (Basel) 2023; 14:1824. [PMID: 37761964 PMCID: PMC10531302 DOI: 10.3390/genes14091824] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/28/2023] [Revised: 09/12/2023] [Accepted: 09/16/2023] [Indexed: 09/29/2023] Open
Abstract
The astounding number of genetic variants revealed in the 15 years of genome-wide association studies of asthma has not kept pace with the goals of translational genomics. Moving asthma diagnosis from a nonspecific umbrella term to specific phenotypes/endotypes and related traits may provide insights into features that may be prevented or alleviated by therapeutical intervention. This review provides an overview of the different asthma endotypes and phenotypes and the genomic findings from asthma studies using patient stratification strategies and asthma-related traits. Asthma genomic research for treatable traits has uncovered novel and previously reported asthma loci, primarily through studies in Europeans. Novel genomic findings for asthma phenotypes and related traits may arise from multi-trait and specific phenotyping strategies in diverse populations.
Affiliation(s)
- Antonio Espuela-Ortiz
- Genomics and Health Group, Department of Biochemistry, Microbiology, Cell Biology and Genetics, Universidad de La Laguna (ULL), 38200 San Cristóbal de La Laguna, Tenerife, Spain
- Elena Martin-Gonzalez
- Genomics and Health Group, Department of Biochemistry, Microbiology, Cell Biology and Genetics, Universidad de La Laguna (ULL), 38200 San Cristóbal de La Laguna, Tenerife, Spain
- Paloma Poza-Guedes
- Allergy Department, Hospital Universitario de Canarias, 38320 Santa Cruz de Tenerife, Tenerife, Spain
- Severe Asthma Unit, Hospital Universitario de Canarias, 38320 San Cristóbal de La Laguna, Tenerife, Spain
- Ruperto González-Pérez
- Allergy Department, Hospital Universitario de Canarias, 38320 Santa Cruz de Tenerife, Tenerife, Spain
- Severe Asthma Unit, Hospital Universitario de Canarias, 38320 San Cristóbal de La Laguna, Tenerife, Spain
- Esther Herrera-Luis
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
9. Paranjpe I, Wang X, Anandakrishnan N, Haydak JC, Van Vleck T, DeFronzo S, Li Z, Mendoza A, Liu R, Fu J, Forrest I, Zhou W, Lee K, O'Hagan R, Dellepiane S, Menon KM, Gulamali F, Kamat S, Gusella GL, Charney AW, Hofer I, Cho JH, Do R, Glicksberg BS, He JC, Nadkarni GN, Azeloglu EU. Deep learning on electronic medical records identifies distinct subphenotypes of diabetic kidney disease driven by genetic variations in the Rho pathway. medRxiv 2023:2023.09.06.23295120. [PMID: 37732187 PMCID: PMC10508814 DOI: 10.1101/2023.09.06.23295120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
Kidney disease affects 50% of all diabetic patients; however, prediction of disease progression has been challenging due to inherent disease heterogeneity. We use deep learning to identify novel genetic signatures prognostically associated with outcomes. Using autoencoders and unsupervised clustering of electronic health record data on 1,372 diabetic kidney disease patients, we establish two clusters with differential prevalence of end-stage kidney disease. Exome-wide associations identify a novel variant in ARHGEF18, a Rho guanine nucleotide exchange factor specifically expressed in glomeruli. Overexpression of ARHGEF18 in human podocytes leads to impairments in focal adhesion architecture, cytoskeletal dynamics, cellular motility, and RhoA/Rac1 activation. Mutant GEF18 is resistant to ubiquitin-mediated degradation, leading to pathologically increased protein levels. Our findings uncover the first known disease-causing genetic variant that affects protein stability of a cytoskeletal regulator through impaired degradation, a potentially novel class of expression quantitative trait loci that can be therapeutically targeted.
10
Maiorino E, Loscalzo J. Phenomics and Robust Multiomics Data for Cardiovascular Disease Subtyping. Arterioscler Thromb Vasc Biol 2023; 43:1111-1123. [PMID: 37226730 PMCID: PMC10330619 DOI: 10.1161/atvbaha.122.318892] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/12/2022] [Accepted: 05/10/2023] [Indexed: 05/26/2023]
Abstract
The complex landscape of cardiovascular diseases encompasses a wide range of related pathologies arising from diverse molecular mechanisms and exhibiting heterogeneous phenotypes. This variety of manifestations poses significant challenges in the development of treatment strategies. The increasing availability of precise phenotypic and multiomics data of cardiovascular disease patient populations has spurred the development of a variety of computational disease subtyping techniques to identify distinct subgroups with unique underlying pathogeneses. In this review, we outline the essential components of computational approaches to select, integrate, and cluster omics and clinical data in the context of cardiovascular disease research. We delve into the challenges faced during different stages of the analysis, including feature selection and extraction, data integration, and clustering algorithms. Next, we highlight representative applications of subtyping pipelines in heart failure and coronary artery disease. Finally, we discuss the current challenges and future directions in the development of robust subtyping approaches that can be implemented in clinical workflows, ultimately contributing to the ongoing evolution of precision medicine in health care.
Affiliation(s)
- Enrico Maiorino
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
- Joseph Loscalzo
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
  - Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
11
Gorla A, Sankararaman S, Burchard E, Flint J, Zaitlen N, Rahmani E. Phenotypic subtyping via contrastive learning. bioRxiv 2023:2023.01.05.522921. [PMID: 36711575 PMCID: PMC9881932 DOI: 10.1101/2023.01.05.522921] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/09/2023]
Abstract
Defining and accounting for subphenotypic structure has the potential to increase statistical power and provide a deeper understanding of the heterogeneity in the molecular basis of complex disease. Existing phenotype subtyping methods primarily rely on clinically observed heterogeneity or metadata clustering. However, they generally tend to capture the dominant sources of variation in the data, which often originate from variation that is not descriptive of the mechanistic heterogeneity of the phenotype of interest; in fact, such dominant sources of variation, such as population structure or technical variation, are, in general, expected to be independent of subphenotypic structure. We instead aim to find a subspace with signal that is unique to a group of samples for which we believe that subphenotypic variation exists (e.g., cases of a disease). To that end, we introduce Phenotype Aware Components Analysis (PACA), a contrastive learning approach leveraging canonical correlation analysis to robustly capture weak sources of subphenotypic variation. In the context of disease, PACA learns a gradient of variation unique to cases in a given dataset, while leveraging control samples to account for variation and imbalances of biological and technical confounders between cases and controls. We evaluated PACA using an extensive simulation study, as well as on various subtyping tasks using genotypes, transcriptomics, and DNA methylation data. Our results provide multiple lines of strong evidence that PACA allows us to robustly capture weak unknown variation of interest while being calibrated and well-powered, far surpassing the performance of alternative methods. This renders PACA a state-of-the-art tool for defining de novo subtypes that are more likely to reflect molecular heterogeneity, especially in challenging cases where the phenotypic heterogeneity may be masked by a myriad of strong unrelated effects in the data.
Affiliation(s)
- Aditya Gorla
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Sriram Sankararaman
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Esteban Burchard
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Jonathan Flint
  - Department of Psychiatry and Behavioral Sciences, Brain Research Institute, University of California, Los Angeles, Los Angeles, CA, USA
- Noah Zaitlen
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Elior Rahmani
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
12
Ding K, Zhou Z, Ma Y, Li X, Xiao H, Wu Y, Wu T, Chen D. Identification of Novel Metabolic Subtypes Using Multi-Trait Limited Mixed Regression in the Chinese Population. Biomedicines 2022; 10:3093. [PMID: 36551856 PMCID: PMC9775185 DOI: 10.3390/biomedicines10123093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 11/25/2022] [Accepted: 11/29/2022] [Indexed: 12/03/2022]
Abstract
The aggregation and interaction of metabolic risk factors lead to highly heterogeneous pathogeneses, manifestations, and outcomes, hindering risk stratification and targeted management. To deconstruct the heterogeneity, we used baseline data from phase II of the Fangshan Family-Based Ischemic Stroke Study (FISSIC), and a total of 4632 participants were included. A total of 732 individuals who did not have any component of metabolic syndrome (MetS) were set as a reference group, while 3900 individuals with metabolic abnormalities were clustered into subtypes using multi-trait limited mixed regression (MFMR). Four metabolic subtypes were identified with the dominant characteristics of abdominal obesity, hypertension, hyperglycemia, and dyslipidemia. Multivariate logistic regression showed that the hyperglycemia-dominant subtype had the highest coronary heart disease (CHD) risk (OR: 6.440, 95% CI: 3.177-13.977) and that the dyslipidemia-dominant subtype had the highest stroke risk (OR: 2.450, 95% CI: 1.250-5.265). Exome-wide association studies (EWASs) identified eight SNPs related to the dyslipidemia-dominant subtype with genome-wide significance, which were located in the genes APOA5, BUD13, ZNF259, and WNT4. Functional analysis revealed an enrichment of top genes in metabolism-related biological pathways and expression in the heart, brain, arteries, and kidneys. Our findings provide directions for future attempts at risk stratification and evidence-based management in populations with metabolic abnormalities from a systematic perspective.
13
Allesøe RL, Nudel R, Thompson WK, Wang Y, Nordentoft M, Børglum AD, Hougaard DM, Werge T, Rasmussen S, Benros ME. Deep learning-based integration of genetics with registry data for stratification of schizophrenia and depression. Sci Adv 2022; 8:eabi7293. [PMID: 35767618 PMCID: PMC9242585 DOI: 10.1126/sciadv.abi7293] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/26/2021] [Accepted: 05/11/2022] [Indexed: 06/15/2023]
Abstract
Currently, psychiatric diagnoses are, in contrast to most other medical fields, based on subjective symptoms and observable signs and call for new and improved diagnostics to provide optimal care. On the basis of a deep learning approach, we performed unsupervised patient stratification of 19,636 patients with depression [major depressive disorder (MDD)] and/or schizophrenia (SCZ) and 22,467 population controls from the iPSYCH2012 case cohort. We integrated data of disorder severity, history of mental disorders and disease comorbidities, genetics, and medical birth data. From this, we stratified the individuals into six and seven unique clusters for MDD and SCZ, respectively. When censoring data until diagnosis, we could predict MDD clusters with areas under the curve (AUCs) of 0.54 to 0.80 and SCZ clusters with AUCs of 0.71 to 0.86. Overall cases and controls could be predicted with an AUC of 0.81, illustrating the utility of data-driven subgrouping in psychiatry.
Affiliation(s)
- Rosa Lundbye Allesøe
  - Copenhagen Research Centre for Mental Health, Mental Health Centre Copenhagen, Copenhagen University Hospital, Copenhagen, Denmark
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Ron Nudel
  - Copenhagen Research Centre for Mental Health, Mental Health Centre Copenhagen, Copenhagen University Hospital, Copenhagen, Denmark
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
- Wesley K. Thompson
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Institute of Biological Psychiatry, Mental Health Centre Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
  - Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, San Diego, CA, USA
- Yunpeng Wang
  - Lifespan Changes in Brain and Cognition (LCBC), Department of Psychology, University of Oslo, Forskningsveien 3A, 0317 Oslo, Norway
- Merete Nordentoft
  - Copenhagen Research Centre for Mental Health, Mental Health Centre Copenhagen, Copenhagen University Hospital, Copenhagen, Denmark
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Anders D. Børglum
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Department of Biomedicine, Aarhus University and Centre for Integrative Sequencing, iSEQ, Aarhus, Denmark
  - Aarhus Genome Center, Aarhus, Denmark
- David M. Hougaard
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Thomas Werge
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Institute of Biological Psychiatry, Mental Health Centre Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Simon Rasmussen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Michael Eriksen Benros
  - Copenhagen Research Centre for Mental Health, Mental Health Centre Copenhagen, Copenhagen University Hospital, Copenhagen, Denmark
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Aarhus, Denmark
  - Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
14
Abstract
Genetic studies of human traits have revolutionized our understanding of the variation between individuals, and yet, the genetics of most traits is still poorly understood. In this review, we highlight the major open problems that need to be solved, and by discussing these challenges provide a primer to the field. We cover general issues such as population structure, epistasis and gene-environment interactions, data-related issues such as ancestry diversity and rare genetic variants, and specific challenges related to heritability estimates, genetic association studies, and polygenic risk scores. We emphasize the interconnectedness of these problems and suggest promising avenues to address them.
Affiliation(s)
- Nadav Brandes
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Omer Weissbrod
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Michal Linial
  - Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
15
Contextualizing genetic risk score for disease screening and rare variant discovery. Nat Commun 2021; 12:4418. [PMID: 34285202 PMCID: PMC8292385 DOI: 10.1038/s41467-021-24387-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/27/2020] [Accepted: 06/07/2021] [Indexed: 11/08/2022]
Abstract
Studies of the genetic basis of complex traits have demonstrated a substantial role for common, small-effect variant polygenic burden (PB) as well as large-effect variants (LEV, primarily rare). We identify sufficient conditions in which GWAS-derived PB may be used for well-powered rare pathogenic variant discovery or as a sample prioritization tool for whole-genome or exome sequencing. Through extensive simulations of genetic architectures and generative models of disease liability with parameters informed by empirical data, we quantify the power to detect, among cases, a lower PB in LEV carriers than in non-carriers. Furthermore, we uncover clinically useful conditions wherein the risk derived from the PB is comparable to the LEV-derived risk. The resulting summary-statistics-based methodology (with publicly available software, PB-LEV-SCAN) makes predictions on PB-based LEV screening for 36 complex traits, which we confirm in several disease datasets with available LEV information in the UK Biobank, with important implications for clinical decision-making.