1
Mangnier L, Ruczinski I, Ricard J, Moreau C, Girard S, Maziade M, Bureau A. RetroFun-RVS: A Retrospective Family-Based Framework for Rare Variant Analysis Incorporating Functional Annotations. Genet Epidemiol 2025; 49:e70001. [PMID: 39876583 PMCID: PMC11775437 DOI: 10.1002/gepi.70001] [Received: 04/03/2024] [Revised: 10/16/2024] [Accepted: 01/03/2025] [Indexed: 01/30/2025]
Abstract
A large proportion of genetic variations involved in complex diseases are rare and located within noncoding regions, making the interpretation of underlying biological mechanisms a daunting task. Although technical and methodological progress has been made to annotate the genome, current disease-rare-variant association tests incorporating such annotations suffer from two major limitations. First, they are generally restricted to case-control designs of unrelated individuals, which often require tens or hundreds of thousands of individuals to achieve sufficient power. Second, they have not been evaluated with the region-based annotations needed to interpret the causal regulatory mechanisms. In this work, we propose RetroFun-RVS, a new retrospective family-based score test incorporating functional annotations. A critical feature of the proposed method is that it aggregates genotypes and compares them against rare variant-sharing expectations among affected family members. Through extensive simulations, we demonstrate that RetroFun-RVS integrating networks based on 3D genome contacts as functional annotations reaches greater power than the region-wide test, other strategies to include subregions, and competing methods. The proposed framework is also robust to non-informative annotations, maintaining its power when causal variants are spread across regions. Asymptotic p-values are susceptible to Type I error inflation when the number of families with rare variants is small, and a bootstrap procedure is recommended in these instances. Application of RetroFun-RVS is illustrated on whole-genome sequence data from the Eastern Quebec Schizophrenia and Bipolar Disorder Kindred Study, with networks constructed from 3D contacts and epigenetic data on neurons. In summary, the integration of functional annotations corresponding to regions or networks with transcriptional impacts in rare variant tests appears promising for highlighting regulatory mechanisms involved in complex diseases.
Affiliation(s)
- Loïc Mangnier
- Department of Social and Preventive Medicine, Laval University, Quebec City, Quebec, Canada
- CERVO Brain Research Center, Quebec City, Quebec, Canada
- Big Data Research Center, Laval University, Quebec City, Quebec, Canada
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Claudia Moreau
- Department of Fundamental Sciences, University of Quebec in Chicoutimi, Saguenay, Quebec, Canada
- Simon Girard
- Department of Fundamental Sciences, University of Quebec in Chicoutimi, Saguenay, Quebec, Canada
- Michel Maziade
- CERVO Brain Research Center, Quebec City, Quebec, Canada
- Department of Psychiatry and Neurosciences, Laval University, Quebec City, Quebec, Canada
- Alexandre Bureau
- Department of Social and Preventive Medicine, Laval University, Quebec City, Quebec, Canada
- CERVO Brain Research Center, Quebec City, Quebec, Canada
- Big Data Research Center, Laval University, Quebec City, Quebec, Canada
2
Hadi AF, Arta RK, Kushima I, Egawa J, Watanabe Y, Ozaki N, Someya T. Association Analysis of Rare CNTN5 Variants With Autism Spectrum Disorder in a Japanese Population. Neuropsychopharmacol Rep 2025; 45:e12527. [PMID: 39887962 DOI: 10.1002/npr2.12527] [Received: 09/19/2024] [Revised: 12/30/2024] [Accepted: 01/02/2025] [Indexed: 02/01/2025] [Open Access]
Abstract
BACKGROUND Contactin-5 (CNTN5), a neural adhesion molecule involved in synaptogenesis and synaptic maturation in the auditory pathway, has been associated with the pathophysiology of autism spectrum disorder (ASD), particularly hyperacusis. To investigate the role of rare CNTN5 variants in ASD susceptibility, we performed resequencing and association analysis in a Japanese population. METHODS We resequenced the CNTN5 coding regions in 302 patients with ASD and prioritized rare putatively damaging variants. The prioritized variants were then genotyped in 313 patients with ASD and 1065 controls. Subsequently, we conducted an association study of selected variants with ASD in 614 patients with ASD and 61 057 controls. Clinical data were reviewed for patients carrying prioritized variants. RESULTS Through resequencing, we prioritized three rare putatively damaging missense variants (W69G, I227L, and L1000S) in patients with ASD. Although we found a nominally significant association between the I227L variant and ASD, it did not remain significant after post hoc correction. Hyperacusis was found in three out of nine patients carrying prioritized variants. CONCLUSION This study does not provide evidence for the contribution of rare CNTN5 variants to the genetic etiology of ASD in the Japanese population.
Affiliation(s)
- Abdul Fuad Hadi
- Department of Psychiatry, School of Medicine, and Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
- Reza K Arta
- Department of Psychiatry, School of Medicine, and Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
- Itaru Kushima
- Department of Psychiatry, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
- Medical Genomics Center, Nagoya University Hospital, Nagoya, Aichi, Japan
- Jun Egawa
- Department of Psychiatry, School of Medicine, and Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
- Yuichiro Watanabe
- Department of Psychiatry, School of Medicine, and Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
- Department of Psychiatry, Uonuma Kikan Hospital, Niigata, Japan
- Norio Ozaki
- Pathophysiology of Mental Disorders, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
- Toshiyuki Someya
- Department of Psychiatry, School of Medicine, and Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
3
Öğüt S, Türkol M, Yıkmış S, Bozgeyik E, Abdi G, Kocyigit E, Aadil RM, Seyidoglu N, Karakçı D, Tokatlı N. Ultrasound-assisted enhancement of bioactive compounds in hawthorn vinegar: A functional approach to anticancer and antidiabetic effects. Ultrason Sonochem 2025; 114:107245. [PMID: 39879805 DOI: 10.1016/j.ultsonch.2025.107245] [Received: 12/06/2024] [Revised: 01/21/2025] [Accepted: 01/25/2025] [Indexed: 01/31/2025]
Abstract
In this study, the effects of ultrasound treatment on the bioactive components and functional properties of hawthorn vinegar (Crataegus tanacetifolia) were investigated. Total phenolic content (TPC), total flavonoid content (TFC), ascorbic acid (AA), DPPH radical scavenging activity and CUPRAC reducing capacity were optimised by response surface methodology (RSM), and a 14 min duration at 61.40% amplitude was determined to be the most suitable treatment condition. The results showed that ultrasound treatment improved the antioxidant properties of hawthorn vinegar by increasing TPC, TFC, DPPH and CUPRAC values. In addition, hawthorn vinegar samples exhibited anticancer effects in cell culture experiments. In experiments on A549 (lung), MCF-7 (breast) and HT-29 (colon) cancer cell lines, ultrasound-treated vinegar increased apoptotic effects, suppressed cell migration and reduced necrosis rates in some cell lines. In particular, ultrasound treatment of vinegar reduced the expression of anti-apoptotic genes (BCL-2 and XIAP) and enhanced the expression of the pro-apoptotic gene BAX. These findings suggest that ultrasound technology preserves and enhances the bioactive components of hawthorn vinegar, improves its anticancer properties and increases its potential for use as a functional food product.
Affiliation(s)
- Selim Öğüt
- Department of Biophysics, Faculty of Medicine, Bandırma Onyedi Eylul University 10250 Bandırma, Balıkesir, Türkiye
- Melikenur Türkol
- Department of Nutrition and Dietetics, Faculty of Health Sciences, Tekirdag Namık Kemal University 59030 Tekirdag, Türkiye.
- Seydi Yıkmış
- Department of Food Technology, Tekirdag Namık Kemal University 59830 Tekirdag, Türkiye.
- Esra Bozgeyik
- Department of Medical Biology, Faculty of Medicine, Adiyaman University, 02200, Adiyaman, Türkiye
- Gholamreza Abdi
- Department of Biotechnology, Persian Gulf Research Institute, Persian Gulf University, Bushehr, 75169, Iran.
- Emine Kocyigit
- Nutrition and Dietetics, Faculty of Health Sciences, Ordu University 52200 Ordu, Türkiye
- Rana Muhammad Aadil
- National Institute of Food Science and Technology, University of Agriculture, Faisalabad 38000 Pakistan
- Nilay Seyidoglu
- Department of Physiology, Faculty of Veterinary Medicine, Tekirdag Namik Kemal University 59030 Tekirdag, Türkiye
- Deniz Karakçı
- Department of Biochemistry, Faculty of Veterinary Medicine, Tekirdag Namik Kemal University 59030 Tekirdag, Türkiye
- Nazlı Tokatlı
- Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Istanbul Health and Technology University 34421 Istanbul, Türkiye
4
Shang L, Wu P, Zhou X. Statistical identification of cell type-specific spatially variable genes in spatial transcriptomics. Nat Commun 2025; 16:1059. [PMID: 39865128 PMCID: PMC11770176 DOI: 10.1038/s41467-025-56280-4] [Received: 04/04/2024] [Accepted: 01/06/2025] [Indexed: 01/28/2025] [Open Access]
Abstract
An essential task in spatial transcriptomics is identifying spatially variable genes (SVGs). Here, we present Celina, a statistical method for systematically detecting cell type-specific SVGs (ct-SVGs), a subset of SVGs exhibiting distinct spatial expression patterns within specific cell types. Celina utilizes a spatially varying coefficient model to accurately capture each gene's spatial expression pattern in relation to the distribution of cell types across tissue locations, ensuring effective type I error control and high power. Celina proves powerful compared to existing methods in single-cell resolution spatial transcriptomics and stands as the only effective solution for spot-resolution spatial transcriptomics. Applied to five real datasets, Celina uncovers ct-SVGs associated with tumor progression and patient survival in lung cancer, identifies metagenes with unique spatial patterns linked to cell proliferation and immune response in kidney cancer, and detects genes preferentially expressed near amyloid-β plaques in an Alzheimer's model.
Affiliation(s)
- Lulu Shang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peijun Wu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
5
Liu X, Li YJ, Fan Q. Zim4rv: an R package to modeling zero-inflated count phenotype on regional-based rare variants. BMC Bioinformatics 2025; 26:18. [PMID: 39819419 PMCID: PMC11740424 DOI: 10.1186/s12859-024-06029-5] [Received: 09/08/2024] [Accepted: 12/27/2024] [Indexed: 01/19/2025] [Open Access]
Abstract
BACKGROUND With the advance of next-generation sequencing, various gene-based rare variant association tests have been developed, particularly for binary and continuous phenotypes. In contrast, fewer methods are available for traits not following binomial or normal distributions. To address this, we previously proposed a set of burden- and kernel-based rare variant tests for count data following zero-inflated Poisson (ZIP) distributions, referred to as ZIP-b and ZIP-k tests. We sought to extend the methods to accommodate the negative binomial distribution and implemented these tests in a new R package. RESULTS We introduce ZIM4rv, an R package designed to analyze the association of rare variants with zero-inflated count outcomes. Our package offers two novel models developed by our team: our previously proposed ZIP-b and ZIP-k tests, and the newly derived Negative Binomial Burden and Kernel Test (ZINB-b, ZINB-k). Additionally, we include an ad-hoc two-stage analysis, testing zero versus non-zero as a binary outcome and non-zero counts as a continuous outcome, respectively. To showcase the utility of our platform, we applied this program to analyze neuritic plaque count data from the ROSMAP cohort. CONCLUSION The R package ZIM4rv presents an integrated workflow for conducting association tests on a set of rare variants with zero-inflated count data.
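As a rough illustration of the zero-inflated Poisson (ZIP) distribution that the ZIP-b and ZIP-k tests build on, the Python sketch below evaluates the ZIP probability mass function and log-likelihood. It is not the ZIM4rv implementation (which is an R package); the function names `zip_pmf` and `zip_loglik` and the parameter names `lam` and `pi` are assumptions made for this example, and the counts are toy values.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) for a zero-inflated Poisson variable.

    With probability `pi` the count is a structural zero; otherwise it is
    drawn from Poisson(lam). Both parameter names are illustrative.
    """
    if k < 0:
        raise ValueError("counts must be non-negative")
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from the structural-zero part or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_loglik(counts, lam: float, pi: float) -> float:
    """Log-likelihood of independent counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


if __name__ == "__main__":
    toy_counts = [0, 0, 0, 1, 2, 0, 4]  # toy data, not from the ROSMAP cohort
    print(zip_loglik(toy_counts, lam=2.0, pi=0.3))
```

Setting `pi` to 0 recovers the ordinary Poisson distribution, which is one way to see what the zero-inflation component adds.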
Affiliation(s)
- Xiaomin Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Yi-Ju Li
- Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina, USA
- Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, North Carolina, USA
- Qiao Fan
- Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore.
6
Wijnbergen D, Johari M, Ozisik O, 't Hoen PAC, Ehrhart F, Baudot A, Evelo CT, Udd B, Roos M, Mina E. Multi-omics analysis in inclusion body myositis identifies mir-16 responsible for HLA overexpression. Orphanet J Rare Dis 2025; 20:27. [PMID: 39815348 PMCID: PMC11737257 DOI: 10.1186/s13023-024-03526-x] [Received: 02/02/2024] [Accepted: 12/27/2024] [Indexed: 01/18/2025] [Open Access]
Abstract
BACKGROUND Inclusion Body Myositis is an acquired muscle disease. Its pathogenesis is unclear due to the co-existence of inflammation, muscle degeneration and mitochondrial dysfunction. We aimed to provide a more advanced understanding of the disease by combining multi-omics analysis with prior knowledge. We applied molecular subnetwork identification to find highly interconnected subnetworks with a high degree of change in Inclusion Body Myositis. These could be used as hypotheses for potential pathomechanisms and biomarkers that are implicated in this disease. RESULTS Our multi-omics analysis resulted in five subnetworks that exhibit changes in multiple omics layers. These subnetworks are related to antigen processing and presentation, chemokine-mediated signaling, immune response-signal transduction, rRNA processing, and mRNA splicing. An interesting finding is that the antigen processing and presentation subnetwork links the underexpressed miR-16-5p to overexpressed HLA genes by negative expression correlation. In addition, the rRNA processing subnetwork contains the RPS18 gene, which is not differentially expressed but has a significant variant association. The RPS18 gene could potentially play a role in the underexpression of the genes involved in 18S ribosomal RNA processing, to which it is highly connected. CONCLUSIONS Our analysis highlights the importance of interrogating multiple omics to enhance knowledge discovery in rare diseases. We report five subnetworks that can provide additional insights into the molecular pathogenesis of Inclusion Body Myositis. Our analytical workflow can be reused as a method to study disease mechanisms involved in other diseases when multiple omics datasets are available.
Affiliation(s)
- Daphne Wijnbergen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.
- Mridul Johari
- Harry Perkins Institute of Medical Research, Centre for Medical Research, University of Western Australia, Nedlands, WA, Australia
- Folkhälsen Research Center, Helsinki, Finland
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Ozan Ozisik
- Université Paris Cité, INSERM U976, Paris, France
- Peter A C 't Hoen
- Department of Medical BioSciences, Radboud university medical center, Nijmegen, The Netherlands
- Friederike Ehrhart
- Department of Bioinformatics - BiGCaT, NUTRIM/MHeNs, Maastricht University, Maastricht, The Netherlands
- Anaïs Baudot
- Aix Marseille University, INSERM, MMG, Marseille, France
- CNRS, Marseille, France
- Barcelona Supercomputing Centre, Barcelona, Spain
- Chris T Evelo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Bjarne Udd
- Folkhälsen Research Center, Helsinki, Finland
- Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Tampere Neuromuscular Center, University Hospital, Tampere, Finland
- Marco Roos
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Eleni Mina
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
7
Borczyk M, Fichna JP, Piechota M, Gołda S, Zięba M, Hoinkis D, Cięszczyk P, Korostynski M, Janik P, Żekanowski C. Oligogenic risk score for Gilles de la Tourette syndrome reveals a genetic continuum of tic disorders. J Appl Genet 2025:10.1007/s13353-024-00930-8. [PMID: 39792217 DOI: 10.1007/s13353-024-00930-8] [Received: 06/11/2024] [Revised: 11/28/2024] [Accepted: 12/01/2024] [Indexed: 01/12/2025]
Abstract
Gilles de la Tourette syndrome (GTS) and other tic disorders (TDs) have a substantial genetic component with their heritability estimated at between 60 and 80%. Here we propose an oligogenic risk score of TDs using whole-genome sequencing (WGS) data from a group of Polish GTS patients, their families, and control samples (n = 278). In this study, we first reviewed the literature to obtain a preliminary list of 84 GTS/TD candidate genes. From this list, 10 final risk score genes were selected based on single-gene burden tests (SKAT p < 0.05) between unrelated GTS cases (n = 37) and synthetic control samples based on a database of local allele frequencies. These 10 genes were CHADL, DRD2, MAOA, PCDH10, HTR2A, SLITRK5, SORCS3, KCNQ5, CDH9, and CHD8. Variants in and in the vicinity (± 20 kbp) of the ten risk genes (n = 7654) with a median minor allele frequency in the non-Finnish European population of 0.02 were integrated into an additive classifier. This risk score was then applied to healthy and GTS-affected individuals from 23 families and 100 unrelated healthy samples from the Polish population (AUC-ROC = 0.62, p = 0.02). Application of the algorithm to a group of patients with other tic disorders revealed a continuous increase of the oligogenic score with healthy individuals with the lowest mean, then patients with other tic disorders, then GTS patients, and finally with severe GTS cases with the highest oligogenic score. We have further compared our WGS results with the summary statistics of the Psychiatric Genomics Consortium genome-wide association study (PGC GWAS) of TDs and found no signal overlap except for the CHADL gene locus. Polygenic risk scores from common variants of GTS GWAS show no difference between patient and control groups, except for the comparison between patients with non-GTS TDs and patients with severe GTS. 
Overall, we leveraged WGS data to construct a GTS/TD risk score based on variants that may cooperatively contribute to the aetiology of these disorders. This study provides evidence that typical and severe adult GTS as well as other tic disorders may exist on a single spectrum in terms of their genetic background.
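The additive classifier and the AUC-ROC figure mentioned in this abstract can be illustrated with a small Python sketch. This is only a toy under stated assumptions: the abstract does not give the variant weights or the exact scoring scheme, so the function names `additive_score` and `auc_roc`, the unit weights, and all numbers below are hypothetical.

```python
def additive_score(dosages, weights):
    """Additive risk score: weighted sum of per-variant allele counts (0, 1 or 2).

    The weights are hypothetical; the abstract does not state how variants
    were weighted.
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))


def auc_roc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0]  # toy unit weights
    cases = [additive_score(g, weights) for g in ([1, 1, 1], [0, 1, 1], [2, 0, 0])]
    controls = [additive_score(g, weights) for g in ([0, 1, 0], [1, 1, 0], [0, 0, 0])]
    print(auc_roc(cases, controls))
```

An AUC of 0.5 corresponds to chance-level separation, which gives some context for the reported value of 0.62.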
Affiliation(s)
- Malgorzata Borczyk
- Laboratory of Pharmacogenomics, Department of Molecular Neuropharmacology, Polish Academy of Sciences, Maj Institute of Pharmacology, Smętna 12, 31-343, Krakow, Poland.
- Jakub P Fichna
- Department of Neurogenetics and Functional Genomics, Mossakowski Medical Research Institute, Polish Academy of Sciences, Pawińskiego 5, 02-106, Warsaw, Poland
- Marcin Piechota
- Laboratory of Pharmacogenomics, Department of Molecular Neuropharmacology, Polish Academy of Sciences, Maj Institute of Pharmacology, Smętna 12, 31-343, Krakow, Poland
- Sławomir Gołda
- Laboratory of Pharmacogenomics, Department of Molecular Neuropharmacology, Polish Academy of Sciences, Maj Institute of Pharmacology, Smętna 12, 31-343, Krakow, Poland
- Mateusz Zięba
- Laboratory of Pharmacogenomics, Department of Molecular Neuropharmacology, Polish Academy of Sciences, Maj Institute of Pharmacology, Smętna 12, 31-343, Krakow, Poland
- Paweł Cięszczyk
- Faculty of Physical Education, Gdansk University of Physical Education and Sport, 80-336, Gdansk, Poland
- Michal Korostynski
- Laboratory of Pharmacogenomics, Department of Molecular Neuropharmacology, Polish Academy of Sciences, Maj Institute of Pharmacology, Smętna 12, 31-343, Krakow, Poland
- Piotr Janik
- Department of Neurology, Medical University of Warsaw, Żwirki i Wigury 61, 02-091, Warsaw, Poland
- Cezary Żekanowski
- Department of Neurogenetics and Functional Genomics, Mossakowski Medical Research Institute, Polish Academy of Sciences, Pawińskiego 5, 02-106, Warsaw, Poland.
- Faculty of Physical Education, Gdansk University of Physical Education and Sport, 80-336, Gdansk, Poland.
8
Shen L, Amei A, Liu B, Xu G, Liu Y, Oh EC, Zhou X, Wang Z. Marginal interaction test for detecting interactions between genetic marker sets and environment in genome-wide studies. G3 (Bethesda) 2025; 15:jkae263. [PMID: 39538414 PMCID: PMC11708225 DOI: 10.1093/g3journal/jkae263] [Received: 08/30/2024] [Accepted: 11/04/2024] [Indexed: 11/16/2024]
Abstract
As human complex diseases are influenced by the interaction between genetics and the environment, identifying gene-environment interactions (G×E) is crucial for understanding disease mechanisms and predicting risk. Developing robust quantitative tools for G×E analysis can enhance the study of complex diseases. However, many existing methods that explore G×E focus on the interplay between an environmental factor and genetic variants, exclusively for common or rare variants. In this study, we developed MAGEIT_RAN and MAGEIT_FIX to identify interactions between an environmental factor and a set of genetic markers, including both rare and common variants, based on the MinQue for Summary statistics. The genetic main effects in MAGEIT_RAN and MAGEIT_FIX are modeled as random and fixed effects, respectively. Simulation studies showed that both tests had type I error under control, with MAGEIT_RAN being the most powerful test. Applying MAGEIT to a genome-wide analysis of gene-alcohol interactions on hypertension and seated systolic blood pressure in the Multiethnic Study of Atherosclerosis revealed genes like EIF2AK2, CCNDBP1, and EPB42 influencing blood pressure through alcohol interaction. Pathway analysis identified 1 apoptosis and survival pathway involving PKR and 2 signal transduction pathways associated with hypertension and alcohol intake, demonstrating MAGEIT_RAN's ability to detect biologically relevant gene-environment interactions.
Affiliation(s)
- Linchuan Shen
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Amei Amei
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Bowen Liu
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Division of Computing, Analysis, and Mathematics, University of Missouri, Kansas City, MO 64108, USA
- Gang Xu
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Yunqing Liu
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Edwin C Oh
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Department of Internal Medicine, University of Nevada School of Medicine, Las Vegas, NV 89154, USA
- Xin Zhou
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT 06510, USA
9
Yao Y, Li X, Wu L, Zhang J, Gui Y, Yu X, Zhou Y, Li X, Liu X, Xing S, An G, Du Z, Liu H, Li S, Yu X, Chen H, Su J, Chen S. Whole-genome sequencing identifies novel loci for keratoconus and facilitates risk stratification in a Han Chinese population. Eye Vis (Lond) 2025; 12:5. [PMID: 39762938 PMCID: PMC11706019 DOI: 10.1186/s40662-024-00421-1] [Received: 05/22/2024] [Accepted: 11/28/2024] [Indexed: 01/11/2025]
Abstract
BACKGROUND Keratoconus (KC) is a prevalent corneal condition with a modest genetic basis. Recent studies have reported significant genetic associations in multi-ethnic cohorts. However, the situation in the Chinese population remains unknown. This study was conducted to identify novel genetic variants linked to KC and to evaluate the potential applicability of a polygenic risk model in the Han Chinese population. METHODS A total of 830 individuals diagnosed with KC and 779 controls from a Chinese cohort were enrolled and genotyped by whole-genome sequencing (WGS). Common and rare variants were respectively subjected to single variant association analysis and gene-based burden analysis. Polygenic risk score (PRS) models were developed using top single-nucleotide polymorphisms (SNPs) identified from a multi-ethnic meta-analysis and then evaluated in the Chinese cohort. RESULTS The characterization of germline variants entailed correction for population stratification and validation of the East Asian ancestry of the included samples via principal component analysis. For rare protein-truncating variants (PTVs) with minor allele frequency (MAF) < 5%, ZC3H11B emerged as the top prioritized gene, albeit failing to reach the significance threshold. We detected three common variants reaching genome-wide significance (P ≤ 5 × 10⁻⁸), all of which are novel to KC. Our study validated three well-known predisposition loci, COL5A1, EIF3A and FNDC3B. Additionally, a significant correlation of allelic effects was observed for suggestive SNPs between the largest multi-ethnic meta-genome-wide association study (GWAS) and our study. The PRS model, generated using top SNPs from the meta-GWAS, stratified individuals in the upper quartile, revealing up to a 2.16-fold increased risk for KC.
CONCLUSIONS Our comprehensive WGS-based GWAS in a large Chinese cohort enhances the efficiency of array-based genetic studies, revealing novel genetic associations for KC and highlighting the potential for refining clinical decision-making and early prevention strategies.
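The kind of quartile-based risk stratification described for the PRS model can be sketched in Python as follows. This is a hedged illustration only: the abstract does not state which reference group or effect measure underlies the 2.16-fold figure, so the sketch assumes an odds ratio comparing the top quarter of scores against everyone else. The function name `top_quartile_odds_ratio` and the toy scores are hypothetical.

```python
def top_quartile_odds_ratio(scores, is_case):
    """Odds ratio of being a case in the top quarter of scores vs. the rest.

    The top quartile is taken by rank (the highest-scoring n // 4
    individuals); ties at the boundary are broken by input order. Both the
    reference group and the use of an odds ratio are assumptions.
    """
    n = len(scores)
    if n != len(is_case):
        raise ValueError("scores and case labels must have the same length")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = set(order[: n // 4])

    cases_top = sum(1 for i in top if is_case[i])
    controls_top = len(top) - cases_top
    cases_rest = sum(1 for i in range(n) if i not in top and is_case[i])
    controls_rest = (n - len(top)) - cases_rest

    if controls_top == 0 or cases_rest == 0:
        raise ValueError("odds ratio undefined: a cell of the 2x2 table is empty")
    return (cases_top * controls_rest) / (controls_top * cases_rest)


if __name__ == "__main__":
    # Toy data: eight individuals, scores already sorted for readability.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    toy_labels = [True, False, True, False, True, False, False, False]
    print(top_quartile_odds_ratio(toy_scores, toy_labels))  # prints 2.0
```

In a real analysis the comparison group (for example, the lowest quartile) and confidence intervals would need to follow the study's own definitions.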
Affiliation(s)
- Yinghao Yao
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Xingyong Li
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Lan Wu
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Jia Zhang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Yuanyuan Gui
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Xiangyi Yu
- Institute of PSI Genomics, Wenzhou Global Eye & Vision Innovation Center, Wenzhou, 325024, China
- Yang Zhou
- Taizhou Eye Hospital, Taizhou, 318001, China
- Xuefei Li
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Xinyu Liu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Shilai Xing
- Institute of PSI Genomics, Wenzhou Global Eye & Vision Innovation Center, Wenzhou, 325024, China
- Gang An
- Institute of PSI Genomics, Wenzhou Global Eye & Vision Innovation Center, Wenzhou, 325024, China
- Zhenlin Du
- Institute of PSI Genomics, Wenzhou Global Eye & Vision Innovation Center, Wenzhou, 325024, China
- Hui Liu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Shasha Li
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
| | - Xiaoguang Yu
- Institute of PSI Genomics, Wenzhou Global Eye & Vision Innovation Center, Wenzhou, 325024, China
| | - Hua Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Jianzhong Su
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
| | - Shihao Chen
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
| |
|
10
|
Zucker R, Kelman G, Linial M. PWAS Hub: exploring gene-based associations of complex diseases with sex dependency. Nucleic Acids Res 2025; 53:D1132-D1143. [PMID: 39565197 PMCID: PMC11701668 DOI: 10.1093/nar/gkae1125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 10/15/2024] [Accepted: 11/18/2024] [Indexed: 11/21/2024] [Open access]
Abstract
The Proteome-Wide Association Study (PWAS) is a protein-based genetic association approach designed to complement traditional variant-based methods like GWAS. PWAS operates in two stages: first, machine learning models predict the impact of genetic variants on protein-coding genes, generating effect scores. These scores are then aggregated into a gene-damaging score for each individual. This score is then used in case-control statistical tests to significantly link to specific phenotypes. PWAS Hub (v1.2) is a user-friendly platform that facilitates the exploration of gene-disease associations using clinical and genetic data from the UK Biobank (UKB), encompassing 500k individuals. PWAS Hub reports on 819 diseases and phenotypes determined by PheCode and ICD-10 clinical codes, each with a minimum of 400 affected individuals. PWAS-derived gene associations were reported for 72% of the tested phenotypes. The PWAS Hub also analyzes gene associations separately for males and females, considering sex-specific genetic effects, inheritance patterns (dominant and recessive), and gene pleiotropy. We illustrated the utility of the PWAS Hub for primary (essential) hypertension (I10), type 2 diabetes mellitus (E11), and specified haematuria (R31) that showed sex-dependent genetic signals. The PWAS Hub, available at pwas.huji.ac.il, is a valuable resource for studying genetic contributions to common diseases and sex-specific effects.
Affiliation(s)
- Roei Zucker
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
| | - Guy Kelman
- The Jerusalem Center for Personalized Computational Medicine, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel
| | - Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
| |
|
11
|
Liu H, Zhang H. Powerful Rare-Variant Association Analysis of Secondary Phenotypes. Genet Epidemiol 2025; 49:e22589. [PMID: 39350332 DOI: 10.1002/gepi.22589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 06/24/2024] [Accepted: 09/02/2024] [Indexed: 12/20/2024]
Abstract
Most genome-wide association studies are based on case-control designs, which provide abundant resources for secondary phenotype analyses. However, such studies suffer from biased sampling of primary phenotypes, and the traditional statistical methods can lead to seriously distorted analysis results when they are applied to secondary phenotypes without accounting for the biased sampling mechanism. To our knowledge, there are no statistical methods specifically tailored for rare variant association analysis with secondary phenotypes. In this article, we proposed two novel joint test statistics for identifying secondary-phenotype-associated rare variants based on prospective likelihood and retrospective likelihood, respectively. We also exploit the assumption of gene-environment independence in retrospective likelihood to improve the statistical power and adopt a two-step strategy to balance statistical power and robustness. Simulations and a real-data application are conducted to demonstrate the superior performance of our proposed methods.
Affiliation(s)
- Hanyun Liu
- School of Management, University of Science and Technology of China, Hefei, China
| | - Hong Zhang
- School of Management, University of Science and Technology of China, Hefei, China
| |
|
12
|
Li R, Li M, Zhao N. A Mixed-Effect Kernel Machine Regression Model for Integrative Analysis of Alpha Diversity in Microbiome Studies. Genet Epidemiol 2025; 49:e22596. [PMID: 39350346 DOI: 10.1002/gepi.22596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 08/22/2024] [Accepted: 09/05/2024] [Indexed: 12/20/2024]
Abstract
Increasing evidence suggests that human microbiota plays a crucial role in many diseases. Alpha diversity, a commonly used summary statistic that captures the richness and/or evenness of the microbial community, has been associated with many clinical conditions. However, individual studies that assess the association between alpha diversity and clinical conditions often provide inconsistent results due to insufficient sample size, heterogeneous study populations and technical variability. In practice, meta-analysis tools have been applied to integrate data from multiple studies. However, these methods do not consider the heterogeneity caused by sequencing protocols, and the contribution of each study to the final model depends mainly on its sample size (or variance estimate). To combine studies with distinct sequencing protocols, a robust statistical framework for integrative analysis of microbiome datasets is needed. Here, we propose a mixed-effect kernel machine regression model to assess the association of alpha diversity with a phenotype of interest. Our approach readily incorporates the study-specific characteristics (including sequencing protocols) to allow for flexible modeling of microbiome effect via a kernel similarity matrix. Within the proposed framework, we provide three hypothesis testing approaches to answer different questions that are of interest to researchers. We evaluate the model performance through extensive simulations based on two distinct data generation mechanisms. We also apply our framework to data from HIV reanalysis consortium to investigate gut dysbiosis in HIV infection.
Affiliation(s)
- Runzhe Li
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
| | - Mo Li
- Department of Mathematics, University of Louisiana at Lafayette, Lafayette, Louisiana, USA
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
| |
|
13
|
Akinbiyi T, McPeek MS, Abney M. ADELLE: A global testing method for trans-eQTL mapping. PLoS Genet 2025; 21:e1011563. [PMID: 39792937 PMCID: PMC11756770 DOI: 10.1371/journal.pgen.1011563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 01/23/2025] [Accepted: 12/31/2024] [Indexed: 01/12/2025] [Open access]
Abstract
Understanding the genetic regulatory mechanisms of gene expression is an ongoing challenge. Genetic variants that are associated with expression levels are readily identified when they are proximal to the gene (i.e., cis-eQTLs), but SNPs distant from the gene whose expression levels they are associated with (i.e., trans-eQTLs) have been much more difficult to discover, even though they account for a majority of the heritability in gene expression levels. A major impediment to the identification of more trans-eQTLs is the lack of statistical methods that are powerful enough to overcome the obstacles of small effect sizes and large multiple testing burden of trans-eQTL mapping. Here, we propose ADELLE, a powerful statistical testing framework that requires only summary statistics and is designed to be most sensitive to SNPs that are associated with multiple gene expression levels, a characteristic of many trans-eQTLs. In simulations, we show that for detecting SNPs that are associated with 0.1%-2% of 10,000 traits, among the 8 methods we consider ADELLE is clearly the most powerful overall, with either the highest power or power not significantly different from the highest for all settings in that range. We apply ADELLE to a mouse advanced intercross line data set and show its ability to find trans-eQTLs that were not significant under a standard analysis. We also apply ADELLE to trans-eQTL mapping in the eQTLGen data, and for 1,451 previously identified trans-eQTLs, we discover trans association with additional expression traits beyond those previously identified. This demonstrates that ADELLE is a powerful tool at uncovering trans regulators of genetic expression.
Affiliation(s)
- Takintayo Akinbiyi
- Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America
| | - Mary Sara McPeek
- Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
| | - Mark Abney
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
| |
|
14
|
Little A, Zhao N, Mikhaylova A, Zhang A, Ling W, Thibord F, Johnson AD, Raffield LM, Curran JE, Blangero J, O'Connell JR, Xu H, Rotter JI, Rich SS, Rice KM, Chen MH, Reiner A, Kooperberg C, Vu T, Hou L, Fornage M, Loos RJF, Kenny E, Mathias R, Becker L, Smith AV, Boerwinkle E, Yu B, Thornton T, Wu MC. General Kernel Machine Methods for Multi-Omics Integration and Genome-Wide Association Testing With Related Individuals. Genet Epidemiol 2025; 49:e22610. [PMID: 39812506 DOI: 10.1002/gepi.22610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 09/18/2024] [Accepted: 12/17/2024] [Indexed: 01/16/2025]
Abstract
Integrating multi-omics data may help researchers understand the genetic underpinnings of complex traits and diseases. However, the best ways to integrate multi-omics data and use them to address pressing scientific questions remain a challenge. One important and topical problem is how to assess the aggregate effect of multiple genomic data types (e.g. genotypes and gene expression levels) on a phenotype, particularly while accommodating routine issues, such as having related subjects' data in analyses. In this paper, we extend an existing composite kernel machine regression model to integrate two multi-omics data types, while accommodating for general correlation structures amongst outcomes. Due to the kernel machine regression framework, our methods allow for the integration of high-dimensional omics data with small, nonlinear, and interactive effects, and accommodation of general study designs. Here, we focus on scientific questions that aim to assess the association between a functional grouping (such as a gene or a pathway) and a quantitative trait of interest. We use a kernel machine regression to integrate the two multi-omics data types, as they may relate to the trait, and perform a global test of association. We demonstrate the advantage of this approach over single data type association tests via simulation. Finally, we apply this method to a large, multi-ethnic data set to investigate how predicted gene expression and rare genetic variation may be related to two platelet traits.
Grants
- U.S. Department of Health and Human Services, National Institute on Minority Health and Health Disparities, National Institutes of Health, National Human Genome Research Institute, National Center for Research Resources, COPD Foundation, National Heart, Lung, and Blood Institute, National Science Foundation, National Institute on Aging, and National Institute of Neurological Disorders and Stroke.
Affiliation(s)
- Amarise Little
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
| | - Anna Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
| | - Angela Zhang
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
| | - Wodan Ling
- Department of Population Health Sciences, Division of Biostatistics, Weill Cornell Medicine, New York, New York, USA
| | - Florian Thibord
- National Heart, Lung, and Blood Institute, Boston University's Framingham Heart Study, Framingham, Massachusetts, USA
- National Heart, Lung and Blood Institute, Population Sciences Branch, Framingham, Massachusetts, USA
| | - Andrew D Johnson
- National Heart, Lung, and Blood Institute, Boston University's Framingham Heart Study, Framingham, Massachusetts, USA
- National Heart, Lung and Blood Institute, Population Sciences Branch, Framingham, Massachusetts, USA
| | - Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Joanne E Curran
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, Texas, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, Texas, USA
| | - John Blangero
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, Texas, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, Texas, USA
| | | | - Huichun Xu
- Department of Medicine, University of Maryland, Baltimore, Maryland, USA
| | - Jerome I Rotter
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, California, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
| | - Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
| | - Ming-Huei Chen
- National Heart, Lung, and Blood Institute, Boston University's Framingham Heart Study, Framingham, Massachusetts, USA
- National Heart, Lung and Blood Institute, Population Sciences Branch, Framingham, Massachusetts, USA
| | - Alexander Reiner
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
| | - Charles Kooperberg
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
| | - Thao Vu
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
| | - Lifang Hou
- Department of Preventive Medicine, Northwestern Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
| | - Myriam Fornage
- Brown Foundation Institute for Molecular Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Ruth J F Loos
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Eimear Kenny
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- The Center for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Rasika Mathias
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Lewis Becker
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Albert V Smith
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, USA
| | - Eric Boerwinkle
- Department of Epidemiology, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Bing Yu
- Department of Epidemiology, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Timothy Thornton
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
| | - Michael C Wu
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA
| |
|
15
|
Mansouri S, Rochette M, Labonté B, Zhang Q, Chen TH. A Novel Statistical Method for Unmasking Sex-Specific Genomics Signatures in Complex Traits. Genet Epidemiol 2025; 49:e22612. [PMID: 39821553 DOI: 10.1002/gepi.22612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Revised: 12/10/2024] [Accepted: 01/04/2025] [Indexed: 01/19/2025]
Abstract
Genotype-phenotype association studies have advanced our understanding of complex traits but often overlook sex-specific genetic signals. The growing awareness of sex-specific influences on human traits and diseases necessitates tailored statistical methodologies to dissect these genetic intricacies. Rare genetic variants play a significant role in disease development, often exhibiting stronger per-allele effects than common variants. In sex-dimorphic analysis, traits are viewed as having two sex-specific subsets rather than being uniformly defined. Existing methods for gene-based analysis of rare variants across multiple traits can identify shared genetic signals but cannot reveal the specific subsets from which significant signals originate. This means that when a significant signal is detected, it remains unclear whether it arises from the male samples, female samples, or both. To address this limitation, we propose SubsetRV, a new methodology capable of identifying genes associated with specific traits or diseases in males, females, or both. SubsetRV can also be applied to broader applications in multiple traits analysis. Simulation studies have demonstrated SubsetRV's reliability, and real data analysis on bipolar disorder and schizophrenia has revealed potential sex-specific genetic signals. SubsetRV offers a valuable tool for identifying sex-specific genetic candidates, aiding in understanding disease mechanisms. An R package for SubsetRV is available on GitHub. It can be accessed directly through this link: https://github.com/Mansouri-S/SubsetRV.
Affiliation(s)
- Samaneh Mansouri
- Department of Social and Preventive Medicine, Faculty of Medicine, Université Laval, Québec City, Québec, Canada
- CERVO Brain Research Centre, Québec City, Québec, Canada
- Department of Mathematics and Statistics, Université Laval, Québec City, Québec, Canada
| | - Mélissa Rochette
- Department of Mathematics and Statistics, Université Laval, Québec City, Québec, Canada
| | - Benoit Labonté
- CERVO Brain Research Centre, Québec City, Québec, Canada
- Department of Psychiatry and Neurosciences, Faculty of Medicine, Université Laval, Québec City, Québec, Canada
| | - Qingrun Zhang
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
| | - Ting-Huei Chen
- CERVO Brain Research Centre, Québec City, Québec, Canada
- Department of Mathematics and Statistics, Université Laval, Québec City, Québec, Canada
| |
|
16
|
Wang K, Alberding SY. Powerful Test of Heterogeneity in Two-Sample Summary-Data Mendelian Randomization. Stat Med 2024; 43:5791-5802. [PMID: 39552275 PMCID: PMC11639658 DOI: 10.1002/sim.10279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 07/30/2024] [Accepted: 10/26/2024] [Indexed: 11/19/2024]
Abstract
BACKGROUND The success of a Mendelian randomization (MR) study critically depends on the validity of the assumptions underlying MR. We focus on detecting heterogeneity (also known as horizontal pleiotropy) in two-sample summary-data MR. A popular approach is to apply Cochran's $Q$ statistic method, developed for meta-analysis. However, Cochran's $Q$ statistic, including its modifications, is known to lack power when its degrees of freedom are large. Furthermore, there is no theoretical justification for the claimed null distribution of the minimum of the modified Cochran's $Q$ statistic with exact weighting ($Q_{\mathrm{min}}$), although it seems to perform well in simulation studies. METHOD The principle of our proposed method is straightforward: if a set of variables are valid instruments, then any linear combination of these variables is still a valid instrument. Specifically, this principle holds when these linear combinations are formed using eigenvectors derived from a variance matrix. Each linear combination follows a known normal distribution from which a $p$ value can be calculated. We use the minimum $p$ value for these eigenvector-based linear combinations as the test statistic. Additionally, we explore a modification of the modified Cochran's $Q$ statistic by replacing the weighting matrix with a truncated singular value decomposition. RESULTS Extensive simulation studies reveal that the proposed methods outperform Cochran's $Q$ statistic, including those with modified weights, and MR-PRESSO, another popular method for detecting heterogeneity, in cases where the number of instruments is not large or the Wald ratios take two values. We also demonstrate these methods using empirical examples. Furthermore, we show that $Q_{\mathrm{min}}$ does not follow, but is dominated by, the claimed null chi-square distribution. The proposed methods are implemented in an R package iGasso.
CONCLUSIONS Dimension reduction techniques are useful for generating powerful tests of heterogeneity in MR.
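For readers unfamiliar with the baseline that this abstract compares against, Cochran's $Q$ in two-sample summary-data MR is commonly computed from per-variant Wald ratios with inverse-variance weights. The sketch below is illustrative only: it is not the authors' eigenvector-based test and not the iGasso implementation. It assumes first-order weights (uncertainty in the variant-exposure estimates is ignored), and the function and argument names are hypothetical.

```python
def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Cochran's Q for per-variant Wald ratios (first-order weights).

    Assumption: weights ignore the standard errors of the
    variant-exposure estimates (the simple "first-order" form).
    Returns (Q, degrees of freedom, IVW estimate).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if len(beta_exposure) < 2:
        raise ValueError("at least two instruments are required")

    # Wald ratio for each instrument: outcome effect / exposure effect.
    ratios = [bo / be for be, bo in zip(beta_exposure, beta_outcome)]
    # Inverse variance of each ratio under the first-order approximation.
    weights = [(be / so) ** 2 for be, so in zip(beta_exposure, se_outcome)]

    # Inverse-variance-weighted (IVW) pooled causal estimate.
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    # Weighted squared deviations of the ratios from the pooled estimate.
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1, ivw


# Two instruments whose Wald ratios disagree (0 and 2), equal weights.
q, df, ivw = cochran_q([1.0, 1.0], [0.0, 2.0], [1.0, 1.0])
print(q, df, ivw)  # 2.0 1 1.0
```

Under the null hypothesis of no heterogeneity, $Q$ is conventionally referred to a chi-square distribution with degrees of freedom equal to the number of instruments minus one; the abstract's point is that this reference test loses power as that number grows.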
Affiliation(s)
- Kai Wang
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA
| | | |
|
17
|
Samorodnitsky S, Campbell K, Little A, Ling W, Zhao N, Chen YC, Wu MC. Detecting Clinically Relevant Topological Structures in Multiplexed Spatial Proteomics Imaging Using TopKAT. bioRxiv 2024:2024.12.18.628976. [PMID: 39764056 PMCID: PMC11702633 DOI: 10.1101/2024.12.18.628976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2025]
Abstract
Novel multiplexed spatial proteomics imaging platforms expose the spatial architecture of cells in the tumor microenvironment (TME). The diverse cell population in the TME, including its spatial context, has been shown to have important clinical implications, correlating with disease prognosis and treatment response. The accelerating implementation of spatial proteomic technologies motivates new statistical models to test if cell-level images associate with patient-level endpoints. Few existing methods can robustly characterize the geometry of the spatial arrangement of cells and also yield both a valid and powerful test for association with patient-level outcomes. We propose a topology-based approach that combines persistent homology with kernel testing to determine if topological structures created by cells predict continuous, binary, or survival clinical endpoints. We term our method TopKAT (Topological Kernel Association Test) and show that it can be more powerful than statistical tests grounded in the spatial point process model, particularly when cells arise along the boundary of a ring. We demonstrate the properties of TopKAT through simulation studies and apply it to two studies of triple negative breast cancer where we show that TopKAT recovers clinically relevant topological structures in the spatial distribution of immune and tumor cells.
Affiliation(s)
- Sarah Samorodnitsky
- Public Health Sciences Division, Fred Hutchinson Cancer Center
- SWOG Statistics and Data Management Center
| | - Katie Campbell
- Medicine, Division of Hematology/Oncology, University of California Los Angeles
| | - Amarise Little
- Public Health Sciences Division, Fred Hutchinson Cancer Center
- SWOG Statistics and Data Management Center
| | - Wodan Ling
- Population Health Sciences, Weill Cornell Medical College
| | - Ni Zhao
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University
| | - Yen-Chi Chen
- Department of Statistics, University of Washington
| | - Michael C. Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Center
- SWOG Statistics and Data Management Center
| |
|
18
|
Lu Y, Pierce BL, Wang P, Yang F, Chen LS. Alternative splicing induces sample-level variation in gene-gene correlations. BMC Genomics 2024; 23:867. [PMID: 39658796 PMCID: PMC11633002 DOI: 10.1186/s12864-024-11118-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2021] [Accepted: 12/02/2024] [Indexed: 12/12/2024] [Open access]
Abstract
BACKGROUND The vast majority of genes in the genome are multi-exonic, and are alternatively spliced during transcription, resulting in multiple isoforms for each gene. For some genes, different mRNA isoforms may have differential expression levels or be involved in different pathways. Bulk tissue RNA-seq, as a widely used technology for transcriptome quantification, measures the total expression (TE) levels of each gene across multiple isoforms in multiple cell types for each tissue sample. With recent developments in precise quantification of alternative splicing events for each gene, we propose to study the effects of alternative splicing variation on gene-gene correlation effects. We adopted a variance-component model for testing the TE-TE correlations of one gene with a co-expressed gene, accounting for the effects of splicing variation and splicing-by-TE interaction of one gene on the other. RESULTS We analyzed data from the Genotype-Tissue Expression (GTEx) project (V8). At the 5% FDR level, 38,146 pairs of genes out of ∼10 M examined pairs from GTEx lung tissue showed significant TE-splicing interaction effects, implying isoform-specific and/or sample-specific TE-TE correlations. Additional analysis across 13 GTEx brain tissues revealed strong tissue-specificity of TE-splicing interaction effects. Moreover, we showed that accounting for splicing variation across samples could improve the reproducibility of results and could reduce potential confounding effects in studying co-expressed gene pairs with bulk tissue data. Many of those gene pairs had correlation effects specific to only certain isoforms and would otherwise be undetected. By analyzing gene-gene co-expression variation within functional pathways accounting for splicing, we characterized the patterns of the "hub" genes with isoform-specific regulatory effects on multiple other genes. 
CONCLUSIONS We showed that splicing variation of a gene may interact with TE of the gene and affect the TE of co-expressed genes, resulting in substantial tissue-specific inter-sample variability in gene-gene correlation effects. Accounting for TE-splicing interaction effects could reduce potential confounding effects and improve the robustness of estimation when estimating gene-gene correlations from bulk tissue expression data.
Affiliation(s)
- Yihao Lu
- Department of Public Health Sciences, University of Chicago, 5841 South Maryland Ave, MC2000, Chicago, IL, 60637, USA
| | - Brandon L Pierce
- Department of Public Health Sciences, University of Chicago, 5841 South Maryland Ave, MC2000, Chicago, IL, 60637, USA
- Department of Human Genetics, University of Chicago, 920 E 58th St, Chicago, IL, 60637, USA
| | - Pei Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 770 Lexington Ave, New York, NY, 10065, USA
| | - Fan Yang
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver, 13001 E. 17th Place, Aurora, CO, 80045, USA
| | - Lin S Chen
- Department of Public Health Sciences, University of Chicago, 5841 South Maryland Ave, MC2000, Chicago, IL, 60637, USA
| |
|
19
|
Das A, Lakhani C, Terwagne C, Lin JST, Naito T, Raj T, Knowles DA. Leveraging functional annotations to map rare variants associated with Alzheimer's disease with gruyere. medRxiv 2024:2024.12.06.24318577. [PMID: 39677477 PMCID: PMC11643288 DOI: 10.1101/2024.12.06.24318577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
The increasing availability of whole-genome sequencing (WGS) has begun to elucidate the contribution of rare variants (RVs), both coding and non-coding, to complex disease. Multiple RV association tests are available to study the relationship between genotype and phenotype, but most are restricted to per-gene models and do not fully leverage the availability of variant-level functional annotations. We propose Genome-wide Rare Variant EnRichment Evaluation (gruyere), a Bayesian probabilistic model that complements existing methods by learning global, trait-specific weights for functional annotations to improve variant prioritization. We apply gruyere to WGS data from the Alzheimer's Disease (AD) Sequencing Project, consisting of 7,966 cases and 13,412 controls, to identify AD-associated genes and annotations. Growing evidence suggests that disruption of microglial regulation is a key contributor to AD risk, yet existing methods have not had sufficient power to examine rare non-coding effects that incorporate such cell-type specific information. To address this gap, we 1) use predicted enhancer and promoter regions in microglia and other potentially relevant cell types (oligodendrocytes, astrocytes, and neurons) to define per-gene non-coding RV test sets and 2) include cell-type specific variant effect predictions (VEPs) as functional annotations. gruyere identifies 15 significant genetic associations not detected by other RV methods and finds deep learning-based VEPs for splicing, transcription factor binding, and chromatin state are highly predictive of functional non-coding RVs. Our study establishes a novel and robust framework incorporating functional annotations, coding RVs, and cell-type associated non-coding RVs, to perform genome-wide association tests, uncovering AD-relevant genes and annotations.
Affiliation(s)
- Anjali Das
- Computer Science, Columbia University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | - Tatsuhiko Naito
- New York Genome Center, New York, NY, USA
- Neuroscience, Icahn School of Medicine, Mount Sinai, New York, NY, USA
| | - Towfique Raj
- Neuroscience, Icahn School of Medicine, Mount Sinai, New York, NY, USA
| | - David A. Knowles
- Computer Science, Columbia University, New York, NY, USA
- New York Genome Center, New York, NY, USA
- Systems Biology, Columbia University, New York, NY, USA
- Data Science Institute, Columbia University, New York, NY, USA
|
20
|
Yu D, Koslovsky M, Steiner MC, Mohammadi K, Zhang C, Swartz MD. TRIO RVEMVS: A Bayesian framework for rare variant association analysis with expectation-maximization variable selection using family trio data. PLoS One 2024; 19:e0314502. [PMID: 39630689 PMCID: PMC11616829 DOI: 10.1371/journal.pone.0314502] [Received: 08/01/2024] [Accepted: 11/11/2024] [Indexed: 12/07/2024]
Abstract
It is commonly reported that rare variants may be more functionally related to complex diseases than common variants. However, individual rare variant association tests remain challenging due to low minor allele frequency in the available samples. This paper proposes an expectation maximization variable selection (EMVS) method to simultaneously detect common and rare variants at the individual-variant level using family trio data. TRIO_RVEMVS was assessed in both large (1500 families) and small (350 families) simulated datasets. The performance of TRIO_RVEMVS was compared with gene-level kernel and burden association tests that use pedigree data (PedGene) and rare-variant extensions of the transmission disequilibrium test (RV-TDT). At the region level, TRIO_RVEMVS outperformed PedGene and RV-TDT when common variants were included. TRIO_RVEMVS performed competitively with PedGene and outperformed RV-TDT when the analysis was restricted to rare variants only. At the individual-variant level, with 1,500 trios, the average true positive rate of individual rare variants that were polymorphic across 500 datasets was 12.20%, and the average false positive rate was 0.74%. In the datasets with 350 trios, the average true and false positive rates of individual rare variants were 13.10% and 1.30%, respectively. When applied to real data from the Gabriella Miller Kids First Pediatric Research Program, TRIO_RVEMVS identified 3 rare variants in q24.21 and q24.22 associated with the risk of orofacial clefts in the Kids First European population.
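The variant-level rates quoted in this abstract follow the usual definitions: the true positive rate is the share of causal variants that were selected, and the false positive rate is the share of non-causal variants that were selected. The sketch below is a minimal illustration under those definitions; the helper name `selection_rates` and the toy variant identifiers are hypothetical and are not taken from TRIO_RVEMVS.

```python
def selection_rates(selected, causal, all_variants):
    """Return (true positive rate, false positive rate) for one dataset.

    All three arguments are collections of variant identifiers
    (hypothetical toy labels here, not real variants).
    """
    selected, causal, all_variants = set(selected), set(causal), set(all_variants)
    non_causal = all_variants - causal
    # Share of causal variants that the method selected.
    tpr = len(selected & causal) / len(causal) if causal else float("nan")
    # Share of non-causal variants that the method selected.
    fpr = len(selected & non_causal) / len(non_causal) if non_causal else float("nan")
    return tpr, fpr


# Toy example: 10 polymorphic variants, 4 of them causal, 2 selected.
variants = [f"v{i}" for i in range(10)]
causal = ["v0", "v1", "v2", "v3"]
selected = ["v0", "v5"]
tpr, fpr = selection_rates(selected, causal, variants)
print(tpr, fpr)
```

Averaging these two numbers over many simulated datasets would give summary rates of the kind reported above.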
Affiliation(s)
- Duo Yu
- Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
| | - Matthew Koslovsky
- Department of Statistics, Colorado State University, Fort Collins, Colorado, United States of America
| | - Margaret C. Steiner
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| | - Kusha Mohammadi
- Department of Biostatistics and Data Management, Regeneron Pharmaceuticals, Inc., Tarrytown, New York, United States of America
| | - Chenguang Zhang
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, Pennsylvania, United States of America
| | - Michael D. Swartz
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
|
21
|
Bass AJ, Cutler DJ, Epstein MP. A powerful framework for differential co-expression analysis of general risk factors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.29.626006. [PMID: 39677786 PMCID: PMC11642831 DOI: 10.1101/2024.11.29.626006] [Indexed: 12/17/2024]
Abstract
Differential co-expression analysis (DCA) aims to identify genes in a pathway whose shared expression depends on a risk factor. While DCA provides insights into the biological activity of diseases, existing methods are limited to categorical risk factors and/or suffer from bias due to batch and variance-specific effects. We propose a new framework, Kernel-based Differential Co-expression Analysis (KDCA), that harnesses correlation patterns between genes in a pathway to detect differential co-expression arising from general (i.e., continuous, discrete, or categorical) risk factors. Using various simulated pathway architectures, we find that KDCA accounts for common sources of bias to control the type I error rate while substantially increasing the power compared to the standard eigengene approach. We then applied KDCA to The Cancer Genome Atlas thyroid data set and found several differentially co-expressed pathways by age of diagnosis and BRAF mutation status that were undetected by the eigengene method. Collectively, our results demonstrate that KDCA is a powerful testing framework that expands DCA applications in expression studies.
Affiliation(s)
- Andrew J. Bass
- Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
| | - David J. Cutler
- Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
|
22
|
An M, Chen C, Xiang J, Li Y, Qiu P, Tang Y, Liu X, Gu Y, Qin N, He Y, Zhu M, Jiang Y, Dai J, Jin G, Ma H, Wang C, Hu Z, Shen H. Systematic identification of pathogenic variants of non-small cell lung cancer in the promoters of DNA-damage repair genes. EBioMedicine 2024; 110:105480. [PMID: 39631147 DOI: 10.1016/j.ebiom.2024.105480] [Received: 06/04/2024] [Revised: 11/11/2024] [Accepted: 11/14/2024] [Indexed: 12/07/2024]
Abstract
BACKGROUND Deficiency in DNA-damage repair (DDR) genes, often due to disruptive coding variants, is linked to higher cancer risk. Our previous study revealed an association between rare loss-of-function variants in DDR genes and the risk of lung cancer. However, it is still challenging to study the predisposing role of rare regulatory variants of these genes. METHODS Based on whole-genome sequencing data from 2984 patients with non-small cell lung cancer (NSCLC) and 3020 controls, we performed massively parallel reporter assays on 1818 rare variants located in the promoters of DDR genes. Pathway- or gene-level burden analyses were performed using Firth's logistic regression or a generalized linear model. FINDINGS We identified 750 rare functional regulatory variants (frVars) that showed allelic differences in transcriptional activity within the promoter regions of DDR genes. Interestingly, the burden of frVars was significantly elevated in cases (odds ratio [OR] = 1.17, p = 0.026), whereas the burden of variants prioritized solely based on bioinformatics annotation was comparable between cases and controls (OR = 1.04, p = 0.549). Among the frVars, 297 down-regulated transcriptional activity (dr-frVars) and 453 up-regulated it (ur-frVars); notably, dr-frVars (OR = 1.30, p = 0.008) rather than ur-frVars (OR = 1.06, p = 0.495) were significantly associated with risk of NSCLC. Individuals with NSCLC carried more dr-frVars from the Fanconi anemia, homologous recombination, and nucleotide excision repair pathways. In addition, we identified seven genes (i.e., BRCA2, GTF2H1, DDB2, BLM, ALKBH2, APEX1, and RAD51B) with promoter dr-frVars that were associated with lung cancer susceptibility. INTERPRETATION Our findings indicate that functional promoter variants in DDR genes, in addition to protein-truncating variants, can be pathogenic and contribute to lung cancer susceptibility.
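To make the reported odds ratios easier to read, the sketch below computes an unadjusted carrier odds ratio with a Wald confidence interval from a 2x2 table. This is a deliberately simpler technique than the Firth's logistic regression and generalized linear models named in the abstract, and the carrier counts are hypothetical; only the case and control totals (2984 and 3020) come from the text.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.

    a: case carriers      b: case non-carriers
    c: control carriers   d: control non-carriers
    All four cells must be positive for the log-scale standard error.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Hypothetical carrier counts; totals match the abstract's sample sizes.
n_cases, n_controls = 2984, 3020
case_carriers, control_carriers = 300, 260  # assumed for illustration only
or_, lo, hi = odds_ratio_wald(
    case_carriers, n_cases - case_carriers,
    control_carriers, n_controls - control_carriers,
)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1 corresponds to a nominally significant burden difference. A regression-based analysis would additionally adjust for covariates, and Firth's penalty helps when carrier counts are small.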
FUNDING National Natural Science Foundation of China, Youth Foundation of Jiangsu Province, Research Unit of Prospective Cohort of Cardiovascular Diseases and Cancer of Chinese Academy of Medical Sciences, and Natural Science Foundation of Jiangsu Province.
Affiliation(s)
- Mingxing An
- Department of Epidemiology, School of Public Health, Southeast University, Nanjing, Jiangsu, China; Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Congcong Chen
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; The Second People's Hospital of Changzhou, The Third Affiliated Hospital of Nanjing Medical University, Changzhou Medical Center, Nanjing Medical University, Changzhou 213003, China
| | - Jun Xiang
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Yang Li
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Pinyu Qiu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Yiru Tang
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Xinyue Liu
- State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Yayun Gu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Na Qin
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Yuanlin He
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Meng Zhu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Yue Jiang
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Juncheng Dai
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Guangfu Jin
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Hongxia Ma
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Research Units of Cohort Study on Cardiovascular Diseases and Cancers, Chinese Academy of Medical Sciences, Beijing 100730, China
| | - Cheng Wang
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China; The Second People's Hospital of Changzhou, The Third Affiliated Hospital of Nanjing Medical University, Changzhou Medical Center, Nanjing Medical University, Changzhou 213003, China.
| | - Zhibin Hu
- Department of Epidemiology, School of Public Health, Southeast University, Nanjing, Jiangsu, China; Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China.
| | - Hongbing Shen
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Research Units of Cohort Study on Cardiovascular Diseases and Cancers, Chinese Academy of Medical Sciences, Beijing 100730, China.
|
23
|
Allaire P, Mayer J, Moat L, Gabor R, Shay JW, He J, Zeng C, Bastarache L, Hebbring S. Long-telomeropathy is associated with tumor predisposition syndrome. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.11.26.24318007. [PMID: 39649603 PMCID: PMC11623752 DOI: 10.1101/2024.11.26.24318007] [Indexed: 12/11/2024]
Abstract
Telomeres protect chromosomal integrity, and telomere length (TL) is influenced by environmental and genetic factors. While short telomeres are linked to rare telomeropathies, this study explored the hypothesis that a "long-telomeropathy" is associated with a cancer-predisposing syndrome. Using genomic and health data from 113,861 individuals, a trans-ancestry polygenic risk score for TL (PRS_TL) was developed. A phenome-wide association study (PheWAS) identified 65 tumor traits linked to elevated PRS_TL. Using this result, a trans-ancestry phenotype risk score for long TL (PheRS_LTL) was developed and validated. Rare variant analyses revealed 13 genes associated with PheRS_LTL. Carriers of these rare variants had a predisposition for long TL, validating the original hypothesis. Most of these genes were new to both cancer and telomere biology. In conclusion, this study identified a novel tumor-predisposing syndrome shaped by both common and rare genetic variants, broadening the understanding of telomeropathies to those with a predisposition for long telomeres.
|
24
|
He M, Zhao N. A Mixed Effect Similarity Matrix Regression Model (SMRmix) for Integrating Multiple Microbiome Datasets at Community Level. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.10.584315. [PMID: 38559012 PMCID: PMC10979838 DOI: 10.1101/2024.03.10.584315] [Indexed: 04/04/2024]
Abstract
BACKGROUND Recent studies have highlighted the importance of human microbiota in health and disease. However, in many areas of research, individual microbiome studies often offer inconsistent results due to limited sample sizes and heterogeneity in study populations and experimental procedures. This inconsistency underscores the necessity for integrative analysis of multiple microbiome datasets. Despite this critical need, statistical methods that incorporate multiple microbiome datasets and account for study heterogeneity are not available in the literature. METHODS In this paper, we develop a mixed effect similarity matrix regression (SMRmix) approach for identifying community-level microbiome shifts between outcomes. SMRmix has a close connection with the microbiome kernel association test, one of the most popular approaches for such a task, which is applicable only to a single study. SMRmix enables researchers to consolidate findings from diverse microbiome studies. RESULTS Via extensive simulations, we show that SMRmix has well-controlled type I error and higher power than some potential competitors. We applied SMRmix to two real-world datasets. The first, from the HIV-reanalysis consortium, integrated data from 17 studies on gut dysbiosis in HIV. Our analysis confirmed consistent associations between the gut microbiome and HIV infection as well as MSM (men who have sex with men) status, demonstrating greater power than competing methods. The second dataset involved 11 studies on the gut microbiome in colorectal cancer; analysis with SMRmix confirmed significant dysbiosis in affected individuals compared to healthy controls. CONCLUSION SMRmix enables the integration of multiple studies while effectively managing study heterogeneity, and provides a powerful tool for uncovering consistent associations between diseases and community-level microbiome data.
|
25
|
Qin F, Luo X, Lu Q, Cai B, Xiao F, Cai G. Spatial pattern and differential expression analysis with spatial transcriptomic data. Nucleic Acids Res 2024; 52:e101. [PMID: 39470725 PMCID: PMC11602167 DOI: 10.1093/nar/gkae962] [Received: 02/04/2024] [Revised: 10/03/2024] [Accepted: 10/11/2024] [Indexed: 10/30/2024]
Abstract
The emergence of spatial transcriptomic technologies has opened new avenues for investigating gene activities while preserving the spatial context of tissues. Utilizing data generated by such technologies, the identification of spatially variable (SV) genes is an essential step in exploring tissue landscapes and biological processes. Particularly in typical experimental designs, such as case-control or longitudinal studies, identifying SV genes between groups is crucial for discovering significant biomarkers or developing targeted therapies for diseases. However, current methods available for analyzing spatial transcriptomic data are still in their infancy, and none of the existing methods are capable of identifying SV genes between groups. To overcome this challenge, we developed SPADE for spatial pattern and differential expression analysis to identify SV genes in spatial transcriptomic data. SPADE is based on a machine learning model of Gaussian process regression with a gene-specific Gaussian kernel, enabling the detection of SV genes both within and between groups. Through benchmarking against existing methods in extensive simulations and real data analyses, we demonstrated the preferred performance of SPADE in detecting SV genes within and between groups. The SPADE source code and documentation are publicly available at https://github.com/thecailab/SPADE.
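As a rough illustration of what a Gaussian kernel over spot coordinates looks like, the sketch below builds the kernel matrix K[i][j] = exp(-||s_i - s_j||^2 / (2 * l^2)). The parameterization, the function name `gaussian_kernel`, and the length-scale value are assumptions made for this example; SPADE's actual kernel form and its gene-specific tuning are documented in the linked repository and may differ.

```python
import math


def gaussian_kernel(coords, length_scale):
    """Gaussian (squared-exponential) kernel matrix for spatial coordinates.

    coords: list of (x, y) spot locations
    length_scale: assumed smoothness parameter; larger values make
        distant spots more strongly correlated
    """
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between spots i and j.
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            K[i][j] = math.exp(-d2 / (2 * length_scale ** 2))
    return K


# Three hypothetical spots on a line; nearby spots get higher similarity.
spots = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
K = gaussian_kernel(spots, length_scale=1.0)
for row in K:
    print([round(v, 4) for v in row])
```

In a Gaussian process regression, a matrix like this serves as the covariance of expression across spots, so a gene with a genuine spatial pattern is better explained with a fitted length scale than by independent noise.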
Affiliation(s)
- Fei Qin
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, 921 Assembly Street, Columbia, SC, 29208, USA
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850, USA
| | - Xizhi Luo
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, 921 Assembly Street, Columbia, SC, 29208, USA
- Data and Statistical Sciences, AbbVie Inc., 1 N. Waukegan Road, North Chicago, IL, 60064, USA
| | - Qing Lu
- Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, 2004 Mowry Rd., Gainesville, FL, 32608, USA
| | - Bo Cai
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, 921 Assembly Street, Columbia, SC, 29208, USA
| | - Feifei Xiao
- Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, 2004 Mowry Rd., Gainesville, FL, 32608, USA
| | - Guoshuai Cai
- Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, 2004 Mowry Rd., Gainesville, FL, 32608, USA
- Department of Surgery, College of Medicine, University of Florida, 1600 SW Archer Rd., Gainesville, FL, 32610, USA
|
26
|
Sun WX, Chang XY, Chen Y, Zhao Q, Zhang YM. The integration of quantile regression with 3VmrMLM identifies more QTNs and QTN-by-environment interactions using SNP- and haplotype-based markers. PLANT COMMUNICATIONS 2024:101196. [PMID: 39580620 DOI: 10.1016/j.xplc.2024.101196] [Received: 07/16/2024] [Revised: 10/11/2024] [Accepted: 11/20/2024] [Indexed: 11/26/2024]
Abstract
Current methods used in genome-wide association studies frequently lack power owing to their inability to detect heterogeneous associations and rare and multiallelic variants. To address these issues, we integrated quantile regression with a three (compressed) variance component multi-locus random-SNP-effect mixed linear model (3VmrMLM) to propose q3VmrMLM for detecting heterogeneous quantitative trait nucleotides (QTNs) and QTN-by-environment interactions (QEIs), and then designed haplotype-based q3VmrMLM (q3VmrMLM-Hap) for identifying multiallelic haplotypes and rare variants. In Monte Carlo simulation studies, q3VmrMLM had higher power than 3VmrMLM, sequence kernel association test (SKAT), and integrated quantile rank test (iQRAT). In a re-analysis of 10 traits in 1439 rice hybrids, 261 known genes were identified only by q3VmrMLM and q3VmrMLM-Hap, whereas 175 known genes were detected by both the new and existing methods. Of all the significant QTNs with known genes, q3VmrMLM (179: 140 variance heterogeneity and 157 quantile effect heterogeneity) found more heterogeneous QTNs than 3VmrMLM (123), SKAT (27), and iQRAT (29); q3VmrMLM-Hap (121) mapped more low-frequency (<0.05) QTNs than q3VmrMLM (51), 3VmrMLM (43), SKAT (11), and iQRAT (12); and q3VmrMLM-Hap (12), q3VmrMLM (16), and 3VmrMLM (12) had similar power in identifying gene-by-environment interactions. All significant and suggested QTNs achieved the highest predictive accuracy (r = 0.9045). In conclusion, this study describes a new and complementary approach to mining genes and unraveling the genetic architecture of complex traits in crops.
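Quantile regression, the ingredient this abstract adds to 3VmrMLM, replaces the squared-error objective with the check (pinball) loss, which weights positive and negative residuals asymmetrically according to the target quantile tau. The sketch below shows only that standard loss; the function name is hypothetical, and how q3VmrMLM embeds the loss in its mixed model is not described in the abstract and is not reproduced here.

```python
def pinball_loss(residuals, tau):
    """Mean check loss rho_tau(r) = r * (tau - 1[r < 0]) over residuals.

    residuals: observed minus predicted values
    tau: target quantile in (0, 1); tau = 0.5 gives half the mean
        absolute error, i.e. median regression
    """
    total = 0.0
    for r in residuals:
        indicator = 1.0 if r < 0 else 0.0
        total += r * (tau - indicator)
    return total / len(residuals)


# Under-prediction (positive residual) is penalized more at a high quantile.
print(pinball_loss([2.0], tau=0.9))   # weight tau = 0.9 on positive residuals
print(pinball_loss([-2.0], tau=0.9))  # weight 1 - tau = 0.1 on negative residuals
```

Fitting the same marker at several values of tau is what allows effects that differ across the trait distribution to be detected, which a mean-based model would average away.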
Affiliation(s)
- Wen-Xian Sun
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Xiao-Yu Chang
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Ying Chen
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Qiong Zhao
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Yuan-Ming Zhang
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China.
|
27
|
Ghose U, Sproviero W, Winchester L, Amin N, Zhu T, Newby D, Ulm BS, Papathanasiou A, Shi L, Liu Q, Fernandes M, Adams C, Albukhari A, Almansouri M, Choudhry H, van Duijn C, Nevado-Holgado A. Genome-wide association neural networks identify genes linked to family history of Alzheimer's disease. Brief Bioinform 2024; 26:bbae704. [PMID: 39775791 PMCID: PMC11707606 DOI: 10.1093/bib/bbae704] [Received: 09/23/2024] [Revised: 11/29/2024] [Accepted: 12/23/2024] [Indexed: 01/11/2025]
Abstract
Augmenting traditional genome-wide association studies (GWAS) with advanced machine learning algorithms can allow the detection of novel signals in available cohorts. We introduce "genome-wide association neural networks (GWANN)", a novel approach that uses neural networks (NNs) to perform a gene-level association study with family history of Alzheimer's disease (AD). In UK Biobank, we defined cases (n = 42 110) as those with AD or family history of AD and sampled an equal number of controls. The data were split into training and testing samples in an 80:20 ratio; GWANN was trained on the former, and associated genes were identified from its performance on the latter. Our method identified 18 genes associated with family history of AD. APOE, BIN1, SORL1, ADAM10, APH1B, and SPI1 have been identified by previous AD GWAS. Among the 12 new genes, PCDH9, NRG3, ROR1, LINGO2, SMYD3, and LRRC7 have been associated with neurofibrillary tangles or phosphorylated tau in previous studies. Furthermore, there is evidence for differential transcriptomic or proteomic expression between AD and healthy brains for 10 of the 12 new genes. A series of post hoc analyses resulted in a significantly enriched protein-protein interaction network (P-value < 1 × 10⁻¹⁶), and enrichment of relevant disease and biological pathways such as focal adhesion (P-value = 1 × 10⁻⁴), extracellular matrix organization (P-value = 1 × 10⁻⁴), Hippo signaling (P-value = 7 × 10⁻⁴), Alzheimer's disease (P-value = 3 × 10⁻⁴), and impaired cognition (P-value = 4 × 10⁻³). Applying NNs for GWAS illustrates their potential to complement existing algorithms and methods and enable the discovery of new associations without the need to expand existing cohorts.
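The sampling design described here (an equal number of controls, then an 80:20 train/test split) can be sketched with the standard library alone. The function name, the seed, and the toy identifiers below are hypothetical, and the published pipeline may stratify or group samples differently; this only illustrates the two steps the abstract states.

```python
import random


def balanced_split(case_ids, control_pool, train_frac=0.8, seed=0):
    """Sample as many controls as cases, then split into train and test.

    case_ids: list of case identifiers
    control_pool: list of candidate control identifiers (at least as
        many as cases)
    Returns (train, test), each a list of (sample_id, label) pairs with
    label 1 for cases and 0 for controls.
    """
    rng = random.Random(seed)
    controls = rng.sample(control_pool, len(case_ids))  # equal-sized control set
    samples = [(i, 1) for i in case_ids] + [(i, 0) for i in controls]
    rng.shuffle(samples)  # mix cases and controls before cutting
    cut = int(train_frac * len(samples))
    return samples[:cut], samples[cut:]


# Toy example: 100 cases and a pool of 1000 potential controls.
cases = list(range(100))
pool = list(range(1000, 2000))
train, test = balanced_split(cases, pool)
print(len(train), len(test))
```

Holding out the test portion matters in this design because gene-level association is judged from held-out performance rather than from the fit on the training data.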
Affiliation(s)
- Upamanyu Ghose
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
| | - William Sproviero
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
| | - Laura Winchester
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
| | - Najaf Amin
- Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
| | - Taiyu Zhu
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
| | - Danielle Newby
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Centre for Statistics in Medicine, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, United Kingdom
| | - Brittany S Ulm
- King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
- Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
| | - Liu Shi
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Department of Translational Medicine, Nxera Pharma UK Limited, Cambridge, United Kingdom
| | - Qiang Liu
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- School of Engineering Mathematics and Technology University of Bristol, Ada Lovelace Building, Bristol, United Kingdom
| | - Marco Fernandes
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- School of Medicine, University of St Andrews, St Andrews, United Kingdom
| | - Cassandra Adams
- King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
- Centre for Medicines Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Ashwag Albukhari
- King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
- Biochemistry Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Majid Almansouri
- King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
- Clinical Biochemistry Department, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Hani Choudhry
- King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
- Biochemistry Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Cornelia van Duijn
- King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
- Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
| | - Alejo Nevado-Holgado
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia
|
28
|
Tanigawa Y, Kellis M. Hypometric genetics: Improved power in genetic discovery by incorporating quality control flags. Am J Hum Genet 2024; 111:2478-2493. [PMID: 39442521 PMCID: PMC11568753 DOI: 10.1016/j.ajhg.2024.09.008] [Received: 05/02/2024] [Revised: 09/26/2024] [Accepted: 09/27/2024] [Indexed: 10/25/2024]
Abstract
Balancing the tradeoff between quantity and quality of phenotypic data is critical in omics studies. Measurements below the limit of quantification (BLQ) are often tagged in quality control fields, but these flags are currently underutilized in human genetics studies. Extreme phenotype sampling is advantageous for mapping rare variant effects. We hypothesize that genetic drivers, along with environmental and technical factors, contribute to the presence of BLQ flags. Here, we introduce "hypometric genetics" (hMG) analysis and uncover a genetic basis for BLQ flags, indicating an additional source of genetic signal for genetic discovery, especially from phenotypic extremes. Applying our hMG approach to n = 227,469 UK Biobank individuals with metabolomic profiles, we reveal more than 5% heritability for BLQ flags and report biologically relevant associations, for example, at APOC3, APOA5, and PDE3B loci. For common variants, polygenic scores trained only for BLQ flags predict the corresponding quantitative traits with 91% accuracy, validating the genetic basis. For rare coding variant associations, we find an asymmetric 65.4% higher enrichment of metabolite-lowering associations for BLQ flags, highlighting the impact of putative loss-of-function variants with large effects on phenotypic extremes. Joint analysis of binarized BLQ flags and the corresponding quantitative metabolite measurements improves power in Bayesian rare variant aggregation tests, resulting in an average of 181% more prioritized genes. Our approach is broadly applicable to omics profiling. Overall, our results underscore the benefit of integrating quality control flags and quantitative measurements and highlight the advantage of joint analysis of population-based samples and phenotypic extremes in human genetics studies.
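The central data step in this abstract, turning a quality control field into a binary below-limit-of-quantification phenotype that can be analyzed alongside the quantitative measurement, can be sketched as follows. The flag text, the function names, and the toy records are assumptions made for illustration; the actual encoding of the flags is not given in the abstract.

```python
# Assumed flag text for illustration; the real QC field values may differ.
BLQ_LABEL = "Below limit of quantification"


def binarize_blq(qc_flags):
    """Map each QC flag to 1 (BLQ) or 0 (anything else, including None)."""
    return [1 if flag == BLQ_LABEL else 0 for flag in qc_flags]


def paired_phenotypes(values, qc_flags):
    """Pair each quantitative value with its binary BLQ indicator.

    values: metabolite measurements (None when no value was reported)
    qc_flags: QC flag per sample, aligned with values
    """
    return list(zip(values, binarize_blq(qc_flags)))


# Toy records: the second sample is flagged BLQ and has no reported value.
values = [1.32, None, 0.87]
flags = [None, BLQ_LABEL, "Other QC note"]
print(paired_phenotypes(values, flags))
```

Keeping the binary indicator instead of discarding flagged samples is what lets the phenotypic extreme contribute signal to a joint analysis.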
Affiliation(s)
- Yosuke Tanigawa
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
29
|
Li MD, Liu Q, Shi X, Wang Y, Zhu Z, Guan Y, He J, Han H, Mao Y, Ma Y, Yuan W, Yao J, Yang Z. Integrative analysis of genetics, epigenetics and RNA expression data reveal three susceptibility loci for smoking behavior in Chinese Han population. Mol Psychiatry 2024; 29:3516-3526. [PMID: 38789676 DOI: 10.1038/s41380-024-02599-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 04/18/2024] [Accepted: 05/03/2024] [Indexed: 05/26/2024]
Abstract
Although numerous studies demonstrate that genetic and epigenetic factors play important roles in smoking behavior, their functional relevance and coordinated regulation remain largely unknown. Here we present a multiomics study on smoking behavior in a Chinese smoker population with the goal of not only identifying smoking-associated functional variants but also deciphering the pathogenesis and mechanism underlying smoking behavior in this under-studied ethnic population. After whole-genome sequencing analysis of 1329 Chinese Han male samples in the discovery phase and OpenArray analysis of 3744 samples in the replication phase, we discovered that three novel variants, located near FOXP1 (rs7635815), between DGCR6 and PRODH (rs796774020), and in ARVCF (rs148582811), were significantly associated with smoking behavior. Subsequent cis-mQTL and cis-eQTL analysis indicated that these variants correlated significantly with the differentially methylated regions (DMRs) or differentially expressed genes (DEGs) located in the regions where these variants are present. Finally, our in silico multiomics analysis revealed several hub genes, such as DRD2, PTPRD, FOXP1, COMT, and CTNNAP2, that synergistically regulate each other in the etiology of smoking.
Affiliation(s)
- Ming D Li
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.
- Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China.
| | - Qiang Liu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Xiaoqiang Shi
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Yan Wang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Zhouhai Zhu
- Joint Institute of Tobacco and Health, Kunming, Yunnan, China
| | - Ying Guan
- Joint Institute of Tobacco and Health, Kunming, Yunnan, China
| | - Jingmin He
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- College of Biological Sciences, Shanxi Agricultural University, Taigu, Shanxi, China
| | - Haijun Han
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Ying Mao
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Yunlong Ma
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Wenji Yuan
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Jianhua Yao
- Joint Institute of Tobacco and Health, Kunming, Yunnan, China
| | - Zhongli Yang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.
|
30
|
Camino-Mera A, Pardo-Seco J, Bello X, Argiz L, Boyle RJ, Custovic A, Herberg J, Kaforou M, Arasi S, Fiocchi A, Pecora V, Barni S, Mori F, Bracamonte T, Echeverria L, O'Valle-Aísa V, Hernández-Martínez NL, Carballeira I, García E, Garcia-Magan C, Moure-González JD, Gonzalez-Delgado P, Garriga-Baraut T, Infante S, Zambrano-Ibarra G, Tomás-Pérez M, Machinena A, Pascal M, Prieto A, Vázquez-Cortes S, Fernández-Rivas M, Vila L, Alsina L, Torres MJ, Mangone G, Quirce S, Martinón-Torres F, Vázquez-Ortiz M, Gómez-Carballa A, Salas A. Whole Exome Sequencing Identifies Epithelial and Immune Dysfunction-Related Biomarkers in Food Protein-Induced Enterocolitis Syndrome. Clin Exp Allergy 2024; 54:919-929. [PMID: 39348862 DOI: 10.1111/cea.14564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Accepted: 09/01/2024] [Indexed: 10/02/2024]
Abstract
BACKGROUND Food protein-induced enterocolitis syndrome (FPIES) is a food allergy primarily affecting infants, often leading to vomiting and shock. Due to its poorly understood pathophysiology and lack of specific biomarkers, diagnosis is frequently delayed. Understanding FPIES genetics can shed light on disease susceptibility and pathophysiology, which is key to developing diagnostic, prognostic, preventive and therapeutic strategies. Using a well-characterised cohort of patients we explored the potential genome-wide susceptibility factors underlying FPIES. METHODS Blood samples from 41 patients with oral food challenge-proven FPIES were collected for a comprehensive whole exome sequencing association study. RESULTS Notable genetic variants, including rs872786 (RBM8A), rs2241880 (ATG16L1) and rs2289477 (ATG16L1), were identified as significant findings in FPIES. A weighted SKAT model identified six other associated genes including DGKZ and SIRPA. DGKZ induces TGF-β signalling, crucial for epithelial barrier integrity and IgA production; RBM8A is associated with thrombocytopenia absent radius syndrome, frequently associated with cow's milk allergy; SIRPA is associated with increased neutrophils/monocytes in inflamed tissues as often observed in FPIES; ATG16L1 is associated with inflammatory bowel disease. Coexpression correlation analysis revealed a functional correlation between RBM8A and filaggrin gene (FLG) in stomach and intestine tissue, with filaggrin being a known key pathogenic and risk factor for IgE-mediated food allergy. A transcriptome-wide association study suggested genetic variability in patients impacted gene expression of RBM8A (stomach and pancreas) and ATG16L1 (transverse colon). CONCLUSIONS This study represents the first case-control exome association study of FPIES patients and marks a crucial step towards unravelling genetic susceptibility factors underpinning the syndrome. Our findings highlight potential factors and pathways contributing to FPIES, including epithelial barrier dysfunction and immune dysregulation. While these results are novel, they are preliminary and need further validation in a second cohort of patients.
Affiliation(s)
- Alba Camino-Mera
- Genetics, Vaccines and Infections Research Group (GenViP), Instituto de Investigación Sanitaria de Santiago, Universidade de Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, Spain
| | - Jacobo Pardo-Seco
- Genetics, Vaccines and Infections Research Group (GenViP), Instituto de Investigación Sanitaria de Santiago, Universidade de Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, Spain
| | - Xabier Bello
- Genetics, Vaccines and Infections Research Group (GenViP), Instituto de Investigación Sanitaria de Santiago, Universidade de Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, Spain
| | - Laura Argiz
- Allergy Section, Clinica Universidad de Navarra, Madrid, Spain
| | - Robert J Boyle
- Section of Inflammation, Repair and Development, National Heart and Lung Institute, Imperial College London, London, UK
| | - Adnan Custovic
- Section of Inflammation, Repair and Development, National Heart and Lung Institute, Imperial College London, London, UK
| | - Jethro Herberg
- Department of Infectious Disease, Imperial College London, London, UK
| | - Myrsini Kaforou
- Department of Infectious Disease, Imperial College London, London, UK
| | - Stefania Arasi
- Allergy Diseases Research Area, Pediatric Allergology Unit, Bambino Gesù Children's Hospital IRCCS, Rome, Italy
| | - Alessandro Fiocchi
- Allergy Diseases Research Area, Pediatric Allergology Unit, Bambino Gesù Children's Hospital IRCCS, Rome, Italy
| | - Valentina Pecora
- Allergy Diseases Research Area, Pediatric Allergology Unit, Bambino Gesù Children's Hospital IRCCS, Rome, Italy
| | - Simona Barni
- Allergy Unit, Meyer Children's Hospital IRCCS, Florence, Italy
| | - Francesca Mori
- Allergy Unit, Meyer Children's Hospital IRCCS, Florence, Italy
| | - Teresa Bracamonte
- Paediatric Allergy Section, Severo Ochoa University Hospital, Madrid, Spain
| | - Luis Echeverria
- Paediatric Allergy Section, Severo Ochoa University Hospital, Madrid, Spain
| | - Virginia O'Valle-Aísa
- Clinical Analysis and Clinical Biochemistry Service, Severo Ochoa University Hospital, Madrid, Spain
| | | | - Iria Carballeira
- Paediatric Allergy Section, Arquitecto Marcide Hospital, Ferrol, A Coruña in Galicia, Spain
| | - Emilio García
- Paediatric Allergy Section, Arquitecto Marcide Hospital, Ferrol, A Coruña in Galicia, Spain
| | - Carlos Garcia-Magan
- Paediatrics Department, Hospital Clínico Universitario de Santiago de Compostela, Coruña, Galicia, Spain
| | | | | | - Teresa Garriga-Baraut
- Paediatric Allergy Section, Vall D'Hebron University Hospital, Growth and Development Research Group, Vall d'Hebron Research Institute (VHIR), Barcelona, Spain
| | - Sonsoles Infante
- Pediatric Allergy Unit, Hospital General Universitario Gregorio Marañón, Gregorio Marañón Health Research Institute (IiSGM), Madrid, Spain
| | - Gabriela Zambrano-Ibarra
- Pediatric Allergy Unit, Hospital General Universitario Gregorio Marañón, Gregorio Marañón Health Research Institute (IiSGM), Madrid, Spain
| | - Margarita Tomás-Pérez
- Pediatric Allergy Unit, Hospital General Universitario Gregorio Marañón, Gregorio Marañón Health Research Institute (IiSGM), Madrid, Spain
| | - Adrianna Machinena
- Allergy and Clinical Immunology Department, Hospital Sant Joan de Déu, Barcelona, Spain
| | - Mariona Pascal
- Immunology Department, CDB, Hospital Clínic de Barcelona, Barcelona, Spain
- IDIBAPS, Universitat de Barcelona, Barcelona, Spain
| | - Ana Prieto
- Paediatric Allergy Section, General University Hospital, Malaga, Spain
| | - Sonia Vázquez-Cortes
- Allergy Department, Hospital Clinico San Carlos, Instituto de Investigación Sanitaria San Carlos (IdISSC), Madrid, Spain
| | - Montserrat Fernández-Rivas
- Allergy Department, Hospital Clinico San Carlos, Instituto de Investigación Sanitaria San Carlos (IdISSC), Universidad Complutense, Madrid, Spain
| | - Leticia Vila
- Paediatric Allergy Section, Teresa Herrera Hospital, Coruna, Spain
| | - Laia Alsina
- Clinical Immunology and Primary Immunodeficiencies Unit, Allergy and Clinical Immunology Department, Hospital Sant Joan de Déu, Institut de Recerca Sant Joan de Déu and Universitat de Barcelona, Barcelona, Spain
| | - María José Torres
- Allergy Department, General University Hospital, Málaga, Spain
- Allergy Research Group, Instituto de Investigación Biomédica de Málaga y Plataforma en Nanomedicina-IBIMA Plataforma Bionand, Málaga, Spain
- Universidad de Málaga (UMA), Málaga, Spain
- Allergy Clinical Unit, Hospital Regional Universitario de Málaga, Málaga, Spain
| | - Giusi Mangone
- Department of Health Sciences, University of Florence, Florence, Italy
| | - Santiago Quirce
- Department of Allergy, La Paz University Hospital, IdiPAZ, Madrid, Spain
| | - Federico Martinón-Torres
- Genetics, Vaccines and Infections Research Group (GenViP), Instituto de Investigación Sanitaria de Santiago, Universidade de Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, Spain
- Translational Pediatrics and Infectious Diseases, Department of Pediatrics, Hospital Clínico Universitario de Santiago de Compostela, Santiago de Compostela, Galicia, Spain
| | - Marta Vázquez-Ortiz
- Section of Inflammation, Repair and Development, National Heart and Lung Institute, Imperial College London, London, UK
| | - Alberto Gómez-Carballa
- Genetics, Vaccines and Infections Research Group (GenViP), Instituto de Investigación Sanitaria de Santiago, Universidade de Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, Spain
| | - Antonio Salas
- Genetics, Vaccines and Infections Research Group (GenViP), Instituto de Investigación Sanitaria de Santiago, Universidade de Santiago de Compostela, Santiago de Compostela, Galicia, Spain
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, Spain
|
31
|
Herrera-Luis E, Benke K, Volk H, Ladd-Acosta C, Wojcik GL. Gene-environment interactions in human health. Nat Rev Genet 2024; 25:768-784. [PMID: 38806721 DOI: 10.1038/s41576-024-00731-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/03/2024] [Indexed: 05/30/2024]
Abstract
Gene-environment interactions (G × E), the interplay of genetic variation with environmental factors, have a pivotal impact on human complex traits and diseases. Statistically, G × E can be assessed by determining the deviation from expectation of predictive models based solely on the phenotypic effects of genetics or environmental exposures. Despite the unprecedented, widespread and diverse use of G × E analytical frameworks, heterogeneity in their application and reporting hinders their applicability in public health. In this Review, we discuss study design considerations as well as G × E analytical frameworks to assess polygenic liability dependent on the environment, to identify specific genetic variants exhibiting G × E, and to characterize environmental context for these dynamics. We conclude with recommendations to address the most common challenges and pitfalls in the conceptualization, methodology and reporting of G × E studies, as well as future directions.
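As a rough illustration of the "deviation from expectation" idea in the abstract above, one common formulation fits a linear model with main effects of genotype and exposure plus a product term, and asks whether the product coefficient differs from zero. The sketch below is not taken from the Review; the function name `fit_gxe`, the simulated effect sizes, and the choice of an ordinary least-squares model are all assumptions made for illustration.

```python
import numpy as np

def fit_gxe(g, e, y):
    """Least-squares fit of y ~ 1 + G + E + G*E (illustrative model).

    Returns the coefficient vector and the t-statistic of the
    interaction term, which measures departure from additivity.
    """
    X = np.column_stack([np.ones_like(y), g, e, g * e])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the estimates
    t_int = beta[3] / np.sqrt(cov[3, 3])
    return beta, t_int

# Simulated data with hypothetical effect sizes (not from any study).
rng = np.random.default_rng(0)
n = 5000
g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype dosage 0/1/2
e = rng.normal(size=n)                          # standardized exposure
y = 0.2 * g + 0.5 * e + 0.3 * g * e + rng.normal(size=n)

beta, t_int = fit_gxe(g, e, y)
print("interaction estimate:", beta[3], "t-statistic:", t_int)
```

A large interaction t-statistic indicates that the purely additive model under-describes the phenotype; in practice the scale of the outcome and the covariate adjustment also affect how such a term should be interpreted.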
Affiliation(s)
- Esther Herrera-Luis
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Kelly Benke
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Heather Volk
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Christine Ladd-Acosta
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Genevieve L Wojcik
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
|
32
|
Boedijono FS, Bood V, Eichhorn IA, Hansbro PM, Slebos DJ, van den Berge M, Faiz A, Pouwels SD. Identification of Genetic Factors Associated With DAMP Release in COPD Patients. Arch Bronconeumol 2024; 60:714-717. [PMID: 39034199 DOI: 10.1016/j.arbres.2024.06.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 06/07/2024] [Accepted: 06/28/2024] [Indexed: 07/23/2024]
Affiliation(s)
- Fia Sabrina Boedijono
- Respiratory Bioinformatics and Molecular Biology Group, University of Technology Sydney, Australia; Centre for Inflammation, Centenary Institute and University of Technology Sydney, Faculty of Science, School of Life Sciences, Sydney, Australia
| | - Verena Bood
- Department of Pulmonary Diseases, University Medical Center Groningen, The Netherlands; Department of Pathology and Medical Biology, University Medical Center Groningen, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, Groningen, The Netherlands
| | - Ilse A Eichhorn
- Department of Pulmonary Diseases, University Medical Center Groningen, The Netherlands; Department of Pathology and Medical Biology, University Medical Center Groningen, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, Groningen, The Netherlands
| | - Philip M Hansbro
- Centre for Inflammation, Centenary Institute and University of Technology Sydney, Faculty of Science, School of Life Sciences, Sydney, Australia
| | - Dirk-Jan Slebos
- Department of Pulmonary Diseases, University Medical Center Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, Groningen, The Netherlands
| | - Maarten van den Berge
- Department of Pulmonary Diseases, University Medical Center Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, Groningen, The Netherlands
| | - Alen Faiz
- Respiratory Bioinformatics and Molecular Biology Group, University of Technology Sydney, Australia
| | - Simon D Pouwels
- Department of Pulmonary Diseases, University Medical Center Groningen, The Netherlands; Department of Pathology and Medical Biology, University Medical Center Groningen, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, Groningen, The Netherlands.
|
33
|
Scheetz TE, Tollefson MR, Roos BR, Boese EA, Pouw AE, Stone EM, Schnieders MJ, Fingert JH. METTL23 Variants and Patients With Normal-Tension Glaucoma. JAMA Ophthalmol 2024; 142:1037-1045. [PMID: 39325437 PMCID: PMC11428026 DOI: 10.1001/jamaophthalmol.2024.3829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 07/27/2024] [Indexed: 09/27/2024]
Abstract
Importance This research confirms and further establishes that pathogenic variants in a fourth gene, METTL23, are associated with autosomal dominant normal-tension glaucoma (NTG). Objective To determine the frequency of glaucoma-causing pathogenic variants in the METTL23 gene in a cohort of patients with NTG from Iowa. Design, Setting, and Participants This case-control study took place at a single tertiary care center in Iowa from January 1997 to January 2024, with analysis occurring between January 2023 and January 2024. Two groups of participants were enrolled from the University of Iowa clinics: 331 patients with NTG and 362 control individuals without glaucoma. Patients with a history of trauma; steroid use; stigmata of pigment dispersion syndrome; exfoliation syndrome; or pathogenic variants in MYOC, TBK1, or OPTN were also excluded. Main Outcomes and Measures Detection of an enrichment of METTL23 pathogenic variants in individuals with NTG compared with control individuals without glaucoma. Results The study included 331 patients with NTG (mean [SD] age, 68.0 [11.7] years; 228 [68.9%] female and 103 [31.1%] male) and 362 control individuals without glaucoma (mean [SD] age, 64.5 [12.6] years; 207 [57.2%] female and 155 [42.8%] male). There were 5 detected instances of 4 unique METTL23 pathogenic variants in patients with NTG. Three METTL23 variants (p.Ala7Val, p.Pro22Arg, and p.Arg63Trp) were judged to be likely pathogenic and were detected in 3 patients (0.91%) with NTG. However, when all detected variants were evaluated with either mutation burden analysis or logistic regression, their frequency was not statistically higher in individuals with NTG than in control individuals without glaucoma (1.5% vs 2.5%; P = .27). Conclusion and Relevance This investigation provides evidence that pathogenic variants in METTL23 are associated with NTG. Within an NTG cohort at a tertiary care center, pathogenic variants were associated with approximately 1% of NTG cases, a frequency similar to that of other known normal-tension glaucoma genes, including optineurin (OPTN), TANK-binding kinase 1 (TBK1), and myocilin (MYOC). The findings suggest that METTL23 pathogenic variants are likely involved in a biologic pathway that is associated with glaucoma that occurs at lower intraocular pressures.
Affiliation(s)
- Todd E. Scheetz
- Institute for Vision Research, University of Iowa, Iowa City
- Department of Ophthalmology and Visual Sciences, Carver College of Medicine, University of Iowa, Iowa City
| | - Mallory R. Tollefson
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City
| | - Ben R. Roos
- Institute for Vision Research, University of Iowa, Iowa City
- Department of Ophthalmology and Visual Sciences, Carver College of Medicine, University of Iowa, Iowa City
| | - Erin A. Boese
- Institute for Vision Research, University of Iowa, Iowa City
- Department of Ophthalmology and Visual Sciences, Carver College of Medicine, University of Iowa, Iowa City
| | - Andrew E. Pouw
- Institute for Vision Research, University of Iowa, Iowa City
- Department of Ophthalmology and Visual Sciences, Carver College of Medicine, University of Iowa, Iowa City
| | - Edwin M. Stone
- Institute for Vision Research, University of Iowa, Iowa City
- Department of Ophthalmology and Visual Sciences, Carver College of Medicine, University of Iowa, Iowa City
| | - Michael J. Schnieders
- Institute for Vision Research, University of Iowa, Iowa City
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City
| | - John H. Fingert
- Institute for Vision Research, University of Iowa, Iowa City
- Department of Ophthalmology and Visual Sciences, Carver College of Medicine, University of Iowa, Iowa City
|
34
|
Sun X, Bulekova K, Yang J, Lai M, Pitsillides AN, Liu X, Zhang Y, Guo X, Yong Q, Raffield LM, Rotter JI, Rich SS, Abecasis G, Carson AP, Vasan RS, Bis JC, Psaty BM, Boerwinkle E, Fitzpatrick AL, Satizabal CL, Arking DE, Ding J, Levy D, Liu C. Association analysis of mitochondrial DNA heteroplasmic variants: Methods and application. Mitochondrion 2024; 79:101954. [PMID: 39245194 PMCID: PMC11568909 DOI: 10.1016/j.mito.2024.101954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 08/26/2024] [Accepted: 08/31/2024] [Indexed: 09/10/2024]
Abstract
We rigorously assessed a comprehensive association testing framework for heteroplasmy, employing both simulated and real-world data. This framework employed a variant allele fraction (VAF) threshold and harnessed multiple gene-based tests for robust identification and association testing of heteroplasmy. Our simulation studies demonstrated that gene-based tests maintained an appropriate type I error rate at α = 0.001. Notably, when 5% or more heteroplasmic variants within a target region were linked to an outcome, burden-extension tests (including the adaptive burden test, variable threshold burden test, and z-score weighting burden test) outperformed the sequence kernel association test (SKAT) and the original burden test. Applying this framework, we conducted association analyses on whole-blood derived heteroplasmy in 17,507 individuals of African and European ancestries (31% of African ancestry, mean age of 62, with 58% women) with whole genome sequencing data. We performed both cohort- and ancestry-specific association analyses, followed by meta-analysis on both pooled samples and within each ancestry group. Our results suggest that mtDNA-encoded genes/regions are likely to exhibit varying rates in somatic aging, with notably strong associations observed between heteroplasmy in the RNR1 and RNR2 genes (p < 0.001) and advanced aging by the original burden test. In contrast, SKAT identified significant associations (p < 0.001) between diabetes and the aggregated effects of heteroplasmy in several protein-coding genes. Further research is warranted to validate these findings. In summary, our proposed statistical framework represents a valuable tool for facilitating association testing of heteroplasmy with disease traits in large human populations.
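To make the combination of a VAF threshold and a burden-style gene score more concrete, the following minimal sketch counts, for each individual, the sites in a region whose variant allele fraction falls inside a heteroplasmic window. The abstract does not state the thresholds or the exact scoring used, so the 3% and 97% cut-offs, the function name `burden_scores`, and the toy matrix are assumptions for illustration only.

```python
import numpy as np

def burden_scores(vaf, low=0.03, high=0.97):
    """Per-individual count of heteroplasmic sites in one region.

    vaf: 2-D array, rows = individuals, columns = sites in the region.
    low/high: hypothetical VAF cut-offs; a site counts as heteroplasmic
    when low <= VAF <= high.
    """
    vaf = np.asarray(vaf, dtype=float)
    heteroplasmic = (vaf >= low) & (vaf <= high)
    return heteroplasmic.sum(axis=1)

# Toy VAF matrix: 3 individuals x 3 sites (made-up values).
vaf = [
    [0.00, 0.10, 0.50],
    [0.02, 0.00, 1.00],
    [0.05, 0.96, 0.99],
]
scores = burden_scores(vaf)
print(scores.tolist())  # [2, 0, 2]
```

The resulting score would then serve as the predictor in a regression against the trait of interest; weighted or adaptive variants of the burden test replace the plain count with a weighted sum.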
Affiliation(s)
- Xianbang Sun
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
| | - Katia Bulekova
- Research Computing Services, Boston University, Boston, MA 02215, USA
| | - Jian Yang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
| | - Meng Lai
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
| | - Achilleas N Pitsillides
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
| | - Xue Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
| | - Yuankai Zhang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - Qian Yong
- Longitudinal Studies Section, Translational Gerontology Branch, NIA/NIH, Baltimore, MD 21224, USA
| | - Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - Stephen S Rich
- Department of Public Health Services, Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
| | - Goncalo Abecasis
- TOPMed Informatics Research Center, University of Michigan, Ann Arbor, MI 48109, USA
| | - April P Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
| | - Ramachandran S Vasan
- Sections of Preventive Medicine and Epidemiology, and Cardiovascular Medicine, Boston University School of Medicine, Boston, MA, 02118, USA; Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA
| | - Joshua C Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101, USA
| | - Bruce M Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101, USA; Departments of Epidemiology, and Health Services, University of Washington, Seattle, WA 98101, USA
| | - Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Annette L Fitzpatrick
- Departments of Family Medicine, Epidemiology, and Global Health, University of Washington, Seattle, WA 98195, USA
| | - Claudia L Satizabal
- Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA; Glenn Biggs Institute for Alzheimer's and Neurodegenerative Diseases, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
| | - Dan E Arking
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Jun Ding
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
| | - Daniel Levy
- Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA; Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA; Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA.
|
35
|
Kelman G, Zucker R, Brandes N, Linial M. PWAS Hub for exploring gene-based associations of common complex diseases. Genome Res 2024; 34:1674-1686. [PMID: 39406500 PMCID: PMC11529988 DOI: 10.1101/gr.278916.123] [Received: 12/26/2023] [Accepted: 08/30/2024] [Indexed: 11/01/2024]
Abstract
PWAS (proteome-wide association study) is an innovative genetic association approach that complements widely used methods like GWAS (genome-wide association study). The PWAS approach involves three consecutive phases. First, machine learning modeling and probabilistic considerations quantify the impact of genetic variants on protein-coding genes' biochemical functions. Second, for each individual, aggregating the variants per gene determines a gene-damaging score. Finally, standard statistical tests are applied in the case-control setting to yield statistically significant genes per phenotype. The PWAS Hub offers a user-friendly interface for an in-depth exploration of gene-disease associations from the UK Biobank (UKB). Results from PWAS cover 99 common diseases and conditions, each with over 10,000 diagnosed individuals per phenotype. Users can explore genes associated with these diseases, with separate analyses conducted for males and females. For each phenotype, the analyses account for sex-based genetic effects, inheritance modes (dominant and recessive), and the pleiotropic nature of associated genes. The PWAS Hub showcases its usefulness for asthma by navigating through proteomic-genetic analyses. Inspecting PWAS asthma-listed genes (a total of 27) provides insights into the underlying cellular and molecular mechanisms. Comparison of PWAS-statistically significant genes for common diseases to the Open Targets benchmark shows partial but significant overlap in gene associations for most phenotypes. Graphical tools facilitate comparing genetic effects between PWAS and coding GWAS results, aiding in understanding the sex-specific genetic impact on common diseases. This adaptable platform is attractive to clinicians, researchers, and individuals interested in delving into gene-disease associations and sex-specific genetic effects.
Affiliation(s)
- Guy Kelman
- The Jerusalem Center for Personalized Computational Medicine, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel
| | - Roei Zucker
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
| | - Nadav Brandes
- Division of Rheumatology, Department of Medicine, University of California San Francisco, San Francisco, California 94143, USA
| | - Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
|
36
|
Nouri N, Gussler BH, Stockwell A, Truong T, Kang GJ, Browder KC, Malato Y, Sene A, Van Everen S, Wykoff CC, Brown D, Fu A, Palmer JD, Lima de Carvalho JR, Ullah E, Al Rawi R, Chew EY, Zein WM, Guan B, McCarthy MI, Hofmann JW, Chaney SY, Jasper H, Yaspan BL. SLC16A8 is a causal contributor to age-related macular degeneration risk. NPJ Genom Med 2024; 9:50. [PMID: 39468037 PMCID: PMC11519927 DOI: 10.1038/s41525-024-00442-8] [Received: 05/09/2024] [Accepted: 10/12/2024] [Indexed: 10/30/2024] Open
Abstract
Age-related macular degeneration (AMD), a complex neurodegenerative disease, is a leading cause of visual impairment worldwide with a strong genetic component. Genetic studies have identified several loci, but few causal genes with functional characterization. Here we highlight multiple lines of evidence which show a causal role in AMD for SLC16A8, which encodes MCT3, a retinal pigment epithelium (RPE) specific lactate transporter. First, in an unbiased, genome-wide analysis of rare coding variants we show multiple SLC16A8 rare variants are associated with AMD risk, corroborating previous borderline significant reports from AMD rare variant studies. Second, we report a novel SLC16A8 mutation in a three-generation family with early onset macular degeneration. Finally, mis-expression in multiple model organisms shows functional and anatomic retinal consequences. This study highlights the important role for SLC16A8 and lactate regulation towards outer retina/RPE health and highlights a potential new therapeutic opportunity for the treatment of AMD.
Affiliation(s)
- Navid Nouri
- Genentech, Inc., South San Francisco, CA, USA
| | - Tom Truong
- Genentech, Inc., South San Francisco, CA, USA
| | - Yann Malato
- Genentech, Inc., South San Francisco, CA, USA
| | - Charles C Wykoff
- Retina Consultants of Texas, Retina Consultants of America, Houston, TX, USA
| | - David Brown
- Retina Consultants of Texas, Retina Consultants of America, Houston, TX, USA
| | - Arthur Fu
- West Coast Retina Medical Group, San Francisco, CA, USA
| | - James D Palmer
- Northern California Retina Vitreous Associates, San Jose, CA, USA
| | - Jose Ronaldo Lima de Carvalho
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
- Hospital das Clinicas de Pernambuco-Empresa Brasileira de Servicos Hospitalares, Federal University of Pernambuco, Recife, PE, Brazil
| | - Ehsan Ullah
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ranya Al Rawi
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
| | - Emily Y Chew
- Division of Epidemiology and Clinical Applications, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
| | - Wadih M Zein
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
| | - Bin Guan
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
|
37
|
Fu B, Anand P, Anand A, Mefford J, Sankararaman S. A scalable adaptive quadratic kernel method for interpretable epistasis analysis in complex traits. Genome Res 2024; 34:1294-1303. [PMID: 39209554 PMCID: PMC11529862 DOI: 10.1101/gr.279140.124] [Received: 02/15/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024]
Abstract
Our knowledge of the contribution of genetic interactions (epistasis) to variation in human complex traits remains limited, partly due to the lack of efficient, powerful, and interpretable algorithms to detect interactions. Recently proposed approaches for set-based association tests show promise in improving the power to detect epistasis by examining the aggregated effects of multiple variants. Nevertheless, these methods either do not scale to large Biobank data sets or lack interpretability. We propose QuadKAST, a scalable algorithm that tests pairwise interaction effects (quadratic effects) within small to medium-sized sets of genetic variants (window size ≤100) on a trait and provides a quantified interpretation of these effects. Comprehensive simulations show that QuadKAST is well-calibrated. Additionally, QuadKAST is highly sensitive in detecting loci with epistatic signals and accurate in its estimation of quadratic effects. We applied QuadKAST to 52 quantitative phenotypes measured in ≈300,000 unrelated white British individuals in the UK Biobank to test for quadratic effects within each of 9515 protein-coding genes. We detect 32 trait-gene pairs across 17 traits and 29 genes that demonstrate statistically significant signals of quadratic effects (accounting for the number of genes and traits tested). Across these trait-gene pairs, the proportion of trait variance explained by quadratic effects is comparable to additive effects, with five pairs having a ratio >1. Our method enables the detailed investigation of epistasis on a large scale, offering new insights into its role and importance.
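To make the notion of pairwise (quadratic) effects within a variant window concrete, the sketch below expands one individual's genotype vector into its pairwise product terms and counts how many such terms a window contains. It is an illustration only and not the QuadKAST implementation: the function names `quadratic_features` and `n_pairwise_terms` and the toy genotypes are hypothetical, and the abstract does not state whether squared terms are part of the authors' feature map, so only cross-products are used here.

```python
from itertools import combinations


def quadratic_features(genotypes):
    """Return the products g_i * g_j for every unordered pair i < j.

    `genotypes` holds one individual's allele counts (0, 1 or 2) across the
    variants in a window. Hypothetical helper, for illustration only.
    """
    return [gi * gj for gi, gj in combinations(genotypes, 2)]


def n_pairwise_terms(window_size):
    """Number of unordered variant pairs in a window: m * (m - 1) / 2."""
    return window_size * (window_size - 1) // 2


# Toy individual with four variants in the window.
features = quadratic_features([0, 1, 2, 1])
print(features)               # one product per pair (0,1), (0,2), ..., (2,3)
print(n_pairwise_terms(100))  # pair count at the largest window size quoted above
```

The pair count grows quadratically with the window size, which is one reason a bounded window is a natural design choice for this kind of test.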
Affiliation(s)
- Boyang Fu
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Prateek Anand
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Aakarsh Anand
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Joel Mefford
- Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, California 90024, USA
| | - Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
|
38
|
Wang JH, Hou PL, Chen YH. Multicategory Survival Outcomes Classification via Overlapping Group Screening Process Based on Multinomial Logistic Regression Model With Application to TCGA Transcriptomic Data. Cancer Inform 2024; 23:11769351241286710. [PMID: 39385930 PMCID: PMC11462568 DOI: 10.1177/11769351241286710] [Received: 06/25/2024] [Accepted: 09/05/2024] [Indexed: 10/12/2024] Open
Abstract
Objectives: Under the classification of multicategory survival outcomes of cancer patients, it is crucial to identify biomarkers that affect specific outcome categories. The classification of multicategory survival outcomes from transcriptomic data has been thoroughly investigated in computational biology. Nevertheless, several challenges must be addressed, including the ultra-high-dimensional feature space, feature contamination, and data imbalance, all of which contribute to the instability of the diagnostic model. Furthermore, although most methods achieve accurate predictive performance for binary classification with high-dimensional transcriptomic data, their extension to multi-class classification is not straightforward.
Methods: We employ the One-versus-One strategy to transform multi-class classification into multiple binary classifications, and utilize the overlapping group screening procedure with binary logistic regression to include pathway information for identifying important genes and gene-gene interactions for multicategory survival outcomes.
Results: A series of simulation studies are conducted to compare the classification accuracy of our proposed approach with some existing machine learning methods. In practical data applications, we utilize the random oversampling procedure to tackle class imbalance issues. We then apply the proposed method to analyze transcriptomic data from various cancers in The Cancer Genome Atlas, such as kidney renal papillary cell carcinoma, lung adenocarcinoma, and head and neck squamous cell carcinoma. Our aim is to establish an accurate microarray-based multicategory cancer diagnosis model. The numerical results illustrate that the new proposal effectively enhances cancer diagnosis compared to approaches that neglect pathway information.
Conclusions: We showcase the effectiveness of the proposed method in terms of class prediction accuracy through evaluations on simulated synthetic datasets as well as real dataset applications. We also identified the cancer-related gene-gene interaction biomarkers and reported the corresponding network structure. According to the identified major genes and gene-gene interactions, we can predict for each patient the probabilities that he/she belongs to each of the survival outcome classes.
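The One-versus-One strategy mentioned in the Methods can be sketched in a few lines: one binary classifier is trained for each unordered pair of classes, and a new sample is assigned to the class that collects the most pairwise votes. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. A toy nearest-mean rule on a single feature stands in for the binary logistic regression with overlapping group screening, and all function names, class labels, and data values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def train_centroid_classifier(xs, ys, class_a, class_b):
    """Fit a toy binary classifier on the samples of two classes only.

    It stores the mean of each class and predicts the class whose mean is
    nearer. This stands in for the binary logistic regression named in the
    abstract and is an assumption made to keep the sketch dependency-free.
    """
    mean = {}
    for label in (class_a, class_b):
        values = [x for x, y in zip(xs, ys) if y == label]
        mean[label] = sum(values) / len(values)
    return lambda x: min(mean, key=lambda label: abs(x - mean[label]))


def one_versus_one_fit(xs, ys):
    """Train one binary classifier per unordered pair of classes."""
    classes = sorted(set(ys))
    return [train_centroid_classifier(xs, ys, a, b)
            for a, b in combinations(classes, 2)]


def one_versus_one_predict(classifiers, x):
    """Each pairwise classifier casts one vote; the majority class wins."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy one-feature data with three outcome classes.
xs = [0.1, 0.3, 1.9, 2.1, 4.0, 4.2]
ys = ["short", "short", "medium", "medium", "long", "long"]
classifiers = one_versus_one_fit(xs, ys)
print(len(classifiers))  # K * (K - 1) / 2 pairwise models for K classes
print(one_versus_one_predict(classifiers, 2.0))
```

With K classes the scheme trains K(K-1)/2 binary models, each on the samples of two classes only, which keeps every subproblem in the binary setting where the screening procedure applies.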
Affiliation(s)
- Jie-Huei Wang
- Department of Mathematics, National Chung Cheng University, Chiayi City, Taiwan
| | - Po-Lin Hou
- Department of Mathematics, National Chung Cheng University, Chiayi City, Taiwan
| | - Yi-Hau Chen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
|
39
|
Harris L, McDonagh EM, Zhang X, Fawcett K, Foreman A, Daneck P, Sergouniotis PI, Parkinson H, Mazzarotto F, Inouye M, Hollox EJ, Birney E, Fitzgerald T. Genome-wide association testing beyond SNPs. Nat Rev Genet 2024:10.1038/s41576-024-00778-y. [PMID: 39375560 DOI: 10.1038/s41576-024-00778-y] [Accepted: 09/03/2024] [Indexed: 10/09/2024]
Abstract
Decades of genetic association testing in human cohorts have provided important insights into the genetic architecture and biological underpinnings of complex traits and diseases. However, for certain traits, genome-wide association studies (GWAS) for common SNPs are approaching signal saturation, which underscores the need to explore other types of genetic variation to understand the genetic basis of traits and diseases. Copy number variation (CNV) is an important source of heritability that is well known to functionally affect human traits. Recent technological and computational advances enable the large-scale, genome-wide evaluation of CNVs, with implications for downstream applications such as polygenic risk scoring and drug target identification. Here, we review the current state of CNV-GWAS, discuss current limitations in resource infrastructure that need to be overcome to enable the wider uptake of CNV-GWAS results, highlight emerging opportunities and suggest guidelines and standards for future GWAS for genetic variation beyond SNPs at scale.
Affiliation(s)
- Laura Harris
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Ellen M McDonagh
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Xiaolei Zhang
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Katherine Fawcett
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Department of Population Health Sciences, University of Leicester, Leicester, UK
| | - Amy Foreman
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Petr Daneck
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Panagiotis I Sergouniotis
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, UK
| | - Helen Parkinson
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Francesco Mazzarotto
- Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- National Heart and Lung Institute, Imperial College London, London, UK
| | - Michael Inouye
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia
| | - Edward J Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | - Ewan Birney
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Tomas Fitzgerald
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK.
|
40
|
Ling H, Raraigh KS, Pugh EW, Aksit MA, Zhang P, Pace RG, Faino AV, Bamshad MJ, Gibson RL, O'Neal W, Knowles MR, Blackman SM, Cutting GR. Genetic modifiers of body mass index in individuals with cystic fibrosis. Am J Hum Genet 2024; 111:2203-2218. [PMID: 39260370 PMCID: PMC11480786 DOI: 10.1016/j.ajhg.2024.08.004] [Received: 12/12/2023] [Revised: 08/07/2024] [Accepted: 08/07/2024] [Indexed: 09/13/2024] Open
Abstract
To identify modifier loci underlying variation in body mass index (BMI) in persons with cystic fibrosis (pwCF), we performed a genome-wide association study (GWAS). Utilizing longitudinal height and weight data, along with demographic information and covariates from 4,393 pwCF, we calculated AvgBMIz representing the average of per-quarter BMI Z scores. The GWAS incorporated 9.8M single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) > 0.005 extracted from whole-genome sequencing (WGS) of each study subject. We observed genome-wide significant association with a variant in FTO (FaT mass and Obesity-associated gene; rs28567725; p value = 1.21e-08; MAF = 0.41, β = 0.106; n = 4,393 individuals) and a variant within ADAMTS5 (A Disintegrin And Metalloproteinase with ThromboSpondin motifs 5; rs162500; p value = 2.11e-10; MAF = 0.005, β = -0.768; n = 4,085 pancreatic-insufficient individuals). Notably, BMI-associated variants in ADAMTS5 occur on a haplotype that is much more common in African (AFR, MAF = 0.183) than European (EUR, MAF = 0.006) populations (1000 Genomes project). A polygenic risk score (PRS) calculated using 924 SNPs (excluding 17 in FTO) showed significant association with AvgBMIz (p value = 2.2e-16; r² = 0.03). Association between variants in FTO and the PRS correlation reveals similarities in the genetic architecture of BMI in CF and the general population. Inclusion of Black individuals, in whom the single-gene disorder CF is much less common but genomic diversity is greater, facilitated detection of association with variants that are in LD with functional SNPs in ADAMTS5. Our results illustrate the importance of population diversity, particularly when attempting to identify variants that manifest only under certain physiologic conditions.
Affiliation(s)
- Hua Ling
- Center for Inherited Disease Research, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Karen S Raraigh
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Elizabeth W Pugh
- Center for Inherited Disease Research, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Melis A Aksit
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Peng Zhang
- Center for Inherited Disease Research, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Rhonda G Pace
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Anna V Faino
- Children's Core for Biostatistics, Epidemiology and Analytics in Research, Seattle Children's Research Institute, Seattle, WA 98101, USA
| | - Michael J Bamshad
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Center for Clinical and Translational Research, Seattle Children's Hospital, Seattle, WA 98105, USA
| | - Ronald L Gibson
- Center for Clinical and Translational Research, Seattle Children's Hospital, Seattle, WA 98105, USA; Department of Pediatrics, Division of Pulmonary & Sleep Medicine, University of Washington School of Medicine/Seattle Children's Hospital, Seattle, WA, USA
| | - Wanda O'Neal
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Michael R Knowles
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Scott M Blackman
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Division of Pediatric Endocrinology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Garry R Cutting
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.
|
41
|
Ziyatdinov A, Mbatchou J, Marcketta A, Backman J, Gaynor S, Zou Y, Joseph T, Geraghty B, Herman J, Watanabe K, Ghosh A, Kosmicki J, Locke A, Thornton T, Kang HM, Ferreira M, Baras A, Abecasis G, Marchini J. Joint testing of rare variant burden scores using non-negative least squares. Am J Hum Genet 2024; 111:2139-2149. [PMID: 39366334 PMCID: PMC11480795 DOI: 10.1016/j.ajhg.2024.08.021] [Received: 03/18/2024] [Revised: 08/23/2024] [Accepted: 08/27/2024] [Indexed: 10/06/2024] Open
Abstract
Gene-based burden tests are a popular and powerful approach for analysis of exome-wide association studies. These approaches combine sets of variants within a gene into a single burden score that is then tested for association. Typically, a range of burden scores are calculated and tested across a range of annotation classes and frequency bins. Correlation between these tests can complicate the multiple testing correction and hamper interpretation of the results. We introduce a method called the sparse burden association test (SBAT) that tests the joint set of burden scores under the assumption that causal burden scores act in the same effect direction. The method simultaneously assesses the significance of the model fit and selects the set of burden scores that best explain the association. Using simulated data, we show that the method is well calibrated and highlight scenarios where the test outperforms existing gene-based tests. We apply the method to 73 quantitative traits from the UK Biobank, showing that SBAT is a valuable additional gene-based test when combined with other existing approaches. This test is implemented in the REGENIE software.
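The burden-score construction described in this abstract can be illustrated with a small sketch: for one individual and one gene, a burden score is the sum of allele counts over the variants that pass an annotation mask and a frequency cutoff, and one such score is computed per mask and frequency-bin combination. This is only a sketch under stated assumptions and does not reproduce SBAT or REGENIE. The function name `burden_score`, the annotation labels, the frequency cutoffs, and all data values are hypothetical, and the non-negative least squares step that SBAT applies to the resulting scores is not shown.

```python
def burden_score(genotypes, variants, annotation_classes, max_maf):
    """Sum one individual's allele counts over the qualifying variants.

    A variant qualifies when its annotation is in `annotation_classes` and
    its minor allele frequency is at most `max_maf`.
    """
    return sum(
        g for g, v in zip(genotypes, variants)
        if v["annotation"] in annotation_classes and v["maf"] <= max_maf
    )


# Hypothetical gene with four rare variants (labels and MAFs are made up).
variants = [
    {"annotation": "pLoF", "maf": 0.0005},
    {"annotation": "missense", "maf": 0.0080},
    {"annotation": "missense", "maf": 0.0002},
    {"annotation": "pLoF", "maf": 0.0040},
]
genotypes = [1, 0, 2, 1]  # one individual's allele counts per variant

# One burden score per (annotation mask, frequency bin) combination.
masks = {"pLoF": {"pLoF"}, "pLoF+missense": {"pLoF", "missense"}}
maf_bins = [0.001, 0.01]
scores = {
    (name, max_maf): burden_score(genotypes, variants, classes, max_maf)
    for name, classes in masks.items()
    for max_maf in maf_bins
}
for key, value in scores.items():
    print(key, value)
```

Because the masks and frequency bins in this toy setup are nested, the resulting scores share variants and are therefore correlated, which is the situation the abstract identifies as complicating multiple-testing correction.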
Affiliation(s)
- Yuxin Zou
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Adam Locke
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
|
42
|
Acharya S, Liao S, Jung WJ, Kang YS, Moghaddam VA, Feitosa MF, Wojczynski MK, Lin S, Anema JA, Schwander K, Connell JO, Province MA, Brent MR. A methodology for gene level omics-WAS integration identifies genes influencing traits associated with cardiovascular risks: the Long Life Family Study. Hum Genet 2024; 143:1241-1252. [PMID: 39276247 PMCID: PMC11485042 DOI: 10.1007/s00439-024-02701-1] [Received: 03/07/2024] [Accepted: 08/15/2024] [Indexed: 09/16/2024]
Abstract
The Long Life Family Study (LLFS) enrolled 4953 participants in 539 pedigrees displaying exceptional longevity. To identify genetic mechanisms that affect cardiovascular risks in the LLFS population, we developed a multi-omics integration pipeline and applied it to 11 traits associated with cardiovascular risks. Using our pipeline, we aggregated gene-level statistics from rare-variant analysis, GWAS, and gene expression-trait association by Correlated Meta-Analysis (CMA). Across all traits, CMA identified 64 significant genes after Bonferroni correction (p ≤ 2.8 × 10⁻⁷), 29 of which replicated in the Framingham Heart Study (FHS) cohort. Notably, 20 of the 29 replicated genes do not have a previously known trait-associated variant in the GWAS Catalog within 50 kb. Thirteen modules in Protein-Protein Interaction (PPI) networks are significantly enriched in genes with low meta-analysis p-values for at least one trait, three of which are replicated in the FHS cohort. The functional annotation of genes in these modules showed a significant over-representation of trait-related biological processes including sterol transport, protein-lipid complex remodeling, and immune response regulation. Among major findings, our results suggest a role of triglyceride-associated and mast-cell functional genes FCER1A, MS4A2, GATA2, HDC, and HRH4 in atherosclerosis risks. Our findings also suggest that lower expression of ATG2A, a gene we found to be associated with BMI, may be both a cause and consequence of obesity. Finally, our results suggest that ENPP3 may play an intermediary role in triglyceride-induced inflammation. Our pipeline is freely available and implemented in the Nextflow workflow language, making it easily runnable on any compute platform (https://nf-co.re/omicsgenetraitassociation).
Affiliation(s)
- Sandeep Acharya
- Division of Computational and Data Sciences, Washington University, St Louis, MO, USA
| | - Shu Liao
- Department of Computer Science and Engineering, Washington University, St Louis, MO, USA
| | - Wooseok J Jung
- Department of Computer Science and Engineering, Washington University, St Louis, MO, USA
| | - Yu S Kang
- Department of Computer Science and Engineering, Washington University, St Louis, MO, USA
| | - Vaha Akbary Moghaddam
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Mary F Feitosa
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Mary K Wojczynski
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Shiow Lin
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Jason A Anema
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Karen Schwander
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Jeff O Connell
- Department of Medicine, University of Maryland, Baltimore, MD, USA
| | - Michael A Province
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Michael R Brent
- Department of Computer Science and Engineering, Washington University, St Louis, MO, USA.
|
43
|
Schraiber JG, Edge MD, Pennell M. Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations. PLoS Biol 2024; 22:e3002847. [PMID: 39383205 PMCID: PMC11493298 DOI: 10.1371/journal.pbio.3002847] [Received: 03/07/2024] [Revised: 10/21/2024] [Accepted: 09/17/2024] [Indexed: 10/11/2024] Open
Abstract
In both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci or other aspects of the phenotype or environment and a focal trait. In these 2 fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we lay out a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., genome-wide association studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop intuition for why and when spurious correlations may occur analytically and conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique (including both the genetic relatedness matrix (GRM) and its leading eigenvectors, corresponding to the principal components of the genotype matrix, in a regression model) can mitigate spurious correlations in phylogenetic analyses. As a case study, we re-examine an analysis testing for coevolution of expression levels between genes across a fungal phylogeny and show that including eigenvectors of the covariance matrix as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
Affiliation(s)
- Joshua G. Schraiber
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
| | - Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
| | - Matt Pennell
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
|
44
|
Clarke B, Holtkamp E, Öztürk H, Mück M, Wahlberg M, Meyer K, Munzlinger F, Brechtmann F, Hölzlwimmer FR, Lindner J, Chen Z, Gagneur J, Stegle O. Integration of variant annotations using deep set networks boosts rare variant association testing. Nat Genet 2024; 56:2271-2280. [PMID: 39322779 PMCID: PMC11525182 DOI: 10.1038/s41588-024-01919-z] [Received: 07/10/2023] [Accepted: 08/20/2024] [Indexed: 09/27/2024]
Abstract
Rare genetic variants can have strong effects on phenotypes, yet accounting for rare variants in genetic analyses is statistically challenging due to the limited number of allele carriers and the burden of multiple testing. While rich variant annotations promise to enable well-powered rare variant association tests, methods integrating variant annotations in a data-driven manner are lacking. Here we propose deep rare variant association testing (DeepRVAT), a model based on set neural networks that learns a trait-agnostic gene impairment score from rare variant annotations and phenotypes, enabling both gene discovery and trait prediction. On 34 quantitative and 63 binary traits, using whole-exome-sequencing data from UK Biobank, we find that DeepRVAT yields substantial gains in gene discoveries and improved detection of individuals at high genetic risk. Finally, we demonstrate how DeepRVAT enables calibrated and computationally efficient rare variant tests at biobank scale, aiding the discovery of genetic risk factors for human disease traits.
Affiliation(s)
- Brian Clarke
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Eva Holtkamp
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Helmholtz Association-Munich School for Data Science (MUDS), Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Hakime Öztürk
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Marcel Mück
  - AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Magnus Wahlberg
  - AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Kayla Meyer
  - AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Felix Munzlinger
  - AI Health Innovation Cluster, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Felix Brechtmann
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Florian R Hölzlwimmer
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Jonas Lindner
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Zhifen Chen
  - Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Munich, Germany
  - Deutsches Zentrum für Herz- und Kreislaufforschung (DZHK), Partner Site Munich Heart Alliance, Munich, Germany
- Julien Gagneur
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
  - Munich Center for Machine Learning, Munich, Germany
  - Institute of Human Genetics, School of Medicine and Health, Technical University of Munich, Munich, Germany
- Oliver Stegle
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
45
Correia Marques M, Rubin D, Shuldiner EG, Datta M, Schmitz E, Gutierrez Cruz G, Patt A, Bennett E, Grom A, Foell D, Gattorno M, Bohnsack J, Yeung RSM, Prahalad S, Mellins E, Anton J, Len CA, Oliveira S, Woo P, Ozen S, Deng Z, Ombrello MJ. Enrichment of Rare Variants of Hemophagocytic Lymphohistiocytosis Genes in Systemic Juvenile Idiopathic Arthritis. Arthritis Rheumatol 2024; 76:1566-1572. [PMID: 38937141 DOI: 10.1002/art.42938] [Received: 03/14/2024] [Revised: 05/23/2024] [Accepted: 06/12/2024] [Indexed: 06/29/2024]
Abstract
OBJECTIVE: Our objective was to evaluate whether there is an enrichment of rare variants in familial hemophagocytic lymphohistiocytosis (HLH)-associated genes among patients with systemic juvenile idiopathic arthritis (sJIA) with or without macrophage activation syndrome (MAS). METHODS: Targeted sequencing of HLH genes (LYST, PRF1, RAB27A, STX11, STXBP2, UNC13D) was performed in patients with sJIA from an established cohort. Sequence data from control participants were obtained in silico (database of Genotypes and Phenotypes: phs000280.v8.p2). Rare variant association testing (RVT) was performed with the sequence kernel association test package. Significance was defined as P < 0.05 after 100,000 permutations. RESULTS: Sequencing data from 524 sJIA cases were jointly called and harmonized with exome-derived target data from 3,000 controls. Quality control operations produced a set of 480 cases and 2,924 ancestrally matched control participants. RVT of cases and controls revealed a significant association with rare protein-altering variants (minor allele frequency [MAF] < 0.01) of STXBP2 (P = 0.020) and ultrarare variants (MAF < 0.001) of STXBP2 (P = 0.006) and UNC13D (P = 0.046). A subanalysis of 32 cases with known MAS and 90 without revealed a significant difference in the distribution of rare UNC13D variants (P = 0.0047) between the groups. Additionally, patients with sJIA more often carried two or more HLH variants than did controls (P = 0.007), driven largely by digenic combinations involving LYST. CONCLUSION: We identified an enrichment of rare HLH variants in patients with sJIA compared with controls, driven by STXBP2 and UNC13D. Biallelic variation in HLH genes was associated with sJIA, driven by LYST. Only UNC13D displayed enrichment in patients with MAS. This suggests that HLH variants may contribute to the pathophysiology of sJIA, even without MAS.
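The permutation-based significance threshold mentioned in the methods can be illustrated with a minimal Python sketch. It is not the study's pipeline: the abstract reports using the sequence kernel association test package, whereas the statistic below (absolute difference in carrier frequency between cases and controls) and the function name `permutation_p_value` are simplifying assumptions chosen to show how a permutation p-value is formed.

```python
import numpy as np

def permutation_p_value(carrier, is_case, n_permutations=10_000, seed=0):
    """Permutation p-value for a difference in rare-variant carrier frequency.

    carrier : 0/1 per individual (carries at least one rare variant in the gene)
    is_case : True/False per individual
    Assumption: a simple carrier-frequency statistic stands in for a kernel test.
    """
    carrier = np.asarray(carrier, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    rng = np.random.default_rng(seed)

    def statistic(labels):
        # absolute difference in carrier frequency between the two groups
        return abs(carrier[labels].mean() - carrier[~labels].mean())

    observed = statistic(is_case)
    # shuffle case/control labels to sample the null distribution
    exceed = sum(
        statistic(rng.permutation(is_case)) >= observed
        for _ in range(n_permutations)
    )
    # add-one correction so the estimate is never exactly zero
    return (exceed + 1) / (n_permutations + 1)
```

With 100,000 permutations, as in the abstract, the smallest p-value such a procedure can report is about 1 × 10⁻⁵, which is ample resolution for a P < 0.05 threshold.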
Affiliation(s)
- Mariana Correia Marques
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
- Danielle Rubin
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
- Emily G Shuldiner
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
- Mallika Datta
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
- Elizabeth Schmitz
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
- Gustavo Gutierrez Cruz
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
- Andrew Patt
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
- Elizabeth Bennett
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
- Alexei Grom
  - Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Dirk Foell
  - University Hospital Muenster, Muenster, Germany
- John Bohnsack
  - University of Utah Eccles School of Medicine, Salt Lake City
- Sampath Prahalad
  - Emory University School of Medicine and Children's Healthcare of Atlanta, Atlanta, Georgia
- Jordi Anton
  - Hospital Sant Joan de Déu, Universitat de Barcelona, Barcelona, Spain
- Sheila Oliveira
  - Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Patricia Woo
  - University College London, London, United Kingdom
- Seza Ozen
  - Hacettepe University, Ankara, Turkey
- Zuoming Deng
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
- Michael J Ombrello
  - National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, Maryland
46
Das D, Khor ES, Jiang F, He J, Kawakami Y, Wainwright L, Hollinger J, Geiger J, Liu H, Meng F, Porter GA, Jin Z, Murphy P, Yao P. Loss-of-function of RNA-binding protein PRRC2B causes translational defects and congenital cardiovascular malformation. medRxiv 2024:2024.09.26.24313895. [PMID: 39398999 PMCID: PMC11469349 DOI: 10.1101/2024.09.26.24313895] [Indexed: 10/15/2024]
Abstract
Alternative splicing generates variant forms of proteins for a given gene and accounts for functional redundancy or diversification. A novel RNA-binding protein, Pro-rich Coiled-coil Containing Protein 2B (PRRC2B), has been reported by multiple laboratories to mediate uORF-dependent and -independent regulation of translation initiation required for cell cycle progression and proliferation. We identified two alternatively spliced isoforms in human and mouse hearts and HEK293T cells: full-length (FL) and the exon 16-excluded isoform ΔE16. A knock-in of the mouse equivalent of a congenital heart disease-associated human mutation depletes the full-length Prrc2b mRNA but not the alternatively spliced truncated form ΔE16, and does not cause any apparent structural or functional disorders. In contrast, global genetic inactivation of the PRRC2B gene in the mouse genome, nullifying both mRNA isoforms, caused patent ductus arteriosus (PDA) and neonatal lethality in mice. Bulk and single nucleus transcriptome profiling analyses of embryonic mouse hearts demonstrated a significant overall downregulation of multiple smooth muscle-specific genes in Prrc2b mutant mice resulting from reduced smooth muscle cell number. Integrated analysis of proteomic changes in Prrc2b null mouse embryonic hearts and polysome-seq and RNA-seq multi-omics analysis in human HEK293T cells uncovers conserved PRRC2B-regulated target mRNAs that encode essential factors required for cardiac and vascular development. Our findings reveal the connection between alternative splicing regulation of PRRC2B, PRRC2B-mediated translational control, and congenital cardiovascular development and disorder. This study may shed light on the significance of PRRC2B in human cardiovascular disease diagnosis and treatment.
Affiliation(s)
- Debojyoti Das
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
- Eng-Soon Khor
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
- Feng Jiang
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
  - Department of Biochemistry & Biophysics, University of Rochester School of Medicine & Dentistry, Rochester, New York 14642
- Jiali He
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
- Yui Kawakami
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
- Lindsey Wainwright
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
  - Department of Biochemistry & Biophysics, University of Rochester School of Medicine & Dentistry, Rochester, New York 14642
- Jared Hollinger
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
- Joshua Geiger
  - Department of Vascular Surgery, University of Rochester School of Medicine & Dentistry, Rochester, New York 14642
- Huan Liu
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
- Fanju Meng
  - Department of Biomedical Genetics, University of Rochester School of Medicine & Dentistry, Rochester, New York 14642
- George A. Porter
  - Department of Pediatrics, Medicine, and Pharmacology and Physiology, University of Rochester School of Medicine & Dentistry, Rochester, New York 14642
- Zhenggen Jin
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
- Patrick Murphy
  - Department of Biomedical Genetics, University of Rochester School of Medicine & Dentistry, Rochester, New York 14642
- Peng Yao
  - Aab Cardiovascular Research Institute, Department of Medicine, University of Rochester School of Medicine & Dentistry, Rochester, NY 14642
  - Department of Biochemistry & Biophysics, University of Rochester School of Medicine & Dentistry, Rochester, New York 14642
  - The Center for RNA Biology, University of Rochester School of Medicine & Dentistry, Rochester, New York 14642
  - The Center for Biomedical Informatics, University of Rochester School of Medicine & Dentistry, Rochester, New York 14642
47
Tseng YP, Chang YS, Mekala VR, Liu TY, Chang JG, Shieh GS. Whole-genome sequencing reveals rare variants associated with gout in Taiwanese males. Front Genet 2024; 15:1423714. [PMID: 39385933 PMCID: PMC11462091 DOI: 10.3389/fgene.2024.1423714] [Received: 04/26/2024] [Accepted: 08/28/2024] [Indexed: 10/12/2024]
Abstract
To identify rare variants (RVs) associated with gout, we sequenced the whole genomes of 321 male gout patients and combined these with those of 64 male gout patients and 682 normal controls from the Taiwan Biobank. We performed ACAT-O to identify 682 significant RVs (p < 3.8 × 10⁻⁸) clustered on chromosomes 1, 7, 10, 16, and 18. To prioritize causal variants effectively, we filtered them by Combined Annotation-Dependent Depletion (CADD) score >10, or by |effect size| ≥ 1.5 for those without CADD scores. In particular, to the best of our knowledge, we identified the rare variants rs559954634, rs186763678, and 13-85340782-G-A for the first time as associated with gout in Taiwanese males. Importantly, the RV rs559954634 positively affects gout, and its neighboring gene NPHS2 is involved in serum urate and expressed in kidney tissues. The kidneys play a major role in regulating uric acid levels. This suggests that rs559954634 may be involved in gout. Furthermore, rs186763678 is in the intron of NFIA, which interacts with SLC2A9, the gene with the most significant effect on serum urate. Note that the gene-gene interaction NFIA-SLC2A9 is significantly associated with serum urate in the Italian MICROS population and a Croatian population. Moreover, 13-85340782-G-A significantly affects gout susceptibility (odds ratio 6.0; P = 0.038). The >1% carrier frequencies of these potentially pathogenic (protective) RVs in cases (controls) suggest the revealed associations may be true; these RVs deserve further mechanistic study. Finally, multivariate logistic regression analysis shows that the rare variants rs559954634 and 13-85340782-G-A are jointly significantly associated with gout susceptibility.
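ACAT-O, the test named in this abstract, belongs to the family of aggregated Cauchy association tests, which combine several p-values through a Cauchy transform. The Python sketch below shows only that combination step, under equal weights by default; the function name `cauchy_combination` is hypothetical, and the choice of component tests and weights used in the study is not stated in the abstract and is not reproduced here.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    T = sum_i w_i * tan((0.5 - p_i) * pi), and the combined p-value is
    0.5 - arctan(T / sum_i w_i) / pi. Equal weights are an assumption.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    # each p-value maps to a standard Cauchy variate under the null
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(t / sum(weights)) / math.pi
```

A useful sanity check is that a single p-value is returned unchanged, and that one very small p-value dominates the combination even when the others are large. For extremely small inputs, a production implementation would likely need a numerically safer approximation of the tangent term.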
Affiliation(s)
- Yu-Ping Tseng
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Ya-Sian Chang
  - Department of Pathology, Chung Shan Medical University Hospital, Taichung, Taiwan
- Ting-Yuan Liu
  - Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Jan-Gowth Chang
  - Department of Laboratory Medicine, China Medical University Hospital, Taichung, Taiwan
- Grace S. Shieh
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan
  - Data Science Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
  - Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
48
Hou T, Shen X, Zhang S, Liang M, Chen L, Lu Q. AIGen: an artificial intelligence software for complex genetic data analysis. Brief Bioinform 2024; 25:bbae566. [PMID: 39550221 PMCID: PMC11568876 DOI: 10.1093/bib/bbae566] [Received: 06/19/2024] [Revised: 09/12/2024] [Accepted: 11/11/2024] [Indexed: 11/18/2024]
Abstract
The recent development of artificial intelligence (AI) technology, especially the advance of deep neural network (DNN) technology, has revolutionized many fields. While DNNs play a central role in modern AI technology, they have rarely been used in genetic data analysis due to analytical and computational challenges brought by high-dimensional genetic data and an increasing number of samples. To facilitate the use of AI in genetic data analysis, we developed a C++ package, AIGen, based on two newly developed neural networks (i.e. kernel neural networks and functional neural networks) that are capable of modeling complex genotype-phenotype relationships (e.g. interactions) while providing robust performance against high-dimensional genetic data. Moreover, computationally efficient algorithms (e.g. a minimum norm quadratic unbiased estimation approach and batch training) are implemented in the package to accelerate the computation, making the networks efficient for analyzing large-scale datasets with thousands or even millions of samples. By applying AIGen to the UK Biobank dataset, we demonstrate that it can efficiently analyze large-scale genetic data, attain improved accuracy, and maintain robust performance. Availability: AIGen is developed in C++ and its source code, along with reference libraries, is publicly accessible on GitHub at https://github.com/TingtHou/AIGen.
Affiliation(s)
- Tingting Hou
  - Department of Experimental Statistics, Louisiana State University, 45 Martin D. Woodin Hall, Baton Rouge, LA 70802, United States
- Xiaoxi Shen
  - Department of Mathematics, Texas State University, 601 University Drive San Marcos, TX 78666, United States
- Shan Zhang
  - Department of Biostatistics, University of Florida, 2004 Mowry Road, Gainesville, FL 32611, United States
- Muxuan Liang
  - Department of Biostatistics, University of Florida, 2004 Mowry Road, Gainesville, FL 32611, United States
- Li Chen
  - Department of Biostatistics, University of Florida, 2004 Mowry Road, Gainesville, FL 32611, United States
- Qing Lu
  - Department of Biostatistics, University of Florida, 2004 Mowry Road, Gainesville, FL 32611, United States
49
Svishcheva GR, Belonogova NM, Kirichenko AV, Tsepilov YA, Axenovich TI. A New Method for Conditional Gene-Based Analysis Effectively Accounts for the Regional Polygenic Background. Genes (Basel) 2024; 15:1174. [PMID: 39336765 PMCID: PMC11431718 DOI: 10.3390/genes15091174] [Received: 07/24/2024] [Revised: 08/27/2024] [Accepted: 09/05/2024] [Indexed: 09/30/2024]
Abstract
Gene-based association analysis is a powerful tool for identifying genes that explain trait variability. An essential step of this analysis is a conditional analysis, which aims to eliminate the influence of SNPs outside the gene that are in linkage disequilibrium with intragenic SNPs. The popular conditional analysis method, GCTA-COJO, accounts for the influence of several top independently associated SNPs outside the gene, correcting the z statistics for intragenic SNPs. We suggest a new method, TauCOR, for conditional gene-based analysis using summary statistics. This method accounts for the influence of the full regional polygenic background, correcting the genotype correlations between intragenic SNPs. As a result, the distribution of z statistics for intragenic SNPs becomes conditionally independent of the distribution for extragenic SNPs. TauCOR is compatible with any gene-based association test. TauCOR was tested on summary statistics simulated under different scenarios and on real summary statistics for a 'gold standard' gene list from the Open Targets Genetics project. TauCOR proved to be effective in all modelling scenarios and on real data. The TauCOR strategy showed comparable sensitivity and higher specificity and accuracy than GCTA-COJO on both simulated and real data. The method can be successfully used to improve the effectiveness of gene-based association analyses.
Affiliation(s)
- Gulnara R Svishcheva
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
  - Institute of General Genetics, Russian Academy of Sciences, Gubkin St. 3, 119311 Moscow, Russia
- Nadezhda M Belonogova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
- Anatoly V Kirichenko
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
- Yakov A Tsepilov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1RQ, UK
- Tatiana I Axenovich
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Ave. Lavrentiev, 10, 630090 Novosibirsk, Russia
50
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024]
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece