151
Das Adhikari S, Cui Y, Wang J. BayesKAT: Bayesian Optimal Kernel-based Test for genetic association studies reveals joint genetic effects in complex diseases. bioRxiv 2023:2023.10.18.562824. [PMID: 37905124 PMCID: PMC10614916 DOI: 10.1101/2023.10.18.562824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
GWAS methods have identified individual SNPs significantly associated with specific phenotypes. Nonetheless, many complex diseases are polygenic and are controlled by multiple genetic variants that are usually non-linearly dependent. These genetic variants have weak marginal effects and remain undetected in GWAS analysis. Kernel-based tests (KBT), which evaluate the joint effect of a group of genetic variants, are therefore critical for complex disease analysis. However, the choice of kernel function in KBT can significantly influence type I error control and power, and selecting the optimal kernel remains a statistically challenging task. The few existing methods suffer from inflated type I errors, limited scalability, inferior power, or ambiguous conclusions. Here, we present a new Bayesian framework, BayesKAT (https://github.com/wangjr03/BayesKAT), which overcomes these kernel specification issues by selecting the optimal composite kernel adaptively from the data while testing genetic associations simultaneously. Furthermore, BayesKAT implements a scalable computational strategy to boost its applicability, especially for high-dimensional cases where other methods become less effective. Based on a series of performance comparisons using both simulated and real large-scale genetics data, BayesKAT outperforms the available methods in detecting complex group-level associations and controlling type I errors simultaneously. Applied to a variety of groups of functionally related genetic variants based on biological pathways, co-expression gene modules, and protein complexes, BayesKAT deciphers the complex genetic basis and provides mechanistic insights into human diseases.
152
Hai Y, Zhao W, Meng Q, Liu L, Wen Y. Bayesian linear mixed model with multiple random effects for family-based genetic studies. Front Genet 2023; 14:1267704. [PMID: 37928242 PMCID: PMC10620972 DOI: 10.3389/fgene.2023.1267704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 09/25/2023] [Indexed: 11/07/2023]
Abstract
Motivation: Family-based study design is one of the popular designs used in genetic research, and the whole-genome sequencing data obtained from family-based studies offer many unique features for risk prediction studies. They can not only provide a more comprehensive view of many complex diseases, but also utilize information in the design to further improve the prediction accuracy. While promising, existing analytical methods often ignore the information embedded in the study design and overlook the predictive effects of rare variants, leading to a prediction model with sub-optimal performance. Results: We proposed a Bayesian linear mixed model for the prediction analysis of sequencing data obtained from family-based studies. Our method can not only capture predictive effects from both common and rare variants, but also easily accommodate various disease model assumptions. It uses information embedded in the study design to form surrogates, where the predictive effects from unmeasured/unknown genetic and environmental risk factors can be modelled. Through extensive simulation studies and the analysis of sequencing data obtained from the Michigan State University Twin Registry study, we have demonstrated that the proposed method outperforms commonly adopted techniques. Availability: R package is available at https://github.com/yhai943/FBLMM.
Affiliation(s)
- Yang Hai
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Wenxuan Zhao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Qingyu Meng
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Long Liu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yalu Wen
  - Department of Statistics, University of Auckland, Auckland, New Zealand
153
Falk I, Zhao M, Nait Saada J, Guo Q. Learning the kernel for rare variant genetic association test. Front Genet 2023; 14:1245238. [PMID: 37886683 PMCID: PMC10598548 DOI: 10.3389/fgene.2023.1245238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/23/2023] [Accepted: 09/14/2023] [Indexed: 10/28/2023]
Abstract
Introduction: Compared to Genome-Wide Association Studies (GWAS) for common variants, single-marker association analysis for rare variants is underpowered. Set-based association analyses for rare variants are powerful tools that capture some of the missing heritability in trait association studies. Methods: We extend the convex-optimized SKAT (cSKAT) test set procedure which learns from data the optimal convex combination of kernels, to the full Generalised Linear Model (GLM) setting with arbitrary non-genetic covariates. We call this extended cSKAT (ecSKAT) and show that the resulting optimization problem is a quadratic programming problem that can be solved with no additional cost compared to cSKAT. Results: We show that a modified objective is related to an upper bound for the p-value through a decreasing exponential term in the objective function, indicating that optimizing this objective function is a principled way of learning the combination of kernels. We evaluate the performance of the proposed method on continuous and binary traits using simulation studies and illustrate its application using UK Biobank Whole Exome Sequencing data on hand grip strength and systemic lupus erythematosus rare variant association analysis. Discussion: Our proposed ecSKAT method enables correcting for important confounders in association studies such as age, sex or population structure for both quantitative and binary traits. Simulation studies showed that ecSKAT can recover sensible weights and achieve higher power across different sample sizes and misspecification settings. Compared to the burden test and SKAT method, ecSKAT gives a lower p-value for the genes tested in both quantitative and binary traits in the UKBiobank cohort.
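The idea of testing with a convex combination of kernels can be illustrated with a small sketch. The code below is not the ecSKAT implementation: the weights are fixed by hand rather than learned by quadratic programming, and the function names, the toy genotype matrix, and the residuals are all hypothetical. It only shows that a SKAT-style score statistic Q = rᵀKr computed on a combined kernel equals the same weighted combination of the per-kernel statistics.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear_kernel(G: Matrix) -> Matrix:
    """Linear kernel: K[i][j] = <g_i, g_j> over the variants in the set."""
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in G] for gi in G]


def quadratic_kernel(G: Matrix) -> Matrix:
    """Quadratic kernel: K[i][j] = (1 + <g_i, g_j>)^2."""
    return [[(1.0 + sum(a * b for a, b in zip(gi, gj))) ** 2 for gj in G] for gi in G]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Convex combination of kernel matrices (weights >= 0 and summing to 1)."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def score_statistic(residuals: Sequence[float], K: Matrix) -> float:
    """Quadratic form Q = r^T K r for null-model residuals r."""
    n = len(residuals)
    return sum(residuals[i] * K[i][j] * residuals[j] for i in range(n) for j in range(n))


# Toy data (hypothetical): 4 subjects, 3 variants coded as minor-allele counts.
G = [[0, 1, 2], [1, 0, 0], [2, 1, 0], [0, 0, 1]]
# Hypothetical residuals from a null model fitted with covariates only.
residuals = [0.5, -1.0, 0.25, 0.25]

K_lin = linear_kernel(G)
K_quad = quadratic_kernel(G)
weights = [0.3, 0.7]  # assumed for illustration; a kernel-learning method would estimate these
K_comb = combine_kernels([K_lin, K_quad], weights)

q_lin = score_statistic(residuals, K_lin)
q_quad = score_statistic(residuals, K_quad)
q_comb = score_statistic(residuals, K_comb)
```

Because Q is linear in the kernel, `q_comb` equals `0.3 * q_lin + 0.7 * q_quad`. Turning Q into a p-value requires the null distribution of the statistic, which this sketch does not cover.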
Affiliation(s)
- Isak Falk
  - Department of Computer Science, University College London, London, United Kingdom
  - Computational Statistics and Machine Learning, Italian Institute of Technology, Genoa, Italy
- Qi Guo
  - BenevolentAI, London, United Kingdom
154
Pan R, Dickie EW, Hawco C, Reid N, Voineskos AN, Park JY. Spatial-extent inference for testing variance components in reliability and heritability studies. bioRxiv 2023:2023.04.19.537270. [PMID: 37131799 PMCID: PMC10153210 DOI: 10.1101/2023.04.19.537270] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
Clusterwise inference is a popular approach in neuroimaging to increase sensitivity, but most existing methods are currently restricted to the General Linear Model (GLM) for testing mean parameters. Statistical methods for testing variance components, which are critical in neuroimaging studies that involve estimation of narrow-sense heritability or test-retest reliability, are underdeveloped due to methodological and computational challenges, which can lead to low power. We propose a fast and powerful test for variance components called CLEAN-V (CLEAN for testing Variance components). CLEAN-V models the global spatial dependence structure of imaging data and computes a locally powerful variance component test statistic by data-adaptively pooling neighborhood information. Correction for multiple comparisons is achieved by permutations to control the family-wise error rate (FWER). Through analysis of task-fMRI data from the Human Connectome Project across five tasks and comprehensive data-driven simulations, we show that CLEAN-V outperforms existing methods in detecting test-retest reliability and narrow-sense heritability with significantly improved power, with the detected areas aligning with activation maps. The computational efficiency of CLEAN-V also speaks to its practical utility, and it is available as an R package.
Affiliation(s)
- Ruyi Pan
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, M5G 1Z5, Canada
  - The Centre for Addiction and Mental Health, Toronto, ON, M5T 1R8, Canada
- Erin W. Dickie
  - The Centre for Addiction and Mental Health, Toronto, ON, M5T 1R8, Canada
  - Department of Psychiatry, University of Toronto, Toronto, ON, M5T 1R8, Canada
- Colin Hawco
  - The Centre for Addiction and Mental Health, Toronto, ON, M5T 1R8, Canada
  - Department of Psychiatry, University of Toronto, Toronto, ON, M5T 1R8, Canada
- Nancy Reid
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, M5G 1Z5, Canada
- Aristotle N. Voineskos
  - The Centre for Addiction and Mental Health, Toronto, ON, M5T 1R8, Canada
  - Department of Psychiatry, University of Toronto, Toronto, ON, M5T 1R8, Canada
- Jun Young Park
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, M5G 1Z5, Canada
  - Department of Psychology, University of Toronto, Toronto, ON, M5G 1Z5, Canada
155
Lee WP, Wang H, Dombroski B, Cheng PL, Tucci A, Si YQ, Farrell J, Tzeng JY, Leung YY, Malamon J, Wang LS, Vardarajan B, Farrer L, Schellenberg G. Structural Variation Detection and Association Analysis of Whole-Genome-Sequence Data from 16,905 Alzheimer's Diseases Sequencing Project Subjects. Research Square 2023:rs.3.rs-3353179. [PMID: 37886469 PMCID: PMC10602095 DOI: 10.21203/rs.3.rs-3353179/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
Structural variations (SVs) are important contributors to the genetics of human diseases. However, their role in Alzheimer's disease (AD) remains largely unstudied due to challenges in accurately detecting SVs. We analyzed whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (N = 16,905) and identified 400,234 (168,223 high-quality) SVs. Laboratory validation yielded a sensitivity of 82% (85% for high-quality). We found a significant burden of deletions and duplications in AD cases, particularly for singletons and homozygous events. In AD genes, we observed ultra-rare SVs associated with the disease, including protein-altering SVs in ABCA7, APP, PLCG2, and SORL1. Twenty-one SVs are in linkage disequilibrium (LD) with known AD-risk variants, exemplified by a 5 kb deletion in complete LD with rs143080277 in NCK2. We also identified 16 SVs associated with AD and 13 SVs linked to AD-related pathological/cognitive endophenotypes. This study highlights the pivotal role of SVs in shaping our understanding of AD genetics.
156
Norden-Krichmar TM, Rotroff D, Schwantes-An TH, Bataller R, Goldman D, Nagy LE, Liangpunsakul S. Genomic approaches to explore susceptibility and pathogenesis of alcohol use disorder and alcohol-associated liver disease. Hepatology 2023:01515467-990000000-00586. [PMID: 37796138 PMCID: PMC10985049 DOI: 10.1097/hep.0000000000000617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 08/13/2023] [Indexed: 10/06/2023]
Abstract
Excessive alcohol use is a major risk factor for the development of an alcohol use disorder (AUD) and contributes to a wide variety of other medical illnesses, including alcohol-associated liver disease (ALD). Both AUD and ALD are complex and causally interrelated diseases, and multiple factors other than alcohol consumption are implicated in the disease pathogenesis. While the underlying pathophysiology of AUD and ALD is complex, there is substantial evidence for a genetic susceptibility of both diseases. Current genome-wide association studies indicate that the genes associated with clinical AUD only poorly overlap with the genes identified for heavy drinking and, in turn, neither overlap with the genes identified for ALD. Uncovering the main genetic factors will enable us to identify molecular drivers underlying the pathogenesis, discover potential targets for therapy, and implement patient care early in disease progression. In this review, we described multiple genomic approaches and their implications to investigate the susceptibility and pathogenesis of both AUD and ALD. We concluded our review with a discussion of the knowledge gaps and future research on genomic studies in these 2 diseases.
Affiliation(s)
- Daniel Rotroff
  - Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
- Tae-Hwi Schwantes-An
  - Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN
- Ramon Bataller
  - Liver Unit, Institut of Digestive and Metabolic Diseases, Hospital Clinic, Barcelona, Spain
  - Institut d’Investigacions Biomediques August Pi i Sunyer (IDIBAPS)
- David Goldman
  - Laboratory of Neurogenetics and Office of the Clinical Director, National Institute on Alcohol Abuse and Alcoholism, Rockville, MD
- Laura E. Nagy
  - Center for Liver Disease Research, Department of Inflammation and Immunity, Cleveland Clinic, Cleveland, OH
  - Gastroenterology and Hepatology, Cleveland Clinic, Cleveland, OH
  - Department of Molecular Medicine, Case Western Reserve University, Cleveland, OH
- Suthat Liangpunsakul
  - Division of Gastroenterology and Hepatology, Department of Medicine, Indianapolis, IN
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN
  - Roudebush Veterans Administration Medical Center, Indianapolis, IN
157
Huang M, Lyu C, Liu N, Nembhard WN, Witte JS, Hobbs CA, Li M. A gene-based association test of interactions for maternal-fetal genotypes identifies genes associated with nonsyndromic congenital heart defects. Genet Epidemiol 2023; 47:475-495. [PMID: 37341229 PMCID: PMC11781787 DOI: 10.1002/gepi.22533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 04/13/2023] [Accepted: 06/02/2023] [Indexed: 06/22/2023]
Abstract
The risk of congenital heart defects (CHDs) may be influenced by maternal genes, fetal genes, and their interactions. Existing methods commonly test the effects of maternal and fetal variants one-at-a-time and may have reduced statistical power to detect genetic variants with low minor allele frequencies. In this article, we propose a gene-based association test of interactions for maternal-fetal genotypes (GATI-MFG) using a case-mother and control-mother design. GATI-MFG can integrate the effects of multiple variants within a gene or genomic region and evaluate the joint effect of maternal and fetal genotypes while allowing for their interactions. In simulation studies, GATI-MFG had improved statistical power over alternative methods, such as the single-variant test and functional data analysis (FDA) under various disease scenarios. We further applied GATI-MFG to a two-phase genome-wide association study of CHDs for the testing of both common variants and rare variants using 947 CHD case mother-infant pairs and 1306 control mother-infant pairs from the National Birth Defects Prevention Study (NBDPS). After Bonferroni adjustment for 23,035 genes, two genes on chromosome 17, TMEM107 (p = 1.64e-06) and CTC1 (p = 2.0e-06), were identified for significant association with CHD in common variants analysis. Gene TMEM107 regulates ciliogenesis and ciliary protein composition and was found to be associated with heterotaxy. Gene CTC1 plays an essential role in protecting telomeres from degradation, which was suggested to be associated with cardiogenesis. Overall, GATI-MFG outperformed the single-variant test and FDA in the simulations, and the results of application to NBDPS samples are consistent with existing literature supporting the association of TMEM107 and CTC1 with CHDs.
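The Bonferroni adjustment mentioned in this abstract can be checked with simple arithmetic. The sketch below assumes a family-wise significance level of 0.05, which the abstract does not state; the gene count and the two p-values are taken from the abstract, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05      # assumed family-wise level; not stated in the abstract
N_GENES = 23035   # number of genes tested, from the abstract

threshold = bonferroni_threshold(ALPHA, N_GENES)

# Gene-level p-values reported in the abstract.
p_values = {"TMEM107": 1.64e-06, "CTC1": 2.0e-06}
significant = {gene: p < threshold for gene, p in p_values.items()}
```

Under that assumed level the per-gene threshold is 0.05 / 23,035, roughly 2.17e-06, so both reported p-values fall below it.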
Affiliation(s)
- Manyan Huang
  - Department of Epidemiology and Biostatistics, Indiana University Bloomington, Bloomington, Indiana, USA
- Chen Lyu
  - Department of Population Health, New York University Grossman School of Medicine, New York City, New York, USA
- Nianjun Liu
  - Department of Epidemiology and Biostatistics, Indiana University Bloomington, Bloomington, Indiana, USA
- Wendy N. Nembhard
  - Department of Epidemiology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- John S. Witte
  - Department of Epidemiology and Population Health, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Sciences, Stanford University, Stanford, California, USA
- Charlotte A. Hobbs
  - Rady Children's Institute for Genomic Medicine, San Diego, California, USA
- Ming Li
  - Department of Epidemiology and Biostatistics, Indiana University Bloomington, Bloomington, Indiana, USA
158
Zhu Y, Ryu S, Tare A, Barzilai N, Atzmon G, Suh Y. Targeted sequencing of the 9p21.3 region reveals association with reduced disease risks in Ashkenazi Jewish centenarians. Aging Cell 2023; 22:e13962. [PMID: 37605876 PMCID: PMC10577543 DOI: 10.1111/acel.13962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 07/22/2023] [Accepted: 08/01/2023] [Indexed: 08/23/2023]
Abstract
Genome-wide association studies (GWAS) have pinpointed the chromosomal locus 9p21.3 as a genetic hotspot for various age-related disorders. Common genetic variants in this locus are linked to multiple traits, including coronary artery diseases, cancers, and diabetes. Centenarians are known for their reduced risk and delayed onset of these conditions. To investigate whether this evasion of disease risks involves diminished genetic risks in the 9p21.3 locus, we sequenced this region in an Ashkenazi Jewish centenarian cohort (centenarians: n = 450, healthy controls: n = 500). Risk alleles associated with cancers, glaucoma, CAD, and T2D showed a significant depletion in centenarians. Furthermore, the risk and non-risk genotypes are linked to two distinct low-frequency variant profiles, enriched in controls and centenarians, respectively. Our findings provide evidence that the extreme longevity cohort is associated with collectively lower risks of multiple age-related diseases in the 9p21.3 locus.
Affiliation(s)
- Yizhou Zhu
  - Department of Obstetrics and Gynecology, Columbia University, New York City, New York, USA
- Seungjin Ryu
  - Department of Pharmacology, College of Medicine, Hallym University, Chuncheon, Gangwon, Korea
- Archana Tare
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Nir Barzilai
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
  - Institute for Aging Research, Albert Einstein College of Medicine, Bronx, New York, USA
  - Department of Medicine, Albert Einstein College of Medicine, Bronx, New York, USA
- Gil Atzmon
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
  - Department of Medicine, Albert Einstein College of Medicine, Bronx, New York, USA
  - Department of Human Biology, Faculty of Natural Sciences, University of Haifa, Haifa, Israel
- Yousin Suh
  - Department of Obstetrics and Gynecology, Columbia University, New York City, New York, USA
  - Department of Genetics and Development, Columbia University, New York City, New York, USA
159
Jin X, Shi G. Cauchy combination methods for the detection of gene-environment interactions for rare variants related to quantitative phenotypes. Heredity (Edinb) 2023; 131:241-252. [PMID: 37481617 PMCID: PMC10539363 DOI: 10.1038/s41437-023-00640-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 07/09/2023] [Accepted: 07/12/2023] [Indexed: 07/24/2023]
Abstract
The characterization of gene-environment interactions (GEIs) can provide detailed insights into the biological mechanisms underlying complex diseases. Despite recent interest in GEIs for rare variants, published GEI tests are underpowered for an extremely small proportion of causal rare variants in a gene or a region. By extending the aggregated Cauchy association test (ACAT), we propose three GEI tests to address this issue: a Cauchy combination GEI test with fixed main effects (CCGEI-F), a Cauchy combination GEI test with random main effects (CCGEI-R), and an omnibus Cauchy combination GEI test (CCGEI-O). ACAT was applied to combine p values of single-variant GEI analyses to obtain CCGEI-F and CCGEI-R and p values of multiple GEI tests were combined in CCGEI-O. Through numerical simulations, for small numbers of causal variants, CCGEI-F, CCGEI-R and CCGEI-O provided approximately 5% higher power than the existing GEI tests INT-FIX and INT-RAN; however, they had slightly higher power than the existing GEI test TOW-GE. For large numbers of causal variants, although CCGEI-F and CCGEI-R exhibited comparable or slightly lower power values than the competing tests, the results were still satisfactory. Among all simulation conditions evaluated, CCGEI-O provided significantly higher power than that of competing GEI tests. We further applied our GEI tests in genome-wide analyses of systolic blood pressure or diastolic blood pressure to detect gene-body mass index (BMI) interactions, using whole-exome sequencing data from UK Biobank. At a suggestive significance level of 1.0 × 10⁻⁴, KCNC4, GAR1, FAM120AOS and NT5C3B showed interactions with BMI by our GEI tests.
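The Cauchy combination step that ACAT uses to merge p-values can be sketched in a few lines. This is a generic illustration of the standard Cauchy combination formula, not the CCGEI implementation: the function name is hypothetical, equal weights are assumed by default, and the numerical safeguards that production code applies to extremely small p-values are omitted.

```python
import math
from typing import Optional, Sequence


def cauchy_combination(
    p_values: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Combine p-values with the Cauchy combination rule.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with the weights normalised to sum to 1;
    the combined p-value is 0.5 - arctan(T) / pi.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * k  # equal weights, assumed for illustration
    if len(weights) != k or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative, one per p-value")
    total = sum(weights)
    stat = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    return 0.5 - math.atan(stat) / math.pi


# A single p-value is returned unchanged (up to floating-point error).
single = cauchy_combination([0.03])
# One strong signal among weak ones still yields a small combined p-value.
mixed = cauchy_combination([0.01, 0.9])
```

The tangent transform gives very small p-values a heavy-tailed weight, so one strong single-variant signal is not averaged away by many null variants. This matches the sparse-causal-variant setting the abstract targets.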
Affiliation(s)
- Xiaoqin Jin
  - State Key Laboratory of Integrated Services Networks, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
- Gang Shi
  - State Key Laboratory of Integrated Services Networks, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
160
Liang X, Sun H. Weighted Selection Probability to Prioritize Susceptible Rare Variants in Multi-Phenotype Association Studies with Application to a Soybean Genetic Data Set. J Comput Biol 2023; 30:1075-1088. [PMID: 37871292 DOI: 10.1089/cmb.2022.0487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Rare variant association studies with multiple traits or diseases have drawn a lot of attention since association signals of rare variants can be boosted if more than one phenotype outcome is associated with the same rare variants. Most of the existing statistical methods to identify rare variants associated with multiple phenotypes are based on a group test, where pre-specified genetic regions are tested one at a time. However, these methods are not designed to locate susceptible rare variants within the genetic region. In this article, we propose a new statistical method to prioritize rare variants within a genetic region when a group test for the genetic region identifies a statistical association with multiple phenotypes. The method computes the weighted selection probability (WSP) of individual rare variants and ranks them from largest to smallest according to their WSP. In simulation studies, we demonstrated that the proposed method outperforms other statistical methods in terms of true positive selection when multiple phenotypes are correlated with each other. We also applied it to our soybean single nucleotide polymorphism (SNP) data with 13 highly correlated amino acids, where we identified some potentially susceptible rare variants in chromosome 19.
Affiliation(s)
- Xianglong Liang
  - Department of Statistics, Pusan National University, Busan, Korea
- Hokeun Sun
  - Department of Statistics, Pusan National University, Busan, Korea
161
Chi J, Xu M, Sheng X, Zhou Y. Association detection between multiple traits and rare variants based on family data via a nonparametric method. PeerJ 2023; 11:e16040. [PMID: 37780393 PMCID: PMC10541022 DOI: 10.7717/peerj.16040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 08/15/2023] [Indexed: 10/03/2023]
Abstract
Background: The rapid development of next-generation sequencing technologies allows researchers to analyze complex human diseases at the molecular level. It has been shown that rare variants, in addition to common variants, play important roles in human diseases. Thus, effective statistical methods are needed to test for associations between traits (e.g., diseases) and rare variants. Currently, more and more rare genetic variants are being detected throughout the human genome, which makes the study of rare variants feasible. Yet complex diseases are usually measured in a variety of forms, such as binary, ordinal, quantitative, or some mixture of them. Therefore, the genetic mapping problem can be framed as association detection between multiple traits and multiple loci, with sufficient consideration of the correlation structure among the traits. Methods: In this article, we construct a new non-parametric statistic based on the generalized Kendall's τ theory for family data. The new test statistic has an asymptotic distribution and can be used to study the associations between multiple traits and rare variants, which broadens the way to identify genetic factors of complex human diseases. Results: We apply our method (called Nonp-FAM) to simulated data and GAW17 data and conduct a comprehensive comparison with some existing methods. Experimental results show that the proposed family-based method is powerful and robust for testing associations between multiple traits and rare variants, even if the data have some population stratification effect.
Affiliation(s)
- Jinling Chi
  - Department of Statistics, Heilongjiang University, Harbin, China
  - School of Mathematics and Statistics, Xidian University, Xi’an, China
- Meijuan Xu
  - Department of Statistics, Heilongjiang University, Harbin, China
- Xiaona Sheng
  - School of Information Engineering, Harbin University, Harbin, China
- Ying Zhou
  - Department of Statistics, Heilongjiang University, Harbin, China
162
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023]
Abstract
Over the past years, progress in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have been recently proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic locus) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
163
Hong H, Schulze KV, Copeland IE, Atyam M, Kamp K, Hanchard NA, Belmont J, Ringel-Kulka T, Heitkemper M, Shulman RJ. Genetic Variants in Carbohydrate Digestive Enzyme and Transport Genes Associated with Risk of Irritable Bowel Syndrome. medRxiv 2023:2023.09.20.23295800. [PMID: 37790351 PMCID: PMC10543038 DOI: 10.1101/2023.09.20.23295800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Irritable Bowel Syndrome (IBS) is characterized by abdominal pain and alterations in bowel pattern, such as constipation (IBS-C), diarrhea (IBS-D), or mixed (IBS-M). Since malabsorption of ingested carbohydrates (CHO) can cause abdominal symptoms that closely mimic those of IBS, identifying genetic mutations in CHO digestive enzymes associated with IBS symptoms is critical to ascertain IBS pathophysiology. Through candidate gene association studies, we identify several common variants in TREH, SI, SLC5A1 and SLC2A5 that are associated with IBS symptoms. By investigating rare recessive Mendelian or oligogenic inheritance patterns, we identify case-exclusive rare deleterious variation in known disease genes (SI, LCT, ALDOB, and SLC5A1) as well as candidate disease genes (MGAM and SLC5A2), providing potential evidence of monogenic or oligogenic inheritance in a subset of IBS cases. Finally, our data highlight that moderate to severe IBS-associated gastrointestinal symptoms are often observed in IBS cases carrying one or more deleterious rare variants.
Affiliation(s)
- Hyejeong Hong
- Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing
| | | | - Ian E. Copeland
- Department of Molecular and Human Genetics, Baylor College of Medicine
| | - Manasa Atyam
- Department of Medicine, Baylor College of Medicine
| | - Kendra Kamp
- Department of Biobehavioral Nursing and Health Informatics, University of Washington School of Nursing
| | - Neil A. Hanchard
- Department of Molecular and Human Genetics, Baylor College of Medicine
| | - John Belmont
- Departments of Molecular and Human Genetics and Pediatrics, Baylor College of Medicine
| | - Tamar Ringel-Kulka
- Department of Maternal and Child Health, University of North Carolina at Chapel Hill Gillings School of Global Public Health
| | - Margaret Heitkemper
- Department of Biobehavioral Nursing and Health Informatics, University of Washington School of Nursing
| | - Robert J. Shulman
- Children’s Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine
| |
|
164
|
Wang H, Dombroski BA, Cheng PL, Tucci A, Si YQ, Farrell JJ, Tzeng JY, Leung YY, Malamon JS, Wang LS, Vardarajan BN, Farrer LA, Schellenberg GD, Lee WP. Structural Variation Detection and Association Analysis of Whole-Genome-Sequence Data from 16,905 Alzheimer's Diseases Sequencing Project Subjects. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.09.13.23295505. [PMID: 37745545 PMCID: PMC10516060 DOI: 10.1101/2023.09.13.23295505] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 09/26/2023]
Abstract
Structural variations (SVs) are important contributors to the genetics of numerous human diseases. However, their role in Alzheimer's disease (AD) remains largely unstudied due to challenges in accurately detecting SVs. Here, we analyzed whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP, N=16,905 subjects) and identified 400,234 (168,223 high-quality) SVs. We found a significant burden of deletions and duplications in AD cases (OR=1.05, P=0.03), particularly for singletons (OR=1.12, P=0.0002) and homozygous events (OR=1.10, P<0.0004). On AD genes, the ultra-rare SVs, including protein-altering SVs in ABCA7, APP, PLCG2, and SORL1, were associated with AD (SKAT-O P=0.004). Twenty-one SVs are in linkage disequilibrium (LD) with known AD-risk variants, e.g., a deletion (chr2:105731359-105736864) in complete LD (R2=0.99) with rs143080277 (chr2:105749599) in NCK2. We also identified 16 SVs associated with AD and 13 SVs associated with AD-related pathological/cognitive endophenotypes. Our findings demonstrate the broad impact of SVs on AD genetics.
Affiliation(s)
- Hui Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Beth A Dombroski
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Po-Liang Cheng
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Albert Tucci
- Bioinformatics Research Center, North Carolina State University, NC 27695, USA
| | - Ya-Qin Si
- Bioinformatics Research Center, North Carolina State University, NC 27695, USA
| | - John J Farrell
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, MA 02118, USA
| | - Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, NC 27695, USA
| | - Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - John S Malamon
- Department of Surgery, School of Medicine, University of Colorado, CO 80045, USA
| | - Li-San Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Badri N Vardarajan
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, College of Physicians and Surgeons, Columbia University, NY 10032, USA
- Department of Neurology, College of Physicians and Surgeons, Columbia University and the New York Presbyterian Hospital, NY 10032, USA
| | - Lindsay A Farrer
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, MA 02118, USA
- Department of Neurology, Boston University School of Medicine, MA 02118, USA
- Department of Ophthalmology, Boston University School of Medicine, MA 02118, USA
- Department of Biostatistics, Boston University School of Public Health, MA 02118, USA
- Department of Epidemiology, Boston University School of Public Health, MA 02118, USA
| | - Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| |
|
165
|
Bass AJ, Bian S, Wingo AP, Wingo TS, Cutler DJ, Epstein MP. Identifying latent genetic interactions in genome-wide association studies using multiple traits. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.11.557155. [PMID: 37745553 PMCID: PMC10515795 DOI: 10.1101/2023.09.11.557155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Genome-wide association studies of complex traits frequently find that SNP-based estimates of heritability are considerably smaller than estimates from classic family-based studies. This 'missing' heritability may be partly explained by genetic variants interacting with other genes or environments that are difficult to specify, observe, and detect. To circumvent these challenges, we propose a new method to detect genetic interactions that leverages pleiotropy from multiple related traits without requiring the interacting variable to be specified or observed. Our approach, Latent Interaction Testing (LIT), uses the observation that correlated traits with shared latent genetic interactions have trait variance and covariance patterns that differ by genotype. LIT examines the relationship between trait variance/covariance patterns and genotype using a flexible kernel-based framework that is computationally scalable for biobank-sized datasets with a large number of traits. We first use simulated data to demonstrate that LIT substantially increases power to detect latent genetic interactions compared to a trait-by-trait univariate method. We then apply LIT to four obesity-related traits in the UK Biobank and detect genetic variants with interactive effects near known obesity-related genes. Overall, we show that LIT, implemented in the R package lit, uses shared information across traits to improve detection of latent genetic interactions compared to standard approaches.
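As a rough illustration of the idea behind LIT (not the authors' implementation, which is distributed as the R package lit and uses a kernel-based test), the sketch below builds per-individual squared and cross-product terms from standardized traits and correlates them with genotype. The function names `cross_product_features` and `feature_genotype_correlation` are hypothetical, the data are simulated, and a plain Pearson correlation stands in for the kernel framework described in the abstract.

```python
import numpy as np


def cross_product_features(traits):
    """Return squared terms and pairwise cross-products of standardized traits.

    traits: (n_individuals, n_traits) array.
    Output: (n_individuals, n_traits * (n_traits + 1) / 2) array whose columns
    track each individual's contribution to trait variances and covariances.
    """
    Y = np.asarray(traits, dtype=float)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    rows, cols = np.triu_indices(Y.shape[1])
    return Y[:, rows] * Y[:, cols]


def feature_genotype_correlation(features, genotype):
    """Pearson correlation between each variance/covariance feature and genotype."""
    g = np.asarray(genotype, dtype=float)
    g = g - g.mean()
    F = features - features.mean(axis=0)
    return (F.T @ g) / (np.linalg.norm(F, axis=0) * np.linalg.norm(g))


# Toy data (simulated, purely illustrative): a latent variable z interacts
# with genotype g and affects two traits, so their covariance grows with g.
rng = np.random.default_rng(0)
n = 2000
g = rng.binomial(2, 0.3, size=n)
z = rng.normal(size=n)
traits = np.column_stack([g * z + rng.normal(size=n),
                          g * z + rng.normal(size=n)])
features = cross_product_features(traits)
corr = feature_genotype_correlation(features, g)
```

The column order follows `np.triu_indices`, so for two traits the middle entry of `corr` is the cross-product (covariance) term. A positive value there is the kind of genotype-dependent covariance pattern the abstract describes, obtained here without ever observing `z`.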
Affiliation(s)
- Andrew J. Bass
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
| | - Shijia Bian
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
| | - Aliza P. Wingo
- Department of Psychiatry, Emory University, Atlanta, GA 30322, USA
| | - Thomas S. Wingo
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Department of Neurology, Emory University, Atlanta, GA 30322, USA
| | - David J. Cutler
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
| | | |
|
166
|
Aldisi R, Hassanin E, Sivalingam S, Buness A, Klinkhammer H, Mayr A, Fröhlich H, Krawitz P, Maj C. Gene-based burden scores identify rare variant associations for 28 blood biomarkers. BMC Genom Data 2023; 24:50. [PMID: 37667186 PMCID: PMC10476296 DOI: 10.1186/s12863-023-01155-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 08/28/2023] [Indexed: 09/06/2023] Open
Abstract
BACKGROUND A relevant part of the genetic architecture of complex traits is still unknown, despite the discovery of many disease-associated common variants. Polygenic risk score (PRS) models are based on the evaluation of the additive effects attributable to common variants and have been successfully implemented to assess the genetic susceptibility for many phenotypes. In contrast, burden tests are often used to identify an enrichment of rare deleterious variants in specific genes. Both kinds of genetic contributions are typically analyzed independently. Many studies suggest that complex phenotypes are influenced by both low-effect common variants and high-effect rare deleterious variants. The aim of this paper is to integrate the effect of both common and rare functional variants for more comprehensive genetic risk modeling. METHODS We developed a framework combining gene-based scores based on the enrichment of rare functionally relevant variants with genome-wide PRS based on common variants for association analysis and prediction models. We applied our framework to the UK Biobank dataset with genotyping and exome data and considered the levels of 28 blood biomarkers as target phenotypes. For each biomarker, an association analysis was performed on the full cohort using gene-based scores (GBS). The cohort was then split into 3 subsets for PRS construction and feature selection, predictive model training, and independent evaluation, respectively. Prediction models were generated including either PRS, GBS or both (combined). RESULTS Association analyses of the cohort were able to detect significant genes that were previously known to be associated with different biomarkers. Interestingly, the analyses also revealed heterogeneous effect sizes and directionality, highlighting the complexity of blood biomarker regulation. However, the combined models for many biomarkers show little or no improvement in prediction accuracy compared to the PRS models.
CONCLUSION This study shows that rare variants play an important role in the genetic architecture of complex multifactorial traits such as blood biomarkers. However, while rare deleterious variants play a strong role at an individual level, our results indicate that classical common variant based PRS might be more informative to predict the genetic susceptibility at the population level.
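The comparison of PRS-only, GBS-only and combined prediction models can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the burden score is taken to be a plain count of rare alleles per gene, the models are ordinary least squares fits, the cohort is simulated, and a two-way train/test split replaces the three-way split described in the abstract. The helper names `gene_burden_score`, `fit_linear` and `r_squared` are hypothetical.

```python
import numpy as np


def gene_burden_score(rare_genotypes):
    """Sum rare-allele counts per individual across the variants of one gene."""
    return np.asarray(rare_genotypes, dtype=float).sum(axis=1)


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def r_squared(beta, X, y):
    """Proportion of variance in y explained by the fitted model on (X, y)."""
    design = np.column_stack([np.ones(len(y)), X])
    residual = y - design @ beta
    return 1.0 - residual.var() / y.var()


# Simulated toy cohort (all values are made up for illustration).
rng = np.random.default_rng(1)
n = 1000
prs = rng.normal(size=n)                      # common-variant score
rare = rng.binomial(1, 0.01, size=(n, 20))    # 20 rare variants in one gene
gbs = gene_burden_score(rare)
biomarker = 0.5 * prs + 1.5 * gbs + rng.normal(size=n)

# Fit on the first half, evaluate on the held-out second half.
train, test = slice(0, n // 2), slice(n // 2, n)
models = {"PRS": prs[:, None], "GBS": gbs[:, None],
          "combined": np.column_stack([prs, gbs])}
scores = {}
for name, X in models.items():
    beta = fit_linear(X[train], biomarker[train])
    scores[name] = r_squared(beta, X[test], biomarker[test])
```

Evaluating on held-out individuals matters here: in-sample R² can never decrease when a predictor is added, so only an independent evaluation set can show whether the gene-based score adds predictive value beyond the PRS.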
Affiliation(s)
- Rana Aldisi
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany.
| | - Emadeldin Hassanin
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Luxembourg Center for Systems Biomedicine, University of Luxembourg, Esch-Sur-Alzette, Luxembourg
| | - Sugirthan Sivalingam
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Core Unit for Bioinformatics Analysis, University Hospital Bonn, Bonn, Germany
- Institute of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
| | - Andreas Buness
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Core Unit for Bioinformatics Analysis, University Hospital Bonn, Bonn, Germany
- Institute of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
| | - Hannah Klinkhammer
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Institute of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
| | - Andreas Mayr
- Institute of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
| | - Holger Fröhlich
- Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Bonn-Aachen International Center for IT (b-it), University of Bonn, Bonn, Germany
| | - Peter Krawitz
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
| | - Carlo Maj
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Centre for Human Genetics, University of Marburg, Marburg, Germany
| |
|
167
|
Bocher O, Marenne G, Génin E, Perdry H. Ravages: An R package for the simulation and analysis of rare variants in multicategory phenotypes. Genet Epidemiol 2023; 47:450-460. [PMID: 37158367 DOI: 10.1002/gepi.22529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 03/27/2023] [Accepted: 04/25/2023] [Indexed: 05/10/2023]
Abstract
Current software packages for the analysis and simulation of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run on the whole genome thanks to a C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases, who can be stratified into several subgroups, and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on GitHub at https://github.com/genostats/Ravages.
Affiliation(s)
- Ozvan Bocher
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- Institute of Translational Genomics, Helmholtz Zentrum München, Munich, Germany
| | | | | | - Hervé Perdry
- CESP Inserm, U1018, UFR Médecine, Univ Paris-Sud, Université Paris-Saclay, Villejuif, France
| |
|
168
|
Xu J, Xu W, Choi J, Brhane Y, Christiani DC, Kothari J, McKay J, Field JK, Davies MPA, Liu G, Amos CI, Hung RJ, Briollais L. Large-scale whole exome sequencing studies identify two genes, CTSL and APOE, associated with lung cancer. PLoS Genet 2023; 19:e1010902. [PMID: 37738239 PMCID: PMC10516417 DOI: 10.1371/journal.pgen.1010902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 08/07/2023] [Indexed: 09/24/2023] Open
Abstract
Common genetic variants associated with lung cancer have been well studied in the past decade. However, only 12.3% heritability has been explained by these variants. In this study, we investigate the contribution of rare variants (RVs) (minor allele frequency <0.01) to lung cancer through two large whole exome sequencing case-control studies. We first performed gene-based association tests using a novel Bayes Factor statistic in the International Lung Cancer Consortium, the discovery study (European, 1042 cases vs. 881 controls). The top genes identified are further assessed in the UK Biobank (European, 630 cases vs. 172 864 controls), the replication study. After controlling for the false discovery rate, we found two genes, CTSL and APOE, significantly associated with lung cancer in both studies. Single variant tests in UK Biobank identified 4 RVs (3 missense variants) in CTSL and 2 RVs (1 missense variant) in APOE strongly associated with lung cancer (OR between 2.0 and 139.0). The role of these genetic variants in the regulation of CTSL or APOE expression remains unclear. If such a role is established, this could have important therapeutic implications for lung cancer patients.
Affiliation(s)
- Jingxiong Xu
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
| | - Wei Xu
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Jiyeon Choi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Yonathan Brhane
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
| | - David C. Christiani
- T. H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
| | - Jui Kothari
- Department of Environmental Health, T. H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
| | - James McKay
- International Agency for Research on Cancer, Lyon, France
| | - John K. Field
- Department of Molecular and Clinical Cancer Medicine, The University of Liverpool, Liverpool, United Kingdom
| | - Michael P. A. Davies
- Department of Molecular and Clinical Cancer Medicine, The University of Liverpool, Liverpool, United Kingdom
| | - Geoffrey Liu
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Christopher I. Amos
- Dan L. Duncan Comprehensive Cancer Center, Department of Medicine, Baylor College of Medicine, Houston, Texas, United States of America
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, Texas, United States of America
| | - Rayjean J. Hung
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Laurent Briollais
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| |
|
169
|
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of the results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in Whole Exome/Genome sequencing association studies and is able to handle a wide range of association models, providing researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
| | - Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
| | - Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- WELBIO department, WEL Research Institute, Wavre, Belgium
| |
|
170
|
Babadi M, Fu JM, Lee SK, Smirnov AN, Gauthier LD, Walker M, Benjamin DI, Zhao X, Karczewski KJ, Wong I, Collins RL, Sanchis-Juan A, Brand H, Banks E, Talkowski ME. GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data. Nat Genet 2023; 55:1589-1597. [PMID: 37604963 PMCID: PMC10904014 DOI: 10.1038/s41588-023-01449-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 08/30/2022] [Accepted: 06/16/2023] [Indexed: 08/23/2023]
Abstract
Copy number variants (CNVs) are major contributors to genetic diversity and disease. While standardized methods, such as the genome analysis toolkit (GATK), exist for detecting short variants, technical challenges have confounded uniform large-scale CNV analyses from whole-exome sequencing (WES) data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, we developed GATK-gCNV, a flexible algorithm to discover rare CNVs from sequencing read-depth information, complete with open-source distribution via GATK. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data, finding up to 95% recall of rare coding CNVs at a resolution of more than two exons. We used GATK-gCNV to generate a reference catalog of rare coding CNVs in WES data from 197,306 individuals in the UK Biobank, and observed strong correlations between per-gene CNV rates and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in WES data, with broad applications.
Affiliation(s)
- Mehrtash Babadi
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Jack M Fu
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Samuel K Lee
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Andrey N Smirnov
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Laura D Gauthier
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Mark Walker
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - David I Benjamin
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Xuefang Zhao
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Konrad J Karczewski
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Isaac Wong
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Ryan L Collins
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Alba Sanchis-Juan
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Harrison Brand
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Eric Banks
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Michael E Talkowski
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
171
|
Jiang Z, Zhang H, Ahearn TU, Garcia-Closas M, Chatterjee N, Zhu H, Zhan X, Zhao N. The sequence kernel association test for multicategorical outcomes. Genet Epidemiol 2023; 47:432-449. [PMID: 37078108 DOI: 10.1002/gepi.22527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 03/29/2023] [Accepted: 03/30/2023] [Indexed: 04/21/2023]
Abstract
Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set-based analysis methods for genome-wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. In this paper, we proposed a novel set-based association analysis method, sequence kernel association test (SKAT)-MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT-MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT-MC to the Polish breast cancer study (PBCS), and found that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER- breast cancer subtypes. We also investigated educational attainment using UK Biobank data (N = 127,127) with SKAT-MC, and identified 21 significant genes in the genome. Consequently, SKAT-MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT-MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.
Affiliation(s)
- Zhiwen Jiang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
| | - Thomas U Ahearn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
| | - Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Xiang Zhan
- Department of Biostatistics, Peking University, Beijing, China
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
| |
|
172
|
Hu X, Jiang X, Li J, Zhao N, Gan H, Hu X, Li L, Liu X, Shan H, Bai Y, Pang P. Identification of potential genetic Loci and polygenic risk model for Budd-Chiari syndrome in Chinese population. iScience 2023; 26:107287. [PMID: 37539039 PMCID: PMC10393737 DOI: 10.1016/j.isci.2023.107287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 05/19/2023] [Accepted: 07/02/2023] [Indexed: 08/05/2023] Open
Abstract
Budd-Chiari syndrome (BCS) is characterized by hepatic venous outflow obstruction, posing life-threatening risks in severe cases. Reported risk factors include inherited and acquired hypercoagulable states or other predisposing factors. However, many patients have no identifiable etiology, and causes of BCS differ between the West and East. This study recruited 500 BCS patients and 696 normal individuals for whole-exome sequencing and developed a polygenic risk scoring (PRS) model using PLINK, LASSOSUM, BLUP, and BayesA methods. Risk factors for venous thromboembolism and vascular malformations were also assessed for BCS risk prediction. Ultimately, we discovered potential BCS risk mutations, such as rs1042331, and the optimal BayesA-generated PRS model presented an AUC >0.9 in the external replication cohort. This model provides particular insights into genetic risk differences between China and the West and suggests shared genetic risks among BCS, venous thromboembolism, and vascular malformations, offering different perspectives on BCS pathogenesis.
Affiliation(s)
- Xiaojun Hu
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Xiaosen Jiang
- BGI-Shenzhen, Shenzhen, China
- College of Life Sciences, University of the Chinese Academy of Sciences, Beijing, China
- Jia Li
- BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Hebei Industrial Technology Research Institute of Genomics in Maternal & Child Health, Shijiazhuang BGI Genomics Co., Ltd, Shijiazhuang, China
- Ni Zhao
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Hairun Gan
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Xinyan Hu
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Luting Li
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Xingtao Liu
- Changfeng Hospital of Jinjiang District, Chengdu, China
- Hong Shan
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Pengfei Pang
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Guangdong Provincial Key Laboratory of Biomedical Imaging, Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
- Guangdong Provincial Engineering Research Center of Molecular Imaging, Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
173
Fu B, Pazokitoroudi A, Sudarshan M, Liu Z, Subramanian L, Sankararaman S. Fast kernel-based association testing of non-linear genetic effects for biobank-scale data. Nat Commun 2023; 14:4936. [PMID: 37582955 PMCID: PMC10427662 DOI: 10.1038/s41467-023-40346-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 07/18/2023] [Indexed: 08/17/2023]
Abstract
Our knowledge of non-linear genetic effects on complex traits remains limited, in part, due to the modest power to detect such effects. While kernel-based tests offer a versatile approach to test for non-linear relationships between sets of genetic variants and traits, current approaches cannot be applied to Biobank-scale datasets containing hundreds of thousands of individuals. We propose FastKAST, a kernel-based approach that can test for non-linear effects of a set of variants on a quantitative trait. FastKAST provides calibrated hypothesis tests while enabling analysis of Biobank-scale datasets with hundreds of thousands of unrelated individuals from a homogeneous population. We apply FastKAST to 53 quantitative traits measured across ≈300K unrelated white British individuals in the UK Biobank to detect sets of variants with non-linear effects at genome-wide significance.
Affiliation(s)
- Boyang Fu
- Department of Computer Science, UCLA, Los Angeles, CA, USA.
- Mukund Sudarshan
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Zhengtong Liu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Lakshminarayanan Subramanian
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Department of Population Health, NYU Grossman School of Medicine, New York, NY, USA
- Sriram Sankararaman
- Department of Computer Science, UCLA, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
174
Stamp J, DenAdel A, Weinreich D, Crawford L. Leveraging the genetic correlation between traits improves the detection of epistasis in genome-wide association studies. G3 (Bethesda, Md.) 2023; 13:jkad118. [PMID: 37243672 PMCID: PMC10484060 DOI: 10.1093/g3journal/jkad118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 01/11/2023] [Accepted: 05/23/2023] [Indexed: 05/29/2023]
Abstract
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this study, we present the "multivariate MArginal ePIstasis Test" (mvMAPIT), a multioutcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multitrait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. With simulations, we illustrate the benefits of mvMAPIT over univariate (or single-trait) epistatic mapping strategies. We also apply the mvMAPIT framework to protein sequence data from two broadly neutralizing anti-influenza antibodies and approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics.
The mvMAPIT R package can be downloaded at https://github.com/lcrawlab/mvMAPIT.
Affiliation(s)
- Julian Stamp
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Alan DenAdel
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Daniel Weinreich
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02906, USA
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Biostatistics, Brown University, Providence, RI 02903, USA
- Microsoft Research New England, Cambridge, MA 02142, USA
175
McCaw ZR, O'Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. An allelic-series rare-variant association test for candidate-gene discovery. Am J Hum Genet 2023; 110:1330-1342. [PMID: 37494930 PMCID: PMC10432147 DOI: 10.1016/j.ajhg.2023.07.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/23/2022] [Revised: 06/30/2023] [Accepted: 07/01/2023] [Indexed: 07/28/2023]
Abstract
Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.
Affiliation(s)
- Francesco Paolo Casale
- Institute of AI for Health, Helmholtz Munich, Neuherberg, Germany; Helmholtz Pioneer Campus, Helmholtz Munich, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
176
Gupta R, Kanai M, Durham TJ, Tsuo K, McCoy JG, Kotrys AV, Zhou W, Chinnery PF, Karczewski KJ, Calvo SE, Neale BM, Mootha VK. Nuclear genetic control of mtDNA copy number and heteroplasmy in humans. Nature 2023; 620:839-848. [PMID: 37587338 PMCID: PMC10447254 DOI: 10.1038/s41586-023-06426-5] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 12/29/2022] [Accepted: 07/11/2023] [Indexed: 08/18/2023]
Abstract
Mitochondrial DNA (mtDNA) is a maternally inherited, high-copy-number genome required for oxidative phosphorylation [1]. Heteroplasmy refers to the presence of a mixture of mtDNA alleles in an individual and has been associated with disease and ageing. Mechanisms underlying common variation in human heteroplasmy, and the influence of the nuclear genome on this variation, remain insufficiently explored. Here we quantify mtDNA copy number (mtCN) and heteroplasmy using blood-derived whole-genome sequences from 274,832 individuals and perform genome-wide association studies to identify associated nuclear loci. Following blood cell composition correction, we find that mtCN declines linearly with age and is associated with variants at 92 nuclear loci. We observe that nearly everyone harbours heteroplasmic mtDNA variants obeying two principles: (1) heteroplasmic single nucleotide variants tend to arise somatically and accumulate sharply after the age of 70 years, whereas (2) heteroplasmic indels are maternally inherited as mixtures with relative levels associated with 42 nuclear loci involved in mtDNA replication, maintenance and novel pathways. These loci may act by conferring a replicative advantage to certain mtDNA alleles. As an illustrative example, we identify a length variant carried by more than 50% of humans at position chrM:302 within a G-quadruplex previously proposed to mediate mtDNA transcription/replication switching [2,3]. We find that this variant exerts cis-acting genetic control over mtDNA abundance and is itself associated in-trans with nuclear loci encoding machinery for this regulatory switch. Our study suggests that common variation in the nuclear genome can shape variation in mtCN and heteroplasmy dynamics across the human population.
Affiliation(s)
- Rahul Gupta
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Timothy J Durham
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kristin Tsuo
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Jason G McCoy
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anna V Kotrys
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Wei Zhou
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Patrick F Chinnery
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
- MRC Mitochondrial Biology Unit, University of Cambridge, Cambridge, UK
- Konrad J Karczewski
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Sarah E Calvo
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Benjamin M Neale
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Vamsi K Mootha
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
177
Cuomo ASE, Nathan A, Raychaudhuri S, MacArthur DG, Powell JE. Single-cell genomics meets human genetics. Nat Rev Genet 2023; 24:535-549. [PMID: 37085594 PMCID: PMC10784789 DOI: 10.1038/s41576-023-00599-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Accepted: 03/29/2023] [Indexed: 04/23/2023]
Abstract
Single-cell genomic technologies are revealing the cellular composition, identities and states in tissues at unprecedented resolution. They have now scaled to the point that it is possible to query samples at the population level, across thousands of individuals. Combining single-cell information with genotype data at this scale provides opportunities to link genetic variation to the cellular processes underpinning key aspects of human biology and disease. This strategy has potential implications for disease diagnosis, risk prediction and development of therapeutic solutions. However, effectively integrating large-scale single-cell genomic data, genetic variation and additional phenotypic data will require advances in data generation and analysis methods. As single-cell genetics begins to emerge as a field in its own right, we review its current state and the challenges and opportunities ahead.
Affiliation(s)
- Anna S E Cuomo
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Aparna Nathan
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Soumya Raychaudhuri
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Joseph E Powell
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, New South Wales, Australia.
178
Lo Faro V, Johansson T, Höglund J, Hadizadeh F, Johansson Å. Polygenic risk scores and risk stratification in deep vein thrombosis. Thromb Res 2023; 228:151-162. [PMID: 37331118 DOI: 10.1016/j.thromres.2023.06.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 05/18/2023] [Accepted: 06/09/2023] [Indexed: 06/20/2023]
Abstract
INTRODUCTION: Deep vein thrombosis (DVT) is a complex disease, where 60% of risk is due to genetic factors, such as the Factor V Leiden (FVL) variant. DVT is either asymptomatic or manifests with nonspecific symptoms and, if left untreated, leads to severe complications. The impact is dramatic, and there is still a research gap in DVT prevention. We characterized the genetic contribution and stratified individuals based on genetic makeup to evaluate whether it favorably impacts risk prediction.
METHODS: In the UK Biobank (UKB), we performed gene-based association tests using exome sequencing data, as well as a genome-wide association study. We also constructed polygenic risk scores (PRS) in a subset of the cohort (number of cases = 8231; number of controls = 276,360) and calculated the impact on the prediction capacity of the PRS in a non-overlapping part of the cohort (number of cases = 4342; number of controls = 142,822). We generated additional PRSs that excluded the known causative variants.
RESULTS: We discovered and replicated a novel common variant (rs11604583) near the region where the TRIM51 and LRRC55 genes are located and identified a novel rare variant (rs187725533) located near the CREB3L1 gene, associated with 2.5-fold higher risk of DVT. In one of the PRS models constructed, the top decile of risk is associated with 3.4-fold increased risk, an effect that is 2.3-fold when excluding FVL carriers. In the top PRS decile, the cumulative risk of DVT at the age of 80 years is 10% for FVL carriers, compared with 5% for non-carriers. The population attributable fraction of having a high polygenic risk on the rate of DVT was estimated to be around 20% in our cohort.
CONCLUSION: Individuals with a high polygenic risk of DVT, and not only carriers of well-studied variants such as FVL, may benefit from prevention strategies.
Affiliation(s)
- Valeria Lo Faro
- Department of Immunology, Genetics and Pathology, Genomics and Neurobiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
- Therese Johansson
- Department of Immunology, Genetics and Pathology, Genomics and Neurobiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden; Centre for Women's Mental Health during the Reproductive Lifespan - Womher, Uppsala University, Uppsala, Sweden
- Julia Höglund
- Department of Immunology, Genetics and Pathology, Genomics and Neurobiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Fatemeh Hadizadeh
- Department of Immunology, Genetics and Pathology, Genomics and Neurobiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Åsa Johansson
- Department of Immunology, Genetics and Pathology, Genomics and Neurobiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
179
Dai J, Wang T, Xu K, Sun Y, Li Z, Chen P, Wang H, Wu D, Chen Y, Xiao L, Liu H, Wei H, Li R, Peng L, Yu T, Wang Y, Sun Z, Wang DW. Machine learning modeling identifies hypertrophic cardiomyopathy subtypes with genetic signature. Front Med 2023; 17:768-780. [PMID: 37121957 DOI: 10.1007/s11684-023-0982-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 01/05/2023] [Indexed: 05/02/2023]
Abstract
Previous studies have revealed that patients with hypertrophic cardiomyopathy (HCM) exhibit differences in symptom severity and prognosis, indicating potential HCM subtypes among these patients. Here, 793 patients with HCM were recruited, with an average follow-up of 32.78 ± 27.58 months, to identify potential HCM subtypes by performing consensus clustering on the basis of their echocardiography features. Furthermore, we proposed a systematic method for illustrating the relationship between the phenotype and genotype of each HCM subtype by using machine learning modeling and interactome network detection techniques based on whole-exome sequencing data. Another independent cohort that consisted of 414 patients with HCM was recruited to replicate the findings. Consequently, two subtypes characterized by different clinical outcomes were identified in HCM. Patients with subtype 2 presented asymmetric septal hypertrophy associated with a stable course, while those with subtype 1 displayed left ventricular systolic dysfunction and aggressive progression. Machine learning modeling based on personal whole-exome data identified 46 genes with mutation burden that could accurately predict subtype propensities. Furthermore, the patients in another cohort predicted as subtype 1 by the 46-gene model presented increased left ventricular end-diastolic diameter and reduced left ventricular ejection fraction. By employing echocardiography and genetic screening for the 46 genes, HCM can be classified into two subtypes with distinct clinical outcomes.
Affiliation(s)
- Jiaqi Dai
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Tao Wang
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
- Ke Xu
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Yang Sun
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Zongzhe Li
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Peng Chen
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hong Wang
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Dongyang Wu
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Yanghui Chen
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Lei Xiao
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hao Liu
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Haoran Wei
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Rui Li
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Liyuan Peng
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Ting Yu
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Yan Wang
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
- Zhongsheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Dao Wen Wang
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China.
- Hubei Key Laboratory of Genetics and Molecular Mechanism of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China.
180
Hu Y, Yu Z, Gao X, Liu G, Zhang Y, Šmarda P, Guo Q. Genetic diversity, population structure, and genome-wide association analysis of ginkgo cultivars. Horticulture Research 2023; 10:uhad136. [PMID: 37564270 PMCID: PMC10410194 DOI: 10.1093/hr/uhad136] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/21/2022] [Accepted: 07/02/2023] [Indexed: 08/12/2023]
Abstract
Ginkgo biloba is an economically valuable tree worldwide. The species nearly became extinct during the Quaternary, which has likely reduced its genetic variability. This variability is now conserved in a few natural populations in China and in a number of cultivars that are, however, derived from a few ancient trees that helped the species survive in China through medieval times. Despite the recent interest in ginkgo, detailed knowledge of the genetic diversity conserved in cultivated trees and cultivars has remained poor. This limits efficient conservation of its diversity as well as efficient use of the existing germplasm resources. Here we performed genotyping-by-sequencing (GBS) on 102 cultivated germplasms of ginkgo to explore their genetic structure, kinship, and inbreeding prediction. For the first time in ginkgo, a genome-wide association study (GWAS) was used to attempt gene mapping of seed traits. The results showed that most of the germplasms did not show any obvious genetic relationship. The size of the ginkgo germplasm population expanded significantly around 1500 years ago during the Sui and Tang dynasties. Classification of seed cultivars from a phylogenetic perspective does not support the current classification criteria based on phenotype. Twenty-four candidate genes were localized after performing GWAS on the seed traits. Overall, this study reveals the genetic basis of ginkgo seed traits and provides insights into its cultivation history. These findings will facilitate the conservation and utilization of the domesticated germplasms of this living fossil plant.
Affiliation(s)
- Yaping Hu
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
- Zhaoyan Yu
- Coconut Research Institute of Chinese Academy of Tropical Agricultural Science, Wenchang, Hainan 571339, China
- Xiaoge Gao
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
- Ganping Liu
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
- Yun Zhang
- Institute of Grassland, Flowers, and Ecology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- Petr Šmarda
- Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, Brno 61137, Czech Republic
- Qirong Guo
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
181
Devogel N, Auer PL, Manansala R, Wang T. On asymptotic distributions of several test statistics for familial relatedness in linear mixed models. Stat Med 2023; 42:2962-2981. [PMID: 37345498 DOI: 10.1002/sim.9762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Revised: 03/16/2023] [Accepted: 04/26/2023] [Indexed: 06/23/2023]
Abstract
In this study, the asymptotic distributions of the likelihood ratio test (LRT), the restricted likelihood ratio test (RLRT), the F and the sequence kernel association test (SKAT) statistics for testing an additive effect of the expected familial relatedness (FR) in a linear mixed model (LMM) are examined based on an eigenvalue approach. First, the covariance structure for modeling the FR effect in an LMM is presented. Then, the multiplicity of eigenvalues for the log-likelihood and restricted log-likelihood is established under a replicate family setting and extended to a more general replicate family setting (GRFS) as well. After that, the asymptotic null distributions of LRT, RLRT, F and SKAT statistics under GRFS are derived. The asymptotic null distribution of SKAT for testing genetic rare variants is also constructed. In addition, a simple formula for sample size calculation is provided based on the restricted maximum likelihood estimate of the effect size for the expected FR. Finally, a power comparison of these test statistics for testing the expected FR effect is made via simulation. The four test statistics are also applied to a data set from the UK Biobank.
Affiliation(s)
- Nicholas Devogel
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
| | - Paul L Auer
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
| | - Regina Manansala
- Centre for Health Economics Research & Modelling Infectious Diseases, Vaccine & Infectious Disease Institute WHO Collaborating Centre, Faculty of Medicine & Health Sciences, University of Antwerp, Antwerp, Belgium
| | - Tao Wang
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
|
182
|
Chakraborty S, Kahali B. Exome-wide analysis reveals role of LRP1 and additional novel loci in cognition. HGG ADVANCES 2023; 4:100208. [PMID: 37305557 PMCID: PMC10248556 DOI: 10.1016/j.xhgg.2023.100208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 05/16/2023] [Indexed: 06/13/2023] Open
Abstract
Cognitive functioning is heritable, with metabolic risk factors known to accelerate age-associated cognitive decline. Identifying genetic underpinnings of cognition is thus crucial. Here, we undertake single-variant and gene-based association analyses of 6 neurocognitive phenotypes across 6 cognition domains in whole-exome sequencing data from 157,160 individuals of the UK Biobank cohort to expound the genetic architecture of human cognition. We report 20 independent loci associated with 5 cognitive domains while controlling for APOE isoform-carrier status and metabolic risk factors; 18 of these were not previously reported, and they implicate genes relating to oxidative stress, synaptic plasticity and connectivity, and neuroinflammation. A subset of significant hits for cognition indicates mediating effects via metabolic traits. Some of these variants also exhibit pleiotropic effects on metabolic traits. We further identify previously unknown interactions of APOE variants with LRP1 (rs34949484 and others, suggestively significant), AMIGO1 (rs146766120; p.Ala25Thr, significant), and ITPR3 (rs111522866, significant), controlling for lipid and glycemic risks. Our gene-based analysis also suggests that APOC1 and LRP1 have plausible roles along shared pathways of amyloid beta (Aβ) and lipid and/or glucose metabolism in affecting complex processing speed and visual attention. In addition, we report pairwise suggestive interactions of variants harbored in these genes with APOE affecting visual attention. Our report based on this large-scale exome-wide study highlights the effects of neuronal genes, such as LRP1, AMIGO1, and other genomic loci, thus providing further evidence of the genetic underpinnings for cognition during aging.
Affiliation(s)
- Shreya Chakraborty
- Centre for Brain Research, Indian Institute of Science, Bangalore, Karnataka 560012, India
- Interdisciplinary Mathematical Sciences, Indian Institute of Science, Bangalore, Karnataka 560012, India
| | - Bratati Kahali
- Centre for Brain Research, Indian Institute of Science, Bangalore, Karnataka 560012, India
|
183
|
Zheng J, Wang X, Li J, Wu Y, Chang J, Xin J, Wang M, Wang T, Wei Q, Wang M, Zhang R. Rare variants confer shared susceptibility to gastrointestinal tract cancer risk. Front Oncol 2023; 13:1161639. [PMID: 37483484 PMCID: PMC10358854 DOI: 10.3389/fonc.2023.1161639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 06/12/2023] [Indexed: 07/25/2023] Open
Abstract
Background: Cancers arising within the gastrointestinal tract are complex disorders involving genetic events that cause the conversion of normal tissue to premalignant lesions and malignancy. Shared genetic features are reported in epithelial-based gastrointestinal cancers, which indicates common susceptibility among this group of malignancies. In addition, rare variants may account for part of the genetic susceptibility. Methods: A cross-cancer analysis of 38,171 shared rare genetic variants from genome-wide association assays was conducted, which included data from 3,194 cases and 1,455 controls across three cancer sites (esophageal, gastric and colorectal). The SNP-level association was performed by multivariate logistic regression analyses for single cancer, followed by association analysis for SubSETs (ASSET) to adjust for the bias of overlapping controls. Gene-level analyses were conducted by SKAT-O, with multiple comparison adjustments by false discovery rate (FDR). Based on the significant genes indicated by SKAT-O analysis, pathway analysis was conducted using the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome databases. Results: Meta-analysis in three gastrointestinal (GI) cancers identified 13 novel susceptibility loci that reached genome-wide significance (P_ASSET < 5×10⁻⁸). SKAT-O analysis revealed EXOC6, LRP5L and MIR1263/LINC01324 to be significant genes shared by GI cancers (P_adj < 0.05, P_FDR < 0.05). Furthermore, GO pathway analysis identified significant enrichment of synaptic transmission and neuron development pathways shared by all three cancer types. Conclusion: Rare variants and the corresponding genes potentially contribute to shared susceptibility in different GI cancer types. The discovery of these novel variants and genes offers new insights into the carcinogenic mechanisms and missing heritability of GI cancers.
Affiliation(s)
- Ji Zheng
- Department of Epidemiology, School of Public Health, Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Shanghai, China
| | - Xin Wang
- Department of Epidemiology, School of Public Health, Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Shanghai, China
- Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Jingrao Li
- Department of Epidemiology, School of Public Health, Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Shanghai, China
| | - Yuanna Wu
- Department of Biological Sciences, Dedman College of Humanities and Sciences, Southern Methodist University, Dallas, TX, United States
| | - Jiang Chang
- Department of Health Toxicology, Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Junyi Xin
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, School of Public Health, Nanjing Medical University, Nanjing, China
- Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Meilin Wang
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, School of Public Health, Nanjing Medical University, Nanjing, China
- Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
- The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou Municipal Hospital, Gusu School, Nanjing Medical University, Suzhou, China
| | - Tianpei Wang
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Qingyi Wei
- Duke Cancer Institute, Duke University Medical Center, Durham, NC, United States
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, United States
| | - Mengyun Wang
- Yiwu Research Institute of Fudan University, Yiwu, Zhejiang, China
- Cancer Institute, Fudan University Shanghai Cancer Center, Shanghai Medical College, Shanghai, China
| | - Ruoxin Zhang
- Department of Epidemiology, School of Public Health, Key Laboratory of Public Health Safety, Ministry of Education, Fudan University, Shanghai, China
- Yiwu Research Institute of Fudan University, Yiwu, Zhejiang, China
- Cancer Institute, Fudan University Shanghai Cancer Center, Shanghai Medical College, Shanghai, China
|
184
|
Tantawy M, Yang G, Algubelli RR, DeAvila G, Rubinstein SM, Cornell RF, Fradley MG, Siegel EM, Hampton OA, Silva AS, Lenihan D, Shain KH, Baz RC, Gong Y. Whole-Exome sequencing analysis identified TMSB10/TRABD2A locus to be associated with carfilzomib-related cardiotoxicity among patients with multiple myeloma. Front Cardiovasc Med 2023; 10:1181806. [PMID: 37408649 PMCID: PMC10319068 DOI: 10.3389/fcvm.2023.1181806] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/07/2023] [Accepted: 06/05/2023] [Indexed: 07/07/2023] Open
Abstract
Background: The proteasome inhibitor carfilzomib (CFZ) is effective in treating patients with refractory or relapsed multiple myeloma (MM) but has been associated with cardiovascular adverse events (CVAE) such as hypertension, cardiomyopathy, and heart failure. This study aimed to investigate the contribution of germline genetic variants in protein-coding genes to CFZ-CVAE among MM patients using whole-exome sequencing (WES) analysis. Methods: Exome-wide single-variant association analysis, gene-based analysis, and rare variant analyses were performed on 603,920 variants in 247 patients with MM who had been treated with CFZ and enrolled in the Oncology Research Information Exchange Network (ORIEN) at the Moffitt Cancer Center. Separate analyses were performed in European Americans and African Americans, followed by a trans-ethnic meta-analysis. Results: The most significant variant in the exome-wide single-variant analysis was a missense variant, rs7148, in the thymosin beta-10/TraB Domain Containing 2A (TMSB10/TRABD2A) locus. The effect allele of rs7148 was associated with a higher risk of CVAE [odds ratio (OR) = 9.3 with a 95% confidence interval of 3.9-22.3, p = 5.42 × 10⁻⁷]. MM patients with the rs7148 AG or AA genotype had a higher risk of CVAE (50%) than those with the GG genotype (10%). rs7148 is an expression quantitative trait locus (eQTL) for TRABD2A and TMSB10. The gene-based analysis also showed TRABD2A as the most significant gene associated with CFZ-CVAE (p = 1.06 × 10⁻⁶). Conclusions: We identified a missense SNP, rs7148, in the TMSB10/TRABD2A locus as associated with CFZ-CVAE in MM patients. More investigation is needed to understand the underlying mechanisms of these associations.
Affiliation(s)
- Marwa Tantawy
- Department of Pharmacotherapy and Translational Research and Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, FL, United States
| | - Guang Yang
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Raghunandan Reddy Algubelli
- Department of Malignant Hematology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, United States
| | - Gabriel DeAvila
- Department of Malignant Hematology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, United States
| | - Samuel M. Rubinstein
- Department of Medicine, Division of Hematology, University of North Carolina, Chapel Hill, NC, United States
| | - Robert F. Cornell
- Department of Medicine, Division of Hematology and Oncology, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Michael G. Fradley
- Cardio-Oncology Center of Excellence, Division of Cardiology, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, United States
| | - Erin M. Siegel
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, United States
| | - Oliver A. Hampton
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, United States
| | - Ariosto S. Silva
- Department of Cancer Physiology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, United States
| | - Daniel Lenihan
- Cape Cardiology Group, Saint Francis Medical Center, Cape Girardeau, MO, United States
| | - Kenneth H. Shain
- Department of Malignant Hematology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, United States
| | - Rachid C. Baz
- Department of Malignant Hematology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, United States
| | - Yan Gong
- Department of Pharmacotherapy and Translational Research and Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, FL, United States
- Cancer Control and Population Sciences, UF Health Cancer Center, University of Florida, Gainesville, FL, United States
|
185
|
Lu H, Zhang S, Jiang Z, Zeng P. Leveraging trans-ethnic genetic risk scores to improve association power for complex traits in underrepresented populations. Brief Bioinform 2023:bbad232. [PMID: 37332016 DOI: 10.1093/bib/bbad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 05/06/2023] [Accepted: 06/04/2023] [Indexed: 06/20/2023] Open
Abstract
Trans-ethnic genome-wide association studies have revealed that many loci identified in European populations can be reproducible in non-European populations, indicating widespread trans-ethnic genetic similarity. However, how to leverage such shared information more efficiently in association analysis is less investigated for traits in underrepresented populations. We here propose a statistical framework, trans-ethnic genetic risk score informed gene-based association mixed model (GAMM), by hierarchically modeling single-nucleotide polymorphism effects in the target population as a function of effects of the same trait in well-studied populations. GAMM powerfully integrates genetic similarity across distinct ancestral groups to enhance power in understudied populations, as confirmed by extensive simulations. We illustrate the usefulness of GAMM via the application to 13 blood cell traits (i.e. basophil count, eosinophil count, hematocrit, hemoglobin concentration, lymphocyte count, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, monocyte count, neutrophil count, platelet count, red blood cell count and total white blood cell count) in Africans of the UK Biobank (n = 3204) while utilizing genetic overlap shared in Europeans (n = 746 667) and East Asians (n = 162 255). We discovered multiple new associated genes, which had otherwise been missed by existing methods, and revealed that the trans-ethnic information indirectly contributed much to the phenotypic variance. Overall, GAMM represents a flexible and powerful statistical framework of association analysis for complex traits in underrepresented populations by integrating trans-ethnic genetic similarity across well-studied populations, and helps attenuate health inequities in current genetics research for people of minority populations.
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
|
186
|
Obry L, Dalmasso C. Weighted multiple testing procedures in genome-wide association studies. PeerJ 2023; 11:e15369. [PMID: 37337586 PMCID: PMC10276986 DOI: 10.7717/peerj.15369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 04/17/2023] [Indexed: 06/21/2023] Open
Abstract
Multiple testing procedures controlling the false discovery rate (FDR) are increasingly used in the context of genome wide association studies (GWAS), and weighted multiple testing procedures that incorporate covariate information are efficient to improve the power to detect associations. In this work, we evaluate some recent weighted multiple testing procedures in the specific context of GWAS through a simulation study. We also present a new efficient procedure called wBHa that prioritizes the detection of genetic variants with low minor allele frequencies while maximizing the overall detection power. The results indicate good performance of our procedure compared to other weighted multiple testing procedures. In particular, in all simulated settings, wBHa tends to outperform other procedures in detecting rare variants while maintaining good overall power. The use of the different procedures is illustrated with a real dataset.
Affiliation(s)
- Ludivine Obry
- Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
| | - Cyril Dalmasso
- Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
|
187
|
Trivellin G, Daly AF, Hernández-Ramírez LC, Araldi E, Tatsi C, Dale RK, Fridell G, Mittal A, Faucz FR, Iben JR, Li T, Vitali E, Stojilkovic SS, Kamenicky P, Villa C, Baussart B, Chittiboina P, Toro C, Gahl WA, Eugster EA, Naves LA, Jaffrain-Rea ML, de Herder WW, Neggers SJCMM, Petrossians P, Beckers A, Lania AG, Mains RE, Eipper BA, Stratakis CA. Germline loss-of-function PAM variants are enriched in subjects with pituitary hypersecretion. Front Endocrinol (Lausanne) 2023; 14:1166076. [PMID: 37388215 PMCID: PMC10303134 DOI: 10.3389/fendo.2023.1166076] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/14/2023] [Accepted: 04/10/2023] [Indexed: 07/01/2023] Open
Abstract
Introduction: Pituitary adenomas (PAs) are common, usually benign tumors of the anterior pituitary gland which, for the most part, have no known genetic cause. PAs are associated with major clinical effects due to hormonal dysregulation and tumoral impingement on vital brain structures. PAM encodes a multifunctional protein responsible for the essential C-terminal amidation of secreted peptides. Methods: Following the identification of a loss-of-function variant (p.Arg703Gln) in the peptidylglycine α-amidating monooxygenase (PAM) gene in a family with pituitary gigantism, we investigated 299 individuals with sporadic PAs and 17 familial isolated PA kindreds for PAM variants. Genetic screening was performed by germline and tumor sequencing and germline copy number variation (CNV) analysis. Results: In germline DNA, we detected seven heterozygous, likely pathogenic missense, truncating, and regulatory SNVs. These SNVs were found in sporadic subjects with growth hormone excess (p.Gly552Arg and p.Phe759Ser), pediatric Cushing disease (c.-133T>C and p.His778fs), or different types of PAs (c.-361G>A, p.Ser539Trp, and p.Asp563Gly). The SNVs were functionally tested in vitro for protein expression and trafficking by Western blotting, splicing by minigene assays, and amidation activity in cell lysates and serum samples. These analyses confirmed a deleterious effect on protein expression and/or function. By interrogating 200,000 exomes from the UK Biobank, we confirmed a significant association of the PAM gene and rare PAM SNVs with diagnoses linked to pituitary gland hyperfunction. Conclusion: The identification of PAM as a candidate gene associated with pituitary hypersecretion opens the possibility of developing novel therapeutics based on altering PAM function.
Affiliation(s)
- Giampaolo Trivellin
- Department of Biomedical Sciences, Humanitas University, Milan, Italy
- IRCCS Humanitas Research Hospital, Milan, Italy
| | - Adrian F. Daly
- Department of Endocrinology, Centre Hospitalier Universitaire de Liège, University of Liège, Domaine Universitaire du Sart-Tilman, Liège, Belgium
| | - Laura C. Hernández-Ramírez
- Red de Apoyo a la Investigación, Coordinación de la Investigación Científica, Universidad Nacional Autónoma de México e Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
- Section on Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
| | - Elisa Araldi
- Energy Metabolism Laboratory, Department of Health Sciences and Technology, Institute of Translational Medicine, Swiss Federal Institute of Technology (ETH) Zurich, Schwerzenbach, Switzerland
| | - Christina Tatsi
- Section on Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
| | - Ryan K. Dale
- Bioinformatics and Scientific Programming Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
| | - Gus Fridell
- Bioinformatics and Scientific Programming Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
| | - Arjun Mittal
- Bioinformatics and Scientific Programming Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
| | - Fabio R. Faucz
- Section on Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
- Molecular Genomics Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
| | - James R. Iben
- Molecular Genomics Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
| | - Tianwei Li
- Molecular Genomics Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
| | | | - Stanko S. Stojilkovic
- Section on Cellular Signaling, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
| | - Peter Kamenicky
- Université Paris-Saclay, Institut national de la santé et de la recherche médicale (INSERM), Physiologie et Physiopathologie Endocriniennes, Le Kremlin-Bicêtre, France
| | - Chiara Villa
- Département de Neuropathologie de la Pitié Salpêtrière, Hôpital de la Pitié-Salpêtrière - Assistance Publique–Hôpitaux de Paris (APHP) Sorbonne Université, Paris, France
- Institut national de la santé et de la recherche médicale (INSERM) U1016, Centre national de la recherche scientifique Unité Mixte de Recherche (CNRS UMR) 8104, Institut Cochin, Paris, France
| | - Bertrand Baussart
- Institut national de la santé et de la recherche médicale (INSERM) U1016, Centre national de la recherche scientifique Unité Mixte de Recherche (CNRS UMR) 8104, Institut Cochin, Paris, France
- Service de Neurochirurgie, Hôpital Pitié-Salpêtrière, AP-HP Sorbonne, Paris, France
| | - Prashant Chittiboina
- Neurosurgery Unit for Pituitary and Inheritable Diseases and Surgical Neurology Branch, National Institute of Neurological Disorders and Stroke (NINDS), National Institutes of Health (NIH), Bethesda, MD, United States
| | - Camilo Toro
- National Institutes of Health (NIH) Undiagnosed Diseases Program, Office of the Clinical Director, National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH), Bethesda, MD, United States
| | - William A. Gahl
- National Institutes of Health (NIH) Undiagnosed Diseases Program, Office of the Clinical Director, National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH), Bethesda, MD, United States
| | - Erica A. Eugster
- Division of Endocrinology and Diabetes, Department of Pediatrics, Riley Hospital for Children at Indiana University (IU) Health, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Luciana A. Naves
- Service of Endocrinology, University Hospital, Faculty of Medicine, University of Brasilia, Brasilia, Brazil
| | - Marie-Lise Jaffrain-Rea
- Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy
- Neuromed Institute, Istituto di Ricovero e Cura a Carattere Scientifico, Pozzilli, Italy
| | - Wouter W. de Herder
- Department of Medicine, Section Endocrinology, Pituitary Center Rotterdam, Erasmus University Medical Center, Rotterdam, Netherlands
| | - Sebastian JCMM Neggers
- Department of Medicine, Section Endocrinology, Pituitary Center Rotterdam, Erasmus University Medical Center, Rotterdam, Netherlands
| | - Patrick Petrossians
- Department of Endocrinology, Centre Hospitalier Universitaire de Liège, University of Liège, Domaine Universitaire du Sart-Tilman, Liège, Belgium
| | - Albert Beckers
- Department of Endocrinology, Centre Hospitalier Universitaire de Liège, University of Liège, Domaine Universitaire du Sart-Tilman, Liège, Belgium
| | - Andrea G. Lania
- Department of Biomedical Sciences, Humanitas University, Milan, Italy
- IRCCS Humanitas Research Hospital, Milan, Italy
| | - Richard E. Mains
- Department of Neuroscience, University of Connecticut (UConn) Health, Farmington, CT, United States
| | - Betty A. Eipper
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, United States
| | - Constantine A. Stratakis
- Section on Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, MD, United States
- Human Genetics and Precision Medicine, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology Hellas, Heraklion, Greece
- Research Institute, ELPEN, Athens, Greece
|
188
|
Nair J, Welch JF, Marciante AB, Hou T, Lu Q, Fox EJ, Mitchell GS. APOE4, Age, and Sex Regulate Respiratory Plasticity Elicited by Acute Intermittent Hypercapnic-Hypoxia. FUNCTION 2023; 4:zqad026. [PMID: 37575478 PMCID: PMC10413930 DOI: 10.1093/function/zqad026] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/27/2023] [Revised: 05/22/2023] [Accepted: 05/25/2023] [Indexed: 08/15/2023] Open
Abstract
Rationale: Acute intermittent hypoxia (AIH) shows promise for enhancing motor recovery in chronic spinal cord injuries and neurodegenerative diseases. However, human trials of AIH have reported significant variability in individual responses. Objectives: Identify individual factors (e.g., genetics, age, and sex) that determine the response magnitude of healthy adults to an optimized AIH protocol, acute intermittent hypercapnic-hypoxia (AIHH). Methods: In 17 healthy individuals (age = 27 ± 5 yr), associations between individual factors and the magnitude of AIHH-induced (15 1-min episodes; O2 = 9.5%, CO2 = 5%) changes in diaphragm motor-evoked potential (MEP) amplitude and inspiratory mouth occlusion pressures (P0.1) were evaluated. Single nucleotide polymorphisms (SNPs) in genes linked with mechanisms of AIH-induced phrenic motor plasticity (BDNF, HTR2A, TPH2, MAOA, NTRK2) and neuronal plasticity (apolipoprotein E, APOE) were tested. Variations in AIHH-induced plasticity with age and sex were also analyzed. Additional experiments in humanized (h)ApoE knock-in rats were performed to test causality. Results: AIHH-induced changes in diaphragm MEP amplitudes were lower in individuals heterozygous for APOE4 (i.e., APOE3/4) compared to individuals with other APOE genotypes (P = 0.048) and the other tested SNPs. Males exhibited a greater diaphragm MEP enhancement versus females, regardless of age (P = 0.004). Additionally, age was inversely related to change in P0.1 (P = 0.007). In hApoE4 knock-in rats, AIHH-induced phrenic motor plasticity was significantly lower than in hApoE3 controls (P < 0.05). Conclusions: APOE4 genotype, sex, and age are important biological determinants of AIHH-induced respiratory motor plasticity in healthy adults. Addition to Knowledge Base: AIH is a novel rehabilitation strategy to induce functional recovery of respiratory and non-respiratory motor systems in people with chronic spinal cord injury and/or neurodegenerative disease.
Since most AIH trials report considerable inter-individual variability in AIH outcomes, we investigated factors that potentially undermine the response to an optimized AIH protocol, AIHH, in healthy humans. We demonstrate that genetics (particularly the lipid transporter, APOE), age and sex are important biological determinants of AIHH-induced respiratory motor plasticity.
Affiliation(s)
- Jayakrishnan Nair
- Breathing Research and Therapeutics Center, Department of Physical Therapy, University of Florida, Gainesville, 32603, USA
- Department of Physical Therapy, Thomas Jefferson University, Philadelphia, PA, 19107, USA
| | - Joseph F Welch
- Breathing Research and Therapeutics Center, Department of Physical Therapy, University of Florida, Gainesville, 32603, USA
- School of Sport, Exercise and Rehabilitation Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
| | - Alexandria B Marciante
- Breathing Research and Therapeutics Center, Department of Physical Therapy, University of Florida, Gainesville, 32603, USA
| | - Tingting Hou
- Department of Biostatistics, University of Florida, Gainesville, 32603, USA
| | - Qing Lu
- Department of Biostatistics, University of Florida, Gainesville, 32603, USA
| | - Emily J Fox
- Breathing Research and Therapeutics Center, Department of Physical Therapy, University of Florida, Gainesville, 32603, USA
- Brooks Rehabilitation, Jacksonville, FL, 32216, USA
| | - Gordon S Mitchell
- Breathing Research and Therapeutics Center, Department of Physical Therapy, University of Florida, Gainesville, 32603, USA
| |
|
189
|
Yee SW, Macdonald C, Mitrovic D, Zhou X, Koleske ML, Yang J, Silva DB, Grimes PR, Trinidad D, More SS, Kachuri L, Witte JS, Delemotte L, Giacomini KM, Coyote-Maestas W. The full spectrum of OCT1 (SLC22A1) mutations bridges transporter biophysics to drug pharmacogenomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.06.543963. [PMID: 37333090 PMCID: PMC10274788 DOI: 10.1101/2023.06.06.543963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/20/2023]
Abstract
Membrane transporters play a fundamental role in the tissue distribution of endogenous compounds and xenobiotics and are major determinants of efficacy and side-effect profiles. Polymorphisms within these drug transporters result in inter-individual variation in drug response, with some patients not responding to the recommended dosage of a drug whereas others experience catastrophic side effects. For example, variants within the major hepatic human organic cation transporter OCT1 (SLC22A1) can change the levels of endogenous organic cations and many prescription drugs. To understand how variants mechanistically impact drug uptake, we systematically study how all known and possible single missense and single amino acid deletion variants impact expression and substrate uptake of OCT1. We find that human variants primarily disrupt function via folding rather than substrate uptake. Our study revealed that the major determinants of folding reside in the first 300 amino acids, including the first 6 transmembrane domains and the extracellular domain (ECD), with a highly conserved stabilizing helical motif making key interactions between the ECD and transmembrane domains. Using the functional data combined with computational approaches, we determine and validate a structure-function model of OCT1's conformational ensemble without experimental structures. Using this model and molecular dynamics simulations of key mutants, we determine biophysical mechanisms for how specific human variants alter transport phenotypes. We identify differences in frequencies of reduced-function alleles across populations, with East Asian and European populations having the lowest and highest frequencies of reduced-function variants, respectively. Mining human population databases reveals that reduced-function alleles of OCT1 identified in this study associate significantly with high LDL cholesterol levels. Our general approach, broadly applied, could transform the landscape of precision medicine by producing a mechanistic basis for understanding the effects of human mutations on disease and drug response.
Affiliation(s)
- Sook Wah Yee
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Christian Macdonald
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Darko Mitrovic
- Science for Life Laboratory, Department of Applied Physics, KTH Royal Institute of Technology, 12121 Solna, Sweden
| | - Xujia Zhou
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Megan L Koleske
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Jia Yang
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Dina Buitrago Silva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Patrick Rockefeller Grimes
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Donovan Trinidad
- Department of Medicine, Division of Infectious Disease, University of California, San Francisco, United States
| | - Swati S More
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Current address: Center for Drug Design (CDD), College of Pharmacy, University of Minnesota, Minnesota, United States
| | - Linda Kachuri
- Epidemiology and Population Health, Stanford University, California, United States
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
| | - John S Witte
- Epidemiology and Population Health, Stanford University, California, United States
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
| | - Lucie Delemotte
- Science for Life Laboratory, Department of Applied Physics, KTH Royal Institute of Technology, 12121 Solna, Sweden
| | - Kathleen M Giacomini
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Quantitative Biosciences Institute, University of California, San Francisco, United States
| |
|
190
|
Xu Z, Yan S, Wu C, Duan Q, Chen S, Li Y. Next-Generation Sequencing Data-Based Association Testing of a Group of Genetic Markers for Complex Responses Using a Generalized Linear Model Framework. MATHEMATICS (BASEL, SWITZERLAND) 2023; 11:2560. [PMID: 38721066 PMCID: PMC11078158 DOI: 10.3390/math11112560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 05/12/2024]
Abstract
Association testing has been widely used to study the relationship between genetic variants and phenotypes. Most association testing methods are genotype-based, i.e., they first estimate the genotype and then regress the phenotype on the estimated genotype and other variables. Direct testing methods based on next-generation sequencing (NGS) data without genotype calling have been proposed and have shown advantages over genotype-based methods in scenarios where genotype calling is not accurate. NGS data-based single-variant tests have been proposed, including our previously proposed single-variant testing method, i.e., the UNC combo method [1]. We have also proposed NGS data-based group testing methods for continuous phenotypes using a linear model framework, which can handle continuous responses [2]. In this paper, we extend our linear model-based framework to a generalized linear model-based framework so that the methods can handle other types of responses, especially binary responses, which are commonly encountered in association studies. We have conducted extensive simulation studies to evaluate the performance of different estimators and compare our estimators with their corresponding genotype-based methods. We found that all methods have Type I errors controlled, and that our NGS data-based testing methods perform better than their corresponding genotype-based methods in the literature for other types of responses, including binary responses (logistic regression) and count responses (Poisson regression), especially when sequencing depth is low. In conclusion, we have extended our previous linear model (LM) framework to a generalized linear model (GLM) framework and derived NGS data-based testing methods for a group of genetic variants. Compared with our previously proposed LM-based methods [2], the new GLM-based methods can handle more complex responses (for example, binary responses and count responses) in addition to continuous responses. Our methods fill a gap in the literature and show advantages over their corresponding genotype-based methods.
Affiliation(s)
- Zheng Xu
- Department of Mathematics and Statistics, Wright State University, Dayton, Ohio, 45324, USA
| | - Song Yan
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Cong Wu
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68508, USA
| | - Qing Duan
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Sixia Chen
- Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| |
|
191
|
Sok P, Sabo A, Almli LM, Jenkins MM, Nembhard WN, Agopian AJ, Bamshad MJ, Blue EE, Brody LC, Brown AL, Browne ML, Canfield MA, Carmichael SL, Chong JX, Dugan-Perez S, Feldkamp ML, Finnell RH, Gibbs RA, Kay DM, Lei Y, Meng Q, Moore CA, Mullikin JC, Muzny D, Olshan AF, Pangilinan F, Reefhuis J, Romitti PA, Schraw JM, Shaw GM, Werler MM, Harpavat S, Lupo PJ. Exome-wide assessment of isolated biliary atresia: A report from the National Birth Defects Prevention Study using child-parent trios and a case-control design to identify novel rare variants. Am J Med Genet A 2023; 191:1546-1556. [PMID: 36942736 PMCID: PMC10947986 DOI: 10.1002/ajmg.a.63185] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2022] [Revised: 02/07/2023] [Accepted: 03/07/2023] [Indexed: 03/23/2023]
Abstract
The etiology of biliary atresia (BA) is unknown, but recent studies suggest a role for rare protein-altering variants (PAVs). Exome sequencing data from the National Birth Defects Prevention Study on 54 child-parent trios, one child-mother duo, and 1513 parents of children with other birth defects were analyzed. Most (91%) cases were isolated BA. We performed (1) a trio-based analysis to identify rare de novo, homozygous, and compound heterozygous PAVs and (2) a case-control analysis using a sequence kernel-based association test to identify genes enriched with rare PAVs. While we replicated previous findings on PKD1L1, our results do not suggest that recurrent de novo PAVs play important roles in BA susceptibility. In fact, our finding in NOTCH2, a disease gene associated with Alagille syndrome, highlights the difficulty in BA diagnosis. Notably, IFRD2 has been implicated in other gastrointestinal conditions and warrants additional study. Overall, our findings strengthen the hypothesis that the etiology of BA is complex.
Affiliation(s)
- Pagna Sok
- Pediatrics, Baylor College of Medicine, Houston, Texas, USA
| | - Aniko Sabo
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Lynn M. Almli
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Mary M. Jenkins
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Wendy N. Nembhard
- Fay W. Boozman College of Public Health, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - A. J. Agopian
- Department of Epidemiology, Human Genetics, and Environmental Sciences, University of Texas School of Public Health, Houston, Texas, USA
| | - Michael J. Bamshad
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington, USA
- Brotman Baty Institute for Precision Medicine, Seattle, Washington, USA
| | - Elizabeth E. Blue
- Brotman Baty Institute for Precision Medicine, Seattle, Washington, USA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, USA
| | - Lawrence C. Brody
- Genetics and Environment Interaction Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | | | - Marilyn L. Browne
- Birth Defects Registry, New York State Department of Health, Albany, New York, USA
- Department of Epidemiology and Biostatistics, School of Public Health, University at Albany, Rensselaer, New York, USA
| | - Mark A. Canfield
- Birth Defects Epidemiology and Surveillance Branch, Texas Department of State Health Services, Austin, Texas, USA
| | - Suzan L. Carmichael
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
| | - Jessica X. Chong
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington, USA
- Brotman Baty Institute for Precision Medicine, Seattle, Washington, USA
| | - Shannon Dugan-Perez
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Marcia L. Feldkamp
- Division of Medical Genetics, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, Utah, USA
| | - Richard H. Finnell
- Department of Medicine, Center for Precision Environmental Health, Baylor College of Medicine, Houston, Texas, USA
| | - Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Denise M. Kay
- Division of Genetics, Wadsworth Center, New York State Department of Health, Albany, New York, USA
| | - Yunping Lei
- Department of Medicine, Center for Precision Environmental Health, Baylor College of Medicine, Houston, Texas, USA
| | - Qingchang Meng
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Cynthia A. Moore
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - James C. Mullikin
- Genetics and Environment Interaction Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Donna Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Andrew F. Olshan
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Faith Pangilinan
- Genetics and Environment Interaction Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Jennita Reefhuis
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Paul A. Romitti
- Department of Epidemiology, University of Iowa College of Public Health, Iowa City, Iowa, USA
| | | | - Gary M. Shaw
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
| | - Martha M. Werler
- Department of Epidemiology, Boston University, Boston, Massachusetts, USA
| | - Sanjiv Harpavat
- Pediatrics, Baylor College of Medicine, Houston, Texas, USA
- Gastroenterology, Hepatology and Nutrition, Texas Children’s Hospital, Houston, Texas, USA
| | - Philip J. Lupo
- Pediatrics, Baylor College of Medicine, Houston, Texas, USA
| | | |
|
192
|
Kumar S, Gerstein M. Unified views on variant impact across many diseases. Trends Genet 2023; 39:442-450. [PMID: 36858880 PMCID: PMC10192142 DOI: 10.1016/j.tig.2023.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2022] [Revised: 02/02/2023] [Accepted: 02/02/2023] [Indexed: 03/03/2023]
Abstract
Genomic studies of human disorders are often performed by distinct research communities (i.e., focused on rare diseases, common diseases, or cancer). Despite underlying differences in the mechanistic origin of different disease categories, these studies share the goal of identifying causal genomic events that are critical for the clinical manifestation of the disease phenotype. Moreover, these studies face common challenges, including understanding the complex genetic architecture of the disease, deciphering the impact of variants on multiple scales, and interpreting noncoding mutations. Here, we highlight these challenges in depth and argue that properly addressing them will require a more unified vocabulary and approach across disease communities. Toward this goal, we present a unified perspective on relating variant impact to various genomic disorders.
Affiliation(s)
- Sushant Kumar
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06520, USA; Department of Statistics & Data Science, Yale University, New Haven, CT 06520, USA.
| |
|
193
|
Sun R, Zhu L, Li Y, Yasui Y, Robison L. Inference for set-based effects in genetic association studies with interval-censored outcomes. Biometrics 2023; 79:1573-1585. [PMID: 35165890 PMCID: PMC9375811 DOI: 10.1111/biom.13636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2021] [Revised: 01/24/2022] [Accepted: 02/07/2022] [Indexed: 11/28/2022]
Abstract
The rapid acceleration of genetic data collection in biomedical settings has recently resulted in the rise of genetic compendiums filled with rich longitudinal disease data. One common feature of these data sets is their plethora of interval-censored outcomes. However, very few tools are available for the analysis of genetic data sets with interval-censored outcomes, and in particular, there is a lack of methodology available for set-based inference. Set-based inference is used to associate a gene, biological pathway, or other genetic construct with outcomes and is one of the most popular strategies in genetics research. This work develops three such tests for interval-censored settings beginning with a variance components test for interval-censored outcomes, the interval-censored sequence kernel association test (ICSKAT). We also provide the interval-censored version of the Burden test, and then we integrate ICSKAT and Burden to construct the interval censored sequence kernel association test-optimal (ICSKATO) combination. These tests unlock set-based analysis of interval-censored data sets with analogs of three highly popular set-based tools commonly applied to continuous and binary outcomes. Simulation studies illustrate the advantages of the developed methods over ad hoc alternatives, including protection of the type I error rate at very low levels and increased power. The proposed approaches are applied to the investigation that motivated this study, an examination of the genes associated with bone mineral density deficiency and fracture risk.
Affiliation(s)
- Ryan Sun
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, U.S.A
| | - Liang Zhu
- Division of Clinical and Translational Sciences, Department of Internal Medicine, University of Texas Health Science Center at Houston, Houston, Texas 77030, U.S.A
| | - Yimei Li
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, U.S.A
| | - Yutaka Yasui
- Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, U.S.A
| | - Leslie Robison
- Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, U.S.A
| |
|
194
|
Shen L, Amei A, Liu B, Liu Y, Xu G, Oh EC, Wang Z. Detection of interactions between genetic marker sets and environment in a genome-wide study of hypertension. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.28.542666. [PMID: 37398075 PMCID: PMC10312472 DOI: 10.1101/2023.05.28.542666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/04/2023]
Abstract
As complex human diseases are influenced by the interplay of genes and environment, detecting gene-environment interactions (G × E) can shed light on the biological mechanisms of diseases and play an important role in disease risk prediction. Development of powerful quantitative tools to incorporate G × E in complex diseases has the potential to facilitate the accurate curation and analysis of large genetic epidemiological studies. However, most existing methods that interrogate G × E focus on the interaction effects of an environmental factor and genetic variants, exclusively for common or rare variants. In this study, we proposed two tests, MAGEIT_RAN and MAGEIT_FIX, to detect interaction effects of an environmental factor and a set of genetic markers containing both rare and common variants, based on the MinQue for Summary statistics. The genetic main effects in MAGEIT_RAN and MAGEIT_FIX are modeled as random or fixed, respectively. Through simulation studies, we illustrated that both tests controlled the type I error and that MAGEIT_RAN was overall the most powerful test. We applied MAGEIT to a genome-wide analysis of gene-alcohol interactions on hypertension in the Multi-Ethnic Study of Atherosclerosis. We detected two genes, CCNDBP1 and EPB42, that interact with alcohol usage to influence blood pressure. Pathway analysis identified sixteen significant pathways related to signal transduction and development that were associated with hypertension, and several of them were reported to have an interactive effect with alcohol intake. Our results demonstrated that MAGEIT can detect biologically relevant genes that interact with environmental factors to influence complex traits.
Affiliation(s)
- Linchuan Shen
- Department of Mathematical Sciences, University of Nevada, Las Vegas
| | - Amei Amei
- Department of Mathematical Sciences, University of Nevada, Las Vegas
| | - Bowen Liu
- Department of Mathematical Sciences, University of Nevada, Las Vegas
| | - Yunqing Liu
- Department of Biostatistics, Yale School of Public Health
| | - Gang Xu
- Department of Mathematical Sciences, University of Nevada, Las Vegas
- Department of Biostatistics, Yale School of Public Health
| | - Edwin C. Oh
- Department of Internal Medicine, University of Nevada School of Medicine, Las Vegas
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas
| | - Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health
| |
|
195
|
Banerjee J, Taroni JN, Allaway RJ, Prasad DV, Guinney J, Greene C. Machine learning in rare disease. Nat Methods 2023:10.1038/s41592-023-01886-z. [PMID: 37248386 DOI: 10.1038/s41592-023-01886-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Accepted: 04/22/2023] [Indexed: 05/31/2023]
Abstract
High-throughput profiling methods (such as genomics or imaging) have accelerated basic research and made deep molecular characterization of patient samples routine. These approaches provide a rich portrait of genes, molecular pathways and cell types involved in disease phenotypes. Machine learning (ML) can be a useful tool for extracting disease-relevant patterns from high-dimensional datasets. However, depending upon the complexity of the biological question, machine learning often requires many samples to identify recurrent and biologically meaningful patterns. Rare diseases are inherently limited in clinical cases, leading to few samples to study. In this Perspective, we outline the challenges and emerging solutions for using ML for small sample sets, specifically in rare diseases. Advances in ML methods for rare diseases are likely to be informative for applications beyond rare diseases for which few samples exist with high-dimensional data. We propose that the method community prioritize the development of ML techniques for rare disease research.
Affiliation(s)
| | - Jaclyn N Taroni
- Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA, USA
| | | | | | | | - Casey Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA.
| |
|
196
|
Chen NC, Kolesnikov A, Goel S, Yun T, Chang PC, Carroll A. Improving variant calling using population data and deep learning. BMC Bioinformatics 2023; 24:197. [PMID: 37173615 PMCID: PMC10182612 DOI: 10.1186/s12859-023-05294-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2022] [Accepted: 04/17/2023] [Indexed: 05/15/2023]
Abstract
Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of variant calling, and are often limited to filtering, which trades recall for precision. In this study, we develop population-aware DeepVariant models with a new channel encoding allele frequencies from the 1000 Genomes Project. This model reduces variant calling errors, improving both precision and recall in single samples, and reduces rare homozygous and pathogenic ClinVar calls cohort-wide. We assess the use of population-specific or diverse reference panels, finding the greatest accuracy with diverse panels, suggesting that large, diverse panels are preferable to individual populations, even when the population matches sample ancestry. Finally, we show that this benefit generalizes to samples with different ancestry from the training data even when the ancestry is also excluded from the reference panel.
Affiliation(s)
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA.
| | | | | | | | | | | |
|
197
|
Tan WX, Sim X, Khoo CM, Teo AKK. Prioritization of genes associated with type 2 diabetes mellitus for functional studies. Nat Rev Endocrinol 2023:10.1038/s41574-023-00836-1. [PMID: 37169822 DOI: 10.1038/s41574-023-00836-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 03/28/2023] [Indexed: 05/13/2023]
Abstract
Existing therapies for type 2 diabetes mellitus (T2DM) show limited efficacy or have adverse effects. Numerous genetic variants associated with T2DM have been identified, but progress in translating these findings into potential drug targets has been limited. Here, we describe the tools and platforms available to identify effector genes from T2DM-associated coding and non-coding variants and prioritize them for functional studies. We discuss QSER1 and SLC12A8 as examples of genes that have been identified as possible T2DM candidate genes using these tools and platforms. We suggest further approaches, including the use of sequencing data with increased sample size and ethnic diversity, single-cell omics data for analyses, glycaemic trait associations to predict gene function and, potentially, human induced pluripotent stem cell 'village' cultures, to strengthen current gene functionalization workflows. Effective prioritization of T2DM-associated genes for experimental validation could expedite our understanding of the genetic mechanisms responsible for T2DM to facilitate the use of precision medicine in its treatment.
Affiliation(s)
- Wei Xuan Tan
- Stem Cells and Diabetes Laboratory, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Xueling Sim
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
| | - Chin Meng Khoo
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Adrian K K Teo
- Stem Cells and Diabetes Laboratory, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Precision Medicine Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
| |
|
198
|
Baronas JM, Bartell E, Eliasen A, Doench JG, Yengo L, Vedantam S, Marouli E, Kronenberg HM, Hirschhorn JN, Renthal NE. Genome-wide CRISPR screening of chondrocyte maturation newly implicates genes in skeletal growth and height-associated GWAS loci. CELL GENOMICS 2023; 3:100299. [PMID: 37228756 PMCID: PMC10203046 DOI: 10.1016/j.xgen.2023.100299] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 12/14/2022] [Accepted: 03/17/2023] [Indexed: 05/27/2023]
Abstract
Alterations in the growth and maturation of chondrocytes can lead to variation in human height, including monogenic disorders of skeletal growth. We aimed to identify genes and pathways relevant to human growth by pairing human height genome-wide association studies (GWASs) with genome-wide knockout (KO) screens of growth-plate chondrocyte proliferation and maturation in vitro. We identified 145 genes that alter chondrocyte proliferation and maturation at early and/or late time points in culture, with 90% of genes validating in secondary screening. These genes are enriched in monogenic growth disorder genes and in KEGG pathways critical for skeletal growth and endochondral ossification. Further, common variants near these genes capture height heritability independent of genes computationally prioritized from GWASs. Our study emphasizes the value of functional studies in biologically relevant tissues as orthogonal datasets to refine likely causal genes from GWASs and implicates new genetic regulators of chondrocyte proliferation and maturation.
Affiliation(s)
- John M. Baronas
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
- Eric Bartell
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Anders Eliasen
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
- John G. Doench
  - Genetic Perturbation Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Loic Yengo
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Sailaja Vedantam
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eirini Marouli
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- GIANT Consortium
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
  - Genetic Perturbation Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
  - Endocrine Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Henry M. Kronenberg
  - Endocrine Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Joel N. Hirschhorn
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
  - Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nora E. Renthal
  - Department of Pediatrics, Division of Endocrinology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
|
199
|
Bi W, Zhou W, Zhang P, Sun Y, Yue W, Lee S. Scalable mixed model methods for set-based association studies on large-scale categorical data analysis and its application to exome-sequencing data in UK Biobank. Am J Hum Genet 2023; 110:762-773. [PMID: 37019109 PMCID: PMC10183366 DOI: 10.1016/j.ajhg.2023.03.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 03/13/2023] [Indexed: 04/07/2023]
Abstract
The ongoing release of large-scale sequencing data in the UK Biobank allows for the identification of associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach to conducting set-based association tests for quantitative and binary traits. However, for ordinal categorical phenotypes, applying SAIGE-GENE+ by treating the trait as quantitative or by binarizing it can cause inflated type I error rates or power loss. In this study, we propose a scalable and accurate method for rare-variant association tests, POLMM-GENE, which uses a proportional odds logistic mixed model to characterize ordinal categorical phenotypes while adjusting for sample relatedness. POLMM-GENE fully utilizes the categorical nature of phenotypes and thus controls type I error rates well while remaining powerful. In the analyses of UK Biobank 450k whole-exome-sequencing data for five ordinal categorical traits, POLMM-GENE identified 54 gene-phenotype associations.
Affiliation(s)
- Wenjian Bi
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China; Center for Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China; Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China.
- Wei Zhou
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Peipei Zhang
  - Department of Biochemistry and Biophysics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China; Key Laboratory for Neuroscience, Ministry of Education/National Health and Family Planning Commission, Peking University, Beijing, China
- Yaoyao Sun
  - Peking University Sixth Hospital, Peking University Institute of Mental Health, Beijing, China; NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China
- Weihua Yue
  - Peking University Sixth Hospital, Peking University Institute of Mental Health, Beijing, China; NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, China; Henan Key Lab of Biological Psychiatry, the Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, Henan, China; Chinese Institute for Brain Research, Beijing, China
- Seunggeun Lee
  - Graduate School of Data Science, Seoul National University, Seoul, Korea.
|
200
|
Zhang Z, Hong W, Wu Q, Tsavachidis S, Li JR, Amos CI, Cheng C, Sartain SE, Afshar-Kharghan V, Dong JF, Bhatraju P, Martin PJ, Makar RS, Bendapudi PK, Li A. Pathway-driven rare germline variants associated with transplant-associated thrombotic microangiopathy (TA-TMA). Thromb Res 2023; 225:39-46. [PMID: 36948020 PMCID: PMC10147584 DOI: 10.1016/j.thromres.2023.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Revised: 02/20/2023] [Accepted: 03/05/2023] [Indexed: 03/17/2023]
Abstract
The significance of rare germline mutations in transplant-associated thrombotic microangiopathy (TA-TMA) is not well studied. We performed a genetic association study in 100 adult TA-TMA patients vs. 98 post-transplant controls after matching by race, sex, and year. We focused on 5 pathways: complement, von Willebrand factor (VWF) function and related proteins, VWF clearance, ADAMTS13 function and related proteins, and endothelial activation (3641 variants in 52 genes). In the primary analysis focused on 189 functional rare variants, no differential variant enrichment was observed in any of the pathways; specifically, 29% of TA-TMA patients and 33% of controls had at least 1 rare complement mutation. In the secondary analysis focused on 37 rare variants predicted to be pathogenic or likely pathogenic by ClinVar, the Complement Database, or the REVEL in silico prediction tool, rare variants in the VWF clearance pathway were found to be significantly associated with TA-TMA (p = 0.008). On the gene level, LRP1 was the only gene with significantly increased variants in TA-TMA in both analyses (p = 0.025 and 0.015). In conclusion, we did not find a significant association between rare variants in the complement pathway and TA-TMA; however, we discovered a new signal in the VWF clearance pathway driven by the gene LRP1 among likely pathogenic variants.
Affiliation(s)
- Zhihui Zhang
  - Institute for Clinical & Translational Research, Baylor College of Medicine, Houston, TX, United States of America
- Wei Hong
  - Institute for Clinical & Translational Research, Baylor College of Medicine, Houston, TX, United States of America
- Qian Wu
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
- Spiridon Tsavachidis
  - Section of Epidemiology and Population Science, Baylor College of Medicine, Houston, TX, United States of America
- Jian-Rong Li
  - Institute for Clinical & Translational Research, Baylor College of Medicine, Houston, TX, United States of America
- Christopher I Amos
  - Institute for Clinical & Translational Research, Baylor College of Medicine, Houston, TX, United States of America; Section of Epidemiology and Population Science, Baylor College of Medicine, Houston, TX, United States of America
- Chao Cheng
  - Institute for Clinical & Translational Research, Baylor College of Medicine, Houston, TX, United States of America
- Sarah E Sartain
  - Section of Hematology-Oncology, Department of Pediatrics, Baylor College of Medicine, Houston, TX, United States of America
- Vahid Afshar-Kharghan
  - Section of Benign Hematology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States of America
- Jing-Fei Dong
  - BloodWorks Northwest Research Institute, Seattle, WA, United States of America
- Pavan Bhatraju
  - Division of Pulmonary Critical Care and Sleep Medicine, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States of America
- Paul J Martin
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America; Division of Medical Oncology, Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States of America
- Robert S Makar
  - Division of Hematology and Blood Transfusion Service, Massachusetts General Hospital, Boston, MA, United States of America; Division of Hemostasis and Thrombosis, Beth Israel Deaconess Medical Center, Boston, MA, United States of America
- Pavan K Bendapudi
  - Division of Hematology and Blood Transfusion Service, Massachusetts General Hospital, Boston, MA, United States of America; Division of Hemostasis and Thrombosis, Beth Israel Deaconess Medical Center, Boston, MA, United States of America; Harvard Medical School, Boston, MA, United States of America
- Ang Li
  - Section of Hematology-Oncology, Department of Medicine, Baylor College of Medicine, Houston, TX, United States of America.
|