1
Zhang S, Wang Z, Wang Y, Zhu Y, Zhou Q, Jian X, Zhao G, Qiu J, Xia K, Tang B, Mutz J, Li J, Li B. A metabolomic profile of biological aging in 250,341 individuals from the UK Biobank. Nat Commun 2024; 15:8081. [PMID: 39278973 PMCID: PMC11402978 DOI: 10.1038/s41467-024-52310-9] [Received: 01/22/2024] [Accepted: 09/02/2024] [Indexed: 09/18/2024] [Open access]
Abstract
The metabolomic profile of aging is complex. Here, we analyse 325 nuclear magnetic resonance (NMR) biomarkers from 250,341 UK Biobank participants, identifying 54 representative aging-related biomarkers associated with all-cause mortality. We conduct genome-wide association studies (GWAS) for these 325 biomarkers using whole-genome sequencing (WGS) data from 95,372 individuals and perform multivariable Mendelian randomization (MVMR) analyses, discovering 439 candidate "biomarker - disease" causal pairs at the nominal significance level. We develop a metabolomic aging score that outperforms other aging metrics in predicting short-term mortality risk and exhibits strong potential for discriminating aging-accelerated populations and improving disease risk prediction. A longitudinal analysis of 13,263 individuals enables us to calculate a metabolomic aging rate which provides more refined aging assessments and to identify candidate anti-aging and pro-aging NMR biomarkers. Taken together, our study has presented a comprehensive aging-related metabolomic profile and highlighted its potential for personalized aging monitoring and early disease intervention.
Affiliation(s)
- Shiyu Zhang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Xiangya School of Medicine, Central South University, Changsha, Hunan, 410013, China
- Zheng Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Yijing Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Yixiao Zhu
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Qiao Zhou
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Xingxing Jian
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Jian Qiu
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Kun Xia
- MOE Key Laboratory of Pediatric Rare Diseases & Hunan Key Laboratory of Medical Genetics, Central South University, Changsha, Hunan, 410008, China
- Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Department of Neurology & Multi-omics Research Center for Brain Disorders, The First Affiliated Hospital University of South China, Hengyang, Hunan, China
- Julian Mutz
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Bioinformatics Center, Xiangya Hospital & Furong Laboratory, Changsha, Hunan, 410008, China
- Bin Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, 410008, China
2
Zhu L, Zhang S, Sha Q. Meta-analysis of set-based multiple phenotype association test based on GWAS summary statistics from different cohorts. Front Genet 2024; 15:1359591. [PMID: 39301532 PMCID: PMC11410627 DOI: 10.3389/fgene.2024.1359591] [Received: 12/21/2023] [Accepted: 08/23/2024] [Indexed: 09/22/2024] [Open access]
Abstract
Genome-wide association studies (GWAS) have emerged as popular tools for identifying genetic variants that are associated with complex diseases. Standard analysis of a GWAS involves assessing the association between each variant and a disease. However, this approach suffers from limited reproducibility and difficulties in detecting multi-variant and pleiotropic effects. Although joint analysis of multiple phenotypes for GWAS can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits, most of the multiple phenotype association tests are designed for a single variant, resulting in much lower power, especially when their effect sizes are small and only their cumulative effect is associated with multiple phenotypes. To overcome these limitations, set-based multiple phenotype association tests have been developed to enhance statistical power and facilitate the identification and interpretation of pleiotropic regions. In this research, we propose a new method, named Meta-TOW-S, which conducts joint association tests between multiple phenotypes and a set of variants (such as variants in a gene) utilizing GWAS summary statistics from different cohorts. Our approach applies the set-based method that Tests for the effect of an Optimal Weighted combination of variants in a gene (TOW) and accounts for sample size differences across GWAS cohorts by employing the Cauchy combination method. Meta-TOW-S combines the advantages of set-based tests and multi-phenotype association tests, exhibiting computational efficiency and enabling analysis across multiple phenotypes while accommodating overlapping samples from different GWAS cohorts. To assess the performance of Meta-TOW-S, we develop a phenotype simulator package that encompasses a comprehensive simulation scheme capable of modeling multiple phenotypes and multiple variants, including noise structures and diverse correlation patterns among phenotypes. 
Simulation studies validate that Meta-TOW-S maintains a desirable Type I error rate. Further simulation under different scenarios shows that Meta-TOW-S can improve power compared with other existing meta-analysis methods. When applied to four psychiatric disorders summary data, Meta-TOW-S detects a greater number of significant genes.
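The Cauchy combination step mentioned in the abstract can be illustrated with a minimal sketch. This is not the Meta-TOW-S implementation: the function name `cauchy_combination` and the choice to weight each cohort by its sample size are assumptions made for illustration only. The statistic is the weighted sum of tan((0.5 - p_i)π) with weights normalized to sum to 1, and the combined p-value is the upper tail of a standard Cauchy distribution at that statistic.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    p_values: p-values strictly between 0 and 1.
    weights:  optional non-negative weights (hypothetical choice here:
              cohort sample sizes); they are normalized to sum to 1.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Each p-value is mapped to a standard Cauchy variate, then averaged.
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper-tail probability of a standard Cauchy at `statistic`.
    return 0.5 - math.atan(statistic) / math.pi


# Example: the same gene tested in three cohorts of different size
# (illustrative numbers, not taken from the paper).
cohort_p = [0.004, 0.03, 0.20]
cohort_n = [50000, 20000, 5000]
combined_p = cauchy_combination(cohort_p, cohort_n)
```

One property worth noting is that the rule needs no estimate of the correlation between the input p-values, which is convenient when cohorts overlap.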
Affiliation(s)
- Lirong Zhu
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
3
Jiang MZ, Gaynor SM, Li X, Van Buren E, Stilp A, Buth E, Wang FF, Manansala R, Gogarten SM, Li Z, Polfus LM, Salimi S, Bis JC, Pankratz N, Yanek LR, Durda P, Tracy RP, Rich SS, Rotter JI, Mitchell BD, Lewis JP, Psaty BM, Pratte KA, Silverman EK, Kaplan RC, Avery C, North KE, Mathias RA, Faraday N, Lin H, Wang B, Carson AP, Norwood AF, Gibbs RA, Kooperberg C, Lundin J, Peters U, Dupuis J, Hou L, Fornage M, Benjamin EJ, Reiner AP, Bowler RP, Lin X, Auer PL, Raffield LM. Whole genome sequencing based analysis of inflammation biomarkers in the Trans-Omics for Precision Medicine (TOPMed) consortium. Hum Mol Genet 2024; 33:1429-1441. [PMID: 38747556 PMCID: PMC11305684 DOI: 10.1093/hmg/ddae050] [Received: 09/09/2023] [Revised: 01/31/2024] [Accepted: 03/11/2024] [Indexed: 05/28/2024] [Open access]
Abstract
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that can be provided by a multi-ancestry, whole-genome based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program (with varying sample size by trait, where the minimum sample size was n = 737 for MMP-1). We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that further identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
Affiliation(s)
- Min-Zhi Jiang
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599, United States
- Sheila M Gaynor
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA 02115, United States
- Regeneron Genetics Center, 777 Old Saw Mill River Road, Tarrytown, NY 10591, United States
- Xihao Li
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599, United States
- Department of Biostatistics, 135 Dauer Drive, 4115D McGavran-Greenberg Hall, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Eric Van Buren
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA 02115, United States
- Adrienne Stilp
- Department of Biostatistics, 4333 Brooklyn Ave NE, University of Washington, Seattle, WA 98105, United States
- Erin Buth
- Department of Biostatistics, 4333 Brooklyn Ave NE, University of Washington, Seattle, WA 98105, United States
- Fei Fei Wang
- Department of Biostatistics, 4333 Brooklyn Ave NE, University of Washington, Seattle, WA 98105, United States
- Regina Manansala
- Centre for Health Economics Research & Modelling Infectious Diseases (CHERMID), Vaccine & Infectious Disease Institute (VAXINFECTIO) WHO Collaborating Centre, University of Antwerp, Campus Drie Eiken - Building S; Universiteitsplein 1 2610 Antwerpen, Belgium
- Stephanie M Gogarten
- Department of Biostatistics, 4333 Brooklyn Ave NE, University of Washington, Seattle, WA 98105, United States
- Zilin Li
- School of Mathematics and Statistics, Northeast Normal University, 5268 Renmin Street, Changchun, JL 130024, China
- Linda M Polfus
- Advanced Analytics, Ambry Genetics, 1 Enterprise, Aliso Viejo, CA 92656, United States
- Shabnam Salimi
- Department of Epidemiology and Public Health, Division of Gerontology, University of Maryland School of Medicine, 655 W. Baltimore Street, Baltimore, MD 21201, United States
- Joshua C Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave NE, Box 359458, Seattle, WA 98195, United States
- Nathan Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, 420 Delaware Street SE, Minneapolis, MN 55455, United States
- Lisa R Yanek
- Department of Medicine, General Internal Medicine, Johns Hopkins University School of Medicine, 1830 E Monument St Rm 8024, Baltimore, MD 21287, United States
- Peter Durda
- Department of Pathology & Laboratory Medicine, University of Vermont Larner College of Medicine, 360 South Park Drive, Colchester, VT 05446, United States
- Russell P Tracy
- Department of Pathology & Laboratory Medicine, University of Vermont Larner College of Medicine, 360 South Park Drive, Colchester, VT 05446, United States
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia School of Medicine, 200 Jeanette Lancaster Way, Charlottesville, VA 22903, United States
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, 1124 W. Carson Street, Torrance, CA 90502, United States
- Braxton D Mitchell
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, 670 W. Baltimore St., Baltimore, MD 21201, United States
- Joshua P Lewis
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, 670 W. Baltimore St., Baltimore, MD 21201, United States
- Bruce M Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave NE, Box 359458, Seattle, WA 98195, United States
- Departments of Epidemiology and Health Systems and Population Health, University of Washington, 4333 Brooklyn Ave NE, Seattle, WA 98101, United States
- Katherine A Pratte
- Department of Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, National Jewish Health, 1400 Jackson Street, Denver, CO 80206, United States
- Edwin K Silverman
- Department of Medicine, Channing Division of Network Medicine, Brigham and Women's Hospital, 181 Longwood Avenue, Boston, MA 02115, United States
- Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, United States
- Christy Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, 137 East Franklin Street, Chapel Hill, NC 27599, United States
- Kari E North
- Department of Epidemiology, University of North Carolina at Chapel Hill, 137 East Franklin Street, Chapel Hill, NC 27599, United States
- Rasika A Mathias
- Department of Medicine, Allergy and Clinical Immunology, Johns Hopkins University School of Medicine, 5501 Hopkins Bayview Cir JHAAC Room 3B53, Baltimore, MD 21287, United States
- Nauder Faraday
- Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, 600 N Wolfe St, Baltimore, MD 21287, United States
- Honghuang Lin
- Department of Medicine, University of Massachusetts Chan Medical School, 55 Lake Ave North, Worcester, MA 01655, United States
- Biqi Wang
- Department of Medicine, University of Massachusetts Chan Medical School, 55 Lake Ave North, Worcester, MA 01655, United States
- April P Carson
- Department of Medicine, University of Mississippi Medical Center, 350 W. Woodrow Wilson Avenue, Suite 701, Jackson, MS 39213, United States
- Arnita F Norwood
- Department of Medicine, University of Mississippi Medical Center, 350 W. Woodrow Wilson Avenue, Suite 701, Jackson, MS 39213, United States
- Richard A Gibbs
- Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, United States
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, 1100 Fairview Avenue N, Seattle, WA 98109, United States
- Jessica Lundin
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, 1100 Fairview Avenue N, Seattle, WA 98109, United States
- Ulrike Peters
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, 1100 Fairview Avenue N, Seattle, WA 98109, United States
- Josée Dupuis
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, 2001 McGill College Avenue, Montreal, QC H3A 1G1, Canada
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Avenue, Boston, MA 02118, United States
- Lifang Hou
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 680 N Lake Shore Drive, Chicago, IL 60611, United States
- Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, 1825 Pressler Street, Houston, TX 77030, United States
- Emelia J Benjamin
- Department of Medicine, Cardiovascular Medicine, Boston Medical Center, Boston University Chobanian and Avedisian School of Medicine, 72 East Newton Street, Boston, MA 02118, United States
- Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Avenue, Boston, MA 02118, United States
- Boston University and National Heart, Lung, and Blood Institute’s Framingham Heart Study, 73 Mount Wayte Ave #2, Framingham, MA 01702, United States
- Alexander P Reiner
- Department of Epidemiology, University of Washington, 4333 Brooklyn Ave NE, Seattle, WA 98105, United States
- Russell P Bowler
- Department of Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, National Jewish Health, 1400 Jackson Street, Denver, CO 80206, United States
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA 02115, United States
- Paul L Auer
- Division of Biostatistics, Institute for Health and Equity, and Cancer Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226, United States
- Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599, United States
4
Yu Z, Coorens THH, Uddin MM, Ardlie KG, Lennon N, Natarajan P. Genetic variation across and within individuals. Nat Rev Genet 2024; 25:548-562. [PMID: 38548833 DOI: 10.1038/s41576-024-00709-x] [Accepted: 02/09/2024] [Indexed: 04/12/2024]
Abstract
Germline variation and somatic mutation are intricately connected and together shape human traits and disease risks. Germline variants are present from conception, but they vary between individuals and accumulate over generations. By contrast, somatic mutations accumulate throughout life in a mosaic manner within an individual due to intrinsic and extrinsic sources of mutations and selection pressures acting on cells. Recent advancements, such as improved detection methods and increased resources for association studies, have drastically expanded our ability to investigate germline and somatic genetic variation and compare underlying mutational processes. A better understanding of the similarities and differences in the types, rates and patterns of germline and somatic variants, as well as their interplay, will help elucidate the mechanisms underlying their distinct yet interlinked roles in human health and biology.
Affiliation(s)
- Zhi Yu
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Md Mesbah Uddin
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Niall Lennon
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Pradeep Natarajan
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
5
Cirulli ET, Schiabor Barrett KM, Bolze A, Judge DP, Pawloski PA, Grzymski JJ, Lee W, Washington NL. A power-based sliding window approach to evaluate the clinical impact of rare genetic variants in the nucleotide sequence or the spatial position of the folded protein. HGG Advances 2024; 5:100284. [PMID: 38509709 PMCID: PMC11004801 DOI: 10.1016/j.xhgg.2024.100284] [Received: 11/16/2023] [Revised: 03/14/2024] [Accepted: 03/15/2024] [Indexed: 03/22/2024] [Open access]
Abstract
Systematic determination of novel variant pathogenicity remains a major challenge, even when there is an established association between a gene and phenotype. Here we present Power Window (PW), a sliding window technique that identifies the impactful regions of a gene using population-scale clinico-genomic datasets. By sizing analysis windows on the number of variant carriers, rather than the number of variants or nucleotides, statistical power is held constant, enabling the localization of clinical phenotypes and removal of unassociated gene regions. The windows can be built by sliding across either the nucleotide sequence of the gene (through 1D space) or the positions of the amino acids in the folded protein (through 3D space). Using a training set of 350k exomes from the UK Biobank (UKB), we developed PW models for well-established gene-disease associations and tested their accuracy in two independent cohorts (117k UKB exomes and 65k exomes sequenced at Helix in the Healthy Nevada Project, myGenetics, or In Our DNA SC studies). The significant models retained a median of 49% of the qualifying variant carriers in each gene (range 2%-98%), with quantitative traits showing a median effect size improvement of 66% compared with aggregating variants across the entire gene, and binary traits' odds ratios improving by a median of 2.2-fold. PW showcases that electronic health record-based statistical analyses can accurately distinguish between novel coding variants in established genes that will have high phenotypic penetrance and those that will not, unlocking new potential for human genomics research, drug development, variant interpretation, and precision medicine.
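The idea of sizing windows by carrier count rather than by nucleotides can be sketched as below. This is a simplified 1D illustration, not the Power Window implementation: the function name `carrier_sized_windows`, the input layout of `(position, n_carriers)` pairs, and the rule "smallest window from each start that reaches the carrier threshold" are all assumptions. It also sums carriers per variant, so a person carrying two variants in a window would be counted twice.

```python
def carrier_sized_windows(variants, min_carriers):
    """Build sliding windows that each contain at least `min_carriers`.

    variants:     iterable of (position, n_carriers) pairs, in any order.
    min_carriers: positive carrier threshold per window (hypothetical
                  parameter standing in for the power target).
    Returns a list of (start_position, end_position, carriers_in_window).
    """
    if min_carriers <= 0:
        raise ValueError("min_carriers must be positive")

    ordered = sorted(variants)
    windows = []
    end = 0        # index one past the right edge of the current window
    running = 0    # carriers summed over ordered[start:end]

    for start in range(len(ordered)):
        # Grow the right edge until the window holds enough carriers.
        while end < len(ordered) and running < min_carriers:
            running += ordered[end][1]
            end += 1
        if running < min_carriers:
            break  # the remaining variants cannot fill another window
        windows.append((ordered[start][0], ordered[end - 1][0], running))
        # Slide the left edge forward by one variant.
        running -= ordered[start][1]

    return windows


# Illustrative input: windows are narrow where carriers are dense
# and wide where they are sparse.
example = [(100, 5), (120, 1), (150, 4), (400, 10), (410, 2)]
example_windows = carrier_sized_windows(example, min_carriers=10)
```

Because every window holds a comparable number of carriers, an association test run per window has roughly comparable power, which is the property the abstract attributes to the approach.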
Affiliation(s)
- Alexandre Bolze
- Helix, 101 S Ellsworth Ave Suite 350, San Mateo, CA 94401, USA
- Daniel P Judge
- Division of Cardiology, Medical University of South Carolina, 30 Courtenay Drive, MSC 592, Charleston, SC 29425, USA
- Joseph J Grzymski
- University of Nevada, 2215 Raggio Pkwy, Reno, NV 89512, USA; Renown Institute for Health Innovation, Reno, NV 89512, USA
- William Lee
- Helix, 101 S Ellsworth Ave Suite 350, San Mateo, CA 94401, USA
6
Liang Q, Abraham A, Capra JA, Kostka D. Disease-specific prioritization of non-coding GWAS variants based on chromatin accessibility. HGG Advances 2024; 5:100310. [PMID: 38773771 PMCID: PMC11259938 DOI: 10.1016/j.xhgg.2024.100310] [Received: 10/25/2023] [Revised: 05/15/2024] [Accepted: 05/16/2024] [Indexed: 05/24/2024] [Open access]
Abstract
Non-protein-coding genetic variants are a major driver of the genetic risk for human disease; however, identifying which non-coding variants contribute to diseases and their mechanisms remains challenging. In silico variant prioritization methods quantify a variant's severity, but for most methods, the specific phenotype and disease context of the prediction remain poorly defined. For example, many commonly used methods provide a single, organism-wide score for each variant, while other methods summarize a variant's impact in certain tissues and/or cell types. Here, we propose a complementary disease-specific variant prioritization scheme, which is motivated by the observation that variants contributing to disease often operate through specific biological mechanisms. We combine tissue/cell-type-specific variant scores (e.g., GenoSkyline, FitCons2, DNA accessibility) into disease-specific scores with a logistic regression approach and apply it to ∼25,000 non-coding variants spanning 111 diseases. We show that this disease-specific aggregation significantly improves the association of common non-coding genetic variants with disease (average precision: 0.151, baseline = 0.09), compared with organism-wide scores (GenoCanyon, LINSIGHT, GWAVA, Eigen, CADD; average precision: 0.129, baseline = 0.09). Further on, disease similarities based on data-driven aggregation weights highlight meaningful disease groups, and it provides information about tissues and cell types that drive these similarities. We also show that so-learned similarities are complementary to genetic similarities as quantified by genetic correlation. Overall, our approach demonstrates the strengths of disease-specific variant prioritization, leads to improvement in non-coding variant prioritization, and enables interpretable models that link variants to disease via specific tissues and/or cell types.
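A minimal sketch of the two quantities the abstract relies on: a logistic combination of tissue-specific scores into one disease-specific score, and the average precision used to evaluate a ranking. Nothing here is the authors' code; `disease_score`, `average_precision`, the weights, and the toy data are hypothetical. The reading of "baseline" as the fraction of positive variants (the expected average precision of a random ranking) is also an assumption.

```python
import math


def disease_score(tissue_scores, weights, intercept=0.0):
    """Logistic combination of per-tissue scores into a single score.

    The weights would normally be learned per disease; here they are
    supplied by the caller (hypothetical values in the example).
    """
    z = intercept + sum(w * x for w, x in zip(weights, tissue_scores))
    return 1.0 / (1.0 + math.exp(-z))


def average_precision(labels, scores):
    """Mean of precision-at-rank over the ranks of the positive items.

    labels: 1 for disease-associated variants, 0 otherwise.
    scores: higher means ranked earlier; ties keep input order.
    """
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_positive


# Toy data: four variants, two tissue scores each (illustrative only).
toy_weights = [2.0, -1.0]
toy_features = [[0.9, 0.1], [0.8, 0.9], [0.6, 0.2], [0.1, 0.3]]
toy_labels = [1, 0, 1, 0]
toy_scores = [disease_score(x, toy_weights) for x in toy_features]
toy_ap = average_precision(toy_labels, toy_scores)
toy_baseline = sum(toy_labels) / len(toy_labels)  # prevalence of positives
```

Comparing `toy_ap` with `toy_baseline` mirrors how the abstract reports average precision next to a baseline value.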
Affiliation(s)
- Qianqian Liang
- Department of Computational & Systems Biology and Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA; Department of Human Genetics, University of Pittsburgh School of Public Health, Pittsburgh, PA, USA
- Abin Abraham
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
- John A Capra
- Department of Epidemiology & Biostatistics and Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Dennis Kostka
- Department of Computational & Systems Biology and Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
7
Kim Y, Jeong M, Koh IG, Kim C, Lee H, Kim JH, Yurko R, Kim IB, Park J, Werling DM, Sanders SJ, An JY. CWAS-Plus: estimating category-wide association of rare noncoding variation from whole-genome sequencing data with cell-type-specific functional data. Brief Bioinform 2024; 25:bbae323. [PMID: 38966948 PMCID: PMC11224609 DOI: 10.1093/bib/bbae323] [Received: 03/13/2024] [Revised: 06/13/2024] [Accepted: 06/18/2024] [Indexed: 07/06/2024] [Open access]
Abstract
Variants in cis-regulatory elements link the noncoding genome to human pathology; however, detailed analytic tools for understanding the association between cell-level brain pathology and noncoding variants are lacking. CWAS-Plus, adapted from a Python package for category-wide association testing (CWAS), enhances noncoding variant analysis by integrating both whole-genome sequencing (WGS) and user-provided functional data. With simplified parameter settings and an efficient multiple testing correction method, CWAS-Plus conducts the CWAS workflow 50 times faster than CWAS, making it more accessible and user-friendly for researchers. Here, we used a single-nuclei assay for transposase-accessible chromatin with sequencing to facilitate CWAS-guided noncoding variant analysis at cell-type-specific enhancers and promoters. Examining autism spectrum disorder WGS data (n = 7280), CWAS-Plus identified noncoding de novo variant associations in transcription factor binding sites within conserved loci. Independently, in Alzheimer's disease WGS data (n = 1087), CWAS-Plus detected rare noncoding variant associations in microglia-specific regulatory elements. These findings highlight CWAS-Plus's utility in genomic disorders and scalability for processing large-scale WGS data and in multiple-testing corrections. CWAS-Plus and its user manual are available at https://github.com/joonan-lab/cwas/ and https://cwas-plus.readthedocs.io/en/latest/, respectively.
Affiliation(s)
- Yujin Kim
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- Minwoo Jeong
- School of Biosystem and Biomedical Science, College of Health Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- In Gyeong Koh
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- Chanhee Kim
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- Hyeji Lee
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
| | - Jae Hyun Kim
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
| | - Ronald Yurko
- Department of Statistics and Data Science, Carnegie Mellon University, 5000 Forbes Avenue, Squirrel Hill North, Pittsburgh, PA 15213, United States
| | - Il Bin Kim
- Department of Psychiatry, CHA Gangnam Medical Center, CHA University School of Medicine, 566 Nonhyon-ro, Gangnam-gu, Seoul 06135, Republic of Korea
| | - Jeongbin Park
- School of Biomedical Convergence Engineering, Pusan National University, 49 Busandaehak-ro, Mulgeum-eup, Yangsan-si, Gyeongsangnam-do, 50612, Republic of Korea
| | - Donna M Werling
- Laboratory of Genetics, University of Wisconsin-Madison, 425-g Henry Mall, Madison, WI 53706, United States
| | - Stephan J Sanders
- Department of Paediatrics, Institute of Developmental and Regenerative Medicine, University of Oxford, Old Road Campus, Roosevelt Dr, Headington, Oxford OX3 7TY, United Kingdom
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, 1651 4th Street, San Francisco, CA 94158, United States
| | - Joon-Yong An
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- School of Biosystem and Biomedical Science, College of Health Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
|
8
|
Mi H, Wang M, Chang Y. The potential impact of polymorphisms in METTL3 gene on knee osteoarthritis susceptibility. Heliyon 2024; 10:e28035. [PMID: 38560129 PMCID: PMC10981020 DOI: 10.1016/j.heliyon.2024.e28035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 02/29/2024] [Accepted: 03/11/2024] [Indexed: 04/04/2024] Open
Abstract
Objective: This study aimed to explore the correlation between METTL3 polymorphisms and susceptibility to knee osteoarthritis (KOA). Methods: The relationship of five single nucleotide polymorphisms (SNPs) in the METTL3 gene with susceptibility to KOA was analyzed through multinomial logistic regression analysis in this case-control study. Genotyping was performed on 228 KOA patients and 252 unaffected individuals from South China using the TaqMan method. The MDR software (version 3.0.2) was used to analyze SNP interactions. Results: Of the five SNPs examined, the T > G change in the METTL3 gene at the rs1061026 locus increased the risk of KOA, while the rs1139130 A > G and rs1263802 C > T variants were significantly associated with a reduced risk of developing KOA. The rs1061027 A > C and rs1263801 C > G variants did not show a significant association (p > 0.05). The rs1061026 TG/GG genotype showed a significant correlation with an increased risk of KOA in the following subgroups: males, individuals with a BMI ranging from 24 to 28, smokers, those who were not engaged in physical exercise (PE), patients who had experienced KOA symptoms for eight years or longer, and those without a family history of the disease or reported swelling. On the other hand, the rs1139130 AG/GG genotype demonstrated a protective effect against KOA among females, individuals with a BMI greater than or equal to 24, those with unilateral KOA or a KOA duration of 8 years or less, non-smokers, non-alcohol drinkers, those who were not engaged in PE, and those who had no injury, no family history, or no experience of knee swelling. Additionally, the rs1263802 CT/TT genotypes showed a protective effect among patients without a history of injury. Furthermore, individuals with the haplotypes GAT, GGC, TAT, and TGC had a significantly lower susceptibility to KOA compared to the reference haplotype TAC.
Conclusions: The METTL3 gene variant rs1061026 could increase the risk of KOA, whereas the rs1139130 and rs1263802 variants might exert a protective effect against KOA. These variants could potentially function as susceptibility markers for KOA in the population from South China.
Affiliation(s)
- Houlin Mi
- Department of Orthopedics, South China Hospital Affiliated to Shenzhen University, 1# Fuxin Road, Longgang District, Shenzhen City, Guangdong Province, 518111, China
| | - Mingzhi Wang
- Department of Thoracic Surgery, Guangdong Second Provincial General Hospital, 466# Xingang Middle Road, Haizhu District, Guangzhou City, Guangdong Province, 510006, China
| | - Yongmei Chang
- Department of Respiratory Medicine, Guangdong Second Provincial General Hospital, 466# Xingang Middle Road, Haizhu District, Guangzhou City, Guangdong Province, 510006, China
|
9
|
Kim Y, Jeong M, Koh IG, Kim C, Lee H, Kim JH, Yurko R, Kim IB, Park J, Werling DM, Sanders SJ, An JY. CWAS-Plus: Estimating category-wide association of rare noncoding variation from whole-genome sequencing data with cell-type-specific functional data. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.15.24305828. [PMID: 38699372 PMCID: PMC11065022 DOI: 10.1101/2024.04.15.24305828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Variants in cis-regulatory elements link the noncoding genome to human brain pathology; however, detailed analytic tools for understanding the association between cell-level brain pathology and noncoding variants are lacking. CWAS-Plus, adapted from a Python package for category-wide association testing (CWAS), employs both whole-genome sequencing and user-provided functional data to enhance noncoding variant analysis, with a faster and more efficient execution of the CWAS workflow. Here, we used a single-nuclei assay for transposase-accessible chromatin with sequencing to facilitate CWAS-guided noncoding variant analysis at cell-type-specific enhancers and promoters. Examining autism spectrum disorder whole-genome sequencing data (n = 7,280), CWAS-Plus identified noncoding de novo variant associations in transcription factor binding sites within conserved loci. Independently, in Alzheimer's disease whole-genome sequencing data (n = 1,087), CWAS-Plus detected rare noncoding variant associations in microglia-specific regulatory elements. These findings highlight CWAS-Plus's utility in genomic disorders and scalability for processing large-scale whole-genome sequencing data and in multiple-testing corrections. CWAS-Plus and its user manual are available at https://github.com/joonan-lab/cwas/ and https://cwas-plus.readthedocs.io/en/latest/, respectively.
Affiliation(s)
- Yujin Kim
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
| | - Minwoo Jeong
- School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, 02841, Republic of Korea
| | - In Gyeong Koh
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
| | - Chanhee Kim
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
| | - Hyeji Lee
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
| | - Jae Hyun Kim
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
| | - Ronald Yurko
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Il Bin Kim
- Department of Psychiatry, CHA Gangnam Medical Center, CHA University School of Medicine, Seoul, 06135, Republic of Korea
| | - Jeongbin Park
- School of Biomedical Convergence Engineering, Pusan National University, Busan, 50612, Republic of Korea
| | - Donna M. Werling
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Stephan J. Sanders
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, CA 94158, USA
| | - Joon-Yong An
- Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
- School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, 02841, Republic of Korea
|
10
|
Schraiber JG, Edge MD, Pennell M. Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.10.579721. [PMID: 38496530 PMCID: PMC10942266 DOI: 10.1101/2024.02.10.579721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
In both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci or other aspects of the phenotype or environment and a focal trait. In these two fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we derive a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., Genome-Wide Association Studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop intuition for why and when spurious correlations may occur using analytical theory and conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, namely including in a regression model both the genetic relatedness matrix (GRM) and its leading eigenvectors (which correspond to the principal components of the genotype matrix), can mitigate spurious correlations in phylogenetic analyses.
As a case study of this, we re-examine an analysis testing for co-evolution of expression levels between genes across a fungal phylogeny, and show that including covariance matrix eigenvectors as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
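The eigenvector-covariate idea in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the simulation settings, the function names `snp_effect` and `top_pcs`, and the use of principal components alone (without the GRM as a random effect) are all assumptions made for illustration. Two subpopulations differ in allele frequency and in mean phenotype, no SNP is causal, and the slope for one SNP is estimated with and without the leading principal components as covariates.

```python
import numpy as np

def snp_effect(y, g, covariates=None):
    """OLS slope of phenotype y on genotype g, optionally adjusting for covariates."""
    cols = [np.ones_like(y), g]
    if covariates is not None:
        cols.append(covariates)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # coefficient on the genotype column

def top_pcs(G, k):
    """Leading k principal components (scores) of a column-centered genotype matrix."""
    Gc = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * S[:k]

# Hypothetical simulation settings, chosen only to make the confounding visible.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 500, 200
n = 2 * n_per_pop
pop = np.repeat([0, 1], n_per_pop)
freq = np.where(pop == 0, 0.2, 0.8)[:, None]           # allele frequency differs by population
G = rng.binomial(2, np.broadcast_to(freq, (n, n_snps)))  # genotype dosages 0/1/2
y = 2.0 * pop + rng.normal(size=n)                      # phenotype depends on population only

test_snp = G[:, 0]
naive = snp_effect(y, test_snp)                          # confounded by population structure
adjusted = snp_effect(y, test_snp, top_pcs(G[:, 1:], 2)) # PCs computed from the other SNPs

print(f"naive slope:    {naive:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

In this toy setting the unadjusted slope is driven by population membership rather than by the SNP, and adding the principal components should shrink it toward zero. A mixed model that also uses the GRM, as the abstract describes, would need additional machinery not shown here.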
|
11
|
Feng J, Zeng Z, Luo S, Liu X, Luo Q, Yang K, Zhang G, Liu J. Carrier frequencies, trends, and geographical distribution of hearing loss variants in China: The pooled analysis of 2,161,984 newborns. Heliyon 2024; 10:e24850. [PMID: 38322914 PMCID: PMC10845244 DOI: 10.1016/j.heliyon.2024.e24850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/07/2023] [Accepted: 01/16/2024] [Indexed: 02/08/2024] Open
Abstract
The aim of this study is to comprehensively investigate the prevalence and distribution patterns of three common genetic variants associated with hearing loss (HL) in the Chinese neonatal population. Methods: Prior to June 30, 2023, an extensive search and screening process was conducted across multiple literature databases. R software was utilized for conducting meta-analyses, cartography, and correlation analyses. Results: Firstly, our study identified a total of 99 studies meeting the inclusion criteria. Notably, provinces such as Qinghai, Tibet, Jilin, and Heilongjiang lack large-scale genetic screening data for neonatal deafness. Secondly, in Chinese newborns, the carrier frequencies of GJB2 variants (c.235delC, c.299_300delAT) were 1.63% (95% CI 1.52%-1.76%) and 0.33% (95% CI 0.30%-0.37%), while SLC26A4 variants (c.919-2A > G, c.2168A > G) exhibited carrier rates of 0.95% (95% CI 0.86%-1.04%) and 0.17% (95% CI 0.15%-0.19%). Additionally, the Mt 12S rRNA m.1555 A > G variant was found at a rate of 0.24% (95% CI 0.22%-0.26%). Thirdly, the mutation rate of GJB2 c.235delC was higher east of the Heihe-Tengchong line, whereas the mutation rate of the Mt 12S rRNA m.1555 A > G variant exhibited the opposite pattern. Fourthly, no significant correlation was observed among GJB2 variants, but there was a notable correlation among SLC26A4 variants. Lastly, strong regional distribution correlations were evident between mutation sites from different genes, particularly between SLC26A4 (c.919-2A > G and c.2168A > G) and GJB2 c.299_300delAT. Conclusions: The most prevalent deafness-associated variant among Chinese neonates was GJB2 c.235delC, followed by SLC26A4 c.919-2A > G. These gene mutation rates exhibit significant regional distribution characteristics. Consequently, it is imperative to enhance genetic screening efforts to reduce the incidence of deafness in high-risk areas.
Affiliation(s)
- Jia Feng
- Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
| | - Zhangrui Zeng
- Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
| | - Sijian Luo
- Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
| | - Xuexue Liu
- Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
| | - Qing Luo
- Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
| | - Kui Yang
- Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
| | - Guanbin Zhang
- Department of Laboratory Medicine, Fujian Medical University, Fuzhou 350122, China
- National Engineering Research Center for Beijing Biochip Technology, Beijing 102206, China
| | - Jinbo Liu
- Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Sichuan Province Engineering Technology Research Center of Molecular Diagnosis of Clinical Diseases, Molecular Diagnosis of Clinical Diseases Key Laboratory of Luzhou, Sichuan, China
|
12
|
Li X, Pura J, Allen A, Owzar K, Lu J, Harms M, Xie J. DYNATE: Localizing rare-variant association regions via multiple testing embedded in an aggregation tree. Genet Epidemiol 2024; 48:42-55. [PMID: 38014869 PMCID: PMC10842871 DOI: 10.1002/gepi.22542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 10/09/2023] [Accepted: 10/26/2023] [Indexed: 11/29/2023]
Abstract
Rare-variant (RV) genetic association studies enable researchers to uncover the variation in phenotypic traits left unexplained by common variation. Traditional single-variant analysis lacks power; thus, researchers have developed various methods to aggregate the effects of RVs across genomic regions to study their collective impact. Some existing methods utilize a static delineation of genomic regions, often resulting in suboptimal effect aggregation, as neutral subregions within the test region attenuate the signal. Other methods use varying windows to search for signals but often yield long regions containing many neutral RVs. To pinpoint short genomic regions enriched for disease-associated RVs, we developed a novel method, DYNamic Aggregation TEsting (DYNATE). DYNATE dynamically and hierarchically aggregates smaller genomic regions into larger ones and performs multiple testing for disease associations with a controlled weighted false discovery rate. DYNATE's main advantage lies in its strong ability to identify short genomic regions highly enriched for disease-associated RVs. Extensive numerical simulations demonstrate the superior performance of DYNATE under various scenarios compared with existing methods. We applied DYNATE to an amyotrophic lateral sclerosis study and identified a new gene, EPG5, harboring possibly pathogenic mutations.
Affiliation(s)
- Xuechan Li
- Novartis Pharmaceuticals Corporation, Basel, Switzerland
| | | | - Andrew Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
| | - Kouros Owzar
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
| | - Jianfeng Lu
- Department of Mathematics, Duke University, Durham, North Carolina, USA
| | - Matthew Harms
- Department of Neurology, Columbia University, Broadway, New York, USA
| | - Jichun Xie
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Department of Mathematics, Duke University, Durham, North Carolina, USA
|
13
|
Feng X, Liu S, Li K, Bu F, Yuan H. NCAD v1.0: a database for non-coding variant annotation and interpretation. J Genet Genomics 2024; 51:230-242. [PMID: 38142743 DOI: 10.1016/j.jgg.2023.12.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/30/2023] [Revised: 12/15/2023] [Accepted: 12/18/2023] [Indexed: 12/26/2023]
Abstract
The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Affiliation(s)
- Xiaoshu Feng
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
| | - Sihan Liu
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
| | - Ke Li
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
| | - Fengxiao Bu
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China.
| | - Huijun Yuan
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China.
|
14
|
Ge F, Arif M, Yan Z, Alahmadi H, Worachartcheewan A, Shoombuatong W. Review of Computational Methods and Database Sources for Predicting the Effects of Coding Frameshift Small Insertion and Deletion Variations. ACS OMEGA 2024; 9:2032-2047. [PMID: 38250421 PMCID: PMC10795160 DOI: 10.1021/acsomega.3c07662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 11/30/2023] [Accepted: 12/04/2023] [Indexed: 01/23/2024]
Abstract
Genetic variations (including substitutions, insertions, and deletions) exert a profound influence on DNA sequences. These variations are systematically classified as synonymous, nonsynonymous, and nonsense, each manifesting distinct effects on proteins. The implementation of high-throughput sequencing has significantly augmented our comprehension of the intricate interplay between gene variations and protein structure and function, as well as their ramifications in the context of diseases. Frameshift variations, particularly small insertions and deletions (indels), disrupt protein coding and are instrumental in disease pathogenesis. This review presents a succinct overview of computational methods, databases, current challenges, and future directions in predicting the consequences of coding frameshift small indel variations. We analyzed the predictive efficacy, reliability, and utilization of computational methods, as well as the variant count, reliability, and utilization of databases. We also compared the prediction methodologies on GOF/LOF pathogenic variation data. For the challenges of prediction accuracy and cross-species generalizability, nascent technologies such as AI and deep learning harbor immense potential to enhance predictive capabilities. The importance of interdisciplinary research and collaboration cannot be overstated for devising effective diagnosis, treatment, and prevention strategies concerning diseases associated with coding frameshift indel variations.
Affiliation(s)
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, China
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Zihao Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Hanin Alahmadi
- College of Computer Science and Engineering, Taibah University, Madinah 344, Saudi Arabia
| | - Apilak Worachartcheewan
- Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
|
15
|
Cao C, Shao M, Zuo C, Kwok D, Liu L, Ge Y, Zhang Z, Cui F, Chen M, Fan R, Ding Y, Jiang H, Wang G, Zou Q. RAVAR: a curated repository for rare variant-trait associations. Nucleic Acids Res 2024; 52:D990-D997. [PMID: 37831073 PMCID: PMC10767942 DOI: 10.1093/nar/gkad876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/20/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023] Open
Abstract
Rare variants contribute significantly to the genetic causes of complex traits, as they can have much larger effects than common variants and account for much of the missing heritability in genome-wide association studies. The emergence of UK Biobank scale datasets and accurate gene-level rare variant-trait association testing methods has dramatically increased the number of rare variant associations that have been detected. However, no systematic collection of these associations has been carried out to date, especially at the gene level. To address the issue, we present the Rare Variant Association Repository (RAVAR), a comprehensive collection of rare variant associations. RAVAR includes 95 047 high-quality rare variant associations (76 186 gene-level and 18 861 variant-level associations) for 4429 reported traits, which are manually curated from 245 publications. RAVAR is the first resource to collect and curate published rare variant associations in an interactive web interface with integrated visualization, search, and download features. Detailed gene and SNP information is provided for each association, and users can conveniently search for related studies by exploring the EFO tree structure and interactive Manhattan plots. RAVAR could vastly improve the accessibility of rare variant studies. RAVAR is freely available for all users without login requirement at http://www.ravar.bio.
Affiliation(s)
- Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
| | - Mengting Shao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
| | - Chunman Zuo
- Institute of Artificial Intelligence, Donghua University, Shanghai, China
| | - Devin Kwok
- School of Computer Science, McGill University, Montreal, Canada
| | - Lin Liu
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
| | - Yuli Ge
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
| | - Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Feifei Cui
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Mingshuai Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Rui Fan
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Hangjin Jiang
- Center for Data Science, Zhejiang University, Hangzhou, China
| | - Guishen Wang
- College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
|
16
|
Chen H, Naseri A, Zhi D. FiMAP: A fast identity-by-descent mapping test for biobank-scale cohorts. PLoS Genet 2023; 19:e1011057. [PMID: 38039339 PMCID: PMC10718418 DOI: 10.1371/journal.pgen.1011057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 12/13/2023] [Accepted: 11/07/2023] [Indexed: 12/03/2023] Open
Abstract
Although genome-wide association studies (GWAS) have identified tens of thousands of genetic loci, the genetic architecture is still not fully understood for many complex traits. Most GWAS and sequencing association studies have focused on single nucleotide polymorphisms or copy number variations, including common and rare genetic variants. However, phased haplotype information is often ignored in GWAS or variant set tests for rare variants. Here we leverage the identity-by-descent (IBD) segments inferred from a random projection-based IBD detection algorithm in the mapping of genetic associations with complex traits, to develop a computationally efficient statistical test for IBD mapping in biobank-scale cohorts. We used sparse linear algebra and random matrix algorithms to speed up the computation, and a genome-wide IBD mapping scan of more than 400,000 samples finished within a few hours. Simulation studies showed that our new method had well-controlled type I error rates under the null hypothesis of no genetic association in large biobank-scale cohorts, and outperformed traditional GWAS single-variant tests when the causal variants were untyped and rare, or in the presence of haplotype effects. We also applied our method to IBD mapping of six anthropometric traits using the UK Biobank data and identified a total of 3,442 associations, 2,131 (62%) of which remained significant after conditioning on suggestive tag variants in the ± 3 centimorgan flanking regions from GWAS.
Affiliation(s)
- Han Chen
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
| | - Ardalan Naseri
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
| | - Degui Zhi
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
|
17
|
Mo C, Ye Z, Pan Y, Zhang Y, Wu Q, Bi C, Liu S, Mitchell B, Kochunov P, Hong LE, Ma T, Chen S. An in-depth association analysis of genetic variants within nicotine-related loci: Meeting in middle of GWAS and genetic fine-mapping. Mol Cell Neurosci 2023; 127:103895. [PMID: 37634742 PMCID: PMC11128188 DOI: 10.1016/j.mcn.2023.103895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 08/29/2023] Open
Abstract
In the last two decades of genome-wide association studies (GWAS), nicotine-dependence-related genetic loci (e.g., nicotinic acetylcholine receptor - nAChR subunit genes) are among the most replicable genetic findings. Although GWAS results have reported tens of thousands of SNPs within these loci, further analysis (e.g., fine-mapping) is required to identify the causal variants. However, it is computationally challenging for existing fine-mapping methods to reliably identify causal variants from thousands of candidate SNPs based on the posterior inclusion probability. To address this challenge, we propose a new method to select SNPs by jointly modeling the SNP-wise inference results and the underlying structured network patterns of the linkage disequilibrium (LD) matrix. We use an adaptive dense subgraph extraction method to recognize the latent network patterns of the LD matrix and then apply group LASSO to select causal variant candidates. We applied this new method to the UK Biobank data to identify the causal variant candidates for nicotine addiction. Eighty-one nicotine addiction-related SNPs (i.e., -log(p) > 50) of nAChR were selected, which are highly correlated (average r² > 0.8) although they are physically distant (e.g., >200 kilobases away) and from various genes. These findings revealed that distant SNPs from different genes can show higher LD r² than their neighboring SNPs, and jointly contribute to a complex trait like nicotine addiction.
Affiliation(s)
- Chen Mo
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
| | - Zhenyao Ye
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
| | - Yezhi Pan
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
| | - Yuan Zhang
- Department of Statistics, College of Arts and Sciences, Ohio State University, Columbus, Ohio, United States
| | - Qiong Wu
- Department of Mathematics, University of Maryland, College Park, Maryland, United States
| | - Chuan Bi
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
| | - Song Liu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
| | - Braxton Mitchell
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
| | - Peter Kochunov
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
| | - L. Elliot Hong
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
| | - Tianzhou Ma
- Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, Maryland, United States
| | - Shuo Chen
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, Maryland, United States
|
18
|
Zhang J, Zhao H. eQTL studies: from bulk tissues to single cells. J Genet Genomics 2023; 50:925-933. [PMID: 37207929 PMCID: PMC10656365 DOI: 10.1016/j.jgg.2023.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/21/2023] [Revised: 05/02/2023] [Accepted: 05/04/2023] [Indexed: 05/21/2023]
Abstract
An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of specific genes that can be both nearby or distant. The identifications of eQTLs for different tissues, cell types, and contexts have led to a better understanding of the dynamic regulations of gene expressions and implications of functional genes and variants for complex traits and diseases. Although most eQTL studies have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulations in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to enable the detection of cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of the current methods and future research opportunities.
Affiliation(s)
- Jingfei Zhang
- Information Systems and Operations Management, Emory University, Atlanta, GA 30322, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT 208034, USA.
|
19
|
Li X, Chen H, Selvaraj MS, Van Buren E, Zhou H, Wang Y, Sun R, McCaw ZR, Yu Z, Arnett DK, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Carson AP, Carlson JC, Chami N, Chen YDI, Curran JE, de Vries PS, Fornage M, Franceschini N, Freedman BI, Gu C, Heard-Costa NL, He J, Hou L, Hung YJ, Irvin MR, Kaplan RC, Kardia SL, Kelly T, Konigsberg I, Kooperberg C, Kral BG, Li C, Loos RJ, Mahaney MC, Martin LW, Mathias RA, Minster RL, Mitchell BD, Montasser ME, Morrison AC, Palmer ND, Peyser PA, Psaty BM, Raffield LM, Redline S, Reiner AP, Rich SS, Sitlani CM, Smith JA, Taylor KD, Tiwari H, Vasan RS, Wang Z, Yanek LR, Yu B, Rice KM, Rotter JI, Peloso GM, Natarajan P, Li Z, Liu Z, Lin X. A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies. bioRxiv 2023:2023.10.30.564764. [PMID: 37961350 PMCID: PMC10634938 DOI: 10.1101/2023.10.30.564764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Large-scale whole-genome sequencing (WGS) studies have improved our understanding of the contributions of coding and noncoding rare variants to complex human traits. Leveraging association effect sizes across multiple traits in WGS rare variant association analysis can improve statistical power over single-trait analysis, and also detect pleiotropic genes and regions. Existing multi-trait methods have limited ability to perform rare variant analysis of large-scale WGS data. We propose MultiSTAAR, a statistical framework and computationally-scalable analytical pipeline for functionally-informed multi-trait rare variant analysis in large-scale WGS studies. MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. We applied MultiSTAAR to jointly analyze three lipid traits (low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides) in 61,861 multi-ethnic samples from the Trans-Omics for Precision Medicine (TOPMed) Program. We discovered new associations with lipid traits missed by single-trait analysis, including rare variants within an enhancer of NIPSNAP3A and an intergenic region on chromosome 1.
Affiliation(s)
- Xihao Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Margaret Sunitha Selvaraj
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Eric Van Buren
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Yuxuan Wang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Ryan Sun
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Zachary R. McCaw
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Zhi Yu
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Donna K. Arnett
- Provost Office, University of South Carolina, Columbia, SC, USA
| | - Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Donald W. Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Jennifer A. Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Brian E. Cade
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - April P. Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Jenna C. Carlson
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Nathalie Chami
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Joanne E. Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Paul S. de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, the University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Nora Franceschini
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Barry I. Freedman
- Department of Internal Medicine, Nephrology, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Charles Gu
- Division of Biology & Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
| | - Nancy L. Heard-Costa
- Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
| | - Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
| | - Lifang Hou
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
| | - Yi-Jen Hung
- Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
| | - Marguerite R. Irvin
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Robert C. Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Sharon L.R. Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Tanika Kelly
- Department of Medicine, Division of Nephrology, University of Illinois Chicago, Chicago, IL, USA
| | - Iain Konigsberg
- Department of Biomedical Informatics, University of Colorado, Aurora, CO, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Brian G. Kral
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Changwei Li
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
| | - Ruth J.F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Michael C. Mahaney
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Lisa W. Martin
- George Washington University School of Medicine and Health Sciences, Washington, DC, USA
| | - Rasika A. Mathias
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Ryan L. Minster
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Braxton D. Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - May E. Montasser
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Nicholette D. Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Patricia A. Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Bruce M. Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Departments of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
| | - Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - Alexander P. Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Departments of Epidemiology, University of Washington, Seattle, WA, USA
| | - Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Colleen M. Sitlani
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Jennifer A. Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Hemant Tiwari
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Ramachandran S. Vasan
- Framingham Heart Study, Framingham, MA, USA
- Department of Quantitative and Qualitative Health Sciences, UT Health San Antonio School of Public Health, San Antonio, TX, USA
| | - Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Lisa R. Yanek
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Bing Yu
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | | | - Kenneth M. Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Gina M. Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Pradeep Natarajan
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Zhonghua Liu
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
| | - Xihong Lin
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
|
20
|
Wang Y, Selvaraj MS, Li X, Li Z, Holdcraft JA, Arnett DK, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Cade BE, Carlson JC, Carson AP, Chen YDI, Curran JE, de Vries PS, Dutcher SK, Ellinor PT, Floyd JS, Fornage M, Freedman BI, Gabriel S, Germer S, Gibbs RA, Guo X, He J, Heard-Costa N, Hildalgo B, Hou L, Irvin MR, Joehanes R, Kaplan RC, Kardia SL, Kelly TN, Kim R, Kooperberg C, Kral BG, Levy D, Li C, Liu C, Lloyd-Jone D, Loos RJ, Mahaney MC, Martin LW, Mathias RA, Minster RL, Mitchell BD, Montasser ME, Morrison AC, Murabito JM, Naseri T, O'Connell JR, Palmer ND, Preuss MH, Psaty BM, Raffield LM, Rao DC, Redline S, Reiner AP, Rich SS, Ruepena MS, Sheu WHH, Smith JA, Smith A, Tiwari HK, Tsai MY, Viaud-Martinez KA, Wang Z, Yanek LR, Zhao W, Rotter JI, Lin X, Natarajan P, Peloso GM. Rare variants in long non-coding RNAs are associated with blood lipid levels in the TOPMed whole-genome sequencing study. Am J Hum Genet 2023; 110:1704-1717. [PMID: 37802043 PMCID: PMC10577076 DOI: 10.1016/j.ajhg.2023.09.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/01/2023] [Accepted: 09/01/2023] [Indexed: 10/08/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are known to perform important regulatory functions in lipid metabolism. Large-scale whole-genome sequencing (WGS) studies and new statistical methods for variant set tests now provide an opportunity to assess more associations between rare variants in lncRNA genes and complex traits across the genome. In this study, we used high-coverage WGS from 66,329 participants of diverse ancestries with measurement of blood lipids and lipoproteins (LDL-C, HDL-C, TC, and TG) in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program to investigate the role of lncRNAs in lipid variability. We aggregated rare variants for 165,375 lncRNA genes based on their genomic locations and conducted rare-variant aggregate association tests using the STAAR (variant-set test for association using annotation information) framework. We performed STAAR conditional analysis adjusting for common variants in known lipid GWAS loci and rare-coding variants in nearby protein-coding genes. Our analyses revealed 83 rare lncRNA variant sets significantly associated with blood lipid levels, all of which were located in known lipid GWAS loci (in a ±500-kb window of a Global Lipids Genetics Consortium index variant). Notably, 61 out of 83 signals (73%) were conditionally independent of common regulatory variation and rare protein-coding variation at the same loci. We replicated 34 out of 61 (56%) conditionally independent associations using the independent UK Biobank WGS data. Our results expand the genetic architecture of blood lipids to rare variants in lncRNAs.
Affiliation(s)
- Yuxuan Wang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Margaret Sunitha Selvaraj
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Xihao Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Zilin Li
- School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin, China; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Jacob A Holdcraft
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Donna K Arnett
- Provost Office, University of South Carolina, Columbia, SC, USA; Department of Epidemiology and Biostatistics, University of South Carolina Arnold School of Public Health, Columbia, SC, USA
| | - Joshua C Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Donald W Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Brian E Cade
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - Jenna C Carlson
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA; Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - April P Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Paul S de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Susan K Dutcher
- The McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Patrick T Ellinor
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA; Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - James S Floyd
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA; Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - Myriam Fornage
- Center for Human Genetics, University of Texas Health at Houston, Houston, TX, USA
| | - Barry I Freedman
- Department of Internal Medicine, Nephrology, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | | | | | - Richard A Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA; Tulane University Translational Science Institute, New Orleans, LA, USA
| | - Nancy Heard-Costa
- Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
| | - Bertha Hildalgo
- Department of Epidemiology, University of Alabama at Birmingham School of Public Health, Birmingham, AL, USA
| | - Lifang Hou
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
| | - Marguerite R Irvin
- Department of Epidemiology, University of Alabama at Birmingham School of Public Health, Birmingham, AL, USA
| | - Roby Joehanes
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA; Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Sharon LR Kardia
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
| | - Tanika N Kelly
- Department of Medicine, Division of Nephrology, University of Illinois Chicago, Chicago, IL, USA
| | - Ryan Kim
- Psomagen, Inc. (formerly Macrogen USA), Rockville, MD, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Brian G Kral
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Daniel Levy
- Framingham Heart Study, Framingham, MA, USA; Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Changwei Li
- Tulane University Translational Science Institute, New Orleans, LA, USA; Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
| | - Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; Framingham Heart Study, Framingham, MA, USA
| | - Don Lloyd-Jone
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
| | - Ruth JF Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; NNF Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
| | - Michael C Mahaney
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Lisa W Martin
- George Washington University School of Medicine and Health Sciences, Washington, DC, USA
| | - Rasika A Mathias
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Ryan L Minster
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Braxton D Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - May E Montasser
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Joanne M Murabito
- Framingham Heart Study, Framingham, MA, USA; Department of Medicine, Boston Medical Center, Boston University Chobanian and Avedisian School of Medicine, Boston, MA, USA
| | - Take Naseri
- Naseri & Associates Public Health Consultancy Firm and Family Health Clinic, Apia, Samoa; International Health Institute, School of Public Health, Brown University, Providence, RI, USA
| | - Jeffrey R O'Connell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Nicholette D Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Michael H Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bruce M Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA; Department of Epidemiology, University of Washington, Seattle, WA, USA; Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
| | - Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Dabeeru C Rao
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO, USA
| | - Susan Redline
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Wayne H-H Sheu
- Institute of Molecular and Genomic Medicine, National Health Research Institute (NHRI), Miaoli County, Taiwan
| | - Jennifer A Smith
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
| | - Albert Smith
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Hemant K Tiwari
- Department of Biostatistics, University of Alabama, Birmingham, AL, USA
| | - Michael Y Tsai
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
| | - Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Lisa R Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Wei Zhao
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Xihong Lin
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Statistics, Harvard University, Cambridge, MA, USA
| | - Pradeep Natarajan
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
|
21
|
Jin X, Shi G. Cauchy combination methods for the detection of gene-environment interactions for rare variants related to quantitative phenotypes. Heredity (Edinb) 2023; 131:241-252. [PMID: 37481617 PMCID: PMC10539363 DOI: 10.1038/s41437-023-00640-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 07/09/2023] [Accepted: 07/12/2023] [Indexed: 07/24/2023] Open
Abstract
The characterization of gene-environment interactions (GEIs) can provide detailed insights into the biological mechanisms underlying complex diseases. Despite recent interest in GEIs for rare variants, published GEI tests are underpowered when only an extremely small proportion of the rare variants in a gene or region are causal. By extending the aggregated Cauchy association test (ACAT), we propose three GEI tests to address this issue: a Cauchy combination GEI test with fixed main effects (CCGEI-F), a Cauchy combination GEI test with random main effects (CCGEI-R), and an omnibus Cauchy combination GEI test (CCGEI-O). ACAT was applied to combine p values of single-variant GEI analyses to obtain CCGEI-F and CCGEI-R, and p values of multiple GEI tests were combined in CCGEI-O. In numerical simulations with small numbers of causal variants, CCGEI-F, CCGEI-R and CCGEI-O provided approximately 5% higher power than the existing GEI tests INT-FIX and INT-RAN, but only slightly higher power than the existing GEI test TOW-GE. For large numbers of causal variants, although CCGEI-F and CCGEI-R exhibited comparable or slightly lower power values than the competing tests, the results were still satisfactory. Among all simulation conditions evaluated, CCGEI-O provided significantly higher power than the competing GEI tests. We further applied our GEI tests in genome-wide analyses of systolic or diastolic blood pressure to detect gene-body mass index (BMI) interactions, using whole-exome sequencing data from UK Biobank. At a suggestive significance level of 1.0 × 10⁻⁴, KCNC4, GAR1, FAM120AOS and NT5C3B showed interactions with BMI in our GEI tests.
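For orientation, the p-value combination step that the abstract attributes to ACAT can be sketched as follows. This is a minimal illustration of the generic Cauchy combination rule only, not the authors' CCGEI implementation; the function name `acat`, the equal default weights, and the requirement that every p value lies strictly between 0 and 1 are assumptions made for this sketch.

```python
import math

def acat(p_values, weights=None):
    """Combine p values with the Cauchy combination rule (hypothetical helper).

    Assumes every p value lies strictly between 0 and 1.
    """
    if weights is None:
        # Assumption for this sketch: equal weights when none are supplied.
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Map each p value to a Cauchy-scale score and take the weighted average.
    t = sum(w / total * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))
    # Convert the averaged statistic back to a p value via the Cauchy tail.
    return 0.5 - math.atan(t) / math.pi

if __name__ == "__main__":
    print(acat([1e-4, 0.5, 0.9]))
```

A single very small p value dominates the combined result, which is why this style of combination is attractive when only a few variants in a region carry signal.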
Affiliation(s)
- Xiaoqin Jin
- State Key Laboratory of Integrated Services Networks, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, 710071, China.
| | - Gang Shi
- State Key Laboratory of Integrated Services Networks, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
|
22
|
Liang X, Sun H. Weighted Selection Probability to Prioritize Susceptible Rare Variants in Multi-Phenotype Association Studies with Application to a Soybean Genetic Data Set. J Comput Biol 2023; 30:1075-1088. [PMID: 37871292 DOI: 10.1089/cmb.2022.0487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023] Open
Abstract
Rare variant association studies with multiple traits or diseases have drawn a lot of attention, since association signals of rare variants can be boosted if more than one phenotype outcome is associated with the same rare variants. Most of the existing statistical methods to identify rare variants associated with multiple phenotypes are based on a group test, where pre-specified genetic regions are tested one at a time. However, these methods are not designed to locate susceptible rare variants within the genetic region. In this article, we propose a new statistical method to prioritize rare variants within a genetic region when a group test for the genetic region identifies a statistical association with multiple phenotypes. The method computes the weighted selection probability (WSP) of individual rare variants and ranks them from largest to smallest according to their WSP. In simulation studies, we demonstrated that the proposed method outperforms other statistical methods in terms of true positive selection when multiple phenotypes are correlated with each other. We also applied it to our soybean single nucleotide polymorphism (SNP) data with 13 highly correlated amino acids, where we identified some potentially susceptible rare variants in chromosome 19.
Affiliation(s)
- Xianglong Liang
- Department of Statistics, Pusan National University, Busan, Korea
| | - Hokeun Sun
- Department of Statistics, Pusan National University, Busan, Korea
|
23
|
Mathys H, Peng Z, Boix CA, Victor MB, Leary N, Babu S, Abdelhady G, Jiang X, Ng AP, Ghafari K, Kunisky AK, Mantero J, Galani K, Lohia VN, Fortier GE, Lotfi Y, Ivey J, Brown HP, Patel PR, Chakraborty N, Beaudway JI, Imhoff EJ, Keeler CF, McChesney MM, Patel HH, Patel SP, Thai MT, Bennett DA, Kellis M, Tsai LH. Single-cell atlas reveals correlates of high cognitive function, dementia, and resilience to Alzheimer's disease pathology. Cell 2023; 186:4365-4385.e27. [PMID: 37774677 PMCID: PMC10601493 DOI: 10.1016/j.cell.2023.08.039] [Citation(s) in RCA: 79] [Impact Index Per Article: 79.0] [Received: 09/19/2022] [Revised: 05/20/2023] [Accepted: 08/29/2023] [Indexed: 10/01/2023]
Abstract
Alzheimer's disease (AD) is the most common cause of dementia worldwide, but the molecular and cellular mechanisms underlying cognitive impairment remain poorly understood. To address this, we generated a single-cell transcriptomic atlas of the aged human prefrontal cortex covering 2.3 million cells from postmortem human brain samples of 427 individuals with varying degrees of AD pathology and cognitive impairment. Our analyses identified AD-pathology-associated alterations shared between excitatory neuron subtypes, revealed a coordinated increase of the cohesin complex and DNA damage response factors in excitatory neurons and in oligodendrocytes, and uncovered genes and pathways associated with high cognitive function, dementia, and resilience to AD pathology. Furthermore, we identified selectively vulnerable somatostatin inhibitory neuron subtypes depleted in AD, discovered two distinct groups of inhibitory neurons that were more abundant in individuals with preserved high cognitive function late in life, and uncovered a link between inhibitory neurons and resilience to AD pathology.
Affiliation(s)
- Hansruedi Mathys
- Picower Institute for Learning and Memory, MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA; University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA.
| | - Zhuyu Peng
- Picower Institute for Learning and Memory, MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA
| | - Carles A Boix
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Matheus B Victor
- Picower Institute for Learning and Memory, MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA
| | - Noelle Leary
- Picower Institute for Learning and Memory, MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA
| | - Sudhagar Babu
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Ghada Abdelhady
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Xueqiao Jiang
- Picower Institute for Learning and Memory, MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA
| | - Ayesha P Ng
- Picower Institute for Learning and Memory, MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA
| | - Kimia Ghafari
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Alexander K Kunisky
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Julio Mantero
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Kyriaki Galani
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Vanshika N Lohia
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Gabrielle E Fortier
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Yasmine Lotfi
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Jason Ivey
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Hannah P Brown
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Pratham R Patel
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Nehal Chakraborty
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Jacob I Beaudway
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Elizabeth J Imhoff
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Cameron F Keeler
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Maren M McChesney
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Haishal H Patel
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Sahil P Patel
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Megan T Thai
- University of Pittsburgh Brain Institute and Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| | - Li-Huei Tsai
- Picower Institute for Learning and Memory, MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
|
24
|
Jiang MZ, Gaynor SM, Li X, Van Buren E, Stilp A, Buth E, Wang FF, Manansala R, Gogarten SM, Li Z, Polfus LM, Salimi S, Bis JC, Pankratz N, Yanek LR, Durda P, Tracy RP, Rich SS, Rotter JI, Mitchell BD, Lewis JP, Psaty BM, Pratte KA, Silverman EK, Kaplan RC, Avery C, North K, Mathias RA, Faraday N, Lin H, Wang B, Carson AP, Norwood AF, Gibbs RA, Kooperberg C, Lundin J, Peters U, Dupuis J, Hou L, Fornage M, Benjamin EJ, Reiner AP, Bowler RP, Lin X, Auer PL, Raffield LM. Whole Genome Sequencing Based Analysis of Inflammation Biomarkers in the Trans-Omics for Precision Medicine (TOPMed) Consortium. bioRxiv 2023:2023.09.10.555215. [PMID: 37745480 PMCID: PMC10515765 DOI: 10.1101/2023.09.10.555215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that a multi-ancestry, whole-genome-based association study can provide, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program. We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis, which identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single-variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
Affiliation(s)
- Min-Zhi Jiang
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA
| | - Sheila M. Gaynor
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA
- Regeneron Genetics Center, Tarrytown, NY, 10591, USA
| | - Xihao Li
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Eric Van Buren
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA
| | - Adrienne Stilp
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
| | - Erin Buth
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
| | - Fei Fei Wang
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
| | - Regina Manansala
- Centre for Health Economics Research & Modelling Infectious Diseases (CHERMID), Vaccine & Infectious Disease Institute (VAXINFECTIO) WHO Collaborating Centre, University of Antwerp, Antwerp, BE
| | - Zilin Li
- School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin, 130024, China
| | - Linda M. Polfus
- Department of Preventive Medicine, Center for Genetic Epidemiology, University of Southern California, Los Angeles, CA, 90033, USA
| | - Shabnam Salimi
- Department of Epidemiology and Public Health, Division of Gerontology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave NE, Box 359458, Seattle, WA, 98195, USA
| | - Nathan Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN, 55455, USA
| | - Lisa R. Yanek
- Department of Medicine, General Internal Medicine, Johns Hopkins University School of Medicine, 1830 E Monument St Rm 8024, Baltimore, MD, 21287, USA
| | - Peter Durda
- Department of Pathology & Laboratory Medicine, University of Vermont Larner College of Medicine, 360 South Park Drive, Colchester, VT, 05446, USA
| | - Russell P. Tracy
- Department of Pathology & Laboratory Medicine, University of Vermont Larner College of Medicine, 360 South Park Drive, Colchester, VT, 05446, USA
| | - Stephen S. Rich
- Center for Public Health Genomics, University of Virginia School of Medicine, 200 Jeanette Lancaster Way, Charlottesville, VA, 22903, USA
| | - Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, 1124 W. Carson Street, Torrance, CA, 90502, USA
| | - Braxton D. Mitchell
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, 670 W. Baltimore St., Baltimore, MD, 21201, USA
| | - Joshua P. Lewis
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, 670 W. Baltimore St., Baltimore, MD, 21201, USA
| | - Bruce M. Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave NE, Box 359458, Seattle, WA, 98195, USA
- Departments of Epidemiology and Health Systems and Population Health, University of Washington, 4333 Brooklyn Ave NE, Seattle, WA, 98101, USA
| | - Katherine A. Pratte
- Department of Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, National Jewish Health, Denver, CO, 80206, USA
| | - Edwin K. Silverman
- Department of Medicine, Channing Division of Network Medicine, Brigham and Women’s Hospital, 181 Longwood Avenue, Boston, MA, 02115, USA
| | - Robert C. Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
| | - Christy Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Kari North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Rasika A. Mathias
- Department of Medicine, Allergy and Clinical Immunology, Johns Hopkins University School of Medicine, 5501 Hopkins Bayview Cir JHAAC Room 3B53, Baltimore, MD, 21287, USA
| | - Nauder Faraday
- Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, 600 N Wolfe St, Baltimore, MD, 21287, USA
| | - Honghuang Lin
- Department of Medicine, University of Massachusetts Chan Medical School, 55 Lake Ave North, Worcester, MA, 01655, USA
| | - Biqi Wang
- Department of Medicine, University of Massachusetts Chan Medical School, 55 Lake Ave North, Worcester, MA, 01655, USA
| | - April P. Carson
- Department of Medicine, University of Mississippi Medical Center, 350 W. Woodrow Wilson Avenue, Suite 701, Jackson, MS, 39213, USA
| | - Arnita F. Norwood
- Department of Medicine, University of Mississippi Medical Center, 350 W. Woodrow Wilson Avenue, Suite 701, Jackson, MS, 39213, USA
| | - Richard A. Gibbs
- Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Jessica Lundin
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Ulrike Peters
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Josée Dupuis
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, H3A 1G1, Canada
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, 02118, USA
| | - Lifang Hou
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
| | - Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Emelia J. Benjamin
- Department of Medicine, Cardiovascular Medicine, Boston Medical Center, Boston University Chobanian and Avedisian School of Medicine, Boston, MA, 02118, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, 02118, USA
- Boston University and National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA, 01702, USA
| | - Alexander P. Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, 98105, USA
| | - Russell P. Bowler
- Department of Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, National Jewish Health, Denver, CO, 80206, USA
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA
| | - Paul L. Auer
- Division of Biostatistics, Institute for Health and Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
| | - Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA
|
25
|
Arslan A. Pathogenic variants of human GABRA1 gene associated with epilepsy: A computational approach. Heliyon 2023; 9:e20218. [PMID: 37809401 PMCID: PMC10559982 DOI: 10.1016/j.heliyon.2023.e20218] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/25/2023] [Revised: 08/17/2023] [Accepted: 09/13/2023] [Indexed: 10/10/2023] Open
Abstract
Critical for brain development and implicated in neurodevelopmental and network disorders, the GABRA1 gene encodes the α1 subunit, an abundantly and developmentally expressed subunit of heteropentameric gamma-aminobutyric acid A receptors (GABAARs) mediating primary inhibition in the brain. Mutations of the GABAAR subunit genes, including GABRA1, are associated with epilepsy, a group of syndromes characterized by unprovoked seizures and diagnosed by an integrative approach that involves genetic testing. Despite the diagnostic use of genetic testing, the molecular consequences of a large fraction of GABAAR subunit gene variants, including variants of GABRA1, are not known, which is a challenge for precision and personalized medicine. To address this, one hundred thirty-seven GABRA1 gene variants of unknown clinical significance were extracted from the ClinVar database and computationally analyzed for pathogenicity. Eight variants (L49H, P59L, W97R, D99G, G152S, V270G, T294R, P305L) are predicted to be pathogenic and mapped to the α1 subunit's extracellular domain (ECD), transmembrane domains (TMDs) and extracellular linker. These predictions were then integrated with relevant data on cellular pathology and severity of the epilepsy syndromes retrieved from the literature. Our results suggest that the pathogenic variants in the ECD of GABRA1 (L49H, P59L, W97R, D99G, G152S) will probably manifest decreased surface expression and reduced current with mild epilepsy phenotypes, while V270G and T294R in the TMDs and P305L in the linker between the second and the third TMDs will likely cause reduced cell current with severe epilepsy phenotypes. The results presented in this study provide insights for clinical genetics and wet lab experimentation.
Affiliation(s)
- Ayla Arslan
- Department of Molecular Biology and Genetics, Faculty of Engineering and Natural Sciences, Üsküdar University, Istanbul, Turkey
|
26
|
Jiang Z, Zhang H, Ahearn TU, Garcia-Closas M, Chatterjee N, Zhu H, Zhan X, Zhao N. The sequence kernel association test for multicategorical outcomes. Genet Epidemiol 2023; 47:432-449. [PMID: 37078108 DOI: 10.1002/gepi.22527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 03/29/2023] [Accepted: 03/30/2023] [Indexed: 04/21/2023]
Abstract
Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set-based analysis methods for genome-wide association studies are either inadequate or inefficient for handling such multicategorical outcomes. In this paper, we proposed a novel set-based association analysis method, SKAT-MC, the sequence kernel association test (SKAT) for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT-MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT-MC to the Polish breast cancer study (PBCS) and found that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER- breast cancer subtypes. We also investigated educational attainment using UK Biobank data (N = 127,127) with SKAT-MC, and identified 21 significant genes in the genome. Consequently, SKAT-MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT-MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.
Affiliation(s)
- Zhiwen Jiang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
| | - Thomas U Ahearn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
| | - Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Xiang Zhan
- Department of Biostatistics, Peking University, Beijing, China
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
|
27
|
McCaw ZR, O'Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. An allelic-series rare-variant association test for candidate-gene discovery. Am J Hum Genet 2023; 110:1330-1342. [PMID: 37494930 PMCID: PMC10432147 DOI: 10.1016/j.ajhg.2023.07.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/23/2022] [Revised: 06/30/2023] [Accepted: 07/01/2023] [Indexed: 07/28/2023] Open
Abstract
Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.
Affiliation(s)
- Francesco Paolo Casale
- Institute of AI for Health, Helmholtz Munich, Neuherberg, Germany; Helmholtz Pioneer Campus, Helmholtz Munich, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
|
28
|
Chumakova OS, Baulina NM. Advanced searching for hypertrophic cardiomyopathy heritability in real practice tomorrow. Front Cardiovasc Med 2023; 10:1236539. [PMID: 37583586 PMCID: PMC10425241 DOI: 10.3389/fcvm.2023.1236539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 07/17/2023] [Indexed: 08/17/2023] Open
Abstract
Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease associated with morbidity and mortality at any age. As studies in recent decades have shown, the genetic architecture of HCM is quite complex, both in the entire population and in each patient. In the rapidly advancing era of gene therapy, we have to provide our patients with a detailed molecular diagnosis to give them the chance of better and more personalized treatment. In addition to emphasizing the importance of genetic testing in routine practice, this review discusses the possibility of going a step further and creating an expanded genetic panel that contains not only variants in core genes but also new candidate genes, including those located in deep intronic regions, as well as structural variations. It also highlights the benefits of calculating polygenic risk scores based on a combination of rare and common genetic variants for each patient and of using non-genetic HCM markers, such as microRNAs, which can enhance risk stratification for HCM in unselected populations alongside rare genetic variants and clinical factors. Although this review focuses on HCM, the issues discussed are relevant to other cardiomyopathies.
Affiliation(s)
- Olga S. Chumakova
- Laboratory of Functional Genomics of Cardiovascular Diseases, National Medical Research Centre of Cardiology Named After E.I. Chazov, Moscow, Russia
29
Wang Y, Selvaraj MS, Li X, Li Z, Holdcraft JA, Arnett DK, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Cade BE, Carlson JC, Carson AP, Chen YDI, Curran JE, de Vries PS, Dutcher SK, Ellinor PT, Floyd JS, Fornage M, Freedman BI, Gabriel S, Germer S, Gibbs RA, Guo X, He J, Heard-Costa N, Hildalgo B, Hou L, Irvin MR, Joehanes R, Kaplan RC, Kardia SLR, Kelly TN, Kim R, Kooperberg C, Kral BG, Levy D, Li C, Liu C, Lloyd-Jone D, Loos RJF, Mahaney MC, Martin LW, Mathias RA, Minster RL, Mitchell BD, Montasser ME, Morrison AC, Murabito JM, Naseri T, O’Connell JR, Palmer ND, Preuss MH, Psaty BM, Raffield LM, Rao DC, Redline S, Reiner AP, Rich SS, Ruepena MS, Sheu WHH, Smith JA, Smith A, Tiwari HK, Tsai MY, Viaud-Martinez KA, Wang Z, Yanek LR, Zhao W, Rotter JI, Lin X, Natarajan P, Peloso GM. Rare variants in long non-coding RNAs are associated with blood lipid levels in the TOPMed Whole Genome Sequencing Study. medRxiv 2023:2023.06.28.23291966. [PMID: 37425772 PMCID: PMC10327287 DOI: 10.1101/2023.06.28.23291966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
Long non-coding RNAs (lncRNAs) are known to perform important regulatory functions. Large-scale whole genome sequencing (WGS) studies and new statistical methods for variant set tests now provide an opportunity to assess the associations between rare variants in lncRNA genes and complex traits across the genome. In this study, we used high-coverage WGS from 66,329 participants of diverse ancestries with blood lipid levels (LDL-C, HDL-C, TC, and TG) in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program to investigate the role of lncRNAs in lipid variability. We aggregated rare variants for 165,375 lncRNA genes based on their genomic locations and conducted rare variant aggregate association tests using the STAAR (variant-Set Test for Association using Annotation infoRmation) framework. We performed STAAR conditional analysis adjusting for common variants in known lipid GWAS loci and rare coding variants in nearby protein coding genes. Our analyses revealed 83 rare lncRNA variant sets significantly associated with blood lipid levels, all of which were located in known lipid GWAS loci (in a ±500 kb window of a Global Lipids Genetics Consortium index variant). Notably, 61 out of 83 signals (73%) were conditionally independent of common regulatory variations and rare protein coding variations at the same loci. We replicated 34 out of 61 (56%) conditionally independent associations using the independent UK Biobank WGS data. Our results expand the genetic architecture of blood lipids to rare variants in lncRNA, implicating new therapeutic opportunities.
Affiliation(s)
- Yuxuan Wang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Margaret Sunitha Selvaraj
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
- Center for Computational Biology & Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Jacob A. Holdcraft
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Donna K. Arnett
- Provost Office, University of South Carolina, Columbia, SC, USA
- Department of Epidemiology and Biostatistics, University of South Carolina Arnold School of Public Health, Columbia, SC, USA
- Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Donald W. Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Brian E. Cade
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Jenna C. Carlson
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- April P. Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Joanne E. Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Paul S. de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Susan K. Dutcher
- The McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Patrick T. Ellinor
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- James S. Floyd
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Myriam Fornage
- Center for Human Genetics, University of Texas Health at Houston, Houston, TX, USA
- Barry I. Freedman
- Department of Internal Medicine, Nephrology, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Richard A. Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
- Nancy Heard-Costa
- Framingham Heart Study, Framingham, MA, USA
- Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Bertha Hildalgo
- Department of Epidemiology, University of Alabama at Birmingham School of Public Health, Birmingham, AL, USA
- Lifang Hou
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Marguerite R. Irvin
- Department of Epidemiology, University of Alabama at Birmingham School of Public Health, Birmingham, AL, USA
- Roby Joehanes
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Robert C. Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Sharon LR. Kardia
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
- Tanika N. Kelly
- Department of Medicine, Division of Nephrology, University of Illinois Chicago, Chicago, IL, USA
- Ryan Kim
- Psomagen, Inc. (formerly Macrogen USA), Rockville, MD, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Brian G. Kral
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Daniel Levy
- Framingham Heart Study, Framingham, MA, USA
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Changwei Li
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
- Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Don Lloyd-Jone
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Ruth JF. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- NNF Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Michael C. Mahaney
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Lisa W. Martin
- George Washington University School of Medicine and Health Sciences, Washington, DC, USA
- Rasika A. Mathias
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ryan L. Minster
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Braxton D. Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- May E. Montasser
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Joanne M. Murabito
- Framingham Heart Study, Framingham, MA, USA
- Department of Medicine, Boston Medical Center, Boston University Chobanian and Avedisian School of Medicine, Boston, MA, USA
- Jeffrey R. O’Connell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Nicholette D. Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Michael H. Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bruce M. Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Dabeeru C. Rao
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO, USA
- Susan Redline
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Jennifer A. Smith
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
- Albert Smith
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Hemant K. Tiwari
- Department of Biostatistics, University of Alabama, Birmingham, AL, USA
- Michael Y. Tsai
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
- Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lisa R. Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Wei Zhao
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Xihong Lin
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
- Pradeep Natarajan
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Gina M. Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
30
Feofanova EV, Brown MR, Alkis T, Manuel AM, Li X, Tahir UA, Li Z, Mendez KM, Kelly RS, Qi Q, Chen H, Larson MG, Lemaitre RN, Morrison AC, Grieser C, Wong KE, Gerszten RE, Zhao Z, Lasky-Su J, Yu B. Whole-Genome Sequencing Analysis of Human Metabolome in Multi-Ethnic Populations. Nat Commun 2023; 14:3111. [PMID: 37253714 PMCID: PMC10229598 DOI: 10.1038/s41467-023-38800-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/01/2022] [Accepted: 05/16/2023] [Indexed: 06/01/2023] Open
Abstract
Circulating metabolite levels may reflect the state of the human organism in health and disease; however, the genetic architecture of metabolites is not fully understood. We have performed a whole-genome sequencing association analysis of both common and rare variants in up to 11,840 multi-ethnic participants from five studies with up to 1666 circulating metabolites. We have discovered 1985 novel variant-metabolite associations and validated 761 locus-metabolite associations reported previously. Seventy-nine novel variant-metabolite associations have been replicated, including three genetic loci located on the X chromosome, demonstrating its involvement in metabolic regulation. Gene-based analyses have provided further support for seven metabolite-replicated loci pairs and their biologically plausible genes. Among the novel replicated variant-metabolite pairs, follow-up analyses have revealed that 26 metabolites have colocalized with 21 tissues, seven metabolite-disease outcome associations are putatively causal, and seven metabolites might be regulated by plasma protein levels. Our results depict the genetic contribution to circulating metabolite levels, providing additional insights into human disease.
Affiliation(s)
- Elena V Feofanova
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, TX, USA
- Michael R Brown
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, TX, USA
- Taryn Alkis
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, TX, USA
- Astrid M Manuel
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Usman A Tahir
- Division of Cardiovascular Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
- Kevin M Mendez
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Retina Service, Massachusetts Eye and Ear, Harvard Medical School, 243 Charles Street, Boston, MA, USA
- Rachel S Kelly
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Qibin Qi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Han Chen
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Martin G Larson
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Rozenn N Lemaitre
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Alanna C Morrison
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, TX, USA
- Robert E Gerszten
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Zhongming Zhao
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jessica Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Bing Yu
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, TX, USA
31
Wang N, Yu B, Jun G, Qi Q, Durazo-Arvizu RA, Lindstrom S, Morrison AC, Kaplan RC, Boerwinkle E, Chen H. StocSum: stochastic summary statistics for whole genome sequencing studies. bioRxiv 2023:2023.04.06.535886. [PMID: 37066281 PMCID: PMC10104122 DOI: 10.1101/2023.04.06.535886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Genomic summary statistics, usually defined as single-variant test results from genome-wide association studies, have been widely used to advance the genetics field in a wide range of applications. Applications that involve multiple genetic variants also require their correlations or linkage disequilibrium (LD) information, often obtained from an external reference panel. In practice, it is usually difficult to find suitable external reference panels that represent the LD structure for underrepresented and admixed populations, or rare genetic variants from whole genome sequencing (WGS) studies, limiting the scope of applications for genomic summary statistics. Here we introduce StocSum, a novel reference-panel-free statistical framework for generating, managing, and analyzing stochastic summary statistics using random vectors. We develop various downstream applications using StocSum including single-variant tests, conditional association tests, gene-environment interaction tests, variant set tests, as well as meta-analysis and LD score regression tools. We demonstrate the accuracy and computational efficiency of StocSum using two cohorts from the Trans-Omics for Precision Medicine Program. StocSum will facilitate sharing and utilization of genomic summary statistics from WGS studies, especially for underrepresented and admixed populations.
Affiliation(s)
- Nannan Wang
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Bing Yu
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Goo Jun
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Qibin Qi
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Ramon A. Durazo-Arvizu
- The Saban Research Institute, Children’s Hospital Los Angeles, Los Angeles, California
- Department of Pediatrics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Sara Lindstrom
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology, School of Public Health, University of Washington, 3980 15th Ave NE, Seattle, WA, USA
- Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Robert C. Kaplan
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
32
Jurgens SJ, Pirruccello JP, Choi SH, Morrill VN, Chaffin M, Lubitz SA, Lunetta KL, Ellinor PT. Adjusting for common variant polygenic scores improves yield in rare variant association analyses. Nat Genet 2023; 55:544-548. [PMID: 36959364 PMCID: PMC11078202 DOI: 10.1038/s41588-023-01342-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Accepted: 02/22/2023] [Indexed: 03/25/2023]
Abstract
With the emergence of large-scale sequencing data, methods for improving power in rare variant association tests are needed. Here we show that adjusting for common variant polygenic scores improves yield in gene-based rare variant association tests across 65 quantitative traits in the UK Biobank (up to 20% increase at α = 2.6 × 10⁻⁶), without marked increases in false-positive rates or genomic inflation. Benefits were seen for various models, with the largest improvements seen for efficient sparse mixed-effects models. Our results illustrate how polygenic score adjustment can efficiently improve power in rare variant association discovery.
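One way to see why a polygenic score covariate can help is that it absorbs phenotypic variance that would otherwise sit in the residual, which shrinks the standard error of the rare-variant effect. The following is a minimal simulated sketch, not the authors' pipeline: the linear model, the carrier frequency, the effect sizes, and the helper names `residualize` and `burden_test` are hypothetical. The adjusted fit uses the Frisch-Waugh-Lovell identity (residualize both phenotype and burden on the score, then regress one on the other).

```python
import random

def residualize(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

def burden_test(phenotype, burden, pgs=None):
    """Return (effect estimate, standard error) for the burden term.

    With pgs=None the model is phenotype ~ 1 + burden; otherwise it is
    phenotype ~ 1 + burden + pgs, fitted by residualizing on pgs first.
    """
    n = len(phenotype)
    if pgs is None:
        my, mb = sum(phenotype) / n, sum(burden) / n
        y = [v - my for v in phenotype]
        b = [v - mb for v in burden]
        n_params = 2
    else:
        y = residualize(phenotype, pgs)
        b = residualize(burden, pgs)
        n_params = 3
    sbb = sum(v * v for v in b)
    beta = sum(u * v for u, v in zip(b, y)) / sbb
    rss = sum((u - beta * v) ** 2 for u, v in zip(y, b))
    se = (rss / (n - n_params) / sbb) ** 0.5
    return beta, se

# Simulated cohort: a rare burden carrier status plus a polygenic background.
random.seed(1)
n = 4000
pgs = [random.gauss(0.0, 1.0) for _ in range(n)]
burden = [1.0 if random.random() < 0.02 else 0.0 for _ in range(n)]
phenotype = [0.5 * b + 0.7 * p + random.gauss(0.0, 0.7) for b, p in zip(burden, pgs)]

beta_unadj, se_unadj = burden_test(phenotype, burden)
beta_adj, se_adj = burden_test(phenotype, burden, pgs)
print(f"unadjusted: beta={beta_unadj:.3f}, se={se_unadj:.3f}")
print(f"PGS-adjusted: beta={beta_adj:.3f}, se={se_adj:.3f}")
```

In this toy setting the score explains about half of the phenotypic variance, so the adjusted standard error is noticeably smaller; the gain in real data depends on how much variance the polygenic score actually captures.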
Affiliation(s)
- Sean J Jurgens
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Department of Experimental Cardiology, Heart Centre, Amsterdam UMC location University of Amsterdam, Amsterdam, the Netherlands
- Amsterdam Cardiovascular Sciences, Heart Failure & Arrhythmias, Amsterdam, the Netherlands
- James P Pirruccello
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Cardiology, University of California, San Francisco, San Francisco, CA, USA
- Seung Hoan Choi
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Valerie N Morrill
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mark Chaffin
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven A Lubitz
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, USA
- Kathryn L Lunetta
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- NHLBI and Boston University's Framingham Heart Study, Framingham, MA, USA
- Patrick T Ellinor
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, USA
33
Zhang J, Zhao H. eQTL Studies: from Bulk Tissues to Single Cells. arXiv 2023:arXiv:2302.11662v1. [PMID: 36866231 PMCID: PMC9980190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of certain genes, which can be either nearby or distant. The identification of eQTLs for different tissues, cell types, and contexts has led to a better understanding of the dynamic regulation of gene expression and of the implications of functional genes and variants for complex traits and diseases. Although most eQTL studies to date have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulation in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to enable the detection of cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of the current methods and future research opportunities.
Affiliation(s)
- Jingfei Zhang
- Information Systems and Operations Management, Emory University
- Hongyu Zhao
- Department of Biostatistics, Yale University
34
Agarwal A, Zhao F, Jiang Y, Chen L. TIVAN-indel: a computational framework for annotating and predicting non-coding regulatory small insertions and deletions. Bioinformatics 2023; 39:btad060. [PMID: 36707993 PMCID: PMC9900211 DOI: 10.1093/bioinformatics/btad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2022] [Revised: 01/20/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION: Small insertions and deletions (sindels) in the human genome have important implications for human disease. One important mechanism by which non-coding sindels (nc-sindels) affect human diseases and phenotypes is the regulation of gene expression. Nevertheless, current sequencing experiments may lack the statistical power and resolution to pinpoint functional sindels because of low minor allele frequencies or small effect sizes. As an alternative strategy, a supervised machine learning method can identify the otherwise masked functional sindels by predicting their regulatory potential directly. However, computational methods for annotating and predicting regulatory sindels, especially in non-coding regions, are underdeveloped. RESULTS: By leveraging labeled nc-sindels identified by cis-expression quantitative trait loci analyses across 44 tissues in Genotype-Tissue Expression (GTEx), together with a compilation of both generic functional annotations and large-scale epigenomic profiles, we develop TIssue-specific Variant Annotation for Non-coding indel (TIVAN-indel), a supervised computational framework for predicting non-coding regulatory sindels. We demonstrate that TIVAN-indel achieves the best prediction performance in both within-tissue prediction and cross-tissue prediction. As an independent evaluation, we train TIVAN-indel on the 'Whole Blood' tissue in GTEx and test the model using 15 immune cell types from an independent study named Database of Immune Cell Expression. Lastly, we perform an enrichment analysis for both true and predicted sindels in key regulatory regions such as chromatin interactions, open chromatin regions and histone modification sites, and find biologically meaningful enrichment patterns. AVAILABILITY AND IMPLEMENTATION: https://github.com/lichen-lab/TIVAN-indel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aman Agarwal
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Fengdi Zhao
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
- Yuchao Jiang
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27516, USA
- Li Chen
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
35
Li X, Quick C, Zhou H, Gaynor SM, Liu Y, Chen H, Selvaraj MS, Sun R, Dey R, Arnett DK, Bielak LF, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Correa A, Cupples LA, Curran JE, de Vries PS, Duggirala R, Freedman BI, Göring HHH, Guo X, Haessler J, Kalyani RR, Kooperberg C, Kral BG, Lange LA, Manichaikul A, Martin LW, McGarvey ST, Mitchell BD, Montasser ME, Morrison AC, Naseri T, O'Connell JR, Palmer ND, Peyser PA, Psaty BM, Raffield LM, Redline S, Reiner AP, Reupena MS, Rice KM, Rich SS, Sitlani CM, Smith JA, Taylor KD, Vasan RS, Willer CJ, Wilson JG, Yanek LR, Zhao W, Rotter JI, Natarajan P, Peloso GM, Li Z, Lin X. Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies. Nat Genet 2023; 55:154-164. [PMID: 36564505 PMCID: PMC10084891 DOI: 10.1038/s41588-022-01225-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/16/2021] [Accepted: 10/13/2022] [Indexed: 12/24/2022]
Abstract
Meta-analysis of whole genome sequencing/whole exome sequencing (WGS/WES) studies provides an attractive solution to the problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. Existing rare variant meta-analysis approaches are not scalable to biobank-scale WGS data. Here we present MetaSTAAR, a powerful and resource-efficient rare variant meta-analysis framework for large-scale WGS/WES studies. MetaSTAAR accounts for relatedness and population structure, can analyze both quantitative and dichotomous traits and boosts the power of rare variant tests by incorporating multiple variant functional annotations. Through meta-analysis of four lipid traits in 30,138 ancestrally diverse samples from 14 studies of the Trans Omics for Precision Medicine (TOPMed) Program, we show that MetaSTAAR performs rare variant meta-analysis at scale and produces results comparable to using pooled data. Additionally, we identified several conditionally significant rare variant associations with lipid traits. We further demonstrate that MetaSTAAR is scalable to biobank-scale cohorts through meta-analysis of TOPMed WGS data and UK Biobank WES data of ~200,000 samples.
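The summary-statistic pooling that underlies this kind of rare variant meta-analysis can be sketched as follows. This is a minimal illustration of a generic score-based burden test, not MetaSTAAR's actual implementation or API: the function name `meta_burden_test`, its arguments, and the use of plain Python lists are all assumptions made for the example. Each study is assumed to contribute a per-variant score vector and a score covariance matrix for the same ordered set of variants; these are summed across studies and collapsed with per-variant weights into a one-degree-of-freedom chi-square statistic.

```python
import math


def meta_burden_test(score_vectors, cov_matrices, weights):
    """Pool per-study score statistics and run a weighted burden test.

    score_vectors: list of per-study score vectors (one entry per variant).
    cov_matrices:  list of per-study score covariance matrices (m x m lists).
    weights:       per-variant weights (hypothetical; e.g. annotation-based).
    Returns (chi-square statistic with 1 df, p-value).
    """
    m = len(weights)
    pooled_u = [0.0] * m
    pooled_v = [[0.0] * m for _ in range(m)]

    # Under independence of studies, scores and covariances add across studies.
    for u_k, v_k in zip(score_vectors, cov_matrices):
        for i in range(m):
            pooled_u[i] += u_k[i]
            for j in range(m):
                pooled_v[i][j] += v_k[i][j]

    # Collapse the variant set into a single weighted burden score.
    burden_score = sum(w * u for w, u in zip(weights, pooled_u))
    burden_var = sum(
        weights[i] * pooled_v[i][j] * weights[j]
        for i in range(m)
        for j in range(m)
    )
    if burden_var <= 0:
        raise ValueError("pooled burden variance must be positive")

    stat = burden_score ** 2 / burden_var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Two toy studies, two variants, equal weights (illustrative numbers only).
    u = [[1.0, 2.0], [2.0, 1.0]]
    v = [[[2.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [0.0, 2.0]]]
    print(meta_burden_test(u, v, [1.0, 1.0]))
```

Because only score vectors and covariance matrices are shared, no individual-level genotypes need to leave a study, which is the general reason summary-statistic approaches can approximate a pooled analysis.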
Affiliation(s)
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Corbin Quick
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Sheila M Gaynor
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Yaowu Liu
- School of Statistics, Southwestern University of Finance and Economics, Chengdu, China
| | - Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Margaret Sunitha Selvaraj
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Ryan Sun
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Rounak Dey
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Donna K Arnett
- University of Kentucky, College of Public Health, Lexington, KY, USA
| | - Lawrence F Bielak
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Joshua C Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Donald W Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Jennifer A Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Brian E Cade
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - Adolfo Correa
- Jackson Heart Study, Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
| | - Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Paul S de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Ravindranath Duggirala
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Barry I Freedman
- Department of Internal Medicine, Nephrology, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Harald H H Göring
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Jeffrey Haessler
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Rita R Kalyani
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Brian G Kral
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Leslie A Lange
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Lisa W Martin
- Division of Cardiology, George Washington School of Medicine and Health Sciences, Washington, DC, USA
| | - Stephen T McGarvey
- Department of Epidemiology, International Health Institute, Department of Anthropology, Brown University, Providence, RI, USA
| | - Braxton D Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Geriatrics Research and Education Clinical Center, Baltimore VA Medical Center, Baltimore, MD, USA
| | - May E Montasser
- Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Take Naseri
- Ministry of Health, Government of Samoa, Apia, Samoa
| | - Jeffrey R O'Connell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Nicholette D Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Patricia A Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Bruce M Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Departments of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
| | - Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Alexander P Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Departments of Epidemiology, University of Washington, Seattle, WA, USA
| | | | - Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Colleen M Sitlani
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Ramachandran S Vasan
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
| | - Cristen J Willer
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - James G Wilson
- Division of Cardiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Lisa R Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Pradeep Natarajan
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
| |
|
36
|
Zhu W, Chen HH, Petty AS, Petty LE, Polikowsky HG, Gamazon ER, Below JE, Highland HM. IMMerge: merging imputation data at scale. Bioinformatics 2023; 39:btac750. [PMID: 36413071 PMCID: PMC9805583 DOI: 10.1093/bioinformatics/btac750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Revised: 11/04/2022] [Accepted: 11/21/2022] [Indexed: 11/23/2022] Open
Abstract
SUMMARY Genomic data are often processed in batches and analyzed together to save time. However, it is challenging to combine multiple large VCFs and properly handle imputation quality and missing variants due to the limitations of available tools. To address these concerns, we developed IMMerge, a Python-based tool that takes advantage of multiprocessing to reduce running time. For the first time in a publicly available tool, imputation quality scores are correctly combined with Fisher's z transformation. AVAILABILITY AND IMPLEMENTATION IMMerge is an open-source project under MIT license. Source code and user manual are available at https://github.com/belowlab/IMMerge.
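Combining imputation quality scores with Fisher's z transformation can be sketched as below. This is a hedged illustration of the general technique, not IMMerge's code: the function name `combine_rsq_fisher_z`, the choice to treat the square root of each per-batch quality score as a correlation, and the weights of sample size minus 3 (the usual inverse-variance weight for Fisher z values) are assumptions for this example.

```python
import math


def combine_rsq_fisher_z(rsq_values, sample_sizes):
    """Combine per-batch imputation quality (r-squared) for one variant.

    Each r-squared is converted to a correlation r, mapped to Fisher's z
    (atanh), averaged with weights n - 3 (an assumed weighting), and mapped
    back with tanh before squaring.
    """
    if len(rsq_values) != len(sample_sizes):
        raise ValueError("rsq_values and sample_sizes must have equal length")
    if not rsq_values:
        raise ValueError("at least one batch is required")

    eps = 1e-12  # keeps atanh finite when a batch reports r-squared of 1
    weighted_z_sum = 0.0
    weight_sum = 0.0
    for rsq, n in zip(rsq_values, sample_sizes):
        if not 0.0 <= rsq <= 1.0:
            raise ValueError("r-squared must lie in [0, 1]")
        if n <= 3:
            raise ValueError("each batch needs more than 3 samples")
        r = min(math.sqrt(rsq), 1.0 - eps)
        weight = n - 3
        weighted_z_sum += weight * math.atanh(r)
        weight_sum += weight

    combined_r = math.tanh(weighted_z_sum / weight_sum)
    return combined_r ** 2


if __name__ == "__main__":
    # Two batches of different size with different quality for one variant.
    print(combine_rsq_fisher_z([0.9, 0.5], [1000, 500]))
```

Averaging on the z scale rather than averaging the raw scores matters most when batches disagree, because the sampling distribution of a correlation is skewed near 1 and the transformation makes it approximately normal.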
Affiliation(s)
- Wanying Zhu
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Hung-Hsin Chen
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Alexander S Petty
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Lauren E Petty
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Hannah G Polikowsky
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Eric R Gamazon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Jennifer E Below
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Heather M Highland
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
| |
|