1
Li X, Chen H, Selvaraj MS, Van Buren E, Zhou H, Wang Y, Sun R, McCaw ZR, Yu Z, Jiang MZ, DiCorpo D, Gaynor SM, Dey R, Arnett DK, Benjamin EJ, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Carson AP, Carlson JC, Chami N, Chen YDI, Curran JE, de Vries PS, Fornage M, Franceschini N, Freedman BI, Gu C, Heard-Costa NL, He J, Hou L, Hung YJ, Irvin MR, Kaplan RC, Kardia SLR, Kelly TN, Konigsberg I, Kooperberg C, Kral BG, Li C, Li Y, Lin H, Liu CT, Loos RJF, Mahaney MC, Martin LW, Mathias RA, Mitchell BD, Montasser ME, Morrison AC, Naseri T, North KE, Palmer ND, Peyser PA, Psaty BM, Redline S, Reiner AP, Rich SS, Sitlani CM, Smith JA, Taylor KD, Tiwari HK, Vasan RS, Viali S, Wang Z, Wessel J, Yanek LR, Yu B, Dupuis J, Meigs JB, Auer PL, Raffield LM, Manning AK, Rice KM, Rotter JI, Peloso GM, Natarajan P, Li Z, Liu Z, Lin X. A statistical framework for multi-trait rare variant analysis in large-scale whole-genome sequencing studies. Nature Computational Science 2025. [PMID: 39920506 DOI: 10.1038/s43588-024-00764-8] [Received: 11/12/2023] [Accepted: 12/20/2024] [Indexed: 02/09/2025]
Abstract
Large-scale whole-genome sequencing (WGS) studies have improved our understanding of the contributions of coding and noncoding rare variants to complex human traits. Leveraging association effect sizes across multiple traits in WGS rare variant association analysis can improve statistical power over single-trait analysis, and also detect pleiotropic genes and regions. Existing multi-trait methods have limited ability to perform rare variant analysis of large-scale WGS data. We propose MultiSTAAR, a statistical framework and computationally scalable analytical pipeline for functionally informed multi-trait rare variant analysis in large-scale WGS studies. MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. We applied MultiSTAAR to jointly analyze three lipid traits in 61,838 multi-ethnic samples from the Trans-Omics for Precision Medicine (TOPMed) Program. We discovered and replicated new associations with lipid traits missed by single-trait analysis.
Grants
- R35-CA197449 U.S. Department of Health & Human Services | NIH | National Cancer Institute (NCI)
- U19-CA203654 U.S. Department of Health & Human Services | NIH | National Cancer Institute (NCI)
- U01-HG012064 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- U01-HG009088 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- R00HG012956-02 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- NHLBI TOPMed Fellowship 75N92021F00229 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U01-HL072524 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01-HL104135-04S1 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U01-HL054472 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U01-HL054473 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U01-HL054495 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U01-HL054509 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01-HL055673-18S1 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01-HL153805 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R03-HL154284 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- HL105756 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- 1R35-HL135818 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01-HL113338 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- HL046389 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U01-HL137162 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01-HL142711 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01-HL127564 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- HL151855 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- HHSN268201700001I NHLBI NIH HHS
- HHSN268201700002I NHLBI NIH HHS
- HHSN268201700003I NHLBI NIH HHS
- HHSN268201700005I NHLBI NIH HHS
- HHSN268201700004I NHLBI NIH HHS
- R01-MD012765 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- R01-DK117445 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- HHSN268201600018C NHLBI NIH HHS
- HHSN268201600001C NHLBI NIH HHS
- HHSN268201600002C NHLBI NIH HHS
- HHSN268201600003C NHLBI NIH HHS
- HHSN268201600004C NHLBI NIH HHS
- HHSN268201800001I U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- 75N92020D00001 NHLBI NIH HHS
- HHSN268201500003I NHLBI NIH HHS
- N01-HC-95159 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- 75N92020D00005 NHLBI NIH HHS
- N01-HC-95160 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- 75N92020D00002 NHLBI NIH HHS
- N01-HC-95161 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- 75N92020D00003 NHLBI NIH HHS
- N01-HC-95162 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- 75N92020D00006 NHLBI NIH HHS
- N01-HC-95163 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- 75N92020D00004 NHLBI NIH HHS
- N01-HC-95164 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- 75N92020D00007 NHLBI NIH HHS
- N01-HC-95165 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- N01-HC-95166 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- N01-HC-95167 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- N01-HC-95168 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- N01-HC-95169 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- UL1-TR-000040 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- UL1-TR-001079 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- UL1-TR-001420 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- UL1-TR001881 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- DK063491 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- R01-HL071051 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- R01-HL071205 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- R01-HL071250 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- R01-HL071251 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- R01-HL071258 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- R01-HL071259 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- UL1-RR033176 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- 1R01AG086379-01 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
- DK078616 U.S. Department of Health & Human Services | National Institutes of Health (NIH)
Affiliation(s)
- Xihao Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Han Chen
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Margaret Sunitha Selvaraj
  - Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Eric Van Buren
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Hufeng Zhou
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yuxuan Wang
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Ryan Sun
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zachary R McCaw
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Zhi Yu
  - Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Clinical and Translational Epidemiology Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Min-Zhi Jiang
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Biostatistics, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Daniel DiCorpo
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Sheila M Gaynor
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Rounak Dey
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Donna K Arnett
  - Provost Office, University of South Carolina, Columbia, SC, USA
- Emelia J Benjamin
  - Section of Cardiovascular Medicine, Boston Medical Center, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
  - Framingham Heart Study, Framingham, MA, USA
- Joshua C Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- John Blangero
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Eric Boerwinkle
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Donald W Bowden
  - Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Jennifer A Brody
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Brian E Cade
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- April P Carson
  - Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Jenna C Carlson
  - Department of Human Genetics and Department of Biostatistics and Health Data Science, University of Pittsburgh, Pittsburgh, PA, USA
- Nathalie Chami
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yii-Der Ida Chen
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Joanne E Curran
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Paul S de Vries
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Myriam Fornage
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Brown Foundation Institute of Molecular Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Nora Franceschini
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Barry I Freedman
  - Department of Internal Medicine, Nephrology, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Charles Gu
  - Division of Biology & Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Nancy L Heard-Costa
  - Framingham Heart Study, Framingham, MA, USA
  - Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Jiang He
  - Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Translational Science Institute, Tulane University, New Orleans, LA, USA
- Lifang Hou
  - Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Yi-Jen Hung
  - Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Marguerite R Irvin
  - Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Robert C Kaplan
  - Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Sharon L R Kardia
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Tanika N Kelly
  - Department of Medicine, Division of Nephrology, University of Illinois Chicago, Chicago, IL, USA
- Iain Konigsberg
  - Department of Biomedical Informatics, University of Colorado, Aurora, CO, USA
- Charles Kooperberg
  - Department of Medicine, Division of Nephrology, University of Illinois Chicago, Chicago, IL, USA
- Brian G Kral
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Changwei Li
  - Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Translational Science Institute, Tulane University, New Orleans, LA, USA
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Honghuang Lin
  - Department of Medicine, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Ching-Ti Liu
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Ruth J F Loos
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Michael C Mahaney
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Lisa W Martin
  - School of Medicine and Health Sciences, George Washington University, Washington, DC, USA
- Rasika A Mathias
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Braxton D Mitchell
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- May E Montasser
  - Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Alanna C Morrison
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Take Naseri
  - Naseri & Associates Public Health Consultancy Firm and Family Health Clinic, Apia, Samoa
  - Department of Epidemiology, Brown University, Providence, RI, USA
- Kari E North
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Nicholette D Palmer
  - Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Patricia A Peyser
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Bruce M Psaty
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
  - Departments of Epidemiology, University of Washington, Seattle, WA, USA
  - Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Susan Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Alexander P Reiner
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
  - Departments of Epidemiology, University of Washington, Seattle, WA, USA
- Stephen S Rich
  - Department of Genome Sciences, University of Virginia, Charlottesville, VA, USA
- Colleen M Sitlani
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Jennifer A Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Kent D Taylor
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Hemant K Tiwari
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Ramachandran S Vasan
  - Framingham Heart Study, Framingham, MA, USA
  - Department of Quantitative and Qualitative Health Sciences, UT Health San Antonio School of Public Health, San Antonio, TX, USA
- Satupa'itea Viali
  - School of Medicine, National University of Samoa, Apia, Samoa
  - Department of Chronic Disease Epidemiology, Yale University School of Public Health, New Haven, CT, USA
  - Oceania University of Medicine, Apia, Samoa
- Zhe Wang
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jennifer Wessel
  - Department of Epidemiology, Fairbanks School of Public Health, Indiana University, Indianapolis, IN, USA
  - Diabetes Translational Research Center, Indiana University, Indianapolis, IN, USA
- Lisa R Yanek
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Bing Yu
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Josée Dupuis
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- James B Meigs
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA
- Paul L Auer
  - Division of Biostatistics, Data Science Institute, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, USA
- Laura M Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alisa K Manning
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Metabolism Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA, USA
- Kenneth M Rice
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Jerome I Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Gina M Peloso
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Pradeep Natarajan
  - Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Zilin Li
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zhonghua Liu
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Xihong Lin
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Statistics, Harvard University, Cambridge, MA, USA
2
Xu H, Ma Y, Xu LL, Li Y, Liu Y, Li Y, Zhou XJ, Zhou W, Lee S, Zhang P, Yue W, Bi W. SPAGRM: effectively controlling for sample relatedness in large-scale genome-wide association studies of longitudinal traits. Nat Commun 2025; 16:1413. [PMID: 39915470 DOI: 10.1038/s41467-025-56669-1] [Received: 05/10/2024] [Accepted: 01/27/2025] [Indexed: 02/09/2025]
Abstract
Sample relatedness is a major confounder in genome-wide association studies (GWAS), potentially leading to inflated type I error rates if not appropriately controlled. A common strategy is to incorporate a random effect related to genetic relatedness matrix (GRM) into regression models. However, this approach is challenging for large-scale GWAS of complex traits, such as longitudinal traits. Here we propose a scalable and accurate analysis framework, SPAGRM, which controls for sample relatedness via a precise approximation of the joint distribution of genotypes. SPAGRM can utilize GRM-free models and thus is applicable to various trait types and statistical methods, including linear mixed models and generalized estimation equations for longitudinal traits. A hybrid strategy incorporating saddlepoint approximation greatly increases the accuracy to analyze low-frequency and rare genetic variants, especially in unbalanced phenotypic distributions. We also introduce SPAGRM(CCT) to aggregate the results following different models via Cauchy combination test. Extensive simulations and real data analyses demonstrated that SPAGRM maintains well-controlled type I error rates and SPAGRM(CCT) can serve as a broadly effective method. Applying SPAGRM to 79 longitudinal traits extracted from UK Biobank primary care data, we identified 7,463 genetic loci, making a pioneering attempt to conduct GWAS for these traits as longitudinal traits.
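As a rough illustration of the Cauchy combination step mentioned in the abstract, the sketch below merges p-values from several models into a single p-value. It is a generic, minimal version of the test and not SPAGRM's implementation; the function name `cauchy_combination` and the equal default weights are assumptions made for this example.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test (generic sketch).

    Each p-value is mapped to a standard Cauchy variate, the variates are
    averaged with non-negative weights, and the weighted average is mapped
    back to a p-value through the Cauchy survival function.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        # Assumption for this example: equal weights for every model.
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    total = sum(weights)
    # Test statistic: weighted mean of tan((0.5 - p) * pi).
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Survival function of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values for one variant under three different models.
    print(cauchy_combination([0.01, 0.20, 0.50]))
```

With a single input the function returns that p-value unchanged, which is a quick sanity check on the transform. Very small p-values (below roughly 1e-15) would need a tail approximation that this sketch leaves out.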
Affiliation(s)
- He Xu
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
- Yuzhuo Ma
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
- Lin-Lin Xu
  - Renal Division, Peking University First Hospital; Peking University Institute of Nephrology, Beijing, China
- Yin Li
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
  - Key Laboratory for Neuroscience, Ministry of Education/National Health and Family Planning Commission, Peking University, Beijing, China
- Yufei Liu
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
- Ying Li
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
- Xu-Jie Zhou
  - Renal Division, Peking University First Hospital; Peking University Institute of Nephrology, Beijing, China
- Wei Zhou
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Seunggeun Lee
  - Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea
- Peipei Zhang
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
  - Key Laboratory for Neuroscience, Ministry of Education/National Health and Family Planning Commission, Peking University, Beijing, China
- Weihua Yue
  - Peking University Sixth Hospital, Peking University Institute of Mental Health, NHC Key Laboratory of Mental Health (Peking University), National Clinical Research Center for Mental Disorders (Peking University Sixth Hospital), Beijing, 100191, China
  - PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, 100871, China
  - Chinese Institute for Brain Research, Beijing, 102206, China
- Wenjian Bi
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
  - Center for Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
  - Medicine Innovation Center for Fundamental Research on Major Immunology-related Diseases, Peking University, Beijing, China
  - Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
3
Stoneman HR, Price AM, Trout NS, Lamont R, Tifour S, Pozdeyev N, Crooks K, Lin M, Rafaels N, Gignoux CR, Marker KM, Hendricks AE. Characterizing substructure via mixture modeling in large-scale genetic summary statistics. Am J Hum Genet 2025; 112:235-253. [PMID: 39824191 DOI: 10.1016/j.ajhg.2024.12.007] [Received: 06/12/2024] [Revised: 12/09/2024] [Accepted: 12/09/2024] [Indexed: 01/20/2025]
Abstract
Genetic summary data are broadly accessible and highly useful, including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into summary data, such as allele frequencies, masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted-for substructure limits summary data usability, especially for understudied or admixed populations. There is a need for methods to enable the harmonization of summary data where the underlying substructure is matched between datasets. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to enable the harmonization of genetic summary data by estimating and adjusting for substructure. In extensive simulations and application to public data, we show that Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and scans for potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse, publicly available summary data, resulting in improved and more equitable research.
Affiliation(s)
- Hayley R Stoneman
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Adelle M Price
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Nikole Scribner Trout
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Riley Lamont
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Souha Tifour
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Nikita Pozdeyev
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kristy Crooks
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Department of Pathology, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Meng Lin
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Nicholas Rafaels
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christopher R Gignoux
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Katie M Marker
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Audrey E Hendricks
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
|
4
|
Wang H, Li X, Li T, Li Z, Sham PC, Zhang YD. MAAT: a new nonparametric Bayesian framework for incorporating multiple functional annotations in transcriptome-wide association studies. Genome Biol 2025; 26:21. [PMID: 39905509 DOI: 10.1186/s13059-025-03485-x] [Received: 01/16/2024] [Accepted: 01/27/2025] [Indexed: 02/06/2025] Open
Abstract
Transcriptome-wide association study (TWAS) has emerged as a powerful tool for translating the myriad variations identified by genome-wide association studies (GWAS) into regulated genes in the post-GWAS era. While integrating annotation information has been shown to enhance power, current annotation-assisted TWAS tools predominantly focus on epigenomic annotations. When more annotations are included, the assumption of a positive correlation between annotation scores and SNPs' effect sizes, as adopted by current methods, often falls short. Here, we propose MAAT, which expands the horizons of existing TWAS studies by providing a new model that incorporates multiple annotations into TWAS and a new metric that indicates the most important annotation.
Affiliation(s)
- Han Wang
  - College of Science, China Agricultural University, Beijing, China
- Xiang Li
  - Department of Statistics and Actuarial Science, School of Computing and Data Science, The University of Hong Kong, Hong Kong SAR, China
- Teng Li
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zhe Li
  - 4+4 Medical Doctor Program, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Pak Chung Sham
  - Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Yan Dora Zhang
  - Department of Statistics and Actuarial Science, School of Computing and Data Science, The University of Hong Kong, Hong Kong SAR, China
|
5
|
Shang L, Wu P, Zhou X. Statistical identification of cell type-specific spatially variable genes in spatial transcriptomics. Nat Commun 2025; 16:1059. [PMID: 39865128 PMCID: PMC11770176 DOI: 10.1038/s41467-025-56280-4] [Received: 04/04/2024] [Accepted: 01/06/2025] [Indexed: 01/28/2025] Open
Abstract
An essential task in spatial transcriptomics is identifying spatially variable genes (SVGs). Here, we present Celina, a statistical method for systematically detecting cell type-specific SVGs (ct-SVGs), a subset of SVGs exhibiting distinct spatial expression patterns within specific cell types. Celina utilizes a spatially varying coefficient model to accurately capture each gene's spatial expression pattern in relation to the distribution of cell types across tissue locations, ensuring effective type I error control and high power. Celina proves powerful compared to existing methods in single-cell resolution spatial transcriptomics and stands as the only effective solution for spot-resolution spatial transcriptomics. Applied to five real datasets, Celina uncovers ct-SVGs associated with tumor progression and patient survival in lung cancer, identifies metagenes with unique spatial patterns linked to cell proliferation and immune response in kidney cancer, and detects genes preferentially expressed near amyloid-β plaques in an Alzheimer's model.
Affiliation(s)
- Lulu Shang
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peijun Wu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
|
6
|
Herbst E, Mandel-Gutfreund Y, Yakhini Z, Biran H. Inferring single-cell and spatial microRNA activity from transcriptomics data. Commun Biol 2025; 8:87. [PMID: 39827321 PMCID: PMC11743151 DOI: 10.1038/s42003-025-07454-9] [Received: 03/25/2024] [Accepted: 01/02/2025] [Indexed: 01/22/2025] Open
Abstract
The activity of miRNA varies across different cell populations and systems, as part of the mechanisms that distinguish cell types and roles in living organisms and in human health and disease. Typically, miRNA regulation drives changes in the composition and levels of protein-coding RNA and of lncRNA, with targets being down-regulated when miRNAs are active. The term "miRNA activity" is used to refer to this transcriptional effect of miRNAs. This study introduces miTEA-HiRes, a method designed to facilitate the evaluation of miRNA activity at high resolution. The method applies to single-cell transcriptomics, type-specific single-cell populations, and spatial transcriptomics data. By comparing different conditions, differential miRNA activity is inferred. For instance, miTEA-HiRes analysis of peripheral blood mononuclear cells comparing Multiple Sclerosis patients to control groups revealed differential activity of miR-20a-5p and others, consistent with the literature on miRNA underexpression in Multiple Sclerosis. We also show miR-519a-3p differential activity in specific cell populations.
Affiliation(s)
- Efrat Herbst
  - Arazi School of Computer Science, Reichman University, Herzliya, Israel
- Yael Mandel-Gutfreund
  - Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel
  - Faculty of Biology, Technion - Israel Institute of Technology, Haifa, Israel
- Zohar Yakhini
  - Arazi School of Computer Science, Reichman University, Herzliya, Israel
  - Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel
- Hadas Biran
  - Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel
|
7
|
Liu X, Li YJ, Fan Q. Zim4rv: an R package to modeling zero-inflated count phenotype on regional-based rare variants. BMC Bioinformatics 2025; 26:18. [PMID: 39819419 PMCID: PMC11740424 DOI: 10.1186/s12859-024-06029-5] [Received: 09/08/2024] [Accepted: 12/27/2024] [Indexed: 01/19/2025] Open
Abstract
BACKGROUND With the advance of next-generation sequencing, various gene-based rare variant association tests have been developed, particularly for binary and continuous phenotypes. In contrast, fewer methods are available for traits not following binomial or normal distributions. To address this, we previously proposed a set of burden- and kernel-based rare variant tests for count data following zero-inflated Poisson (ZIP) distributions, referred to as ZIP-b and ZIP-k tests. We sought to extend the methods to accommodate negative binomial distribution and implemented these tests in a new R package. RESULTS We introduce ZIM4rv, an R package designed to analyze the association of rare variants with zero-inflated counts outcomes. Our package offers two novel models developed by our team: our previously proposed ZIP-b and ZIP-k tests, and the newly derived Negative Binomial Burden and Kernel Test (ZINB-b, ZINB-k). Additionally, we include an ad-hoc two-stage analysis, testing zero and non-zero as a binary outcome and non-zero as a continuous outcome, respectively. To showcase the utility of our platform, we applied this program to analyze neuritic plaque count data from the ROSMAP cohort. CONCLUSION The R package ZIM4rv presents an integrated workflow for conducting association tests on a set of rare variants with zero-inflated counts data.
Affiliation(s)
- Xiaomin Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Yi-Ju Li
  - Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina, USA
  - Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, North Carolina, USA
- Qiao Fan
  - Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
|
8
|
Yu X, Zhang L, Srinivasan A, Xie MG, Xue L. A unified combination framework for dependent tests with applications to microbiome association studies. Biometrics 2025; 81:ujaf001. [PMID: 39887051 PMCID: PMC11783248 DOI: 10.1093/biomtc/ujaf001] [Received: 03/05/2024] [Revised: 10/30/2024] [Accepted: 01/10/2025] [Indexed: 02/01/2025]
Abstract
We introduce a novel meta-analysis framework to combine dependent tests under a general setting, and utilize it to synthesize various microbiome association tests that are calculated from the same dataset. Our development builds upon the classical meta-analysis methods of aggregating P-values and also a more recent general method of combining confidence distributions, but makes generalizations to handle dependent tests. The proposed framework ensures rigorous statistical guarantees, and we provide a comprehensive study and compare it with various existing dependent combination methods. Notably, we demonstrate that the widely used Cauchy combination method for dependent tests, referred to as the vanilla Cauchy combination in this article, can be viewed as a special case within our framework. Moreover, the proposed framework provides a way to address the problem when the distributional assumptions underlying the vanilla Cauchy combination are violated. Our numerical results demonstrate that ignoring the dependence among the to-be-combined components may lead to a severe size distortion phenomenon. Compared to the existing P-value combination methods, including the vanilla Cauchy combination method and other methods, the proposed combination framework is flexible and can be adapted to handle the dependence accurately and utilizes the information efficiently to construct tests with accurate size and enhanced power. The development is applied to the microbiome association studies, where we aggregate information from multiple existing tests using the same dataset. The combined tests harness the strengths of each individual test across a wide range of alternative spaces, enabling more efficient and meaningful discoveries of vital microbiome associations.
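For readers unfamiliar with the vanilla Cauchy combination named in this abstract, the following is a minimal sketch of the standard formula only, not of the authors' proposed framework. Each P-value is mapped to a standard Cauchy variate, the variates are averaged with non-negative weights that sum to one, and the combined P-value is read from the Cauchy tail. The function name `cauchy_combination` and the equal default weights are assumptions made for illustration.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the standard (vanilla) Cauchy combination formula.

    Hypothetical helper for illustration; equal weights are assumed by default.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative, one per p-value")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    # tan((0.5 - p) * pi) is standard Cauchy when p is uniform on (0, 1).
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, p_values))

    # Upper-tail probability of a standard Cauchy variate at `statistic`.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    print(cauchy_combination([0.01, 0.2, 0.7]))
```

The closed-form tail approximation is what makes this combination cheap to compute; the abstract's point is that its accuracy depends on distributional assumptions that may not hold for every set of dependent tests.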
Affiliation(s)
- Xiufan Yu
  - Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
- Linjun Zhang
  - Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA
- Min-ge Xie
  - Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA
- Lingzhou Xue
  - Department of Statistics, Penn State University, University Park, PA 16802, USA
|
9
|
King A, Wu C. Integrative Multi-Omics Approach for Improving Causal Gene Identification. Genet Epidemiol 2025; 49:e22601. [PMID: 39444114 DOI: 10.1002/gepi.22601] [Received: 04/14/2024] [Revised: 10/01/2024] [Accepted: 10/04/2024] [Indexed: 10/25/2024]
Abstract
Transcriptome-wide association studies (TWAS) have been widely used to identify thousands of likely causal genes for diseases and complex traits using predicted expression models. However, most existing TWAS methods rely on gene expression alone and overlook other regulatory mechanisms of gene expression, including DNA methylation and splicing, that contribute to the genetic basis of these complex traits and diseases. Here we introduce a multi-omics method that integrates gene expression, DNA methylation, and splicing data to improve the identification of genes associated with traits of interest. Through simulations and by analyzing genome-wide association study (GWAS) summary statistics for 24 complex traits, we show that our integrated method, which leverages these complementary omics biomarkers, achieves higher statistical power and improves the accuracy of likely causal gene identification in blood tissues over individual omics methods. Finally, we apply our integrated model to a lung cancer GWAS data set, demonstrating the integrated model's improved identification of prioritized genes for lung cancer risk.
Affiliation(s)
- Austin King
  - Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Chong Wu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
|
10
|
Samorodnitsky S, Campbell K, Little A, Ling W, Zhao N, Chen YC, Wu MC. Detecting Clinically Relevant Topological Structures in Multiplexed Spatial Proteomics Imaging Using TopKAT. bioRxiv 2024:2024.12.18.628976. [PMID: 39764056 PMCID: PMC11702633 DOI: 10.1101/2024.12.18.628976] [Indexed: 01/16/2025]
Abstract
Novel multiplexed spatial proteomics imaging platforms expose the spatial architecture of cells in the tumor microenvironment (TME). The diverse cell population in the TME, including its spatial context, has been shown to have important clinical implications, correlating with disease prognosis and treatment response. The accelerating implementation of spatial proteomic technologies motivates new statistical models to test if cell-level images associate with patient-level endpoints. Few existing methods can robustly characterize the geometry of the spatial arrangement of cells and also yield both a valid and powerful test for association with patient-level outcomes. We propose a topology-based approach that combines persistent homology with kernel testing to determine if topological structures created by cells predict continuous, binary, or survival clinical endpoints. We term our method TopKAT (Topological Kernel Association Test) and show that it can be more powerful than statistical tests grounded in the spatial point process model, particularly when cells arise along the boundary of a ring. We demonstrate the properties of TopKAT through simulation studies and apply it to two studies of triple negative breast cancer where we show that TopKAT recovers clinically relevant topological structures in the spatial distribution of immune and tumor cells.
Affiliation(s)
- Sarah Samorodnitsky
  - Public Health Sciences Division, Fred Hutchinson Cancer Center
  - SWOG Statistics and Data Management Center
- Katie Campbell
  - Medicine, Division of Hematology/Oncology, University of California Los Angeles
- Amarise Little
  - Public Health Sciences Division, Fred Hutchinson Cancer Center
  - SWOG Statistics and Data Management Center
- Wodan Ling
  - Population Health Sciences, Weill Cornell Medical College
- Ni Zhao
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University
- Yen-Chi Chen
  - Department of Statistics, University of Washington
- Michael C. Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Center
  - SWOG Statistics and Data Management Center
|
11
|
Shao M, Chen K, Zhang S, Tian M, Shen Y, Cao C, Gu N. Multiome-wide Association Studies: Novel Approaches for Understanding Diseases. Genomics Proteomics Bioinformatics 2024; 22:qzae077. [PMID: 39471467 PMCID: PMC11630051 DOI: 10.1093/gpbjnl/qzae077] [Received: 08/16/2023] [Revised: 10/06/2024] [Accepted: 10/23/2024] [Indexed: 11/01/2024]
Abstract
The rapid development of multiome (transcriptome, proteome, cistrome, imaging, and regulome)-wide association study methods has opened new avenues for biologists to understand the susceptibility genes underlying complex diseases. Thorough comparisons of these methods are essential for selecting the most appropriate tool for a given research objective. This review provides a detailed categorization and summary of the statistical models, use cases, and advantages of recent multiome-wide association studies. In addition, to illustrate gene-disease association studies based on transcriptome-wide association study (TWAS), we collected 478 disease entries across 22 categories from 235 manually reviewed publications. Our analysis reveals that mental disorders are the diseases most frequently studied by TWAS, indicating its potential to deepen our understanding of the genetic architecture of complex diseases. In summary, this review underscores the importance of multiome-wide association studies in elucidating complex diseases and highlights the significance of selecting the appropriate method for each study.
Affiliation(s)
- Mengting Shao
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Kaiyang Chen
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Shuting Zhang
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Min Tian
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Yan Shen
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Chen Cao
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Ning Gu
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
  - Nanjing Key Laboratory for Cardiovascular Information and Health Engineering Medicine, Institute of Clinical Medicine, Nanjing Drum Tower Hospital, Medical School, Nanjing University, Nanjing 210093, China
|
12
|
Zhang Y, Schluter J, Zhang L, Cao X, Jenq RR, Feng H, Haines J, Zhang L. Review and revamp of compositional data transformation: A new framework combining proportion conversion and contrast transformation. Comput Struct Biotechnol J 2024; 23:4088-4107. [PMID: 39624165 PMCID: PMC11609487 DOI: 10.1016/j.csbj.2024.11.003] [Received: 08/01/2024] [Revised: 11/01/2024] [Accepted: 11/02/2024] [Indexed: 01/03/2025] Open
Abstract
Due to the development of next-generation sequencing technology and an increased appreciation of the role of microbes in modulating host immunity and their potential as therapeutic agents, the human microbiome has emerged as a key area of interest in various biological investigations of human health and disease. However, microbiome data present a number of statistical challenges not addressed by existing methods, such as the varying sequencing depth, the compositionality, and zero inflation. Solutions like scaling and transformation methods help to mitigate heterogeneity and release constraints, but often introduce biases and yield inconsistent results on the same data. To address these issues, we conduct a systematic review of compositional data transformation, with a particular focus on the connection and distinction of existing techniques. Additionally, we create a new framework that enables the development of new transformations by combining proportion conversion with contrast transformations. This framework includes well-known methods such as Additive Log Ratio (ALR) and Centered Log Ratio (CLR) as special cases. Using this framework, we develop two novel transformations, Centered Arcsine Contrast (CAC) and Additive Arcsine Contrast (AAC), which show enhanced performance in scenarios with high zero-inflation. Moreover, our findings suggest that ALR and CLR transformations are more effective when zero values are less prevalent. This comprehensive review and the innovative framework provide microbiome researchers with a significant direction to enhance data transformation procedures and improve analytical outcomes.
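The two well-known transformations this abstract names, ALR and CLR, can be sketched in a few lines. This is a minimal illustration of their textbook definitions, assuming counts are first converted to proportions with a pseudocount so that logarithms are defined; the pseudocount value, the helper names `closure`, `clr` and `alr`, and the choice of the last component as the ALR reference are all assumptions. The paper's CAC and AAC transformations are not reproduced here.

```python
import math


def closure(counts, pseudocount=0.5):
    """Convert counts to proportions; the pseudocount (an assumption) avoids log(0)."""
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]


def clr(proportions):
    """Centered log ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(proportions, reference=-1):
    """Additive log ratio: log of each part relative to one reference part."""
    ref_index = reference % len(proportions)
    ref_log = math.log(proportions[ref_index])
    return [math.log(p) - ref_log
            for i, p in enumerate(proportions) if i != ref_index]


if __name__ == "__main__":
    props = closure([0, 3, 7])
    print(clr(props))  # D values that sum to zero
    print(alr(props))  # D - 1 values relative to the last component
```

Both transformations rely on logarithms, which is why zeros must be handled before they are applied; this is the sensitivity to zero inflation that the abstract discusses.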
Affiliation(s)
- Yiqian Zhang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, 2109 Adelbert Rd, Cleveland, 44106, OH, USA
  - Department of Statistics, University of Illinois Urbana-Champaign, 605 E. Springfield Ave., Champaign, 61820, IL, USA
- Jonas Schluter
  - Institute for Systems Genetics, Department of Microbiology, New York University Grossman School of Medicine, 435 East 30th Street, New York, 10016, NY, USA
- Lijun Zhang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, 2109 Adelbert Rd, Cleveland, 44106, OH, USA
- Xuan Cao
  - Division of Statistics and Data Science, Department of Mathematical Sciences, University of Cincinnati, 2815 Commons Way, Cincinnati, 45219, OH, USA
- Robert R. Jenq
  - Department of Hematology & Hematopoietic Cell Transplantation, City of Hope, 1500 East Duarte Road, Duarte, 91010, CA, USA
- Hao Feng
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, 2109 Adelbert Rd, Cleveland, 44106, OH, USA
- Jonathan Haines
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, 2109 Adelbert Rd, Cleveland, 44106, OH, USA
- Liangliang Zhang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, 2109 Adelbert Rd, Cleveland, 44106, OH, USA
  - Case Comprehensive Cancer Center, 2103 Cornell Road, Cleveland, 44106, OH, USA
|
13
|
Kang M, Farrell JJ, Zhu C, Park H, Kang S, Seo EH, Choi KY, Jun GR, Won S, Gim J, Lee KH, Farrer LA. Whole-genome sequencing study in Koreans identifies novel loci for Alzheimer's disease. Alzheimers Dement 2024; 20:8246-8262. [PMID: 39428694 DOI: 10.1002/alz.14128] [Received: 01/31/2024] [Revised: 06/06/2024] [Accepted: 06/18/2024] [Indexed: 10/22/2024]
Abstract
INTRODUCTION The genetic basis of Alzheimer's disease (AD) in Koreans is poorly understood. METHODS We performed an AD genome-wide association study using whole-genome sequence data from 3540 Koreans (1583 AD cases, 1957 controls) and single-nucleotide polymorphism array data from 2978 Japanese (1336 AD cases, 1642 controls). Significant findings were evaluated by pathway enrichment and differential gene expression analysis in brain tissue from controls and AD cases with and without dementia prior to death. RESULTS We identified genome-wide significant associations with APOE in the total sample and ROCK2 (rs76484417, p = 2.71×10⁻⁸) among APOE ε4 non-carriers. A study-wide significant association was found with aggregated rare variants in MICALL1 (MICAL like 1) (p = 9.04×10⁻⁷). Several novel AD-associated genes, including ROCK2 and MICALL1, were differentially expressed in AD cases compared to controls (p < 3.33×10⁻³). ROCK2 was also differentially expressed between AD cases with and without dementia (p = 1.34×10⁻⁴). DISCUSSION Our results provide insight into genetic mechanisms leading to AD and cognitive resilience in East Asians. HIGHLIGHTS Novel genome-wide significant associations for AD identified with ROCK2 and MICALL1. ROCK2 and MICALL1 are differentially expressed between AD cases and controls in the brain. This is the largest whole-genome-sequence study of AD in an East Asian population.
Affiliation(s)
- Moonil Kang
  - Department of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
- John J Farrell
  - Department of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
- Congcong Zhu
  - Department of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
- Hyeonseul Park
  - Department of Integrative Biological Sciences, Chosun University, Gwangju, Republic of Korea
- Sarang Kang
  - Gwangju Alzheimer's and Related Dementia (GARD) Cohort Research Center, Chosun University, Dong-gu, Gwangju, Republic of Korea
- Eun Hyun Seo
  - Gwangju Alzheimer's and Related Dementia (GARD) Cohort Research Center, Chosun University, Dong-gu, Gwangju, Republic of Korea
  - Premedical Science, College of Medicine, Chosun University, Dong-gu, Gwangju, Republic of Korea
- Kyu Yeong Choi
  - Gwangju Alzheimer's and Related Dementia (GARD) Cohort Research Center, Chosun University, Dong-gu, Gwangju, Republic of Korea
  - Kolab Inc., Dong-gu, Gwangju, Republic of Korea
- Gyungah R Jun
  - Department of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
  - Department of Ophthalmology, Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
  - Alzheimer's Disease Research Center, Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
- Sungho Won
  - Institute of Health and Environment, Seoul National University, Gwanak-gu, Seoul, Republic of Korea
  - Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, Gwanak-gu, Seoul, Republic of Korea
  - RexSoft Corps, Gwanak-gu, Seoul, Republic of Korea
- Jungsoo Gim
  - Department of Integrative Biological Sciences, Chosun University, Gwangju, Republic of Korea
  - Gwangju Alzheimer's and Related Dementia (GARD) Cohort Research Center, Chosun University, Dong-gu, Gwangju, Republic of Korea
  - Department of Biomedical Science, Chosun University, Dong-gu, Gwangju, Republic of Korea
  - Well-ageing Medicare Institute, Chosun University, Dong-gu, Gwangju, Republic of Korea
- Kun Ho Lee
  - Department of Integrative Biological Sciences, Chosun University, Gwangju, Republic of Korea
  - Gwangju Alzheimer's and Related Dementia (GARD) Cohort Research Center, Chosun University, Dong-gu, Gwangju, Republic of Korea
  - Department of Biomedical Science, Chosun University, Dong-gu, Gwangju, Republic of Korea
  - Korea Brain Research Institute, Dong-gu, Daegu, Republic of Korea
- Lindsay A Farrer
  - Department of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
  - Department of Ophthalmology, Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
  - Alzheimer's Disease Research Center, Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
  - Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts, USA
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
|
14
|
McIlwain SJ, Hoefges A, Erbe AK, Sondel PM, Ong IM. Ranking antibody binding epitopes and proteins across samples from whole proteome tiled linear peptides. Bioinformatics 2024; 40:btae637. [PMID: 39499154 PMCID: PMC11631460 DOI: 10.1093/bioinformatics/btae637] [Received: 09/26/2023] [Revised: 09/17/2024] [Accepted: 11/01/2024] [Indexed: 11/07/2024] Open
Abstract
INTRODUCTION Ultradense peptide binding arrays that can probe millions of linear peptides comprising the entire proteomes of human or mouse, or hundreds of thousands of microbes, are powerful tools for studying the antibody repertoire in serum samples to understand adaptive immune responses. MOTIVATION There are few tools for exploring high-dimensional, significant and reproducible antibody targets for ultradense peptide binding arrays at the linear peptide, epitope (grouping of adjacent peptides), and protein level across multiple samples/subjects (i.e. epitope spread or immunogenic regions of proteins) for understanding the heterogeneity of immune responses. RESULTS We developed Hierarchical antibody binding Epitopes and pROteins from liNear peptides (HERON), an R package, which can identify immunogenic epitopes, using meta-analyses and spatial clustering techniques to explore antibody targets at various resolution and confidence levels, that can be found consistently across a specified number of samples through the entire proteome to study antibody responses for diagnostics or treatment. Our approach estimates significance values at the linear peptide (probe), epitope, and protein level to identify top candidates for validation. We tested the performance of predictions on all three levels using correlation between technical replicates and comparison of epitope calls on two datasets, and results showed HERON's competitiveness in estimating false discovery rates and finding general and sample-level regions of interest for antibody binding. AVAILABILITY AND IMPLEMENTATION The HERON R package is available at Bioconductor https://bioconductor.org/packages/release/bioc/html/HERON.html.
Affiliation(s)
- Sean J McIlwain
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - University of Wisconsin Carbone Comprehensive Cancer Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Anna Hoefges
  - Department of Human Oncology, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Amy K Erbe
  - Department of Human Oncology, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Paul M Sondel
  - University of Wisconsin Carbone Comprehensive Cancer Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Human Oncology, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Pediatrics, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Irene M Ong
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - University of Wisconsin Carbone Comprehensive Cancer Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Obstetrics and Gynecology, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Center for Human Genomics and Precision Medicine, University of Wisconsin-Madison, Madison, WI, 53705, USA
| |
|
15
|
He M, Zhao N. A Mixed Effect Similarity Matrix Regression Model (SMRmix) for Integrating Multiple Microbiome Datasets at Community Level. bioRxiv 2024:2024.03.10.584315. [PMID: 38559012 PMCID: PMC10979838 DOI: 10.1101/2024.03.10.584315] [Indexed: 04/04/2024]
Abstract
BACKGROUND Recent studies have highlighted the importance of the human microbiota in health and disease. However, in many areas of research, individual microbiome studies often yield inconsistent results because of limited sample sizes and heterogeneity in study populations and experimental procedures. This inconsistency underscores the need for integrative analysis of multiple microbiome datasets. Despite this need, statistical methods that incorporate multiple microbiome datasets and account for study heterogeneity are not available in the literature. METHODS In this paper, we develop a mixed effect similarity matrix regression (SMRmix) approach for identifying community-level microbiome shifts between outcomes. SMRmix is closely connected to the microbiome kernel association test, one of the most popular approaches for this task, which is applicable only to a single study. SMRmix enables researchers to consolidate findings from diverse microbiome studies. RESULTS Via extensive simulations, we show that SMRmix has well-controlled type I error and higher power than some potential competitors. We applied SMRmix to two real-world datasets. The first, from the HIV-reanalysis consortium, integrated data from 17 studies on gut dysbiosis in HIV. Our analysis confirmed consistent associations between the gut microbiome and HIV infection as well as MSM (men who have sex with men) status, demonstrating greater power than competing methods. The second dataset involved 11 studies on the gut microbiome in colorectal cancer; analysis with SMRmix confirmed significant dysbiosis in affected individuals compared to healthy controls. CONCLUSION SMRmix enables the integration of multiple studies while effectively managing study heterogeneity, and provides a powerful tool for uncovering consistent associations between diseases and community-level microbiome data.
|
16
|
Tan Q, Xu X, Zhou H, Jia J, Jia Y, Tu H, Zhou D, Wu X. A multi-ancestry cerebral cortex transcriptome-wide association study identifies genes associated with smoking behaviors. Mol Psychiatry 2024; 29:3580-3589. [PMID: 38816585 DOI: 10.1038/s41380-024-02605-6] [Received: 07/28/2023] [Revised: 04/30/2024] [Accepted: 05/09/2024] [Indexed: 06/01/2024]
Abstract
Transcriptome-wide association studies (TWAS) have provided valuable insight into genes that may impact cigarette smoking. Most previous studies, however, focused mainly on European ancestry. Few TWAS have been conducted across multiple ancestries to explore genes that may impact smoking behaviors. In this study, we used cis-eQTL data of cerebral cortex from multiple ancestries in MetaBrain, including European, East Asian, and African samples, as reference panels to perform multi-ancestry TWAS analyses on ancestry-matched GWASs of four smoking behaviors (smoking initiation, smoking cessation, age of smoking initiation, and number of cigarettes per day) from the GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN). A multi-ancestry fine-mapping approach was applied to identify credible gene sets associated with these four traits. Enrichment and module network analyses were further performed to explore the potential roles of the identified gene sets. A total of 719 unique genes were associated with at least one of the four smoking traits across ancestries. Among those, 249 genes were further prioritized as putative causal genes by the multi-ancestry fine-mapping approach. Several well-known smoking-related genes, including PSMA4, IREB2, and CHRNA3, showed high confidence across ancestries. Some novel genes, e.g., TSPAN3 and ANK2, were also identified in the credible sets. The enrichment analysis identified a series of critical pathways related to smoking, such as synaptic transmission and glutamate receptor activity. Leveraging the power of the latest multi-ancestry GWAS and eQTL data sources, this study revealed hundreds of genes and relevant biological processes related to smoking behaviors. These findings provide new insights for future functional studies on smoking behaviors.
Affiliation(s)
- Qilong Tan
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
| | - Xiaohang Xu
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
| | - Hanyi Zhou
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
| | - Junlin Jia
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
| | - Yubing Jia
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
| | - Huakang Tu
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
- National Institute for Data Science in Health and Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Dan Zhou
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China
- Cancer Center, Zhejiang University, Hangzhou, 310058, China
| | - Xifeng Wu
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China.
- The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, 310058, China.
- School of Medicine and Health Science, George Washington University, Washington, DC, USA.
| |
|
17
|
Ng B, Tasaki S, Greathouse KM, Walker CK, Zhang A, Covitz S, Cieslak M, Weber AJ, Adamson AB, Andrade JP, Poovey EH, Curtis KA, Muhammad HM, Seidlitz J, Satterthwaite T, Bennett DA, Seyfried NT, Vogel J, Gaiteri C, Herskowitz JH. Integration across biophysical scales identifies molecular and cellular correlates of person-to-person variability in human brain connectivity. Nat Neurosci 2024; 27:2240-2252. [PMID: 39482360 PMCID: PMC11537986 DOI: 10.1038/s41593-024-01788-z] [Received: 08/21/2023] [Accepted: 09/16/2024] [Indexed: 11/03/2024]
Abstract
Brain connectivity arises from interactions across biophysical scales, ranging from molecular to cellular to anatomical to network level. To date, there has been little progress toward integrated analysis across these scales. To bridge this gap, from a unique cohort of 98 individuals, we collected antemortem neuroimaging and genetic data, as well as postmortem dendritic spine morphometric, proteomic and gene expression data from the superior frontal and inferior temporal gyri. Through the integration of the molecular and dendritic spine morphology data, we identified hundreds of proteins that explain interindividual differences in functional connectivity and structural covariation. These proteins are enriched for synaptic structures and functions, energy metabolism and RNA processing. By integrating data at the genetic, molecular, subcellular and tissue levels, we link specific biochemical changes at synapses to connectivity between brain regions. These results demonstrate the feasibility of integrating data from vastly different biophysical scales to provide a more comprehensive understanding of brain connectivity.
Affiliation(s)
- Bernard Ng
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
| | - Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Kelsey M Greathouse
- Department of Neurology, Center for Neurodegeneration and Experimental Therapeutics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Courtney K Walker
- Department of Neurology, Center for Neurodegeneration and Experimental Therapeutics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Ada Zhang
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
| | - Sydney Covitz
- Penn/CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
- Penn Lifespan Informatics and Neuroimaging Center, University of Pennsylvania, Philadelphia, PA, USA
| | - Matt Cieslak
- Penn/CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
- Penn Lifespan Informatics and Neuroimaging Center, University of Pennsylvania, Philadelphia, PA, USA
| | - Audrey J Weber
- Department of Neurology, Center for Neurodegeneration and Experimental Therapeutics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Ashley B Adamson
- Department of Neurology, Center for Neurodegeneration and Experimental Therapeutics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Julia P Andrade
- Department of Neurology, Center for Neurodegeneration and Experimental Therapeutics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Emily H Poovey
- Department of Neurology, Center for Neurodegeneration and Experimental Therapeutics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Kendall A Curtis
- Department of Neurology, Center for Neurodegeneration and Experimental Therapeutics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Hamad M Muhammad
- Department of Neurology, Center for Neurodegeneration and Experimental Therapeutics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Jakob Seidlitz
- Penn/CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
- Penn Lifespan Informatics and Neuroimaging Center, University of Pennsylvania, Philadelphia, PA, USA
- Department of Child and Adolescent Psychiatry and Behavioral Science, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Ted Satterthwaite
- Penn/CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
- Penn Lifespan Informatics and Neuroimaging Center, University of Pennsylvania, Philadelphia, PA, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Nicholas T Seyfried
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA, USA
| | - Jacob Vogel
- Penn/CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
- Penn Lifespan Informatics and Neuroimaging Center, University of Pennsylvania, Philadelphia, PA, USA
- Department of Clinical Science, Malmö, SciLifeLab, Lund University, Lund, Sweden
| | - Chris Gaiteri
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA.
| | - Jeremy H Herskowitz
- Department of Neurology, Center for Neurodegeneration and Experimental Therapeutics, University of Alabama at Birmingham, Birmingham, AL, USA.
| |
|
18
|
Chien LC. Testing for association between ordinal traits and genetic variants in pedigree-structured samples by collapsing and kernel methods. Int J Biostat 2024; 20:677-690. [PMID: 37743670 DOI: 10.1515/ijb-2022-0123] [Received: 09/30/2022] [Accepted: 07/28/2023] [Indexed: 09/26/2023]
Abstract
In genome-wide association studies (GWAS), logistic regression is one of the most popular analytic methods for binary traits. Multinomial regression is an extension of binary logistic regression that allows for multiple categories. However, many GWAS methods are limited to binary traits. These methods have often been improperly applied to ordinal traits, which causes inappropriate type I error rates and poor statistical power. Owing to the lack of suitable analysis methods, GWAS of ordinal traits is known to be problematic and is gaining attention. In this paper, we develop a general framework for testing association between ordinal traits and genetic variants in pedigree-structured samples by collapsing and kernel methods. We use the local odds ratios GEE approach to account for complicated correlation structures between family members and ordered categorical traits. We take a retrospective approach, treating the genetic markers as random variables, to calculate genetic correlations among markers. The proposed genetic association method can accommodate ordinal traits and allows for covariate adjustment. We conduct simulation studies to compare the proposed tests with existing models for analyzing ordered categorical data under various configurations. We illustrate the application of the proposed tests by simultaneously analyzing a family study and a cross-sectional study from the Genetic Analysis Workshop 19 (GAW19) data.
Affiliation(s)
- Li-Chu Chien
- Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
| |
|
19
|
Guerra G, Wendt G, McCoy L, Hansen HM, Kachuri L, Molinaro AM, Rice T, Guan V, Capistrano L, Hsieh A, Kalsi V, Sallee J, Taylor JW, Clarke JL, Rodriguez Almaraz E, Wiencke JK, Eckel-Passow JE, Jenkins RB, Wrensch M, Francis SS. Functional germline variants in DNA damage repair pathways are associated with altered survival in adults with glioma treated with temozolomide. medRxiv 2024:2023.10.13.23296963. [PMID: 39417102 PMCID: PMC11482862 DOI: 10.1101/2023.10.13.23296963] [Indexed: 10/19/2024]
Abstract
Background Temozolomide (TMZ) treatment has a demonstrated, but variable, impact on glioma prognosis. This study examines associations of survival with DNA repair gene germline polymorphisms among glioma patients who did and did not have TMZ treatment. Identifying genetic markers that sensitize tumor cells to TMZ could personalize therapy and improve outcomes. Methods We evaluated TMZ-related survival associations of pathogenic germline SNPs and genetically predicted transcript levels within 34 DNA repair genes among 1504 glioma patients from the UCSF Adult Glioma Study and Mayo Clinic whose diagnoses spanned pre- and post-TMZ eras within the major known glioma prognostic molecular subtypes. Results Five SNPs were associated with overall survival among patients who received TMZ, but not among those who did not. Only rs2308321-G, in MGMT, was associated with decreased survival (HR=1.21, p=0.019) for all glioma subtypes. Rs73191162-T (near UNG), rs13076508-C (near PARP3), rs7840433-A (near NEIL2), and rs3130618-A (near MSH5) were only associated with survival and TMZ treatment for certain subtypes, suggesting subtype-specific germline chemo-sensitization. Genetically predicted elevated expression of PNKP, compared with normal brain, was associated with dramatically worse survival for TMZ-treated patients with IDH-mutant and 1p/19q non-codeleted gliomas (p=0.015). Similarly, NEIL2 and TDG expression was associated with altered TMZ-related survival only among certain subtypes. Conclusions Functional germline alterations within DNA repair genes were associated with TMZ sensitivity, measured by overall survival, among adults with glioma. These variants should be evaluated in prospective analyses and functional studies.
Affiliation(s)
- Geno Guerra
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - George Wendt
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Lucie McCoy
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Helen M. Hansen
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Linda Kachuri
- Department of Epidemiology & Population Health, Stanford University School of Medicine, Stanford, CA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA
| | - Annette M. Molinaro
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
| | - Terri Rice
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Victoria Guan
- School of Pharmacy, University of California San Francisco, San Francisco, CA, USA
| | - Lianne Capistrano
- School of Pharmacy, University of California San Francisco, San Francisco, CA, USA
| | - Allison Hsieh
- School of Pharmacy, University of California San Francisco, San Francisco, CA, USA
| | - Veruna Kalsi
- School of Pharmacy, University of California San Francisco, San Francisco, CA, USA
| | - Jaimie Sallee
- School of Pharmacy, University of California San Francisco, San Francisco, CA, USA
| | - Jennie W. Taylor
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Jennifer L. Clarke
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Eduardo Rodriguez Almaraz
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
| | - John K. Wiencke
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
| | - Robert B. Jenkins
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - Margaret Wrensch
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Stephen S. Francis
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
| |
|
20
|
Ziyatdinov A, Mbatchou J, Marcketta A, Backman J, Gaynor S, Zou Y, Joseph T, Geraghty B, Herman J, Watanabe K, Ghosh A, Kosmicki J, Locke A, Thornton T, Kang HM, Ferreira M, Baras A, Abecasis G, Marchini J. Joint testing of rare variant burden scores using non-negative least squares. Am J Hum Genet 2024; 111:2139-2149. [PMID: 39366334 PMCID: PMC11480795 DOI: 10.1016/j.ajhg.2024.08.021] [Received: 03/18/2024] [Revised: 08/23/2024] [Accepted: 08/27/2024] [Indexed: 10/06/2024]
Abstract
Gene-based burden tests are a popular and powerful approach for analysis of exome-wide association studies. These approaches combine sets of variants within a gene into a single burden score that is then tested for association. Typically, a range of burden scores is calculated and tested across a range of annotation classes and frequency bins. Correlation between these tests can complicate the multiple testing correction and hamper interpretation of the results. We introduce a method called the sparse burden association test (SBAT) that tests the joint set of burden scores under the assumption that causal burden scores act in the same effect direction. The method simultaneously assesses the significance of the model fit and selects the set of burden scores that best explain the association. Using simulated data, we show that the method is well calibrated and highlight scenarios where the test outperforms existing gene-based tests. We apply the method to 73 quantitative traits from the UK Biobank, showing that SBAT is a valuable additional gene-based test when combined with other existing approaches. This test is implemented in the REGENIE software.
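The idea in the abstract above (build several burden scores per gene, then fit them jointly with coefficients constrained to one sign) can be illustrated with a minimal sketch. This is not the REGENIE implementation and it computes no p-value. The genotype matrix, the mask names (`lof`, `lof+missense`, `all`), the allele frequency, and the simulated trait are all hypothetical, and only a single effect direction is fitted.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_samples, n_variants = 500, 12

# Hypothetical rare-variant genotype matrix (0/1/2 minor-allele counts).
G = rng.binomial(2, 0.01, size=(n_samples, n_variants))

# Hypothetical, nested annotation masks: each lists the variants
# that one burden score aggregates.
masks = {
    "lof": np.arange(0, 4),
    "lof+missense": np.arange(0, 8),
    "all": np.arange(0, 12),
}

# One burden score per mask: the per-sample sum of allele counts.
B = np.column_stack([G[:, idx].sum(axis=1) for idx in masks.values()]).astype(float)

# Simulated quantitative trait driven by the first burden score (assumption).
y = 0.8 * B[:, 0] + rng.normal(size=n_samples)

# Centre the scores and the trait so that no intercept is needed.
Bc = B - B.mean(axis=0)
yc = y - y.mean()

# Non-negative least squares: all coefficients are constrained to be >= 0,
# which encodes the "same effect direction" assumption for one sign.
coef, resid_norm = nnls(Bc, yc)

# Scores with a strictly positive coefficient are the ones NNLS retained.
selected = [name for name, c in zip(masks, coef) if c > 0]
print(dict(zip(masks, coef.round(3))), selected)
```

Because the zero vector is always a feasible solution, the fitted residual norm can never exceed the norm of the centred trait. The sparsity of `coef` comes from the non-negativity constraint alone, without an explicit penalty term.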
Affiliation(s)
- Yuxin Zou
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Adam Locke
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
| |
|
21
|
Acharya S, Liao S, Jung WJ, Kang YS, Moghaddam VA, Feitosa MF, Wojczynski MK, Lin S, Anema JA, Schwander K, Connell JO, Province MA, Brent MR. A methodology for gene level omics-WAS integration identifies genes influencing traits associated with cardiovascular risks: the Long Life Family Study. Hum Genet 2024; 143:1241-1252. [PMID: 39276247 PMCID: PMC11485042 DOI: 10.1007/s00439-024-02701-1] [Received: 03/07/2024] [Accepted: 08/15/2024] [Indexed: 09/16/2024]
Abstract
The Long Life Family Study (LLFS) enrolled 4953 participants in 539 pedigrees displaying exceptional longevity. To identify genetic mechanisms that affect cardiovascular risks in the LLFS population, we developed a multi-omics integration pipeline and applied it to 11 traits associated with cardiovascular risks. Using our pipeline, we aggregated gene-level statistics from rare-variant analysis, GWAS, and gene expression-trait association by Correlated Meta-Analysis (CMA). Across all traits, CMA identified 64 significant genes after Bonferroni correction (p ≤ 2.8 × 10⁻⁷), 29 of which replicated in the Framingham Heart Study (FHS) cohort. Notably, 20 of the 29 replicated genes do not have a previously known trait-associated variant in the GWAS Catalog within 50 kb. Thirteen modules in Protein-Protein Interaction (PPI) networks are significantly enriched in genes with low meta-analysis p-values for at least one trait, three of which are replicated in the FHS cohort. The functional annotation of genes in these modules showed a significant over-representation of trait-related biological processes including sterol transport, protein-lipid complex remodeling, and immune response regulation. Among major findings, our results suggest a role of triglyceride-associated and mast-cell functional genes FCER1A, MS4A2, GATA2, HDC, and HRH4 in atherosclerosis risks. Our findings also suggest that lower expression of ATG2A, a gene we found to be associated with BMI, may be both a cause and consequence of obesity. Finally, our results suggest that ENPP3 may play an intermediary role in triglyceride-induced inflammation. Our pipeline is freely available and implemented in the Nextflow workflow language, making it easily runnable on any compute platform (https://nf-co.re/omicsgenetraitassociation).
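The Bonferroni threshold quoted in the abstract above follows the usual rule of dividing the family-wise error rate by the number of tests. The sketch below shows that arithmetic. The abstract states neither the error rate nor the number of tests, so the value `alpha = 0.05` is an assumption, and the implied test count is only what that assumption would give.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_n_tests(alpha: float, threshold: float) -> float:
    """Number of tests that a given Bonferroni threshold would imply."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


# Assumption: family-wise error rate of 0.05 (not stated in the abstract).
alpha = 0.05
reported_threshold = 2.8e-7  # threshold reported in the abstract

n_implied = implied_n_tests(alpha, reported_threshold)
print(f"implied number of tests under alpha={alpha}: {n_implied:.0f}")
```

Under this assumption the threshold would correspond to roughly 1.8 × 10⁵ gene-trait tests. A different error rate would change that figure proportionally.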
Affiliation(s)
- Sandeep Acharya
- Division of Computational and Data Sciences, Washington University, St Louis, MO, USA
| | - Shu Liao
- Department of Computer Science and Engineering, Washington University, St Louis, MO, USA
| | - Wooseok J Jung
- Department of Computer Science and Engineering, Washington University, St Louis, MO, USA
| | - Yu S Kang
- Department of Computer Science and Engineering, Washington University, St Louis, MO, USA
| | - Vaha Akbary Moghaddam
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Mary F Feitosa
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Mary K Wojczynski
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Shiow Lin
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Jason A Anema
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Karen Schwander
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Jeff O Connell
- Department of Medicine, University of Maryland, Baltimore, MD, USA
| | - Michael A Province
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA
| | - Michael R Brent
- Department of Computer Science and Engineering, Washington University, St Louis, MO, USA.
| |
|
22
|
Liu WS, Wu BS, Yang L, Chen SD, Zhang YR, Deng YT, Wu XR, He XY, Yang J, Feng JF, Cheng W, Xu YM, Yu JT. Whole exome sequencing analyses reveal novel genes in telomere length and their biomedical implications. GeroScience 2024; 46:5365-5385. [PMID: 38837026 PMCID: PMC11336033 DOI: 10.1007/s11357-024-01203-2] [Received: 12/29/2023] [Accepted: 05/11/2024] [Indexed: 06/06/2024]
Abstract
Telomere length is a putative biomarker of aging and is associated with multiple age-related diseases. There are limited data on the landscape of rare genetic variation in telomere length. Here, we systematically characterize rare variant associations with leukocyte telomere length (LTL) through an exome-wide association study (ExWAS) among 390,231 individuals in the UK Biobank. We identified 18 robust rare-variant genes for LTL, most of which had significant estimated effects on LTL (> 0.2 standard deviation per allele). The biological functions of the rare-variant genes were associated with telomere maintenance and capping, and several genes were specifically expressed in the testis. Three novel genes (ASXL1, CFAP58, and TET2) associated with LTL were identified. Phenotypic association analyses indicated significant associations of ASXL1 and TET2 with cancers, age-related diseases, blood assays, and cardiovascular traits. Survival analyses suggested that carriers of ASXL1 or TET2 variants were at increased risk for cancers; diseases of the circulatory, respiratory, and genitourinary systems; and all-cause and cause-specific deaths. The CFAP58 carriers were at elevated risk of death due to cancers. Collectively, the present whole exome sequencing study provides novel insights into the genetic landscape of LTL, identifying novel genes associated with LTL and their implications for human health and facilitating a better understanding of aging, thus pinpointing the genetic relevance of LTL to clonal hematopoiesis, biomedical traits, and health-related outcomes.
Affiliation(s)
- Wei-Shi Liu
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
- Bang-Sheng Wu
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
- Liu Yang
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
- Shi-Dong Chen
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
- Ya-Ru Zhang
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
- Yue-Ting Deng
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
- Xin-Rui Wu
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
- Xiao-Yu He
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
- Jing Yang
- Department of Neurology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou University, 1st Eastern Jianshe Road, Zhengzhou, 450000, China
- NHC Key Laboratory of Prevention and Treatment of Cerebrovascular Diseases, Zhengzhou, China
- Jian-Feng Feng
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- Department of Computer Science, University of Warwick, Coventry, UK
- Wei Cheng
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- Department of Computer Science, University of Warwick, Coventry, UK
- Yu-Ming Xu
- Department of Neurology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou University, 1st Eastern Jianshe Road, Zhengzhou, 450000, China
- NHC Key Laboratory of Prevention and Treatment of Cerebrovascular Diseases, Zhengzhou, China
- Jin-Tai Yu
- Department of Neurology and National Center for Neurological Diseases, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Shanghai Medical College, Fudan University, 12th Wulumuqi Zhong Road, Shanghai, 200040, China
23
Samorodnitsky S, Wu MC. Statistical analysis of multiple regions-of-interest in multiplexed spatial proteomics data. Brief Bioinform 2024; 25:bbae522. [PMID: 39428129 PMCID: PMC11491162 DOI: 10.1093/bib/bbae522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Revised: 08/21/2024] [Accepted: 10/07/2024] [Indexed: 10/22/2024]
Abstract
Multiplexed spatial proteomics reveals the spatial organization of cells in tumors, which is associated with important clinical outcomes such as survival and treatment response. This spatial organization is often summarized using spatial summary statistics, including Ripley's K and Besag's L. However, if multiple regions of the same tumor are imaged, it is unclear how to synthesize the relationship with a single patient-level endpoint. We evaluate extant approaches for accommodating multiple images within the context of associating summary statistics with outcomes. First, we consider averaging-based approaches wherein multiple summaries for a single sample are combined in a weighted mean. We then propose a novel class of ensemble testing approaches in which we simulate random weights used to aggregate summaries, test for an association with outcomes, and combine the $P$-values. We systematically evaluate the performance of these approaches via simulation and application to data from non-small cell lung cancer, colorectal cancer, and triple negative breast cancer. We find that the optimal strategy varies, but a simple weighted average of the summary statistics based on the number of cells in each image often offers the highest power and controls type I error effectively. When the size of the imaged regions varies, incorporating this variation into the weighted aggregation may yield additional power in cases where the varying size is informative. Ensemble testing (but not resampling) offered high power and type I error control across conditions in our simulated data sets.
Affiliation(s)
- Sarah Samorodnitsky
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, United States
- SWOG Statistics and Data Management Center, Fred Hutchinson Cancer Center, Seattle, WA 98109, United States
- Michael C Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, United States
- SWOG Statistics and Data Management Center, Fred Hutchinson Cancer Center, Seattle, WA 98109, United States
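The abstract above reports that a simple average of per-image summary statistics, weighted by the number of cells in each image, often performs well. A minimal sketch of that aggregation step follows. It is an illustration only, not the authors' implementation; the function name `weighted_patient_summary` and the example numbers are hypothetical.

```python
def weighted_patient_summary(summaries, cell_counts):
    """Combine per-image spatial summaries into one patient-level value.

    Each image's summary (for example, Ripley's K at a fixed radius) is
    weighted by the number of cells observed in that image.
    """
    if len(summaries) != len(cell_counts):
        raise ValueError("need exactly one cell count per image summary")
    total_cells = sum(cell_counts)
    if total_cells <= 0:
        raise ValueError("total cell count must be positive")
    return sum(s * n for s, n in zip(summaries, cell_counts)) / total_cells


# Hypothetical patient with two imaged regions of different sizes.
patient_value = weighted_patient_summary([1.0, 3.0], [100, 300])
print(patient_value)
```

The resulting patient-level value can then be used as a single covariate in whatever outcome model is appropriate (for example, a survival model), which is the setting the abstract describes.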
24
Tian L, Xiao J, Yu T. A robust statistical approach for finding informative spatially associated pathways. Brief Bioinform 2024; 25:bbae543. [PMID: 39451157 PMCID: PMC11503753 DOI: 10.1093/bib/bbae543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 08/27/2024] [Accepted: 10/13/2024] [Indexed: 10/26/2024]
Abstract
Spatial transcriptomics offers deep insights into cellular functional localization and communication by mapping gene expression to spatial locations. Traditional approaches that focus on selecting spatially variable genes often overlook the complexity of biological pathways and the interactions among genes. Here, we introduce a novel framework that shifts the focus towards directly identifying functional pathways associated with spatial variability by adapting the Brownian distance covariance test in an innovative manner to explore the heterogeneity of biological functions over space. Unlike most other methods, this statistical testing approach is free of gene selection and parameter selection and allows nonlinear and complex dependencies. It allows for a deeper understanding of how cells coordinate their activities across different spatial domains through biological pathways. By analyzing real human and mouse datasets, the method found significant pathways that were associated with spatial variation, as well as different pathway patterns among inner- and edge-cancer regions. This innovative framework offers a new perspective on analyzing spatial transcriptomic data, contributing to our understanding of tissue architecture and disease pathology. The implementation is publicly available at https://github.com/tianlq-prog/STpathway.
Affiliation(s)
- Leqi Tian
- School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen, Guangdong 518172, P.R. China
- Shenzhen Research Institute of Big Data, Shenzhen, Guangdong 518172, P.R. China
- Jiashun Xiao
- Shenzhen Research Institute of Big Data, Shenzhen, Guangdong 518172, P.R. China
- Tianwei Yu
- School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen, Guangdong 518172, P.R. China
- Shenzhen Research Institute of Big Data, Shenzhen, Guangdong 518172, P.R. China
25
Bian S, Bass AJ, Liu Y, Wingo AP, Wingo T, Cutler DJ, Epstein MP. SCAMPI: A scalable statistical framework for genome-wide interaction testing harnessing cross-trait correlations. bioRxiv 2024:2024.09.10.612314. [PMID: 39314278 PMCID: PMC11418984 DOI: 10.1101/2024.09.10.612314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
Family-based heritability estimates of complex traits are often considerably larger than their single-nucleotide polymorphism (SNP) heritability estimates. This discrepancy may be due to non-additive effects of genetic variation, including variation that interacts with other genes or environmental factors to influence the trait. Variance-based procedures provide a computationally efficient strategy to screen for SNPs with potential interaction effects without requiring the specification of the interacting variable. While valuable, such variance-based tests consider only a single trait and ignore likely pleiotropy among related traits that, if present, could improve power to detect such interaction effects. To fill this gap, we propose SCAMPI (Scalable Cauchy Aggregate test using Multiple Phenotypes to test Interactions), which screens for variants with interaction effects across multiple traits. SCAMPI is motivated by the observation that SNPs with pleiotropic interaction effects induce genotypic differences in the patterns of correlation among traits. By studying such patterns across genotype categories among multiple traits, we show that SCAMPI has improved performance over traditional univariate variance-based methods. Like those traditional variance-based tests, SCAMPI permits the screening of interaction effects without requiring the specification of the interaction variable and is further computationally scalable to biobank data. We employed SCAMPI to screen for interacting SNPs associated with four lipid-related traits in the UK Biobank and identified multiple gene regions missed by existing univariate variance-based tests. SCAMPI is implemented in software for public use.
Affiliation(s)
- Shijia Bian
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30329, USA
- Andrew J Bass
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, 30329, USA
- Yue Liu
- Department of Neurology, University of California, Davis, Sacramento, CA 95817, USA
- Aliza P Wingo
- Department of Psychiatry, University of California, Davis, Sacramento, CA 95817, USA
- Division of Mental Health, VA Northern California Health Care System, CA 95655, USA
- Thomas Wingo
- Department of Neurology, University of California, Davis, Sacramento, CA 95817, USA
- David J Cutler
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, 30329, USA
- Michael P Epstein
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, 30329, USA
26
Zhu L, Zhang S, Sha Q. Meta-analysis of set-based multiple phenotype association test based on GWAS summary statistics from different cohorts. Front Genet 2024; 15:1359591. [PMID: 39301532 PMCID: PMC11410627 DOI: 10.3389/fgene.2024.1359591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 08/23/2024] [Indexed: 09/22/2024]
Abstract
Genome-wide association studies (GWAS) have emerged as popular tools for identifying genetic variants that are associated with complex diseases. Standard analysis of a GWAS involves assessing the association between each variant and a disease. However, this approach suffers from limited reproducibility and difficulties in detecting multi-variant and pleiotropic effects. Although joint analysis of multiple phenotypes for GWAS can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits, most of the multiple phenotype association tests are designed for a single variant, resulting in much lower power, especially when their effect sizes are small and only their cumulative effect is associated with multiple phenotypes. To overcome these limitations, set-based multiple phenotype association tests have been developed to enhance statistical power and facilitate the identification and interpretation of pleiotropic regions. In this research, we propose a new method, named Meta-TOW-S, which conducts joint association tests between multiple phenotypes and a set of variants (such as variants in a gene) utilizing GWAS summary statistics from different cohorts. Our approach applies the set-based method that Tests for the effect of an Optimal Weighted combination of variants in a gene (TOW) and accounts for sample size differences across GWAS cohorts by employing the Cauchy combination method. Meta-TOW-S combines the advantages of set-based tests and multi-phenotype association tests, exhibiting computational efficiency and enabling analysis across multiple phenotypes while accommodating overlapping samples from different GWAS cohorts. To assess the performance of Meta-TOW-S, we develop a phenotype simulator package that encompasses a comprehensive simulation scheme capable of modeling multiple phenotypes and multiple variants, including noise structures and diverse correlation patterns among phenotypes. 
Simulation studies validate that Meta-TOW-S maintains a desirable Type I error rate. Further simulation under different scenarios shows that Meta-TOW-S can improve power compared with other existing meta-analysis methods. When applied to four psychiatric disorders summary data, Meta-TOW-S detects a greater number of significant genes.
Affiliation(s)
- Lirong Zhu
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
27
Hu T, Parrish RL, Dai Q, Buchman AS, Tasaki S, Bennett DA, Seyfried NT, Epstein MP, Yang J. Omnibus proteome-wide association study identifies 43 risk genes for Alzheimer disease dementia. Am J Hum Genet 2024; 111:1848-1863. [PMID: 39079537 PMCID: PMC11393696 DOI: 10.1016/j.ajhg.2024.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 06/28/2024] [Accepted: 07/02/2024] [Indexed: 09/08/2024]
Abstract
Transcriptome-wide association study (TWAS) tools have been applied to conduct proteome-wide association studies (PWASs) by integrating proteomics data with genome-wide association study (GWAS) summary data. The genetic effects of PWAS-identified significant genes are potentially mediated through genetically regulated protein abundance, thus informing the underlying disease mechanisms better than GWAS loci. However, existing TWAS/PWAS tools are limited by considering only one statistical model. We propose an omnibus PWAS pipeline to account for multiple statistical models and demonstrate improved performance by simulation and application studies of Alzheimer disease (AD) dementia. We employ the Aggregated Cauchy Association Test to derive omnibus PWAS (PWAS-O) p values from PWAS p values obtained by three existing tools assuming complementary statistical models: TIGAR, PrediXcan, and FUSION. Our simulation studies demonstrated improved power, with well-calibrated type I error, for PWAS-O over all three individual tools. We applied PWAS-O to studying AD dementia with reference proteomic data profiled from dorsolateral prefrontal cortex of postmortem brains from individuals of European ancestry. We identified 43 risk genes, including 5 not identified by previous studies, which are interconnected through a protein-protein interaction network that includes the well-known AD risk genes TOMM40, APOC1, and APOC2. We also validated causal genetic effects mediated through the proteome for 27 (63%) PWAS-O risk genes, providing insights into the underlying biological mechanisms of AD dementia and highlighting promising targets for therapeutic development. PWAS-O can be easily applied to studying other complex diseases.
Affiliation(s)
- Tingyang Hu
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA; Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Randy L Parrish
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA; Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA 30322, USA
- Qile Dai
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA; Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA 30322, USA
- Aron S Buchman
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- Nicholas T Seyfried
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Michael P Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA.
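The omnibus step in the abstract above combines p values from several tools with the Aggregated Cauchy Association Test. The sketch below shows the standard Cauchy combination formula: each p value is transformed to a Cauchy variate, the variates are averaged, and the average is mapped back to a p value. It is an illustration under stated assumptions rather than the PWAS-O code: equal weights are assumed because the abstract does not specify any, the function name `cauchy_combination` is hypothetical, and the three input p values are made up.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p values with the Cauchy combination (ACAT-style) formula.

    T = sum_i w_i * tan((0.5 - p_i) * pi) / sum_i w_i
    p = 0.5 - atan(T) / pi
    """
    if not p_values:
        raise ValueError("need at least one p value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p values must lie strictly between 0 and 1")
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("need exactly one weight per p value")
    total_weight = sum(weights)
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    return 0.5 - math.atan(statistic) / math.pi


# Made-up p values standing in for three tools run on the same gene.
omnibus_p = cauchy_combination([0.01, 0.04, 0.2])
print(omnibus_p)
```

A practical appeal of this combination rule is that it needs only the individual p values, not their joint distribution, so it stays cheap when the component tests are correlated. Production code would typically add a tail approximation for extremely small p values, which this sketch omits.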
28
Wang C, Wang T, Kiryluk K, Wei Y, Aschard H, Ionita-Laza I. Genome-wide discovery for biomarkers using quantile regression at biobank scale. Nat Commun 2024; 15:6460. [PMID: 39085219 PMCID: PMC11291931 DOI: 10.1038/s41467-024-50726-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 07/18/2024] [Indexed: 08/02/2024]
Abstract
Genome-wide association studies (GWAS) for biomarkers important for clinical phenotypes can lead to clinically relevant discoveries. Conventional GWAS for quantitative traits are based on simplified regression models modeling the conditional mean of a phenotype as a linear function of genotype. We draw attention here to an alternative, lesser known approach, namely quantile regression that naturally extends linear regression to the analysis of the entire conditional distribution of a phenotype of interest. Quantile regression can be applied efficiently at biobank scale, while having some unique advantages such as (1) identifying variants with heterogeneous effects across quantiles of the phenotype distribution; (2) accommodating a wide range of phenotype distributions including non-normal distributions, with invariance of results to trait transformations; and (3) providing more detailed information about genotype-phenotype associations even for those associations identified by conventional GWAS. We show in simulations that quantile regression is powerful across both homogeneous and various heterogeneous models. Applications to 39 quantitative traits in the UK Biobank demonstrate that quantile regression can be a helpful complement to linear regression in GWAS and can identify variants with larger effects on high-risk subgroups of individuals but with lower or no contribution overall.
Affiliation(s)
- Chen Wang
- Department of Biostatistics, Columbia University, New York, NY, USA
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Krzysztof Kiryluk
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Ying Wei
- Department of Biostatistics, Columbia University, New York, NY, USA
- Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Iuliana Ionita-Laza
- Department of Biostatistics, Columbia University, New York, NY, USA.
- Department of Statistics, Lund University, Lund, Sweden.
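Quantile regression, as described in the abstract above, replaces the squared-error loss of linear regression with the check (pinball) loss, so that the fitted value targets a chosen quantile rather than the mean. The sketch below uses an intercept-only model to show that minimizing this loss recovers a sample quantile. It is a toy illustration, not the biobank-scale procedure from the paper; the function names and the data are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r when r >= 0, and (tau - 1) * r when r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def intercept_only_quantile(values, tau):
    """Return the observed value that minimizes the total pinball loss.

    Without covariates, a minimizer of the check loss is always attained at
    one of the observed values, so searching over them is sufficient.
    """
    if not values:
        raise ValueError("need at least one observation")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return min(
        values,
        key=lambda c: sum(pinball_loss(y - c, tau) for y in values),
    )


# Hypothetical skewed phenotype: the median is unaffected by the outlier.
phenotype = [1, 2, 3, 4, 100]
print(intercept_only_quantile(phenotype, 0.5))
```

In a genetic association setting the constant would be replaced by a linear predictor in genotype and covariates, and the fit repeated over several values of `tau` to see whether a variant's effect differs across the phenotype distribution.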
29
Yang H, Wang X, Zhang Z, Chen F, Cao H, Yan L, Gao X, Dong H, Cui Y. A high-dimensional omnibus test for set-based association analysis. Brief Bioinform 2024; 25:bbae456. [PMID: 39288231 PMCID: PMC11407446 DOI: 10.1093/bib/bbae456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 08/21/2024] [Accepted: 09/03/2024] [Indexed: 09/19/2024]
Abstract
Set-based association analysis is a valuable tool in studying the etiology of complex diseases in genome-wide association studies, as it allows for the joint testing of variants in a region or group. Two common types of single nucleotide polymorphism (SNP)-disease functional models are recognized when evaluating the joint function of a set of SNP: the cumulative weak signal model, in which multiple functional variants with small effects contribute to disease risk, and the dominating strong signal model, in which a few functional variants with large effects contribute to disease risk. However, existing methods have two main limitations that reduce their power. Firstly, they typically only consider one disease-SNP association model, which can result in significant power loss if the model is misspecified. Secondly, they do not account for the high-dimensional nature of SNPs, leading to low power or high false positives. In this study, we propose a solution to these challenges by using a high-dimensional inference procedure that involves simultaneously fitting many SNPs in a regression model. We also propose an omnibus testing procedure that employs a robust and powerful P-value combination method to enhance the power of SNP-set association. Our results from extensive simulation studies and a real data analysis demonstrate that our set-based high-dimensional inference strategy is both flexible and computationally efficient and can substantially improve the power of SNP-set association analysis. Application to a real dataset further demonstrates the utility of the testing strategy.
Affiliation(s)
- Haitao Yang
- Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Hebei Key Laboratory of Forensic Medicine, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Xin Wang
- Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Zechen Zhang
- Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Fuzhao Chen
- Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Hongyan Cao
- Department of Health Statistics, Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, School of Public Health; MOE Key Laboratory of Coal Environmental Pathogenicity and Prevention, Shanxi Medical University, No 56 Xinjian South Rd., Taiyuan, Shanxi 030001, P.R. China
- Lina Yan
- Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Xia Gao
- Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China
- Hui Dong
- Department of Neurology, Second Hospital of Hebei Medical University, 215 West Heping Road, Shijiazhuang, Hebei 050000, P.R. China
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, 619 Red Cedar Rd., East Lansing, MI 48824, United States
30
Bouttle K, Ingold N, O’Mara TA. Using Genetics to Investigate Relationships between Phenotypes: Application to Endometrial Cancer. Genes (Basel) 2024; 15:939. [PMID: 39062718 PMCID: PMC11276418 DOI: 10.3390/genes15070939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 07/14/2024] [Accepted: 07/16/2024] [Indexed: 07/28/2024]
Abstract
Genome-wide association studies (GWAS) have accelerated the exploration of genotype-phenotype associations, facilitating the discovery of replicable genetic markers associated with specific traits or complex diseases. This narrative review explores the statistical methodologies developed using GWAS data to investigate relationships between various phenotypes, focusing on endometrial cancer, the most prevalent gynecological malignancy in developed nations. Advancements in analytical techniques such as genetic correlation, colocalization, cross-trait locus identification, and causal inference analyses have enabled deeper exploration of associations between different phenotypes, enhancing statistical power to uncover novel genetic risk regions. These analyses have unveiled shared genetic associations between endometrial cancer and many phenotypes, enabling identification of novel endometrial cancer risk loci and furthering our understanding of risk factors and biological processes underlying this disease. The current status of research in endometrial cancer is robust; however, this review demonstrates that further opportunities exist in statistical genetics that hold promise for advancing the understanding of endometrial cancer and other complex diseases.
Affiliation(s)
- Tracy A. O’Mara
- Cancer Research Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia (N.I.)
31
Samorodnitsky S, Campbell K, Ribas A, Wu MC. A Spatial Omnibus Test (SPOT) for Spatial Proteomic Data. Bioinformatics 2024; 40:btae425. [PMID: 38950184 PMCID: PMC11257711 DOI: 10.1093/bioinformatics/btae425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 05/10/2024] [Accepted: 06/28/2024] [Indexed: 07/03/2024]
Abstract
MOTIVATION: Spatial proteomics can reveal the spatial organization of immune cells in the tumor immune microenvironment. Relating measures of spatial clustering, such as Ripley's K or Besag's L, to patient outcomes may offer important clinical insights. However, these measures require pre-specifying a radius in which to quantify clustering, yet no consensus exists on the optimal radius, which may be context-specific. RESULTS: We propose a SPatial Omnibus Test (SPOT) which conducts this analysis across a range of candidate radii. At each radius, SPOT evaluates the association between the spatial summary and outcome, adjusting for confounders. SPOT then aggregates results across radii using the Cauchy combination test, yielding an omnibus P-value characterizing the overall degree of association. Using simulations, we verify that the type I error rate is controlled and show SPOT can be more powerful than alternatives. We also apply SPOT to ovarian and lung cancer studies. AVAILABILITY AND IMPLEMENTATION: An R package and tutorial are provided at https://github.com/sarahsamorodnitsky/SPOT.
Affiliation(s)
- Sarah Samorodnitsky
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle 98109, USA
- SWOG Statistics and Data Management Center, Fred Hutchinson Cancer Center, Seattle 98109, USA
- Katie Campbell
- Department of Medicine, Division of Hematology/Oncology, University of California Los Angeles, Los Angeles 90095, USA
- Antoni Ribas
- Department of Medicine, Division of Hematology/Oncology, University of California Los Angeles, Los Angeles 90095, USA
- Michael C Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle 98109, USA
- SWOG Statistics and Data Management Center, Fred Hutchinson Cancer Center, Seattle 98109, USA
32
Hung-Ching C, Yusi F, Gorczyca MT, Kayhan B, Tseng GC. High-dimensional causal mediation analysis by partial sum statistic and sample splitting strategy in imaging genetics application. medRxiv 2024:2024.06.23.24309362. [PMID: 38978660 PMCID: PMC11230309 DOI: 10.1101/2024.06.23.24309362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Causal mediation analysis provides a systematic approach to explore the causal role of one or more mediators in the association between exposure and outcome. In omics or imaging data analysis, mediators are often high-dimensional, which brings new statistical challenges. Existing methods either violate causal assumptions or fail in interpretable variable selection. Additionally, mediators are often highly correlated, presenting difficulties in selecting and prioritizing top mediators. To address these issues, we develop a framework using Partial Sum Statistic and Sample Splitting Strategy, namely PS5, for high-dimensional causal mediation analysis. The method provides a powerful global mediation test satisfying causal assumptions, followed by an algorithm to select and prioritize active mediators with quantification of individual mediation contributions. We demonstrate its accurate type I error control, superior statistical power, reduced bias in mediation effect estimation, and accurate mediator selection using extensive simulations of varying levels of effect size, signal sparsity, and mediator correlations. Finally, we apply PS5 to an imaging genetics dataset of chronic obstructive pulmonary disease (COPD) patients (N = 8,897) in the COPDGene study to examine the causal mediation role of lung images (p = 5,810) in the associations between polygenic risk score and lung function and between smoking exposure and lung function, respectively. Both causal mediation analyses successfully estimate the global indirect effect and detect mediating image regions. Collectively, we find a region in the lower lobe of the right lung with a strong and concordant mediation effect for both genetic and environmental exposures. This suggests that targeted treatment toward this region might mitigate the severity of COPD due to genetic and smoking effects.
33
Gao G, McClellan J, Barbeira AN, Fiorica PN, Li JL, Mu Z, Olopade OI, Huo D, Im HK. A multi-tissue, splicing-based joint transcriptome-wide association study identifies susceptibility genes for breast cancer. Am J Hum Genet 2024; 111:1100-1113. [PMID: 38733992 PMCID: PMC11179262 DOI: 10.1016/j.ajhg.2024.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 04/13/2024] [Accepted: 04/15/2024] [Indexed: 05/13/2024]
Abstract
Splicing-based transcriptome-wide association studies (splicing-TWASs) of breast cancer have the potential to identify susceptibility genes. However, existing splicing-TWASs test the association of individual excised introns in breast tissue only and thus have limited power to detect susceptibility genes. In this study, we performed a multi-tissue joint splicing-TWAS that integrated splicing-TWAS signals of multiple excised introns in each gene across 11 tissues that are potentially relevant to breast cancer risk. We utilized summary statistics from a meta-analysis that combined genome-wide association study (GWAS) results of 424,650 women of European ancestry. Splicing-level prediction models were trained in GTEx (v.8) data. We identified 240 genes by the multi-tissue joint splicing-TWAS at the Bonferroni-corrected significance level; in the tissue-specific splicing-TWAS that combined TWAS signals of excised introns in genes in breast tissue only, we identified nine additional significant genes. Of these 249 genes, 88 genes in 62 loci have not been reported by previous TWASs, and 17 genes in seven loci are at least 1 Mb away from published GWAS index variants. By comparing the results of our splicing-TWASs with previous gene-expression-based TWASs that used the same summary statistics and expression prediction models trained in the same reference panel, we found that 110 genes in 70 loci were identified only by the splicing-TWASs. Our results showed that for many genes, expression quantitative trait loci (eQTL) did not show a significant impact on breast cancer risk, whereas splicing quantitative trait loci (sQTL) showed a strong impact through intron excision events.
Affiliation(s)
- Guimin Gao
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Julian McClellan
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Alvaro N Barbeira
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Peter N Fiorica
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- James L Li
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Zepeng Mu
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Olufunmilayo I Olopade
- Section of Hematology and Oncology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Dezheng Huo
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA; Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA.
- Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA.
|
34
|
Guo S, Yang J. Bayesian genome-wide TWAS with reference transcriptomic data of brain and blood tissues identified 141 risk genes for Alzheimer's disease dementia. Alzheimers Res Ther 2024; 16:120. [PMID: 38824563 PMCID: PMC11144322 DOI: 10.1186/s13195-024-01488-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 05/27/2024] [Indexed: 06/03/2024]
Abstract
BACKGROUND Transcriptome-wide association study (TWAS) is an influential tool for identifying genes associated with complex diseases whose genetic effects are likely mediated through the transcriptome. TWAS utilizes reference genetic and transcriptomic data to estimate effect sizes of genetic variants on gene expression (i.e., effect sizes of a broad sense of expression quantitative trait loci, eQTL). These estimated effect sizes are employed as variant weights in gene-based association tests, facilitating the mapping of risk genes with genome-wide association study (GWAS) data. However, most existing TWAS of Alzheimer's disease (AD) dementia are limited to studying only cis-eQTL proximal to the test gene. To overcome this limitation, we applied the Bayesian Genome-wide TWAS (BGW-TWAS) method to leverage both cis- and trans-eQTL of brain and blood tissues, in order to enhance the mapping of risk genes for AD dementia. METHODS We first applied BGW-TWAS to the Genotype-Tissue Expression (GTEx) V8 dataset to estimate cis- and trans-eQTL effect sizes of the prefrontal cortex, cortex, and whole blood tissues. Estimated eQTL effect sizes were integrated with the summary data of the most recent GWAS of AD dementia to obtain BGW-TWAS (i.e., gene-based association test) p-values of AD dementia per gene per tissue type. Then we used the aggregated Cauchy association test to combine TWAS p-values across three tissues to obtain omnibus TWAS p-values per gene. RESULTS We identified 85 genes in prefrontal cortex, 82 in cortex, and 76 in whole blood that were significantly associated with AD dementia. By combining BGW-TWAS p-values across these three tissues, we obtained 141 significant risk genes including 34 genes primarily due to trans-eQTL and 35 mapped risk genes in GWAS Catalog. With these 141 significant risk genes, we detected functional clusters comprising both known mapped GWAS risk genes of AD in GWAS Catalog and our identified TWAS risk genes by protein-protein interaction network analysis, as well as several enriched phenotypes related to AD. CONCLUSION We applied BGW-TWAS and aggregated Cauchy test methods to integrate both cis- and trans-eQTL data of brain and blood tissues with GWAS summary data, identifying 141 TWAS risk genes of AD dementia. These identified risk genes provide novel insights into the underlying biological mechanisms of AD dementia and potential gene targets for therapeutics development.
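The abstract above combines per-tissue TWAS p-values into one omnibus p-value per gene with the aggregated Cauchy association test. As a rough illustration of that combination step only, the following Python sketch applies the Cauchy combination rule to three p-values. The function name `cauchy_combine`, the equal default weights, and the example p-values are assumptions made for illustration; they are not taken from the paper or from any BGW-TWAS software.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Each p-value is mapped to the score tan((0.5 - p) * pi); the weighted
    average of the scores is then mapped back to a single p-value.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)  # equal weights: an assumption
    total = sum(weights)
    stat = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, pvalues)
    )
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical per-tissue p-values for one gene (illustrative, not real data).
tissue_pvalues = {"prefrontal_cortex": 0.01, "cortex": 0.5, "whole_blood": 0.5}
omnibus_p = cauchy_combine(list(tissue_pvalues.values()))
print(f"omnibus p-value: {omnibus_p:.4f}")
```

In this sketch a single small p-value dominates the combined result, which is the behaviour that makes the rule attractive when only one tissue carries signal. The direct `tan` evaluation becomes numerically fragile for extremely small p-values, so a practical implementation would likely need a dedicated approximation in that range.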
Affiliation(s)
- Shuyi Guo
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
|
35
|
Song H, Wu MC. Multivariate differential association analysis. Stat (Int Stat Inst) 2024; 13:e704. [PMID: 39712486 PMCID: PMC11661859 DOI: 10.1002/sta4.704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Accepted: 05/02/2024] [Indexed: 12/24/2024]
Abstract
Identifying how dependence relationships vary across different conditions plays a significant role in many scientific investigations. For example, it is important for the comparison of biological systems to see if relationships between genomic features differ between cases and controls. In this paper, we seek to evaluate whether relationships between two sets of variables are different or not across two conditions. Specifically, we assess: do two sets of high-dimensional variables have similar dependence relationships across two conditions? We propose a new kernel-based test to capture the differential dependence. Specifically, the new test determines whether two measures that detect dependence relationships are similar or not under two conditions. We introduce the asymptotic permutation null distribution of the test statistic and show that it works well in finite samples, which makes the test computationally efficient and significantly enhances its usability in analyzing large datasets. We demonstrate through numerical studies that our proposed test has high power for detecting differential linear and non-linear relationships. The proposed method is implemented in an R package kerDAA.
Affiliation(s)
- Hoseung Song
- Department of Industrial and Systems Engineering, KAIST, Daejeon, Republic of Korea
- Michael C. Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, U.S.A
|
36
|
Deng Q, Song C, Lin S. An adaptive and robust method for multi-trait analysis of genome-wide association studies using summary statistics. Eur J Hum Genet 2024; 32:681-690. [PMID: 37237036 PMCID: PMC11153499 DOI: 10.1038/s41431-023-01389-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Revised: 05/01/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023]
Abstract
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with human traits or diseases in the past decade. Nevertheless, much of the heritability of many traits is still unaccounted for. Commonly used single-trait analysis methods are conservative, while multi-trait methods improve statistical power by integrating association evidence across multiple traits. In contrast to individual-level data, GWAS summary statistics are usually publicly available, and thus methods using only summary statistics are more widely applicable. Although many methods have been developed for joint analysis of multiple traits using summary statistics, many issues remain, including inconsistent performance, computational inefficiency, and numerical problems when considering many traits. To address these challenges, we propose a multi-trait adaptive Fisher method for summary statistics (MTAFS), a computationally efficient method with robust power performance. We applied MTAFS to two sets of brain imaging derived phenotypes (IDPs) from the UK Biobank, including a set of 58 Volumetric IDPs and a set of 212 Area IDPs. Through annotation analysis, the underlying genes of the SNPs identified by MTAFS were found to exhibit higher expression and to be significantly enriched in brain-related tissues. Together with results from a simulation study, MTAFS shows its advantage over existing multi-trait methods, with robust performance across a range of underlying settings. It controls type 1 error well and can efficiently handle a large number of traits.
Affiliation(s)
- Qiaolan Deng
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, USA
- Department of Statistics, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA
- Chi Song
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, USA
- Shili Lin
- Department of Statistics, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA.
|
37
|
Lai EY, Huang YT. Identifying pleiotropic genes via the composite test amidst the complexity of polygenic traits. Brief Bioinform 2024; 25:bbae327. [PMID: 39007593 PMCID: PMC11247409 DOI: 10.1093/bib/bbae327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 05/29/2024] [Accepted: 06/24/2024] [Indexed: 07/16/2024]
Abstract
Identifying the causal relationship between genotype and phenotype is essential to expanding our understanding of the gene regulatory network spanning the molecular level to perceptible traits. A pleiotropic gene can act as a central hub in the network, influencing multiple outcomes. Identifying such a gene involves testing under a composite null hypothesis where the gene is associated with, at most, one trait. Traditional methods such as meta-analyses of top-hit $P$-values and sequential testing of multiple traits have been proposed, but these methods fail to consider the background of genome-wide signals. Since Huang's composite test produces uniformly distributed $P$-values for genome-wide variants under the composite null, we propose a gene-level pleiotropy test that entails combining the aforementioned method with the aggregated Cauchy association test. A polygenic trait involves multiple genes with different functions to co-regulate mechanisms. We show that polygenicity should be considered when identifying pleiotropic genes; otherwise, the associations polygenic traits initiate will give rise to false positives. In this study, we constructed gene-trait functional modules using the results of the proposed pleiotropy tests. Our analysis suite was implemented as an R package PGCtest. We demonstrated the proposed method with an application study of the Taiwan Biobank database and identified functional modules comprising specific genes and their co-regulated traits.
Affiliation(s)
- En-Yu Lai
- Institute of Statistical Science, Academia Sinica, No.128, Academia Road, Section 2, Nankang, Taipei 11529, Taiwan
- Yen-Tsung Huang
- Institute of Statistical Science, Academia Sinica, No.128, Academia Road, Section 2, Nankang, Taipei 11529, Taiwan
|
38
|
Zhou W, Cuomo ASE, Xue A, Kanai M, Chau G, Krishna C, Xavier RJ, MacArthur DG, Powell JE, Daly MJ, Neale BM. Efficient and accurate mixed model association tool for single-cell eQTL analysis. medRxiv 2024:2024.05.15.24307317. [PMID: 38798318 PMCID: PMC11118640 DOI: 10.1101/2024.05.15.24307317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Understanding the genetic basis of gene expression can help us understand the molecular underpinnings of human traits and disease. Expression quantitative trait locus (eQTL) mapping can help in studying this relationship, but eQTLs have been shown to be very cell-type specific, motivating the use of single-cell RNA sequencing and single-cell eQTLs to obtain a more granular view of genetic regulation. Current methods for single-cell eQTL mapping either rely on the "pseudobulk" approach and traditional pipelines for bulk transcriptomics or do not scale well to large datasets. Here, we propose SAIGE-QTL, a robust and scalable tool that can directly map eQTLs using single-cell profiles without needing aggregation at the pseudobulk level. Additionally, SAIGE-QTL uses set-based tests to assess the effects of less frequent and rare genetic variation, which is traditionally excluded from eQTL mapping studies. We evaluate the performance of SAIGE-QTL on both real and simulated data and demonstrate the improved power for eQTL mapping over existing pipelines.
|
39
|
Ashokkumar M, Mei W, Peterson JJ, Harigaya Y, Murdoch DM, Margolis DM, Kornfein C, Oesterling A, Guo Z, Rudin CD, Jiang Y, Browne EP. Integrated Single-cell Multiomic Analysis of HIV Latency Reversal Reveals Novel Regulators of Viral Reactivation. Genomics Proteomics Bioinformatics 2024; 22:qzae003. [PMID: 38902848 PMCID: PMC11189801 DOI: 10.1093/gpbjnl/qzae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 10/19/2023] [Indexed: 06/22/2024]
Abstract
Despite the success of antiretroviral therapy, human immunodeficiency virus (HIV) cannot be cured because of a reservoir of latently infected cells that evades therapy. To understand the mechanisms of HIV latency, we employed an integrated single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq) approach to simultaneously profile the transcriptomic and epigenomic characteristics of ∼125,000 latently infected primary CD4+ T cells after reactivation using three different latency reversing agents. Differentially expressed genes and differentially accessible motifs were used to examine transcriptional pathways and transcription factor (TF) activities across the cell population. We identified cellular transcripts and TFs whose expression/activity was correlated with viral reactivation and demonstrated that a machine learning model trained on these data was 75%-79% accurate at predicting viral reactivation. Finally, we validated the role of two candidate HIV-regulating factors, FOXP1 and GATA3, in viral transcription. These data demonstrate the power of integrated multimodal single-cell analysis to uncover novel relationships between host cell factors and HIV latency.
Affiliation(s)
- Manickam Ashokkumar
- Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- HIV Cure Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Wenwen Mei
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jackson J Peterson
- HIV Cure Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yuriko Harigaya
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- David M Murdoch
- Department of Medicine, Duke University, Durham, NC 27708, USA
- David M Margolis
- Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- HIV Cure Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Caleb Kornfein
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Alex Oesterling
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Zhicheng Guo
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Cynthia D Rudin
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Yuchao Jiang
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Department of Biology, Texas A&M University, College Station, TX 77843, USA
- Department of Biomedical Engineering, Texas A&M University, College Station, TX 77843, USA
- Edward P Browne
- Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- HIV Cure Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
40
|
Zhou RR, Zucker DM, Zhao SD. Power of testing for exposure effects under incomplete mediation. Int J Biostat 2024; 20:217-228. [PMID: 37084462 DOI: 10.1515/ijb-2022-0106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 03/25/2023] [Indexed: 04/23/2023]
Abstract
Mediation analysis studies situations where an exposure may affect an outcome both directly and indirectly through intervening variables called mediators. It is frequently of interest to test for the effect of the exposure on the outcome, and the standard approach is simply to regress the latter on the former. However, it seems plausible that a more powerful test statistic could be achieved by also incorporating the mediators. This would be useful in cases where the exposure effect size might be small, which for example is common in genomics applications. Previous work has shown that this is indeed possible under complete mediation, where there is no direct effect. In most applications, however, the direct effect is likely nonzero. In this paper we study linear mediation models and find that under certain conditions, power gain is still possible under this incomplete mediation setting for testing the null hypothesis that there is neither a direct nor an indirect effect. We study a class of procedures that can achieve this performance and develop their application to both low- and high-dimensional mediators. We then illustrate their performances in simulations as well as in an analysis using DNA methylation mediators to study the effect of cigarette smoking on gene expression.
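To make the direct and indirect effects in the abstract above concrete, the following Python sketch fits a single-mediator linear model by least squares and splits the total exposure effect into a direct part and an indirect part (the product of the exposure-to-mediator and mediator-to-outcome slopes). It illustrates only the decomposition, not the testing procedures studied in the paper. The simulated effect sizes and the helper names `mediation_decomposition` and `simulate` are assumptions chosen for illustration.

```python
import random


def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mediation_decomposition(x, m, y):
    """Least-squares fit of M ~ X, Y ~ X, and Y ~ X + M (with intercepts)."""
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    a = sxm / sxx                    # slope of mediator on exposure
    total = sxy / sxx                # slope of outcome on exposure alone
    det = sxx * smm - sxm ** 2       # 2x2 normal equations for Y ~ X + M
    direct = (sxy * smm - smy * sxm) / det
    b = (smy * sxx - sxy * sxm) / det
    return {"a": a, "b": b, "direct": direct,
            "indirect": a * b, "total": total}


def simulate(n=2000, a=0.5, b=0.4, direct=0.2, seed=0):
    """Simulate exposure X, mediator M, outcome Y (hypothetical effect sizes)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [a * xi + rng.gauss(0, 1) for xi in x]
    y = [direct * xi + b * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    return x, m, y


x, m, y = simulate()
est = mediation_decomposition(x, m, y)
print({name: round(value, 3) for name, value in est.items()})
```

For least-squares fits of this kind, the estimated total effect equals the estimated direct effect plus the product `a * b`. Incomplete mediation, the setting of the paper, corresponds to a nonzero `direct` term.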
Affiliation(s)
- David M Zucker
- Department of Statistics and Data Science, Hebrew University, Jerusalem, Israel
- Sihai D Zhao
- Department of Statistics, University of Illinois Urbana-Champaign, Champaign, IL, USA
|
41
|
Bass AJ, Bian S, Wingo AP, Wingo TS, Cutler DJ, Epstein MP. Identifying latent genetic interactions in genome-wide association studies using multiple traits. Genome Med 2024; 16:62. [PMID: 38664839 PMCID: PMC11044415 DOI: 10.1186/s13073-024-01329-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 04/02/2024] [Indexed: 04/28/2024]
Abstract
The "missing" heritability of complex traits may be partly explained by genetic variants interacting with other genes or environments that are difficult to specify, observe, and detect. We propose a new kernel-based method called Latent Interaction Testing (LIT) to screen for genetic interactions that leverages pleiotropy from multiple related traits without requiring the interacting variable to be specified or observed. Using simulated data, we demonstrate that LIT increases power to detect latent genetic interactions compared to univariate methods. We then apply LIT to obesity-related traits in the UK Biobank and detect variants with interactive effects near known obesity-related genes (URL: https://CRAN.R-project.org/package=lit).
Affiliation(s)
- Andrew J Bass
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA.
- Shijia Bian
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Aliza P Wingo
- Department of Psychiatry, Emory University, Atlanta, GA, 30322, USA
- Thomas S Wingo
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA
- Department of Neurology, Emory University, Atlanta, GA, 30322, USA
- David J Cutler
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA
- Michael P Epstein
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA.
|
42
|
He R, Liu M, Lin Z, Zhuang Z, Shen X, Pan W. DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies. Biostatistics 2024; 25:468-485. [PMID: 36610078 PMCID: PMC11017120 DOI: 10.1093/biostatistics/kxac051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Revised: 12/08/2022] [Accepted: 12/14/2022] [Indexed: 01/09/2023]
Abstract
Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression for causal inference. The standard TWAS (called TWAS-L) only considers a linear relationship between a gene's expression and a trait in stage 2, which may lose statistical power when the relationship is not linear. Recently, an extension of TWAS (called TWAS-LQ) considers both the linear and quadratic effects of a gene on a trait, which however is not flexible enough due to its parametric nature and may be low powered for nonquadratic nonlinear effects. On the other hand, a deep learning (DL) approach, called DeepIV, has been proposed to nonparametrically model a nonlinear effect in IV regression. However, it is both slow and unstable due to the ill-posed inverse problem of solving an integral equation with Monte Carlo approximations. Furthermore, in the original DeepIV approach, statistical inference, that is, hypothesis testing, was not studied. Here, we propose a novel DL approach, called DeLIVR, to overcome the major drawbacks of DeepIV, by estimating a related but different target function and including a hypothesis testing framework. We show through simulations that DeLIVR was both faster and more stable than DeepIV. We applied both parametric and DL approaches to the GTEx and UK Biobank data, showcasing that DeLIVR detected 8 and 7 additional genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which would be missed by TWAS-L, TWAS-LQ, and DeepIV; these genes include BUD13 associated with HDL, SLC44A2 and GMIP with LDL, all supported by previous studies.
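The description of TWAS as a two-sample two-stage least squares procedure can be illustrated with a small simulation. The Python sketch below covers only the linear case (the TWAS-L setting) with a single variant as the instrument; it is not DeLIVR, DeepIV, or any published TWAS software. The function names, the single-variant simplification, and all simulated effect sizes are assumptions made for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n, rng, eqtl=0.8, causal=0.5, maf=0.3):
    """Simulate genotype, expression, and trait with a shared confounder."""
    g, e, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < maf) + (rng.random() < maf)  # 0, 1, or 2 alleles
        u = rng.gauss(0, 1)                               # unobserved confounder
        ei = eqtl * gi + 0.5 * u + 0.5 * rng.gauss(0, 1)
        yi = causal * ei + 0.5 * u + 0.5 * rng.gauss(0, 1)
        g.append(gi)
        e.append(ei)
        y.append(yi)
    return g, e, y


def two_sample_tsls(ref_g, ref_e, gwas_g, gwas_y):
    """Stage 1 in the reference panel, stage 2 in the GWAS sample."""
    weight = ols_slope(ref_g, ref_e)            # stage 1: genotype -> expression
    imputed = [weight * gi for gi in gwas_g]    # predicted expression in GWAS
    return ols_slope(imputed, gwas_y)           # stage 2: trait on prediction


rng = random.Random(1)
ref_g, ref_e, _ = simulate(2000, rng)       # reference panel: expression seen
gwas_g, _, gwas_y = simulate(20000, rng)    # GWAS sample: expression unseen
beta_hat = two_sample_tsls(ref_g, ref_e, gwas_g, gwas_y)
print(f"estimated effect of expression on trait: {beta_hat:.3f}")
```

Because the imputed expression depends only on genotype, the stage-2 slope is not distorted by the simulated confounder `u`, and the estimate should land near the simulated causal effect of 0.5. Real TWAS prediction models use many variants and regularized regression in stage 1; the single-variant weight here only keeps the sketch short.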
Affiliation(s)
- Ruoyu He
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, 420 Delaware Street SE, Minneapolis, MN 55455 and School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, MN 55455
- Mingyang Liu
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, 420 Delaware Street SE, Minneapolis, MN 55455 and School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, MN 55455
- Zhaotong Lin
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, 420 Delaware Street SE, Minneapolis, MN 55455 and School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, MN 55455
- Zhong Zhuang
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, 420 Delaware Street SE, Minneapolis, MN 55455 and School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, MN 55455
- Xiaotong Shen
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, 420 Delaware Street SE, Minneapolis, MN 55455 and School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, MN 55455
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, 420 Delaware Street SE, Minneapolis, MN 55455 and School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, MN 55455
|
43
|
Luo L, Mehrotra DV, Shen J, Tang ZZ. Multi-trait analysis of gene-by-environment interactions in large-scale genetic studies. Biostatistics 2024; 25:504-520. [PMID: 36897773 DOI: 10.1093/biostatistics/kxad004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 02/15/2023] [Accepted: 02/22/2023] [Indexed: 03/11/2023]
Abstract
Identifying genotype-by-environment interaction (GEI) is challenging because the GEI analysis generally has low power. Large-scale consortium-based studies are ultimately needed to achieve adequate power for identifying GEI. We introduce Multi-Trait Analysis of Gene-Environment Interactions (MTAGEI), a powerful, robust, and computationally efficient framework to test gene-environment interactions on multiple traits in large data sets, such as the UK Biobank (UKB). To facilitate the meta-analysis of GEI studies in a consortium, MTAGEI efficiently generates summary statistics of genetic associations for multiple traits under different environmental conditions and integrates the summary statistics for GEI analysis. MTAGEI enhances the power of GEI analysis by aggregating GEI signals across multiple traits and variants that would otherwise be difficult to detect individually. MTAGEI achieves robustness by combining complementary tests under a wide spectrum of genetic architectures. We demonstrate the advantages of MTAGEI over existing single-trait-based GEI tests through extensive simulation studies and the analysis of the whole exome sequencing data from the UKB.
Affiliation(s)
- Lan Luo
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
- Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 330 N Orchard St, Madison, WI 53715, USA
|
44
|
Cao H, Jia C, Li Z, Yang H, Fang R, Zhang Y, Cui Y. wMKL: multi-omics data integration enables novel cancer subtype identification via weight-boosted multi-kernel learning. Br J Cancer 2024; 130:1001-1012. [PMID: 38278975 PMCID: PMC10951206 DOI: 10.1038/s41416-024-02587-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 01/09/2024] [Accepted: 01/15/2024] [Indexed: 01/28/2024]
Abstract
BACKGROUND Cancer is a heterogeneous disease driven by complex molecular alterations. Cancer subtypes determined from multi-omics data can provide novel insight into personalised precision treatment. It is recognised that incorporating prior weight knowledge into multi-omics data integration can improve disease subtyping. METHODS We develop a weighted method, termed weight-boosted Multi-Kernel Learning (wMKL), which incorporates heterogeneous data types as well as flexible weight functions, to boost subtype identification. Given a series of weight functions, we propose an omnibus combination strategy to integrate different weight-related P-values to improve subtyping precision. RESULTS wMKL models each data type with multiple kernel choices, thus alleviating the sensitivity and robustness issue due to selecting kernel parameters. Furthermore, wMKL integrates different data types by learning weights of different kernels derived from each data type, recognising the heterogeneous contribution of different data types to the final subtyping performance. The proposed wMKL outperforms existing weighted and non-weighted methods. The utility and advantage of wMKL are illustrated through extensive simulations and applications to two TCGA datasets. Novel subtypes are identified followed by extensive downstream bioinformatics analysis to understand the molecular mechanisms differentiating different subtypes. CONCLUSIONS The proposed wMKL method provides a novel strategy for disease subtyping. The wMKL is freely available at https://github.com/biostatcao/wMKL.
Collapse
Affiliation(s)
- Hongyan Cao
- Division of Health Statistics, Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, 030001, Taiyuan, Shanxi, China
- MOE Key Laboratory of Coal Environmental Pathogenicity and Prevention, Shanxi Medical University, 030001, Taiyuan, Shanxi, China
- Division of Mathematics, School of Basic Medical Science, Shanxi Medical University, 030001, Taiyuan, Shanxi, China
| | - Congcong Jia
- Division of Health Statistics, Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, 030001, Taiyuan, Shanxi, China
| | - Zhi Li
- Department of Hematology, Taiyuan Central Hospital of Shanxi Medical University, 030001, Taiyuan, Shanxi, China
| | - Haitao Yang
- Division of Health Statistics, School of Public Health, Hebei Medical University, 050017, Shijiazhuang, China
| | - Ruiling Fang
- Division of Health Statistics, Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, 030001, Taiyuan, Shanxi, China
| | - Yanbo Zhang
- Division of Health Statistics, Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, 030001, Taiyuan, Shanxi, China
| | - Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, 48824, USA.
| |
|
45
|
Melton HJ, Zhang Z, Wu C. SUMMIT-FA: a new resource for improved transcriptome imputation using functional annotations. Hum Mol Genet 2024; 33:624-635. [PMID: 38129112 PMCID: PMC10954367 DOI: 10.1093/hmg/ddad205] [Received: 05/16/2023] [Revised: 10/24/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023] Open
Abstract
Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), which improves gene expression prediction accuracy by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models in whole blood using SUMMIT-FA with the comprehensive functional database MACIE and eQTL summary-level data from the eQTLGen consortium. We apply these models to GWAS for 24 complex traits and show that SUMMIT-FA identifies significantly more gene-trait associations and improves predictive power for identifying "silver standard" genes compared to several benchmark methods. We further conduct a simulation study to demonstrate the effectiveness of SUMMIT-FA.
Affiliation(s)
- Hunter J Melton
- Department of Statistics, Florida State University, 214 Rogers Building, 117 N. Woodward Avenue, Tallahassee, FL 32306, United States
| | - Zichen Zhang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Unit 1689, Houston, TX 77030, United States
| | - Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Unit 1689, Houston, TX 77030, United States
| |
|
46
|
Venkatesh SS, Wittemans LBL, Palmer DS, Baya NA, Ferreira T, Hill B, Lassen FH, Parker MJ, Reibe S, Elhakeem A, Banasik K, Bruun MT, Erikstrup C, Jensen BA, Juul A, Mikkelsen C, Nielsen HS, Ostrowski SR, Pedersen OB, Rohde PD, Sorensen E, Ullum H, Westergaard D, Haraldsson A, Holm H, Jonsdottir I, Olafsson I, Steingrimsdottir T, Steinthorsdottir V, Thorleifsson G, Figueredo J, Karjalainen MK, Pasanen A, Jacobs BM, Hubers N, Lippincott M, Fraser A, Lawlor DA, Timpson NJ, Nyegaard M, Stefansson K, Magi R, Laivuori H, van Heel DA, Boomsma DI, Balasubramanian R, Seminara SB, Chan YM, Laisk T, Lindgren CM. Genome-wide analyses identify 21 infertility loci and over 400 reproductive hormone loci across the allele frequency spectrum. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.19.24304530. [PMID: 38562841 PMCID: PMC10984039 DOI: 10.1101/2024.03.19.24304530] [Indexed: 04/04/2024]
Abstract
Genome-wide association studies (GWASs) may help inform treatments for infertility, whose causes remain unknown in many cases. Here we present GWAS meta-analyses across six cohorts for male and female infertility in up to 41,200 cases and 687,005 controls. We identified 21 genetic risk loci for infertility (P≤5E-08), of which 12 have not been reported for any reproductive condition. We found positive genetic correlations between endometriosis and all-cause female infertility (rg=0.585, P=8.98E-14), and between polycystic ovary syndrome and anovulatory infertility (rg=0.403, P=2.16E-03). The evolutionary persistence of female infertility-risk alleles in EBAG9 may be explained by recent directional selection. We additionally identified up to 269 genetic loci associated with follicle-stimulating hormone (FSH), luteinising hormone, oestradiol, and testosterone through sex-specific GWAS meta-analyses (N=6,095-246,862). While hormone-associated variants near FSHB and ARL14EP colocalised with signals for anovulatory infertility, we found no rg between female infertility and reproductive hormones (P>0.05). Exome sequencing analyses in the UK Biobank (N=197,340) revealed that women carrying testosterone-lowering rare variants in GPC2 were at higher risk of infertility (OR=2.63, P=1.25E-03). Taken together, our results suggest that while individual genes associated with hormone regulation may be relevant for fertility, there is limited genetic evidence for correlation between reproductive hormones and infertility at the population level. We provide the first comprehensive view of the genetic architecture of infertility across multiple diagnostic criteria in men and women, and characterise its relationship to other health conditions.
Affiliation(s)
- Samvida S Venkatesh
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, United Kingdom
| | - Laura B L Wittemans
- Novo Nordisk Research Centre Oxford, Oxford, United Kingdom
- Nuffield Department of Women's and Reproductive Health, Medical Sciences Division, University of Oxford, United Kingdom
| | - Duncan S Palmer
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Nuffield Department of Population Health, Medical Sciences Division, University of Oxford, Oxford, United Kingdom
| | - Nikolas A Baya
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, United Kingdom
| | - Teresa Ferreira
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
| | - Barney Hill
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Nuffield Department of Population Health, Medical Sciences Division, University of Oxford, Oxford, United Kingdom
| | - Frederik Heymann Lassen
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, United Kingdom
| | - Melody J Parker
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Nuffield Department of Clinical Medicine, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
| | - Saskia Reibe
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Nuffield Department of Population Health, Medical Sciences Division, University of Oxford, Oxford, United Kingdom
| | - Ahmed Elhakeem
- MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom
- Population Health Science, Bristol Medical School, University of Bristol, Bristol, United Kingdom
| | - Karina Banasik
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Department of Obstetrics and Gynecology, Copenhagen University Hospital, Hvidovre, Copenhagen, Denmark
| | - Mie T Bruun
- Department of Clinical Immunology, Odense University Hospital, Odense, Denmark
| | - Christian Erikstrup
- Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark
- Department of Clinical Medicine, Health, Aarhus University, Aarhus, Denmark
| | - Bitten A Jensen
- Department of Clinical Immunology, Aalborg University Hospital, Aalborg, Denmark
| | - Anders Juul
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen; Copenhagen, Denmark
- Department of Growth and Reproduction, Copenhagen University Hospital-Rigshospitalet, Copenhagen, Denmark
| | - Christina Mikkelsen
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Science, Copenhagen University, Copenhagen, Denmark
| | - Henriette S Nielsen
- Department of Obstetrics and Gynecology, The Fertility Clinic, Hvidovre University Hospital, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Sisse R Ostrowski
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Ole B Pedersen
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
| | - Palle D Rohde
- Genomic Medicine, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
| | - Erik Sorensen
- Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
| | | | - David Westergaard
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Department of Obstetrics and Gynecology, Copenhagen University Hospital, Hvidovre, Copenhagen, Denmark
| | - Asgeir Haraldsson
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- Children's Hospital Iceland, Landspitali University Hospital, Reykjavik, Iceland
| | - Hilma Holm
- deCODE genetics/Amgen, Inc., Reykjavik, Iceland
| | - Ingileif Jonsdottir
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- deCODE genetics/Amgen, Inc., Reykjavik, Iceland
| | - Isleifur Olafsson
- Department of Clinical Biochemistry, Landspitali University Hospital, Reykjavik, Iceland
| | - Thora Steingrimsdottir
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- Department of Obstetrics and Gynecology, Landspitali University Hospital, Reykjavik, Iceland
| | | | | | - Jessica Figueredo
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Minna K Karjalainen
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Research Unit of Population Health, Faculty of Medicine, University of Oulu, Finland
- Northern Finland Birth Cohorts, Arctic Biobank, Infrastructure for Population Studies, Faculty of Medicine, University of Oulu, Oulu, Finland
| | - Anu Pasanen
- Research Unit of Clinical Medicine, Medical Research Center Oulu, University of Oulu, and Department of Children and Adolescents, Oulu University Hospital, Oulu, Finland
| | - Benjamin M Jacobs
- Centre for Preventive Neurology, Wolfson Institute of Population Health, Queen Mary University London, London, EC1M 6BQ, United Kingdom
| | - Nikki Hubers
- Department of Biological Psychology, Netherlands Twin Register, Vrije Universiteit, Amsterdam, The Netherlands
- Amsterdam Reproduction and Development Institute, Amsterdam, The Netherlands
| | - Margaret Lippincott
- Harvard Reproductive Sciences Center and Reproductive Endocrine Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Abigail Fraser
- MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom
- Population Health Science, Bristol Medical School, University of Bristol, Bristol, United Kingdom
| | - Deborah A Lawlor
- MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom
- Population Health Science, Bristol Medical School, University of Bristol, Bristol, United Kingdom
| | - Nicholas J Timpson
- MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom
- Population Health Science, Bristol Medical School, University of Bristol, Bristol, United Kingdom
| | - Mette Nyegaard
- Genomic Medicine, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
| | - Kari Stefansson
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- deCODE genetics/Amgen, Inc., Reykjavik, Iceland
| | - Reedik Magi
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Hannele Laivuori
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Medical and Clinical Genetics, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Department of Obstetrics and Gynecology, Tampere University Hospital, Finland
- Center for Child, Adolescent, and Maternal Health Research, Faculty of Medicine and Health Technology, Tampere University, Finland
| | - David A van Heel
- Blizard Institute, Queen Mary University London, London, E1 2AT, United Kingdom
| | - Dorret I Boomsma
- Department of Biological Psychology, Netherlands Twin Register, Vrije Universiteit, Amsterdam, The Netherlands
- Amsterdam Reproduction and Development Institute, Amsterdam, The Netherlands
| | - Ravikumar Balasubramanian
- Harvard Reproductive Sciences Center and Reproductive Endocrine Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Stephanie B Seminara
- Harvard Reproductive Sciences Center and Reproductive Endocrine Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Yee-Ming Chan
- Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Endocrinology, Department of Pediatrics, Boston Children's Hospital, Boston, Massachusetts, United States of America
| | - Triin Laisk
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Cecilia M Lindgren
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, United Kingdom
- Nuffield Department of Women's and Reproductive Health, Medical Sciences Division, University of Oxford, United Kingdom
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
| |
|
47
|
Samorodnitsky S, Campbell K, Ribas A, Wu MC. A Spatial Omnibus Test (SPOT) for Spatial Proteomic Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.08.584117. [PMID: 38559053 PMCID: PMC10979932 DOI: 10.1101/2024.03.08.584117] [Indexed: 04/04/2024]
Abstract
Spatial proteomics can reveal the spatial organization of immune cells in the tumor immune microenvironment. Relating measures of spatial clustering, such as Ripley's K or Besag's L, to patient outcomes may offer important clinical insights. However, these measures require pre-specifying a radius in which to quantify clustering, yet no consensus exists on the optimal radius, which may be context-specific. We propose a SPatial Omnibus Test (SPOT) which conducts this analysis across a range of candidate radii. At each radius, SPOT evaluates the association between the spatial summary and outcome, adjusting for confounders. SPOT then aggregates results across radii using the Cauchy combination test, yielding an omnibus p-value characterizing the overall degree of association. Using simulations, we verify that the type I error rate is controlled and show SPOT can be more powerful than alternatives. We also apply SPOT to an ovarian cancer study. An R package and tutorial are provided at https://github.com/sarahsamorodnitsky/SPOT.
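The aggregation step named in this abstract, the Cauchy combination test, can be sketched in a few lines. The following is a minimal illustration in Python, not the SPOT R package itself: the function name `cauchy_combination`, the equal default weights, and the per-radius p-values are all assumptions made for the example. Each p-value is transformed to a standard Cauchy variate, the variates are averaged with weights, and the upper tail of the standard Cauchy distribution gives the omnibus p-value.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Hypothetical helper for illustration; weights default to equal.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Map each p-value to a standard Cauchy variate and take the
    # weighted average (weights are normalised to sum to one).
    stat = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper-tail probability of a standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi


# Made-up p-values, one per candidate radius (illustrative only).
radius_p_values = {10: 0.04, 25: 0.20, 50: 0.65}
omnibus_p = cauchy_combination(list(radius_p_values.values()))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the transformation.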
Affiliation(s)
- Sarah Samorodnitsky
- Public Health Sciences Division, Fred Hutch Cancer Center
- SWOG Statistics and Data Management Center
| | - Katie Campbell
- Medicine, Division of Hematology/Oncology, University of California Los Angeles
| | - Antoni Ribas
- Medicine, Division of Hematology/Oncology, University of California Los Angeles
| | - Michael C Wu
- Public Health Sciences Division, Fred Hutch Cancer Center
- SWOG Statistics and Data Management Center
| |
|
48
|
Nazeen S, Wang X, Zielinski D, Lam I, Hallacli E, Xu P, Ethier E, Strom R, Zanella CA, Nithianandam V, Ritter D, Henderson A, Saurat N, Afroz J, Nutter-Upham A, Benyamini H, Copty J, Ravishankar S, Morrow A, Mitchel J, Neavin D, Gupta R, Farbehi N, Grundman J, Myers RH, Scherzer CR, Trojanowski JQ, Van Deerlin VM, Cooper AA, Lee EB, Erlich Y, Lindquist S, Peng J, Geschwind DH, Powell J, Studer L, Feany MB, Sunyaev SR, Khurana V. Deep sequencing of proteotoxicity modifier genes uncovers a Presenilin-2/beta-amyloid-actin genetic risk module shared among alpha-synucleinopathies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.03.583145. [PMID: 38496508 PMCID: PMC10942362 DOI: 10.1101/2024.03.03.583145] [Indexed: 03/19/2024]
Abstract
Whether neurodegenerative diseases linked to misfolding of the same protein share genetic risk drivers or whether different protein-aggregation pathologies in neurodegeneration are mechanistically related remains uncertain. Conventional genetic analyses are underpowered to address these questions. Through careful selection of patients based on protein aggregation phenotype (rather than clinical diagnosis), we can increase statistical power to detect associated variants in a targeted set of genes that modify proteotoxicities. Genetic modifiers of alpha-synuclein (ɑS) and beta-amyloid (Aβ) cytotoxicity in yeast are enriched in risk factors for Parkinson's disease (PD) and Alzheimer's disease (AD), respectively. Here, along with known AD/PD risk genes, we deeply sequenced exomes of 430 ɑS/Aβ modifier genes in patients across alpha-synucleinopathies (PD, Lewy body dementia and multiple system atrophy). Beyond known PD genes GBA1 and LRRK2, rare variants in AD genes (CD33, CR1 and PSEN2) and Aβ toxicity modifiers involved in RhoA/actin cytoskeleton regulation (ARHGEF1, ARHGEF28, MICAL3, PASK, PKN2, PSEN2) were shared risk factors across synucleinopathies. Actin pathology occurred in iPSC synucleinopathy models and RhoA downregulation exacerbated ɑS pathology. Even in sporadic PD, the expression of these genes was altered across CNS cell types. Genome-wide CRISPR screens revealed the essentiality of PSEN2 in both human cortical and dopaminergic neurons, and PSEN2 mutation carriers exhibited diffuse brainstem and cortical synucleinopathy independent of AD pathology. PSEN2 contributes to a common-risk signal in PD GWAS and regulates ɑS expression in neurons. Our results identify convergent mechanisms across synucleinopathies, some shared with AD.
Affiliation(s)
- Sumaiya Nazeen
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Xinyuan Wang
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Dina Zielinski
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Whitehead Institute of Biomedical Research, Cambridge, MA, USA
| | - Isabel Lam
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Erinc Hallacli
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Ping Xu
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Elizabeth Ethier
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Ronya Strom
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Camila A Zanella
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Vanitha Nithianandam
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Dylan Ritter
- The Center for Stem Cell Biology, Sloan-Kettering Institute for Cancer Research, New York, NY, USA
| | - Alexander Henderson
- The Center for Stem Cell Biology, Sloan-Kettering Institute for Cancer Research, New York, NY, USA
| | - Nathalie Saurat
- The Center for Stem Cell Biology, Sloan-Kettering Institute for Cancer Research, New York, NY, USA
| | - Jalwa Afroz
- The Center for Stem Cell Biology, Sloan-Kettering Institute for Cancer Research, New York, NY, USA
| | | | - Hadar Benyamini
- Whitehead Institute of Biomedical Research, Cambridge, MA, USA
| | - Joseph Copty
- Garvan Institute of Medical Research, Sydney, NSW, Australia
| | | | - Autumn Morrow
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Jonathan Mitchel
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Health Sciences & Technology, Harvard Medical School & Massachusetts Institute of Technology, Boston, MA
| | - Drew Neavin
- Garvan Institute of Medical Research, Sydney, NSW, Australia
| | - Renuka Gupta
- Garvan Institute of Medical Research, Sydney, NSW, Australia
| | - Nona Farbehi
- Garvan Institute of Medical Research, Sydney, NSW, Australia
| | - Jennifer Grundman
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Richard H Myers
- Department of Neurology, Boston University School of Medicine, Boston, MA, USA
| | - Clemens R Scherzer
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - John Q Trojanowski
- Center for Neurodegenerative Disease Research, University of Pennsylvania, Philadelphia, PA, USA
| | - Vivianna M Van Deerlin
- Center for Neurodegenerative Disease Research, University of Pennsylvania, Philadelphia, PA, USA
| | - Antony A Cooper
- Garvan Institute of Medical Research, Sydney, NSW, Australia
| | - Edward B Lee
- Center for Neurodegenerative Disease Research, University of Pennsylvania, Philadelphia, PA, USA
| | - Yaniv Erlich
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Susan Lindquist
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Jian Peng
- Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Daniel H Geschwind
- Center for Autism Research and Treatment, Semel Institute, Program in Neurogenetics, Department of Neurology and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Joseph Powell
- Garvan Institute of Medical Research, Sydney, NSW, Australia
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Lorenz Studer
- The Center for Stem Cell Biology, Sloan-Kettering Institute for Cancer Research, New York, NY, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Mel B Feany
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vikram Khurana
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Stem Cell Institute, Cambridge, MA, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| |
|
49
|
Acharya S, Liao S, Jung WJ, Kang YS, Moghaddam VA, Feitosa M, Wojczynski M, Lin S, Anema JA, Schwander K, Connell JO, Province M, Brent MR. Multi-omics Integration Identifies Genes Influencing Traits Associated with Cardiovascular Risks: The Long Life Family Study. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.04.24303657. [PMID: 38496585 PMCID: PMC10942516 DOI: 10.1101/2024.03.04.24303657] [Indexed: 03/19/2024]
Abstract
The Long Life Family Study (LLFS) enrolled 4,953 participants in 539 pedigrees displaying exceptional longevity. To identify genetic mechanisms that affect cardiovascular risks in the LLFS population, we developed a multi-omics integration pipeline and applied it to 11 traits associated with cardiovascular risks. Using our pipeline, we aggregated gene-level statistics from rare-variant analysis, GWAS, and gene expression-trait association by Correlated Meta-Analysis (CMA). Across all traits, CMA identified 64 significant genes after Bonferroni correction (p ≤ 2.8×10⁻⁷), 29 of which replicated in the Framingham Heart Study (FHS) cohort. Notably, 20 of the 29 replicated genes do not have a previously known trait-associated variant in the GWAS Catalog within 50 kb. Thirteen modules in Protein-Protein Interaction (PPI) networks are significantly enriched in genes with low meta-analysis p-values for at least one trait, three of which are replicated in the FHS cohort. The functional annotation of genes in these modules showed a significant over-representation of trait-related biological processes including sterol transport, protein-lipid complex remodeling, and immune response regulation. Among major findings, our results suggest a role of triglyceride-associated and mast-cell functional genes FCER1A, MS4A2, GATA2, HDC, and HRH4 in atherosclerosis risks. Our findings also suggest that lower expression of ATG2A, a gene we found to be associated with BMI, may be both a cause and consequence of obesity. Finally, our results suggest that ENPP3 may play an intermediary role in triglyceride-induced inflammation. Our pipeline is freely available and implemented in the Nextflow workflow language, making it easily runnable on any compute platform (https://nf-co.re/omicsgenetraitassociation).
Affiliation(s)
- Sandeep Acharya
- Division of Computational and Data Sciences, Washington University, St Louis, MO
| | - Shu Liao
- Department of Computer Science and Engineering, Washington University, St Louis, MO
| | - Wooseok J Jung
- Department of Computer Science and Engineering, Washington University, St Louis, MO
| | - Yu S Kang
- Department of Computer Science and Engineering, Washington University, St Louis, MO
| | - Vaha A Moghaddam
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO
| | - Mary Feitosa
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO
| | - Mary Wojczynski
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO
| | - Shiow Lin
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO
| | - Jason A Anema
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO
| | - Karen Schwander
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO
| | - Jeff O Connell
- Department of Medicine, University of Maryland, Baltimore, MD
| | - Mike Province
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO
| | - Michael R Brent
- Department of Computer Science and Engineering, Washington University, St Louis, MO
| |
|
50
|
Sulaiman D, Wu D, Black LP, Williams KJ, Graim K, Datta S, Reddy ST, Guirgis FW. Lipidomic changes in a novel sepsis outcome-based analysis reveals potent pro-inflammatory and pro-resolving signaling lipids. Clin Transl Sci 2024; 17:e13745. [PMID: 38488489 PMCID: PMC10941572 DOI: 10.1111/cts.13745] [Received: 10/12/2023] [Revised: 12/20/2023] [Accepted: 01/21/2024] [Indexed: 03/18/2024] Open
Abstract
The purpose of this study was to investigate changes in the lipidome of patients with sepsis to identify signaling lipids associated with poor outcomes that could be linked to future therapies. Adult patients with sepsis were enrolled within 24 h of sepsis recognition. Patients meeting Sepsis-3 criteria were enrolled from the emergency department or intensive care unit and blood samples were obtained. Clinical data were collected and outcomes of rapid recovery, chronic critical illness (CCI), or early death were adjudicated by clinicians. Lipidomic analysis was performed on two platforms: the Sciex™ 5500 device, used for a lipidomic screen of 1450 lipid species, and a targeted signaling lipid panel using liquid-chromatography tandem mass spectrometry. For the lipidomic screen, there were 274 patients with sepsis: 192 with rapid recovery, 47 with CCI, and 35 with early death. CCI and early death patients were grouped together for analysis. Fatty acid (FA) 12:0 was decreased in CCI/early death, whereas FA 17:0 and 20:1 were elevated in CCI/early death, compared to rapid recovery patients. For the signaling lipid panel analysis, there were 262 patients with sepsis: 189 with rapid recovery, 45 with CCI, and 28 with early death. Pro-inflammatory signaling lipids from ω-6 poly-unsaturated fatty acids (PUFAs), including 15-hydroxyeicosatetraenoic acid (HETE), 12-HETE, and 11-HETE (oxidation products of arachidonic acid [AA]) were elevated in CCI/early death patients compared to rapid recovery. The pro-resolving lipid mediator from ω-3 PUFAs, 14(S)-hydroxy docosahexaenoic acid (14S-HDHA), was also elevated in CCI/early death compared to rapid recovery. Signaling lipids of the AA pathway were elevated in poor-outcome patients with sepsis and may serve as targets for future therapies.
Affiliation(s)
- Dawoud Sulaiman
- Division of Cardiology, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California, USA
| | - Dongyuan Wu
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| | | | - Kevin J. Williams
- Department of Biological Chemistry, David Geffen School of Medicine at UCLA, Los Angeles, California, USA
- UCLA Lipidomics Lab, Los Angeles, California, USA
| | - Kiley Graim
- Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, USA
| | - Susmita Datta
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| | - Srinivasa T. Reddy
- Division of Cardiology, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California, USA
| | - Faheem W. Guirgis
- Department of Emergency Medicine, University of Florida College of Medicine, Gainesville, Florida, USA
| |
|