1
Lona-Durazo F, Omachi K, Fermin D, Eichinger F, Troost JP, Lin MH, Dinsmore IR, Mirshahi T, Chang AR, Miner JH, Paterson AD, Barua M, Gagliano Taliun SA. Association of Genetically Predicted Skipping of COL4A4 Exon 27 with Hematuria and Albuminuria. J Am Soc Nephrol 2024:00001751-990000000-00408. [PMID: 39190490 DOI: 10.1681/asn.0000000000000480] [Received: 03/28/2024] [Accepted: 08/22/2024] [Indexed: 08/29/2024] Open
Abstract
Background:
Hematuria is an established sign of glomerular disease and can be associated with kidney failure, but there has been limited scientific study of this trait.
Methods:
Here, we combined genetic data from the UK Biobank with predicted gene expression and splicing from GTEx kidney cortex samples (n = 65) in a transcriptome-wide association study (TWAS) to identify additional potential biological mechanisms influencing hematuria.
Results:
The TWAS using kidney cortex identified significant associations for 5 genes in terms of expression and 3 significant splicing events. Notably, we identified an association between the skipping of COL4A4 exon 27, which is genetically predicted by intronic rs11898094 (minor allele frequency 13%), and hematuria. This variant was also associated with urinary albumin excretion. We found independent evidence supporting the same variant predicting this skipping event in glomeruli-derived mRNA transcriptomics data (n = 245) from NEPTUNE. The functional significance of loss of exon 27 was demonstrated using the split NanoLuc-based α3α4α5(IV) heterotrimer assay, in which type IV collagen heterotrimer formation was quantified by luminescence. The causal splicing variant for this skipping event is yet to be identified.
Conclusions:
In summary, by integrating multiple data types, we identify a potential splicing event associated with hematuria and albuminuria.
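As an illustration of the TWAS step mentioned in the Methods, the generic summary-statistic form of the test combines per-SNP GWAS z-scores using the weights of a genetically trained expression (or splicing) prediction model, and rescales by the LD among the model's SNPs. The sketch below is not the authors' pipeline. The function name `twas_z` and all weights, z-scores, and LD values are hypothetical toy inputs chosen for illustration.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Generic summary-based TWAS statistic: z = w'z / sqrt(w' R w).

    weights : prediction-model weights, one per SNP (hypothetical values here)
    gwas_z  : GWAS z-scores for the same SNPs, in the same order
    ld      : LD (correlation) matrix R among those SNPs, as a list of rows
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")
    # Weighted sum of SNP-level z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of the predicted feature under the LD structure: w' R w.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("predicted feature has non-positive variance")
    return numerator / math.sqrt(variance)


# Toy example (assumed values): two SNPs with equal weights.
toy_weights = [0.5, 0.5]
toy_gwas_z = [3.0, 1.0]
independent_ld = [[1.0, 0.0], [0.0, 1.0]]  # SNPs uncorrelated
perfect_ld = [[1.0, 1.0], [1.0, 1.0]]      # SNPs in complete LD

z_independent = twas_z(toy_weights, toy_gwas_z, independent_ld)
z_perfect = twas_z(toy_weights, toy_gwas_z, perfect_ld)
```

The same weighted signal gives a smaller statistic when the SNPs are in complete LD, because the denominator accounts for the correlated variance. This is why the LD matrix has to come from a reference panel that matches the GWAS population.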
Affiliation(s)
- Frida Lona-Durazo
  - Montreal Heart Institute, Montreal, Quebec, Canada
  - Faculty of Medicine, Université de Montréal, Montreal, Quebec, Canada
- Kohei Omachi
  - Division of Nephrology, Washington University School of Medicine, St. Louis, Missouri
  - Department of Molecular Medicine, Graduate School of Pharmaceutical Sciences, Kumamoto University, Kumamoto, Japan
- Damian Fermin
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Felix Eichinger
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Jonathan P Troost
  - Michigan Institute for Clinical and Health Research, University of Michigan, Ann Arbor, Michigan
- Meei-Hua Lin
  - Division of Nephrology, Washington University School of Medicine, St. Louis, Missouri
- Ian R Dinsmore
  - Department of Genomic Health, Geisinger, Danville, Pennsylvania
- Tooraj Mirshahi
  - Department of Genomic Health, Geisinger, Danville, Pennsylvania
- Alexander R Chang
  - Department of Population Health Sciences, Center for Kidney Health Research, Geisinger, Danville, Pennsylvania
  - Department of Nephrology, Geisinger, Danville, Pennsylvania
- Jeffrey H Miner
  - Division of Nephrology, Washington University School of Medicine, St. Louis, Missouri
- Andrew D Paterson
  - Divisions of Epidemiology and Biostatistics, Dalla Lana School of Public Health, Toronto, Ontario, Canada
  - Genetics and Genome Biology, Research Institute at The Hospital for Sick Children, Toronto, Ontario, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada
- Moumita Barua
  - Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada
  - Division of Nephrology, University Health Network, Toronto, Ontario, Canada
  - Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  - Toronto General Hospital Research Institute, Toronto, Ontario, Canada
- Sarah A Gagliano Taliun
  - Montreal Heart Institute, Montreal, Quebec, Canada
  - Department of Medicine, Université de Montréal, Montreal, Quebec, Canada
  - Department of Neurosciences, Université de Montréal, Montreal, Quebec, Canada
2
Song S, Wang L, Hou L, Liu JS. Partitioning and aggregating cross-tissue and tissue-specific genetic effects to identify gene-trait associations. Nat Commun 2024; 15:5769. [PMID: 38982044 PMCID: PMC11233643 DOI: 10.1038/s41467-024-49924-4] [Received: 11/27/2023] [Accepted: 06/25/2024] [Indexed: 07/11/2024] Open
Abstract
TWAS have shown great promise in extending GWAS loci to a functional understanding of disease mechanisms. In an effort to fully unleash the TWAS and GWAS information, we propose MTWAS, a statistical framework that partitions and aggregates cross-tissue and tissue-specific genetic effects in identifying gene-trait associations. We introduce a non-parametric imputation strategy to augment the inaccessible tissues, accommodating complex interactions and non-linear expression data structures across various tissues. We further classify eQTLs into cross-tissue eQTLs and tissue-specific eQTLs via a stepwise procedure based on the extended Bayesian information criterion, which is consistent under high-dimensional settings. We show that MTWAS significantly improves the prediction accuracy across all 47 tissues of the GTEx dataset, compared with other single-tissue and multi-tissue methods, such as PrediXcan, TIGAR, and UTMOST. Applying MTWAS to the DICE and OneK1K datasets with bulk and single-cell RNA sequencing data on immune cell types showcases consistent improvements in prediction accuracy. MTWAS also identifies more predictable genes, and the improvement can be replicated with independent studies. We apply MTWAS to 84 UK Biobank GWAS studies, which provides insights into disease etiology.
Affiliation(s)
- Shuang Song
  - Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
- Lijun Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Lin Hou
  - Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Jun S Liu
  - Department of Statistics, Harvard University, Cambridge, MA, USA
3
Li JL, McClellan JC, Zhang H, Gao G, Huo D. Multi-tissue transcriptome-wide association studies identified 235 genes for intrinsic subtypes of breast cancer. J Natl Cancer Inst 2024; 116:1105-1115. [PMID: 38400758 PMCID: PMC11223833 DOI: 10.1093/jnci/djae041] [Received: 09/22/2023] [Revised: 01/25/2024] [Accepted: 02/20/2024] [Indexed: 02/26/2024] Open
Abstract
BACKGROUND Although genome-wide association studies (GWAS) of breast cancer (BC) identified common variants which differ between intrinsic subtypes, genes through which these variants act to impact BC risk have not been fully established. Transcriptome-wide association studies (TWAS) have identified genes associated with overall BC risk, but subtype-specific differences are largely unknown. METHODS We performed two multi-tissue TWAS for each BC intrinsic subtype, including an expression-based approach that collated TWAS signals from expression quantitative trait loci (eQTLs) across multiple tissues and a novel splicing-based approach that collated signals from splicing QTLs (sQTLs) across intron clusters and subsequently across tissues. We used summary statistics for five intrinsic subtypes including Luminal A-like, Luminal B-like, Luminal B/HER2-negative-like, HER2-enriched-like, and triple-negative BC, generated from 106 278 BC cases and 91 477 controls in the Breast Cancer Association Consortium. RESULTS Overall, we identified 235 genes in 88 loci that were associated with at least one of the five intrinsic subtypes. Most genes were subtype-specific, and many have not been reported in previous TWAS. We discovered that common variants that modulate expression of CHEK2 confer increased risk to Luminal A-like BC, in contrast to the viewpoint that CHEK2 primarily harbors rare, penetrant mutations. Additionally, our splicing-based TWAS provided population-level support for MDM4 splice variants that increased the risk of triple-negative BC. CONCLUSION Our comprehensive, multi-tissue TWAS corroborated previous GWAS loci for overall BC risk and intrinsic subtypes, while underscoring how common variation that impacts expression and splicing of genes in multiple tissue types can be used to further elucidate the etiology of BC.
Affiliation(s)
- James L Li
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Julian C McClellan
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Haoyu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Guimin Gao
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Dezheng Huo
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
  - Department of Medicine, Section of Hematology and Oncology, University of Chicago, IL, USA
4
Gao G, McClellan J, Barbeira AN, Fiorica PN, Li JL, Mu Z, Olopade OI, Huo D, Im HK. A multi-tissue, splicing-based joint transcriptome-wide association study identifies susceptibility genes for breast cancer. Am J Hum Genet 2024; 111:1100-1113. [PMID: 38733992 PMCID: PMC11179262 DOI: 10.1016/j.ajhg.2024.04.010] [Received: 10/09/2023] [Revised: 04/13/2024] [Accepted: 04/15/2024] [Indexed: 05/13/2024] Open
Abstract
Splicing-based transcriptome-wide association studies (splicing-TWASs) of breast cancer have the potential to identify susceptibility genes. However, existing splicing-TWASs test the association of individual excised introns in breast tissue only and thus have limited power to detect susceptibility genes. In this study, we performed a multi-tissue joint splicing-TWAS that integrated splicing-TWAS signals of multiple excised introns in each gene across 11 tissues that are potentially relevant to breast cancer risk. We utilized summary statistics from a meta-analysis that combined genome-wide association study (GWAS) results of 424,650 women of European ancestry. Splicing-level prediction models were trained in GTEx (v.8) data. We identified 240 genes by the multi-tissue joint splicing-TWAS at the Bonferroni-corrected significance level; in the tissue-specific splicing-TWAS that combined TWAS signals of excised introns in genes in breast tissue only, we identified nine additional significant genes. Of these 249 genes, 88 genes in 62 loci have not been reported by previous TWASs, and 17 genes in seven loci are at least 1 Mb away from published GWAS index variants. By comparing the results of our splicing-TWASs with previous gene-expression-based TWASs that used the same summary statistics and expression prediction models trained in the same reference panel, we found that 110 genes in 70 loci are identified only by the splicing-TWASs. Our results showed that for many genes, expression quantitative trait loci (eQTL) did not show a significant impact on breast cancer risk, whereas splicing quantitative trait loci (sQTL) showed a strong impact through intron excision events.
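The Bonferroni-corrected significance level mentioned in this abstract can be sketched as follows: the family-wise error rate is divided by the number of tests, and a gene is called significant only if its p-value falls below that threshold. The abstract gives neither the error rate nor the number of genes tested, so the value 0.05, the gene labels, and the p-values below are all assumed toy inputs, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_genes(p_values, alpha=0.05):
    """Return the sorted gene labels whose p-value passes the threshold.

    p_values : dict mapping a gene label to its TWAS p-value
    alpha    : family-wise error rate (0.05 is an assumption, not from the source)
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(gene for gene, p in p_values.items() if p < threshold)


# Toy p-values for four hypothetical genes; the threshold is 0.05 / 4 = 0.0125.
toy_p_values = {"GENE_A": 1e-8, "GENE_B": 0.004, "GENE_C": 0.2, "GENE_D": 0.03}
hits = significant_genes(toy_p_values)
```

With these toy inputs, only `GENE_A` and `GENE_B` pass. `GENE_D` would be nominally significant at 0.05 but is excluded once the correction for four tests is applied.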
Affiliation(s)
- Guimin Gao
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Julian McClellan
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Alvaro N Barbeira
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Peter N Fiorica
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- James L Li
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Zepeng Mu
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Olufunmilayo I Olopade
  - Section of Hematology and Oncology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Dezheng Huo
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Hae Kyung Im
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
5
Head ST, Dezem F, Todor A, Yang J, Plummer J, Gayther S, Kar S, Schildkraut J, Epstein MP. Cis- and trans-eQTL TWASs of breast and ovarian cancer identify more than 100 susceptibility genes in the BCAC and OCAC consortia. Am J Hum Genet 2024; 111:1084-1099. [PMID: 38723630 PMCID: PMC11179407 DOI: 10.1016/j.ajhg.2024.04.012] [Received: 10/24/2023] [Revised: 04/11/2024] [Accepted: 04/16/2024] [Indexed: 05/21/2024] Open
Abstract
Transcriptome-wide association studies (TWASs) have investigated the role of genetically regulated transcriptional activity in the etiologies of breast and ovarian cancer. However, methods performed to date have focused on the regulatory effects of risk-associated SNPs thought to act in cis on a nearby target gene. With growing evidence for distal (trans) regulatory effects of variants on gene expression, we performed TWASs of breast and ovarian cancer using a Bayesian genome-wide TWAS method (BGW-TWAS) that considers effects of both cis- and trans-expression quantitative trait loci (eQTLs). We applied BGW-TWAS to whole-genome and RNA sequencing data in breast and ovarian tissues from the Genotype-Tissue Expression project to train expression imputation models. We applied these models to large-scale GWAS summary statistic data from the Breast Cancer and Ovarian Cancer Association Consortia to identify genes associated with risk of overall breast cancer, non-mucinous epithelial ovarian cancer, and 10 cancer subtypes. We identified 101 genes significantly associated with risk of breast cancer phenotypes and 8 with ovarian phenotypes. These loci include established risk genes and several novel candidate risk loci, such as ACAP3, whose associations are predominantly driven by trans-eQTLs. We replicated several associations using summary statistics from an independent GWAS of these cancer phenotypes. We further used genotype and expression data in normal and tumor breast tissue from the Cancer Genome Atlas to examine the performance of our trained expression imputation models. This work represents an in-depth look into the role of trans-eQTLs in the complex molecular mechanisms underlying these diseases.
Affiliation(s)
- S Taylor Head
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Felipe Dezem
  - Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Andrei Todor
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
- Jingjing Yang
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
- Jasmine Plummer
  - Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Simon Gayther
  - Department of Biomedical Sciences, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Siddhartha Kar
  - Early Cancer Institute, Department of Oncology, University of Cambridge, Cambridge CB2 0XZ, UK
- Joellen Schildkraut
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Michael P Epstein
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
6
Durge AR, Shrimankar DD. DHFS-ECM: Design of a Dual Heuristic Feature Selection-based Ensemble Classification Model for the Identification of Bamboo Species from Genomic Sequences. Curr Genomics 2024; 25:185-201. [PMID: 39087000 PMCID: PMC11288165 DOI: 10.2174/0113892029268176240125055419] [Received: 08/24/2023] [Revised: 01/16/2024] [Accepted: 01/16/2024] [Indexed: 08/02/2024] Open
Abstract
Background Analyzing genomic sequences plays a crucial role in understanding biological diversity and classifying Bamboo species. Existing methods for genomic sequence analysis suffer from limitations such as complexity, low accuracy, and the need for constant reconfiguration in response to evolving genomic datasets. Aim This study addresses these limitations by introducing a novel Dual Heuristic Feature Selection-based Ensemble Classification Model (DHFS-ECM) for the precise identification of Bamboo species from genomic sequences. Methods The proposed DHFS-ECM method employs a Genetic Algorithm to perform dual heuristic feature selection. This process maximizes inter-class variance, leading to the selection of informative N-gram feature sets. Subsequently, intra-class variance levels are used to create optimal training and validation sets, ensuring comprehensive coverage of class-specific features. The selected features are then processed through an ensemble classification layer, combining multiple stratification models for species-specific categorization. Results Comparative analysis with state-of-the-art methods demonstrates that DHFS-ECM achieves remarkable improvements in accuracy (9.5%), precision (5.9%), recall (8.5%), and AUC performance (4.5%). Importantly, the model maintains its performance even with an increased number of species classes due to the continuous learning facilitated by the Dual Heuristic Genetic Algorithm Model. Conclusion DHFS-ECM offers several key advantages, including efficient feature extraction, reduced model complexity, enhanced interpretability, and increased robustness and accuracy through the ensemble classification layer. These attributes make DHFS-ECM a promising tool for real-time clinical applications and a valuable contribution to the field of genomic sequence analysis.
Affiliation(s)
- Aditi R Durge
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
- Deepti D Shrimankar
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
7
McClellan JC, Li JL, Gao G, Huo D. Expression- and splicing-based multi-tissue transcriptome-wide association studies identified multiple genes for breast cancer by estrogen-receptor status. Breast Cancer Res 2024; 26:51. [PMID: 38515142 PMCID: PMC10958972 DOI: 10.1186/s13058-024-01809-6] [Received: 09/22/2023] [Accepted: 03/14/2024] [Indexed: 03/23/2024] Open
Abstract
BACKGROUND Although several transcriptome-wide association studies (TWASs) have been performed to identify genes associated with overall breast cancer (BC) risk, only a few TWASs have explored the differences in estrogen receptor-positive (ER+) and estrogen receptor-negative (ER-) breast cancer. Additionally, these studies were based on gene expression prediction models trained primarily in breast tissue, and they did not account for alternative splicing of genes. METHODS In this study, we utilized two approaches to perform multi-tissue TWASs of breast cancer by ER subtype: (1) an expression-based TWAS that combined TWAS signals for each gene across multiple tissues and (2) a splicing-based TWAS that combined TWAS signals of all excised introns for each gene across tissues. To perform this TWAS, we utilized summary statistics for ER+ BC from the Breast Cancer Association Consortium (BCAC) and for ER- BC from a meta-analysis of BCAC and the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). RESULTS In total, we identified 230 genes in 86 loci that were associated with ER+ BC and 66 genes in 29 loci that were associated with ER- BC at a Bonferroni threshold of significance. Of these genes, 2 genes associated with ER+ BC at the 1q21.1 locus were located at least 1 Mb from published GWAS hits. For several well-studied tumor suppressor genes such as TP53 and CHEK2 which have historically been thought to impact BC risk through rare, penetrant mutations, we discovered that common variants, which modulate gene expression, may additionally contribute to ER+ or ER- etiology. CONCLUSIONS Our study comprehensively examined how differences in common variation contribute to molecular differences between ER+ and ER- BC and introduces a novel, splicing-based framework that can be used in future TWAS studies.
Affiliation(s)
- Julian C McClellan
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, 60637, USA
- James L Li
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, 60637, USA
- Guimin Gao
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, 60637, USA
- Dezheng Huo
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, 60637, USA
  - Section of Hematology & Oncology, Department of Medicine, University of Chicago, Chicago, IL, 60637, USA
8
Wittich H, Ardlie K, Taylor KD, Durda P, Liu Y, Mikhaylova A, Gignoux CR, Cho MH, Rich SS, Rotter JI, Manichaikul A, Im HK, Wheeler HE. Transcriptome-wide association study of the plasma proteome reveals cis and trans regulatory mechanisms underlying complex traits. Am J Hum Genet 2024; 111:445-455. [PMID: 38320554 PMCID: PMC10940016 DOI: 10.1016/j.ajhg.2024.01.006] [Received: 09/25/2023] [Revised: 01/12/2024] [Accepted: 01/12/2024] [Indexed: 02/08/2024] Open
Abstract
Regulation of transcription and translation are mechanisms through which genetic variants affect complex traits. Expression quantitative trait locus (eQTL) studies have been more successful at identifying cis-eQTL (within 1 Mb of the transcription start site) than trans-eQTL. Here, we tested the cis component of gene expression for association with observed plasma protein levels to identify cis- and trans-acting genes that regulate protein levels. We used transcriptome prediction models from 49 Genotype-Tissue Expression (GTEx) Project tissues to predict the cis component of gene expression and tested the predicted expression of every gene in every tissue for association with the observed abundance of 3,622 plasma proteins measured in 3,301 individuals from the INTERVAL study. We tested significant results for replication in 971 individuals from the Trans-omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA). We found 1,168 and 1,210 cis- and trans-acting associations that replicated in TOPMed (FDR < 0.05) with a median expected true positive rate (π1) across tissues of 0.806 and 0.390, respectively. The target proteins of trans-acting genes were enriched for transcription factor binding sites and autoimmune diseases in the GWAS catalog. Furthermore, we found a higher correlation between predicted expression and protein levels of the same underlying gene (R = 0.17) than observed expression (R = 0.10, p = 7.50 × 10⁻¹¹). This indicates the cis-acting genetically regulated (heritable) component of gene expression is more consistent across tissues than total observed expression (genetics + environment) and is useful in uncovering the function of SNPs associated with complex traits.
Affiliation(s)
- Henry Wittich
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
- Kristin Ardlie
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kent D Taylor
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Peter Durda
  - Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT 05446, USA
- Yongmei Liu
  - Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA
- Anna Mikhaylova
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Chris R Gignoux
  - Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Denver Anschutz Medical Campus, Aurora, CO 80045, USA
- Michael H Cho
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- Stephen S Rich
  - Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Jerome I Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Ani Manichaikul
  - Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Hae Kyung Im
  - Section of Genetic Medicine, The University of Chicago, Chicago, IL 60637, USA
- Heather E Wheeler
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
  - Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
9
Wang T, Yan Z, Zhang Y, Lou Z, Zheng X, Mai D, Wang Y, Shang X, Xiao B, Peng J, Chen J. postGWAS: A web server for deciphering the causality post the genome-wide association studies. Comput Biol Med 2024; 171:108108. [PMID: 38359659 DOI: 10.1016/j.compbiomed.2024.108108] [Received: 12/01/2023] [Revised: 01/23/2024] [Accepted: 02/04/2024] [Indexed: 02/17/2024]
Abstract
While genome-wide association studies (GWAS) have unequivocally identified vast numbers of disease susceptibility variants, a majority of them are situated in non-coding regions and are in high linkage disequilibrium (LD). To pave the way for translating GWAS signals to clinical drug targets, it is essential to identify the underlying causal variants and, further, the causal genes. To this end, a myriad of post-GWAS methods have been devised, each grounded in distinct principles including fine-mapping, colocalization, and transcriptome-wide association study (TWAS) techniques. Yet, no platform currently exists that seamlessly integrates these diverse post-GWAS methodologies. In this work, we present a user-friendly web server for post-GWAS analysis that seamlessly integrates 9 distinct methods with 12 models, categorized by fine-mapping, colocalization, and TWAS. The server mainly helps users decipher the causality obscured by complex GWAS signals, including causal variants and causal genes, without the burden of computational skills and complex environment configuration. It provides a convenient platform for post-GWAS analysis and result visualization, facilitating the understanding and interpretation of genome-wide association studies. The postGWAS server is available at http://g2g.biographml.com/.
Affiliation(s)
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Zhihao Yan
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yiming Zhang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Zhuofei Lou
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Xiaozhu Zheng
  - Department of Anesthesiology, The People's Hospital of Yubei District, Chongqing, 401120, China
- DuoDuo Mai
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yongtian Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Bing Xiao
  - School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Jing Chen
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
10
Head ST, Dezem F, Todor A, Yang J, Plummer J, Gayther S, Kar S, Schildkraut J, Epstein MP. Cis- and trans-eQTL TWAS of breast and ovarian cancer identify more than 100 risk associated genes in the BCAC and OCAC consortia. bioRxiv: The Preprint Server for Biology 2023:2023.11.09.566218. [PMID: 38014246 PMCID: PMC10680675 DOI: 10.1101/2023.11.09.566218] [Indexed: 11/29/2023]
Abstract
Transcriptome-wide association studies (TWAS) have investigated the role of genetically regulated transcriptional activity in the etiologies of breast and ovarian cancer. However, methods performed to date have only considered regulatory effects of risk-associated SNPs thought to act in cis on a nearby target gene. With growing evidence for distal (trans) regulatory effects of variants on gene expression, we performed TWAS of breast and ovarian cancer using a Bayesian genome-wide TWAS method (BGW-TWAS) that considers effects of both cis- and trans-expression quantitative trait loci (eQTLs). We applied BGW-TWAS to whole-genome and RNA sequencing data in breast and ovarian tissues from the Genotype-Tissue Expression project to train expression imputation models. We applied these models to large-scale GWAS summary statistic data from the Breast Cancer and Ovarian Cancer Association Consortia to identify genes associated with risk of overall breast cancer, non-mucinous epithelial ovarian cancer, and 10 cancer subtypes. We identified 101 genes significantly associated with risk of breast cancer phenotypes and 8 with ovarian phenotypes. These loci include established risk genes and several novel candidate risk loci, such as ACAP3, whose associations are predominantly driven by trans-eQTLs. We replicated several associations using summary statistics from an independent GWAS of these cancer phenotypes. We further used genotype and expression data in normal and tumor breast tissue from the Cancer Genome Atlas to examine the performance of our trained expression imputation models. This work represents a first look into the role of trans-eQTLs in the complex molecular mechanisms underlying these diseases.
Affiliation(s)
- S. Taylor Head
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Felipe Dezem
  - Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Andrei Todor
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
- Jingjing Yang
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
- Jasmine Plummer
  - Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Simon Gayther
  - Department of Biomedical Sciences, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Siddhartha Kar
  - Early Cancer Institute, Department of Oncology, University of Cambridge, Cambridge CB2 0XZ, UK
- Joellen Schildkraut
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Michael P. Epstein
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
|
11
|
Ottensmann L, Tabassum R, Ruotsalainen SE, Gerl MJ, Klose C, Widén E, Simons K, Ripatti S, Pirinen M. Genome-wide association analysis of plasma lipidome identifies 495 genetic associations. Nat Commun 2023; 14:6934. [PMID: 37907536 PMCID: PMC10618167 DOI: 10.1038/s41467-023-42532-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 10/13/2023] [Indexed: 11/02/2023] Open
Abstract
The human plasma lipidome captures risk for cardiometabolic diseases. To discover new lipid-associated variants and understand the link between lipid species and cardiometabolic disorders, we perform univariate and multivariate genome-wide analyses of 179 lipid species in 7174 Finnish individuals. We fine-map the associated loci, prioritize genes, and examine their disease links in 377,277 FinnGen participants. We identify 495 genome-trait associations in 56 genetic loci including 8 novel loci, with a considerable boost provided by the multivariate analysis. For 26 loci, fine-mapping identifies variants with a high causal probability, including 14 coding variants indicating likely causal genes. A phenome-wide analysis across 953 disease endpoints reveals disease associations for 40 lipid loci. For 11 coronary artery disease risk variants, we detect strong associations with lipid species. Our study demonstrates the power of multivariate genetic analysis in correlated lipidomics data and reveals genetic links between diseases and lipid species beyond the standard lipids.
Affiliation(s)
- Linda Ottensmann
  - Institute for Molecular Medicine Finland, HiLIFE, University of Helsinki, Helsinki, Finland
- Rubina Tabassum
  - Institute for Molecular Medicine Finland, HiLIFE, University of Helsinki, Helsinki, Finland
- Sanni E Ruotsalainen
  - Institute for Molecular Medicine Finland, HiLIFE, University of Helsinki, Helsinki, Finland
- Elisabeth Widén
  - Institute for Molecular Medicine Finland, HiLIFE, University of Helsinki, Helsinki, Finland
- Samuli Ripatti
  - Institute for Molecular Medicine Finland, HiLIFE, University of Helsinki, Helsinki, Finland
  - Department of Public Health, Clinicum, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Broad Institute of the Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA
- Matti Pirinen
  - Institute for Molecular Medicine Finland, HiLIFE, University of Helsinki, Helsinki, Finland
  - Department of Public Health, Clinicum, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
12
|
Araujo DS, Nguyen C, Hu X, Mikhaylova AV, Gignoux C, Ardlie K, Taylor KD, Durda P, Liu Y, Papanicolaou G, Cho MH, Rich SS, Rotter JI, Im HK, Manichaikul A, Wheeler HE. Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations. HGG ADVANCES 2023; 4:100216. [PMID: 37869564 PMCID: PMC10589725 DOI: 10.1016/j.xhgg.2023.100216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 06/27/2023] [Indexed: 10/24/2023] Open
Abstract
Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed as well as or better than other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWASs and loci previously not found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.
Affiliation(s)
- Daniel S. Araujo
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
- Chris Nguyen
  - Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
- Xiaowei Hu
  - Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Anna V. Mikhaylova
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Chris Gignoux
  - Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO 80045, USA
- Kristin Ardlie
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kent D. Taylor
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Peter Durda
  - Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT 05446, USA
- Yongmei Liu
  - Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA
- George Papanicolaou
  - Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA
- Michael H. Cho
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Stephen S. Rich
  - Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Jerome I. Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- NHLBI TOPMed Consortium
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
  - Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
  - Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO 80045, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
  - Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT 05446, USA
  - Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA
  - Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA
- Hae Kyung Im
  - Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA
- Ani Manichaikul
  - Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Heather E. Wheeler
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
  - Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
|
13
|
Ghaffar A, Nyholt DR. Integrating eQTL and GWAS data characterises established and identifies novel migraine risk loci. Hum Genet 2023; 142:1113-1137. [PMID: 37245199 PMCID: PMC10449685 DOI: 10.1007/s00439-023-02568-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 05/02/2023] [Indexed: 05/29/2023]
Abstract
Migraine (a painful, throbbing headache disorder) is the most common complex brain disorder, yet its molecular mechanisms remain unclear. Genome-wide association studies (GWAS) have proven successful in identifying migraine risk loci; however, much work remains to identify the causal variants and genes. In this paper, we compared three transcriptome-wide association study (TWAS) imputation models (MASHR, elastic net, and SMultiXcan) to characterise established genome-wide significant (GWS) migraine GWAS risk loci, and to identify putative novel migraine risk gene loci. We compared the standard TWAS approach of analysing 49 GTEx tissues with Bonferroni correction for testing all genes present across all tissues (Bonferroni), to TWAS in five tissues estimated to be relevant to migraine, and TWAS with Bonferroni correction that took into account the correlation between eQTLs within each tissue (Bonferroni-matSpD). Elastic net models performed in all 49 GTEx tissues using Bonferroni-matSpD characterised the highest number of established migraine GWAS risk loci (n = 20) with GWS TWAS genes having colocalisation (PP4 > 0.5) with an eQTL. SMultiXcan in all 49 GTEx tissues identified the highest number of putative novel migraine risk genes (n = 28) with GWS differential expression at 20 non-GWS GWAS loci. Nine of these putative novel migraine risk genes were later found to be at and in linkage disequilibrium with true (GWS) migraine risk loci in a recent, more powerful migraine GWAS. Across all TWAS approaches, a total of 62 putative novel migraine risk genes were identified at 32 independent genomic loci. Of these 32 loci, 21 were true risk loci in the recent, more powerful migraine GWAS. Our results provide important guidance on the selection, use, and utility of imputation-based TWAS approaches to characterise established GWAS risk loci and identify novel risk gene loci.
Affiliation(s)
- Ammarah Ghaffar
  - Statistical and Genomic Epidemiology Laboratory, School of Biomedical Sciences, Faculty of Health, Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, QLD, 4059, Australia
- Dale R Nyholt
  - Statistical and Genomic Epidemiology Laboratory, School of Biomedical Sciences, Faculty of Health, Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, QLD, 4059, Australia
|
14
|
Vysotskiy M, Weiss LA. Combinations of genes at the 16p11.2 and 22q11.2 CNVs contribute to neurobehavioral traits. PLoS Genet 2023; 19:e1010780. [PMID: 37267418 DOI: 10.1371/journal.pgen.1010780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Accepted: 05/09/2023] [Indexed: 06/04/2023] Open
Abstract
The 16p11.2 and 22q11.2 copy number variants (CNVs) are associated with neurobehavioral traits including autism spectrum disorder (ASD), schizophrenia, bipolar disorder, obesity, and intellectual disability. Identifying specific genes contributing to each disorder and dissecting the architecture of CNV-trait association has been difficult, inspiring hypotheses of more complex models, such as multiple genes acting together. Using multi-tissue data from the GTEx consortium, we generated pairwise expression imputation models for CNV genes and then applied these elastic net models to GWAS for: ASD, bipolar disorder, schizophrenia, BMI (obesity), and IQ (intellectual disability). We compared the variance in these five traits explained by gene pairs with the variance explained by single genes and by traditional interaction models. We also modeled polygene region-wide effects using summed predicted expression ranks across many genes to create a regionwide score. We found that in all CNV-trait pairs except for bipolar disorder at 22q11.2, pairwise effects explain more variance than single genes. Pairwise model superiority was specific to the CNV region for all 16p11.2 traits and ASD at 22q11.2. We identified novel individual genes over-represented in top pairs that did not show single-gene signal. We also found that BMI and IQ have significant regionwide association with both CNV regions. Overall, we observe that genetic architecture differs by trait and region, but 9/10 CNV-trait combinations demonstrate evidence for multigene contribution, and for most of these, the importance of combinatorial models appears unique to CNV regions. Our results suggest that mechanistic insights for CNV pathology may require combinational models.
Affiliation(s)
- Mikhail Vysotskiy
  - Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
  - Department of Psychiatry and Behavioral Sciences, University of California San Francisco, San Francisco, California, United States of America
  - Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, United States of America
- Lauren A Weiss
  - Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
  - Department of Psychiatry and Behavioral Sciences, University of California San Francisco, San Francisco, California, United States of America
  - Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, United States of America
|
15
|
Gao G, Fiorica PN, McClellan J, Barbeira AN, Li JL, Olopade OI, Im HK, Huo D. A joint transcriptome-wide association study across multiple tissues identifies candidate breast cancer susceptibility genes. Am J Hum Genet 2023; 110:950-962. [PMID: 37164006 PMCID: PMC10257003 DOI: 10.1016/j.ajhg.2023.04.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/28/2023] [Accepted: 04/14/2023] [Indexed: 05/12/2023] Open
Abstract
Genome-wide association studies (GWASs) have identified more than 200 genomic loci for breast cancer risk, but specific causal genes in most of these loci have not been identified. In fact, transcriptome-wide association studies (TWASs) of breast cancer performed using gene expression prediction models trained in breast tissue have yet to clearly identify most target genes. To identify candidate genes, we performed a GWAS analysis in a breast cancer dataset from UK Biobank (UKB) and combined the results with the GWAS results of the Breast Cancer Association Consortium (BCAC) by a meta-analysis. Using the summary statistics from the meta-analysis, we performed a joint TWAS analysis that combined TWAS signals from multiple tissues. We used expression prediction models trained in 11 tissues that are potentially relevant to breast cancer from the Genotype-Tissue Expression (GTEx) data. In the GWAS analysis, we identified eight loci distinct from those reported previously. In the TWAS analysis, we identified 309 genes at 108 genomic loci to be significantly associated with breast cancer at the Bonferroni threshold. Of these, 17 genes were located in eight regions that were at least 1 Mb away from published GWAS hits. The remaining TWAS-significant genes were located in 100 known genomic loci from previous GWASs of breast cancer. We found that 21 genes located in known GWAS loci remained statistically significant after conditioning on previous GWAS index variants. Our study provides insights into breast cancer genetics through mapping candidate target genes in a large proportion of known GWAS loci and discovering multiple new loci.
Affiliation(s)
- Guimin Gao
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Peter N Fiorica
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Julian McClellan
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Alvaro N Barbeira
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- James L Li
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Olufunmilayo I Olopade
  - Section of Hematology & Oncology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Hae Kyung Im
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Dezheng Huo
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
|
16
|
Araujo DS, Nguyen C, Hu X, Mikhaylova AV, Gignoux C, Ardlie K, Taylor KD, Durda P, Liu Y, Papanicolaou G, Cho MH, Rich SS, Rotter JI, Im HK, Manichaikul A, Wheeler HE. Multivariate adaptive shrinkage improves cross-population transcriptome prediction for transcriptome-wide association studies in underrepresented populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.09.527747. [PMID: 36798214 PMCID: PMC9934635 DOI: 10.1101/2023.02.09.527747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWAS) using different methods (Elastic Net, Joint-Tissue Imputation (JTI), Matrix eQTL, Multivariate Adaptive Shrinkage in R (MASHR), and Transcriptome-Integrated Genetic Association Resource (TIGAR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWAS, we integrated publicly available multi-ethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology Study (PAGE) and Pan-UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed as well as or better than other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multi-ethnic TWAS, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWAS and new loci previously not found in GWAS. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWAS for multi-ethnic or underrepresented populations.
Affiliation(s)
- Daniel S. Araujo
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL, 60660, USA
- Chris Nguyen
  - Department of Biology, Loyola University Chicago, Chicago, IL, 60660, USA
- Xiaowei Hu
  - Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
- Anna V. Mikhaylova
  - Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Chris Gignoux
  - Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO, 80045, USA
- Kristin Ardlie
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Kent D. Taylor
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Peter Durda
  - Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT, 05446, USA
- Yongmei Liu
  - Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- George Papanicolaou
  - Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD, 20892, USA
- Michael H. Cho
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, 02115, USA
- Stephen S. Rich
  - Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
- Jerome I. Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Hae Kyung Im
  - Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Ani Manichaikul
  - Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
- Heather E. Wheeler
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL, 60660, USA
  - Department of Biology, Loyola University Chicago, Chicago, IL, 60660, USA
|
17
|
Durge AR, Shrimankar DD, Sawarkar AD. Heuristic Analysis of Genomic Sequence Processing Models for High Efficiency Prediction: A Statistical Perspective. Curr Genomics 2022; 23:299-317. [PMID: 36778194 PMCID: PMC9878859 DOI: 10.2174/1389202923666220927105311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Revised: 08/29/2022] [Accepted: 09/01/2022] [Indexed: 11/22/2022] Open
Abstract
Genome sequences indicate a wide variety of characteristics, including species and sub-species type, genotype, diseases, growth indicators, and yield quality. To analyze and study the characteristics of genome sequences across different species, researchers have proposed various deep learning models, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Multilayer Perceptrons (MLPs), which vary in evaluation performance, area of application, and the species processed. Because the algorithmic implementations differ so widely, it is difficult for research programmers to select the best genome processing model for their application. To facilitate this selection, the paper reviews a wide variety of such models and compares their performance in terms of accuracy, area of application, computational complexity, processing delay, precision, and recall. The review thus presents various deep learning and machine learning models that achieve different accuracies for different applications. For multiple genomic data, Repeated Incremental Pruning to Produce Error Reduction with Support Vector Machine (Ripper SVM) achieves 99.7% accuracy, and for cancer genomic data, the CNN Bayesian method achieves 99.27% accuracy. For Covid genome analysis, Bidirectional Long Short-Term Memory with CNN (BiLSTM CNN) exhibits the highest accuracy, 99.95%. The precision and recall of the different models are reviewed in a similar way. Finally, the paper concludes with some observations on the genomic processing models and recommends applications for their efficient use.
Affiliation(s)
- Aditi R. Durge
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
- Deepti D. Shrimankar
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
  - Address correspondence to this author at the Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India; Tel: 9860606477; E-mail:
- Ankush D. Sawarkar
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
|
18
|
Díez-Villanueva A, Sanz-Pamplona R, Solé X, Cordero D, Crous-Bou M, Guinó E, Lopez-Doriga A, Berenguer A, Aussó S, Paré-Brunet L, Obón-Santacana M, Moratalla-Navarro F, Salazar R, Sanjuan X, Santos C, Biondo S, Diez-Obrero V, Garcia-Serrano A, Alonso MH, Carreras-Torres R, Closa A, Moreno V. COLONOMICS - integrative omics data of one hundred paired normal-tumoral samples from colon cancer patients. Sci Data 2022; 9:595. [PMID: 36182938 PMCID: PMC9526730 DOI: 10.1038/s41597-022-01697-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 08/16/2022] [Indexed: 11/29/2022] Open
Abstract
Colonomics is a multi-omics dataset that includes 250 samples: 50 samples from healthy colon mucosa donors and 100 paired samples from colon cancer patients (tumor/adjacent). For these samples, the Colonomics project includes data from genotyping, DNA methylation, gene expression, whole exome sequencing and micro-RNA (miRNA) expression. It also includes copy number variation (CNV) data from tumoral samples. In addition, clinical data for all these samples are available. The aims of the project were to explore and integrate these datasets to describe colon cancer at the molecular level, to compare normal and tumoral tissues, and to improve screening by finding biomarkers for the diagnosis and prognosis of colon cancer. The project has its own website, including four browsers that allow users to explore the Colonomics datasets. Since the generated data could be reused by the scientific community for exploratory or validation purposes, here we describe the omics datasets included in the Colonomics project as well as results from the integration of multi-omics layers.
Affiliation(s)
- Anna Díez-Villanueva
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Rebeca Sanz-Pamplona
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Xavier Solé
- Molecular Biology CORE, Center for Biomedical Diagnostics, Hospital Clínic de Barcelona, 08036, Barcelona, Spain
- Translational Genomic and Targeted Therapeutics in Solid Tumors, August Pi i Sunyer Biomedical Research Institute (IDIBAPS), 08036, Barcelona, Spain
| | - David Cordero
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Marta Crous-Bou
- Unit of Nutrition and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology (ICO) - Bellvitge Biomedical Research Institute (IDIBELL). L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Elisabet Guinó
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Adriana Lopez-Doriga
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Antoni Berenguer
- Rheumatology Department - Parc Taulí Research and Innovation Institute (I3PT), Barcelona, Spain
| | - Susanna Aussó
- TIC Salut Social Foundation. Ministry of Health of Generalitat de Catalunya, Barcelona, Spain
| | | | - Mireia Obón-Santacana
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Ferran Moratalla-Navarro
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
| | - Ramon Salazar
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
- Medical Oncology Department. Catalan Institute of Oncology (ICO), Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Oncology (CIBERONC), Madrid, Spain
| | - Xavier Sanjuan
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
- Pathology Service, Bellvitge University Hospital (HUB), Hospitalet de Llobregat, Barcelona, Spain
| | - Cristina Santos
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
- Medical Oncology Department. Catalan Institute of Oncology (ICO), Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Oncology (CIBERONC), Madrid, Spain
| | - Sebastiano Biondo
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
- Digestive Surgery Service, Bellvitge University Hospital (HUB). Hospitalet de Llobregat, Barcelona, Spain
| | - Virginia Diez-Obrero
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
| | - Ainhoa Garcia-Serrano
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Maria Henar Alonso
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain
| | - Robert Carreras-Torres
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Adria Closa
- The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
| | - Víctor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO). Hospitalet de Llobregat, Barcelona, Spain.
- Colorectal Cancer Group, ONCOBELL, Bellvitge Biomedical Research Institute (IDIBELL). Hospitalet de Llobregat, Barcelona, Spain.
- Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP), Madrid, Spain.
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona, Barcelona, Spain.
|
19
|
Schubert R, Geoffroy E, Gregga I, Mulford AJ, Aguet F, Ardlie K, Gerszten R, Clish C, Van Den Berg D, Taylor KD, Durda P, Johnson WC, Cornell E, Guo X, Liu Y, Tracy R, Conomos M, Blackwell T, Papanicolaou G, Lappalainen T, Mikhaylova AV, Thornton TA, Cho MH, Gignoux CR, Lange L, Lange E, Rich SS, Rotter JI, Manichaikul A, Im HK, Wheeler HE. Protein prediction for trait mapping in diverse populations. PLoS One 2022; 17:e0264341. [PMID: 35202437 PMCID: PMC8870552 DOI: 10.1371/journal.pone.0264341] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/17/2021] [Accepted: 02/08/2022] [Indexed: 11/18/2022] Open
Abstract
Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology allows similar interrogation of the genetically regulated proteome for understanding complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for protein abundances using data collected in TOPMed MESA, for which we have measured 1,305 proteins by a SOMAscan assay. We compared predictive models built via elastic net regression to models integrating posterior inclusion probabilities estimated by fine-mapping SNPs prior to elastic net. In order to investigate the transferability of predictive models across ancestries, we built protein prediction models in all four of the TOPMed MESA populations, African American (n = 183), Chinese (n = 71), European (n = 416), and Hispanic/Latino (n = 301), as well as in all populations combined. As expected, fine-mapping produced more significant protein prediction models, especially in African ancestries populations, potentially increasing opportunity for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestries prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ∼50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. The most protein-trait associations were discovered, colocalized, and replicated in large independent GWAS using proteome prediction model training populations with similar ancestries to PAGE. 
At current training population sample sizes, performance between baseline and fine-mapped protein prediction models in PWAS was similar, highlighting the utility of elastic net. Our predictive models in diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837327.
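The baseline models in this abstract are elastic net regressions of protein abundance on SNP genotypes. The sketch below illustrates that idea only; it is not the authors' pipeline. It assumes a glmnet-style objective, simulated genotype dosages, and a simulated protein level, and every name in it (`elastic_net`, `dosages`, `protein`) is hypothetical.

```python
import random

def soft_threshold(rho, thresh):
    """Shrink rho toward zero by thresh (the lasso part of the penalty)."""
    if rho > thresh:
        return rho - thresh
    if rho < -thresh:
        return rho + thresh
    return 0.0

def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for the assumed objective
    (1/2n)*||y - Xw||^2 + lam*(alpha*||w||_1 + (1-alpha)/2*||w||_2^2).
    X is a list of rows; its columns and y are assumed to be centered."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # monomorphic SNP carries no information
            # Covariance of SNP j with the partial residual (SNP j's own fit added back)
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                w[j] = new_w
    return w

# Simulated data (assumption): 200 people, 6 SNPs, only SNPs 0 and 3 affect the protein.
random.seed(0)
n_samples, n_snps = 200, 6
dosages = [[random.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(n_samples)]
protein = [0.8 * row[0] - 0.5 * row[3] + random.gauss(0, 0.3) for row in dosages]

centered_cols = [center([row[j] for row in dosages]) for j in range(n_snps)]
X = [[centered_cols[j][i] for j in range(n_snps)] for i in range(n_samples)]
y = center(protein)

weights = elastic_net(X, y)
print([round(w, 3) for w in weights])
```

The fitted weights play the role of a prediction model: multiplying them by a new person's dosages gives a genetically predicted protein level. The fine-mapped variant compared in the abstract would additionally use posterior inclusion probabilities, which this sketch does not attempt.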
Affiliation(s)
- Ryan Schubert
- Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, United States of America
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
| | - Elyse Geoffroy
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
| | - Isabelle Gregga
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America
| | - Ashley J. Mulford
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
| | - Francois Aguet
- Broad Institute, Cambridge, MA, United States of America
| | - Kristin Ardlie
- Broad Institute, Cambridge, MA, United States of America
| | - Robert Gerszten
- Beth Israel Deaconess Medical Center, Boston, MA, United States of America
| | - Clary Clish
- Broad Institute, Cambridge, MA, United States of America
| | - David Van Den Berg
- University of Southern California, Los Angeles, CA, United States of America
| | - Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, United States of America
| | - Peter Durda
- Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, VT, United States of America
| | - W. Craig Johnson
- Collaborative Health Studies Coordinating Center, University of Washington, Seattle, WA, United States of America
| | - Elaine Cornell
- Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, VT, United States of America
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, United States of America
| | - Yongmei Liu
- Department of Medicine, Duke University School of Medicine, Durham, NC, United States of America
| | - Russell Tracy
- Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, VT, United States of America
| | - Matthew Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, United States of America
| | - Tom Blackwell
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
| | - George Papanicolaou
- Epidemiology Branch, National Heart, Lung and Blood Institute, Bethesda, MD, United States of America
| | - Tuuli Lappalainen
- New York Genome Center and Department of Systems Biology, Columbia University, New York, NY, United States of America
| | - Anna V. Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, WA, United States of America
| | - Timothy A. Thornton
- Department of Biostatistics, University of Washington, Seattle, WA, United States of America
| | - Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
| | - Christopher R. Gignoux
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
| | - Leslie Lange
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
| | - Ethan Lange
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
| | - Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States of America
| | - Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, United States of America
| | | | - Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States of America
| | - Hae Kyung Im
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, United States of America
| | - Heather E. Wheeler
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
- * E-mail:
|
20
|
Liang Y, Pividori M, Manichaikul A, Palmer AA, Cox NJ, Wheeler HE, Im HK. Polygenic transcriptome risk scores (PTRS) can improve portability of polygenic risk scores across ancestries. Genome Biol 2022; 23:23. [PMID: 35027082 PMCID: PMC8759285 DOI: 10.1186/s13059-021-02591-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 03/30/2021] [Accepted: 12/27/2021] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND: Polygenic risk scores (PRS) are valuable for translating the results of genome-wide association studies (GWAS) into clinical practice. To date, most GWAS have been based on individuals of European ancestry, leading to poor performance in populations of non-European ancestry. RESULTS: We introduce the polygenic transcriptome risk score (PTRS), which is based on predicted transcript levels (rather than SNPs), and explore the portability of PTRS across populations using UK Biobank data. CONCLUSIONS: We show that PTRS has significantly higher portability (Wilcoxon p = 0.013) in the African-descent samples, where the loss of performance is most acute, and that PTRS used in combination with PRS performs better than PRS alone.
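As a rough illustration of a score built on predicted transcript levels rather than SNPs, the sketch below first predicts each gene's expression from genotype dosages and then sums the predicted levels weighted by per-gene effects. The linear form, the SNP and gene names, and all numeric weights are assumptions made for illustration and are not taken from the paper.

```python
def predict_expression(dosages, snp_weights):
    """Genetically predicted expression: weighted sum of allele dosages."""
    return sum(weight * dosages.get(snp, 0.0) for snp, weight in snp_weights.items())

def transcriptome_risk_score(dosages, expression_models, gene_effects):
    """Sum of predicted expression levels, each scaled by a per-gene effect."""
    return sum(
        effect * predict_expression(dosages, expression_models[gene])
        for gene, effect in gene_effects.items()
    )

# Hypothetical inputs for one individual.
dosages = {"rsA": 2, "rsB": 1, "rsC": 0}
expression_models = {
    "GENE1": {"rsA": 0.5, "rsB": -0.2},
    "GENE2": {"rsB": 0.3, "rsC": 0.4},
}
gene_effects = {"GENE1": 1.5, "GENE2": -2.0}

score = transcriptome_risk_score(dosages, expression_models, gene_effects)
print(round(score, 3))
```

Because the per-gene effects act on expression rather than on individual SNPs, the same gene-level weights can in principle be paired with expression models trained in a different population, which is the portability question the abstract examines.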
Affiliation(s)
- Yanyu Liang
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA.
| | - Milton Pividori
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104, PA, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Abraham A Palmer
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Institute for Genomic Medicine, University of California San Diego, San Diego, CA, USA
| | - Nancy J Cox
- Vanderbilt Genetic Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Heather E Wheeler
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
- Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, IL, USA
| | - Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA.
|
21
|
Mahoney E, Janve V, Hohman TJ, Dumitrescu L. Evaluation of Sex-Aware PrediXcan Models for Predicting Gene Expression. Pac Symp Biocomput 2022; 27:361-372. [PMID: 34890163 PMCID: PMC8924937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Gene-based methods such as PrediXcan use expression quantitative trait loci to build tissue-specific gene expression models when only genetic data is available. There are known sex differences in tissue-specific gene expression and in the genetic architecture of gene expression, but such differences have not been incorporated into predicted gene expression models to date. We built sex-aware PrediXcan models using whole blood transcriptomic data from the Genotype-Tissue Expression (GTEx) project (195 females and 371 males) and evaluated their performance in an independent dataset. Specifically, PrediXcan models were built following the method described in Gamazon et al. 2015, but we included both whole-sample and sex-specific models. Validation was evaluated leveraging lymphoblast RNA sequencing data from the EUR cohort of the 1000 Genomes Project (178 females and 171 males). Correlations (R2) between observed and predicted expression were evaluated in 5,283 autosomal genes to determine performance of models. In sum, we successfully predicted 1,149 genes in males and 623 in females, while 3,511 genes appeared to be not sex-specific. Of the sex-specific genes, 15% (189 genes in males and 73 genes in females) exhibited higher R2 in sex-specific models compared to whole-sample models, although the overall gain in predictive power was generally minimal and well within measurement error. Nevertheless, two female-specific genes and six male-specific genes showed significantly better prediction when using the sex-specific weights versus the whole-sample weights; furthermore, several of these genes play a role in mitochondrial metabolism, which is known to be influenced by sex hormones. Taken together, these results support previous reports of the small contribution of genetic architecture to sex-specific expression. Still, sex-aware PrediXcan models were able to provide robust sex-specific prediction signals. 
Future studies exploring the contribution of the X chromosome and tissue specificity on sex-specific genetically regulated expression will clarify the utility of this method.
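The comparison described above rests on the squared correlation (R2) between observed and predicted expression, computed once with whole-sample weights and once with sex-specific weights. A minimal sketch of that comparison for a single gene follows; the expression values are invented for illustration and the function name `r_squared` is hypothetical.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted expression."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        return 0.0  # a constant vector has no defined correlation; treat as no fit
    return cov * cov / (var_obs * var_pred)

# Invented example: observed expression of one gene in five females,
# with predictions from whole-sample and female-specific weights.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_whole_sample = [1.8, 1.9, 3.5, 3.1, 4.2]
pred_sex_specific = [1.1, 2.3, 2.8, 4.1, 4.9]

r2_whole = r_squared(observed, pred_whole_sample)
r2_sex = r_squared(observed, pred_sex_specific)
print(round(r2_whole, 3), round(r2_sex, 3), round(r2_sex - r2_whole, 3))
```

A gene would count toward the "higher R2 in sex-specific models" group when the last printed difference is positive; whether such a gain is meaningful still needs a significance test, as the abstract notes.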
Affiliation(s)
- Emily Mahoney
- Vanderbilt Memory and Alzheimer’s Center, Vanderbilt University Medical Center, Nashville, TN 37212, USA
| | - Vaibhav Janve
- Vanderbilt Memory and Alzheimer’s Center, Vanderbilt University Medical Center, Nashville, TN 37212, USA
| | - Timothy J. Hohman
- Vanderbilt Memory and Alzheimer’s Center, Vanderbilt University Medical Center, Nashville, TN 37212, USA
- Vanderbilt Genetics Institute, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37212, USA
| | - Logan Dumitrescu
- Vanderbilt Memory and Alzheimer’s Center, Vanderbilt University Medical Center, Nashville, TN 37212, USA
- Vanderbilt Genetics Institute, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37212, USA
|
22
|
Díez-Obrero V, Moratalla-Navarro F, Ibáñez-Sanz G, Guardiola J, Rodríguez-Moranta F, Obón-Santacana M, Díez-Villanueva A, Dampier CH, Devall M, Carreras-Torres R, Casey G, Moreno V. Transcriptome-Wide Association Study for Inflammatory Bowel Disease Reveals Novel Candidate Susceptibility Genes in Specific Colon Subsites and Tissue Categories. J Crohns Colitis 2021; 16:275-285. [PMID: 34286847 PMCID: PMC8864630 DOI: 10.1093/ecco-jcc/jjab131] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 01/03/2023]
Abstract
BACKGROUND AND AIMS Genome-wide association studies [GWAS] for inflammatory bowel disease [IBD] have identified 240 risk variants. However, the benefit of understanding the genetic architecture of IBD remains to be exploited. Transcriptome-wide association studies [TWAS] associate gene expression with genetic susceptibility to disease, providing functional insight into risk loci. In this study, we integrate relevant datasets for IBD and perform a TWAS to nominate novel genes implicated in IBD genetic susceptibility. METHODS We applied elastic net regression to generate gene expression prediction models for the University of Barcelona and University of Virginia RNA sequencing project [BarcUVa-Seq] and correlated expression and disease association research [CEDAR] datasets. Together with Genotype-Tissue Expression project [GTEx] data, and GWAS results from about 60 000 individuals, we employed Summary-PrediXcan and Summary-MultiXcan for single and joint analyses of TWAS results, respectively. RESULTS BarcUVa-Seq TWAS revealed 39 novel genes whose expression in the colon is associated with IBD genetic susceptibility. They included expression markers for specific colon cell types. TWAS meta-analysis including all tissues/cell types provided 186 novel candidate susceptibility genes. Additionally, we identified 78 novel susceptibility genes whose expression is associated with IBD exclusively in immune (N = 19), epithelial (N = 25), mesenchymal (N = 22) and neural (N = 12) tissue categories. Associated genes were involved in relevant molecular pathways, including pathways related to known IBD therapeutics, such as tumour necrosis factor signalling. CONCLUSION These findings provide insight into tissue-specific molecular processes underlying IBD genetic susceptibility. Associated genes could be candidate targets for new therapeutics and should be prioritized in functional studies.
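Summary-PrediXcan, used above for the single-tissue analyses, works from GWAS summary statistics instead of individual-level data. The sketch below follows the commonly cited approximation in which a gene-level z-score is a weighted combination of SNP z-scores, scaled by SNP standard deviations and by the standard deviation of predicted expression. It is a simplified illustration, not the tool's implementation, and the weights, covariance matrix, and z-scores are made up.

```python
import math

def gene_z_from_summary(weights, snp_cov, gwas_z):
    """Approximate gene-level z-score:
    Z_g = sum_l w_l * (sigma_l / sigma_g) * z_l,
    where sigma_l^2 is the genotype variance of SNP l and
    sigma_g^2 = w' * Cov * w is the variance of predicted expression."""
    k = len(weights)
    var_g = sum(
        weights[i] * snp_cov[i][j] * weights[j] for i in range(k) for j in range(k)
    )
    if var_g <= 0:
        raise ValueError("predicted expression has no variance")
    sigma_g = math.sqrt(var_g)
    return sum(
        weights[l] * math.sqrt(snp_cov[l][l]) * gwas_z[l] for l in range(k)
    ) / sigma_g

# Made-up inputs for a gene predicted by two SNPs.
weights = [0.5, -0.3]                  # expression prediction weights
snp_cov = [[0.5, 0.1], [0.1, 0.4]]     # genotype covariance from a reference panel
gwas_z = [3.0, -2.0]                   # GWAS z-scores (beta / standard error)

z_gene = gene_z_from_summary(weights, snp_cov, gwas_z)
print(round(z_gene, 3))
```

In this toy case both SNPs push the gene-level statistic in the same direction, because the negative weight multiplies a negative GWAS z-score.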
Affiliation(s)
- Virginia Díez-Obrero
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
- ONCOBELL Program, Bellvitge Biomedical Research Institute (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
| | - Ferran Moratalla-Navarro
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
| | - Gemma Ibáñez-Sanz
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
- ONCOBELL Program, Bellvitge Biomedical Research Institute (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Gastroenterology Department, Bellvitge University Hospital, L’Hospitalet de Llobregat, Spain
| | - Jordi Guardiola
- Gastroenterology Department, Bellvitge University Hospital, L’Hospitalet de Llobregat, Spain
| | | | - Mireia Obón-Santacana
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
- ONCOBELL Program, Bellvitge Biomedical Research Institute (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Anna Díez-Villanueva
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
- ONCOBELL Program, Bellvitge Biomedical Research Institute (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Christopher Heaton Dampier
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Matthew Devall
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Robert Carreras-Torres
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
- ONCOBELL Program, Bellvitge Biomedical Research Institute (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
| | - Graham Casey
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Victor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
- ONCOBELL Program, Bellvitge Biomedical Research Institute (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
- Corresponding author: Dr Victor Moreno, Catalan Institute of Oncology, Oncology Data Analytics Program, Hospital Duran i Reynals, Gran Via de l’Hospitalet, 199–203, 08908 L’Hospitalet de Llobregat (Barcelona) Spain. Tel: +34 932 607 434;
|
23
|
Feng H, Mancuso N, Pasaniuc B, Kraft P. Multitrait transcriptome-wide association study (TWAS) tests. Genet Epidemiol 2021; 45:563-576. [PMID: 34082479 DOI: 10.1002/gepi.22391] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/28/2020] [Revised: 03/26/2021] [Accepted: 04/05/2021] [Indexed: 12/19/2022]
Abstract
Multitrait tests can improve power to detect associations between individual single-nucleotide polymorphisms (SNPs) and several related traits. Here, we develop methods for multi-SNP transcriptome-wide association (TWAS) tests to test the association between predicted gene expression levels and multiple phenotypes. We show that the correlation in TWAS test statistics for multiple phenotypes has the same form as multitrait statistics for the single-SNP setting. Thus, established methods for combining single-SNP test statistics across multiple traits can be extended directly to the TWAS setting. We performed an extensive evaluation across eight multitrait methods in simulations that varied gene-phenotype effect sizes in addition to the underlying covariance structure among the phenotypes. We found that all multitrait TWAS tests have well-calibrated Type I error (except ASSET, which can have a slightly elevated or depressed Type I error rate). Our results show that multitrait TWAS can improve statistical power compared with multiple single-trait TWAS followed by Bonferroni correction. To illustrate our approach to real data, we conducted a multitrait TWAS of four circulating lipid traits from the Global Lipids Genetics Consortium. We found that our multitrait Wald TWAS approach identified 506 genes associated with lipid levels compared with 87 identified through Bonferroni-corrected single-trait TWAS. Overall, we find that our proposed multitrait TWAS framework outperforms single-trait approaches to identify new genetic associations, especially for functionally correlated phenotypes and phenotypes with overlapping genome-wide association studies samples, leading to insights into the genetic architecture of multiple phenotypes.
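To make the contrast between a joint Wald test and Bonferroni-corrected single-trait tests concrete, the sketch below treats the simplest case of two traits. It assumes the correlation `rho` between the two TWAS z-scores under the null is known, and it uses the closed-form chi-square tail for 2 degrees of freedom. The z-scores and `rho` are invented, and this is an illustration rather than the authors' software.

```python
import math

def two_sided_p(z):
    """Two-sided normal p-value for a single z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def wald_two_traits(z1, z2, rho):
    """Wald statistic z' R^{-1} z for two traits with null correlation rho.
    Under the null it follows a chi-square with 2 degrees of freedom,
    whose upper tail probability is exp(-x / 2)."""
    stat = (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)
    return stat, math.exp(-stat / 2)

def bonferroni_min_p(z_scores):
    """Smallest single-trait p-value, multiplied by the number of traits."""
    return min(1.0, len(z_scores) * min(two_sided_p(z) for z in z_scores))

# Invented TWAS z-scores for one gene and two lipid traits.
z1, z2, rho = 2.6, 2.5, 0.1
wald_stat, wald_p = wald_two_traits(z1, z2, rho)
bonf_p = bonferroni_min_p([z1, z2])
print(round(wald_stat, 3), wald_p, bonf_p)
```

Here two moderate signals combine into stronger joint evidence than either trait provides after correction. The advantage is not guaranteed: when only one trait carries signal, the single-trait test can be the more powerful of the two.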
Affiliation(s)
- Helian Feng
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Nicholas Mancuso
- Department of Preventive Medicine, Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
| | - Bogdan Pasaniuc
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, USA
- Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, California, USA
| | - Peter Kraft
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
|
24
|
Okoro PC, Schubert R, Guo X, Johnson WC, Rotter JI, Hoeschele I, Liu Y, Im HK, Luke A, Dugas LR, Wheeler HE. Transcriptome prediction performance across machine learning models and diverse ancestries. HGG ADVANCES 2021; 2:100019. [PMID: 33937878 PMCID: PMC8087249 DOI: 10.1016/j.xhgg.2020.100019] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/10/2020] [Accepted: 12/29/2020] [Indexed: 11/18/2022] Open
Abstract
Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions, like the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA) comprising individuals of African, Hispanic, and European ancestries and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS) comprising individuals of African ancestries. We show that the prediction performance is highest when the training and the testing population share similar ancestries regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show that including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
Affiliation(s)
- Paul C. Okoro
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
| | - Ryan Schubert
- Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, USA
| | - Xiuqing Guo
- Institute for Translational Genomics and Population Sciences, The Lundquist Institute and Department of Pediatrics at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - W. Craig Johnson
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Jerome I. Rotter
- Institute for Translational Genomics and Population Sciences, The Lundquist Institute and Department of Pediatrics at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Ina Hoeschele
- Fralin Life Sciences Institute, Virginia Tech, Blacksburg, VA, USA
- Department of Statistics, Virginia Tech, Blacksburg, VA, USA
- Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Yongmei Liu
- Department of Medicine, Duke University School of Medicine, Durham, NC, USA
| | - Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Amy Luke
- Department of Public Health Sciences, Parkinson School of Health Sciences and Public Health, Loyola University Chicago, Maywood, IL, USA
| | - Lara R. Dugas
- Department of Public Health Sciences, Parkinson School of Health Sciences and Public Health, Loyola University Chicago, Maywood, IL, USA
- Department of Human Biology, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Heather E. Wheeler
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
|
25
|
Lu H, Zhang J, Jiang Z, Zhang M, Wang T, Zhao H, Zeng P. Detection of Genetic Overlap Between Rheumatoid Arthritis and Systemic Lupus Erythematosus Using GWAS Summary Statistics. Front Genet 2021; 12:656545. [PMID: 33815486 PMCID: PMC8012913 DOI: 10.3389/fgene.2021.656545] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/22/2021] [Accepted: 03/01/2021] [Indexed: 01/04/2023] Open
Abstract
Background:
Clinical and epidemiological studies have suggested that systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) are comorbidities and that common genetic etiologies can partly explain such coexistence. However, the shared genetic determinants underlying the two diseases remain largely unknown.
Methods:
Our analysis relied on summary statistics available from genome-wide association studies of SLE (N = 23,210) and RA (N = 58,284). We first evaluated the genetic correlation between RA and SLE through linkage disequilibrium score regression (LDSC). Then, we performed a multiple-tissue eQTL (expression quantitative trait loci) weighted integrative analysis for each of the two diseases and aggregated association evidence across these tissues via the recently proposed harmonic mean P-value (HMP) combination strategy, which can produce a single well-calibrated P-value for correlated test statistics. Afterwards, we conducted a pleiotropy-informed association analysis using the conjunction conditional FDR (ccFDR) to identify potential pleiotropic genes associated with both RA and SLE.
Results:
We found a significant positive genetic correlation (rg = 0.404, P = 6.01E-10) between RA and SLE via LDSC. Based on the multiple-tissue eQTL weighted integrative analysis and the HMP combination across various tissues, we discovered 14 potential pleiotropic genes by ccFDR, among which four were likely novel genes (i.e., INPP5B, OR5K2, RP11-2C24.5, and CTD-3105H18.4). The SNP effect sizes of these pleiotropic genes were typically positively dependent, with an average correlation of 0.579. Functionally, these genes were implicated in multiple autoimmune-relevant pathways such as the inositol phosphate metabolic process, membrane, and the glucagon signaling pathway.
Conclusion:
This study reveals common genetic components between RA and SLE and provides candidate associated loci for understanding the molecular mechanisms underlying the comorbidity of the two diseases.
Affiliation(s)
- Haojie Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Jinhui Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Zhou Jiang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Meng Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Huashuo Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
|
26
|
Barbeira AN, Bonazzola R, Gamazon ER, Liang Y, Park Y, Kim-Hellmuth S, Wang G, Jiang Z, Zhou D, Hormozdiari F, Liu B, Rao A, Hamel AR, Pividori MD, Aguet F, Bastarache L, Jordan DM, Verbanck M, Do R, Stephens M, Ardlie K, McCarthy M, Montgomery SB, Segrè AV, Brown CD, Lappalainen T, Wen X, Im HK. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol 2021; 22:49. [PMID: 33499903 PMCID: PMC7836161 DOI: 10.1186/s13059-020-02252-4] [Citation(s) in RCA: 138] [Impact Index Per Article: 46.0] [Received: 06/06/2020] [Accepted: 12/18/2020] [Indexed: 12/12/2022] Open
Abstract
The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.
Affiliation(s)
- Alvaro N Barbeira
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
| | - Rodrigo Bonazzola
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
| | - Eric R Gamazon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Data Science Institute, Vanderbilt University, Nashville, TN, USA
- Clare Hall, University of Cambridge, Cambridge, UK
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
| | - Yanyu Liang
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
| | - YoSon Park
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
| | - Sarah Kim-Hellmuth
- Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - Gao Wang
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Zhuoxun Jiang
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
| | - Dan Zhou
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Farhad Hormozdiari
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Boxiang Liu
- Department of Biology, Stanford University, Stanford, CA 94305, USA
| | - Abhiram Rao
- Department of Biology, Stanford University, Stanford, CA 94305, USA
| | - Andrew R Hamel
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ocular Genomics Institute, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
| | - Milton D Pividori
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
| | - François Aguet
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Lisa Bastarache
- Department of Biomedical Informatics, Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Daniel M Jordan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Marie Verbanck
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Université de Paris - EA 7537 BIOSTM, Paris, France
| | - Ron Do
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Kristin Ardlie
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Ayellet V Segrè
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ocular Genomics Institute, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
| | - Christopher D Brown
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
| | - Tuuli Lappalainen
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - Xiaoquan Wen
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
|
27
|
Barbeira AN, Melia OJ, Liang Y, Bonazzola R, Wang G, Wheeler HE, Aguet F, Ardlie KG, Wen X, Im HK. Fine-mapping and QTL tissue-sharing information improves the reliability of causal gene identification. Genet Epidemiol 2020; 44:854-867. [PMID: 32964524 PMCID: PMC7693040 DOI: 10.1002/gepi.22346] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 04/28/2020] [Revised: 06/26/2020] [Accepted: 06/26/2020] [Indexed: 01/01/2023]
Abstract
The integration of transcriptomic studies and genome-wide association studies (GWAS) via imputed expression has seen extensive application in recent years, enabling the functional characterization and causal gene prioritization of GWAS loci. However, the techniques for imputing transcriptomic traits from DNA variation remain underdeveloped. Furthermore, associations found when linking eQTL studies to complex traits through methods like PrediXcan can lead to false positives due to linkage disequilibrium between distinct causal variants. Therefore, the best prediction performance models may not necessarily lead to more reliable causal gene discovery. With the goal of improving discoveries without increasing false positives, we develop and compare multiple transcriptomic imputation approaches using the most recent GTEx release of expression and splicing data on 17,382 RNA-sequencing samples from 948 post-mortem donors in 54 tissues. We find that informing prediction models with posterior causal probability from fine-mapping (dap-g) and borrowing information across tissues (mashr) can lead to better performance in terms of number and proportion of significant associations that are colocalized and the proportion of silver standard genes identified as indicated by precision-recall and receiver operating characteristic curves. All prediction models are made publicly available at predictdb.org.
Affiliation(s)
- Alvaro N. Barbeira
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, Illinois
| | - Owen J. Melia
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, Illinois
| | - Yanyu Liang
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, Illinois
| | - Rodrigo Bonazzola
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, Illinois
| | - Gao Wang
- Department of Human Genetics, The University of Chicago, Chicago, Illinois
| | - Heather E. Wheeler
- Department of Biology, Loyola University Chicago, Chicago, Illinois
- Department of Computer Science, Loyola University Chicago, Chicago, Illinois
- Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois
| | - François Aguet
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts
| | | | - Xiaoquan Wen
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
| | - Hae K. Im
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, Illinois
- Department of Human Genetics, The University of Chicago, Chicago, Illinois
|