1
Vochteloo M, Deelen P, Vink B, Tsai EA, Runz H, Andreu-Sánchez S, Fu J, Zhernakova A, Westra HJ, Franke L. PICALO: principal interaction component analysis for the identification of discrete technical, cell-type, and environmental factors that mediate eQTLs. Genome Biol 2024; 25:29. [PMID: 38254182 PMCID: PMC10802033 DOI: 10.1186/s13059-023-03151-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 12/20/2023] [Indexed: 01/24/2024]
Abstract
Expression quantitative trait loci (eQTL) offer insights into the regulatory mechanisms of trait-associated variants, but their effects often rely on contexts that are unknown or unmeasured. We introduce PICALO, a method for hidden variable inference of eQTL contexts. PICALO identifies and disentangles technical from biological context in heterogeneous blood and brain bulk eQTL datasets. These contexts are biologically informative and reproducible, outperforming cell counts or expression-based principal components. Furthermore, we show that RNA quality and cell type proportions interact with thousands of eQTLs. Knowledge of hidden eQTL contexts may aid in the inference of functional mechanisms underlying disease variants.
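The abstract defines eQTL contexts through their interaction with genotype. The standard interaction-eQTL model behind that idea regresses expression on genotype, a context variable, and their product. A nonzero product coefficient means the eQTL effect size depends on the context. The sketch below fits this model for a single eQTL with NumPy on simulated data. It illustrates the general model only, under assumed inputs. How PICALO infers the contexts themselves is not described in this abstract and is not reproduced here. The function `interaction_eqtl` and all simulated values are hypothetical.

```python
import numpy as np


def interaction_eqtl(expr, genotype, context):
    """Fit expr ~ 1 + genotype + context + genotype:context by ordinary least squares.

    Returns the four coefficients [intercept, genotype, context, interaction]
    and the t-statistic of the interaction coefficient.
    """
    n = expr.shape[0]
    X = np.column_stack([np.ones(n), genotype, context, genotype * context])
    beta, _, _, _ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    dof = n - X.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # OLS covariance of the coefficients
    t_interaction = beta[3] / np.sqrt(cov[3, 3])
    return beta, t_interaction


# Hypothetical simulated data: the eQTL effect size grows with the context value.
rng = np.random.default_rng(0)
n = 500
g = rng.integers(0, 3, size=n).astype(float)  # genotype dosage (0, 1, 2)
c = rng.normal(size=n)                         # a continuous context, e.g. a cell-type proxy
y = 0.2 * g + 0.1 * c + 0.5 * g * c + rng.normal(size=n)

beta, t_int = interaction_eqtl(y, g, c)
print(np.round(beta, 2), round(float(t_int), 1))
```

In a genome-wide setting, the same fit would be repeated for every eQTL and candidate context. The interaction t-statistics would then be corrected for multiple testing.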
Affiliation(s)
- Martijn Vochteloo
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Patrick Deelen
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Britt Vink
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Institute for Life Science & Technology, Hanze University of Applied Sciences, Groningen, The Netherlands
- Ellen A Tsai
  - Translational Sciences, Research and Development, Biogen, Cambridge, MA, USA
- Heiko Runz
  - Translational Sciences, Research and Development, Biogen, Cambridge, MA, USA
- Sergio Andreu-Sánchez
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Jingyuan Fu
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Alexandra Zhernakova
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Harm-Jan Westra
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Lude Franke
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
2
Zhang Z, Jung J, Kim A, Suboc N, Gazal S, Mancuso N. A scalable approach to characterize pleiotropy across thousands of human diseases and complex traits using GWAS summary statistics. Am J Hum Genet 2023; 110:1863-1874. [PMID: 37879338 PMCID: PMC10645558 DOI: 10.1016/j.ajhg.2023.09.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 09/26/2023] [Accepted: 09/27/2023] [Indexed: 10/27/2023]
Abstract
Genome-wide association studies (GWASs) across thousands of traits have revealed the pervasive pleiotropy of trait-associated genetic variants. While methods have been proposed to characterize pleiotropic components across groups of phenotypes, scaling these approaches to ultra-large-scale biobanks has been challenging. Here, we propose FactorGo, a scalable variational factor analysis model to identify and characterize pleiotropic components using biobank GWAS summary data. In extensive simulations, we observe that FactorGo outperforms the state-of-the-art (model-free) approach tSVD in capturing latent pleiotropic factors across phenotypes while maintaining a similar computational cost. We apply FactorGo to estimate 100 latent pleiotropic factors from GWAS summary data of 2,483 phenotypes measured in European-ancestry Pan-UK BioBank individuals (N = 420,531). Next, we find that factors from FactorGo are more enriched with relevant tissue-specific annotations than those identified by tSVD (p = 2.58E-10) and validate our approach by recapitulating brain-specific enrichment for BMI and the height-related connection between reproductive system and muscular-skeletal growth. Finally, our analyses suggest shared etiologies between rheumatoid arthritis and periodontal condition in addition to alkaline phosphatase as a candidate prognostic biomarker for prostate cancer. Overall, FactorGo improves our biological understanding of shared etiologies across thousands of GWASs.
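The abstract benchmarks FactorGo against truncated SVD (tSVD). tSVD factorizes a trait-by-variant matrix of GWAS summary statistics into a small number of latent components. The NumPy sketch below covers only that tSVD baseline. It assumes the input is a matrix of Z-scores, simulated here with a hypothetical rank-3 structure. FactorGo's variational factor model is not reproduced. The function name and data are illustrative.

```python
import numpy as np


def truncated_svd(Z, k):
    """Rank-k truncated SVD of a (traits x variants) matrix of GWAS Z-scores.

    Returns trait loadings (traits x k), singular values (k,),
    and variant scores (k x variants) for the top-k components.
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)  # s is sorted in descending order
    return U[:, :k], s[:k], Vt[:k, :]


# Hypothetical data: 50 traits, 2,000 variants, 3 shared latent factors plus noise.
rng = np.random.default_rng(1)
n_traits, n_variants, true_k = 50, 2000, 3
loadings = rng.normal(size=(n_traits, true_k))
factors = rng.normal(size=(true_k, n_variants))
Z = loadings @ factors + rng.normal(size=(n_traits, n_variants))

U_k, s_k, Vt_k = truncated_svd(Z, k=3)
Z_hat = U_k @ np.diag(s_k) @ Vt_k                      # rank-3 reconstruction
rel_err = np.linalg.norm(Z - Z_hat) / np.linalg.norm(Z)
print(U_k.shape, np.round(s_k, 1), round(float(rel_err), 3))
```

Each column of `U_k` assigns traits a weight on one shared component. Traits with large weights on the same column are candidates for shared (pleiotropic) genetic architecture.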
Affiliation(s)
- Zixuan Zhang
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Junghyun Jung
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artem Kim
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Noah Suboc
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
3
Jeong Y, Song J, Lee Y, Choi E, Won Y, Kim B, Jang W. A Transcriptome-Wide Analysis of Psoriasis: Identifying the Potential Causal Genes and Drug Candidates. Int J Mol Sci 2023; 24:11717. [PMID: 37511476 PMCID: PMC10380797 DOI: 10.3390/ijms241411717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 07/14/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
Psoriasis is a chronic inflammatory skin disease characterized by cutaneous eruptions and pruritus. Because the genetic background of psoriasis is only partially understood, an integrative and rigorous study is necessary. We conducted a transcriptome-wide association study (TWAS) with the new Genotype-Tissue Expression (GTEx) version 8 reference panels, including some tissue and multi-tissue panels that had not been used previously. We performed tissue-specific heritability analyses on genome-wide association study data to prioritize the tissue panels for TWAS analysis. TWAS and colocalization (COLOC) analyses were performed with eight tissues from the single-tissue panels and with the multi-tissue panels of context-specific genetics (CONTENT) to increase tissue specificity and statistical power. TWAS identified significant associations for 101 genes in the single-tissue panels and 64 genes in the multi-tissue panels, of which 26 genes were replicated in the COLOC analysis. Functional annotation and network analyses showed that these genes were associated with psoriasis and/or immune responses. We also suggest drug candidates that interact with jointly significant genes identified through a conditional and joint analysis. Together, our findings may help reveal the underlying genetic mechanisms of psoriasis and provide new insights into its treatment.
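The abstract does not state which TWAS implementation was used. FUSION, for example, uses one common summary-statistic form of the TWAS test. It combines a gene's expression-prediction weights w, the GWAS Z-scores z at the same SNPs, and their LD correlation matrix R as z_TWAS = wᵀz / sqrt(wᵀRw). The NumPy sketch below computes that statistic for one gene. The function name, weights, Z-scores, and LD values are all hypothetical.

```python
import numpy as np


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS Z-score for one gene: w'z / sqrt(w' R w).

    weights: expression-prediction weights for the gene's cis-SNPs
    gwas_z:  GWAS Z-scores for the same SNPs, aligned to the same alleles
    ld:      SNP-by-SNP LD (correlation) matrix for those SNPs
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    R = np.asarray(ld, dtype=float)
    var = w @ R @ w                 # variance of w'z under the null, where z ~ N(0, R)
    if var <= 0:
        raise ValueError("weights have zero variance under the given LD matrix")
    return float(w @ z / np.sqrt(var))


# Hypothetical gene with three cis-SNPs.
w = np.array([0.5, 0.3, 0.0])
z = np.array([4.0, 3.0, 0.5])
independent = np.eye(3)
correlated = np.array([[1.0, 0.8, 0.0],
                       [0.8, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
print(round(twas_z(w, z, independent), 2))  # treats the SNPs as unlinked
print(round(twas_z(w, z, correlated), 2))   # LD between SNPs 1 and 2 enters the denominator
```

In this example, LD between the first two SNPs lowers the statistic from about 4.97 to about 3.81. Correlated SNPs carry partly redundant evidence, so ignoring LD would overstate the gene-level association.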
Affiliation(s)
- Yeonbin Jeong
  - Department of Life Sciences, Dongguk University, Seoul 04620, Republic of Korea
- Jaeseung Song
  - Department of Life Sciences, Dongguk University, Seoul 04620, Republic of Korea
- Yubin Lee
  - Department of Life Sciences, Dongguk University, Seoul 04620, Republic of Korea
- Eunyoung Choi
  - Department of Life Sciences, Dongguk University, Seoul 04620, Republic of Korea
- Youngtae Won
  - Department of Life Sciences, Dongguk University, Seoul 04620, Republic of Korea
- Byunghyuk Kim
  - Department of Life Sciences, Dongguk University-Seoul, Goyang 10326, Republic of Korea
- Wonhee Jang
  - Department of Life Sciences, Dongguk University, Seoul 04620, Republic of Korea
4
Zhang Z, Jung J, Kim A, Suboc N, Gazal S, Mancuso N. A scalable variational approach to characterize pleiotropic components across thousands of human diseases and complex traits using GWAS summary statistics. medRxiv 2023:2023.03.27.23287801. [PMID: 37034739 PMCID: PMC10081403 DOI: 10.1101/2023.03.27.23287801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
Genome-wide association studies (GWAS) across thousands of traits have revealed the pervasive pleiotropy of trait-associated genetic variants. While methods have been proposed to characterize pleiotropic components across groups of phenotypes, scaling these approaches to ultra large-scale biobanks has been challenging. Here, we propose FactorGo, a scalable variational factor analysis model to identify and characterize pleiotropic components using biobank GWAS summary data. In extensive simulations, we observe that FactorGo outperforms the state-of-the-art (model-free) approach tSVD in capturing latent pleiotropic factors across phenotypes, while maintaining a similar computational cost. We apply FactorGo to estimate 100 latent pleiotropic factors from GWAS summary data of 2,483 phenotypes measured in European-ancestry Pan-UK BioBank individuals (N=420,531). Next, we find that factors from FactorGo are more enriched with relevant tissue-specific annotations than those identified by tSVD (P=2.58E-10), and validate our approach by recapitulating brain-specific enrichment for BMI and the height-related connection between reproductive system and muscular-skeletal growth. Finally, our analyses suggest novel shared etiologies between rheumatoid arthritis and periodontal condition, in addition to alkaline phosphatase as a candidate prognostic biomarker for prostate cancer. Overall, FactorGo improves our biological understanding of shared etiologies across thousands of GWAS.
Affiliation(s)
- Zixuan Zhang
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Junghyun Jung
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Artem Kim
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Noah Suboc
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Steven Gazal
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
  - Department of Quantitative and Computational Biology, University of Southern California
  - Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California
- Nicholas Mancuso
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
  - Department of Quantitative and Computational Biology, University of Southern California
  - Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California
5
Johnson KE, Heisel T, Allert M, Fürst A, Yerabandi N, Knights D, Jacobs KM, Lock EF, Bode L, Fields DA, Rudolph MC, Gale CA, Albert FW, Demerath EW, Blekhman R. Human milk variation is shaped by maternal genetics and impacts the infant gut microbiome. bioRxiv 2023:2023.01.24.525211. [PMID: 36747843 PMCID: PMC9900818 DOI: 10.1101/2023.01.24.525211] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/26/2023]
Abstract
Human milk is a complex mix of nutritional and bioactive components that provide complete nutrition for the infant. However, we lack systematic knowledge of the factors shaping milk composition and of how milk variation influences infant health. Here, we used multi-omic profiling to characterize interactions between maternal genetics, milk gene expression, milk composition, and the infant fecal microbiome in 242 exclusively breastfeeding mother-infant pairs. We identified 487 genetic loci associated with milk gene expression unique to the lactating mammary gland, including loci that impacted breast cancer risk and human milk oligosaccharide concentration. Integrative analyses uncovered connections between milk gene expression and the infant gut microbiome, including an association between the expression of inflammation-related genes, IL-6 concentration in milk, and the abundance of Bifidobacteria in the infant gut. Our results show how an improved understanding of the genetics and genomics of human milk connects lactation biology with maternal and infant health.
Affiliation(s)
- Kelsey E Johnson
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, USA
- Timothy Heisel
  - Division of Neonatology, Department of Pediatrics, University of Minnesota Medical School, Minneapolis, MN, USA
- Mattea Allert
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, USA
- Annalee Fürst
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Nikhila Yerabandi
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Dan Knights
  - BioTechnology Institute, College of Biological Sciences, University of Minnesota, Minneapolis, MN, USA
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Katherine M Jacobs
  - Department of Obstetrics, Gynecology and Women's Health, Division of Maternal-Fetal Medicine, University of Minnesota Medical School, Minneapolis, MN, USA
- Eric F Lock
  - Division of Biostatistics, University of Minnesota School of Public Health, Minneapolis, MN, USA
- Lars Bode
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
  - Human Milk Institute (HMI) and Mother-Milk-Infant Center of Research Excellence (MOMI CORE), University of California, San Diego, La Jolla, CA, USA
- David A Fields
  - Department of Pediatrics, the University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Michael C Rudolph
  - Harold Hamm Diabetes Center, Department of Physiology, the University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Cheryl A Gale
  - Division of Neonatology, Department of Pediatrics, University of Minnesota Medical School, Minneapolis, MN, USA
- Frank W Albert
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, USA
- Ellen W Demerath
  - Division of Epidemiology and Community Health, University of Minnesota School of Public Health, Minneapolis, MN, USA
- Ran Blekhman
  - Section of Genetic Medicine, Division of Biological Sciences, University of Chicago, Chicago, IL, USA
6
Current challenges in understanding the role of enhancers in disease. Nat Struct Mol Biol 2022; 29:1148-1158. [PMID: 36482255 DOI: 10.1038/s41594-022-00896-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/30/2022] [Accepted: 11/04/2022] [Indexed: 12/13/2022]
Abstract
Enhancers play a central role in the spatiotemporal control of gene expression and tend to work in a cell-type-specific manner. In addition, they are suggested to be major contributors to phenotypic variation, evolution and disease. There is growing evidence that enhancer dysfunction due to genetic, structural or epigenetic mechanisms contributes to a broad range of human diseases referred to as enhanceropathies. Such mechanisms often underlie the susceptibility to common diseases, but can also play a direct causal role in cancer or Mendelian diseases. Despite the recent gain of insights into enhancer biology and function, we still have a limited ability to predict how enhancer dysfunction impacts gene expression. Here we discuss the major challenges that need to be overcome when studying the role of enhancers in disease etiology and highlight opportunities and directions for future studies, aiming to disentangle the molecular basis of enhanceropathies.
7
Cardoso TF, Bruscadin JJ, Afonso J, Petrini J, Andrade BGN, de Oliveira PSN, Malheiros JM, Rocha MIP, Zerlotini A, Ferraz JBS, Mourão GB, Coutinho LL, Regitano LCA. EEF1A1 transcription cofactor gene polymorphism is associated with muscle gene expression and residual feed intake in Nelore cattle. Mamm Genome 2022; 33:619-628. [PMID: 35816191 DOI: 10.1007/s00335-022-09959-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Accepted: 06/22/2022] [Indexed: 12/01/2022]
Abstract
Cis-acting effects of noncoding variants on gene expression and regulatory molecules are a significant source of phenotypic variation in complex traits. To provide new insights into the impacts of single-nucleotide polymorphisms (SNPs) on genes coding for transcription factors (TFs) and transcription cofactors (TcoFs), we carried out a multi-omic analysis to identify cis-regulatory effects of SNPs on the expression of these genes in muscle and to describe their association with feed efficiency-related traits in Nelore cattle. As a result, we identified one SNP, rs137256008C > T, predicted to affect EEF1A1 gene expression (β = 3.02; P-value = 3.51E-03) and the residual feed intake trait (β = -3.47; P-value = 0.02). This SNP was predicted to modify transcription factor binding sites and overlaps with several QTL for feed efficiency traits. In addition, co-expression network analyses suggested that animals carrying the T allele of rs137256008 may trigger changes in the gene network. Therefore, our analyses reinforce and contribute to a better understanding of the biological mechanisms underlying gene expression control of feed efficiency traits in bovines. The cis-regulatory SNP can be used as a biomarker for feed efficiency in Nelore cattle.
Affiliation(s)
- T F Cardoso
  - Embrapa Southeast Livestock, São Carlos, SP, Brazil
- J J Bruscadin
  - Program on Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, SP, Brazil
- J Afonso
  - Embrapa Southeast Livestock, São Carlos, SP, Brazil
- J Petrini
  - Department of Animal Science, "Luiz de Queiroz" College of Agriculture, University of São Paulo/ESALQ, Piracicaba, SP, Brazil
- B G N Andrade
  - Computer Science Department, Munster Technological University, MTU/ADAPT, Cork, Ireland
- P S N de Oliveira
  - Program on Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, SP, Brazil
- J M Malheiros
  - Federal University of Latin American Integration, Foz do Iguaçu, Paraná, Brazil
- M I P Rocha
  - Program on Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, SP, Brazil
- A Zerlotini
  - Embrapa Agricultural Informatics, Campinas, SP, Brazil
- J B S Ferraz
  - Department of Veterinary Medicine, University of São Paulo/FZEA, Pirassununga, Brazil
- G B Mourão
  - Department of Animal Science, "Luiz de Queiroz" College of Agriculture, University of São Paulo/ESALQ, Piracicaba, SP, Brazil
- L L Coutinho
  - Department of Animal Science, "Luiz de Queiroz" College of Agriculture, University of São Paulo/ESALQ, Piracicaba, SP, Brazil
8
Flynn E, Lappalainen T. Functional Characterization of Genetic Variant Effects on Expression. Annu Rev Biomed Data Sci 2022; 5:119-139. [PMID: 35483347 DOI: 10.1146/annurev-biodatasci-122120-010010] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Abstract
Thousands of common genetic variants in the human population have been associated with disease risk and phenotypic variation by genome-wide association studies (GWAS). However, the majority of GWAS variants fall into noncoding regions of the genome, complicating our understanding of their regulatory functions, and few molecular mechanisms of GWAS variant effects have been clearly elucidated. Here, we set out to review genetic variant effects, focusing on expression quantitative trait loci (eQTLs), including their utility in interpreting GWAS variant mechanisms. We discuss the interrelated challenges and opportunities for eQTL analysis, covering the identification of causal variants, the elucidation of molecular mechanisms of action, and the understanding of context variability. Addressing these questions can enable better functional characterization of disease-associated loci and provide insights into fundamental biological questions of the noncoding genetic regulatory code and its control of gene expression.
Affiliation(s)
- Elise Flynn
  - New York Genome Center, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Tuuli Lappalainen
  - New York Genome Center, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
9
Arvanitis M, Tayeb K, Strober BJ, Battle A. Redefining tissue specificity of genetic regulation of gene expression in the presence of allelic heterogeneity. Am J Hum Genet 2022; 109:223-239. [PMID: 35085493 PMCID: PMC8874223 DOI: 10.1016/j.ajhg.2022.01.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/01/2021] [Accepted: 01/05/2022] [Indexed: 01/03/2023]
Abstract
Uncovering the functional impact of genetic variation on gene expression is important in understanding tissue biology and the pathogenesis of complex traits. Despite large efforts to map expression quantitative trait loci (eQTLs) across many human tissues, our ability to translate those findings to understanding human disease has been incomplete, and the majority of disease loci are not explained by association with expression of a target gene. Cell-type specificity and the presence of multiple independent causal variants for many eQTLs are potential confounders contributing to the apparent discrepancy with disease loci. In this study, we investigate the tissue specificity of genetic effects on gene expression and the overlap with disease loci while considering the presence of multiple causal variants within and across tissues. We find evidence of pervasive tissue specificity of eQTLs, often masked by linkage disequilibrium that misleads traditional meta-analytic approaches. We propose CAFEH (colocalization and fine-mapping in the presence of allelic heterogeneity), a Bayesian method that integrates genetic association data across multiple traits, incorporating linkage disequilibrium to identify causal variants. CAFEH outperforms previous approaches in colocalization and fine-mapping. Using CAFEH, we show that genes with highly tissue-specific genetic effects are under greater selection, enriched in differentiation and developmental processes, and more likely to be involved in human disease. Last, we demonstrate that CAFEH can efficiently leverage the widespread allelic heterogeneity in genetic regulation of gene expression to prioritize the target tissue in genome-wide association complex trait loci, thereby improving our ability to interpret complex trait genetics.
Affiliation(s)
- Marios Arvanitis
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Medicine, Division of Cardiology, Johns Hopkins University, Baltimore, MD 21205, USA
- Karl Tayeb
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21211, USA
- Benjamin J Strober
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21211, USA
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
10
Gene Expression Analysis through Parallel Non-Negative Matrix Factorization. Computation 2021. [DOI: 10.3390/computation9100106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Gene expression analysis is a principal tool for explaining the behavior of genes in an organism exposed to different experimental conditions, and many clustering algorithms have been proposed for it. The amount of biological data is overwhelming, and its high-dimensional structure exceeds the capacity of most current computational architectures, so optimizing computation time and memory consumption has become a decisive factor in choosing a clustering algorithm. We propose a clustering algorithm based on Non-negative Matrix Factorization (NMF) and K-means that reduces data dimensionality while preserving the biological context and prioritizing gene selection; it is implemented for parallel GPU-based environments through the CUDA library. A well-known dataset is used in our tests, and the quality of the results is measured with the Rand index and the accuracy index. The results show a 6.22× speedup compared with the sequential version. The algorithm is competitive for biological dataset analysis and is invariant with respect to the number of classes and the size of the gene expression matrix.
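The abstract describes combining non-negative matrix factorization (NMF) with K-means. The sketch below is a minimal sequential CPU version of such a pipeline. It is not the authors' CUDA implementation, and their exact way of coupling the two steps is not given in the abstract. Here, NMF is fitted with the standard Lee-Seung multiplicative updates. Each sample is then represented by its normalized row of W, and those rows are clustered with K-means using k-means++ seeding. Agreement with the known classes is scored with the Rand index. The function names, the simulated block-structured dataset, and all parameter values are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=1000, seed=0, eps=1e-10):
    """Factor a non-negative matrix V (samples x genes) as W @ H using
    Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative step keeps H non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative step keeps W non-negative
    return W, H


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means on the rows of X with k-means++ seeding."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Pick the next center with probability proportional to squared distance.
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def rand_index(a, b):
    """Fraction of sample pairs on which two labelings agree (same vs. different cluster)."""
    a, b = np.asarray(a), np.asarray(b)
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))


# Hypothetical data: 3 sample classes, each over-expressing its own block of 10 genes.
rng = np.random.default_rng(42)
k, n_per_class, genes_per_block = 3, 20, 10
truth = np.repeat(np.arange(k), n_per_class)
V = np.ones((k * n_per_class, k * genes_per_block))
for c in range(k):
    V[truth == c, c * genes_per_block:(c + 1) * genes_per_block] += 4.0
V += rng.uniform(0.0, 0.5, size=V.shape)

W, H = nmf(V, k)
labels = kmeans(W / W.sum(axis=1, keepdims=True), k)  # cluster samples in the reduced space
print(round(rand_index(truth, labels), 3))
```

The matrix products in the NMF updates and the pairwise distance computations in K-means dominate the cost. Those are the steps a GPU implementation would most naturally parallelize.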
11
Kerimov N, Hayhurst JD, Peikova K, Manning JR, Walter P, Kolberg L, Samoviča M, Sakthivel MP, Kuzmin I, Trevanion SJ, Burdett T, Jupp S, Parkinson H, Papatheodorou I, Yates AD, Zerbino DR, Alasoo K. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat Genet 2021; 53:1290-1299. [PMID: 34493866 PMCID: PMC8423625 DOI: 10.1038/s41588-021-00924-w] [Citation(s) in RCA: 142] [Impact Index Per Article: 47.3] [Received: 01/08/2021] [Accepted: 07/26/2021] [Indexed: 12/15/2022]
Abstract
Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. In the present study, we present the eQTL Catalogue (https://www.ebi.ac.uk/eqtl), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.
Affiliation(s)
- Nurlan Kerimov
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
- James D Hayhurst
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Kateryna Peikova
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Jonathan R Manning
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Peter Walter
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Liis Kolberg
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Marija Samoviča
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Manoj Pandian Sakthivel
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Ivan Kuzmin
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Stephen J Trevanion
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Tony Burdett
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Simon Jupp
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Helen Parkinson
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Irene Papatheodorou
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Andrew D Yates
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Daniel R Zerbino
  - Open Targets, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Kaur Alasoo
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
  - Open Targets, Wellcome Genome Campus, Cambridge, UK