1. Das Adhikari S, Cui Y, Wang J. BayesKAT: Bayesian optimal kernel-based test for genetic association studies reveals joint genetic effects in complex diseases. Brief Bioinform 2024; 25:bbae182. [PMID: 38653490 PMCID: PMC11036342 DOI: 10.1093/bib/bbae182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 03/10/2024] [Accepted: 04/05/2024] [Indexed: 04/25/2024] [Open access]
Abstract
Genome-wide Association Studies (GWAS) methods have identified individual single-nucleotide polymorphisms (SNPs) significantly associated with specific phenotypes. Nonetheless, many complex diseases are polygenic and are controlled by multiple genetic variants that are usually non-linearly dependent. These genetic variants have weak marginal effects and therefore remain undetected in GWAS analysis. Kernel-based tests (KBT), which evaluate the joint effect of a group of genetic variants, are therefore critical for complex disease analysis. However, choosing different kernel functions in KBT can significantly influence the type I error control and power, and selecting the optimal kernel remains a statistically challenging task. A few existing methods suffer from inflated type I errors, limited scalability, inferior power or issues of ambiguous conclusions. Here, we present a new Bayesian framework, BayesKAT (https://github.com/wangjr03/BayesKAT), which overcomes these kernel specification issues by selecting the optimal composite kernel adaptively from the data while testing genetic associations simultaneously. Furthermore, BayesKAT implements a scalable computational strategy to boost its applicability, especially for high-dimensional cases where other methods become less effective. Based on a series of performance comparisons using both simulated and real large-scale genetics data, BayesKAT outperforms the available methods in detecting complex group-level associations and controlling type I errors simultaneously. Applied to a variety of groups of functionally related genetic variants based on biological pathways, co-expression gene modules and protein complexes, BayesKAT deciphers the complex genetic basis and provides mechanistic insights into human diseases.
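To make the idea of a composite kernel in the abstract above concrete, the sketch below builds two candidate kernels over a toy genotype matrix, mixes them with fixed weights, and computes a variance-component style score statistic Q = r'Kr on mean-centred phenotypes. It is only an illustration under stated assumptions: the kernel choices, the fixed weights, the function names and the toy data are all hypothetical, and this is not BayesKAT's algorithm, which according to the abstract selects the composite kernel adaptively from the data within a Bayesian framework.

```python
import math


def linear_kernel(G):
    """K[i][j] = inner product of the genotype vectors of individuals i and j."""
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in G] for gi in G]


def gaussian_kernel(G, sigma=1.0):
    """K[i][j] = exp(-||g_i - g_j||^2 / (2 * sigma^2)); sigma is a hypothetical bandwidth."""
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(gi, gj)) / (2.0 * sigma ** 2))
            for gj in G
        ]
        for gi in G
    ]


def composite_kernel(kernels, weights):
    """Convex combination of kernel matrices (weights non-negative, summing to 1)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def score_statistic(y, K):
    """Q = r' K r, where r is the mean-centred phenotype vector (no covariates)."""
    mean = sum(y) / len(y)
    r = [v - mean for v in y]
    n = len(y)
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


# Toy data (hypothetical): 3 individuals, 2 variants coded as minor-allele counts.
G = [[0, 1], [1, 1], [2, 0]]
y = [1.0, 2.0, 3.0]

K = composite_kernel([linear_kernel(G), gaussian_kernel(G)], [0.5, 0.5])
Q = score_statistic(y, K)
```

A larger Q means the phenotype residuals co-vary with genotype similarity as measured by the chosen kernel. Turning Q into a test decision needs a null distribution or, in a Bayesian setting, a model comparison, which this sketch does not attempt.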
Affiliation(s)
- Sikta Das Adhikari
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
- Jianrong Wang
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
2. Chen C, Liu Y, Luo M, Yang J, Chen Y, Wang R, Zhou J, Zang Y, Diao L, Han L. PancanQTLv2.0: a comprehensive resource for expression quantitative trait loci across human cancers. Nucleic Acids Res 2024; 52:D1400-D1406. [PMID: 37870463 PMCID: PMC10767806 DOI: 10.1093/nar/gkad916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 09/29/2023] [Accepted: 10/06/2023] [Indexed: 10/24/2023] [Open access]
Abstract
Expression quantitative trait locus (eQTL) analysis is a powerful tool used to investigate genetic variations in complex diseases, including cancer. We previously developed a comprehensive database, PancanQTL, to characterize cancer eQTLs using The Cancer Genome Atlas (TCGA) dataset, and linked eQTLs with patient survival and GWAS risk variants. Here, we present an updated version, PancanQTLv2.0 (https://hanlaboratory.com/PancanQTLv2/), with advancements in fine-mapping causal variants for eQTLs, updating eQTLs overlapping with GWAS linkage disequilibrium regions and identifying eQTLs associated with drug response and immune infiltration. Through fine-mapping analysis, we identified 58 747 fine-mapped eQTL credible sets, providing mechanistic insights into gene regulation in cancer. We further integrated the latest GWAS Catalog and identified a total of 84 592 135 linkage associations between eQTLs and the existing GWAS loci, which represents a remarkable ∼50-fold increase compared to the previous version. Additionally, PancanQTLv2.0 uncovered 659 516 associations between eQTLs and drug response and identified 146 948 associations between eQTLs and immune cell abundance, indicating the potential clinical utility of eQTLs in cancer therapy. PancanQTLv2.0 expands the resources available for investigating gene expression regulation in human cancers, leading to advancements in cancer research and precision oncology.
Affiliation(s)
- Chengxuan Chen
  - Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX 77030, USA
- Yuan Liu
  - Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX 77030, USA
- Mei Luo
  - Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Jingwen Yang
  - Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Yamei Chen
  - Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Runhao Wang
  - Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Joseph Zhou
  - Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX 77030, USA
- Yong Zang
  - Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Lixia Diao
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Leng Han
  - Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
  - Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX 77030, USA
3. Das Adhikari S, Cui Y, Wang J. BayesKAT: Bayesian Optimal Kernel-based Test for genetic association studies reveals joint genetic effects in complex diseases. bioRxiv: The Preprint Server for Biology 2023:2023.10.18.562824. [PMID: 37905124 PMCID: PMC10614916 DOI: 10.1101/2023.10.18.562824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
GWAS methods have identified individual SNPs significantly associated with specific phenotypes. Nonetheless, many complex diseases are polygenic and are controlled by multiple genetic variants that are usually non-linearly dependent. These genetic variants have weak marginal effects and therefore remain undetected in GWAS analysis. Kernel-based tests (KBT), which evaluate the joint effect of a group of genetic variants, are therefore critical for complex disease analysis. However, choosing different kernel functions in KBT can significantly influence the type I error control and power, and selecting the optimal kernel remains a statistically challenging task. A few existing methods suffer from inflated type I errors, limited scalability, inferior power, or issues of ambiguous conclusions. Here, we present a new Bayesian framework, BayesKAT (https://github.com/wangjr03/BayesKAT), which overcomes these kernel specification issues by selecting the optimal composite kernel adaptively from the data while testing genetic associations simultaneously. Furthermore, BayesKAT implements a scalable computational strategy to boost its applicability, especially for high-dimensional cases where other methods become less effective. Based on a series of performance comparisons using both simulated and real large-scale genetics data, BayesKAT outperforms the available methods in detecting complex group-level associations and controlling type I errors simultaneously. Applied to a variety of groups of functionally related genetic variants based on biological pathways, co-expression gene modules, and protein complexes, BayesKAT deciphers the complex genetic basis and provides mechanistic insights into human diseases.
4. Swart PC, Du Plessis M, Rust C, Womersley JS, van den Heuvel LL, Seedat S, Hemmings SMJ. Identifying genetic loci that are associated with changes in gene expression in PTSD in a South African cohort. J Neurochem 2023; 166:705-719. [PMID: 37522158 PMCID: PMC10953375 DOI: 10.1111/jnc.15919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 06/30/2023] [Accepted: 07/05/2023] [Indexed: 08/01/2023]
Abstract
The molecular mechanisms underlying posttraumatic stress disorder (PTSD) are yet to be fully elucidated, especially in underrepresented population groups. Expression quantitative trait loci (eQTLs) are DNA sequence variants that influence gene expression, in a local (cis-) or distal (trans-) manner, and subsequently impact cellular, tissue, and system physiology. This study aims to identify genetic loci associated with gene expression changes in a South African PTSD cohort. Genome-wide genotype and RNA-sequencing data were obtained from 32 trauma-exposed controls and 35 PTSD cases of mixed ancestry, as part of the SHARED ROOTS project. The first approach utilised 108 937 single-nucleotide polymorphisms (SNPs) (MAF > 10%) and 11 312 genes with Matrix eQTL to map potential eQTLs, while controlling for covariates as appropriate. The second analysis was focused on 5638 SNPs related to a previously calculated PTSD polygenic risk score for this cohort. SNP-gene pairs were considered eQTLs if they survived Bonferroni correction and had a false discovery rate <0.05. We did not identify eQTLs that significantly influenced gene expression in a PTSD-dependent manner. However, several known cis-eQTLs, independent of PTSD diagnosis, were observed. rs8521 (C > T) was associated with TAGLN and SIDT2 expression, and rs11085906 (C > T) was associated with ZNF333 expression. This exploratory study provides insight into the molecular mechanisms associated with PTSD in a non-European, admixed sample population. This study was limited by the cross-sectional design and insufficient statistical power. Overall, this study should encourage further multi-omics approaches towards investigating PTSD in diverse populations.
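The two-part significance criterion in the abstract above (a Bonferroni threshold plus a false discovery rate below 0.05) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the abstract does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function names and toy p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_eqtls(pvalues, alpha=0.05, fdr_cutoff=0.05):
    """Flag SNP-gene pairs that pass both the Bonferroni and the FDR criterion."""
    threshold = bonferroni_threshold(alpha, len(pvalues))
    qvalues = benjamini_hochberg(pvalues)
    return [p <= threshold and q < fdr_cutoff for p, q in zip(pvalues, qvalues)]


# Hypothetical p-values for four SNP-gene pairs.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
calls = call_eqtls(pvals)
```

With so few tests the two criteria nearly coincide. With the very large number of SNP-gene pairs in a genome-wide scan, the Bonferroni cutoff is usually the stricter of the two, so requiring both is conservative.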
Affiliation(s)
- Patricia C. Swart
  - Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
  - South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Unit, Cape Town, South Africa
- Morne Du Plessis
  - Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
  - South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Unit, Cape Town, South Africa
- Carlien Rust
  - Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
  - South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Unit, Cape Town, South Africa
- Jacqueline S. Womersley
  - Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
  - South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Unit, Cape Town, South Africa
- Leigh L. van den Heuvel
  - Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
  - South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Unit, Cape Town, South Africa
- Soraya Seedat
  - Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
  - South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Unit, Cape Town, South Africa
- Sian M. J. Hemmings
  - Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
  - South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Unit, Cape Town, South Africa
5. Zhong V, Archibald BN, Brophy JAN. Transcriptional and post-transcriptional controls for tuning gene expression in plants. Current Opinion in Plant Biology 2023; 71:102315. [PMID: 36462457 DOI: 10.1016/j.pbi.2022.102315] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/22/2022] [Revised: 10/22/2022] [Accepted: 10/27/2022] [Indexed: 06/17/2023]
Abstract
Plant biotechnologists seek to modify plants through genetic reprogramming, but our ability to precisely control gene expression in plants is still limited. Here, we review transcription and translation in the model plants Arabidopsis thaliana and Nicotiana benthamiana with an eye toward control points that may be used to predictably modify gene expression. We highlight differences in gene expression requirements between these plants and other species, and discuss the ways in which our understanding of gene expression has been used to engineer plants. This review is intended to serve as a resource for plant scientists looking to achieve precise control over gene expression.
Affiliation(s)
- Vivian Zhong
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Bella N Archibald
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
6. MIR retrotransposons link the epigenome and the transcriptome of coding genes in acute myeloid leukemia. Nat Commun 2022; 13:6524. [PMID: 36316347 PMCID: PMC9622910 DOI: 10.1038/s41467-022-34211-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 10/18/2022] [Indexed: 11/06/2022] [Open access]
Abstract
DNMT3A and IDH1/2 mutations combinatorically regulate the transcriptome and the epigenome in acute myeloid leukemia; yet the mechanisms of this interplay are unknown. Using a systems approach within topologically associating domains, we find that genes with significant expression-methylation correlations are enriched in signaling and metabolic pathways. The common denominator across these methylation-regulated genes is the density in MIR retrotransposons of their introns. Moreover, a discrete number of CpGs overlapping enhancers are responsible for regulating most of these genes. Established mouse models recapitulate the dependency of MIR-rich genes on the balanced expression of epigenetic modifiers, while projection of leukemic profiles onto normal hematopoiesis ones further consolidates the dependencies of methylation-regulated genes on MIRs. Collectively, MIR elements on genes and enhancers are susceptible to changes in DNA methylation activity and explain the cooperativity of proteins in this pathway in normal and malignant hematopoiesis.
7. Tian H, He Y, Xue Y, Gao YQ. Expression regulation of genes is linked to their CpG density distributions around transcription start sites. Life Sci Alliance 2022; 5:e202101302. [PMID: 35580989 PMCID: PMC9113945 DOI: 10.26508/lsa.202101302] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/17/2021] [Revised: 05/07/2022] [Accepted: 05/09/2022] [Indexed: 11/24/2022] [Open access]
Abstract
The CpG dinucleotide and its methylation behaviors play vital roles in gene regulation. Previous studies have divided genes into several categories based on the CpG density around transcription start sites and found that housekeeping genes tend to possess high CpG density, whereas tissue-specific genes are generally characterized by low CpG density. In this study, we investigated how the CpG density distribution of a gene affects its transcription and regulation pattern. Using a semi-supervised neural network that we designed, which takes data augmentation into account, we divided human genes into three clusters based on the CpG density distribution around the transcription start site; genes within each cluster shared similar CpG density distributions. Beyond sequence properties, these clusters exhibited distinctly different structural features, regulatory mechanisms, correlation patterns between the expression level and CpG/TpG density, and expression and epigenetic mark variations during tumorigenesis. For instance, the activation of cluster 3 genes relies more on 3D genome reorganization, compared with cluster 1 and 2 genes, whereas cluster 2 genes showed the strongest correlation between gene expression and H3K27me3. Genes exhibiting uncoupled correlation between gene regulation and histone modifications are mainly in cluster 3. These results emphasize that the usage of epigenetic marks in gene regulation is partially rooted in the sequence properties of genes, such as their CpG density distribution, and explain to some extent why the relation between epigenetic marks and gene expression is controversial.
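As a rough illustration of what a CpG density distribution around a transcription start site looks like as data, the sketch below counts CG dinucleotides in fixed windows on either side of a TSS. The density definition (CG dinucleotides per base of window), the window scheme, the function names and the toy sequence are assumptions made for this example; the abstract does not give the authors' exact definition, and the neural-network clustering step is not attempted here.

```python
def cpg_count(seq):
    """Count CG dinucleotides (overlap-free by construction) in a DNA string."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cpg_density_profile(seq, tss, flank, window, step):
    """Return (offset_from_tss, density) pairs for windows tiling [tss - flank, tss + flank).

    Density is the number of CG dinucleotides divided by the window length.
    """
    start = max(0, tss - flank)
    end = min(len(seq), tss + flank)
    profile = []
    for left in range(start, end - window + 1, step):
        chunk = seq[left:left + window]
        profile.append((left - tss, cpg_count(chunk) / window))
    return profile


# Toy sequence (hypothetical): CpG-rich upstream of the TSS, CpG-poor downstream.
toy_seq = "CGCGCGAAAAAA"
profile = cpg_density_profile(toy_seq, tss=6, flank=6, window=6, step=6)
```

Each gene then yields a fixed-length vector of densities, which is the kind of input a clustering method could group into classes such as the three clusters described in the abstract.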
Affiliation(s)
- Hao Tian
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Yueying He
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Yue Xue
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Yi Qin Gao
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
8. Panara V, Monteiro R, Koltowska K. Epigenetic Regulation of Endothelial Cell Lineages During Zebrafish Development-New Insights From Technical Advances. Front Cell Dev Biol 2022; 10:891538. [PMID: 35615697 PMCID: PMC9125237 DOI: 10.3389/fcell.2022.891538] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/07/2022] [Accepted: 04/10/2022] [Indexed: 01/09/2023] [Open access]
Abstract
Epigenetic regulation is integral in orchestrating the spatiotemporal regulation of gene expression which underlies tissue development. The emergence of new tools to assess genome-wide epigenetic modifications has enabled significant advances in the field of vascular biology in zebrafish. Zebrafish represents a powerful model to investigate the activity of cis-regulatory elements in vivo by combining technologies such as ATAC-seq, ChIP-seq and CUT&Tag with the generation of transgenic lines and live imaging to validate the activity of these regulatory elements. Recently, this approach led to the identification and characterization of key enhancers of important vascular genes, such as gata2a, notch1b and dll4. In this review we will discuss how the latest technologies in epigenetics are being used in the zebrafish to determine chromatin states and assess the function of the cis-regulatory sequences that shape the zebrafish vascular network.
Affiliation(s)
- Virginia Panara
  - Immunology Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Rui Monteiro
  - Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
  - Birmingham Centre of Genome Biology, University of Birmingham, Birmingham, United Kingdom
- Katarzyna Koltowska (corresponding author)
  - Immunology Genetics and Pathology, Uppsala University, Uppsala, Sweden