1
Koo HJ, Pan W. Are trait-associated genes clustered together in a gene network? Genet Epidemiol 2024. [PMID: 38472164 DOI: 10.1002/gepi.22557] [Received: 10/11/2023] [Revised: 01/25/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024]
Abstract
Genome-wide association studies (GWAS) have provided an abundance of information about the genetic variants and loci associated with complex traits and diseases. However, because of linkage disequilibrium (LD) and the noncoding nature of many loci, pinpointing the causal genes remains a challenge. Gene network-based approaches, paired with network diffusion methods, have been proposed to prioritize causal genes and to boost statistical power in GWAS, based on the assumption that trait-associated genes are clustered in a gene network. Because trait-associated variants are difficult to map to genes in GWAS, this assumption has never been directly or rigorously tested empirically. In contrast, whole exome sequencing (WES) data focus on the protein-coding regions, directly identifying trait-associated genes. In this study, we tested the assumption by leveraging the recently available exome-based association statistics from the UK Biobank WES data along with two types of networks. We found that almost all trait-associated genes were significantly more proximal to each other than randomly selected genes within both networks. These results support the assumption that trait-associated genes are clustered in gene networks, which can be further leveraged to boost the power of GWAS, for example by introducing less stringent p-value thresholds.
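The comparison described in the abstract (trait-associated genes versus randomly selected genes in a network) can be illustrated with a simple permutation test. The sketch below is not the authors' procedure: the abstract does not state which proximity measure or networks were used, so the mean pairwise shortest-path distance, the toy path graph, and the function names `bfs_distances`, `mean_pairwise_distance`, and `proximity_pvalue` are all assumptions made for illustration. It also assumes a connected, unweighted network.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_pairwise_distance(adj, genes):
    """Average shortest-path distance over all pairs in `genes`.

    Assumes the graph is connected; a disconnected pair raises KeyError.
    """
    dists = {g: bfs_distances(adj, g) for g in genes}
    pairs = list(combinations(genes, 2))
    return sum(dists[a][b] for a, b in pairs) / len(pairs)


def proximity_pvalue(adj, trait_genes, n_perm=1000, seed=0):
    """Empirical p-value: how often a random gene set of the same size
    is at least as tightly clustered as the trait gene set."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    observed = mean_pairwise_distance(adj, trait_genes)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(nodes, len(trait_genes))
        if mean_pairwise_distance(adj, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network: a path of 20 genes, 0 - 1 - 2 - ... - 19.
toy_adj = {i: set() for i in range(20)}
for i in range(19):
    toy_adj[i].add(i + 1)
    toy_adj[i + 1].add(i)

# Three adjacent genes stand in for a clustered trait-associated set.
observed, p_value = proximity_pvalue(toy_adj, [0, 1, 2])
print(observed, p_value)
```

A small empirical p-value here indicates that the chosen gene set is more tightly clustered than random sets of the same size. A real analysis would likely need to account for node degree or other network properties when drawing the random sets, which this sketch does not do.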
Affiliation(s)
- Hyun Jung Koo
  - School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
2
Cao X, Zhang L, Islam MK, Zhao M, He C, Zhang K, Liu S, Sha Q, Wei H. TGPred: efficient methods for predicting target genes of a transcription factor by integrating statistics, machine learning and optimization. NAR Genom Bioinform 2023; 5:lqad083. [PMID: 37711605 PMCID: PMC10498345 DOI: 10.1093/nargab/lqad083] [Received: 11/16/2022] [Revised: 05/30/2023] [Accepted: 08/30/2023] [Indexed: 09/16/2023] [Open access]
Abstract
Four statistical selection methods for inferring transcription factor (TF)-target gene (TG) pairs were developed by coupling a mean squared error (MSE) or Huber loss function with an elastic net (ENET) or least absolute shrinkage and selection operator (Lasso) penalty. Two methods were also developed for inferring pathway gene regulatory networks (GRNs) by combining the Huber or MSE loss function with a network (Net)-based penalty. To solve these regressions, we improved an accelerated proximal gradient descent (APGD) algorithm to optimize the parameter selection process, resulting in an algorithm that is equally effective but much faster than the commonly used convex optimization solver. Synthetic data generated in a general setting were used to test the four TF-TG identification methods; ENET-based methods performed better than Lasso-based methods. Synthetic data generated from two network settings were used to test Huber-Net and MSE-Net, which outperformed all other methods. The TF-TG identification methods were also tested with SND1 and gl3 overexpression transcriptomic data; Huber-ENET and MSE-ENET outperformed all other methods when genome-wide predictions were performed. The TF-TG identification methods fill the gap left by the lack of a method for genome-wide TG prediction of a TF and have potential for validating ChIP/DAP-seq results, while the two Net-based methods are instrumental for predicting pathway GRNs.
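To make the MSE-ENET combination more concrete, the following is a minimal sketch of accelerated proximal gradient descent (in the FISTA style) applied to an elastic net regression. It is not TGPred's implementation, which the abstract does not show. The objective scaling, the parameters `lam` and `alpha`, the fixed step size derived from a simple Lipschitz bound, the synthetic data, and the function names `soft_threshold` and `enet_apgd` are all assumptions chosen for illustration.

```python
import math
import random


def soft_threshold(v, thresh):
    """Proximal operator of thresh * |v| (the Lasso part of the penalty)."""
    if v > thresh:
        return v - thresh
    if v < -thresh:
        return v + thresh
    return 0.0


def enet_apgd(X, y, lam=0.1, alpha=0.5, n_iter=500):
    """Minimize (1/(2n))||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2)
    with accelerated proximal gradient descent (assumed parametrization)."""
    n, p = len(X), len(X[0])
    # trace(X^T X)/n is an upper bound on the gradient's Lipschitz constant,
    # so 1/lipschitz is a safe (if conservative) fixed step size.
    lipschitz = sum(x * x for row in X for x in row) / n
    step = 1.0 / lipschitz
    beta = [0.0] * p
    z = beta[:]  # extrapolated point used for the gradient step
    t = 1.0
    for _ in range(n_iter):
        resid = [sum(xj * zj for xj, zj in zip(row, z)) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n
                for j in range(p)]
        # Proximal step for the elastic net penalty: soft-threshold, then shrink.
        new_beta = [
            soft_threshold(zj - step * gj, step * lam * alpha)
            / (1.0 + step * lam * (1.0 - alpha))
            for zj, gj in zip(z, grad)
        ]
        # Nesterov momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [nb + ((t - 1.0) / t_new) * (nb - b)
             for nb, b in zip(new_beta, beta)]
        beta, t = new_beta, t_new
    return beta


# Hypothetical synthetic data: 60 samples, 5 candidate target genes,
# of which only the first and third truly respond to the TF.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
true_beta = [2.0, 0.0, -1.5, 0.0, 0.0]
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, 0.1)
     for row in X]

beta_hat = enet_apgd(X, y)
print([round(b, 3) for b in beta_hat])
```

The estimated coefficients for the two true signals should be large in magnitude but shrunk toward zero by the penalty, while the remaining coefficients should be near or exactly zero. Replacing the squared-error gradient with the Huber loss gradient would give a sketch of the Huber-ENET variant under the same assumptions.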
Affiliation(s)
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Ling Zhang
  - Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Md Khairul Islam
  - Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Mingxia Zhao
  - Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Cheng He
  - Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Kui Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Sanzhen Liu
  - Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Hairong Wei
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
  - Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA