2
Yuan K, Zeng T, Chen L. Interpreting Functional Impact of Genetic Variations by Network QTL for Genotype–Phenotype Association Study. Front Cell Dev Biol 2022; 9:720321. [PMID: 35155440; PMCID: PMC8826544; DOI: 10.3389/fcell.2021.720321] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/04/2021] [Accepted: 12/13/2021] [Indexed: 12/18/2022]
Abstract
An enormous challenge in the post-genome era is to annotate and resolve the consequences of genetic variation on diverse phenotypes. The genome-wide association study (GWAS) is a well-known method for identifying potential genetic loci for complex traits among huge numbers of genetic variants, after which it is crucial to identify expression quantitative trait loci (eQTL). However, conventional eQTL methods usually disregard the systemic role of single-nucleotide polymorphisms (SNPs) or genes, thereby overlooking many network-associated phenotypic determinants. This problem motivates us to recognize network-based quantitative trait loci (QTL), i.e., network QTL (nQTL), which detect the cascade association genotype → network → phenotype rather than the conventional genotype → expression → phenotype association of eQTL. Specifically, we develop the nQTL framework based on the theory and approach of single-sample networks. It can identify not only network traits (e.g., the gene subnetwork associated with genotype) for analyzing complex biological processes but also network signatures (e.g., the interactive gene biomarker candidates screened from network traits) for characterizing the targeted phenotype and corresponding subtypes. Our results show that, compared with traditional eQTL methods, the nQTL framework can efficiently capture associations between SNPs and network traits (i.e., edge traits) in various simulated data scenarios. Furthermore, we have carried out nQTL analysis on diverse biological and biomedical datasets. Our analysis is effective in detecting network traits for various biological problems and can discover many network signatures for discriminating phenotypes. These signatures can help interpret the influence of nQTL on disease subtyping, disease prognosis, drug response, and pathogen factor association.
In particular, unlike conventional approaches, the nQTL framework could also identify many network traits from human bulk expression data, which were validated by matched single-cell RNA-seq data in an independent or unsupervised manner. All these results strongly support that nQTL and its detection framework can simultaneously explore the global genotype–network–phenotype associations and the underlying network traits or network signatures with functional impact and importance.
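To make the genotype → network → phenotype cascade concrete, here is a minimal sketch of scoring one edge trait and testing it against a SNP. It assumes a ΔPCC-style single-sample network. In that construction, a sample's score for a gene pair is the change in Pearson correlation when the sample is added to a reference set, divided by (1 − r²)/(n − 1). The nQTL framework's actual single-sample construction, association test, and thresholds may differ. The function names (`ssn_edge_score`, `edge_trait_association`) and the toy data are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ssn_edge_score(ref_g1, ref_g2, s1, s2):
    """Single-sample score for the edge (gene1, gene2), assuming a delta-PCC construction.

    ref_g1, ref_g2: expression of the two genes across n reference samples.
    s1, s2: expression of the same genes in the sample being scored.
    """
    n = len(ref_g1)
    r_ref = pearson(ref_g1, ref_g2)
    r_plus = pearson(ref_g1 + [s1], ref_g2 + [s2])  # reference plus this sample
    delta = r_plus - r_ref
    return delta / ((1 - r_ref ** 2) / (n - 1))  # normalized edge score


def edge_trait_association(ref_g1, ref_g2, samples, dosages):
    """Correlate per-sample edge scores with SNP allele dosage (0/1/2)."""
    scores = [ssn_edge_score(ref_g1, ref_g2, s1, s2) for s1, s2 in samples]
    return scores, pearson(scores, dosages)


# Toy reference set: the two genes are positively co-expressed (r = 0.8).
ref_g1 = [1.0, 2.0, 3.0, 4.0, 5.0]
ref_g2 = [2.0, 1.0, 4.0, 3.0, 5.0]
# Hypothetical samples: dosage 0 breaks the co-expression, dosage 1 sits at the
# reference mean, dosage 2 reinforces the co-expression.
samples = [(6.0, 0.0), (3.0, 3.0), (6.0, 6.0)]
dosages = [0, 1, 2]

scores, assoc = edge_trait_association(ref_g1, ref_g2, samples, dosages)
print(scores, assoc)
```

A positive `assoc` means that samples carrying more copies of the allele tend to strengthen this gene pair's co-expression relative to the reference. This is the kind of SNP–edge association an nQTL scan would test across SNP and gene-pair combinations.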
Affiliation(s)
- Kai Yuan
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- Tao Zeng
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
  - Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - Guangzhou Laboratory, Guangzhou, China
- Luonan Chen
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
*Correspondence: Tao Zeng, ; Luonan Chen,
3
Liu Y, Baggerly KA, Orouji E, Manyam G, Chen H, Lam M, Davis JS, Lee MS, Broom BM, Menter DG, Rai K, Kopetz S, Morris JS. Methylation-eQTL Analysis in Cancer Research. Bioinformatics 2021; 37:4014-4022. [PMID: 34117863; PMCID: PMC9188481; DOI: 10.1093/bioinformatics/btab443] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2020] [Revised: 03/15/2021] [Accepted: 06/11/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION: DNA methylation is a key epigenetic factor regulating gene expression. While promoter methylation has been well studied, recent publications have revealed that functionally important methylation also occurs in intergenic and distal regions, and varies across genes and tissue types. Given the growing importance of inter-platform integrative genomic analyses, there is an urgent need to develop methods to discover and characterize gene-level relationships between methylation and expression.
RESULTS: We introduce a novel sequential penalized regression approach to identify methylation-expression quantitative trait loci (methyl-eQTLs). We coined this term to represent, for each gene and tissue type, a sparse set of CpG loci that best explains gene expression, together with weights indicating the direction and strength of association. Using TCGA and MD Anderson colorectal cohorts to build and validate our models, we demonstrate that our strategy explains expression variability better than the gene-level methylation summaries in common use. The methyl-eQTLs identified by our approach can be used to construct gene-level methylation summaries that are maximally correlated with gene expression for use in integrative models. They can also produce a tissue-specific summary of which genes appear to be strongly regulated by methylation. Our results provide an important resource to the biomedical community for integrative genomics analyses involving DNA methylation.
AVAILABILITY AND IMPLEMENTATION: We produce an R Shiny app (https://rstudio-prd-c1.pmacs.upenn.edu/methyl-eQTL/) that interactively presents methyl-eQTL results for colorectal, breast, and pancreatic cancer. The source R code for this work is provided in the supplement.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
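The idea of a sparse set of CpG loci with signed weights can be sketched with an ordinary Lasso fit by cyclic coordinate descent. This is a stand-in, not the authors' sequential penalized regression. The penalty `lam`, the helper names (`soft_threshold`, `lasso_cd`), and the toy methylation values are hypothetical, and a real analysis would typically choose `lam` by cross-validation.

```python
def soft_threshold(z, t):
    """Lasso shrinkage operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: n samples x p CpG loci (e.g. methylation beta values); y: expression.
    Columns and y are centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    r = [v - y_mean for v in y]  # residual while all weights are zero
    b = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant CpG carries no information
            # correlation of CpG j with the partial residual (b_j added back)
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            step = new_bj - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * step  # keep residual in sync with b
            b[j] = new_bj
    return b


# Toy data: rows are samples, columns are three CpG loci.
X = [
    [0.1, 0.2, 0.5],
    [0.9, 0.2, 0.3],
    [0.1, 0.8, 0.3],
    [0.9, 0.8, 0.5],
]
# Expression driven negatively by CpG 0 and positively by CpG 1; CpG 2 is unrelated.
y = [5.0 - 2.0 * row[0] + 1.0 * row[1] for row in X]

weights = lasso_cd(X, y, lam=0.01)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(weights, selected)
```

The nonzero entries of `weights` play the role of the sparse CpG set, and their signs give the direction of association. A gene-level methylation summary could then be formed as the weighted sum of a sample's CpG values.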
Affiliation(s)
- Yusha Liu
  - Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
- Keith A Baggerly
  - Department of Bioinformatics and Computational Biology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Elias Orouji
  - Department of Genomic Medicine, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Ganiraju Manyam
  - Department of Bioinformatics and Computational Biology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Huiqin Chen
  - Department of Bioinformatics and Computational Biology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Michael Lam
  - Department of Gastrointestinal Medical Oncology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Jennifer S Davis
  - Department of Epidemiology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Michael S Lee
  - Department of Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Bradley M Broom
  - Department of Bioinformatics and Computational Biology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- David G Menter
  - Department of Gastrointestinal Medical Oncology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Kunal Rai
  - Department of Genomic Medicine, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Scott Kopetz
  - Department of Gastrointestinal Medical Oncology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Jeffrey S Morris
  - Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA 19104-6021, USA
4
Nariai N, Greenwald WW, DeBoever C, Li H, Frazer KA. Efficient Prioritization of Multiple Causal eQTL Variants via Sparse Polygenic Modeling. Genetics 2017; 207:1301-1312. [PMID: 29074555; PMCID: PMC5714449; DOI: 10.1534/genetics.117.300435] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/09/2017] [Accepted: 10/13/2017] [Indexed: 11/18/2022]
Abstract
Expression quantitative trait loci (eQTL) studies have typically used single-variant association analysis to identify genetic variants correlated with gene expression. However, this approach has several drawbacks: causal variants cannot be distinguished from nonfunctional variants in strong linkage disequilibrium, combined effects from multiple causal variants cannot be captured, and low-frequency (<5% MAF) eQTL variants are difficult to identify. Sparse polygenic models, which associate multiple genetic variants with gene expression simultaneously, could possibly overcome these issues, but their predictive performance for eQTL studies has not been evaluated. Here, we assessed the ability of three sparse polygenic models (Lasso, Elastic Net, and BSLMM) to identify causal variants, and compared their efficacy with single-variant association analysis and a fine-mapping model. Using simulated data, we found that these methods performed similarly when one causal SNP was present at a gene. When multiple causal eQTL variants were present, however, BSLMM substantially outperformed single-variant association analysis for prioritizing causal eQTL variants (1.6- to 5.2-fold higher recall at 20% precision), and identified up to 2.3-fold more low-frequency variants as the top eQTL SNP. Analysis of real RNA-seq and whole-genome sequencing data from 131 iPSC samples showed that the eQTL SNPs identified by BSLMM had higher functional enrichment in DHS sites and were more often low-frequency than those identified by single-variant association analysis. Our study showed that BSLMM is a more effective approach than single-variant association analysis for prioritizing multiple causal eQTL variants at a single gene.
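The first drawback named above is that single-variant analysis cannot separate a causal variant from a nonfunctional variant in strong linkage disequilibrium. A minimal marginal scan shows why. The sketch ranks SNPs by the r² of a per-SNP linear regression of expression on allele dosage. The SNP names and toy genotypes are hypothetical, and no covariate or multiple-testing adjustment is attempted.

```python
from statistics import mean


def marginal_r2(g, y):
    """r^2 of a simple linear regression of expression y on dosage g (0/1/2)."""
    mg, my = mean(g), mean(y)
    sgy = sum((a - mg) * (b - my) for a, b in zip(g, y))
    sgg = sum((a - mg) ** 2 for a in g)
    syy = sum((b - my) ** 2 for b in y)
    return sgy * sgy / (sgg * syy)


def rank_single_variant(genotypes, y):
    """Score each SNP independently and sort from strongest to weakest."""
    scores = {snp: marginal_r2(g, y) for snp, g in genotypes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


genotypes = {
    "snp_causal": [0, 1, 2, 0, 1, 2],
    # identical to snp_causal (perfect LD); assumed to have no effect of its own
    "snp_tag": [0, 1, 2, 0, 1, 2],
    "snp_other": [2, 1, 0, 1, 0, 2],
}
expression = [1.0, 1.6, 2.0, 1.1, 1.4, 2.1]

ranked = rank_single_variant(genotypes, expression)
for snp, r2 in ranked:
    print(snp, round(r2, 3))
```

Both linked SNPs receive exactly the same score, so the marginal scan gives no basis for preferring the causal one. Joint models such as the Lasso, Elastic Net, and BSLMM estimate all variants' effects simultaneously, which lets them account for several causal variants at once. No method can separate variants in perfect LD from the genotype data alone.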
Affiliation(s)
- Naoki Nariai
  - Department of Pediatrics and Rady Children's Hospital, University of California, San Diego, La Jolla, California 92093-0761
- William W Greenwald
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, California 92093-0761
- Christopher DeBoever
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, California 92093-0761
- He Li
  - Institute for Genomic Medicine, University of California, San Diego, La Jolla, California 92093-0761
- Kelly A Frazer
  - Department of Pediatrics and Rady Children's Hospital, University of California, San Diego, La Jolla, California 92093-0761
  - Institute for Genomic Medicine, University of California, San Diego, La Jolla, California 92093-0761