1
|
Little P, Liu S, Zhabotynsky V, Li Y, Lin DY, Sun W. A computational method for cell type-specific expression quantitative trait loci mapping using bulk RNA-seq data. Nat Commun 2023; 14:3030. [PMID: 37231002 PMCID: PMC10212972 DOI: 10.1038/s41467-023-38795-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2022] [Accepted: 05/16/2023] [Indexed: 05/27/2023] Open
Abstract
Mapping cell type-specific gene expression quantitative trait loci (ct-eQTLs) is a powerful way to investigate the genetic basis of complex traits. A popular method for ct-eQTL mapping is to assess the interaction between the genotype of a genetic locus and the abundance of a specific cell type using a linear model. However, this approach requires transforming RNA-seq count data, which distorts the relation between gene expression and cell type proportions and results in reduced power and/or inflated type I error. To address this issue, we have developed a statistical method called CSeQTL that allows for ct-eQTL mapping using bulk RNA-seq count data while taking advantage of allele-specific expression. We validated the results of CSeQTL through simulations and real data analysis, comparing CSeQTL results to those obtained from purified bulk RNA-seq data or single cell RNA-seq data. Using our ct-eQTL findings, we were able to identify cell types relevant to 21 categories of human traits.
Collapse
Affiliation(s)
- Paul Little
- Biostatistics Program, Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, USA.
| | - Si Liu
- Biostatistics Program, Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Vasyl Zhabotynsky
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Dan-Yu Lin
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Wei Sun
- Biostatistics Program, Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, USA.
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
| |
Collapse
|
2
|
Abstract
Despite recent flourish of proposals on variable selection, genome-wide multiple loci mapping remains to be challenging. The majority of existing variable selection methods impose a model, and often the homoscedastic linear model, prior to selection. However, the true association between the phenotypical trait and the genetic markers is rarely known a priori, and the presence of epistatic interactions makes the association more complex than a linear relation. Model-free variable selection offers a useful alternative in this context, but the fact that the number of markers p often far exceeds the number of experimental units n renders all the existing model-free solutions that require n > p inapplicable. In this article, we examine a number of model-free variable selection methods for small-n-large-p regressions in the context of genome-wide multiple loci mapping. We propose and advocate a multivariate group-wise adaptive penalization solution, which requires no model prespecification and thus works for complex trait-marker association, and handles one variable at a time so that works for n < p. Effectiveness of the new method is demonstrated through both intensive simulations and a comprehensive real data analysis across 6100 gene expression traits.
Collapse
Affiliation(s)
- Wei Sun
- Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC, U.S.A
| | - Lexin Li
- Department of Statistics, North Carolina State University, Raleigh, NC, U.S.A
| |
Collapse
|
3
|
Rapid and robust resampling-based multiple-testing correction with application in a genome-wide expression quantitative trait loci study. Genetics 2012; 190:1511-20. [PMID: 22298711 DOI: 10.1534/genetics.111.137737] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. In a typical eQTL study, the huge number of genetic markers and expression traits and their complicated correlations present a challenging multiple-testing correction problem. The resampling-based test using permutation or bootstrap procedures is a standard approach to address the multiple-testing problem in eQTL studies. A brute force application of the resampling-based test to large-scale eQTL data sets is often computationally infeasible. Several computationally efficient methods have been proposed to calculate approximate resampling-based P-values. However, these methods rely on certain assumptions about the correlation structure of the genetic markers, which may not be valid for certain studies. We propose a novel algorithm, rapid and exact multiple testing correction by resampling (REM), to address this challenge. REM calculates the exact resampling-based P-values in a computationally efficient manner. The computational advantage of REM lies in its strategy of pruning the search space by skipping genetic markers whose upper bounds on test statistics are small. REM does not rely on any assumption about the correlation structure of the genetic markers. It can be applied to a variety of resampling-based multiple-testing correction methods including permutation and bootstrap methods. We evaluate REM on three eQTL data sets (yeast, inbred mouse, and human rare variants) and show that it achieves accurate resampling-based P-value estimation with much less computational cost than existing methods. The software is available at http://csbio.unc.edu/eQTL.
Collapse
|