1
Gao C, Wei H, Zhang K. LORSEN: Fast and Efficient eQTL Mapping With Low Rank Penalized Regression. Front Genet 2021; 12:690926. [PMID: 34868194; PMCID: PMC8636089; DOI: 10.3389/fgene.2021.690926] [Received: 04/04/2021] [Accepted: 10/08/2021]
Abstract
Characterizing genetic variants that are associated with gene expression levels is essential for understanding the cellular mechanisms that underlie complex human traits. Expression quantitative trait loci (eQTL) mapping attempts to identify genetic variants, such as single nucleotide polymorphisms (SNPs), that affect the expression of one or more genes. With large volumes of gene expression data now available, fast and efficient statistical and computational methods are needed to perform eQTL mapping at this scale. In this paper, we propose a new method, the low rank penalized regression method (LORSEN), for eQTL mapping. We evaluated LORSEN against two commonly used eQTL mapping methods, LORS and FastLORS, using extensive simulations and real data from the HapMap3 project. In simulations, LORSEN outperformed both methods in many scenarios in terms of area under the curve (AUC). We illustrate the usefulness of our method by applying it to SNP data and gene expression levels on four chromosomes from the HapMap3 project.
Affiliation(s)
- Cheng Gao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Hairong Wei
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, United States
- Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
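The LORSEN abstract does not spell out the model, so the following is only a hedged illustration of the sparse-plus-low-rank regression that, as we understand it, LORS-type methods build on. It is not LORSEN itself. Expression Y (samples x genes) is modelled as X B + L + noise. Here X is the SNP matrix, B is a sparse matrix of SNP effects (ℓ1 penalty `lam`), and L is a low-rank matrix of hidden factors (nuclear-norm penalty `rho`). The function names, penalty weights, and alternating proximal scheme are our assumptions, and the exact penalty used by LORSEN may differ.

```python
import numpy as np


def soft_threshold(A, t):
    """Elementwise soft-thresholding: proximal operator of t * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def singular_value_threshold(A, t):
    """Singular value thresholding: proximal operator of t * ||A||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt


def objective(Y, X, B, L, lam, rho):
    """0.5 * ||Y - XB - L||_F^2 + lam * ||B||_1 + rho * ||L||_*."""
    R = Y - X @ B - L
    nuclear = np.linalg.svd(L, compute_uv=False).sum()
    return 0.5 * np.sum(R ** 2) + lam * np.abs(B).sum() + rho * nuclear


def sparse_plus_low_rank(Y, X, lam, rho, n_outer=50, n_inner=10):
    """Alternating minimisation for Y ~ X @ B + L, with B sparse (SNP effects) and L low rank."""
    B = np.zeros((X.shape[1], Y.shape[1]))
    L = np.zeros_like(Y, dtype=float)
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth term in B
    history = [objective(Y, X, B, L, lam, rho)]
    for _ in range(n_outer):
        # B-step: proximal-gradient (ISTA) iterations on the multi-response lasso, L fixed.
        for _ in range(n_inner):
            grad = X.T @ (X @ B + L - Y)
            B = soft_threshold(B - step * grad, step * lam)
        # L-step: exact minimiser over L with B fixed.
        L = singular_value_threshold(Y - X @ B, rho)
        history.append(objective(Y, X, B, L, lam, rho))
    return B, L, history
```

Each outer iteration runs a few ISTA steps for B and then an exact singular-value-thresholding step for L, so the recorded objective should not increase. In practice `lam` and `rho` would have to be tuned, for example by cross-validation.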
2
Zhang Q. High-Dimensional Mediation Analysis with Applications to Causal Gene Identification. Stat Biosci 2021. [DOI: 10.1007/s12561-021-09328-0]
3
Statistical Methods for Latent Class Quantitative Trait Loci Mapping. Genetics 2017; 206:1309-1317. [PMID: 28550015; PMCID: PMC5500132; DOI: 10.1534/genetics.117.203885] [Received: 05/17/2017] [Accepted: 05/18/2017]
Abstract
Identifying the genetic basis of complex traits is an important problem with the potential to impact a broad range of biological endeavors. A number of effective statistical methods are available for quantitative trait loci (QTL) mapping that allow for the efficient identification of multiple, potentially interacting, loci under a variety of experimental conditions. Although proven useful in hundreds of studies, most of these methods assume a single model common to all subjects, which may reduce power and accuracy when genetically distinct subclasses exist. To address this, we have developed an approach to enable latent class QTL mapping. The approach combines latent class regression with stepwise variable selection and traditional QTL mapping to estimate the number of subclasses in a population and to identify the genetic model that best describes each subclass. Simulations demonstrate good performance of the method both when latent classes are present and when they are not, with accurate estimation of QTL. Application of the method to case studies of obesity and diabetes in mouse gives insight into the genetic basis of related complex traits.
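The following is a hedged sketch of the latent class regression component only. It fits a finite mixture of linear regressions by EM: each class has its own coefficients and residual variance, and the E-step computes posterior class memberships. The number of classes and the covariates are fixed here. The stepwise variable selection and the estimation of the number of subclasses described in the abstract are not shown, and the function name is hypothetical.

```python
import numpy as np


def fit_latent_class_regression(X, y, n_classes=2, n_iter=200):
    """EM for a finite mixture of linear regressions (latent class regression).

    X: n x p design matrix (include an intercept column); y: length-n trait vector.
    Returns class proportions, per-class coefficients and variances, posterior class
    probabilities, and the final log-likelihood.
    """
    n, p = X.shape
    # Initialise with a hard split into equal-sized groups ordered by OLS residual.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    order = np.argsort(y - X @ beta_ols)
    resp = np.zeros((n, n_classes))
    for k, idx in enumerate(np.array_split(order, n_classes)):
        resp[idx, k] = 1.0
    for _ in range(n_iter):
        # M-step: mixing proportions, weighted least squares, weighted residual variances.
        pis = resp.mean(axis=0)
        betas = np.empty((n_classes, p))
        sigma2 = np.empty(n_classes)
        for k in range(n_classes):
            w = resp[:, k]
            Xw = X * w[:, None]
            betas[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            sigma2[k] = w @ (y - X @ betas[k]) ** 2 / w.sum()
        # E-step: posterior class probabilities, computed on the log scale for stability.
        resid = y[:, None] - X @ betas.T
        log_dens = (np.log(pis) - 0.5 * np.log(2 * np.pi * sigma2)
                    - resid ** 2 / (2 * sigma2))
        log_norm = np.logaddexp.reduce(log_dens, axis=1)
        resp = np.exp(log_dens - log_norm[:, None])
    return pis, betas, sigma2, resp, float(log_norm.sum())
```

The residual-ordered hard split is just one simple initialization. EM only finds a local optimum, so restarts from several starting values are common.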
4
Cheng R, Doerge RW, Borevitz J. Novel Resampling Improves Statistical Power for Multiple-Trait QTL Mapping. G3 (Bethesda) 2017; 7:813-822. [PMID: 28064191; PMCID: PMC5345711; DOI: 10.1534/g3.116.037531] [Received: 11/16/2016] [Accepted: 12/29/2016]
Abstract
Multiple-trait analysis typically employs models that associate a quantitative trait locus (QTL) with all of the traits. As a result, statistical power for QTL detection may not be optimal if the QTL contributes to the phenotypic variation in only a small proportion of the traits. Excluding QTL effects that contribute little to the test statistic can improve statistical power. In this article, we show that optimal power can be achieved when the number of QTL effects is best estimated, and that a stringent criterion for QTL effect selection may improve power when the number of QTL effects is small but can reduce power otherwise. We investigate strategies for excluding trivial QTL effects and propose a method that improves statistical power when the number of QTL effects is relatively small and largely maintains power when the number of QTL effects is large. The proposed method first uses resampling techniques to determine the number of nontrivial QTL effects and then selects QTL effects by backward elimination for the significance test. We also propose a method for testing the QTL-trait associations needed for biological interpretation in applications. We validate our methods using simulations and Arabidopsis thaliana transcript data.
Affiliation(s)
- Riyan Cheng
- Research School of Biology, The Australian National University, Acton, Australian Capital Territory 2601, Australia, ARC Center of Excellence in Plant Energy Biology, The Australian National University, Acton, ACT 2601, Australia
- R W Doerge
- Department of Statistics, Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213
- Justin Borevitz
- Research School of Biology, The Australian National University, Acton, Australian Capital Territory 2601, Australia, ARC Center of Excellence in Plant Energy Biology, The Australian National University, Acton, ACT 2601, Australia
5
Yuan H, Li Z, Tang NLS, Deng M. A network based covariance test for detecting multivariate eQTL in Saccharomyces cerevisiae. BMC Syst Biol 2016; 10 Suppl 1:8. [PMID: 26818242; PMCID: PMC4895706; DOI: 10.1186/s12918-015-0245-0]
Abstract
Background Expression quantitative trait locus (eQTL) analysis has been widely used to understand how genetic variation affects gene expression in biological systems. Traditionally, eQTL are investigated pairwise, with one SNP affecting the expression of one gene. In this way, some associated markers found in GWAS have been linked to disease mechanisms through eQTL studies. However, in real life, biological processes are usually carried out by groups of genes. Although some methods have been proposed to identify groups of SNPs that affect the mean expression of genes in a network, changes in co-expression patterns have not been considered. We therefore propose a procedure and algorithm to identify markers that affect the co-expression pattern of a pathway. Because two genes may have different correlations under different isoforms, which is hard to detect with a linear test, we also consider a nonlinear test. Results When we applied our method to a yeast eQTL dataset profiled under both glucose and ethanol conditions, we identified a total of 166 modules, each consisting of a group of genes and one eQTL that regulates the co-expression pattern of that group. We found that many of these modules have biological significance. Conclusions We propose a network-based covariance test to identify SNPs that affect the structure of a pathway. We also consider a nonlinear test, since two genes may have different correlations under different isoforms, which is hard to detect with a linear test. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0245-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Huili Yuan
- LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China.
- Zhenye Li
- LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China.
- Nelson L S Tang
- Department of Chemical Pathology, Prince of Wales Hospital, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong, China.
- Minghua Deng
- LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China.
- Center for Quantitative Biology, Peking University, Yiheyuan Road, Beijing, 100871, China.
- Center for Statistical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China.
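The abstract does not give the form of the network-based covariance test. The sketch below therefore swaps in a much simpler, standard statistic to illustrate the underlying idea of a SNP that changes co-expression: a Fisher z-test comparing the correlation of two genes between two genotype groups. It is a linear, pairwise test only, unlike the pathway-level and nonlinear tests the authors describe. The function name and arguments are hypothetical.

```python
import math
import numpy as np


def differential_correlation_test(x, y, genotype, group_a=0, group_b=1):
    """Two-sided Fisher z-test of whether corr(x, y) differs between two genotype groups.

    x, y: expression of two genes across samples; genotype: per-sample genotype labels.
    """
    x, y, genotype = (np.asarray(v) for v in (x, y, genotype))
    z_scores, sizes = [], []
    for group in (group_a, group_b):
        mask = genotype == group
        r = np.corrcoef(x[mask], y[mask])[0, 1]
        z_scores.append(np.arctanh(r))  # Fisher transform: approx. normal, variance 1/(n - 3)
        sizes.append(int(mask.sum()))
    se = math.sqrt(1.0 / (sizes[0] - 3) + 1.0 / (sizes[1] - 3))
    z = float((z_scores[0] - z_scores[1]) / se)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

Scanning this over all SNPs and gene pairs would multiply the number of tests considerably, which is one reason module-level tests such as the authors' are attractive.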
6
Wang F, Meyer NJ, Walley KR, Russell JA, Feng R. Causal Genetic Inference Using Haplotypes as Instrumental Variables. Genet Epidemiol 2015; 40:35-44. [PMID: 26625855; DOI: 10.1002/gepi.21940] [Received: 04/14/2015] [Revised: 08/15/2015] [Accepted: 09/19/2015]
Abstract
In genomic studies with both genotypes and gene or protein expression profiles available, causal effects of a gene or protein on clinical outcomes can be inferred by using genetic variants as instrumental variables (IVs). The goal of introducing IVs is to remove the effects of unobserved factors that may confound the relationship between the biomarkers and the outcome. A valid inference under the IV framework requires pairwise associations and pathway exclusivity. Among these assumptions, the IV-expression association needs to be strong for the causal effect estimates to be unbiased. However, a small number of single nucleotide polymorphisms (SNPs) often explain only a limited part of the variability in gene or protein expression and can serve only as weak IVs. In this study, we propose to replace SNPs with haplotypes as IVs to increase the variant-expression association and thus improve causal effect inference for the expression. Within the classical two-stage procedure, we developed a haplotype regression model combined with a model selection procedure to identify optimal instruments. The performance of the new method was evaluated through simulations and compared with IV approaches using multiple observed SNPs. Our results showed a gain in power to detect a causal effect of a gene or protein on the outcome when using haplotypes compared with using only observed SNPs, under both complete and missing genotype scenarios. We applied our proposed method to a study of the effect of interleukin-1 beta (IL-1β) protein expression on 90-day survival following sepsis and found that overexpressed IL-1β is likely to increase mortality.
Affiliation(s)
- Fan Wang
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Nuala J Meyer
- Center for Translational Lung Biology, Pulmonary, Allergy, and Critical Care Division, Perelman School of Medicine University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Keith R Walley
- Center for Heart Lung Innovation, University of British Columbia, Vancouver, British Columbia, Canada
- James A Russell
- Center for Heart Lung Innovation, University of British Columbia, Vancouver, British Columbia, Canada
- Rui Feng
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
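The classical two-stage procedure referred to in the Wang et al. abstract is two-stage least squares. The sketch below is minimal and makes an assumption: the instruments are supplied as a numeric matrix, for example SNP dosages or, in the authors' proposal, haplotype dosages. It omits the haplotype regression and model selection steps, and the function name is hypothetical.

```python
import numpy as np


def two_stage_least_squares(y, x, Z):
    """Point estimate of the causal effect of exposure x on outcome y, using the
    columns of Z (e.g. SNP or haplotype dosages) as instruments."""
    n = len(y)
    Z1 = np.column_stack([np.ones(n), Z])
    # Stage 1: regress the exposure (gene or protein expression) on the instruments.
    gamma, *_ = np.linalg.lstsq(Z1, x, rcond=None)
    x_hat = Z1 @ gamma
    # Stage 2: regress the outcome on the instrument-predicted exposure.
    X2 = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta[1]
```

Only the point estimate is returned, because the naive standard errors from the second-stage regression are not valid. With weak instruments, the 2SLS estimate is biased toward the confounded ordinary least squares estimate, which is the problem the authors address by using haplotypes.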
7
The genetic basis of obesity-associated type 2 diabetes (diabesity) in polygenic mouse models. Mamm Genome 2014; 25:401-12. [PMID: 24752583; PMCID: PMC4164836; DOI: 10.1007/s00335-014-9514-2] [Received: 02/06/2014] [Accepted: 03/25/2014]
Abstract
Obesity-associated diabetes (“diabesity”) in mouse strains is characterized by severe insulin resistance, hyperglycaemia, and progressive failure and loss of beta cells. This condition is observed in inbred obese mouse strains such as the New Zealand Obese (NZO/HlLt and NZO/HlBomDife) or the TALLYHO/JngJ mouse. In lean strains such as C57BLKS/J, BTBR T+tf/J or DBA/2J that carry diabetes susceptibility genes (“diabetes-susceptible” background), it can be induced by introgression of the obesity-causing mutations Lep<ob> (ob) or Lepr<db> (db). Outcross populations of these models have been used in genome-wide searches for mouse diabetes genes and have led to positional cloning of the strong candidates Pctp, Tbc1d1, Zfp69, and Ifi202b (NZO-derived obesity) and Sorcs1, Lisch-like, Tomosyn-2, App, Tsc2, and Ube2l6 (obesity caused by the ob or db mutation). Some of these genes have been shown to play a role in regulating human glucose or lipid metabolism. Thus, dissecting the genetic basis of obesity and diabetes in mouse models can identify regulatory mechanisms that are relevant to the human disease.
8
Wang Z, Xu J, Shi X. Finding alternative expression quantitative trait loci by exploring sparse model space. J Comput Biol 2014; 21:385-93. [PMID: 24689773; DOI: 10.1089/cmb.2014.0026]
Abstract
Sparse modeling, a feature selection method widely used in the machine-learning community, has recently been applied to identify associations in genetic studies, including expression quantitative trait locus (eQTL) mapping. These genetic studies usually involve high-dimensional data in which the number of features is much larger than the number of samples. Because of this high dimensionality, optimizing a sparse model can yield multiple solutions. In such situations, a single optimization result provides only an incomplete view of the data and lacks power to find alternative features associated with the same trait. In this article, we propose a novel method aimed at detecting alternative eQTLs, where two genetic variants have alternative relationships regarding their associations with the expression of a particular gene. Our method accomplishes this goal by exploring multiple solutions sampled from the solution space. We justified our method theoretically and demonstrated its usage on simulated data. We then applied our method to a real eQTL dataset and identified a set of alternative eQTLs with potential biological insights. Additionally, these alternative eQTLs suggest a network view of gene regulation.
Affiliation(s)
- Zhiyong Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois
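The premise of the Wang et al. abstract, that a sparse model can have several optimal solutions, is easy to see with the lasso. In the sketch below, two SNPs in perfect linkage disequilibrium have identical genotype columns. Cyclic coordinate descent puts all the weight on whichever column it visits first, and relabelling the columns yields a different solution with the same objective value. The helper names are hypothetical, and this is not the authors' sampling method.

```python
import numpy as np


def lasso_objective(X, y, b, lam):
    """0.5 * ||y - X b||^2 + lam * ||b||_1."""
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.abs(b).sum()


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = np.asarray(y, dtype=float).copy()  # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # partial residual with feature j removed
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b
```

With duplicated columns, any split of the total effect between the two SNPs is optimal, so a single fitted model says nothing about which variant is the "real" eQTL.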
9
Peng CH, Jiang YZ, Tai AS, Liu CB, Peng SC, Liao CT, Yen TC, Hsieh WP. Causal inference of gene regulation with subnetwork assembly from genetical genomics data. Nucleic Acids Res 2013; 42:2803-19. [PMID: 24322297; PMCID: PMC3950678; DOI: 10.1093/nar/gkt1277]
Abstract
Deciphering the causal networks of gene interactions is critical for identifying disease pathways and disease-causing genes. We introduce a method to reconstruct causal networks based on exploring phenotype-specific modules in the human interactome and including the expression quantitative trait loci (eQTLs) that underlie the joint expression variation of each module. Closely associated eQTLs help anchor the orientation of the network. To overcome the inherent computational complexity of causal network reconstruction, we first deduce the local causality of individual subnetworks using the selected eQTLs and module transcripts. These subnetworks are then integrated to infer a global causal network using a random-field ranking method, which was motivated by animal sociology. We demonstrate how effectively the inferred causality restores the regulatory structure of the networks that mediate lymph node metastasis in oral cancer. Network rewiring clearly characterizes the dynamic regulatory systems of distinct disease states. This study is the first to associate an RXRB-causal network with increased risks of nodal metastasis, tumor relapse, distant metastases and poor survival for oral cancer. Thus, identifying crucial upstream drivers of a signal cascade can facilitate the discovery of potential biomarkers and effective therapeutic targets.
Affiliation(s)
- Chien-Hua Peng
- Departments of Resource Center for Clinical Research, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China, Institute of Statistics, National Tsing Hua University, Hsinchu 30013, Taiwan, Republic of China, Nuclear Medicine and Molecular Imaging Center, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China and Department of Otorhinolaryngology, Head and Neck Surgery, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China
10
Huang Y, Wuchty S, Przytycka TM. eQTL Epistasis - Challenges and Computational Approaches. Front Genet 2013; 4:51. [PMID: 23755066; PMCID: PMC3668133; DOI: 10.3389/fgene.2013.00051] [Received: 06/06/2012] [Accepted: 03/19/2013]
Abstract
The determination of expression quantitative trait loci (eQTL) epistasis, a form of functional interaction between genetic loci that affect gene expression, is an important step toward a thorough understanding of gene regulation. Since gene expression has emerged as an “intermediate” molecular phenotype, eQTL epistasis might help to explain the relationship between genotype and higher-level organismal phenotypes such as diseases. A characteristic feature of eQTL analysis is the large number of tests required to identify associations between gene expression and genetic loci variability. This problem is aggravated when epistatic effects between eQTLs are analyzed. In this review, we discuss recent algorithmic approaches for the detection of eQTL epistasis and highlight lessons that can be learned from current methods.
Affiliation(s)
- Yang Huang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
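To make the testing burden in the epistasis review concrete: an exhaustive two-locus scan over m SNPs needs m(m - 1)/2 pair tests per gene, so 1,000 SNPs already give 499,500 tests for a single transcript. The sketch below shows this count and the baseline per-pair test, a linear model with a SNP x SNP product term. It assumes additive-by-additive coding. It is not one of the specific algorithms reviewed, and the function names are hypothetical.

```python
import math
import numpy as np


def interaction_t_statistic(expr, snp_a, snp_b):
    """t statistic for the product term in expr ~ 1 + a + b + a*b (additive-by-additive epistasis)."""
    n = len(expr)
    X = np.column_stack([np.ones(n), snp_a, snp_b, snp_a * snp_b])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)  # OLS coefficient covariance
    return float(beta[3] / math.sqrt(cov[3, 3]))


def n_pairwise_tests(n_snps, n_genes):
    """Number of (SNP pair, gene) tests in an exhaustive two-locus scan."""
    return n_genes * n_snps * (n_snps - 1) // 2
```

Multiplying the pair count by the number of transcripts shows why the reviewed methods rely on filtering, pruning, or approximate search rather than testing every pair.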
11
Wright FA, Shabalin AA, Rusyn I. Computational tools for discovery and interpretation of expression quantitative trait loci. Pharmacogenomics 2012; 13:343-52. [PMID: 22304583; DOI: 10.2217/pgs.11.185]
Abstract
Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators interested in discovering eQTL, or in using them to interpret genome-wide association study results, may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation.
Affiliation(s)
- Fred A Wright
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
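The following sketch illustrates why dedicated eQTL tools can be fast. It computes single-SNP linear-regression t statistics for every SNP-gene pair with one matrix product of standardized data. It relies on the identity t = r * sqrt((n - 2) / (1 - r^2)) between the Pearson correlation r and the simple-regression t statistic. This is a hedged sketch of the general technique, not the code of any tool covered by the review. It omits covariates and p-values, and the function name is hypothetical.

```python
import numpy as np


def all_pairs_t_statistics(genotypes, expression):
    """Simple-linear-regression t statistics for every (SNP, gene) pair at once.

    genotypes: n x m array (samples x SNPs); expression: n x g array (samples x genes).
    Returns an m x g array. Assumes no SNP or gene is constant across samples.
    """
    def standardize(A):
        A = np.asarray(A, dtype=float)
        A = A - A.mean(axis=0)
        return A / np.linalg.norm(A, axis=0)

    n = np.shape(genotypes)[0]
    r = standardize(genotypes).T @ standardize(expression)  # Pearson correlations
    # For simple regression with an intercept, t = r * sqrt((n - 2) / (1 - r^2)).
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))
```

Because the work is a single dense matrix product, it can be processed in blocks of SNPs and genes to keep memory bounded on genome-scale data.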
12
Kloosterman B, Anithakumari AM, Chibon PY, Oortwijn M, van der Linden GC, Visser RGF, Bachem CWB. Organ specificity and transcriptional control of metabolic routes revealed by expression QTL profiling of source-sink tissues in a segregating potato population. BMC Plant Biol 2012; 12:17. [PMID: 22313736; PMCID: PMC3546430; DOI: 10.1186/1471-2229-12-17] [Received: 07/14/2011] [Accepted: 02/07/2012]
Abstract
BACKGROUND With genome sequences now completed for some of the major crop plants, new challenges arise in using these data for crop improvement and increased food security. The field of genetical genomics has the potential to identify genes displaying heritable differential expression associated with important phenotypic traits. Here we describe the identification of expression QTLs (eQTLs) in two different tissues of a segregating potato population and query the potato genome sequence to differentiate between cis- and trans-acting eQTLs in relation to gene subfunctionalization. RESULTS Leaf and tuber samples were analysed and screened for the presence of conserved and tissue-dependent eQTLs. Expression QTLs present in both tissues are predominantly cis-acting, whilst for tissue-specific QTLs the percentage of trans-acting QTLs increases. Tissue-dependent eQTLs were assigned to functional classes and visualized in metabolic pathways. We identified a potential regulatory network on chromosome 10 involving genes crucial for maintaining circadian rhythms and controlling clock output genes. In addition, we show that the type of genetic material screened and the sampling strategy applied can have a large impact on the output of genetical genomics studies. CONCLUSIONS Identification of tissue-dependent regulatory networks based on mapped differential expression not only gives insight into tissue-dependent gene subfunctionalization but also brings new insights into key biological processes and delivers targets for future haplotyping and genetic marker development.
Affiliation(s)
- Bjorn Kloosterman
- Wageningen UR Plant Breeding, Wageningen University and Research Center, PO Box 386, 6700 AJ Wageningen, the Netherlands
- KeyGene N.V., P.O. Box 216, 6700 AE Wageningen, The Netherlands
- AM Anithakumari
- Wageningen UR Plant Breeding, Wageningen University and Research Center, PO Box 386, 6700 AJ Wageningen, the Netherlands
- Graduate School Experimental Plant Sciences, Wageningen, The Netherlands
- Pierre-Yves Chibon
- Wageningen UR Plant Breeding, Wageningen University and Research Center, PO Box 386, 6700 AJ Wageningen, the Netherlands
- Graduate School Experimental Plant Sciences, Wageningen, The Netherlands
- Marian Oortwijn
- Wageningen UR Plant Breeding, Wageningen University and Research Center, PO Box 386, 6700 AJ Wageningen, the Netherlands
- Gerard C van der Linden
- Wageningen UR Plant Breeding, Wageningen University and Research Center, PO Box 386, 6700 AJ Wageningen, the Netherlands
- Richard GF Visser
- Wageningen UR Plant Breeding, Wageningen University and Research Center, PO Box 386, 6700 AJ Wageningen, the Netherlands
- Centre for BioSystems Genomics, P.O. Box 98, 6700 AA Wageningen, The Netherlands
- Christian WB Bachem
- Wageningen UR Plant Breeding, Wageningen University and Research Center, PO Box 386, 6700 AJ Wageningen, the Netherlands
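Distinguishing cis- from trans-acting eQTL, as in the potato study, requires physical positions for both the marker and the gene. The sketch below is minimal and assumes a simple distance rule with an illustrative 1 Mb window. The criterion actually used in the study is not stated in the abstract, and `Locus` and `classify_eqtl` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    position: int  # base-pair coordinate on the reference genome


def classify_eqtl(marker: Locus, gene: Locus, window: int = 1_000_000) -> str:
    """Label an eQTL 'cis' if the marker is on the gene's chromosome within `window` bp
    of the gene, otherwise 'trans'. The 1 Mb default is an arbitrary illustrative choice."""
    if marker.chrom == gene.chrom and abs(marker.position - gene.position) <= window:
        return "cis"
    return "trans"
```

Counting the labels separately for eQTL shared between tissues and for tissue-specific eQTL would give the kind of cis/trans comparison the abstract reports.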
13
Abstract
High-throughput genomics allows genome-wide quantification of gene expression levels in tissues and cell types and, when combined with sequence variation data, permits the identification of genetic control points of expression (expression QTL or eQTL). Clusters of eQTL influenced by single genetic polymorphisms can inform on hotspots of regulation of pathways and networks, although very few hotspots have been robustly detected, replicated, or experimentally verified. Here we present a novel modeling strategy to estimate the propensity of a genetic marker to influence several expression traits at the same time, based on a hierarchical formulation of related regressions. We implement this hierarchical regression model in a Bayesian framework using a stochastic search algorithm, HESS, that efficiently probes sparse subsets of genetic markers in a high-dimensional data matrix to identify hotspots and to pinpoint the individual genetic effects (eQTL). Simulating complex regulatory scenarios, we demonstrate that our method outperforms current state-of-the-art approaches, in particular when the number of transcripts is large. We also illustrate the applicability of HESS to diverse real-case data sets, in mouse and human genetic settings, and show that it provides new insights into regulatory hotspots that were not detected by conventional methods. The results suggest that the combination of our modeling strategy and algorithmic implementation provides significant advantages for the identification of functional eQTL hotspots, revealing key regulators underlying pathways.
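For contrast with HESS, a conventional hotspot screen simply counts how many transcripts each marker is significantly associated with and flags markers above a cutoff. The sketch below shows only that naive count, with hypothetical names and thresholds. It does not implement the hierarchical Bayesian model or the stochastic search described in the abstract.

```python
import numpy as np


def count_eqtl_hotspots(pvalues, alpha=1e-5, min_targets=3):
    """Naive hotspot screen: for each marker, count transcripts with p < alpha and flag
    markers reaching min_targets. Not HESS; shown only to make 'hotspot' concrete.

    pvalues: markers x transcripts array of single-marker association p-values.
    """
    counts = (np.asarray(pvalues) < alpha).sum(axis=1)
    return counts, np.flatnonzero(counts >= min_targets)
```

Such counts depend heavily on the arbitrary cutoffs and treat each marker-transcript test independently. This is one reason hotspots found this way are hard to replicate, and it motivates jointly modelled approaches like HESS.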