1
Yuan K, Zeng T, Chen L. Interpreting Functional Impact of Genetic Variations by Network QTL for Genotype–Phenotype Association Study. Front Cell Dev Biol 2022; 9:720321. [PMID: 35155440 PMCID: PMC8826544 DOI: 10.3389/fcell.2021.720321] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/04/2021] [Accepted: 12/13/2021] [Indexed: 12/18/2022] Open
Abstract
An enormous challenge in the post-genome era is to annotate and resolve the consequences of genetic variation on diverse phenotypes. The genome-wide association study (GWAS) is a well-known method to identify potential genetic loci for complex traits from huge numbers of genetic variations, following which it is crucial to identify expression quantitative trait loci (eQTL). However, conventional eQTL methods usually disregard the systematic role of single-nucleotide polymorphisms (SNPs) or genes, thereby overlooking many network-associated phenotypic determinants. This problem motivates us to recognize network-based quantitative trait loci (QTL), i.e., network QTL (nQTL), which detect the cascade association genotype → network → phenotype rather than the conventional genotype → expression → phenotype of eQTL. Specifically, we develop the nQTL framework on the theory and approach of single-sample networks, which can identify not only network traits (e.g., the gene subnetwork associated with genotype) for analyzing complex biological processes but also network signatures (e.g., the interactive gene biomarker candidates screened from network traits) for characterizing a targeted phenotype and its corresponding subtypes. Our results show that the nQTL framework can efficiently capture associations between SNPs and network traits (i.e., edge traits) in various simulated data scenarios, compared with traditional eQTL methods. Furthermore, we have carried out nQTL analysis on diverse biological and biomedical datasets. Our analysis is effective in detecting network traits for various biological problems and can discover many network signatures for discriminating phenotypes, which can help interpret the influence of nQTL on disease subtyping, disease prognosis, drug response, and pathogen factor association.
Particularly, in contrast to the conventional approaches, the nQTL framework could also identify many network traits from human bulk expression data, validated by matched single-cell RNA-seq data in an independent or unsupervised manner. All these results strongly support that nQTL and its detection framework can simultaneously explore the global genotype–network–phenotype associations and the underlying network traits or network signatures with functional impact and importance.
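The idea of an edge trait can be illustrated with a minimal sketch. It is not the authors' single-sample network method, whose details the abstract does not give. It assumes, purely for illustration, that a per-sample edge value for a gene pair is the product of the two genes' z-scores (the average of these products is the Pearson correlation), and that genotype is coded 0/1. The function names `edge_traits` and `edge_shift` and all data values are hypothetical.

```python
from statistics import mean, pstdev


def edge_traits(expr_x, expr_y):
    """Per-sample edge values for one gene pair: the product of the two
    genes' z-scores. Their average equals the Pearson correlation."""
    mx, my = mean(expr_x), mean(expr_y)
    sx, sy = pstdev(expr_x), pstdev(expr_y)
    if sx == 0 or sy == 0:
        raise ValueError("both genes must vary across samples")
    return [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(expr_x, expr_y)]


def edge_shift(edges, genotype):
    """Difference in mean edge value between carriers (1) and non-carriers (0)."""
    carriers = [e for e, g in zip(edges, genotype) if g == 1]
    others = [e for e, g in zip(edges, genotype) if g == 0]
    if not carriers or not others:
        raise ValueError("both genotype groups need at least one sample")
    return mean(carriers) - mean(others)


# Hypothetical data: the two genes are co-expressed in carriers and
# anti-correlated in non-carriers, while each gene's mean expression is
# identical in both genotype groups.
genotype = [0, 0, 0, 0, 1, 1, 1, 1]
gene_x = [1, 2, 3, 4, 1, 2, 3, 4]
gene_y = [4, 3, 2, 1, 1, 2, 3, 4]

edges = edge_traits(gene_x, gene_y)
shift = edge_shift(edges, genotype)  # approximately 2.0
```

In this toy data neither gene's mean expression differs by genotype, so a test on expression alone would see nothing, while the edge values separate the two genotype groups.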
Affiliation(s)
- Kai Yuan
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- Tao Zeng
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Guangzhou Laboratory, Guangzhou, China
- *Correspondence: Tao Zeng; Luonan Chen
- Luonan Chen
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- *Correspondence: Tao Zeng; Luonan Chen
2
Cheng W, Shi Y, Zhang X, Wang W. Fast and robust group-wise eQTL mapping using sparse graphical models. BMC Bioinformatics 2015; 16:2. [PMID: 25593000 PMCID: PMC4387667 DOI: 10.1186/s12859-014-0421-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/03/2014] [Accepted: 12/11/2014] [Indexed: 01/01/2023] Open
Abstract
Background
Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. The traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to hidden biological pathways.
Results
We introduce a new approach to identify novel group-wise associations between sets of SNPs and sets of genes. Such associations are captured by hidden variables connecting SNPs and genes. Our model is a linear-Gaussian model and uses two types of hidden variables. One captures the set associations between SNPs and genes, and the other captures confounders. We develop an efficient optimization procedure which makes this approach suitable for large scale studies. Extensive experimental evaluations on both simulated and real datasets demonstrate that the proposed methods can effectively capture both individual and group-wise signals that cannot be identified by the state-of-the-art eQTL mapping methods.
Conclusions
Considering group-wise associations significantly improves the accuracy of eQTL mapping, and the successful multi-layer regression model opens a new approach to understand how multiple SNPs interact with each other to jointly affect the expression level of a group of genes.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0421-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wei Cheng
- Department of Computer Science, UNC at Chapel Hill, 201 S Columbia St., Chapel Hill, 27599, NC, USA.
- Yu Shi
- Computer Science at the University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, 61801, IL, USA.
- Xiang Zhang
- Department of Elect. Eng. and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106, OH, USA.
- Wei Wang
- Department of Computer Science, University of California, Los Angeles, 3531-G Boelter Hall, Los Angeles, 90095, CA, USA.
3
Abstract
Leishmaniasis, like other neglected diseases, is characterized by a small arsenal of drugs for its control. To safeguard the efficacy of current drugs and guide the development of new ones, it is thus of utmost importance to acquire a deep understanding of the phenomenon of drug resistance and its link with treatment outcome. We discuss here how (post-)genomic approaches may contribute to this purpose. We highlight the need for a clear definition of the phenotypes under consideration: innate and acquired resistance versus treatment failure. We provide a recent update of our knowledge on the Leishmania genome structure and dynamics, compare the contribution of targeted and untargeted methods to the understanding of drug resistance, and show their limits. We also present the main assays allowing the experimental validation of the genes putatively involved in drug resistance. The importance of analysing information downstream of the genome is stressed and further illustrated by recent metabolomics findings. Finally, attention is drawn to the challenges of implementing the acquired knowledge for the benefit of patients and the population at risk.
4
Huang Y, Siwo G, Wuchty S, Ferdig MT, Przytycka TM. Symmetric Epistasis Estimation (SEE) and its application to dissecting interaction map of Plasmodium falciparum. Mol Biosyst 2012; 8:1544-52. [PMID: 22419061 DOI: 10.1039/c2mb05333k] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/17/2023]
Abstract
It is being increasingly recognized that many important phenotypic traits, including various diseases, are governed by a combination of weak genetic effects and their interactions. While the detection of epistatic interactions that involve a non-additive effect of two loci on a quantitative trait is particularly challenging, this interaction type is fundamental for the understanding of genome organization and gene regulation. However, current methods that detect epistatic interactions typically rely on the existence of a strong primary effect, considerably limiting the sensitivity of the search. To fill this gap, we developed a new method, SEE (Symmetric Epistasis Estimation), allowing the genome-wide detection of epistatic interactions without the need for a strong primary effect. We applied our approach to progeny crosses of the human malaria parasite P. falciparum and S. cerevisiae. We found an abundance of epistatic interactions in the parasite and a much smaller number of such interactions in yeast. The genome of P. falciparum also harboured several epistatic interaction hotspots that putatively play a role in drug resistance mechanisms. The abundance of observed epistatic interactions might suggest a mechanism of compensation for the extremely limited repertoire of transcription factors. Interestingly, epistatic interaction hotspots were associated with elevated levels of linkage disequilibrium, an observation that suggests selection pressure acting on P. falciparum, potentially reflecting host-pathogen interactions or drug-induced selection.
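The notion of a non-additive effect of two loci on a quantitative trait can be made concrete with a small sketch. This is not the SEE method, which the abstract does not specify. It is the textbook two-locus interaction contrast, written under the assumption of haploid progeny with alleles coded 0/1. The function name `interaction_contrast` and the data are hypothetical.

```python
from statistics import mean


def interaction_contrast(geno_a, geno_b, trait):
    """Deviation from additivity for two biallelic loci in haploid progeny:
    mean(11) - mean(10) - mean(01) + mean(00). Zero means purely additive."""
    groups = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for a, b, y in zip(geno_a, geno_b, trait):
        groups[(a, b)].append(y)
    if any(not values for values in groups.values()):
        raise ValueError("every two-locus genotype class needs at least one sample")
    m = {key: mean(values) for key, values in groups.items()}
    return m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]


# Hypothetical progeny genotypes at two loci.
geno_a = [0, 0, 1, 1, 0, 0, 1, 1]
geno_b = [0, 1, 0, 1, 0, 1, 0, 1]

# Trait with two independent (additive) effects and no interaction.
additive_trait = [1 * a + 2 * b for a, b in zip(geno_a, geno_b)]
# Trait that changes only when both loci carry allele 1: no primary effect
# in isolation from the partner locus, pure interaction.
epistatic_trait = [a * b for a, b in zip(geno_a, geno_b)]

additive_contrast = interaction_contrast(geno_a, geno_b, additive_trait)
epistatic_contrast = interaction_contrast(geno_a, geno_b, epistatic_trait)
```

The contrast is zero for the additive trait and non-zero for the interaction-only trait, which is the kind of signal a search conditioned on a strong single-locus effect could miss.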
Affiliation(s)
- Yang Huang
- National Center for Biotechnology Information, NLM, NIH, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA
5
Mu J, Seydel KB, Bates A, Su XZ. Recent Progress in Functional Genomic Research in Plasmodium falciparum. Curr Genomics 2011; 11:279-86. [PMID: 21119892 PMCID: PMC2930667 DOI: 10.2174/138920210791233081] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 01/12/2010] [Revised: 02/22/2010] [Accepted: 03/09/2010] [Indexed: 02/02/2023] Open
Abstract
With the completion and near completion of many malaria parasite genome-sequencing projects, efforts are now being directed to a better understanding of gene functions and to the discovery of vaccine and drug targets. Inter- and intraspecies comparisons of the parasite genomes will provide invaluable insights into parasite evolution, virulence, drug resistance, and immune invasion. Genome-wide searches for loci under various selection pressures may lead to discovery of genes conferring drug resistance or encoding for protective antigens. In addition, the Plasmodium falciparum genome sequence provides the basis for the development of various microarrays to monitor gene expression and to detect nucleotide substitution and deletion/amplification. Genome-wide profiling of the parasite proteome, chromatin modification, and nucleosome position also depend on availability of the parasite genome. In this brief review, we will highlight some recent advances and studies in characterizing gene function and related phenotype in P. falciparum that were made possible by the genome sequence, particularly the development of a genome-wide diversity map and various high-throughput genotyping methods for genome-wide association studies (GWAS).
Affiliation(s)
- Jianbing Mu
- Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
6
Kompass KS, Witte JS. Co-regulatory expression quantitative trait loci mapping: method and application to endometrial cancer. BMC Med Genomics 2011; 4:6. [PMID: 21226949 PMCID: PMC3032645 DOI: 10.1186/1755-8794-4-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 07/23/2010] [Accepted: 01/12/2011] [Indexed: 01/16/2023] Open
Abstract
Background
Expression quantitative trait loci (eQTL) studies have helped identify the genetic determinants of gene expression. Understanding the potential interacting mechanisms underlying such findings, however, is challenging.
Methods
We describe a method to identify the trans-acting drivers of multiple gene co-expression, which reflects the action of regulatory molecules. This method, termed co-regulatory expression quantitative trait locus (creQTL) mapping, allows for evaluation of a more focused set of phenotypes within a clear biological context than conventional eQTL mapping.
Results
Applying this method to a study of endometrial cancer revealed regulatory mechanisms supported by the literature: a creQTL between a locus upstream of STARD13/DLC2 and a group of seven IFNβ-induced genes. This suggests that the Rho-GTPase encoded by STARD13 regulates IFNβ-induced genes and the DNA damage response.
Conclusions
Because of the importance of IFNβ in cancer, our results suggest that creQTL may provide a finer picture of gene regulation and may reveal additional molecular targets for intervention. An open source R implementation of the method is available at http://sites.google.com/site/kenkompass/.
Affiliation(s)
- Kenneth S Kompass
- Department of Epidemiology and Biostatistics, Institute for Human Genetics, University of California, San Francisco, USA
7
Rider AK, Siwo G, Chawla NV, Ferdig M, Emrich SJ. A statistical approach to finding overlooked genetic associations. BMC Bioinformatics 2010; 11:526. [PMID: 20964847 PMCID: PMC2974753 DOI: 10.1186/1471-2105-11-526] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/29/2010] [Accepted: 10/21/2010] [Indexed: 11/10/2022] Open
Abstract
Background
Complexity and noise in expression quantitative trait loci (eQTL) studies make it difficult to distinguish potential regulatory relationships among the many interactions. The predominant method of identifying eQTLs finds associations that are significant at a genome-wide level. The vast number of statistical tests carried out on these data makes false negatives very likely. Corrections for multiple testing error render genome-wide eQTL techniques unable to detect modest regulatory effects.
We propose an alternative method to identify eQTLs that builds on traditional approaches. In contrast to genome-wide techniques, our method determines the significance of an association between an expression trait and a locus with respect to the set of all associations to the expression trait. The use of this specific information facilitates identification of expression traits that have an expression profile that is characterized by a single exceptional association to a locus.
Our approach identifies expression traits that have exceptional associations regardless of the genome-wide significance of those associations. This property facilitates the identification of possible false negatives for genome-wide significance. Further, our approach has the property of prioritizing expression traits that are affected by few strong associations. Expression traits identified by this method may warrant additional study because their expression level may be affected by targeting genes near a single locus.
Results
We demonstrate our method by identifying eQTL hotspots in Plasmodium falciparum (malaria) and Saccharomyces cerevisiae (yeast). We demonstrate the prioritization of traits with few strong genetic effects through Gene Ontology (GO) analysis of yeast. Our results are strongly consistent with results gathered using genome-wide methods and identify additional hotspots and eQTLs.
Conclusions
New eQTLs and hotspots found with this method may represent regions of the genome or biological processes that are controlled through few relatively strong genetic interactions. These points of interest warrant experimental investigation.
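The idea of judging an association relative to all associations of the same expression trait, rather than against a genome-wide threshold, can be sketched as follows. The abstract does not give the authors' actual statistic, so this example assumes a simple stand-in: the z-score of the strongest association against the remaining associations of that trait. The function name `exceptional_association` and the scores (imagined as values such as -log10 p) are hypothetical.

```python
from statistics import mean, stdev


def exceptional_association(scores):
    """Return (best_locus_index, z), where z measures how far the strongest
    association lies from the remaining associations of the same trait."""
    if len(scores) < 3:
        raise ValueError("need association scores for at least three loci")
    best = max(range(len(scores)), key=lambda i: scores[i])
    rest = [s for i, s in enumerate(scores) if i != best]
    centre, spread = mean(rest), stdev(rest)
    if spread == 0:
        # All other loci score identically; the top locus either matches
        # them exactly or stands infinitely far out on this scale.
        return best, 0.0 if scores[best] == centre else float("inf")
    return best, (scores[best] - centre) / spread


# Hypothetical scores for two traits across five loci.
trait_a = [0.4, 0.6, 0.5, 3.0, 0.5]  # one locus stands out from a quiet profile
trait_b = [2.8, 3.1, 2.9, 3.0, 3.2]  # uniformly elevated, no exceptional locus

locus_a, z_a = exceptional_association(trait_a)
locus_b, z_b = exceptional_association(trait_b)
```

Both traits peak near the same raw score, but only `trait_a` has a single exceptional locus, so ranking by `z` prioritizes it regardless of whether the raw score would pass a genome-wide cutoff.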
8
Metabolomics and malaria biology. Mol Biochem Parasitol 2010; 175:104-11. [PMID: 20970461 DOI: 10.1016/j.molbiopara.2010.09.008] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 06/29/2010] [Revised: 09/29/2010] [Accepted: 09/30/2010] [Indexed: 12/31/2022]
Abstract
Metabolomics has ushered in a novel and multi-disciplinary realm in biological research. It has provided researchers with a platform to combine powerful biochemical, statistical, computational, and bioinformatics techniques to delve into the mysteries of biology and disease. The application of metabolomics to study malaria parasites represents a major advance in our approach towards gaining a more comprehensive perspective on parasite biology and disease etiology. This review attempts to highlight some of the important aspects of the field of metabolomics, and its ongoing and potential future applications to malaria research.
9
Michaelson JJ, Alberts R, Schughart K, Beyer A. Data-driven assessment of eQTL mapping methods. BMC Genomics 2010; 11:502. [PMID: 20849587 PMCID: PMC2996998 DOI: 10.1186/1471-2164-11-502] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 04/13/2010] [Accepted: 09/17/2010] [Indexed: 11/10/2022] Open
Abstract
Background
The analysis of expression quantitative trait loci (eQTL) is a potentially powerful way to detect transcriptional regulatory relationships at the genomic scale. However, eQTL data sets often go underexploited because legacy QTL methods are used to map the relationship between the expression trait and genotype. Often these methods are inappropriate for complex traits such as gene expression, particularly in the case of epistasis.
Results
Here we compare legacy QTL mapping methods with several modern multi-locus methods and evaluate their ability to produce eQTL that agree with independent external data in a systematic way. We found that the modern multi-locus methods (Random Forests, sparse partial least squares, lasso, and elastic net) clearly outperformed the legacy QTL methods (Haley-Knott regression and composite interval mapping) in terms of biological relevance of the mapped eQTL. In particular, we found that our new approach, based on Random Forests, showed superior performance among the multi-locus methods.
Conclusions
Benchmarks based on the recapitulation of experimental findings provide valuable insight when selecting the appropriate eQTL mapping method. Our battery of tests suggests that Random Forests map eQTL that are more likely to be validated by independent data, when compared to competing multi-locus and legacy eQTL mapping methods.
Affiliation(s)
- Jacob J Michaelson
- Cellular Networks and Systems Biology, Biotechnology Center - TU Dresden, Dresden, Germany
10
Przytycka TM, Singh M, Slonim DK. Toward the dynamic interactome: it's about time. Brief Bioinform 2010; 11:15-29. [PMID: 20061351 PMCID: PMC2810115 DOI: 10.1093/bib/bbp057] [Citation(s) in RCA: 147] [Impact Index Per Article: 10.5] [Received: 07/20/2009] [Revised: 11/01/2009] [Indexed: 11/14/2022] Open
Abstract
Dynamic molecular interactions play a central role in regulating the functioning of cells and organisms. The availability of experimentally determined large-scale cellular networks, along with other high-throughput experimental data sets that provide snapshots of biological systems at different times and conditions, is increasingly helpful in elucidating interaction dynamics. Here we review the beginnings of a new subfield within computational biology, one focused on the global inference and analysis of the dynamic interactome. This burgeoning research area, which entails a shift from static to dynamic network analysis, promises to be a major step forward in our ability to model and reason about cellular function and behavior.
Affiliation(s)
- Teresa M Przytycka
- National Center of Biotechnology Information, NLM, NIH, 8000 Rockville Pike, Bethesda MD 20814, USA.