1
Wu Y, Zhang C, Duan S, Li Y, Lu L, Bajpai A, Yang C, Mi J, Tian G, Xu F, Qi D, Xu Z, Chi XD. TEAD1, MYO7A and NDUFC2 are novel functional genes associated with glucose metabolism in BXD recombinant inbred population. Diabetes Obes Metab 2024; 26:1775-1788. [PMID: 38385898 DOI: 10.1111/dom.15491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Revised: 01/12/2024] [Accepted: 01/17/2024] [Indexed: 02/23/2024]
Abstract
AIM The liver is an important metabolic organ that governs glucolipid metabolism, and its dysfunction may cause non-alcoholic fatty liver disease, type 2 diabetes mellitus, dyslipidaemia, etc. We aimed to systematically investigate the key factors related to hepatic glucose metabolism, which may be beneficial for understanding the underlying pathogenic mechanisms of obesity and diabetes mellitus. MATERIALS AND METHODS Oral glucose tolerance test (OGTT) phenotypes and liver transcriptomes of BXD mice under chow and high-fat diet conditions were collected from GeneNetwork. QTL mapping was conducted to pinpoint genomic regions associated with glucose homeostasis. Candidate genes were further nominated using a multi-criteria approach and validated to confirm their functional relevance in vitro. RESULTS Our results demonstrated that plasma glucose levels in OGTT were significantly affected by both diet and genetic background, with six genetic regulatory loci mapped on chromosomes 1, 4, and 7. Moreover, TEAD1, MYO7A and NDUFC2 were identified as the candidate genes. Functionally, siRNA-mediated TEAD1, MYO7A and NDUFC2 knockdown significantly decreased glucose uptake and inhibited the transcription of genes related to insulin and glucose metabolism pathways. CONCLUSIONS Our study contributes novel insights to the understanding of hepatic glucose metabolism, demonstrating the impact of TEAD1, MYO7A and NDUFC2 on mitochondrial function in the liver and their regulatory role in maintaining glucose homeostasis.
Affiliation(s)
- Yingying Wu
  - The Second School of Clinical Medicine of Binzhou Medical University, Yantai, China
- Chao Zhang
  - Yantai Affiliated Hospital of Binzhou Medical University, Yantai, China
- Shaofei Duan
  - Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, China
- Yushan Li
  - Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, China
- Lu Lu
  - The University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Akhilesh Bajpai
  - The University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Chunhua Yang
  - Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, China
- Jia Mi
  - Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, China
- Geng Tian
  - Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, China
- Fuyi Xu
  - Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, China
- Donglai Qi
  - Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, China
- Zhaowei Xu
  - Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, China
- Xiao Dong Chi
  - Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, China
2
Proud C, Campbell B, Susanti Z, Fukai S, Godwin I, Ovenden B, Snell P, Mitchell J. Quantitative trait loci (QTL) for low temperature tolerance at the young microspore stage in rice (Oryza sativa L.) in Australian breeding material. BREEDING SCIENCE 2022; 72:238-247. [PMID: 36408321 PMCID: PMC9653190 DOI: 10.1270/jsbbs.21096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/16/2021] [Accepted: 04/07/2022] [Indexed: 06/15/2023]
Abstract
Low temperatures at the young microspore stage (YMS) decrease spikelet fertility and are a major limiting factor for rice production in temperate Australia. Low temperature tolerance is a difficult trait to phenotype, hence there is a strong desire to identify quantitative trait loci (QTL) for use in marker-assisted selection (MAS). Association mapping was used in several breeding populations with a known source of low temperature tolerance, Norin PL8, to identify QTL for low temperature tolerance. A novel QTL for spikelet fertility, qYMCT6.1, was identified on chromosome 6, in which the Australian variety Kyeema was the donor for increased fertility. Five additional genomic regions were identified that co-located with previously reported QTL, two of which have been previously cloned. Additionally, for the first time, a QTL for spikelet fertility, qYMCT10.1, has been shown to co-locate with a QTL for the number of dehisced anthers, qYMCTF10.1, which increases the shedding of pollen from the anthers. This study revealed one new QTL for low temperature tolerance at YMS in temperate japonica germplasm and identified an additional five previously reported. These QTL will be utilised for MAS in the Australian rice breeding program and may have merit for temperate breeding programs globally.
Affiliation(s)
- Christopher Proud
  - The University of Queensland, School of Agriculture and Food Sciences, St Lucia, Queensland 4072, Australia
- Bradley Campbell
  - The University of Queensland, School of Agriculture and Food Sciences, St Lucia, Queensland 4072, Australia
- Zuziana Susanti
  - The University of Queensland, School of Agriculture and Food Sciences, St Lucia, Queensland 4072, Australia
  - Indonesian Centre for Rice Research, Agency for Agricultural Research and Development, Subang, West-Java, Indonesia
- Shu Fukai
  - The University of Queensland, School of Agriculture and Food Sciences, St Lucia, Queensland 4072, Australia
- Ian Godwin
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, Queensland 4072, Australia
- Ben Ovenden
  - Department of Primary Industries, Yanco Agricultural Institute, Yanco, NSW 2703, Australia
- Peter Snell
  - Department of Primary Industries, Yanco Agricultural Institute, Yanco, NSW 2703, Australia
- Jaquie Mitchell
  - The University of Queensland, School of Agriculture and Food Sciences, St Lucia, Queensland 4072, Australia
3
Monowar Anjum M, Mohammed N, Li W, Jiang X. Privacy Preserving Collaborative Learning of Generalized Linear Mixed Model. J Biomed Inform 2022; 127:104008. [DOI: 10.1016/j.jbi.2022.104008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Revised: 12/08/2021] [Accepted: 01/30/2022] [Indexed: 12/01/2022]
4
Odom GJ, Colaprico A, Silva TC, Chen XS, Wang L. PathwayMultiomics: An R Package for Efficient Integrative Analysis of Multi-Omics Datasets With Matched or Un-matched Samples. Front Genet 2022; 12:783713. [PMID: 35003218 PMCID: PMC8729182 DOI: 10.3389/fgene.2021.783713] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/26/2021] [Accepted: 12/07/2021] [Indexed: 01/27/2023] Open
Abstract
Recent advances in technology have made multi-omics datasets increasingly available to researchers. To leverage the wealth of information in multi-omics data, a number of integrative analysis strategies have been proposed recently. However, effectively extracting biological insights from these large, complex datasets remains challenging. In particular, matched samples with multiple types of omics data measured on each sample are often required for multi-omics analysis tools, which can significantly reduce the sample size. Another challenge is that analysis techniques such as dimension reductions, which extract association signals in high dimensional datasets by estimating a few variables that explain most of the variations in the samples, are typically applied to whole-genome data, which can be computationally demanding. Here we present pathwayMultiomics, a pathway-based approach for integrative analysis of multi-omics data with categorical, continuous, or survival outcome variables. The input of pathwayMultiomics is pathway p-values for individual omics data types, which are then integrated using a novel statistic, the MiniMax statistic, to prioritize pathways dysregulated in multiple types of omics datasets. Importantly, pathwayMultiomics is computationally efficient and does not require matched samples in multi-omics data. We performed a comprehensive simulation study to show that pathwayMultiomics significantly outperformed currently available multi-omics tools with improved power and well-controlled false-positive rates. In addition, we also analyzed real multi-omics datasets to show that pathwayMultiomics was able to recover known biology by nominating biologically meaningful pathways in complex diseases such as Alzheimer's disease.
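As a rough illustration of the integration step described in the abstract above, the sketch below combines per-omics pathway p-values into a single statistic. The abstract does not define the MiniMax statistic, so the definition used here (the minimum, over all pairs of omics types, of the larger p-value in each pair, which equals the second-smallest p-value) is an assumption. The null distribution additionally assumes independent Uniform(0, 1) p-values. Function names and the example p-values are hypothetical and are not taken from the R package.

```python
from itertools import combinations


def minimax_statistic(p_values):
    """Assumed definition: min over all pairs of the larger p-value in the pair."""
    if len(p_values) < 2:
        raise ValueError("need p-values from at least two omics types")
    return min(max(a, b) for a, b in combinations(p_values, 2))


def minimax_p_value(p_values):
    """P(second-smallest of G independent Uniform(0,1) draws <= observed statistic).

    Closed form of the Beta(2, G - 1) CDF, valid only under the
    independence assumption stated above.
    """
    g = len(p_values)
    x = minimax_statistic(p_values)
    return 1.0 - (1.0 - x) ** g - g * x * (1.0 - x) ** (g - 1)


# Made-up pathway p-values from three omics types for one pathway.
pathway_p = {"snp": 0.001, "methylation": 0.04, "expression": 0.30}
values = list(pathway_p.values())
print(minimax_statistic(values), minimax_p_value(values))
```

Because only one p-value per pathway and omics type is needed, a calculation of this kind does not require the same samples to be measured on every platform.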
Affiliation(s)
- Gabriel J Odom
  - Department of Biostatistics, Stempel College of Public Health, Florida International University, Miami, FL, United States
  - Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Antonio Colaprico
  - Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Tiago C Silva
  - Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- X Steven Chen
  - Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
  - Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States
- Lily Wang
  - Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
  - Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States
  - Dr. John T Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, United States
  - John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, United States
5
Yilmaz S, Tastan O, Cicek AE. SPADIS: An Algorithm for Selecting Predictive and Diverse SNPs in GWAS. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1208-1216. [PMID: 31443041 DOI: 10.1109/tcbb.2019.2935437] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Phenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants identified in genome-wide association studies (GWAS). Many methods have been developed to select a subset of variant loci, which are associated with or predictive of the phenotype. Selecting connected SNPs on SNP-SNP networks has been proven successful in finding biologically interpretable and predictive SNPs. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. In this paper, we propose a novel method called SPADIS that favors the selection of remotely located SNPs in order to account for their complementary effects in explaining a phenotype. SPADIS selects a diverse set of loci on a SNP-SNP network. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant factor approximation to the optimal solution. We compare SPADIS to the state-of-the-art method SConES, on a dataset of Arabidopsis thaliana with continuous flowering time phenotypes. SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs are selected and provides consistent improvements across multiple networks and settings on average. Moreover, it identifies more candidate genes and runs faster.
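The greedy strategy mentioned in the abstract above can be sketched generically. The abstract does not give the SPADIS objective, so the code below swaps in a weighted-coverage function as a stand-in: it is monotone and submodular, and greedy selection under a size limit is known to reach at least a (1 - 1/e) fraction of the optimum for such functions. The function name, the toy network, and the weights are all hypothetical.

```python
def greedy_max_coverage(neighborhoods, weights, k):
    """Greedily pick up to k SNPs maximizing the total weight of covered nodes.

    neighborhoods: dict mapping each SNP to the set of network nodes it covers
    weights: dict mapping each node to a non-negative relevance score
    Stand-in objective only; this is not the objective used by SPADIS.
    """
    selected, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for snp, hood in neighborhoods.items():
            if snp in selected:
                continue
            # Marginal gain: weight of nodes this SNP would newly cover.
            gain = sum(weights[v] for v in hood - covered)
            if gain > best_gain:
                best, best_gain = snp, gain
        if best is None:  # no remaining SNP adds anything
            break
        selected.append(best)
        covered |= neighborhoods[best]
    return selected


# Toy network: s1-s2-s3 form one cluster, s4-s5 another.
neighborhoods = {
    "s1": {"s1", "s2"},
    "s2": {"s1", "s2", "s3"},
    "s3": {"s2", "s3"},
    "s4": {"s4", "s5"},
    "s5": {"s4", "s5"},
}
weights = {"s1": 1.0, "s2": 2.0, "s3": 1.0, "s4": 1.5, "s5": 0.5}
print(greedy_max_coverage(neighborhoods, weights, k=2))
```

In this toy case the second pick comes from the other cluster, since neighbors of the first pick add no new coverage; that diminishing-returns behavior is what steers a submodular objective toward diverse selections.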
6
Silberstein M, Nesbit N, Cai J, Lee PH. Pathway analysis for genome-wide genetic variation data: Analytic principles, latest developments, and new opportunities. J Genet Genomics 2021; 48:173-183. [PMID: 33896739 PMCID: PMC8286309 DOI: 10.1016/j.jgg.2021.01.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/04/2020] [Revised: 01/24/2021] [Accepted: 01/25/2021] [Indexed: 12/23/2022]
Abstract
Pathway analysis, also known as gene-set enrichment analysis, is a multilocus analytic strategy that integrates a priori, biological knowledge into the statistical analysis of high-throughput genetics data. Originally developed for the studies of gene expression data, it has become a powerful analytic procedure for in-depth mining of genome-wide genetic variation data. Astonishing discoveries were made in the past years, uncovering genes and biological mechanisms underlying common and complex disorders. However, as massive amounts of diverse functional genomics data accrue, there is a pressing need for newer generations of pathway analysis methods that can utilize multiple layers of high-throughput genomics data. In this review, we provide an intellectual foundation of this powerful analytic strategy, as well as an update of the state-of-the-art in recent method developments. The goal of this review is threefold: (1) introduce the motivation and basic steps of pathway analysis for genome-wide genetic variation data; (2) review the merits and the shortcomings of classic and newly emerging integrative pathway analysis tools; and (3) discuss remaining challenges and future directions for further method developments.
Affiliation(s)
- Micah Silberstein
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Nicholas Nesbit
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Jacquelyn Cai
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Phil H Lee
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
7
Metabolic network of ammonium in cereal vinegar solid-state fermentation and its response to acid stress. Food Microbiol 2020; 95:103684. [PMID: 33397616 DOI: 10.1016/j.fm.2020.103684] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/31/2020] [Revised: 11/13/2020] [Accepted: 11/16/2020] [Indexed: 12/28/2022]
Abstract
Shanxi aged vinegar (SAV), a Chinese traditional vinegar, is produced by various microorganisms. Ammonium is an important nitrogen source for microorganisms and a key intermediate for the utilization of non-ammonium nitrogen sources. In this work, an ammonium metabolic network during SAV fermentation was constructed through the meta-transcriptomic analysis of in situ samples, and the potential mechanism of acid affecting ammonium metabolism was revealed. The results showed that ammonium was enriched as the acidity increased. Meta-transcriptomic analysis showed that the conversion of glutamine to ammonia is the key pathway of ammonium metabolism in vinegar and that Lactobacillus and Acetobacter are the dominant genera. The construction and analysis of the metabolic network showed that amino acid metabolism, nucleic acid metabolism, pentose phosphate pathway and energy metabolism were enhanced to resist acid damage to the intracellular environment and cell structures. The enhancement of nitrogen assimilation provides nitrogen for metabolic pathways that resist acid cytotoxicity. In addition, the concentration gradient allows ammonium to diffuse outside the cell, which causes ammonium to accumulate during fermentation.
8
Babb de Villiers C, Kroese M, Moorthie S. Understanding polygenic models, their development and the potential application of polygenic scores in healthcare. J Med Genet 2020; 57:725-732. [PMID: 32376789 PMCID: PMC7591711 DOI: 10.1136/jmedgenet-2019-106763] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/09/2019] [Revised: 03/09/2020] [Accepted: 03/28/2020] [Indexed: 02/06/2023]
Abstract
The use of genomic information to better understand and prevent common complex diseases has been an ongoing goal of genetic research. Over the past few years, research in this area has proliferated with several proposed methods of generating polygenic scores. This has been driven by the availability of larger data sets, primarily from genome-wide association studies and concomitant developments in statistical methodologies. Here we provide an overview of the methodological aspects of polygenic model construction. In addition, we consider the state of the field and implications for potential applications of polygenic scores for risk estimation within healthcare.
Affiliation(s)
- Mark Kroese
  - PHG Foundation, University of Cambridge, Cambridge, Cambridgeshire, UK
- Sowmiya Moorthie
  - PHG Foundation, University of Cambridge, Cambridge, Cambridgeshire, UK
9
Zhang L, Papachristou C, Choudhary PK, Biswas S. A Bayesian Hierarchical Framework for Pathway Analysis in Genome-Wide Association Studies. Hum Hered 2020; 84:240-255. [PMID: 32966977 DOI: 10.1159/000508664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2019] [Accepted: 05/14/2020] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Pathway analysis allows joint consideration of multiple SNPs belonging to multiple genes, which in turn belong to a biologically defined pathway. This type of analysis is usually more powerful than single-SNP analyses for detecting joint effects of variants in a pathway. METHODS We develop a Bayesian hierarchical model by fully modeling the 3-level hierarchy, namely, SNP-gene-pathway that is naturally inherent in the structure of the pathways, unlike the currently used ad hoc ways of combining such information. We model the effects at each level conditional on the effects of the levels preceding them within the generalized linear model framework. To deal with the high dimensionality, we regularize the regression coefficients through an appropriate choice of priors. The model is fit using a combination of iteratively weighted least squares and expectation-maximization algorithms to estimate the posterior modes and their standard errors. A normal approximation is used for inference. RESULTS We conduct simulations to study the proposed method and find that our method has higher power than some standard approaches in several settings for identifying pathways with multiple modest-sized variants. We illustrate the method by analyzing data from two genome-wide association studies on breast and renal cancers. CONCLUSION Our method can be helpful in detecting pathway association.
Affiliation(s)
- Lei Zhang
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Pankaj K Choudhary
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
10
Abstract
Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.
Affiliation(s)
- Ning Sun
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
11
Price N, Lopez L, Platts AE, Lasky JR. In the presence of population structure: From genomics to candidate genes underlying local adaptation. Ecol Evol 2020; 10:1889-1904. [PMID: 32128123 DOI: 10.1101/642306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/08/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 05/26/2023] Open
Abstract
Understanding the genomic signatures, genes, and traits underlying local adaptation of organisms to heterogeneous environments is of central importance to the field of evolutionary biology. To identify loci underlying local adaptation, models that combine allelic and environmental variation while controlling for the effects of population structure have emerged as the method of choice. Despite being evaluated in simulation studies, there has not been a thorough investigation of empirical evidence supporting local adaptation across these alleles. To evaluate these methods, we use 875 Arabidopsis thaliana Eurasian accessions and two mixed models (GEMMA and LFMM) to identify candidate SNPs underlying local adaptation to climate. Subsequently, to assess evidence of local adaptation and function among significant SNPs, we examine allele frequency differentiation and recent selection across Eurasian populations, in addition to their distribution along quantitative trait loci (QTL) explaining fitness variation between Italy and Sweden populations and cis-regulatory/nonsynonymous sites showing significant selective constraint. Our results indicate that significant LFMM/GEMMA SNPs show low allele frequency differentiation and linkage disequilibrium across locally adapted Italy and Sweden populations, in addition to a poor association with fitness QTL peaks (highest logarithm of odds score). Furthermore, when examining derived allele frequencies across the Eurasian range, we find that these SNPs are enriched in low-frequency variants that show very large climatic differentiation but low levels of linkage disequilibrium. These results suggest that their enrichment along putative functional sites most likely represents deleterious variation that is independent of local adaptation.
Among all the genomic signatures examined, only SNPs showing high absolute allele frequency differentiation (AFD) and linkage disequilibrium (LD) between Italy and Sweden populations showed a strong association with fitness QTL peaks and were enriched along selectively constrained cis-regulatory/nonsynonymous sites. Using these SNPs, we find strong evidence linking flowering time, freezing tolerance, and the abscisic-acid pathway to local adaptation.
Affiliation(s)
- Nicholas Price
  - Department of Bioagricultural Sciences & Pest Management, Colorado State University, Fort Collins, CO, USA
  - Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Lua Lopez
  - Department of Biology, Binghamton University (State University of New York), Binghamton, NY, USA
- Adrian E Platts
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Jesse R Lasky
  - Department of Biology, Pennsylvania State University, University Park, PA, USA
12
Price N, Lopez L, Platts AE, Lasky JR. In the presence of population structure: From genomics to candidate genes underlying local adaptation. Ecol Evol 2020; 10:1889-1904. [PMID: 32128123 PMCID: PMC7042746 DOI: 10.1002/ece3.6002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/08/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 12/25/2022] Open
Abstract
Understanding the genomic signatures, genes, and traits underlying local adaptation of organisms to heterogeneous environments is of central importance to the field of evolutionary biology. To identify loci underlying local adaptation, models that combine allelic and environmental variation while controlling for the effects of population structure have emerged as the method of choice. Despite being evaluated in simulation studies, there has not been a thorough investigation of empirical evidence supporting local adaptation across these alleles. To evaluate these methods, we use 875 Arabidopsis thaliana Eurasian accessions and two mixed models (GEMMA and LFMM) to identify candidate SNPs underlying local adaptation to climate. Subsequently, to assess evidence of local adaptation and function among significant SNPs, we examine allele frequency differentiation and recent selection across Eurasian populations, in addition to their distribution along quantitative trait loci (QTL) explaining fitness variation between Italy and Sweden populations and cis-regulatory/nonsynonymous sites showing significant selective constraint. Our results indicate that significant LFMM/GEMMA SNPs show low allele frequency differentiation and linkage disequilibrium across locally adapted Italy and Sweden populations, in addition to a poor association with fitness QTL peaks (highest logarithm of odds score). Furthermore, when examining derived allele frequencies across the Eurasian range, we find that these SNPs are enriched in low-frequency variants that show very large climatic differentiation but low levels of linkage disequilibrium. These results suggest that their enrichment along putative functional sites most likely represents deleterious variation that is independent of local adaptation.
Among all the genomic signatures examined, only SNPs showing high absolute allele frequency differentiation (AFD) and linkage disequilibrium (LD) between Italy and Sweden populations showed a strong association with fitness QTL peaks and were enriched along selectively constrained cis-regulatory/nonsynonymous sites. Using these SNPs, we find strong evidence linking flowering time, freezing tolerance, and the abscisic-acid pathway to local adaptation.
Affiliation(s)
- Nicholas Price
  - Department of Bioagricultural Sciences & Pest Management, Colorado State University, Fort Collins, CO, USA
  - Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Lua Lopez
  - Department of Biology, Binghamton University (State University of New York), Binghamton, NY, USA
- Adrian E. Platts
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Jesse R. Lasky
  - Department of Biology, Pennsylvania State University, University Park, PA, USA
13
Mallik S, Odom GJ, Gao Z, Gomez L, Chen X, Wang L. An evaluation of supervised methods for identifying differentially methylated regions in Illumina methylation arrays. Brief Bioinform 2019; 20:2224-2235. [PMID: 30239597 PMCID: PMC6954393 DOI: 10.1093/bib/bby085] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 05/10/2018] [Revised: 07/24/2018] [Accepted: 08/16/2018] [Indexed: 01/19/2023] Open
Abstract
Epigenome-wide association studies (EWASs) have become increasingly popular for studying DNA methylation (DNAm) variations in complex diseases. The Illumina methylation arrays provide an economical, high-throughput and comprehensive platform for measuring methylation status in EWASs. A number of software tools have been developed for identifying disease-associated differentially methylated regions (DMRs) in the epigenome. However, in practice, we found these tools typically had multiple parameter settings that needed to be specified and the performance of the software tools under different parameters was often unclear. To help users better understand and choose optimal parameter settings when using DNAm analysis tools, we conducted a comprehensive evaluation of 4 popular DMR analysis tools under 60 different parameter settings. In addition to evaluating power, precision, area under precision-recall curve, Matthews correlation coefficient, F1 score and type I error rate, we also compared several additional characteristics of the analysis results, including the size of the DMRs, overlap between the methods and execution time. The results showed that none of the software tools performed best under their default parameter settings, and power varied widely when parameters were changed. Overall, the precision of these software tools was good. In contrast, all methods lacked power when effect size was consistent but small. Across all simulation scenarios, comb-p consistently had the best sensitivity as well as good control of false-positive rate.
Affiliation(s)
- Saurav Mallik
- Division of Biostatistics, Department of Public Health Sciences, University of Miami, Miller School of Medicine, Miami, FL, USA
- Joint First Authors
| | - Gabriel J Odom
- Division of Biostatistics, Department of Public Health Sciences, University of Miami, Miller School of Medicine, Miami, FL, USA
- Joint First Authors
| | - Zhen Gao
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Lissette Gomez
- Dr. John T. Macdonald Foundation, Department of Human Genetics, and John P. Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA
| | - Xi Chen
- Division of Biostatistics, Department of Public Health Sciences, University of Miami, Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Lily Wang
- Division of Biostatistics, Department of Public Health Sciences, University of Miami, Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA
- Dr. John T. Macdonald Foundation, Department of Human Genetics, and John P. Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA
| |
|
14
|
Cui H, Srinivasan S, Korkin D. Enriching Human Interactome with Functional Mutations to Detect High-Impact Network Modules Underlying Complex Diseases. Genes (Basel) 2019; 10:E933. [PMID: 31731769 PMCID: PMC6895925 DOI: 10.3390/genes10110933] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2019] [Revised: 11/04/2019] [Accepted: 11/11/2019] [Indexed: 11/16/2022] Open
Abstract
Rapid progress in high-throughput -omics technologies moves us one step closer to the datacalypse in life sciences. In spite of the already generated volumes of data, our knowledge of the molecular mechanisms underlying complex genetic diseases remains limited. Increasing evidence shows that biological networks are essential, albeit not sufficient, for the better understanding of these mechanisms. The identification of disease-specific functional modules in the human interactome can provide a more focused insight into the mechanistic nature of the disease. However, carving a disease network module from the whole interactome is a difficult task. In this paper, we propose a computational framework, Discovering most IMpacted SUbnetworks in interactoMe (DIMSUM), which enables the integration of genome-wide association studies (GWAS) and functional effects of mutations into the protein-protein interaction (PPI) network to improve disease module detection. Specifically, our approach incorporates and propagates the functional impact of non-synonymous single nucleotide polymorphisms (nsSNPs) on PPIs to implicate the genes that are most likely influenced by the disruptive mutations, and to identify the module with the greatest functional impact. Comparison against state-of-the-art seed-based module detection methods shows that our approach could yield modules that are biologically more relevant and have stronger association with the studied disease. We expect for our method to become a part of the common toolbox for the disease module analysis, facilitating the discovery of new disease markers.
Affiliation(s)
- Hongzhu Cui
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
| | - Suhas Srinivasan
- Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
| | - Dmitry Korkin
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
| |
|
15
|
Mora A. Gene set analysis methods for the functional interpretation of non-mRNA data—Genomic range and ncRNA data. Brief Bioinform 2019; 21:1495-1508. [DOI: 10.1093/bib/bbz090] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2019] [Revised: 05/30/2019] [Accepted: 06/28/2019] [Indexed: 12/31/2022] Open
Abstract
Gene set analysis (GSA) is one of the methods of choice for analyzing the results of current omics studies; however, it has been mainly developed to analyze mRNA (microarray, RNA-Seq) data. The following review includes an update regarding general methods and resources for GSA and then emphasizes GSA methods and tools for non-mRNA omics datasets, specifically genomic range data (ChIP-Seq, SNP and methylation) and ncRNA data (miRNAs, lncRNAs and others). In the end, the state of the GSA field for non-mRNA datasets is discussed, and some current challenges and trends are highlighted, especially the use of network approaches to face complexity issues.
Affiliation(s)
- Antonio Mora
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences
| |
|
16
|
Runcie DE, Crawford L. Fast and flexible linear mixed models for genome-wide genetics. PLoS Genet 2019; 15:e1007978. [PMID: 30735486 PMCID: PMC6383949 DOI: 10.1371/journal.pgen.1007978] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2018] [Revised: 02/21/2019] [Accepted: 01/21/2019] [Indexed: 11/18/2022] Open
Abstract
Linear mixed effect models are powerful tools used to account for population structure in genome-wide association studies (GWASs) and estimate the genetic architecture of complex traits. However, fully-specified models are computationally demanding and common simplifications often lead to reduced power or biased inference. We describe Grid-LMM (https://github.com/deruncie/GridLMM), an extendable algorithm for repeatedly fitting complex linear models that account for multiple sources of heterogeneity, such as additive and non-additive genetic variance, spatial heterogeneity, and genotype-environment interactions. Grid-LMM can compute approximate (yet highly accurate) frequentist test statistics or Bayesian posterior summaries at a genome-wide scale in a fraction of the time compared to existing general-purpose methods. We apply Grid-LMM to two types of quantitative genetic analyses. The first is focused on accounting for spatial variability and non-additive genetic variance while scanning for QTL; and the second aims to identify gene expression traits affected by non-additive genetic variation. In both cases, modeling multiple sources of heterogeneity leads to new discoveries.
Affiliation(s)
- Daniel E. Runcie
- Department of Plant Sciences, University of California Davis, Davis, California, United States of America
| | - Lorin Crawford
- Department of Biostatistics, Brown University, Providence, Rhode Island, United States of America
| |
|
17
|
Li A, Qin G, Suzuki A, Gajera M, Iwata J, Jia P, Zhao Z. Network-based identification of critical regulators as putative drivers of human cleft lip. BMC Med Genomics 2019; 12:16. [PMID: 30704473 PMCID: PMC6357351 DOI: 10.1186/s12920-018-0458-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Cleft lip (CL) is one of the most common congenital birth defects with complex etiology. While genome-wide association studies (GWAS) have made significant advances in our understanding of mutations and their related genes with potential involvement in the etiology of CL, it remains unknown how these genes are functionally regulated and interact with each other in lip development. Currently, identifying the disease-causing genes in human CL is urgently needed. So far, the causative CL genes have been largely undiscovered, making it challenging to design experiments to validate the functional influence of the mutations identified from large genomic studies such as CL GWAS. RESULTS Transcription factors (TFs) and microRNAs (miRNAs) are two important regulators in the cellular system. In this study, we aimed to investigate the genetic interactions among TFs, miRNAs and the CL genes curated from the previous studies. We constructed miRNA-TF co-regulatory networks, from which the critical regulators as putative drivers in CL were examined. Based on the constructed networks, we identified ten critical hub genes with prior evidence in CL. Furthermore, the analysis of partitioned regulatory modules highlighted a number of biological processes involved in the pathology of CL, including a novel pathway "Signaling pathway regulating pluripotency of stem cells". Our subnetwork analysis pinpointed two candidate miRNAs, hsa-mir-27b and hsa-mir-497, activating the Wnt pathway that was associated with CL. Our results were supported by an independent gene expression dataset in CL. CONCLUSIONS This study represents the first regulatory network analysis of CL genes. Our work presents a global view of the CL regulatory network and a novel approach to investigating critical miRNAs, TFs and genes via combinatory regulatory networks in craniofacial development. The top genes and miRNAs will be important candidates for future experimental validation of their functions in CL.
Affiliation(s)
- Aimin Li
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, Shaanxi, China.,Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St., Suite 820, Houston, TX, 77030, USA
| | - Guimin Qin
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St., Suite 820, Houston, TX, 77030, USA.,School of Software, Xidian University, Xi'an, 710071, Shaanxi, China
| | - Akiko Suzuki
- Department of Diagnostic and Biomedical Sciences, School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX, 77054, USA.,Center for Craniofacial Research, The University of Texas Health Science Center at Houston, Houston, TX, 77054, USA
| | - Mona Gajera
- Department of Diagnostic and Biomedical Sciences, School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX, 77054, USA.,Center for Craniofacial Research, The University of Texas Health Science Center at Houston, Houston, TX, 77054, USA
| | - Junichi Iwata
- Department of Diagnostic and Biomedical Sciences, School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX, 77054, USA.,Center for Craniofacial Research, The University of Texas Health Science Center at Houston, Houston, TX, 77054, USA.,MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 77030, USA
| | - Peilin Jia
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St., Suite 820, Houston, TX, 77030, USA.
| | - Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St., Suite 820, Houston, TX, 77030, USA.,MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 77030, USA.
| |
|
18
|
Lamparter D, Marbach D, Rueedi R, Kutalik Z, Bergmann S. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics. PLoS Comput Biol 2016; 12:e1004714. [PMID: 26808494 PMCID: PMC4726509 DOI: 10.1371/journal.pcbi.1004714] [Citation(s) in RCA: 223] [Impact Index Per Article: 24.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2015] [Accepted: 12/17/2015] [Indexed: 12/17/2022] Open
Abstract
Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which offers not only significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries. Genome-wide association studies (GWAS) typically generate lists of trait- or disease-associated SNPs. Yet, such output sheds little light on the underlying molecular mechanisms and tools are needed to extract biological insight from the results at the SNP level. Pathway analysis tools integrate signals from multiple SNPs at various positions in the genome in order to map associated genomic regions to well-established pathways, i.e., sets of genes known to act in concert. The nature of GWAS association results requires specifically tailored methods for this task. Here, we present Pascal (Pathway scoring algorithm), a tool that allows gene and pathway-level analysis of GWAS association results without the need to access the original genotypic data.
Pascal was designed to be fast, accurate and to have high power to detect relevant pathways. We extensively tested our approach on a large collection of real GWAS association results and saw better discovery of confirmed pathways than with other popular methods. We believe that these results together with the ease-of-use of our publicly available software will allow Pascal to become a useful addition to the toolbox of the GWAS community.
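The gene statistics and the Fisher combination mentioned in the Pascal abstract can be illustrated roughly as follows. This is a minimal sketch and not Pascal's implementation: it ignores linkage disequilibrium between SNPs, it uses the plain (unmodified) Fisher method under an independence assumption, and all function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def gene_statistics(snp_chi2):
    """Return (sum, max) of per-SNP chi-squared statistics for one gene."""
    return sum(snp_chi2), max(snp_chi2)


def fisher_combine(p_values):
    """Plain Fisher method: -2 * sum(log p) ~ chi-squared with 2n df."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even_df(stat, 2 * len(p_values))


# Illustrative inputs only.
sum_stat, max_stat = gene_statistics([1.0, 4.0, 2.5])
pathway_stat, pathway_p = fisher_combine([0.05, 0.05])
```

Because Fisher's statistic always has an even number of degrees of freedom, the closed-form tail probability avoids any dependency beyond the standard library.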
Affiliation(s)
- David Lamparter
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Daniel Marbach
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Rico Rueedi
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Zoltán Kutalik
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Social and Preventive Medicine (IUMSP), Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
| | - Sven Bergmann
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
19
|
Reed E, Nunez S, Kulp D, Qian J, Reilly MP, Foulkes AS. A guide to genome-wide association analysis and post-analytic interrogation. Stat Med 2015; 34:3769-92. [PMID: 26343929 PMCID: PMC5019244 DOI: 10.1002/sim.6605] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2015] [Revised: 06/09/2015] [Accepted: 07/06/2015] [Indexed: 01/14/2023]
Abstract
This tutorial is a learning resource that outlines the basic process and provides specific software tools for implementing a complete genome‐wide association analysis. Approaches to post‐analytic visualization and interrogation of potentially novel findings are also presented. Applications are illustrated using the free and open‐source R statistical computing and graphics software environment, Bioconductor software for bioinformatics and the UCSC Genome Browser. Complete genome‐wide association data on 1401 individuals across 861,473 typed single nucleotide polymorphisms from the PennCATH study of coronary artery disease are used for illustration. All data and code, as well as additional instructional resources, are publicly available through the Open Resources in Statistical Genomics project: http://www.stat-gen.org. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Affiliation(s)
- Eric Reed
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
| | - Sara Nunez
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
| | - David Kulp
- Department of Computer Science, University of Massachusetts, Amherst, MA, U.S.A
| | - Jing Qian
- Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, U.S.A
| | - Muredach P Reilly
- Department of Medicine, University of Pennsylvania, Philadelphia, PA, U.S.A
| | - Andrea S Foulkes
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
| |
|
20
|
Evangelou M, Smyth DJ, Fortune MD, Burren OS, Walker NM, Guo H, Onengut-Gumuscu S, Chen WM, Concannon P, Rich SS, Todd JA, Wallace C. A method for gene-based pathway analysis using genomewide association study summary statistics reveals nine new type 1 diabetes associations. Genet Epidemiol 2014; 38:661-70. [PMID: 25371288 PMCID: PMC4258092 DOI: 10.1002/gepi.21853] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2014] [Revised: 06/02/2014] [Accepted: 07/29/2014] [Indexed: 12/11/2022]
Abstract
Pathway analysis can complement point-wise single nucleotide polymorphism (SNP) analysis in exploring genomewide association study (GWAS) data to identify specific disease-associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene-based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene-based pathway analysis of a type 1 diabetes (T1D) meta-analysis GWAS (of 7,514 cases and 9,045 controls). An important feature of the conducted analysis is the removal of the major histocompatibility complex gene region, the major genetic risk factor for T1D. Thirty-one of the 1,583 (2%) tested pathways were identified to be enriched for association with T1D at a 5% false discovery rate. We analyzed these 31 pathways and their genes to identify SNPs in or near these pathway genes that showed potentially novel association with T1D and attempted to replicate the association of 22 SNPs in additional samples. Replication P-values were skewed () with 12 of the 22 SNPs showing . Support, including replication evidence, was obtained for nine T1D associated variants in genes ITGB7 (rs11170466, ), NRP1 (rs722988, ), BAD (rs694739, ), CTSB (rs1296023, ), FYN (rs11964650, ), UBE2G1 (rs9906760, ), MAP3K14 (rs17759555, ), ITGB1 (rs1557150, ), and IL7R (rs1445898, ). The proposed methodology can be applied to other GWAS datasets for which only summary level data are available.
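The abstract reports pathways enriched "at a 5% false discovery rate" without naming the procedure used. Assuming a Benjamini-Hochberg step-up procedure (an assumption, not something stated in the abstract), the selection could be sketched as below; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up rule.

    Finds the largest rank k with p_(k) <= fdr * k / m and rejects the
    k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Illustrative pathway p-values only.
rejected = benjamini_hochberg([0.001, 0.015, 0.028, 0.5, 0.039])
```

The step-up rule rejects every p-value ranked at or below the cutoff, including ones that individually exceed their own rank threshold.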
Affiliation(s)
- Marina Evangelou
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Cambridge Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, CB2 0XY, UK
| |
|
21
|
Rowlands DS, Page RA, Sukala WR, Giri M, Ghimbovschi SD, Hayat I, Cheema BS, Lys I, Leikis M, Sheard PW, Wakefield SJ, Breier B, Hathout Y, Brown K, Marathi R, Orkunoglu-Suer FE, Devaney JM, Leiken B, Many G, Krebs J, Hopkins WG, Hoffman EP. Multi-omic integrated networks connect DNA methylation and miRNA with skeletal muscle plasticity to chronic exercise in Type 2 diabetic obesity. Physiol Genomics 2014; 46:747-65. [PMID: 25138607 PMCID: PMC4200377 DOI: 10.1152/physiolgenomics.00024.2014] [Citation(s) in RCA: 83] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2014] [Accepted: 08/08/2014] [Indexed: 01/19/2023] Open
Abstract
Epigenomic regulation of the transcriptome by DNA methylation and posttranscriptional gene silencing by miRNAs are potential environmental modulators of skeletal muscle plasticity to chronic exercise in healthy and diseased populations. We utilized transcriptome networks to connect exercise-induced differential methylation and miRNA with functional skeletal muscle plasticity. Biopsies of the vastus lateralis were collected from middle-aged Polynesian men and women with morbid obesity (44 ± 10 kg/m²) and Type 2 diabetes before and following 16 wk of resistance (n = 9) or endurance training (n = 8). Longitudinal transcriptome, methylome, and microRNA (miRNA) responses were obtained via microarray, filtered by novel effect-size based false discovery rate probe selection preceding bioinformatic interrogation. Metabolic and microvascular transcriptome topology dominated the network landscape following endurance exercise. Lipid and glucose metabolism modules were connected to: microRNA (miR)-29a; promoter region hypomethylation of nuclear receptor factor (NRF1) and fatty acid transporter (SLC27A4), and hypermethylation of fatty acid synthase, and to exon hypomethylation of 6-phosphofructo-2-kinase and Ser/Thr protein kinase. Directional change in the endurance networks was validated by lower intramyocellular lipid, increased capillarity, GLUT4, hexokinase, and mitochondrial enzyme activity and proteome. Resistance training also lowered lipid and increased enzyme activity and caused GLUT4 promoter hypomethylation; however, training was inconsequential to GLUT4, capillarity, and metabolic transcriptome. miR-195 connected to negative regulation of vascular development.
To conclude, integrated molecular network modelling revealed differential DNA methylation and miRNA expression changes occur in skeletal muscle in response to chronic exercise training that are most pronounced with endurance training and topographically associated with functional metabolic and microvascular plasticity relevant to diabetes rehabilitation.
Affiliation(s)
- David S Rowlands
- School of Sport and Exercise, Massey University, Wellington, New Zealand
| | - Rachel A Page
- Institute of Food, Nutrition & Human Health, Massey University, New Zealand
| | - William R Sukala
- Institute of Food, Nutrition & Human Health, Massey University, New Zealand
| | - Mamta Giri
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| | - Svetlana D Ghimbovschi
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| | - Irum Hayat
- Institute of Food, Nutrition & Human Health, Massey University, New Zealand
| | - Birinder S Cheema
- School of Science and Health, University of Western Sydney, Campbelltown, Australia
| | - Isabelle Lys
- Faculty of Engineering, Health, Science and the Environment, Charles Darwin University, Australia
| | - Murray Leikis
- Wellington Hospital, Capital and Coast District Health Board, Wellington, New Zealand
| | - Phillip W Sheard
- Department of Physiology, University of Otago, Dunedin, New Zealand
| | - St John Wakefield
- Department of Pathology, University of Otago, Wellington, New Zealand
| | - Bernhard Breier
- Institute of Food, Nutrition & Human Health, Massey University, New Zealand
| | - Yetrib Hathout
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| | - Kristy Brown
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| | - Ramya Marathi
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| | - Funda E Orkunoglu-Suer
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| | - Joseph M Devaney
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| | - Benjamin Leiken
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| | - Gina Many
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| | - Jeremy Krebs
- Endocrine and Diabetes Unit, Capital and Coast District Health Board, Wellington, New Zealand
| | - Will G Hopkins
- Health Science/Sport and Recreation, Auckland University of Technology, Auckland, New Zealand
| | - Eric P Hoffman
- Children's National Medical Center, Center for Genetic Medicine Research (CGMR), Washington, District of Columbia
| |
|
22
|
Mitra R, Edmonds MD, Sun J, Zhao M, Yu H, Eischen CM, Zhao Z. Reproducible combinatorial regulatory networks elucidate novel oncogenic microRNAs in non-small cell lung cancer. RNA (NEW YORK, N.Y.) 2014; 20:1356-68. [PMID: 25024357 PMCID: PMC4138319 DOI: 10.1261/rna.042754.113] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/28/2013] [Accepted: 05/01/2014] [Indexed: 06/03/2023]
Abstract
While previous studies reported aberrant expression of microRNAs (miRNAs) in non-small cell lung cancer (NSCLC), little is known about which miRNAs play central roles in NSCLC's pathogenesis and its regulatory mechanisms. To address this issue, we presented a robust computational framework that integrated matched miRNA and mRNA expression profiles in NSCLC using feed-forward loops. The network consists of miRNAs, transcription factors (TFs), and their common predicted target genes. To discern the biological meaning of their associations, we introduced the direction of regulation. A network edge validation strategy using three independent NSCLC expression profiling data sets pinpointed reproducible biological regulations. Reproducible regulation, which may reflect the true molecular interaction, has not been applied to miRNA-TF co-regulatory network analyses in cancer or other diseases yet. We revealed eight hub miRNAs that connected to a higher proportion of targets validated by independent data sets. Network analyses showed that these miRNAs might have strong oncogenic characteristics. Furthermore, we identified a novel miRNA-TF co-regulatory module that potentially suppresses the tumor suppressor activity of the TGF-β pathway by targeting a core pathway molecule (TGFBR2). Follow-up experiments showed two miRNAs (miR-9-5p and miR-130b-3p) in this module had increased expression while their target gene TGFBR2 had decreased expression in a cohort of human NSCLC. Moreover, we demonstrated these two miRNAs directly bind to the 3' untranslated region of TGFBR2. This study enhanced our understanding of miRNA-TF co-regulatory mechanisms in NSCLC. The combined bioinformatics and validation approach we described can be applied to study other types of diseases.
Affiliation(s)
- Ramkrishna Mitra
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
| | - Mick D Edmonds
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
| | - Jingchun Sun
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
| | - Min Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
| | - Hui Yu
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
| | - Christine M Eischen
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
| | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA; Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee 37212, USA; Center for Quantitative Sciences, Vanderbilt University, Nashville, Tennessee 37232, USA
| |
|
23
|
Mooney MA, Nigg JT, McWeeney SK, Wilmot B. Functional and genomic context in pathway analysis of GWAS data. Trends Genet 2014; 30:390-400. [PMID: 25154796 DOI: 10.1016/j.tig.2014.07.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2014] [Revised: 07/18/2014] [Accepted: 07/18/2014] [Indexed: 02/07/2023]
Abstract
Gene set analysis (GSA) is a promising tool for uncovering the polygenic effects associated with complex diseases. However, the available techniques reflect a wide variety of hypotheses about how genetic effects interact to contribute to disease susceptibility. The lack of consensus about the best way to perform GSA has led to confusion in the field and has made it difficult to compare results across methods. A clear understanding of the various choices made during GSA - such as how gene sets are defined, how single-nucleotide polymorphisms (SNPs) are assigned to genes, and how individual SNP-level effects are aggregated to produce gene- or pathway-level effects - will improve the interpretability and comparability of results across methods and studies. In this review we provide an overview of the various data sources used to construct gene sets and the statistical methods used to test for gene set association, as well as provide guidelines for ensuring the comparability of results.
Affiliation(s)
- Michael A Mooney
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA; OHSU Knight Cancer Institute, Portland, OR, USA
| | - Joel T Nigg
- Division of Psychology, Department of Psychiatry, Oregon Health & Science University, Portland, OR, USA; Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR, USA
| | - Shannon K McWeeney
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA; Oregon Clinical and Translational Research Institute, Portland, OR, USA; OHSU Knight Cancer Institute, Portland, OR, USA.
| | - Beth Wilmot
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA; Oregon Clinical and Translational Research Institute, Portland, OR, USA; OHSU Knight Cancer Institute, Portland, OR, USA
| |
|
24
|
Jia P, Zhao Z. Network-assisted analysis to prioritize GWAS results: principles, methods and perspectives. Hum Genet 2014; 133:125-38. [PMID: 24122152 PMCID: PMC3943795 DOI: 10.1007/s00439-013-1377-1] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2012] [Accepted: 10/03/2013] [Indexed: 01/24/2023]
Abstract
Genome-wide association studies (GWAS) have rapidly become a powerful tool in genetic studies of complex diseases and traits. Traditionally, single marker-based tests have been used prevalently in GWAS and have uncovered tens of thousands of disease-associated SNPs. Network-assisted analysis (NAA) of GWAS data is an emerging area in which network-related approaches are developed and utilized to perform advanced analyses of GWAS data in order to study various human diseases or traits. Progress has been made in both methodology development and applications of NAA in GWAS data, and it has already been demonstrated that NAA results may enhance our interpretation and prioritization of candidate genes and markers. Inspired by the strong interest in and high demand for advanced GWAS data analysis, in this review article, we discuss the methodologies and strategies that have been reported for the NAA of GWAS data. Many NAA approaches search for subnetworks and assess the combined effects of multiple genes participating in the resultant subnetworks through a gene set analysis. With no restriction to pre-defined canonical pathways, NAA has the advantage of defining subnetworks with the guidance of the GWAS data under investigation. In addition, some NAA methods prioritize genes from GWAS data based on their interconnections in the reference network. Here, we summarize NAA applications to various diseases and discuss the available options and potential caveats related to their practical usage. Additionally, we provide perspectives regarding this rapidly growing research area.
|
25
|
Incorporating prior knowledge to increase the power of genome-wide association studies. Methods Mol Biol 2014; 1019:519-41. [PMID: 23756909 DOI: 10.1007/978-1-62703-447-0_25] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/10/2023]
Abstract
Typical methods of analyzing genome-wide single nucleotide variant (SNV) data in cases and controls involve testing each variant's genotypes separately for phenotype association, and then using a substantial multiple-testing penalty to minimize the rate of false positives. This approach, however, can result in low power for modestly associated SNVs. Furthermore, simply looking at the most associated SNVs may not directly yield biological insights about disease etiology. SNVset methods attempt to address both limitations of the traditional approach by testing biologically meaningful sets of SNVs (e.g., genes or pathways). The number of tests run in a SNVset analysis is typically much lower (hundreds or thousands instead of millions) than in a traditional analysis, so the multiple-testing penalty is less severe. Additionally, because the SNVsets tested are biologically meaningful, finding a significant set may more quickly yield insights into disease etiology. In this chapter we summarize the short history of SNVset testing and provide an overview of the many recently proposed methods. Furthermore, we provide detailed step-by-step instructions on how to perform a SNVset analysis, including a substantial number of practical tips and questions that researchers should consider before undertaking a SNVset analysis. Lastly, we describe a companion R package (snvset) that implements recently proposed SNVset methods. While SNVset testing is a new approach, with many new methods still being developed and many open questions, its promise merits serious consideration when choosing analytic methods for GWAS.
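The effect of running fewer tests can be made concrete with a little arithmetic. The sketch below assumes a Bonferroni correction at a family-wise level of 0.05 and illustrative counts of one million single-variant tests versus one thousand set-level tests; the abstract names neither the correction nor these exact counts, so both are assumptions.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


alpha = 0.05  # assumed family-wise error rate

# Assumed test counts: one test per SNV versus one test per SNVset
single_variant = bonferroni_threshold(alpha, 1_000_000)
set_based = bonferroni_threshold(alpha, 1_000)

print(f"per-SNV threshold:    {single_variant:.1e}")
print(f"per-SNVset threshold: {set_based:.1e}")
```

Under these assumptions a set-level test has to clear a threshold a thousand times less stringent than a single-variant test, which is where the potential gain in power comes from.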
|
26
|
Lee Y, Ghosh D, Zhang Y. Association testing to detect gene-gene interactions on sex chromosomes in trio data. Front Genet 2013; 4:239. [PMID: 24312118 PMCID: PMC3826485 DOI: 10.3389/fgene.2013.00239] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/31/2013] [Accepted: 10/24/2013] [Indexed: 11/13/2022]
Abstract
Autism Spectrum Disorder (ASD) occurs more often among males than females in a 4:1 ratio. Among theories used to explain the causes of ASD, the X chromosome and the Y chromosome theories attribute ASD to X-linked mutations and to male-limited gene expression on the Y chromosome, respectively. Despite the rationale of these theories, studies have failed to attribute the sex-biased ratio to significant linkage or association in the regions of interest on the X chromosome. We further study the sex-biased ratio by examining the possible interaction effects between two genes on the sex chromosomes. We propose a logistic regression model with mixed effects to detect gene–gene interactions on sex chromosomes. We investigated the power and type I error rates of the approach for a range of minor allele frequencies and varying linkage disequilibrium between markers and QTLs. We also evaluated the robustness of the model to population stratification. We applied the model to a trio-family data set with an ASD affected male child to study gene–gene interactions on sex chromosomes.
Affiliation(s)
- Yeonok Lee
- Department of Statistics, Penn State University, University Park PA, USA
|
27
|
Silver M, Chen P, Li R, Cheng CY, Wong TY, Tai ES, Teo YY, Montana G. Pathways-driven sparse regression identifies pathways and genes associated with high-density lipoprotein cholesterol in two Asian cohorts. PLoS Genet 2013; 9:e1003939. [PMID: 24278029 PMCID: PMC3836716 DOI: 10.1371/journal.pgen.1003939] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 03/05/2013] [Accepted: 09/11/2013] [Indexed: 01/11/2023]
Abstract
Standard approaches to data analysis in genome-wide association studies (GWAS) ignore any potential functional relationships between gene variants. In contrast gene pathways analysis uses prior information on functional structure within the genome to identify pathways associated with a trait of interest. In a second step, important single nucleotide polymorphisms (SNPs) or genes may be identified within associated pathways. The pathways approach is motivated by the fact that genes do not act alone, but instead have effects that are likely to be mediated through their interaction in gene pathways. Where this is the case, pathways approaches may reveal aspects of a trait's genetic architecture that would otherwise be missed when considering SNPs in isolation. Most pathways methods begin by testing SNPs one at a time, and so fail to capitalise on the potential advantages inherent in a multi-SNP, joint modelling approach. Here, we describe a dual-level, sparse regression model for the simultaneous identification of pathways and genes associated with a quantitative trait. Our method takes account of various factors specific to the joint modelling of pathways with genome-wide data, including widespread correlation between genetic predictors, and the fact that variants may overlap multiple pathways. We use a resampling strategy that exploits finite sample variability to provide robust rankings for pathways and genes. We test our method through simulation, and use it to perform pathways-driven gene selection in a search for pathways and genes associated with variation in serum high-density lipoprotein cholesterol levels in two separate GWAS cohorts of Asian adults. By comparing results from both cohorts we identify a number of candidate pathways including those associated with cardiomyopathy, and T cell receptor and PPAR signalling. 
Highlighted genes include those associated with the L-type calcium channel, adenylate cyclase, integrin, laminin, MAPK signalling and immune function.
Affiliation(s)
- Matt Silver
- Statistics Section, Department of Mathematics, Imperial College, London, United Kingdom
- MRC International Nutrition Group, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Peng Chen
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Ruoying Li
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Ching-Yu Cheng
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Department of Ophthalmology, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Center, Singapore
- Tien-Yin Wong
- Department of Ophthalmology, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Center, Singapore
- E-Shyong Tai
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Yik-Ying Teo
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore
- Life Sciences Institute, National University of Singapore, Singapore
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
- Giovanni Montana
- Statistics Section, Department of Mathematics, Imperial College, London, United Kingdom
|
28
|
Evangelou M, Dudbridge F, Wernisch L. Two novel pathway analysis methods based on a hierarchical model. Bioinformatics 2013; 30:690-7. [PMID: 24123673 PMCID: PMC3933872 DOI: 10.1093/bioinformatics/btt583] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
Motivation: Over the past few years several pathway analysis methods have been proposed for exploring and enhancing the analysis of genome-wide association data. Hierarchical models have been advocated as a way to integrate SNP and pathway effects in the same model, but their computational complexity has prevented them being applied on a genome-wide scale to date. Methods: We present two novel methods for identifying associated pathways. In the proposed hierarchical model, the SNP effects are analytically integrated out of the analysis, allowing computationally tractable model fitting to genome-wide data. The first method uses Bayes factors for calculating the effect of the pathways, whereas the second method uses a machine learning algorithm and adaptive lasso for finding a sparse solution of associated pathways. Results: The performance of the proposed methods was explored on both simulated and real data. The results of the simulation study showed that the methods outperformed some well-established association methods: the commonly used Fisher’s method for combining P-values and also the recently published BGSA. The methods were applied to two genome-wide association study datasets that aimed to find the genetic structure of platelet function and body mass index, respectively. The results of the analyses replicated the results of previously published pathway analysis of these phenotypes but also identified novel pathways that are potentially involved. Availability: An R package is under preparation. In the meantime, the scripts of the methods are available on request from the authors. Contact: marina.evangelou@cimr.cam.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marina Evangelou
- Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge, CB2 0SR, UK, JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, NIHR Cambridge Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge, CB2 0XY, UK and Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, WC1E 7HT, UK
|
29
|
Carbonetto P, Stephens M. Integrated enrichment analysis of variants and pathways in genome-wide association studies indicates central role for IL-2 signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn's disease. PLoS Genet 2013; 9:e1003770. [PMID: 24098138 PMCID: PMC3789883 DOI: 10.1371/journal.pgen.1003770] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Received: 08/21/2012] [Accepted: 07/22/2013] [Indexed: 12/17/2022]
Abstract
Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and "Measles" pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. 
Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study.
Affiliation(s)
- Peter Carbonetto
- Dept. of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Matthew Stephens
- Dept. of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Dept. of Statistics, University of Chicago, Chicago, Illinois, United States of America
|
30
|
Association signals unveiled by a comprehensive gene set enrichment analysis of dental caries genome-wide association studies. PLoS One 2013; 8:e72653. [PMID: 23967329 PMCID: PMC3743773 DOI: 10.1371/journal.pone.0072653] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 04/30/2013] [Accepted: 07/11/2013] [Indexed: 11/19/2022]
Abstract
Gene set-based analysis of genome-wide association study (GWAS) data has recently emerged as a useful approach to examine the joint effects of multiple risk loci in complex human diseases or phenotypes. Dental caries is a common, chronic, and complex disease leading to a decrease in quality of life worldwide. In this study, we applied gene set enrichment analysis approaches to a major dental caries GWAS dataset, which consists of 537 cases and 605 controls. Using four complementary gene set analysis methods, we analyzed 1331 Gene Ontology (GO) terms collected from the Molecular Signatures Database (MSigDB). Setting the false discovery rate (FDR) threshold at 0.05, we identified 13 significantly associated GO terms. Additionally, 17 terms were further included as marginally associated because they were top ranked by each method, although their FDR was higher than 0.05. In total, we identified 30 promising GO terms, including ‘Sphingoid metabolic process,’ ‘Ubiquitin protein ligase activity,’ ‘Regulation of cytokine secretion,’ and ‘Ceramide metabolic process.’ These GO terms encompass broad functions that potentially interact and contribute to the oral immune response related to caries development, and they have not been reported in standard single marker-based analyses. Collectively, our gene set enrichment analysis provided complementary insights into the molecular mechanisms and polygenic interactions in dental caries, revealing promising association signals that could not be detected through single marker analysis of GWAS data.
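The abstract applies an FDR threshold of 0.05 without naming the procedure. As a rough sketch, the block below assumes the Benjamini-Hochberg step-up procedure, which is one common way to control the FDR; the function name and the toy p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per hypothesis: True if rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * fdr
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below that cutoff
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Toy p-values for six hypothetical gene sets (illustrative only)
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
flags = benjamini_hochberg(toy_p)
print(flags)
```

With these toy values only the two smallest p-values pass: the third would need to be at most 3/6 × 0.05 = 0.025.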
|
31
|
Foulkes AS, Matthews GJ, Das U, Ferguson JF, Lin R, Reilly MP. Mixed modeling of meta-analysis P-values (MixMAP) suggests multiple novel gene loci for low density lipoprotein cholesterol. PLoS One 2013; 8:e54812. [PMID: 23405096 PMCID: PMC3566142 DOI: 10.1371/journal.pone.0054812] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/24/2012] [Accepted: 12/17/2012] [Indexed: 12/26/2022]
Abstract
Informing missing heritability for complex disease will likely require leveraging information across multiple SNPs within a gene region simultaneously to characterize gene and locus-level contributions to disease phenotypes. To this aim, we introduce a novel strategy, termed Mixed modeling of Meta-Analysis P-values (MixMAP), that draws on a principled statistical modeling framework and the vast array of summary data now available from genetic association studies, to test formally for locus level association. The primary inputs to this approach are: (a) single SNP level p-values for tests of association; and (b) the mapping of SNPs to genomic regions. The output of MixMAP is comprised of locus level estimates and tests of association. In application of MixMAP to summary data from the Global Lipids Gene Consortium, we suggest twelve new loci (PKN, FN1, UGT1A1, PPARG, DMDGH, PPARD, CDK6, VPS13B, GAD2, GAB2, APOH and NPC1) for low-density lipoprotein cholesterol (LDL-C), a causal risk factor for cardiovascular disease and we also demonstrate the potential utility of MixMAP in small data settings. Overall, MixMAP offers novel and complementary information as compared to SNP-based analysis approaches and is straightforward to implement with existing open-source statistical software tools.
Affiliation(s)
- Andrea S Foulkes
- Division of Biostatistics, School of Public Health and Health Sciences at the University of Massachusetts, Amherst, MA, USA.
|
32
|
Jia P, Liu Y, Zhao Z. Integrative pathway analysis of genome-wide association studies and gene expression data in prostate cancer. BMC Syst Biol 2012; 6 Suppl 3:S13. [PMID: 23281744 PMCID: PMC3524313 DOI: 10.1186/1752-0509-6-s3-s13] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 01/23/2023]
Abstract
BACKGROUND Pathway analysis of large-scale omics data assists us with the examination of the cumulative effects of multiple functionally related genes, which are difficult to detect using the traditional single gene/marker analysis. So far, most of the genomic studies have been conducted in a single domain, e.g., by genome-wide association studies (GWAS) or microarray gene expression investigation. A combined analysis of disease susceptibility genes across multiple platforms at the pathway level is an urgent need because it can reveal more reliable and more biologically important information. RESULTS We performed an integrative pathway analysis of a GWAS dataset and a microarray gene expression dataset in prostate cancer. We obtained a comprehensive pathway annotation set from knowledge-based public resources, including KEGG pathways and the prostate cancer candidate gene set, and gene sets specifically defined based on cross-platform information. Leveraging this pathway collection, we first searched for significant pathways in the GWAS dataset using four methods, which represent two broad groups of pathway analysis approaches. The significant pathways identified by each method varied greatly, but the results were more consistent within each method group than between groups. Next, we conducted a gene set enrichment analysis of the microarray gene expression data and found 13 pathways with cross-platform evidence, including "Fc gamma R-mediated phagocytosis" (P_GWAS = 0.003, P_expr < 0.001, and P_combined = 6.18 × 10^-8), "regulation of actin cytoskeleton" (P_GWAS = 0.003, P_expr = 0.009, and P_combined = 3.34 × 10^-4), and "Jak-STAT signaling pathway" (P_GWAS = 0.001, P_expr = 0.084, and P_combined = 8.79 × 10^-4). CONCLUSIONS Our results provide evidence at both the genetic variation and expression levels that several key pathways might have been involved in the pathological development of prostate cancer. 
Our framework that employs gene expression data to facilitate pathway analysis of GWAS data is not only feasible but also much needed in studying complex disease.
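The abstract reports combined P values for each pathway but does not say how the GWAS and expression P values were combined. As a hedged illustration, the sketch below assumes Fisher's method for two independent P values, which has a simple closed form; the function name is hypothetical. Fed with the rounded inputs quoted for the second and third pathways, it returns about 3.1 × 10^-4 and 8.7 × 10^-4, the same order of magnitude as the reported values, though that alone does not confirm which method the authors used.

```python
import math


def fisher_combine_two(p1: float, p2: float) -> float:
    """Fisher's method for two independent p-values.

    The statistic X = -2 * (ln p1 + ln p2) follows a chi-square distribution
    with 4 degrees of freedom under the null hypothesis, and that distribution
    has the closed-form survival function exp(-X/2) * (1 + X/2).
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -(math.log(p1) + math.log(p2))
    return math.exp(-half_x) * (1.0 + half_x)


# Rounded inputs as printed in the abstract
actin = fisher_combine_two(0.003, 0.009)     # "regulation of actin cytoskeleton"
jak_stat = fisher_combine_two(0.001, 0.084)  # "Jak-STAT signaling pathway"

print(f"actin cytoskeleton: {actin:.2e}")
print(f"Jak-STAT:           {jak_stat:.2e}")
```

The first pathway is left out because its expression P value is given only as an upper bound (< 0.001), so no single combined value can be recomputed from the abstract.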
Affiliation(s)
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
|
33
|
Uncovering networks from genome-wide association studies via circular genomic permutation. G3 (Bethesda) 2012; 2:1067-75. [PMID: 22973544 PMCID: PMC3429921 DOI: 10.1534/g3.112.002618] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 03/15/2012] [Accepted: 06/29/2012] [Indexed: 11/24/2022]
Abstract
Genome-wide association studies (GWAS) aim to detect single nucleotide polymorphisms (SNPs) associated with trait variation. However, due to the large number of tests, standard analysis techniques impose highly stringent significance thresholds, leaving potentially associated SNPs undetected, and much of the trait genetic variation unexplained. Pathway- and network-based methodologies applied to GWAS aim to detect associations missed by standard single-marker approaches. The complex and non-random architecture of the genome makes it a challenge to derive an appropriate testing framework for such methodologies. We developed a rapid and simple permutation approach that uses GWAS SNP association results to establish the significance of pathway associations while accounting for the linkage disequilibrium structure of SNPs and the clustering of functionally related elements in the genome. All SNPs used in the GWAS are placed in a “circular genome” according to their location. Then the complete set of SNP association P values is permuted by rotation with respect to the genomic locations of the SNPs. Once these “simulated” P values are assigned, the joint gene P values are calculated using Fisher’s combination test, and the association of pathways is tested using the hypergeometric test. The circular genomic permutation approach was applied to a human genome-wide association dataset. The data consist of 719 individuals from the ORCADES study genotyped for ∼300,000 SNPs and measured for 51 traits ranging from physical to biochemical measurements. KEGG pathways (n = 225) were used as the sets of pathways to be tested. Our results demonstrate that the circular genomic permutations provide robust association P values. The non-permuted hypergeometric analysis generates ∼1400 pathway-trait combination results with an association P value ≤ 0.05, whereas applying circular genomic permutation reduces the number of significant results to a more credible 40% of that value. The circular permutation software (“genomicper”) is available as an R package at http://cran.r-project.org/.
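The rotation scheme described in this abstract can be sketched in a few dozen lines. The code below is an illustrative reimplementation on toy data, not the genomicper package: the function names, the gene-level cutoff of 0.05 used to call a gene "significant" before the hypergeometric test, the number of permutations, and the simulated P values are all assumptions.

```python
import math
import random


def rotate(values, offset):
    """Circularly shift values so the item at index i moves to (i + offset) % n."""
    n = len(values)
    offset %= n
    return list(values[n - offset:]) + list(values[:n - offset])


def chi2_sf_even_df(x, df):
    """Chi-square survival function for an even number of degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def gene_p_values(snp_p, gene_snps):
    """Fisher's combination of the SNP p-values mapped to each gene."""
    out = {}
    for gene, idx in gene_snps.items():
        stat = -2.0 * sum(math.log(snp_p[i]) for i in idx)
        out[gene] = chi2_sf_even_df(stat, 2 * len(idx))
    return out


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement."""
    top = min(successes, draws)
    num = sum(math.comb(successes, i) * math.comb(population - successes, draws - i)
              for i in range(k, top + 1))
    return num / math.comb(population, draws)


def pathway_p(snp_p, gene_snps, pathway, alpha=0.05):
    """Hypergeometric p-value for enrichment of significant genes in a pathway."""
    gp = gene_p_values(snp_p, gene_snps)
    hits = {g for g, p in gp.items() if p <= alpha}  # assumed gene-level cutoff
    members = [g for g in gp if g in pathway]
    k = sum(1 for g in members if g in hits)
    return hypergeom_sf(k, len(gp), len(hits), len(members))


def circular_permutation_p(snp_p, gene_snps, pathway, n_perm=200, seed=0):
    """Empirical pathway p-value obtained by rotating SNP p-values around the genome."""
    rng = random.Random(seed)
    observed = pathway_p(snp_p, gene_snps, pathway)
    as_extreme = 0
    for _ in range(n_perm):
        shifted = rotate(snp_p, rng.randrange(1, len(snp_p)))
        if pathway_p(shifted, gene_snps, pathway) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Toy genome: 40 SNPs in positional order, 10 genes of 4 adjacent SNPs each
toy_rng = random.Random(1)
snp_p = [0.001] * 8 + [toy_rng.uniform(0.05, 1.0) for _ in range(32)]
gene_snps = {f"g{j}": list(range(4 * j, 4 * j + 4)) for j in range(10)}
pathway = {"g0", "g1"}  # hypothetical pathway covering the two signal-bearing genes

observed, empirical = circular_permutation_p(snp_p, gene_snps, pathway)
print(f"hypergeometric p = {observed:.4f}, circular-permutation p = {empirical:.4f}")
```

Because the P values are rotated as one block, neighbouring SNPs keep their neighbours, so local correlation among P values survives each permutation while the link between the signal and specific genes is broken.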
|
34
|
Sun J, Gong X, Purow B, Zhao Z. Uncovering MicroRNA and Transcription Factor Mediated Regulatory Networks in Glioblastoma. PLoS Comput Biol 2012; 8:e1002488. [PMID: 22829753 PMCID: PMC3400583 DOI: 10.1371/journal.pcbi.1002488] [Citation(s) in RCA: 111] [Impact Index Per Article: 8.5] [Received: 08/18/2011] [Accepted: 03/05/2012] [Indexed: 12/12/2022]
Abstract
Glioblastoma multiforme (GBM) is the most common and lethal brain tumor in humans. Recent studies revealed that patterns of microRNA (miRNA) expression in GBM tissue samples are different from those in normal brain tissues, suggesting that a number of miRNAs play critical roles in the pathogenesis of GBM. However, little is yet known about which miRNAs play central roles in the pathology of GBM and their regulatory mechanisms of action. To address this issue, in this study, we systematically explored the main regulation format (feed-forward loops, FFLs) consisting of miRNAs, transcription factors (TFs) and their impacting GBM-related genes, and developed a computational approach to construct a miRNA-TF regulatory network. First, we compiled GBM-related miRNAs, GBM-related genes, and known human TFs. We then identified 1,128 3-node FFLs and 805 4-node FFLs with statistical significance. By merging these FFLs together, we constructed a comprehensive GBM-specific miRNA-TF mediated regulatory network. Then, from the network, we extracted a composite GBM-specific regulatory network. To illustrate that the GBM-specific regulatory network is promising for identification of critical miRNA components, we specifically examined a Notch signaling pathway subnetwork. Our follow-up topological and functional analyses of the subnetwork revealed that six miRNAs (miR-124, miR-137, miR-219-5p, miR-34a, miR-9, and miR-92b) might play important roles in GBM, including some results that are supported by previous studies. In this study, we have developed a computational framework to construct a miRNA-TF regulatory network and generated the first miRNA-TF regulatory network for GBM, providing a valuable resource for further understanding the complex regulatory mechanisms in GBM. The observation of critical miRNAs in the Notch signaling pathway, with partial verification from previous studies, demonstrates that our network-based approach is promising for the identification of new and important miRNAs in GBM and, potentially, other cancers.
Several recent studies have implicated the critical role of microRNAs (miRNAs) in the pathogenesis of glioblastoma (GBM), the most common and lethal brain tumor in humans, suggesting that miRNAs may be clinically useful as biomarkers for brain tumors and other cancers. However, to date, the regulatory mechanisms of miRNAs in GBM are unclear. In this study, we have systematically constructed miRNA and transcription factor (TF) mediated regulatory networks specific to GBM. To demonstrate that the GBM-specific regulatory network contains functional modules that may be composed of critical miRNA components, we extracted a subnetwork including GBM-related genes involved in the Notch signaling pathway. Through network topological and functional analyses of the Notch signaling pathway subnetwork, several critical miRNAs have been identified, some of which have been reinforced by previous studies. This study not only provides novel miRNAs for further experimental design but also develops a novel computational framework to construct a miRNA-TF combinatory regulatory network for a specific disease.
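To make the notion of a 3-node feed-forward loop concrete, the sketch below enumerates loops in which a miRNA and a TF share a target gene and one of the two also regulates the other. This definition, the two loop labels, the function name, and all node names are assumptions made for illustration; the abstract does not spell out the exact loop types, and the statistical significance testing and 4-node loops it mentions are not covered here.

```python
def three_node_ffls(tf_gene, mirna_gene, mirna_tf, tf_mirna):
    """Enumerate 3-node feed-forward loops among a miRNA, a TF and a shared target gene.

    Each argument is a set of (regulator, target) pairs.
    """
    loops = []
    for mirna, gene in sorted(mirna_gene):
        for tf, target in sorted(tf_gene):
            if target != gene:
                continue  # the miRNA and the TF must share this target gene
            if (mirna, tf) in mirna_tf:
                # miRNA regulates the TF, and both regulate the gene
                loops.append(("miRNA-FFL", mirna, tf, gene))
            if (tf, mirna) in tf_mirna:
                # TF regulates the miRNA, and both regulate the gene
                loops.append(("TF-FFL", mirna, tf, gene))
    return loops


# Hypothetical placeholder names, not real regulatory relationships
tf_gene = {("TF1", "GENE1"), ("TF2", "GENE2")}
mirna_gene = {("miR-A", "GENE1"), ("miR-B", "GENE2"), ("miR-B", "GENE1")}
mirna_tf = {("miR-A", "TF1")}
tf_mirna = {("TF2", "miR-B")}

loops = three_node_ffls(tf_gene, mirna_gene, mirna_tf, tf_mirna)
for loop in loops:
    print(loop)
```

In this toy network two loops are found; the pair miR-B and TF1 shares GENE1 as a target but forms no loop because neither regulates the other.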
Affiliation(s)
- Jingchun Sun
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Xue Gong
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Benjamin Purow
- Division of Neuro-Oncology, Neurology Department, University of Virginia Health System, Charlottesville, Virginia, United States of America
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
|
35
|
Herold C, Mattheisen M, Lacour A, Vaitsiakhovich T, Angisch M, Drichel D, Becker T. Integrated Genome-Wide Pathway Association Analysis with INTERSNP. Hum Hered 2012; 73:63-72. [DOI: 10.1159/000336196] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 06/29/2011] [Accepted: 12/30/2011] [Indexed: 11/19/2022]
|
36
|
Gui H, Li M, Sham PC, Cherny SS. Comparisons of seven algorithms for pathway analysis using the WTCCC Crohn's Disease dataset. BMC Res Notes 2011; 4:386. [PMID: 21981765 PMCID: PMC3199264 DOI: 10.1186/1756-0500-4-386] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 06/10/2011] [Accepted: 10/07/2011] [Indexed: 12/20/2022]
Abstract
BACKGROUND Though rooted in genomic expression studies, pathway analysis for genome-wide association studies (GWAS) has gained increasing popularity, since it has the potential to discover hidden disease pathogenic mechanisms by combining statistical methods with biological knowledge. Generally, algorithms or programs proposed recently can be categorized by type of input data, null hypothesis or number of analysis stages. Due to the complexity caused by SNP, gene and pathway relationships, re-sampling strategies like permutation are always utilized to derive an empirical distribution of test statistics for evaluating the significance of candidate pathways. However, evaluation of these algorithms on real GWAS datasets and real biological pathway databases needs to be addressed before we apply them widely with confidence. FINDINGS Two algorithms which use summary statistics from GWAS as input were implemented in KGG, a novel and user-friendly software tool for GWAS pathway analysis. Comparisons of these two algorithms as well as the other five selected algorithms were conducted by analyzing the WTCCC Crohn's Disease dataset utilizing the MSigDB canonical pathways. As a result of using permutation to obtain empirical p-values, most of these methods could control the Type I error rate well, although some are conservative. However, the methods varied greatly in terms of power and running time, with the PLINK truncated set-based test being the most powerful and KGG being the fastest. CONCLUSIONS Raw data-based algorithms, such as those implemented in PLINK, are preferable for GWAS pathway analysis as long as computational capacity is available. It may be worthwhile to apply two or more pathway analysis algorithms on the same GWAS dataset, since the methods differ greatly in their outputs and might provide complementary findings for the studied complex disease.
Affiliation(s)
- Hongsheng Gui
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China.
|
37
|
Wang L, Jia P, Wolfinger RD, Chen X, Zhao Z. Gene set analysis of genome-wide association studies: methodological issues and perspectives. Genomics 2011; 98:1-8. [PMID: 21565265 PMCID: PMC3852939 DOI: 10.1016/j.ygeno.2011.04.006] [Citation(s) in RCA: 164] [Impact Index Per Article: 11.7] [Received: 12/06/2010] [Revised: 03/02/2011] [Accepted: 04/15/2011] [Indexed: 12/25/2022]
Abstract
Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association studies (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. In addition, gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still in an early stage of development in that analysis results are often prone to sources of bias, including gene set size and gene length, linkage disequilibrium patterns and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedures, along with parameter choices and the particular methodology challenges at each stage. In addition to providing a survey of recently developed tools, we also classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis.
Affiliation(s)
- Lily Wang
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Xi Chen
- Division of Cancer Biostatistics, Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
|