1
Healthspan pathway maps in C. elegans and humans highlight transcription, proliferation/biosynthesis and lipids. Aging (Albany NY) 2020; 12:12534-12581. [PMID: 32634117 PMCID: PMC7377848 DOI: 10.18632/aging.103514] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/24/2020] [Accepted: 06/04/2020] [Indexed: 12/17/2022]
Abstract
The molecular basis of aging and of aging-associated diseases is being unraveled at an increasing pace. An extended healthspan, and not merely an extension of lifespan, has become the aim of medical practice. Here, we define health based on the absence of diseases and dysfunctions. Based on an extensive review of the literature, in particular for humans and C. elegans, we compile a list of features of health and of the genes associated with them. These genes may or may not be associated with survival/lifespan. In turn, survival/lifespan genes that are not known to be directly associated with health are not considered. Clusters of these genes based on molecular interaction data give rise to maps of healthspan pathways for humans and for C. elegans. Overlaying healthspan-related gene expression data onto the healthspan pathway maps, we observe the downregulation of (pro-inflammatory) Notch signaling in humans and of proliferation in C. elegans. We identify transcription, proliferation/biosynthesis and lipids as a common theme on the annotation level, and proliferation-related kinases on the gene/protein level. Our literature-based data corpus, including visualization, should be seen as a pilot investigation of the molecular underpinnings of health in two different species. Web address: http://pathways.h2020awe.eu.
2
Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges. ENTROPY 2020; 22:e22040427. [PMID: 33286201 PMCID: PMC7516904 DOI: 10.3390/e22040427] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/24/2020] [Revised: 03/18/2020] [Accepted: 04/03/2020] [Indexed: 12/22/2022]
Abstract
Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview, statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, we provide the generation-wise evolution of such approaches for microarrays, RNA-sequencing and genome wide association studies and discuss their relative merits and limitations. Here, we identify the key biological and statistical challenges in current gene set analysis, which will be addressed by statisticians and biologists collectively in order to develop the next generation of gene set analysis approaches. Further, this study will serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing the proper gene set analysis approach based on several factors.
3
Kaur T, Thakur K, Singh J, Kamboj SS, Kaur M. Identification of functional SNPs in human LGALS3 gene by in silico analyses. EGYPTIAN JOURNAL OF MEDICAL HUMAN GENETICS 2017. [DOI: 10.1016/j.ejmhg.2017.02.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 01/12/2023]
4
SNPer: an R library for quantitative variant analysis on single nucleotide polymorphisms among influenza virus populations. PLoS One 2015; 10:e0122812. [PMID: 25876137 PMCID: PMC4395159 DOI: 10.1371/journal.pone.0122812] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/02/2014] [Accepted: 02/14/2015] [Indexed: 01/14/2023]
Abstract
Influenza virus (IFV) can evolve rapidly, leading to genetic drifts and shifts that result in human and animal influenza epidemics and pandemics. The genetic shift that gave rise to the 2009 influenza A/H1N1 pandemic originated from a triple gene reassortment of avian, swine and human IFVs. More minor genetic alterations in genetic drift can lead to influenza drug resistance, such as the H274Y mutation associated with oseltamivir resistance. Hence, a rapid tool to detect IFV mutations and the potential emergence of new virulent strains can better prepare us for seasonal influenza outbreaks as well as potential pandemics. Furthermore, identification of specific mutations by closely examining single nucleotide polymorphisms (SNPs) in IFV sequences is essential to classify potential genetic markers associated with potentially dangerous IFV phenotypes. In this study, we developed a novel R library called "SNPer" to analyze quantitative variants in SNPs among IFV subpopulations. The computational SNPer program was applied to three different subpopulations of published IFV genomic information. SNPer queried SNP data and grouped the SNPs into (1) universal SNPs, (2) likely common SNPs, and (3) unique SNPs. SNPer outperformed manual visualization in terms of time and labor. SNPer took only three seconds, with no errors in SNP comparison events, compared with 40 hours, with errors, using manual visualization. The SNPer tool can accelerate the capacity to capture new and potentially dangerous IFV strains to mitigate future influenza outbreaks.
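The three-way grouping described in the abstract above can be sketched as follows. This is a minimal Python illustration, not the SNPer R library itself. The abstract does not define the three groups, so the definitions used here (present in all subpopulations, in more than one but not all, in exactly one) are an assumption, and the function name `classify_snps` and the SNP labels are made up for the example.

```python
from collections import Counter

def classify_snps(subpopulations):
    """Group SNP identifiers by how many subpopulations carry them.

    Assumed definitions (not taken from the SNPer paper):
      universal     -> present in every subpopulation
      likely_common -> present in more than one subpopulation, but not all
      unique        -> present in exactly one subpopulation
    """
    n = len(subpopulations)
    counts = Counter()
    for snps in subpopulations.values():
        counts.update(set(snps))  # count each SNP once per subpopulation

    groups = {"universal": set(), "likely_common": set(), "unique": set()}
    for snp, k in counts.items():
        if k == n:
            groups["universal"].add(snp)
        elif k > 1:
            groups["likely_common"].add(snp)
        else:
            groups["unique"].add(snp)
    return groups

# Hypothetical SNP labels for three subpopulations (illustrative only).
example = {
    "A": {"PB2:627", "HA:225", "NA:274"},
    "B": {"PB2:627", "HA:225"},
    "C": {"PB2:627", "M2:31"},
}
groups = classify_snps(example)
```

With this toy input, one label falls in the universal group, one in the likely common group, and two in the unique group.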
5
Gonzalez-Perez A, Deu-Pons J, Lopez-Bigas N. Improving the prediction of the functional impact of cancer mutations by baseline tolerance transformation. Genome Med 2012. [PMID: 23181723 PMCID: PMC4064314 DOI: 10.1186/gm390] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022]
Abstract
High-throughput prioritization of cancer-causing mutations (drivers) is a key challenge of cancer genome projects, due to the number of somatic variants detected in tumors. One important step in this task is to assess the functional impact of tumor somatic mutations. A number of computational methods have been employed for that purpose, although most were originally developed to distinguish disease-related nonsynonymous single nucleotide variants (nsSNVs) from polymorphisms. Our new method, transformed Functional Impact score for Cancer (transFIC), improves the assessment of the functional impact of tumor nsSNVs by taking into account the baseline tolerance of genes to functional variants.
Affiliation(s)
- Abel Gonzalez-Perez
- Research Programme on Biomedical Informatics - GRIB. Universitat Pompeu Fabra - UPF, Hospital del Mar Medical Research Institute - IMIM. Parc de Recerca Biomèdica de Barcelona (PRBB). Dr. Aiguader, 88, E-08003 Barcelona, Spain
- Jordi Deu-Pons
- Research Programme on Biomedical Informatics - GRIB. Universitat Pompeu Fabra - UPF, Hospital del Mar Medical Research Institute - IMIM. Parc de Recerca Biomèdica de Barcelona (PRBB). Dr. Aiguader, 88, E-08003 Barcelona, Spain
- Nuria Lopez-Bigas
- Research Programme on Biomedical Informatics - GRIB. Universitat Pompeu Fabra - UPF, Hospital del Mar Medical Research Institute - IMIM. Parc de Recerca Biomèdica de Barcelona (PRBB). Dr. Aiguader, 88, E-08003 Barcelona, Spain ; Institució Catalana de Recerca i Estudis Avançats (ICREA). Passeig Lluís Companys, 23, E-08010, Barcelona, Spain
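The idea of a baseline tolerance transformation can be illustrated with a small sketch. The abstract above does not give the formula, so the version below assumes a simple standardization: a raw functional impact score is rescaled against the mean and standard deviation of scores observed for tolerated variants in a comparable group of genes. The function name `baseline_transform` and all numbers are hypothetical.

```python
from statistics import mean, stdev

def baseline_transform(raw_score, baseline_scores):
    """Standardize a raw impact score against a baseline distribution.

    Assumption: the baseline is a list of raw scores of tolerated variants
    in genes comparable to the one carrying the mutation.
    """
    mu = mean(baseline_scores)
    sigma = stdev(baseline_scores)
    if sigma == 0:
        raise ValueError("baseline scores have zero spread")
    return (raw_score - mu) / sigma

# The same raw score reads differently against two hypothetical baselines.
tolerant_genes = [2.0, 3.0, 4.0]    # genes that often carry high-scoring variants
intolerant_genes = [0.5, 1.0, 1.5]  # genes that rarely do
in_tolerant = baseline_transform(3.5, tolerant_genes)
in_intolerant = baseline_transform(3.5, intolerant_genes)
```

Under these assumptions, a raw score of 3.5 is unremarkable in the tolerant group and far above baseline in the intolerant group, which is the kind of contrast the transformation is meant to expose.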
6
Wang L, Jia P, Wolfinger RD, Chen X, Zhao Z. Gene set analysis of genome-wide association studies: methodological issues and perspectives. Genomics 2011; 98:1-8. [PMID: 21565265 PMCID: PMC3852939 DOI: 10.1016/j.ygeno.2011.04.006] [Citation(s) in RCA: 164] [Impact Index Per Article: 11.7] [Received: 12/06/2010] [Revised: 03/02/2011] [Accepted: 04/15/2011] [Indexed: 12/25/2022]
Abstract
Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association studies (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. In addition, gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still in an early stage of development in that analysis results are often prone to sources of bias, including gene set size and gene length, linkage disequilibrium patterns and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedures, along with parameter choices and the particular methodology challenges at each stage. In addition to providing a survey of recently developed tools, we also classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis.
Affiliation(s)
- Lily Wang
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Xi Chen
- Division of Cancer Biostatistics, Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
7
Wang Q, Zhao H, Pan Y. SNPknow: a web server for functional annotation of cattle SNP markers. CANADIAN JOURNAL OF ANIMAL SCIENCE 2011. [DOI: 10.4141/cjas2010-032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Wang, Q., Zhao, H. and Pan, Y. 2011. SNPknow: a web server for functional annotation of cattle SNP markers. Can. J. Anim. Sci. 91: 247–253. Single nucleotide polymorphism (SNP) microarray technology provides new insights into the genetic factors associated with traits of interest. To meet the immediate need for a genome-wide association study (GWAS) framework, we have developed SNPknow, a suite of CGI-based tools that provide enrichment analysis and functional annotation for cattle SNP markers and allow users to navigate and analyze large sets of high-dimensional data from the Gene Ontology (GO) annotation system. SNPknow is the only web server currently providing functional annotations of cattle SNP markers in three commercial platforms and the dbSNP database. The web server may be particularly beneficial for analyses that combine SNP association analysis with gene set enrichment analysis, and it is freely available at http://klab.sjtu.edu.cn/SNPknow .
Affiliation(s)
- Qishan Wang
- School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, P.R. China
- Shanghai Key Lab of Animal Biotechnology, Shanghai, 200240, P. R. China
- Hongbo Zhao
- School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, P.R. China
- Yuchun Pan
- School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, P.R. China
- Shanghai Key Lab of Animal Biotechnology, Shanghai, 200240, P. R. China
8
Chen X, Wang L, Hu B, Guo M, Barnard J, Zhu X. Pathway-based analysis for genome-wide association studies using supervised principal components. Genet Epidemiol 2011; 34:716-24. [PMID: 20842628 DOI: 10.1002/gepi.20532] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 12/19/2022]
Abstract
Many complex diseases are influenced by genetic variations in multiple genes, each with only a small marginal effect on disease susceptibility. Pathway analysis, which identifies biological pathways associated with disease outcome, has become increasingly popular for genome-wide association studies (GWAS). In addition to combining weak signals from a number of SNPs in the same pathway, results from pathway analysis also shed light on the biological processes underlying disease. We propose a new pathway-based analysis method for GWAS, the supervised principal component analysis (SPCA) model. In the proposed SPCA model, a selected subset of SNPs most associated with disease outcome is used to estimate the latent variable for a pathway. The estimated latent variable for each pathway is an optimal linear combination of a selected subset of SNPs; therefore, the proposed SPCA model provides the ability to borrow strength across the SNPs in a pathway. In addition to identifying pathways associated with disease outcome, SPCA also carries out additional within-category selection to identify the most important SNPs within each gene set. The proposed model operates in a well-established statistical framework and can handle design information such as covariate adjustment and matching information in GWAS. We compare the proposed method with currently available methods using data with realistic linkage disequilibrium structures, and we illustrate the SPCA method using the Wellcome Trust Case-Control Consortium Crohn Disease (CD) data set.
Affiliation(s)
- Xi Chen
- Division of Cancer Biostatistics, Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
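The two steps named in the abstract above, selecting the SNPs most associated with the outcome and then estimating a latent pathway variable as a linear combination of them, can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' SPCA model: the association score is taken to be absolute Pearson correlation, the threshold is fixed by hand rather than tuned, the first principal component is found by power iteration, and all names and data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def first_pc_scores(columns, iters=200):
    """Per-individual scores on the first principal component (power iteration)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        m = sum(col) / n
        centered.append([v - m for v in col])
    p = len(centered)
    # Sample covariance matrix of the selected SNPs.
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
            for j in range(p)] for i in range(p)]
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(w[j] * centered[j][i] for j in range(p)) for i in range(n)]

def supervised_pc(genotypes, outcome, threshold):
    """Keep SNPs with |correlation| >= threshold, then score their first PC."""
    selected = [j for j, col in enumerate(genotypes)
                if abs(pearson(col, outcome)) >= threshold]
    if not selected:
        raise ValueError("no SNP passes the threshold")
    return selected, first_pc_scores([genotypes[j] for j in selected])

# Hypothetical pathway: three SNPs (one list per SNP), six individuals.
genotypes = [
    [0, 0, 1, 1, 2, 2],  # associated with the outcome
    [0, 1, 1, 1, 2, 2],  # associated with the outcome
    [2, 0, 1, 1, 0, 2],  # unrelated to the outcome
]
outcome = [0, 0, 0, 1, 1, 1]
selected, scores = supervised_pc(genotypes, outcome, threshold=0.5)
```

In this toy data the third SNP is dropped at the selection step, and the resulting component score would then serve as the pathway-level predictor in a regression on the outcome.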
9
Wang L, Jia P, Wolfinger RD, Chen X, Grayson BL, Aune TM, Zhao Z. An efficient hierarchical generalized linear mixed model for pathway analysis of genome-wide association studies. Bioinformatics 2011; 27:686-92. [PMID: 21266443 DOI: 10.1093/bioinformatics/btq728] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
MOTIVATION: In genome-wide association studies (GWAS) of complex diseases, genetic variants having real but weak associations often fail to be detected at the stringent genome-wide significance level. Pathway analysis, which tests disease association with combined association signals from a group of variants in the same pathway, has become increasingly popular. However, because of the complexities in genetic data and the large sample sizes in typical GWAS, pathway analysis remains challenging. We propose a new statistical model for pathway analysis of GWAS. This model includes a fixed effects component that models mean disease association for a group of genes, and a random effects component that models how each gene's association with disease varies about the gene group mean; it thus belongs to the class of mixed effects models. RESULTS: The proposed model is computationally efficient and uses only summary statistics. In addition, it corrects for the presence of overlapping genes and linkage disequilibrium (LD). Via simulated and real GWAS data, we showed that our model improved power over currently available pathway analysis methods while preserving the type I error rate. Furthermore, using the WTCCC Type 1 Diabetes (T1D) dataset, we demonstrated that mixed model analysis identified meaningful biological processes that agreed well with previous reports on T1D. Therefore, the proposed methodology provides an efficient statistical modeling framework for systems analysis of GWAS. AVAILABILITY: The software code for mixed models analysis is freely available at http://biostat.mc.vanderbilt.edu/LilyWang.
Affiliation(s)
- Lily Wang
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA.
10
Calabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R. Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat 2009; 30:1237-44. [PMID: 19514061 DOI: 10.1002/humu.21047] [Citation(s) in RCA: 467] [Impact Index Per Article: 29.2] [Indexed: 11/09/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the simplest and most frequent form of human DNA variation, and are also valuable as genetic markers of disease susceptibility. The most investigated SNPs are missense mutations resulting in residue substitutions in the protein. Here we propose SNPs&GO, an accurate method that, starting from a protein sequence, can predict whether a mutation is disease related or not by exploiting the protein functional annotation. The scoring efficiency of SNPs&GO is as high as 82%, with a Matthews correlation coefficient equal to 0.63 over a wide set of annotated nonsynonymous mutations in proteins, including 16,330 disease-related and 17,432 neutral polymorphisms. SNPs&GO collects, in a unique framework, information derived from protein sequence, evolutionary information, and function as encoded in the Gene Ontology terms, and outperforms other available predictive methods.
Affiliation(s)
- Remo Calabrese
- Laboratory of Biocomputing, CIRB/Department of Biology, University of Bologna, Bologna 40126, Italy
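The abstract above reports an overall scoring efficiency together with a Matthews correlation coefficient (MCC). As a reminder of how the two relate to a binary confusion matrix, here is a minimal sketch. It assumes that "scoring efficiency" means overall accuracy, which the abstract does not state explicitly. The counts are hypothetical and are not the SNPs&GO results.

```python
from math import sqrt

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical counts: disease-related variants are the positive class.
tp, tn, fp, fn = 80, 85, 15, 20
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_cc(tp, tn, fp, fn)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when the disease-related and neutral classes differ in size.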
11
Taher L, Ovcharenko I. Variable locus length in the human genome leads to ascertainment bias in functional inference for non-coding elements. Bioinformatics 2009; 25:578-84. [PMID: 19168912 PMCID: PMC2647827 DOI: 10.1093/bioinformatics/btp043] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]
Abstract
MOTIVATION: Several functional gene annotation databases have been developed in recent years, and are widely used to infer the biological function of gene sets by scrutinizing the attributes that appear over- and underrepresented. However, this strategy is not directly applicable to the study of non-coding DNA, as the non-coding sequence span varies greatly among different gene loci in the human genome and longer loci have a higher likelihood of being selected purely by chance. Therefore, conclusions involving the function of non-coding elements that are drawn based on the annotation of neighboring genes are often biased. We assessed the systematic bias in several particular Gene Ontology (GO) categories using the standard hypergeometric test, by randomly sampling non-coding elements from the human genome and inferring their function based on the functional annotation of the closest genes. While no category is expected to be significantly over- or underrepresented for a random selection of elements, categories such as 'cell adhesion', 'nervous system development' and 'transcription factor activities' appeared to be systematically overrepresented, while others, such as 'olfactory receptor activity', appeared underrepresented. RESULTS: Our results suggest that functional inference for non-coding elements using gene annotation databases requires a special correction. We introduce a set of correction coefficients for the probabilities of the GO categories that accounts for the variability in the length of the non-coding DNA across different loci and effectively eliminates the ascertainment bias from the functional characterization of non-coding elements. Our approach can be easily generalized to any other gene annotation database.
Affiliation(s)
- Leila Taher
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
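The standard hypergeometric over-representation test mentioned in the abstract above, and the reason locus length can bias it, can be sketched as follows. The first function is the textbook upper-tail probability. The second is only an illustration of the bias: it compares the share of loci in a category with the share of non-coding sequence in that category. It is not the authors' correction coefficients, and the function names, locus names and lengths are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def category_fractions(noncoding_length, annotated):
    """Share of loci vs. share of non-coding sequence that fall in a category."""
    by_count = len(annotated) / len(noncoding_length)
    total_len = sum(noncoding_length.values())
    by_length = sum(noncoding_length[g] for g in annotated) / total_len
    return by_count, by_length

# Toy enrichment test: 20 genes, 5 annotated, 4 drawn, at least 2 annotated.
p = hypergeom_upper_tail(k=2, N=20, K=5, n=4)

# Toy loci: one long locus carries the annotation.
loci = {"g1": 100, "g2": 100, "g3": 800}
by_count, by_length = category_fractions(loci, {"g3"})
```

In the toy loci, the category holds one third of the genes but 80% of the non-coding sequence, so an element placed at random in non-coding DNA lands next to an annotated gene far more often than the gene-count null model of the hypergeometric test assumes.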