51
Alemu EY, Carl JW, Corrada Bravo H, Hannenhalli S. Determinants of expression variability. Nucleic Acids Res 2014; 42:3503-14. [PMID: 24435799 PMCID: PMC3973347 DOI: 10.1093/nar/gkt1364] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 12/24/2022] Open
Abstract
The amount of tissue-specific expression variability (EV) across individuals is an essential characteristic of a gene and believed to have evolved, in part, under functional constraints. However, the determinants and functional implications of EV are only beginning to be investigated. Our analyses based on multiple expression profiles in 41 primary human tissues show that a gene’s EV is significantly correlated with a number of features pertaining to the genomic, epigenomic, regulatory, polymorphic, functional, structural and network characteristics of the gene. We found that (i) EV of a gene is encoded, in part, by its genomic context and is further influenced by the epigenome; (ii) strong promoters induce less variable expression; (iii) less variable gene loci evolve under purifying selection against copy number polymorphisms; (iv) genes that encode inherently disordered or highly interacting proteins exhibit lower variability; and (v) genes with less variable expression are enriched for house-keeping functions, while genes with highly variable expression tend to function in development and extra-cellular response and are associated with human diseases. Thus, our analysis reveals a number of potential mediators as well as functional and evolutionary correlates of EV, and provides new insights into the inherent variability in eukaryotic gene expression.
Affiliation(s)
- Elfalem Y Alemu
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
52
Yang S, Liu Y, Jiang N, Chen J, Leach L, Luo Z, Wang M. Genome-wide eQTLs and heritability for gene expression traits in unrelated individuals. BMC Genomics 2014; 15:13. [PMID: 24405759 PMCID: PMC4028055 DOI: 10.1186/1471-2164-15-13] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 04/09/2013] [Accepted: 11/22/2013] [Indexed: 11/18/2022] Open
Abstract
Background: While the possible sources underlying the so-called ‘missing heritability’ evident in current genome-wide association studies (GWAS) of complex traits have been actively pursued in recent years, resolving this mystery remains a challenging task. Studying heritability of genome-wide gene expression traits can shed light on the goal of understanding the relationship between phenotype and genotype. Here we used microarray gene expression measurements of lymphoblastoid cell lines and genome-wide SNP genotype data from 210 HapMap individuals to examine the heritability of gene expression traits. Results: Heritability levels for expression of 10,720 genes were estimated by applying variance component model analyses, and 1,043 expression quantitative trait loci (eQTLs) were detected. Our results indicate that gene expression traits display a bimodal distribution of heritability, with one peak close to 0% and the other approaching 100%. Such a pattern of the within-population variability of gene expression heritability is common among different HapMap populations of unrelated individuals but different from that obtained in the CEU and YRI trio samples. Higher heritability levels are shown by housekeeping genes and genes associated with cis eQTLs. Both cis and trans eQTLs make comparable cumulative contributions to the heritability. Finally, we modelled gene-gene interactions (epistasis) for genes with multiple eQTLs and found that epistasis was not prevalent in all genes but made a substantial contribution to explaining total heritability for some of the genes analysed. Conclusions: We utilised a mixed effect model analysis for estimating genetic components from population-based samples. On the basis of analyses of genome-wide gene expression from four HapMap populations, we explored in detail the distribution of heritabilities for expression traits from different populations, and highlighted the importance of studying interaction at the gene expression level as an important source of variation underlying missing heritability. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-13) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zewei Luo
- Department of Biostatistics and Computational Biology, School of Life Sciences, Laboratory of Population & Quantitative Genetics, State Key Laboratory of Genetic Engineering, Fudan University, Shanghai 200433, China.
53
Roden DL, Sewell GW, Lobley A, Levine AP, Smith AM, Segal AW. ZODET: software for the identification, analysis and visualisation of outlier genes in microarray expression data. PLoS One 2014; 9:e81123. [PMID: 24416128 PMCID: PMC3885386 DOI: 10.1371/journal.pone.0081123] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/15/2013] [Accepted: 10/09/2013] [Indexed: 11/18/2022] Open
Abstract
Summary: Complex human diseases can show significant heterogeneity between patients with the same phenotypic disorder. An outlier detection strategy was developed to identify variants at the level of gene transcription that are of potential biological and phenotypic importance. Here we describe a graphical software package, z-score outlier detection (ZODET), that enables identification and visualisation of gross abnormalities in gene expression (outliers) in individuals, using whole genome microarray data. The mean and standard deviation of expression in a healthy control cohort are used to detect both over- and under-expressed probes in individual test subjects. We compared the potential of ZODET to detect outlier genes in gene expression datasets with a previously described statistical method, gene tissue index (GTI), using a simulated expression dataset and a publicly available monocyte-derived macrophage microarray dataset. Taken together, these results support ZODET as a novel approach to identify outlier genes of potential pathogenic relevance in complex human diseases. The algorithm is implemented using R packages and Java. Availability: The software is freely available from http://www.ucl.ac.uk/medicine/molecular-medicine/publications/microarray-outlier-analysis.
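The z-score idea in this abstract (control-cohort mean and standard deviation used to flag probes in one test subject) can be illustrated with a minimal Python sketch. This is not the ZODET implementation, which the abstract says uses R packages and Java. The function name `zscore_outliers`, the cutoff of 3.0, and the toy probe values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def zscore_outliers(control, test, threshold=3.0):
    """Flag probes in one test subject that lie far from the control cohort.

    control:   dict mapping probe id -> list of expression values in controls
    test:      dict mapping probe id -> expression value in one test subject
    threshold: hypothetical |z| cutoff (an assumption, not taken from ZODET)
    """
    outliers = {}
    for probe, values in control.items():
        if probe not in test or len(values) < 2:
            continue  # need at least two controls to estimate a spread
        mu = mean(values)
        sd = stdev(values)
        if sd == 0:
            continue  # z-score undefined when controls do not vary
        z = (test[probe] - mu) / sd
        if abs(z) >= threshold:
            outliers[probe] = ("over" if z > 0 else "under", z)
    return outliers


# Toy data, invented for illustration only.
control = {
    "probe_A": [5.0, 5.2, 4.8, 5.1, 4.9],
    "probe_B": [7.0, 7.1, 6.9, 7.2, 6.8],
    "probe_C": [3.0, 3.1, 2.9, 3.0, 3.0],
}
subject = {"probe_A": 5.05, "probe_B": 9.0, "probe_C": 2.0}
result = zscore_outliers(control, subject)
for probe in sorted(result):
    direction, z = result[probe]
    print(f"{probe}: {direction}-expressed (z = {z:.1f})")
```

In this toy example probe_B is flagged as over-expressed and probe_C as under-expressed, while probe_A stays within the control range. A real analysis would also need to account for testing many probes at once.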
Affiliation(s)
- Daniel L. Roden
- Division of Medicine, University College London, London, United Kingdom
- Gavin W. Sewell
- Division of Medicine, University College London, London, United Kingdom
- Anna Lobley
- Division of Medicine, University College London, London, United Kingdom
- Adam P. Levine
- Division of Medicine, University College London, London, United Kingdom
- Andrew M. Smith
- Division of Medicine, University College London, London, United Kingdom
- Anthony W. Segal
- Division of Medicine, University College London, London, United Kingdom
54
Papakostas S, Vasemägi A, Himberg M, Primmer CR. Proteome variance differences within populations of European whitefish (Coregonus lavaretus) originating from contrasting salinity environments. J Proteomics 2014; 105:144-50. [PMID: 24406297 DOI: 10.1016/j.jprot.2013.12.019] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/13/2013] [Accepted: 12/22/2013] [Indexed: 01/09/2023]
Abstract
Variation in gene expression is an important component of the phenotypic differences observed in nature. Gene expression variance across biological groups and environmental conditions has been studied extensively and has revealed specific genes and molecular mechanisms of interest. However, little is known regarding the importance of within-population gene expression variation to environmental adaptation. To address this issue, we quantified the proteomes of individuals of European whitefish (Coregonus lavaretus) from populations that have previously been shown to have adapted during early development to freshwater and brackishwater salinity environments. Using MS-based label-free proteomics, we studied 955 proteins in eight hatch-stage fish embryos from each population that had been reared in either freshwater or brackishwater salinity conditions. By comparing the levels of within-population protein expression variance over individuals and per protein between populations, we found that fish embryos from the population less affected by salinity level also had markedly higher levels of expression variance. Gene Ontologies and molecular pathways associated with osmoregulation showed the most significant difference in within-population proteome variance between populations. Several new candidate genes for salinity adaptation were identified, emphasising the added value of combining assessments of within-population gene expression variation with standard gene expression analysis practices for better understanding the mechanisms of environmental adaptation. Biological significance: We demonstrate the benefits of studying within-population gene expression variance together with more typical methods of gene expression profiling. Proteome variance differences within European whitefish populations originating from different salinity environments allowed us to identify several new candidate genes for salinity adaptation in teleost fish and generate many further hypotheses to be tested. This article is part of a Special Issue entitled: Proteomics of non-model organisms.
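The per-protein comparison of within-population variance described in this abstract can be sketched in Python as a simple ratio of sample variances. The abstract does not state which statistic the authors used, so the variance ratio here is an assumed stand-in, and the function name `variance_ratios`, the population labels, and the abundance values are hypothetical.

```python
from statistics import variance


def variance_ratios(pop_a, pop_b):
    """Per-protein ratio of sample variances, var(pop_a) / var(pop_b).

    pop_a, pop_b: dicts mapping protein id -> list of abundances,
    one value per individual in that population.
    """
    ratios = {}
    for protein in pop_a.keys() & pop_b.keys():
        var_a = variance(pop_a[protein])
        var_b = variance(pop_b[protein])
        if var_b > 0:  # skip proteins with no spread in the denominator
            ratios[protein] = var_a / var_b
    return ratios


# Invented abundances for two hypothetical populations, four individuals each.
population_1 = {"prot1": [10.0, 12.0, 8.0, 10.0], "prot2": [5.0, 5.5, 4.5, 5.0]}
population_2 = {"prot1": [10.0, 10.5, 9.5, 10.0], "prot2": [5.0, 5.5, 4.5, 5.0]}
ratios = variance_ratios(population_1, population_2)
for protein in sorted(ratios):
    print(protein, round(ratios[protein], 2))
```

A ratio well above or below 1 marks a protein whose between-individual spread differs between the two populations. Judging significance would need a formal test and a correction for the number of proteins compared.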
Affiliation(s)
- Spiros Papakostas
- Division of Genetics and Physiology, Department of Biology, University of Turku, 20014, Turku, Finland
- Anti Vasemägi
- Division of Genetics and Physiology, Department of Biology, University of Turku, 20014, Turku, Finland; Department of Aquaculture, Institute of Veterinary Medicine and Animal Science, Estonian University of Life Sciences, 51014 Tartu, Estonia
- Mikael Himberg
- Laboratory of Aquatic Pathobiology, Åbo Academy University, 20520, Turku, Finland
- Craig R Primmer
- Division of Genetics and Physiology, Department of Biology, University of Turku, 20014, Turku, Finland.
55
Li JW, Lai KP, Ching AKK, Chan TF. Transcriptome sequencing of Chinese and Caucasian population identifies ethnic-associated differential transcript abundance of heterogeneous nuclear ribonucleoprotein K (hnRNPK). Genomics 2013; 103:56-64. [PMID: 24373910 DOI: 10.1016/j.ygeno.2013.12.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 07/09/2013] [Revised: 12/06/2013] [Accepted: 12/18/2013] [Indexed: 01/22/2023]
Abstract
Gene expression variations (GEV) among different ethnic groups have been a subject matter for extensive study. Relatively less known is the extent of alternative splicing variations (ASV) in the context of ethnicity. We conducted a transcriptome sequencing study of 20 lymphoblastoid cell lines obtained from Caucasian and Han Chinese, and identified known genes that exhibit differential isoform abundance between the two ethnic groups. Among them hnRNPK, a co-factor of p53 (TP53), could be further replicated in a 39-sample cohort with TaqMan assay. Although within-population novel splice variants are common, inter-population novel splice variants are rare. We further analyzed 5.63 billion sequencing reads retrieved from the NCBI Sequence Read Archive and identified potential ethnic-specific transcribed regions.
Affiliation(s)
- Jing-Woei Li
- School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, Hong Kong; Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Shatin, Hong Kong.
- Keng-Po Lai
- Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Shatin, Hong Kong.
- Arthur K K Ching
- Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Shatin, Hong Kong.
- Ting-Fung Chan
- School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, Hong Kong.
56
Geographical, environmental and pathophysiological influences on the human blood transcriptome. Current Genetic Medicine Reports 2013; 1:203-211. [PMID: 25830076 DOI: 10.1007/s40142-013-0028-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
Abstract
Gene expression variation provides a read-out of both genetic and environmental influences on gene activity. Geographical, genomic and sociogenomic studies have highlighted how life circumstances of an individual modify the expression of hundreds and in some cases thousands of genes in a co-ordinated manner. This review places such results in the context of a conserved set of 90 transcripts known as Blood Informative Transcripts (BIT) that capture the major conserved components of variation in the peripheral blood transcriptome. Pathophysiological states are also shown to associate with the perturbation of transcript abundance along the major axes. Discussion of false negative rates leads us to argue that simple significance thresholds provide a biased perspective on assessment of differential expression that may cloud the interpretation of studies with small sample sizes.
57
Lukiw WJ. Variability in micro RNA (miRNA) abundance, speciation and complexity amongst different human populations and potential relevance to Alzheimer's disease (AD). Front Cell Neurosci 2013; 7:133. [PMID: 23986657 PMCID: PMC3753559 DOI: 10.3389/fncel.2013.00133] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 06/13/2013] [Accepted: 08/06/2013] [Indexed: 12/23/2022] Open
Affiliation(s)
- Walter J Lukiw
- Department of Neurology, Neuroscience and Ophthalmology, LSU Neuroscience Center, Louisiana State University Health Sciences Center New Orleans, LA, USA
58
Hicks C, Miele L, Koganti T, Young-Gaylor L, Rogers D, Vijayakumar V, Megason G. Analysis of Patterns of Gene Expression Variation within and between Ethnic Populations in Pediatric B-ALL. Cancer Inform 2013; 12:155-73. [PMID: 24023509 PMCID: PMC3762614 DOI: 10.4137/cin.s11831] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 01/11/2023] Open
Abstract
B-Precursor acute lymphoblastic leukemia (B-ALL) is the most common childhood cancer. Although 80% of B-ALL patients can be cured, significant challenges persist, including significant disparities in clinical outcomes and mortality rates between racial/ethnic populations. The objective of this study was to determine whether gene expression levels differ significantly between ethnic populations. We compared gene expression levels between four ethnic populations (Whites, Blacks, Hispanics, and Asians) in the United States. Additionally, we performed network and pathway analysis to identify gene networks and pathways. Gene expression data involved 198 samples distributed as follows: 126 Whites, 51 Hispanics, 13 Blacks, and 8 Asians. We identified 300 highly significantly (P < 0.001) differentially expressed genes between the four ethnic populations. The identified genes included PHF6, BRD3, CRLF2, and RNF135, which have been implicated in pediatric B-ALL. We identified key pathways implicated in B-ALL, including the PDGF, PI3/AKT, ERBB2-ERBB3, and IL-15 signaling pathways.
Affiliation(s)
- Chindo Hicks
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS; Cancer Institute, University of Mississippi Medical Center, Jackson, MS; Children's Cancer Center, University of Mississippi Medical Center, Jackson, MS
59
Variation and genetic control of protein abundance in humans. Nature 2013; 499:79-82. [PMID: 23676674 PMCID: PMC3789121 DOI: 10.1038/nature12223] [Citation(s) in RCA: 263] [Impact Index Per Article: 23.9] [Received: 08/16/2012] [Accepted: 04/26/2013] [Indexed: 12/21/2022]
Abstract
Gene expression differs among both individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analyzed extensively in human populations [1–5], our knowledge is limited regarding the differences in human protein abundance and their genetic basis. Variation in mRNA expression is not a perfect surrogate for protein expression because the latter is influenced by a battery of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest [6,7]. Here we used isobaric tandem mass tag (TMT)-based quantitative mass spectrometry to determine relative protein levels of 5953 genes in lymphoblastoid cell lines (LCLs) from 95 diverse individuals genotyped in the HapMap Project [8,9]. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations, and sexes. Levels of specific sets of proteins involved in the same biological process co-vary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high throughput human proteome quantification which, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.
60
Manor O, Segal E. Robust prediction of expression differences among human individuals using only genotype information. PLoS Genet 2013; 9:e1003396. [PMID: 23555302 PMCID: PMC3610805 DOI: 10.1371/journal.pgen.1003396] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 11/07/2012] [Accepted: 01/24/2013] [Indexed: 11/25/2022] Open
Abstract
Many genetic variants that are significantly correlated to gene expression changes across human individuals have been identified, but the ability of these variants to predict expression of unseen individuals has rarely been evaluated. Here, we devise an algorithm that, given training expression and genotype data for a set of individuals, predicts the expression of genes of unseen test individuals given only their genotype in the local genomic vicinity of the predicted gene. Notably, the resulting predictions are remarkably robust in that they agree well between the training and test sets, even when the training and test sets consist of individuals from distinct populations. Thus, although the overall number of genes that can be predicted is relatively small, as expected from our choice to ignore effects such as environmental factors and trans sequence variation, the robust nature of the predictions means that the identity and quantitative degree to which genes can be predicted is known in advance. We also present an extension that incorporates heterogeneous types of genomic annotations to differentially weigh the importance of the various genetic variants, and we show that assigning higher weights to variants with particular annotations such as proximity to genes and high regional G/C content can further improve the predictions. Finally, genes that are successfully predicted have, on average, higher expression and more variability across individuals, providing insight into the characteristics of the types of genes that can be predicted from their cis genetic variation.
Variation in gene expression across different individuals has been found to play a role in susceptibility to different diseases. In addition, many genetic variants that are linked to changes in expression have been found to date. However, their joint ability to accurately predict these changes is not well understood and has rarely been evaluated. Here, we devise a method that uses multiple genetic variants to explain the variation in expression of genes across individuals. One important aspect of our method is its robustness, in that our predictions agree well between training and test sets. Thus, although the number of genes that could be explained is relatively small, the identity and quantitative degree to which genes can be predicted is known in advance. We also present an extension to our method that integrates different genomic annotations such as location of the genetic variant or its context to differentially weigh the genetic variants in our model and improve predictions. Finally, genes that are successfully predicted have, on average, higher expression and more variability across individuals, providing insight into the characteristics of the types of genes that can be predicted by our method.
Affiliation(s)
- Ohad Manor
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
61
Wu X, Zhang D, Li G. Insights into the regulation of human CNV-miRNAs from the view of their target genes. BMC Genomics 2012; 13:707. [PMID: 23244579 PMCID: PMC3582595 DOI: 10.1186/1471-2164-13-707] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 04/22/2012] [Accepted: 12/07/2012] [Indexed: 12/13/2022] Open
Abstract
Background: MicroRNAs (miRNAs) represent a class of small (typically 22 nucleotides in length) non-coding RNAs that can degrade their target mRNAs or block their translation. Recent research showed that copy number alterations of miRNAs and their target genes are highly prevalent in cancers; however, the evolutionary and biological functions of naturally existing copy number variable miRNAs (CNV-miRNAs) among individuals have not been studied extensively throughout the genome. Results: In this study, we comprehensively analyzed the properties of genes regulated by CNV-miRNAs, and found that CNV-miRNAs tend to target a higher average number of genes and prefer to synergistically regulate the same genes; further, the targets of CNV-miRNAs tend to have higher variability of expression within and between populations. Finally, we found the targets of CNV-miRNAs are more likely to be differentially expressed among tissues and developmental stages, and participate in a wide range of cellular responses. Conclusions: Our analyses of CNV-miRNAs provide new insights into the impact of copy number variations on miRNA-mediated post-transcriptional networks. The deeper interpretation of patterns of gene expression variation and the functional characterization of CNV-miRNAs will help to broaden the current understanding of the molecular basis of human phenotypic diversity.
Affiliation(s)
- Xudong Wu
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Rd, Dalian, 116023, PR China
62
Green VA, Arbuthnot P, Weinberg MS. Impact of sustained RNAi-mediated suppression of cellular cofactor Tat-SF1 on HIV-1 replication in CD4+ T cells. Virol J 2012; 9:272. [PMID: 23153325 PMCID: PMC3511259 DOI: 10.1186/1743-422x-9-272] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/22/2012] [Accepted: 10/18/2012] [Indexed: 01/05/2023] Open
Abstract
Background: Conventional anti-HIV drug regimens targeting viral enzymes are plagued by the emergence of drug resistance. There is interest in targeting HIV-dependency factors (HDFs), host proteins that the virus requires for replication, as drugs targeting their function may prove protective. Reporter cell lines provide a rapid and convenient method of identifying putative HDFs, but this approach may lead to misleading results and a failure to detect subtle detrimental effects on cells that result from HDF suppression. Thus, alternative methods for HDF validation are required. Cellular Tat-SF1 has long been ascribed a cofactor role in Tat-dependent transactivation of viral transcription elongation. Here we employ sustained RNAi-mediated suppression of Tat-SF1 to validate its requirement for HIV-1 replication in a CD4+ T cell-derived line and its potential as a therapeutic target. Results: shRNA-mediated suppression of Tat-SF1 reduced HIV-1 replication and infectious particle production from TZM-bl reporter cells. This effect was not a result of increased apoptosis, loss of cell viability or an immune response. To validate its requirement for HIV-1 replication in a more relevant cell line, CD4+ SupT1 cell populations were generated that stably expressed shRNAs. HIV-1 replication was significantly reduced for two weeks (~65%) in cells with depleted Tat-SF1, although the inhibition of viral replication was moderate when compared to SupT1 cells expressing an shRNA targeting the integration cofactor LEDGF/p75. Tat-SF1 suppression was attenuated over time, resulting from decreased shRNA guide strand expression, suggesting that there is a selective pressure to restore Tat-SF1 levels. Conclusions: This study validates Tat-SF1 as an HDF in CD4+ T cell-derived SupT1 cells. However, our findings also suggest that Tat-SF1 is not a critical cofactor required for virus replication and its suppression may affect cell growth. Therefore, this study demonstrates the importance of examining HIV-1 replication kinetics and cytotoxicity in cells with sustained HDF suppression to validate their therapeutic potential as targets.
Affiliation(s)
- Victoria A Green
- Antiviral Gene Therapy Research Unit, Health Sciences Faculty, University of the Witwatersrand, Johannesburg, South Africa
63
Abstract
Expression quantitative trait loci (eQTL) studies have established convincing relationships between genetic variants and gene expression. Most of these studies focused on the mean of gene expression level, but not the variance of gene expression level (i.e., gene expression variability). In the present study, we systematically explore genome-wide association between genetic variants and gene expression variability in humans. We adapt the double generalized linear model (dglm) to simultaneously fit the means and the variances of gene expression among the three possible genotypes of a biallelic SNP. The genomic loci showing significant association between the variances of gene expression and the genotypes are termed expression variability QTL (evQTL). Using a data set of gene expression in lymphoblastoid cell lines (LCLs) derived from 210 HapMap individuals, we identify cis-acting evQTL involving 218 distinct genes, among which 8 genes, ADCY1, CTNNA2, DAAM2, FERMT2, IL6, PLOD2, SNX7, and TNFRSF11B, are cross-validated using an extra expression data set of the same LCLs. We also identify ∼300 trans-acting evQTL between >13,000 common SNPs and 500 randomly selected representative genes. We employ two distinct scenarios, emphasizing single-SNP and multiple-SNP effects on expression variability, to explain the formation of evQTL. We argue that detecting evQTL may represent a novel method for effectively screening for genetic interactions, especially when the multiple-SNP influence on expression variability is implied. The implication of our results for revealing genetic mechanisms of gene expression variability is discussed.
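The core question behind an evQTL (does the spread of expression differ among the three genotype groups of a biallelic SNP?) can be illustrated with a short Python sketch. Note that this does not implement the double generalized linear model used in the study. It substitutes the simpler Brown-Forsythe statistic, which is a one-way ANOVA F computed on absolute deviations from each group's median. The function name, genotype labels, and expression values are hypothetical.

```python
from statistics import mean, median


def brown_forsythe_stat(groups):
    """Brown-Forsythe statistic for equality of variances across groups.

    groups: list of lists of expression values, one list per genotype.
    Returns the one-way ANOVA F statistic computed on |x - group median|.
    """
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(devs)
    n = sum(len(d) for d in devs)
    group_means = [mean(d) for d in devs]
    grand_mean = mean([x for d in devs for x in d])
    between = sum(
        len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means)
    ) / (k - 1)
    within = sum(
        (x - m) ** 2 for d, m in zip(devs, group_means) for x in d
    ) / (n - k)
    return between / within


# Invented expression values: similar means, increasing spread by genotype.
expression_by_genotype = {
    "AA": [5.0, 5.1, 4.9, 5.0, 5.2],
    "AB": [5.0, 5.6, 4.4, 5.3, 4.7],
    "BB": [5.0, 6.5, 3.5, 6.0, 4.0],
}
stat = brown_forsythe_stat(list(expression_by_genotype.values()))
print(f"Brown-Forsythe statistic: {stat:.2f}")
```

A large statistic suggests the variance differs by genotype even when the group means are similar. A p-value would come from comparing it with an F distribution on (k - 1, n - k) degrees of freedom, a step left out here to keep the sketch free of external dependencies.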
64
Bravo HC, Pihur V, McCall M, Irizarry RA, Leek JT. Gene expression anti-profiles as a basis for accurate universal cancer signatures. BMC Bioinformatics 2012; 13:272. [PMID: 23088656 PMCID: PMC3487959 DOI: 10.1186/1471-2105-13-272] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 06/13/2012] [Accepted: 10/17/2012] [Indexed: 11/10/2022] Open
Abstract
Background: Early screening for cancer is arguably one of the greatest public health advances over the last fifty years. However, many cancer screening tests are invasive (digital rectal exams), expensive (mammograms, imaging) or both (colonoscopies). This has spurred growing interest in developing genomic signatures that can be used for cancer diagnosis and prognosis. However, progress has been slowed by heterogeneity in cancer profiles and the lack of effective computational prediction tools for this type of data. Results: We developed anti-profiles as a first step towards translating experimental findings suggesting that stochastic across-sample hyper-variability in the expression of specific genes is a stable and general property of cancer into predictive and diagnostic signatures. Using single-chip microarray normalization and quality assessment methods, we developed an anti-profile for colon cancer in tissue biopsy samples. To demonstrate the translational potential of our findings, we applied the signature developed in the tissue samples, without any further retraining or normalization, to screen patients for colon cancer based on genomic measurements from peripheral blood in an independent study (AUC of 0.89). This method achieved higher accuracy than the signature underlying commercially available peripheral blood screening tests for colon cancer (AUC of 0.81). We also confirmed the existence of hyper-variable genes across a range of cancer types and found that a significant proportion of tissue-specific genes are hyper-variable in cancer. Based on these observations, we developed a universal cancer anti-profile that accurately distinguishes cancer from normal regardless of tissue type (ten-fold cross-validation AUC > 0.92). Conclusions: We have introduced anti-profiles as a new approach for developing cancer genomic signatures that specifically takes advantage of gene expression heterogeneity. We have demonstrated that anti-profiles can be successfully applied to develop peripheral-blood based diagnostics for cancer and used anti-profiles to develop a highly accurate universal cancer signature. By using single-chip normalization and quality assessment methods, no further retraining of signatures developed by the anti-profile approach would be required before their application in clinical settings. Our results suggest that anti-profiles may be used to develop inexpensive and non-invasive universal cancer screening tests.
Affiliation(s)
- Héctor Corrada Bravo
- Department of Computer Science, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.
|
65
|
Intra- and inter-individual variance of gene expression in clinical studies. PLoS One 2012; 7:e38650. [PMID: 22723873 PMCID: PMC3377725 DOI: 10.1371/journal.pone.0038650] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 09/22/2011] [Accepted: 05/11/2012] [Indexed: 01/29/2023] Open
Abstract
Background Variance in microarray studies has been widely discussed as a critical topic in the identification of differentially expressed genes; however, few studies have addressed the influence of how variance is estimated. Methodology/Principal Findings To break intra- and inter-individual variance in clinical studies down into three levels (technical, anatomic, and individual), we designed experiments and algorithms to investigate these three forms of variance. As a case study, a group of “inter-individual variable genes” was identified to exemplify the influence of underestimated variance on the statistical and biological aspects of identifying differentially expressed genes. Our results showed that inadequate estimation of variance inevitably led to the inclusion of statistically non-significant genes among those listed as significant, thereby interfering with the correct prediction of biological functions. Applying a higher fold-change cutoff in the selection of significant genes reduces or eliminates the effects of underestimated variance. Conclusions/Significance Our data demonstrated that correct variance evaluation is critical in selecting significant genes. If the degree of variance is underestimated, “noisy” genes are falsely identified as differentially expressed genes. These genes add noise to biological interpretation, reducing the biological significance of the gene set. Our results also indicate that applying a higher fold-change cutoff as the selection criterion reduces or eliminates the differences between distinct estimations of variance.
|
66
|
Gonzàlez-Porta M, Calvo M, Sammeth M, Guigó R. Estimation of alternative splicing variability in human populations. Genome Res 2011; 22:528-38. [PMID: 22113879 DOI: 10.1101/gr.121947.111] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Indexed: 01/29/2023]
Abstract
DNA arrays have been widely used to perform transcriptome-wide analysis of gene expression, and many methods have been developed to measure gene expression variability and to compare gene expression between conditions. Because RNA-seq is also becoming increasingly popular for transcriptome characterization, the possibility exists for further quantification of individual alternative transcript isoforms, and therefore for estimating the relative ratios of alternative splice forms within a given gene. Changes in splicing ratios, even without changes in overall gene expression, may have important phenotypic effects. Here we have developed statistical methodology to measure variability in splicing ratios within conditions, to compare it between conditions, and to identify genes with condition-specific splicing ratios. Furthermore, we have developed methodology to deconvolute the relative contribution of variability in gene expression versus variability in splicing ratios to the overall variability of transcript abundances. As a proof of concept, we have applied this methodology to estimates of transcript abundances obtained from RNA-seq experiments in lymphoblastoid cells from Caucasian and Yoruban individuals. We have found that protein-coding genes exhibit low splicing variability within populations, with many genes exhibiting constant ratios across individuals. When comparing these two populations, we have found that up to 10% of the studied protein-coding genes exhibit population-specific splicing ratios. We estimate that ~60% of the total variability observed in the abundance of transcript isoforms can be explained by variability in transcription. A large fraction of the remaining variability likely results from variability in splicing. Finally, we also detected that variability in splicing is uncommon without variability in transcription.
Affiliation(s)
- Mar Gonzàlez-Porta
- Bioinformatics and Genomics, Center for Genomic Regulation, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
|
67
|
Abstract
PURPOSE OF REVIEW The aim is to review recent literature on 'test-and-treat', a prevention strategy that promotes high levels of HIV testing and initiation of antiretroviral therapy upon diagnosis, regardless of CD4 cell count. Antiretroviral therapy (ART) has been shown to dramatically reduce viral load, which is strongly associated with the risk of transmission; ART therefore has the potential to reduce HIV transmissions. RECENT FINDINGS Recent papers from observational studies on heterosexual sero-discordant couples found an overall rate of transmission of HIV-1 from ART-treated patients of 0.46 per 100 person-years, confirming the possibility of using ART as a prevention strategy. Several models have been used to predict the effect of this strategy and its potential risks. Ongoing randomized controlled trials are investigating the effect of ART on reducing infectiousness and the feasibility of this policy. SUMMARY More precise estimations of the transmission risk under virally suppressive ART (especially in MSM) and of change in sex risk behaviour at diagnosis and at start of ART are needed. Further, the benefit to individual health of very early ART initiation and the feasibility of this policy need to be evaluated. Achieving very high levels of testing should be a high priority due to the benefits of initiating ART in all those who are in need (CD4 cell count < 350 cells/μl) and potential benefits on incidence due to reductions in risk behaviour in those diagnosed. Use of ART immediately at diagnosis in those with high CD4 cell counts should await results from further studies.
|
68
|
Chen R, Davydov EV, Sirota M, Butte AJ. Non-synonymous and synonymous coding SNPs show similar likelihood and effect size of human disease association. PLoS One 2010; 5:e13574. [PMID: 21042586 PMCID: PMC2962641 DOI: 10.1371/journal.pone.0013574] [Citation(s) in RCA: 133] [Impact Index Per Article: 9.5] [Received: 05/06/2010] [Accepted: 09/30/2010] [Indexed: 12/21/2022] Open
Abstract
Many DNA variants have been identified for more than 300 diseases and traits using Genome-Wide Association Studies (GWASs). Some have been validated using deep sequencing, but many fewer have been validated functionally, and functional validation has focused primarily on non-synonymous coding SNPs (nsSNPs). It is an open question whether synonymous coding SNPs (sSNPs) and other non-coding SNPs can lead to odds ratios as high as those of nsSNPs. We conducted a broad survey across 21,429 disease-SNP associations curated from 2,113 publications studying human genetic association, and found that nsSNPs and sSNPs shared similar likelihood and effect size for disease association. The enrichment of disease-associated SNPs around the 80th base in the first introns might provide an effective way to prioritize intronic SNPs for functional studies. We further found that the likelihood of disease association was positively associated with the effect size across different types of SNPs, and that SNPs in the 3′ untranslated regions, such as the microRNA binding sites, might be under-investigated. Our results suggest that sSNPs are just as likely to be involved in disease mechanisms, so we recommend that sSNPs discovered from GWAS should also be examined with functional studies.
Affiliation(s)
- Rong Chen
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America.
|