51
Boyle EA, O'Roak BJ, Martin BK, Kumar A, Shendure J. MIPgen: optimized modeling and design of molecular inversion probes for targeted resequencing. Bioinformatics 2014; 30:2670-2. [PMID: 24867941 DOI: 10.1093/bioinformatics/btu353] [Citation(s) in RCA: 116] [Impact Index Per Article: 11.6] [Indexed: 01/06/2023]
Abstract
Molecular inversion probes (MIPs) enable cost-effective multiplex targeted gene resequencing in large cohorts. However, the design of individual MIPs is a critical parameter governing the performance of this technology with respect to capture uniformity and specificity. MIPgen is a user-friendly package that simplifies the process of designing custom MIP assays to arbitrary targets. New logistic and SVM-derived models enable in silico predictions of assay success, and assay redesign exhibits improved coverage uniformity relative to previous methods, which in turn improves the utility of MIPs for cost-effective targeted sequencing for candidate gene validation and for diagnostic sequencing in a clinical setting.
AVAILABILITY AND IMPLEMENTATION: MIPgen is implemented in C++. Source code and accompanying Python scripts are available at http://shendurelab.github.io/MIPGEN/.
Affiliation(s)
- Evan A Boyle
- Department of Genome Sciences, University of Washington, Seattle, WA 98105 and Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA
- Brian J O'Roak
- Department of Genome Sciences, University of Washington, Seattle, WA 98105 and Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA
- Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA 98105 and Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA
- Akash Kumar
- Department of Genome Sciences, University of Washington, Seattle, WA 98105 and Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98105 and Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA

52
Kukurba KR, Zhang R, Li X, Smith KS, Knowles DA, How Tan M, Piskol R, Lek M, Snyder M, MacArthur DG, Li JB, Montgomery SB. Allelic expression of deleterious protein-coding variants across human tissues. PLoS Genet 2014; 10:e1004304. [PMID: 24786518 PMCID: PMC4006732 DOI: 10.1371/journal.pgen.1004304] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 08/20/2013] [Accepted: 02/27/2014] [Indexed: 11/19/2022]
Abstract
Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression, exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.
Gene expression is a fundamental cellular process that contributes to phenotypic diversity. Gene expression can vary between alleles of an individual through differences in genomic imprinting or cis-acting regulatory variation. Distinguishing allelic activity is important for informing the abundance of altered mRNA and protein products. Advances in sequencing technologies allow us to quantify patterns of allele-specific expression (ASE) in different individuals and cell types. Previous studies have identified patterns of ASE across human populations for single cell types; however, the degree of tissue-specificity of ASE has not been deeply characterized.
In this study, we compare patterns of ASE across multiple tissues from a single individual using whole transcriptome sequencing (RNA-Seq) and a targeted, high-resolution assay (mmPCR-Seq). We detect patterns of ASE for rare deleterious and loss-of-function protein-coding variants, informing the frequency at which allelic expression could modify the functional impact of personal deleterious protein-coding variants across tissues. We demonstrate that these interactions occur for one third of such variants; however, large direction flips in allelic expression are infrequent.
Affiliation(s)
- Kimberly R. Kukurba
- Department of Pathology, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Rui Zhang
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Xin Li
- Department of Pathology, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Kevin S. Smith
- Department of Pathology, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- David A. Knowles
- Department of Computer Science, Stanford University School of Medicine, Stanford, California, United States of America
- Meng How Tan
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Robert Piskol
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Monkol Lek
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Michael Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Daniel G. MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Jin Billy Li
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Stephen B. Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Computer Science, Stanford University School of Medicine, Stanford, California, United States of America

53
Battle A, Montgomery SB. Determining causality and consequence of expression quantitative trait loci. Hum Genet 2014; 133:727-35. [PMID: 24770875 DOI: 10.1007/s00439-014-1446-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 09/10/2013] [Accepted: 04/09/2014] [Indexed: 12/18/2022]
Abstract
Expression quantitative trait loci (eQTLs) are currently the most abundant and systematically surveyed class of functional consequence for genetic variation. Recent genetic studies of gene expression have identified thousands of eQTLs in diverse tissue types for the majority of human genes. Application of this large eQTL catalog provides an important resource for understanding the molecular basis of common genetic diseases. However, only now have both the availability of individuals with full genomes and corresponding advances in functional genomics provided the opportunity to dissect eQTLs to identify causal regulatory variants. Resolving the properties of such causal regulatory variants is improving understanding of the molecular mechanisms that influence traits and guiding the development of new genome-scale approaches to variant interpretation. In this review, we provide an overview of current computational and experimental methods for identifying causal regulatory variants and predicting their phenotypic consequences.
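At its simplest, an eQTL is a variant whose genotype is associated with a gene's expression level across individuals. The following is a minimal sketch, not a method taken from the review: it assumes additive coding of genotypes as alternate-allele dosage (0, 1 or 2), fits expression on dosage by ordinary least squares, and reports the per-allele slope with its t-statistic. Real eQTL mapping also normalizes expression, adjusts for covariates, and corrects for testing many variant-gene pairs. The function name `eqtl_effect` and the toy data are hypothetical.

```python
import math
from statistics import mean


def eqtl_effect(dosages, expression):
    """Regress expression on alternate-allele dosage (0, 1 or 2) by ordinary least squares.

    Returns (beta, t): the per-allele change in expression and its t-statistic
    with n - 2 degrees of freedom. Hypothetical helper; no covariates are modeled.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired genotype/expression values")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the statistic is unbounded unless there is no effect at all.
        t = 0.0 if beta == 0 else math.copysign(math.inf, beta)
    else:
        t = beta / se
    return beta, t


# Toy data: expression rises by about one unit per copy of the alternate allele.
beta, t = eqtl_effect([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(f"beta={beta:.2f}, t={t:.1f}")
```

Additive dosage coding is the usual default because it spends a single parameter per variant; dominant or recessive codings would replace the dosage column and leave the rest of the fit unchanged.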
Affiliation(s)
- A Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA

54
Abstract
RNA sequencing (RNAseq) samples the majority of expressed genes infrequently, owing to the large size, complex splicing and wide dynamic range of eukaryotic transcriptomes. This results in sparse sequencing coverage that can hinder robust isoform assembly and quantification. RNA capture sequencing (CaptureSeq) addresses this challenge by using oligonucleotide probes to capture selected genes or regions of interest for targeted sequencing. Targeted RNAseq provides enhanced coverage for sensitive gene discovery, robust transcript assembly and accurate gene quantification. Here we describe a detailed protocol for all stages of RNA CaptureSeq, from initial probe design considerations and capture of targeted genes to final assembly and quantification of captured transcripts. Initial probe design and final analysis can take less than 1 d, whereas the central experimental capture stage requires ∼7 d.

55
Lee JH, Daugharthy ER, Scheiman J, Kalhor R, Yang JL, Ferrante TC, Terry R, Jeanty SSF, Li C, Amamoto R, Peters DT, Turczyk BM, Marblestone AH, Inverso SA, Bernard A, Mali P, Rios X, Aach J, Church GM. Highly multiplexed subcellular RNA sequencing in situ. Science 2014; 343:1360-3. [PMID: 24578530 DOI: 10.1126/science.1250212] [Citation(s) in RCA: 649] [Impact Index Per Article: 64.9] [Indexed: 12/31/2022]
Abstract
Understanding the spatial organization of gene expression with single-nucleotide resolution requires localizing the sequences of expressed RNA transcripts within a cell in situ. Here, we describe fluorescent in situ RNA sequencing (FISSEQ), in which stably cross-linked complementary DNA (cDNA) amplicons are sequenced within a biological sample. Using 30-base reads from 8102 genes in situ, we examined RNA expression and localization in human primary fibroblasts with a simulated wound-healing assay. FISSEQ is compatible with tissue sections and whole-mount embryos and reduces the limitations of optical resolution and noisy signals on single-molecule detection. Our platform enables massively parallel detection of genetic elements, including gene transcripts and molecular barcodes, and can be used to investigate cellular phenotype, gene regulation, and environment in situ.
Affiliation(s)
- Je Hyuk Lee
- Wyss Institute, Harvard Medical School, Boston, MA 02115, USA

56
Abstract
The differential abundance of transcripts from alternative alleles of a gene, for example in a hybrid plant or an outbred natural population, can provide information about the nature of interindividual or interstrain variation in gene expression. Allele-specific expression (ASE) can result from epigenetic phenomena, such as imprinting (when the overexpressed allele is inherited consistently from one parent) or allele-specific chromatin modifications. Alternatively, DNA sequence variants in the promoter or within the transcribed region of a gene can affect the rate of transcription or the rate of decay of the transcript, respectively. The existence of this allelic variation and the insights it provides into the nature of the gene regulation are of significant interest. With the recent widespread availability of sequencing based transcriptomics, the power to detect ASE has increased; however, inference of ASE from transcriptome sequencing data is subject to several caveats and potential biases and the results need to be interpreted with care.
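One common first-pass test for ASE at a single heterozygous site treats the reads carrying the reference allele as a binomial draw from all reads covering the site, then asks whether their fraction departs from an expected value. This is a minimal sketch of that idea, not the procedure described in the cited chapter. The null fraction `p_null` defaults to 0.5; raising it slightly is one assumed way to absorb reference mapping bias, a well-known example of the biases mentioned above. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def ase_binomial_test(ref_reads, alt_reads, p_null=0.5):
    """Two-sided exact binomial test for allelic imbalance at one heterozygous site.

    Returns (reference-allele fraction, p-value). Hypothetical helper: it ignores
    overdispersion, phasing across sites, and genotyping errors.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    observed = binom_pmf(ref_reads, n, p_null)
    # Two-sided p-value: total probability of all outcomes no more likely than the
    # observed count (a small relative tolerance guards against rounding error).
    p_value = sum(
        pk
        for pk in (binom_pmf(k, n, p_null) for k in range(n + 1))
        if pk <= observed * (1 + 1e-7)
    )
    return ref_reads / n, min(1.0, p_value)


print(ase_binomial_test(8, 2))    # 80% reference reads at depth 10
print(ase_binomial_test(80, 20))  # the same fraction at depth 100
```

The same 80% reference fraction is unremarkable at 10 reads (p of about 0.11) but highly significant at 100 reads, which is one reason deeper sequencing increases the power to detect ASE. Direct summation of the probability mass is only suitable for modest depths; very deep sites would call for a log-space or library implementation, and a beta-binomial model is a common choice when read counts are overdispersed.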
Affiliation(s)
- Paul K Korir
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway (NUI Galway), Ireland

57
Guella I, Sequeira A, Rollins B, Morgan L, Myers RM, Watson SJ, Akil H, Bunney WE, DeLisi LE, Byerley W, Vawter MP. Evidence of allelic imbalance in the schizophrenia susceptibility gene ZNF804A in human dorsolateral prefrontal cortex. Schizophr Res 2014; 152:111-6. [PMID: 24315717 PMCID: PMC3947280 DOI: 10.1016/j.schres.2013.11.021] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 10/05/2013] [Revised: 11/11/2013] [Accepted: 11/13/2013] [Indexed: 02/01/2023]
Abstract
rs1344706, an intronic SNP within the zinc-finger protein 804A gene (ZNF804A), was identified as one of the most compelling risk SNPs for schizophrenia (SZ) and bipolar disorder (BD). It is, however, not clear by which molecular mechanisms ZNF804A increases disease risk. We evaluated the role of ZNF804A in SZ and BD by genotyping the originally associated rs1344706 SNP and an exonic SNP (rs12476147) located in exon four of ZNF804A in a sample of 422 SZ, 382 BD, and 507 controls from the isolated population of the Costa Rica Central Valley. We also investigated the rs1344706 SNP for allele-specific expression (ASE) imbalance in the dorsolateral prefrontal cortex (DLPFC) of 46 heterozygous postmortem brains. While no significant association between rs1344706 and SZ or BD was observed in the Costa Rica sample, we observed an increased risk of SZ for the minor allele (A) of the exonic rs12476147 SNP (p=0.026). Our ASE assay detected a significant over-expression of the rs12476147 A allele in DLPFC of rs1344706 heterozygous subjects. Interestingly, cDNA allele ratios were significantly different according to the intronic rs1344706 genotypes (p-value=0.03), with the rs1344706 A allele associated with increased ZNF804A rs12476147 A allele expression (average 1.06, p-value=0.02, for heterozygous subjects vs. genomic DNA). In conclusion, we have demonstrated a significant association of rs12476147 with SZ, and using a powerful within-subject design, an allelic expression imbalance of ZNF804A exonic SNP rs12476147 in the DLPFC. Although these data do not preclude the possibility of other functional variants in ZNF804A, they provide evidence that the rs1344706 SZ risk allele is the cis-regulatory variant directly responsible for this allelic expression imbalance in adult cortex.
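The abstract reports imbalance as a cDNA allele ratio relative to genomic DNA (an average of 1.06 for heterozygous subjects). A minimal sketch of that kind of within-subject normalization follows, under the assumption that each heterozygous subject contributes a quantitative signal (for example read counts or peak heights) for both alleles in cDNA and in gDNA. Dividing the cDNA ratio by the same subject's gDNA ratio cancels allele-specific assay bias, because heterozygous gDNA is expected to be 1:1. This is not the authors' assay pipeline; the function name, input layout, and example values are hypothetical.

```python
from statistics import mean


def normalized_allele_ratios(samples):
    """Per-subject (cDNA A/B ratio) / (gDNA A/B ratio) and their mean.

    Each sample is (cdna_a, cdna_b, gdna_a, gdna_b): signals for allele A and
    the other allele B from one heterozygous subject. A normalized ratio of 1.0
    means no imbalance beyond the assay's own bias measured on gDNA.
    Hypothetical input layout; real data would also need a significance test.
    """
    ratios = []
    for cdna_a, cdna_b, gdna_a, gdna_b in samples:
        if min(cdna_b, gdna_a, gdna_b) <= 0:
            raise ValueError("allele B and gDNA signals must be positive")
        ratios.append((cdna_a / cdna_b) / (gdna_a / gdna_b))
    if not ratios:
        raise ValueError("no heterozygous samples supplied")
    return ratios, mean(ratios)


# Made-up signals for two subjects; the second assay slightly favors allele A on gDNA.
ratios, avg = normalized_allele_ratios([(110, 100, 100, 100), (126, 100, 105, 100)])
print([round(r, 3) for r in ratios], round(avg, 3))
```

In the second made-up subject the raw cDNA ratio is 1.26, but after dividing by the gDNA ratio of 1.05 the normalized value is 1.2, which shows why the within-subject gDNA control matters.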
Affiliation(s)
- Ilaria Guella
- Functional Genomics Laboratory, Department of Psychiatry and Human Behavior, University of California, Irvine, CA
- Adolfo Sequeira
- Functional Genomics Laboratory, Department of Psychiatry and Human Behavior, University of California, Irvine, CA
- Brandi Rollins
- Functional Genomics Laboratory, Department of Psychiatry and Human Behavior, University of California, Irvine, CA
- Ling Morgan
- Functional Genomics Laboratory, Department of Psychiatry and Human Behavior, University of California, Irvine, CA
- Stanley J. Watson
- Molecular and Behavioral Neurosciences Institute, University of Michigan, Ann Arbor, MI
- Huda Akil
- Molecular and Behavioral Neurosciences Institute, University of Michigan, Ann Arbor, MI
- William E. Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA
- Lynn E. DeLisi
- Harvard Medical School, Brockton VA Boston Healthcare System, Brockton, MA
- William Byerley
- Department of Psychiatry, University of California, San Francisco, CA
- Marquis P. Vawter
- Functional Genomics Laboratory, Department of Psychiatry and Human Behavior, University of California, Irvine, CA

58
Nag A, Savova V, Fung HL, Miron A, Yuan GC, Zhang K, Gimelbrant AA. Chromatin signature of widespread monoallelic expression. eLife 2013; 2:e01256. [PMID: 24381246 PMCID: PMC3873816 DOI: 10.7554/elife.01256] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Indexed: 12/21/2022]
Abstract
In mammals, numerous autosomal genes are subject to mitotically stable monoallelic expression (MAE), including genes that play critical roles in a variety of human diseases. Due to challenges posed by the clonal nature of MAE, very little is known about its regulation; in particular, no molecular features have been specifically linked to MAE. In this study, we report an approach that distinguishes MAE genes in human cells with great accuracy: a chromatin signature consisting of chromatin marks associated with active transcription (H3K36me3) and silencing (H3K27me3) simultaneously occurring in the gene body. The MAE signature is present in ∼20% of ubiquitously expressed genes and over 30% of tissue-specific genes across cell types. Notably, it is enriched among key developmental genes that have bivalent chromatin structure in pluripotent cells. Our results open a new approach to the study of MAE that is independent of polymorphisms, and suggest that MAE is linked to cell differentiation. DOI: http://dx.doi.org/10.7554/eLife.01256.001
Understanding how genes are activated and silenced is one of the central challenges in modern biology. These processes underpin the development of a fertilized egg into a complex organism, and they can also lead to life-threatening diseases when they go wrong. There are two copies of each gene in a human cell, a maternal copy and a paternal copy, and it is thought that both copies are usually regulated together. However, there are exceptions to this rule: for certain genes only the maternal copy is expressed as a protein in some cells, whereas the paternal copy is expressed in other cells. This form of gene regulation, which is called monoallelic expression, can result in neighboring cells heading down very different paths. In extreme cases, depending on the differences between the two copies of the gene, cells that express one copy may function normally, while cells where the other copy is activated will start forming tumors.
However, despite these potentially grave consequences, and early results which suggested that monoallelic expression affected a large number of human and mouse genes, it has proved to be a major technical challenge to identify these genes in most cell types. Now, Nag, Savova et al. have discovered a molecular signature that can be used to detect monoallelic expression. The signature was found in chromatin, the densely packed structure formed by DNA and proteins inside the cell nucleus. Nag, Savova et al. discovered that the genes that are subject to monoallelic expression are bound with proteins that are modified in two contrasting ways. One modification, which is usually a sign of gene silencing, is prevalent on the inactive copy of the gene, and the other, which often marks active genes, is chiefly present on the active copy. Nag, Savova et al. report that these modifications are found in different sets of genes in different cell types, indicating distinct genome-wide patterns of monoallelic expression. The chromatin signature approach lets them estimate the fraction of human genes that are subject to monoallelic expression. This number is surprisingly high: about 20% of commonly expressed genes and more than one-third of tissue-specific genes. In a particularly intriguing finding, almost all bivalent genes—a subset of genes that are involved in determining the fate of cell during development—are estimated to become monoallelic when they are activated. In addition to these unexpected findings, the chromatin signature approach opens the door to exploring monoallelic expression as a form of gene regulation in all types of cells and, ultimately, to understanding how it is involved in both normal development and in disease. DOI:http://dx.doi.org/10.7554/eLife.01256.002
Affiliation(s)
- Anwesha Nag
- Department of Cancer Biology and Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, United States

59
Song G, Guo Z, Liu Z, Cheng Q, Qu X, Chen R, Jiang D, Liu C, Wang W, Sun Y, Zhang L, Zhu Y, Yang D. Global RNA sequencing reveals that genotype-dependent allele-specific expression contributes to differential expression in rice F1 hybrids. BMC Plant Biol 2013; 13:221. [PMID: 24358981 PMCID: PMC3878109 DOI: 10.1186/1471-2229-13-221] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 06/25/2013] [Accepted: 12/09/2013] [Indexed: 05/23/2023]
Abstract
BACKGROUND: Extensive studies on heterosis in plants using transcriptome analysis have identified differentially expressed genes (DEGs) in F1 hybrids. However, it is not clear why yield in heterozygotes is superior to that of the homozygous parents or how DEGs are produced. Global allele-specific expression analysis in hybrid rice has the potential to answer these questions.
RESULTS: We report a genome-wide allele-specific expression analysis using RNA-sequencing technology of 3,637-3,824 genes from three rice F1 hybrids. Of the expressed genes, 3.7% exhibited an unexpected type of monoallelic expression and 23.8% showed preferential allelic expression that was genotype-dependent in reciprocal crosses. Those genes exhibiting allele-specific expression comprised 42.4% of the genes differentially expressed between F1 hybrids and their parents. Allele-specific expression accounted for 79.8% of the genes displaying more than a 10-fold expression level difference between an F1 and its parents, and almost all (97.3%) of the genes expressed in F1, but non-expressed in one parent. Significant allelic complementary effects were detected in the F1 hybrids of rice.
CONCLUSIONS: Analysis of the allelic expression profiles of genes at the critical stage for highest biomass production from the leaves of three different rice F1 hybrids identified genotype-dependent allele-specific expression genes. A cis-regulatory mechanism was identified that contributes to allele-specific expression, leading to differential gene expression and allelic complementary effects in F1 hybrids.
Affiliation(s)
- Gaoyuan Song
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Zhibin Guo
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Zhenwei Liu
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Qin Cheng
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Xuefeng Qu
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Rong Chen
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Daiming Jiang
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Chuan Liu
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Wei Wang
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Yunfang Sun
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Liping Zhang
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Yingguo Zhu
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China
- Daichang Yang
- State Key Laboratory of Hybrid Rice and College of Life Sciences, Wuhan University, Luojia Hill, Wuhan, Hubei Province 430072, China

60
Zhang R, Li X, Ramaswami G, Smith KS, Turecki G, Montgomery SB, Li JB. Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing. Nat Methods 2013; 11:51-4. [PMID: 24270603 PMCID: PMC3877737 DOI: 10.1038/nmeth.2736] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 06/07/2013] [Accepted: 10/18/2013] [Indexed: 11/16/2022]
Abstract
We developed a targeted RNA sequencing method that couples microfluidics-based multiplex PCR and deep sequencing (mmPCR-seq) to uniformly and simultaneously amplify up to 960 loci in 48 samples independently of their gene expression levels, and accurately and cost-effectively measure allelic ratios even for low-quantity or low-quality RNA samples. We applied mmPCR-seq to RNA editing and allele-specific expression studies. mmPCR-seq complements RNA-seq and provides a highly desirable solution for future applications.
Affiliation(s)
- Rui Zhang
- Department of Genetics, Stanford University, Stanford, California, USA
- Xin Li
- Department of Pathology, Stanford University, Stanford, California, USA
- Gokul Ramaswami
- Department of Genetics, Stanford University, Stanford, California, USA
- Kevin S Smith
- Department of Pathology, Stanford University, Stanford, California, USA
- Gustavo Turecki
- McGill Group for Suicide Studies, Douglas Mental Health University Institute, McGill University, Montreal, Quebec, Canada
- Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, California, USA
- Department of Pathology, Stanford University, Stanford, California, USA
- Jin Billy Li
- Department of Genetics, Stanford University, Stanford, California, USA

61
Battle A, Mostafavi S, Zhu X, Potash JB, Weissman MM, McCormick C, Haudenschild CD, Beckman KB, Shi J, Mei R, Urban AE, Montgomery SB, Levinson DF, Koller D. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res 2013. [PMID: 24092820 DOI: 10.1101/gr.155192] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/20/2023]
Abstract
Understanding the consequences of regulatory variation in the human genome remains a major challenge, with important implications for understanding gene regulation and interpreting the many disease-risk variants that fall outside of protein-coding regions. Here, we provide a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals. We present a comprehensive description of the distribution of regulatory variation--by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants. We detect variants influencing expression of over ten thousand genes, and through the enhanced resolution offered by RNA-sequencing, for the first time we identify thousands of variants associated with specific phenotypes including splicing and allelic expression. Evaluating the effects of both long-range intra-chromosomal and trans (cross-chromosomal) regulation, we observe modularity in the regulatory network, with three-dimensional chromosomal configuration playing a particular role in regulatory modules within each chromosome. We also observe a significant depletion of regulatory variants affecting central and critical genes, along with a trend of reduced effect sizes as variant frequency increases, providing evidence that purifying selection and buffering have limited the deleterious impact of regulatory variation on the cell. Further, generalizing beyond observed variants, we have analyzed the genomic properties of variants associated with expression and splicing and developed a Bayesian model to predict regulatory consequences of genetic variants, applicable to the interpretation of individual genomes and disease studies. Together, these results represent a critical step toward characterizing the complete landscape of human regulatory variation.
Affiliation(s)
- Alexis Battle
- Department of Computer Science, Stanford University, Stanford, California 94305, USA

62
Battle A, Mostafavi S, Zhu X, Potash JB, Weissman MM, McCormick C, Haudenschild CD, Beckman KB, Shi J, Mei R, Urban AE, Montgomery SB, Levinson DF, Koller D. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res 2013; 24:14-24. [PMID: 24092820 PMCID: PMC3875855 DOI: 10.1101/gr.155192.113] [Citation(s) in RCA: 381] [Impact Index Per Article: 34.6] [Indexed: 01/17/2023]
63
Human papillomavirus type 58 genome variations and RNA expression in cervical lesions. J Virol 2013; 87:9313-22. [PMID: 23785208 DOI: 10.1128/jvi.01154-13] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 01/08/2023]
Abstract
Human papillomavirus type 58 (HPV58) is relatively prevalent in China and other Asian countries. In this study, the HPV58 genome in cervical lesions was decoded from five grade 2 or 3 cervical intraepithelial neoplasia lesion (CIN2/3) samples and five cervical cancer tissues using rolling-circle amplification of total cell DNA and deep sequencing and verified by whole-genome cloning and sequencing. HPV58 isolates from China feature a total of 52 nucleotide substitutions (0.66%) from the reference HPV58 sequence, which appear mainly in two regions, with 12 from nucleotides (nt) 3430 to 4136 covering the E2/E4/E5 open reading frames (ORFs) and 13 from nt 4621 to 5540 covering the L2 ORF; these could be grouped as HPV58 Chinese Zhejiang-1, -2, and -3 (CNZJ-1, -2, and -3) according to their sequence similarities and restriction enzyme digestion. Phylogenetically, CNZJ-3 is similar to the reference HPV58 sublineage A1 sequence. The other two are close to sublineage A2. Analysis of cervical lesion-derived RNA revealed abundant HPV58 early transcripts spliced at the E6 and E1/E2 ORFs, where two 5' splice sites at nt 232 and nt 898 and two 3' splice sites at nt 510 and nt 3355 can be identified. Thus, our study represents the first genome-wide analysis of HPV58 and its expression in cervical lesions.
64
Lower KM, De Gobbi M, Hughes JR, Derry CJ, Ayyub H, Sloane-Stanley JA, Vernimmen D, Garrick D, Gibbons RJ, Higgs DR. Analysis of sequence variation underlying tissue-specific transcription factor binding and gene expression. Hum Mutat 2013; 34:1140-8. [PMID: 23616472 DOI: 10.1002/humu.22343] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 01/18/2013] [Accepted: 04/12/2013] [Indexed: 11/09/2022]
Abstract
Although mutations causing monogenic disorders most frequently lie within the affected gene, sequence variation in complex disorders is more commonly found in noncoding regions. Furthermore, recent genome-wide studies have shown that common DNA sequence variants in noncoding regions are associated with "normal" variation in gene expression, resulting in cell-specific and/or allele-specific differences. The mechanism by which such sequence variation causes changes in gene expression is largely unknown. We have addressed this by studying natural variation in the binding of key transcription factors (TFs) in the well-defined, purified cell system of erythropoiesis. We have shown that common polymorphisms frequently directly perturb the binding sites of key TFs, and detailed analysis shows how this causes considerable (~10-fold) changes in expression from a single allele in a tissue-specific manner. We also show how a SNP, located at some distance from the recognized TF binding site, may affect the recruitment of a large multiprotein complex and alter the associated chromatin modification of the variant regulatory element. This study illustrates the principles by which common sequence variation may cause changes in tissue-specific gene expression, and suggests that such variation may underlie an individual's propensity to develop complex human genetic diseases.
Affiliation(s)
- Karen M Lower
- MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Oxford, UK
65
Li X, Montgomery SB. Detection and impact of rare regulatory variants in human disease. Front Genet 2013; 4:67. [PMID: 23755067 PMCID: PMC3668132 DOI: 10.3389/fgene.2013.00067] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 09/05/2012] [Accepted: 04/09/2013] [Indexed: 12/20/2022]
Abstract
Advances in genome sequencing are providing unprecedented resolution of rare and private variants. However, methods that assess the effect of these variants have relied predominantly on information within coding sequences. Assessing their impact in non-coding sequences remains a significant contemporary challenge. In this review, we highlight the role of regulatory variation as a causative agent and modifier of monogenic disorders. We further discuss how advances in functional genomics now provide new opportunities to assess the impact of rare non-coding variants and their role in disease.
Affiliation(s)
- Xin Li
- Department of Pathology, Stanford University School of Medicine Stanford, CA, USA ; Department of Genetics, Stanford University School of Medicine Stanford, CA, USA
66
Abstract
As RNA-seq is replacing gene expression microarrays to assess genome-wide transcription abundance, gene expression Quantitative Trait Locus (eQTL) studies using RNA-seq have emerged. RNA-seq delivers two novel features that are important for eQTL studies. First, it provides information on allele-specific expression (ASE), which is not available from gene expression microarrays. Second, it generates unprecedentedly rich data to study RNA-isoform expression. In this paper, we review current methods for eQTL mapping using ASE and discuss some future directions. We also review existing works that use RNA-seq data to study RNA-isoform expression and we discuss the gaps between these works and isoform-specific eQTL mapping.
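For reference alongside the ASE-based methods this review covers, the conventional eQTL test regresses a gene's expression on the genotype dosage at a nearby SNP. Dosage is coded as 0, 1 or 2 copies of the alternative allele. The Python sketch below illustrates that baseline with ordinary least squares under these assumptions:

- The function name `eqtl_regression` and the toy data are hypothetical.
- This is not a method taken from the review.
- Real analyses add covariates and multiple-testing control.

```python
import math


def eqtl_regression(dosages, expression):
    """Hypothetical OLS eQTL test: expression ~ intercept + slope * dosage.

    dosages: genotype per individual coded as 0, 1 or 2 alternative alleles.
    expression: one (e.g. log-scale) expression value per individual.
    Returns (slope, t_statistic) for the genotype effect.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    residual_ss = sum((y - intercept - slope * g) ** 2
                      for g, y in zip(dosages, expression))
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf
    return slope, t_stat
```

On toy data where each extra alternative allele adds about one unit of expression, the slope comes out at 1.0 with a large t statistic.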
Affiliation(s)
- Wei Sun
- Department of Biostatistics, Department of Genetics, Carolina Center of Genome Science, UNC Chapel Hill, Chapel Hill, NC 27599, USA
- Yijuan Hu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
67
Abstract
Rising atmospheric carbon dioxide (CO2) conditions are driving unprecedented changes in seawater chemistry, resulting in reduced pH and carbonate ion concentrations in the Earth's oceans. This ocean acidification has negative but variable impacts on individual performance in many marine species. However, little is known about the adaptive capacity of species to respond to an acidified ocean, and, as a result, predictions regarding future ecosystem responses remain incomplete. Here we demonstrate that ocean acidification generates striking patterns of genome-wide selection in purple sea urchins (Strongylocentrotus purpuratus) cultured under different CO2 levels. We examined genetic change at 19,493 loci in larvae from seven adult populations cultured under realistic future CO2 levels. Although larval development and morphology showed little response to elevated CO2, we found substantial allelic change in 40 functional classes of proteins involving hundreds of loci. Pronounced genetic changes, including excess amino acid replacements, were detected in all populations and occurred in genes for biomineralization, lipid metabolism, and ion homeostasis, gene classes that build skeletons and interact in pH regulation. Such genetic change represents a neglected and important impact of ocean acidification that may influence populations that show few outward signs of response to acidification. Our results demonstrate the capacity for rapid evolution in the face of ocean acidification and show that standing genetic variation could be a reservoir of resilience to climate change in this coastal upwelling ecosystem. However, effective response to strong natural selection demands large population sizes and may be limited in species impacted by other environmental stressors.
68
Giorgi FM, Del Fabbro C, Licausi F. Comparative study of RNA-seq- and microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2013; 29:717-24. [PMID: 23376351 DOI: 10.1093/bioinformatics/btt053] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Indexed: 12/20/2022]
Abstract
MOTIVATION Coexpression networks are data-derived representations of genes behaving in a similar way across tissues and experimental conditions. They have been used for hypothesis generation and guilt-by-association approaches for inferring functions of previously unknown genes. So far, the main platform for expression data has been DNA microarrays; however, the recent development of RNA-seq allows for higher accuracy and coverage of transcript populations. It is therefore important to assess the potential for biological investigation of coexpression networks derived from this novel technique in a condition-independent dataset. RESULTS We collected 65 publicly available, high-quality Illumina RNA-seq samples of Arabidopsis thaliana and generated Pearson correlation coexpression networks. These networks were then compared with those derived from analogous microarray data. We show that Variance-Stabilizing Transformed (VST) RNA-seq samples are the most similar to microarray samples with respect to inter-sample variation, correlation coefficient distribution and network topological architecture. Microarray networks show a slightly higher score in biology-derived quality assessments such as overlap with the known protein-protein interaction network and edge ontological agreement. Different coexpression network centralities are investigated; in particular, we show that betweenness centrality is generally a positive marker for essential genes in A. thaliana, regardless of the platform originating the data. Finally, we focus on a specific gene network case, showing that although microarray data seem more suited for gene network reverse engineering, RNA-seq offers the great advantage of extending coexpression analyses to the entire transcriptome.
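The network-construction step described above can be sketched as follows. It computes pairwise Pearson correlation between gene expression profiles across samples, then keeps strongly correlated pairs as edges. This is a minimal illustration under these assumptions:

- The cutoff of 0.8 on absolute correlation and the helper names are hypothetical.
- Input profiles are taken to be already normalized (for example VST-transformed).
- The study's exact edge-selection rule is not stated in the abstract.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / math.sqrt(vx * vy)


def coexpression_edges(profiles, threshold=0.8):
    """Hypothetical edge rule: link genes whose |r| meets the threshold.

    profiles: dict mapping gene name -> expression values over the same samples.
    Returns a list of (gene1, gene2, r) tuples.
    """
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges
```

Centralities such as degree or betweenness would then be computed on the resulting edge list.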
69
Kim JK, Marioni JC. Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol 2013; 14:R7. [PMID: 23360624 PMCID: PMC3663116 DOI: 10.1186/gb-2013-14-1-r7] [Citation(s) in RCA: 118] [Impact Index Per Article: 10.7] [Received: 11/07/2012] [Accepted: 01/28/2013] [Indexed: 12/15/2022]
Abstract
Background Genetically identical populations of cells grown in the same environmental condition show substantial variability in gene expression profiles. Although single-cell RNA-seq provides an opportunity to explore this phenomenon, statistical methods need to be developed to interpret the variability of gene expression counts. Results We develop a statistical framework for studying the kinetics of stochastic gene expression from single-cell RNA-seq data. By applying our model to a single-cell RNA-seq dataset generated by profiling mouse embryonic stem cells, we find that the inferred kinetic parameters are consistent with RNA polymerase II binding and chromatin modifications. Our results suggest that histone modifications affect transcriptional bursting by modulating both burst size and frequency. Furthermore, we show that our model can be used to identify genes with slow promoter kinetics, which are important for probabilistic differentiation of embryonic stem cells. Conclusions We conclude that the proposed statistical model provides a flexible and efficient way to investigate the kinetics of transcription.
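Bursting kinetics are commonly framed with the two-state (telegraph) promoter model. Its steady-state count distribution can be written as a beta-Poisson mixture: the promoter's active fraction p is Beta(k_on, k_off) distributed, and the mRNA count is Poisson with mean s * p. Here all rates are expressed in units of the mRNA degradation rate. In this parameterization, burst frequency corresponds to k_on and mean burst size to s / k_off.

The sketch below only simulates forward from that model; it does not fit parameters. The parameter values and function names are illustrative assumptions, not the authors' code.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_beta_poisson(k_on, k_off, s, n_cells, seed=0):
    """Draw one mRNA count per cell from the beta-Poisson (two-state) model.

    k_on, k_off: promoter on/off rates; s: transcription rate while on.
    All three are in units of the mRNA degradation rate (hypothetical values).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        p = rng.betavariate(k_on, k_off)  # fraction of time the promoter is on
        counts.append(sample_poisson(s * p, rng))
    return counts


counts = simulate_beta_poisson(k_on=0.5, k_off=2.0, s=20.0, n_cells=20000)
mean_count = sum(counts) / len(counts)
# The model mean is s * k_on / (k_on + k_off) = 20 * 0.5 / 2.5 = 4.0.
```

Fitting would run in the opposite direction: estimate k_on, k_off and s from observed per-cell counts for each gene.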
70
Jeffries AR, Perfect LW, Ledderose J, Schalkwyk LC, Bray NJ, Mill J, Price J. Stochastic choice of allelic expression in human neural stem cells. Stem Cells 2012; 30:1938-47. [PMID: 22714879 DOI: 10.1002/stem.1155] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Indexed: 11/11/2022]
Abstract
Monoallelic gene expression, such as genomic imprinting, is well described. Less well-characterized are genes undergoing stochastic monoallelic expression (MA), where specific clones of cells express just one allele at a given locus. We performed genome-wide allelic expression assessment of human clonal neural stem cells derived from cerebral cortex, striatum, and spinal cord, each with differing genotypes. We assayed three separate clonal lines from each donor, distinguishing stochastic MA from genotypic effects. Roughly 2% of genes showed evidence for autosomal MA, and in about half of these, allelic expression was stochastic between different clones. Many of these loci were known neurodevelopmental genes, such as OTX2 and OLIG2. Monoallelic genes also showed increased levels of DNA methylation compared to hypomethylated biallelic loci. Identified monoallelic gene loci showed altered chromatin signatures in fetal brain, suggesting an in vivo correlate of this phenomenon. We conclude that stochastic allelic expression is prevalent in neural stem cells, providing clonal diversity to developing tissues such as the human brain.
Affiliation(s)
- Aaron R Jeffries
- King's College London, Institute of Psychiatry, Centre for the Cellular Basis of Behaviour, Department of Neuroscience, London, United Kingdom.
71
Hwang JY, Lee SH, Go MJ, Kim BJ, Kou I, Ikegawa S, Guo Y, Deng HW, Raychaudhuri S, Kim YJ, Oh JH, Kim Y, Moon S, Kim DJ, Koo H, Cha MJ, Lee MH, Yun JY, Yoo HS, Kang YA, Cho EH, Kim SW, Oh KW, Kang MI, Son HY, Kim SY, Kim GS, Han BG, Cho YS, Cho MC, Lee JY, Koh JM. Meta-analysis identifies a MECOM gene as a novel predisposing factor of osteoporotic fracture. J Med Genet 2013; 50:212-9. [PMID: 23349225 DOI: 10.1136/jmedgenet-2012-101156] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/03/2022]
Abstract
BACKGROUND Osteoporotic fracture (OF) as a clinical endpoint is a major complication of osteoporosis. To screen for OF susceptibility genes, we performed a genome-wide association study and carried out de novo replication analysis in an East Asian population. METHODS Association was tested using logistic regression analysis. A meta-analysis was performed on the combined results using the effect size and standard error estimated for each study. RESULTS In a combined meta-analysis of a discovery cohort (288 cases and 1139 controls), three hospital-based sets in replication stage I (462 cases and 1745 controls), and an independent ethnic group in replication stage II (369 cases and 560 controls), we identified a new locus associated with OF (rs784288 in the MECOM gene) that showed genome-wide significance (p = 3.59×10⁻⁸; OR 1.39). RNA interference revealed that MECOM knockdown suppresses osteoclastogenesis. CONCLUSIONS Our findings provide new insights into the genetic architecture underlying OF in East Asians.
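The abstract says the stages were combined using each study's effect size and standard error. One standard way to do that is an inverse-variance-weighted fixed-effect meta-analysis of log odds ratios, sketched below. The abstract does not say whether the authors used fixed- or random-effects weighting, and the inputs in the usage note are hypothetical, not the study's data.

```python
import math


def fixed_effect_meta(log_ors, std_errors):
    """Inverse-variance fixed-effect pooling of per-study log odds ratios.

    Returns (pooled_odds_ratio, pooled_se_of_log_or, two_sided_p).
    """
    if len(log_ors) != len(std_errors) or not log_ors:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]  # w_i = 1 / SE_i^2
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(pooled), pooled_se, p_two_sided
```

For example, two hypothetical studies, each with log OR 0.3 and SE 0.1, pool to OR = exp(0.3) ≈ 1.35 with SE ≈ 0.071 and two-sided p ≈ 2.2×10⁻⁵. Pooling narrows the standard error relative to either study alone.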
Affiliation(s)
- Joo-Yeon Hwang
- Center for Genome Science, National Institute of Health, Chungcheongbuk-do, Republic of Korea
72
Soon WW, Hariharan M, Snyder MP. High-throughput sequencing for biology and medicine. Mol Syst Biol 2013; 9:640. [PMID: 23340846 PMCID: PMC3564260 DOI: 10.1038/msb.2012.61] [Citation(s) in RCA: 174] [Impact Index Per Article: 15.8] [Received: 07/06/2012] [Accepted: 10/29/2012] [Indexed: 02/06/2023]
Abstract
Advances in genome sequencing have progressed at a rapid pace, with increased throughput accompanied by plunging costs. But these advances go far beyond faster and cheaper. High-throughput sequencing technologies are now routinely being applied to a wide range of important topics in biology and medicine, often allowing researchers to address important biological questions that could not be addressed before. In this review, we discuss these innovative new approaches, including ever finer analyses of transcriptome dynamics, genome structure and genomic variation, and provide an overview of the new insights into complex biological systems catalyzed by these technologies. We also assess the impact of genotyping, genome sequencing and personal omics profiling on medical applications, including diagnosis and disease monitoring. Finally, we review recent developments in single-cell sequencing, and conclude with a discussion of possible future advances and obstacles for sequencing in biology and health.
Affiliation(s)
- Wendy Weijia Soon
- Department of Genetics, Stanford University School of Medicine, Alway Building, 300 Pasteur Drive, Stanford, CA, USA
- Manoj Hariharan
- Department of Genetics, Stanford University School of Medicine, Alway Building, 300 Pasteur Drive, Stanford, CA, USA
- Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Alway Building, 300 Pasteur Drive, Stanford, CA, USA
73
Wei Y, Li X, Wang QF, Ji H. iASeq: integrative analysis of allele-specificity of protein-DNA interactions in multiple ChIP-seq datasets. BMC Genomics 2012; 13:681. [PMID: 23194258 PMCID: PMC3576346 DOI: 10.1186/1471-2164-13-681] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022]
Abstract
BACKGROUND ChIP-seq provides new opportunities to study allele-specific protein-DNA binding (ASB). However, detecting allelic imbalance from a single ChIP-seq dataset often has low statistical power, since only sequence reads mapped to heterozygous SNPs are informative for discriminating the two alleles. RESULTS We develop a new method, iASeq, to address this issue by jointly analyzing multiple ChIP-seq datasets. iASeq uses a Bayesian hierarchical mixture model to learn correlation patterns of allele-specificity among multiple proteins. Using the discovered correlation patterns, the model allows one to borrow information across datasets to improve detection of allelic imbalance. Application of iASeq to 77 ChIP-seq samples from 40 ENCODE datasets and 1 genomic DNA sample in GM12878 cells reveals that the allele-specificity of multiple proteins is highly correlated, and demonstrates the ability of iASeq to improve allelic inference compared to analyzing each individual dataset separately. CONCLUSIONS iASeq illustrates the value of integrating multiple datasets in allele-specificity inference and offers a new tool to better analyze ASB.
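To see why a single dataset has little power, consider the simplest per-SNP check. At a heterozygous SNP, run an exact two-sided binomial test of reference versus alternative read counts against a 50:50 expectation. This baseline sketch rests on these assumptions:

- It is not iASeq's Bayesian hierarchical mixture model.
- The function name is hypothetical.
- It ignores reference-mapping bias.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial test of H0: P(ref read) = 0.5 at one SNP.

    Because the null is symmetric, the two-sided p-value is twice the
    smaller tail probability, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no informative reads, no evidence either way
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With only three informative reads, even a complete 3:0 split gives p = 0.25. So imbalance at low-coverage SNPs can go undetected when each dataset is tested on its own, which is the gap that pooling across datasets targets.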
Affiliation(s)
- Yingying Wei
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, Maryland 21205, USA
74
Basu M, Das T, Ghosh A, Majumder S, Maji AK, Kanjilal SD, Mukhopadhyay I, Roychowdhury S, Banerjee S, Sengupta S. Gene-gene interaction and functional impact of polymorphisms on innate immune genes in controlling Plasmodium falciparum blood infection level. PLoS One 2012; 7:e46441. [PMID: 23071570 PMCID: PMC3470565 DOI: 10.1371/journal.pone.0046441] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 06/14/2012] [Accepted: 08/30/2012] [Indexed: 12/19/2022]
Abstract
Genetic variations in toll-like receptors and cytokine genes of the innate immune pathways have been implicated in controlling parasite growth and the pathogenesis of Plasmodium falciparum-mediated malaria. We previously published genetic association of TLR4 non-synonymous and TNF-α promoter polymorphisms with P. falciparum blood infection level, and here we extend the study considerably by (i) investigating genetic dependence of parasite load on interleukin-12B polymorphisms, (ii) reconstructing gene-gene interactions among candidate TLR and cytokine loci, (iii) exploring the genetic and functional impact of epistatic models and (iv) providing mechanistic insights into the functionality of disease-associated regulatory polymorphisms. Our data revealed that carriage of the AA (P = 0.0001) and AC (P = 0.01) genotypes of the IL12B 3′UTR polymorphism was associated with a significant increase of mean log-parasitemia relative to the rare homozygous genotype CC. The presence of the IL12B+1188 polymorphism in five of six multifactor models reinforced its strong genetic impact on malaria phenotype. Elevation of genetic risk in two-component models compared to the corresponding single loci, and reduction of IL12B (2.2-fold) and lymphotoxin-α (1.7-fold) expression in patients' peripheral blood mononuclear cells (PBMCs) under the TLR4 Thr399Ile risk genotype background, substantiated the role of Multifactor Dimensionality Reduction-derived models. Marked reduction of promoter activity of the TNF-α risk haplotype (C-C-G-G) compared to the wild-type haplotype (T-C-G-G) with (84%) and without (78%) LPS stimulation, and the loss of transcription factor binding detected in silico, supported a causal role of TNF-1031. Significantly lower expression of IL12B+1188 AA (5-fold) and AC (9-fold) genotypes compared to CC, and under-representation (P = 0.0048) of allele A in transcripts of patients' PBMCs, suggested allele expression imbalance (AEI).
Allele (A+1188C)-dependent differential stability (2-fold) of IL12B transcripts upon actinomycin-D treatment and the observed structural modulation (P = 0.013) of the RNA ensemble are plausible explanations for AEI. In conclusion, our data provide functional support for the hypothesis that a de-regulated receptor-cytokine axis of the innate immune pathway influences blood infection level in P. falciparum malaria.
Affiliation(s)
- Madhumita Basu
- Department of Biochemistry, University of Calcutta, Kolkata, West Bengal, India
- Tania Das
- Cancer & Cell Biology Division, Indian Institute of Chemical Biology, Kolkata, West Bengal, India
- Alip Ghosh
- Centre for Liver Research, The Institute of Post-Graduate Medical Education & Research, Kolkata, West Bengal, India
- Subhadipa Majumder
- Department of Biochemistry, University of Calcutta, Kolkata, West Bengal, India
- Ardhendu Kumar Maji
- Department of Protozoology, The Calcutta School of Tropical Medicine, Kolkata, West Bengal, India
- Sumana Datta Kanjilal
- Department of Pediatric Medicine, Calcutta National Medical College, Kolkata, West Bengal, India
- Susanta Roychowdhury
- Cancer & Cell Biology Division, Indian Institute of Chemical Biology, Kolkata, West Bengal, India
- Soma Banerjee
- Centre for Liver Research, The Institute of Post-Graduate Medical Education & Research, Kolkata, West Bengal, India
- Sanghamitra Sengupta
- Department of Biochemistry, University of Calcutta, Kolkata, West Bengal, India
75
van Delft J, Gaj S, Lienhard M, Albrecht MW, Kirpiy A, Brauers K, Claessen S, Lizarraga D, Lehrach H, Herwig R, Kleinjans J. RNA-Seq provides new insights in the transcriptome responses induced by the carcinogen benzo[a]pyrene. Toxicol Sci 2012; 130:427-39. [PMID: 22889811 DOI: 10.1093/toxsci/kfs250] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Indexed: 01/06/2023]
Abstract
Whole-genome transcriptome measurements are pivotal for characterizing molecular mechanisms of chemicals and predicting toxic classes, such as genotoxicity and carcinogenicity, from in vitro and in vivo assays. In recent years, deep sequencing technologies have been developed that hold the promise of measuring the transcriptome in a more complete and unbiased manner than DNA microarrays. Here, we applied this RNA-seq technology for the characterization of the transcriptomic responses in HepG2 cells upon exposure to benzo[a]pyrene (BaP), a well-known DNA-damaging human carcinogen. Based on EnsEMBL genes, we demonstrate that RNA-seq detects about 20% more genes than microarray-based technology but almost threefold more significantly differentially expressed genes. Functional enrichment analyses show that RNA-seq yields more insight into the biology and mechanisms related to the toxic effects caused by BaP, i.e., two- to fivefold more affected pathways and biological processes. Additionally, we demonstrate that RNA-seq allows detecting alternative isoform expression in many genes, including regulators of cell death and DNA repair such as TP53, BCL2 and XPA, which are relevant for genotoxic responses. Moreover, potentially novel isoforms were found, such as fragments of known transcripts, transcripts with additional exons, intron retention or exon-skipping events. The biological function(s) of these isoforms remain unknown for the time being. Finally, we demonstrate that RNA-seq enables the investigation of allele-specific gene expression, although no changes could be observed. Our results provide evidence that RNA-seq is a powerful tool for toxicology, which, compared with microarrays, is capable of generating novel and valuable information at the transcriptome level for characterizing deleterious effects caused by chemicals.
Affiliation(s)
- Joost van Delft
- Department of Toxicogenomics, Maastricht University, Maastricht, The Netherlands.
76
Wu JR, Zeng R. Molecular basis for population variation: from SNPs to SAPs. FEBS Lett 2012; 586:2841-5. [PMID: 22828278 DOI: 10.1016/j.febslet.2012.07.036] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 06/16/2012] [Revised: 07/14/2012] [Accepted: 07/16/2012] [Indexed: 01/09/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are one type of genomic DNA variation in a population. Correspondingly, single amino-acid polymorphisms (SAPs) derived from non-synonymous SNPs represent protein variations in a population. Recently, using proteomic approaches, SAPs in the plasma proteomes of an Asian population were systematically identified for the first time. That study showed that heterozygous and homozygous proteins with various SAPs have different associations with particular traits in the population. Recent discoveries of widespread differences between RNA and DNA sequences indicate that RNA editing is also a source of SAPs, one that is independent of genomic SNPs. Furthermore, we argue that there are de novo SAPs that are not encoded by either DNA or RNA sequences.
Affiliation(s)
- Jia-Rui Wu
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
77
Abstract
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.
78
Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, Crawford GE, Wold B, Willard HF, Myers RM. Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res 2012; 22:860-9. [PMID: 22300769 PMCID: PMC3337432 DOI: 10.1101/gr.131201.111] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.8] [Received: 08/26/2011] [Accepted: 02/01/2012] [Indexed: 01/01/2023]
Abstract
A complex interplay between transcription factors (TFs) and the genome regulates transcription. However, connecting variation in genome sequence with variation in TF binding and gene expression is challenging due to environmental differences between individuals and cell types. To address this problem, we measured genome-wide differential allelic occupancy of 24 TFs and EP300 in a human lymphoblastoid cell line GM12878. Overall, 5% of human TF binding sites have an allelic imbalance in occupancy. At many sites, TFs clustered in TF-binding hubs on the same homolog in especially open chromatin. While genetic variation in core TF binding motifs generally resulted in large allelic differences in TF occupancy, most allelic differences in occupancy were subtle and associated with disruption of weak or noncanonical motifs. We also measured genome-wide differential allelic expression of genes with and without heterozygous exonic variants in the same cells. We found that genes with differential allelic expression were overall less expressed both in GM12878 cells and in unrelated human cell lines. Comparing TF occupancy with expression, we found strong association between allelic occupancy and expression within 100 bp of transcription start sites (TSSs), and weak association up to 100 kb from TSSs. Sites of differential allelic occupancy were significantly enriched for variants associated with disease, particularly autoimmune disease, suggesting that allelic differences in TF occupancy give functional insights into intergenic variants associated with disease. Our results have the potential to increase the power and interpretability of association studies by targeting functional intergenic variants in addition to protein coding sequences.
Affiliation(s)
- Timothy E. Reddy
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA
- Jason Gertz
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Florencia Pauli
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
| | - Katerina S. Kucera
- Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA
| | | | | | - Georgi K. Marinov
- Department of Biology, California Institute of Technology, Pasadena, California 91125, USA
| | - Ali Mortazavi
- Department of Biology, California Institute of Technology, Pasadena, California 91125, USA
| | - Brian A. Williams
- Department of Biology, California Institute of Technology, Pasadena, California 91125, USA
| | - Lingyun Song
- Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA
| | - Gregory E. Crawford
- Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA
| | - Barbara Wold
- Department of Biology, California Institute of Technology, Pasadena, California 91125, USA
| | - Huntington F. Willard
- Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA
| | - Richard M. Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
79
Abstract
The individual human genome and epigenome are being defined at unprecedented resolution by current advances in sequencing technologies with important implications for human disease. This review uses examples relevant to clinical practice to illustrate the functional consequences of genetic and epigenetic variation. The insights gained from genome-wide association studies are described together with current efforts to understand the role of rare variants in common disease, set in the context of recent successes in Mendelian traits through the application of whole exome sequencing. The application of functional genomics to interrogate the genome and epigenome, build up an integrated picture of the regulatory genomic landscape and inform disease association studies is discussed, together with the role of expression quantitative trait mapping and analysis of allele-specific gene expression.
Affiliation(s)
- J C Knight
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK.
80
Graze RM, Novelo LL, Amin V, Fear JM, Casella G, Nuzhdin SV, McIntyre LM. Allelic imbalance in Drosophila hybrid heads: exons, isoforms, and evolution. Mol Biol Evol 2012; 29:1521-32. [PMID: 22319150 DOI: 10.1093/molbev/msr318] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 12/12/2022]
Abstract
Unraveling how regulatory divergence contributes to species differences and adaptation requires identifying functional variants from among millions of genetic differences. Analysis of allelic imbalance (AI) reveals functional genetic differences in cis regulation and has demonstrated differences in cis regulation within and between species. Regulatory mechanisms are often highly conserved, yet differences between species in gene expression are extensive. What evolutionary forces explain widespread divergence in cis regulation? AI was assessed in Drosophila melanogaster-Drosophila simulans hybrid female heads using RNA-seq technology. Mapping bias was virtually eliminated by using genotype-specific references. Allele representation in DNA sequencing was used as a prior in a novel Bayesian model for the estimation of AI in RNA. Cis regulatory divergence was common in the organs and tissues of the head with 41% of genes analyzed showing significant AI. Using existing population genomic data, the relationship between AI and patterns of sequence evolution was examined. Evidence of positive selection was found in 30% of cis regulatory divergent genes. Genes involved in defense, RNAi/RISC complex genes, and those that are sex regulated are enriched among adaptively evolving cis regulatory divergent genes. For genes in these groups, adaptive evolution may play a role in regulatory divergence between species. However, there is no evidence that adaptive evolution drives most of the cis regulatory divergence that is observed. The majority of genes showed patterns consistent with stabilizing selection and neutral evolutionary processes.
Affiliation(s)
- R M Graze
- Department of Molecular Genetics and Microbiology, University of Florida, USA
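The abstract above uses allele representation in DNA sequencing as a prior when estimating allelic imbalance (AI) in RNA. The paper's own Bayesian model is not reproduced here; as a simplified stand-in, the sketch below treats the DNA-derived reference-allele proportion as the null expectation (which absorbs mapping bias) and compares it against a flat Beta(1, 1) alternative with a Bayes factor. The helper names, the pseudocount, and the Beta(1, 1) alternative are all illustrative assumptions.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def dna_proportion(dna_ref: int, dna_alt: int, pseudo: float = 1.0) -> float:
    """Hypothetical helper: reference-allele share in DNA reads, with a
    pseudocount so the null proportion is never exactly 0 or 1."""
    return (dna_ref + pseudo) / (dna_ref + dna_alt + 2 * pseudo)


def log_bf_imbalance(rna_ref: int, rna_alt: int, p0: float,
                     a: float = 1.0, b: float = 1.0) -> float:
    """log Bayes factor for H1 (RNA reference fraction p ~ Beta(a, b)) versus
    H0 (p equals the DNA-derived proportion p0). Positive values favour
    allelic imbalance. The binomial coefficient is shared by both marginal
    likelihoods and cancels."""
    log_m1 = log_beta(a + rna_ref, b + rna_alt) - log_beta(a, b)
    log_m0 = rna_ref * math.log(p0) + rna_alt * math.log1p(-p0)
    return log_m1 - log_m0


# Illustrative counts: DNA reads show a mild bias toward the reference allele.
p0 = dna_proportion(dna_ref=60, dna_alt=40)
print(log_bf_imbalance(62, 38, p0) > 0)  # expected: False (matches DNA expectation)
print(log_bf_imbalance(90, 10, p0) > 0)  # expected: True (strong skew in RNA)
```

Because the null is the DNA proportion rather than a fixed 0.5, a site whose RNA skew merely mirrors a mapping bias seen in DNA is not flagged.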
81
Teare MD, Pinyakorn S, Heighway J, Santibanez Koref MF. Comparing methods for mapping cis acting polymorphisms using allelic expression ratios. PLoS One 2011; 6:e28636. [PMID: 22174852 PMCID: PMC3236754 DOI: 10.1371/journal.pone.0028636] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/27/2011] [Accepted: 11/11/2011] [Indexed: 02/04/2023]
Abstract
Genome-wide association studies frequently reveal associations between disease susceptibility and polymorphisms outside coding regions. Such associations cannot always be explained by linkage disequilibrium with changes affecting the transcription products. This has stimulated interest in characterising sequence variation influencing gene expression levels, in particular changes acting in cis. Differences in transcription between the two alleles at an autosomal locus can be used to test the association between candidate polymorphisms and the modulation of gene expression in cis. This type of approach requires at least one transcribed polymorphism and one candidate polymorphism. In the past five years, different methods have been proposed to analyse such data. Here we use simulations and real data sets to compare the power of some of these methods. The results show that when it is not possible to determine the phase between the transcribed and the potentially cis-acting allele, there is some advantage in using methods that estimate phased genotype and effect on expression simultaneously. However, when the phase can be determined, simple regression models seem preferable because of their simplicity and flexibility. The simulations and the analysis of experimental data suggest that in the majority of situations, methods that assume a lognormal distribution of the allelic expression ratios are both robust to deviations from this assumption and more powerful than alternatives that do not make these assumptions.
Affiliation(s)
- Marion Dawn Teare
- School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom.
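For the phase-known case, the abstract favours simple regression, and it finds lognormal models of allelic expression ratios robust. A minimal sketch under those assumptions takes the natural log of each individual's allelic read ratio (making a lognormal ratio normal) and regresses it on a phased candidate-genotype code with ordinary least squares. The +1/-1/0 coding, the 0.5 pseudocount, the function names, and the toy counts are illustrative assumptions, not the specific models compared in the paper.

```python
import math
from statistics import mean


def log_allelic_ratio(count_a: int, count_b: int, pseudo: float = 0.5) -> float:
    """Natural-log ratio of reads for transcribed alleles A and B; the
    pseudocount avoids log(0) when one allele has no reads."""
    return math.log((count_a + pseudo) / (count_b + pseudo))


def fit_cis_effect(codes, log_ratios):
    """OLS fit of log_ratio = a + b * code, where code is a phased candidate
    genotype (hypothetical coding: +1 candidate allele in cis with A,
    -1 in cis with B, 0 homozygous at the candidate).
    Returns (intercept, slope, t statistic for the slope)."""
    n = len(codes)
    if n < 3:
        raise ValueError("need at least three individuals")
    mx, my = mean(codes), mean(log_ratios)
    sxx = sum((x - mx) ** 2 for x in codes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(codes, log_ratios))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(codes, log_ratios))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return intercept, slope, t


# Toy data (made up for illustration): six heterozygotes at the transcribed SNP.
codes = [-1, -1, 0, 0, 1, 1]
counts = [(20, 52), (18, 55), (40, 37), (33, 36), (61, 22), (57, 21)]
ratios = [log_allelic_ratio(a, b) for a, b in counts]
intercept, slope, t = fit_cis_effect(codes, ratios)
print(f"slope={slope:.2f}, t={t:.1f}")
```

A positive slope indicates that the candidate allele coded +1 raises expression of the allele it sits in cis with; a real analysis would also report a p-value from the t distribution with n - 2 degrees of freedom.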
82
Meacham F, Boffelli D, Dhahbi J, Martin DIK, Singer M, Pachter L. Identification and correction of systematic error in high-throughput sequence data. BMC Bioinformatics 2011; 12:451. [PMID: 22099972 PMCID: PMC3295828 DOI: 10.1186/1471-2105-12-451] [Citation(s) in RCA: 163] [Impact Index Per Article: 12.5] [Received: 05/25/2011] [Accepted: 11/21/2011] [Indexed: 12/21/2022]
Abstract
Background A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position-specific (depending on the location in the read) and sequence-specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technologies sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.
Affiliation(s)
- Frazer Meacham
- Department of Mathematics, University of California, Berkeley, 970 Evans Hall #3840, Berkeley, CA 94720, USA
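The abstract defines systematic errors as statistically unlikely accumulations of errors at specific locations. One way to make "statistically unlikely" concrete, assuming independent base-call errors at a uniform per-base rate, is a binomial upper-tail probability for the observed mismatch count at a site. This is a generic illustration, not SysCall's classifier: a small tail probability says only that random error is a poor explanation, and on its own cannot separate a systematic error from a true heterozygous site, which is the job of the classifier described above. The error rate, threshold, and function names are assumptions.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def is_unlikely_error_pileup(mismatches: int, depth: int,
                             error_rate: float = 1e-3, alpha: float = 1e-6) -> bool:
    """True if `mismatches` at a site with coverage `depth` would be improbable
    under independent per-base errors occurring at `error_rate`."""
    return binom_sf(mismatches, depth, error_rate) < alpha


print(is_unlikely_error_pileup(10, 200))  # expected: True
print(is_unlikely_error_pileup(1, 200))   # expected: False
```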
83
Dewal N, Hu Y, Freedman ML, Laframboise T, Pe'er I. Calling amplified haplotypes in next generation tumor sequence data. Genome Res 2011; 22:362-74. [PMID: 22090379 DOI: 10.1101/gr.122564.111] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/19/2023]
Abstract
During tumor initiation and progression, cancer cells acquire a selective advantage, allowing them to outcompete their normal counterparts. Identification of the genetic changes that underlie these tumor-acquired traits can provide deeper insights into the biology of tumorigenesis. Regions of copy number alteration and germline DNA variants are some of the elements subject to selection during tumor evolution. Integrated examination of inherited variation and somatic alterations holds the potential to reveal specific nucleotide alleles that a tumor "prefers" to have amplified. Next-generation sequencing of tumor and matched normal tissues provides a high-resolution platform to identify and analyze such somatic amplicons. Within an amplicon, examination of informative (e.g., heterozygous) sites deviating from a 1:1 ratio may suggest selection of that allele. A naive approach examines the reads for each heterozygous site in isolation; however, this ignores valuable linkage information available across sites. We, therefore, present a novel hidden Markov model-based method, Haplotype Amplification in Tumor Sequences (HATS), that analyzes tumor and normal sequence data, along with training data for phasing purposes, to infer amplified alleles and haplotypes in regions of copy number gain. Our method is designed to handle rare variants and biases in read data. We assess the performance of HATS using simulated amplified regions generated from varying copy number and coverage levels, followed by amplicons in real data. We demonstrate that HATS infers the amplified alleles more accurately than does the naive approach, especially at low to intermediate coverage levels and in cases (including high coverage) possessing stromal contamination or allelic bias.
Affiliation(s)
- Ninad Dewal
- Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA
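The abstract contrasts HATS with a naive approach that examines each heterozygous site in isolation for deviation from a 1:1 ratio. The sketch below implements only that naive baseline, using an exact two-sided binomial test per site; it is not HATS, which adds a hidden Markov model to share linkage information across sites. The significance threshold, the `(site_id, ref_reads, alt_reads)` input shape, and the function names are assumptions.

```python
import math


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1.0 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def naive_imbalanced_sites(sites, alpha: float = 0.01):
    """sites: iterable of (site_id, ref_reads, alt_reads) at heterozygous positions.
    Each site is tested independently against a 1:1 allele ratio, ignoring
    linkage between neighbouring sites (the naive approach)."""
    return [sid for sid, ref, alt in sites
            if ref + alt > 0 and binom_two_sided_p(ref, ref + alt) < alpha]


print(naive_imbalanced_sites([("s1", 30, 5), ("s2", 12, 10)]))  # expected: ['s1']
```

Because every site is judged on its own reads, low-coverage sites inside a real amplicon often fail to reach significance, which is the weakness the HATS model is designed to address.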
84
Panopoulos AD, Yanes O, Ruiz S, Kida YS, Diep D, Tautenhahn R, Herrerías A, Batchelder EM, Plongthongkum N, Lutz M, Berggren WT, Zhang K, Evans RM, Siuzdak G, Izpisua Belmonte JC. The metabolome of induced pluripotent stem cells reveals metabolic changes occurring in somatic cell reprogramming. Cell Res 2011; 22:168-77. [PMID: 22064701 DOI: 10.1038/cr.2011.177] [Citation(s) in RCA: 389] [Impact Index Per Article: 29.9] [Indexed: 12/15/2022]
Abstract
Metabolism is vital to every aspect of cell function, yet the metabolome of induced pluripotent stem cells (iPSCs) remains largely unexplored. Here we report, using an untargeted metabolomics approach, that human iPSCs share a pluripotent metabolomic signature with embryonic stem cells (ESCs) that is distinct from their parental cells and is characterized by changes in metabolites involved in cellular respiration. Examination of cellular bioenergetics corroborated our metabolomic analysis and demonstrated that somatic cells convert from an oxidative state to a glycolytic state in pluripotency. Interestingly, the bioenergetics of various somatic cells correlated with their reprogramming efficiencies. We further identified metabolites that differ between iPSCs and ESCs, which revealed novel metabolic pathways that play a critical role in regulating somatic cell reprogramming. Our findings are the first to globally analyze the metabolome of iPSCs, and provide mechanistic insight into a new layer of regulation involved in inducing pluripotency and in evaluating iPSC and ESC equivalence.
Affiliation(s)
- Athanasia D Panopoulos
- Gene Expression Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
85
Yang Y, Graze RM, Walts BM, Lopez CM, Baker HV, Wayne ML, Nuzhdin SV, McIntyre LM. Partitioning transcript variation in Drosophila: abundance, isoforms, and alleles. G3 (Bethesda) 2011; 1:427-36. [PMID: 22384353 PMCID: PMC3276160 DOI: 10.1534/g3.111.000596] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/15/2011] [Accepted: 09/11/2011] [Indexed: 12/25/2022]
Abstract
Multilevel analysis of transcription is facilitated by a new array design that includes modules for assessment of differential expression, isoform usage, and allelic imbalance in Drosophila. The ∼2.5-million-feature chip incorporates a large number of controls and contains 18,769 3' expression probe sets and 61,919 exon probe sets with probe sequences from Drosophila melanogaster, and 60,118 SNP probe sets focused on Drosophila simulans. An experiment in D. simulans identified genes differentially expressed between males and females (34% in the 3' expression module; 32% in the exon module). These proportions are consistent with previous reports, and there was good agreement (κ = 0.63) between the modules. Alternative isoform usage between the sexes was identified for 164 genes. The SNP module was verified with resequencing data. Concordance between resequencing and the chip design was greater than 99%. The design also proved apt in separating alleles based upon hybridization intensity. Concordance between the highest hybridization signals and the expected alleles in the genotype was greater than 96%. Intriguingly, allelic imbalance was detected for 37% of the 6579 probe sets examined that contained heterozygous SNP loci. The large number of probes and multiple probe sets per gene in the 3' expression and exon modules allow the array to be used in D. melanogaster and in closely related species. The SNP module can be used for allele-specific expression and genotyping of D. simulans.
Affiliation(s)
- Yajie Yang
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Rita M. Graze
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Brandon M. Walts
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Cecilia M. Lopez
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Henry V. Baker
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Marta L. Wayne
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Zoology, University of Florida, Gainesville, FL 32611-8525
- Sergey V. Nuzhdin
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089-2910
- Lauren M. McIntyre
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Department of Statistics, University of Florida, Gainesville, FL 32611-8545
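The abstract reports agreement of κ = 0.63 between the 3' expression and exon modules. Assuming this is Cohen's kappa on binary "differentially expressed or not" calls per gene (the abstract does not name the statistic), it can be computed as observed agreement corrected for the agreement expected by chance from each module's call rate. The function name and the toy call vectors are illustrative.

```python
def cohens_kappa(calls_a, calls_b) -> float:
    """Cohen's kappa for two equal-length sequences of binary calls (1/0 or True/False)."""
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("need two non-empty sequences of equal length")
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    rate_a = sum(calls_a) / n  # fraction of genes called DE by module A
    rate_b = sum(calls_b) / n  # fraction of genes called DE by module B
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # expected: 0.5
```

Kappa is preferred over raw percent agreement here because, when most genes are not differentially expressed in either module, two modules would agree often by chance alone.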
86
Xu X, Wang H, Zhu M, Sun Y, Tao Y, He Q, Wang J, Chen L, Saffen D. Next-generation DNA sequencing-based assay for measuring allelic expression imbalance (AEI) of candidate neuropsychiatric disorder genes in human brain. BMC Genomics 2011; 12:518. [PMID: 22013986 PMCID: PMC3228908 DOI: 10.1186/1471-2164-12-518] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 06/09/2011] [Accepted: 10/20/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Common genetic variants that regulate gene expression are widely suspected to contribute to the etiology and phenotypic variability of complex diseases. Although high-throughput, microarray-based assays have been developed to measure differences in mRNA expression among independent samples, these assays often lack the sensitivity to detect rare mRNAs and the reproducibility to quantify small changes in mRNA expression. By contrast, PCR-based allelic expression imbalance (AEI) assays, which use a "marker" single nucleotide polymorphism (mSNP) in the mRNA to distinguish expression from pairs of genetic alleles in individual samples, have high sensitivity and accuracy, allowing differences in mRNA expression greater than 1.2-fold to be quantified with high reproducibility. In this paper, we describe the use of an efficient PCR/next-generation DNA sequencing-based assay to analyze allele-specific differences in mRNA expression for candidate neuropsychiatric disorder genes in human brain. RESULTS Using our assay, we successfully analyzed AEI for 70 candidate neuropsychiatric disorder genes in 52 independent human brain samples. Among these genes, 62/70 (89%) showed AEI ratios greater than 1 ± 0.2 in at least one sample and 8/70 (11%) showed no AEI. Arranging log2AEI ratios in increasing order from negative-to-positive values revealed highly reproducible distributions of log2AEI ratios that are distinct for each gene/marker SNP combination. Mathematical modeling suggests that these log2AEI distributions can provide important clues concerning the number, location and contributions of cis-acting regulatory variants to mRNA expression. CONCLUSIONS We have developed a highly sensitive and reproducible method for quantifying AEI of mRNA expressed in human brain. 
Importantly, this assay allowed quantification of differential mRNA expression for many candidate disease genes entirely missed in previously published microarray-based studies of mRNA expression in human brain. Given the ability of next-generation sequencing technology to generate large numbers of independent sequencing reads, our method should be suitable for analyzing 100 to 200 candidate genes in 100 samples in a single experiment. We believe that this is the appropriate scale for investigating variation in mRNA expression for defined sets of candidate disorder genes, allowing, for example, comprehensive coverage of genes that function within biological pathways implicated in specific disorders. The combination of AEI measurements and mathematical modeling described in this study can assist in identifying SNPs that correlate with mRNA expression. Alleles of these SNPs (individually or as sets) that accurately predict high or low mRNA expression should be useful as markers in genetic association studies aimed at linking candidate genes to specific neuropsychiatric disorders.
Affiliation(s)
- Xiang Xu
- Institutes of Brain Science, Fudan University, 138 Yixueyuan Road, Shanghai 200032, China
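The abstract calls a gene imbalanced when its AEI ratio falls outside 1 ± 0.2, and it inspects log2 AEI ratios sorted from negative to positive. A minimal sketch of those two steps follows. Normalizing the cDNA allele ratio by the genomic DNA allele ratio of the same marker SNP is a common convention in AEI assays but is an assumption here, as are the function names and example counts.

```python
import math


def aei_ratio(cdna_a: int, cdna_b: int, gdna_a=None, gdna_b=None) -> float:
    """Allele A / allele B read ratio at a marker SNP in cDNA. If genomic DNA
    counts are given, divide by the gDNA ratio (assumed normalization step)."""
    ratio = cdna_a / cdna_b
    if gdna_a is not None and gdna_b is not None:
        ratio /= gdna_a / gdna_b
    return ratio


def shows_aei(ratio: float, tolerance: float = 0.2) -> bool:
    """True when the ratio lies outside 1 ± tolerance."""
    return abs(ratio - 1.0) > tolerance


def sorted_log2_ratios(ratios):
    """log2 AEI ratios arranged in increasing order, negative to positive."""
    return sorted(math.log2(r) for r in ratios)


print(shows_aei(aei_ratio(150, 100)))            # expected: True
print(shows_aei(aei_ratio(120, 100, 110, 100)))  # expected: False
print(sorted_log2_ratios([2.0, 0.5, 1.0]))       # expected: [-1.0, 0.0, 1.0]
```

Working on the log2 scale makes a twofold excess of either allele symmetric (+1 or -1), which the linear 1 ± 0.2 window is not.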
87
Human oocytes reprogram somatic cells to a pluripotent state. Nature 2011; 478:70-5. [PMID: 21979046 DOI: 10.1038/nature10397] [Citation(s) in RCA: 140] [Impact Index Per Article: 10.8] [Received: 04/05/2011] [Accepted: 08/01/2011] [Indexed: 01/13/2023]
Abstract
The exchange of the oocyte's genome with the genome of a somatic cell, followed by the derivation of pluripotent stem cells, could enable the generation of specific cells affected in degenerative human diseases. Such cells, carrying the patient's genome, might be useful for cell replacement. Here we report that the development of human oocytes after genome exchange arrests at late cleavage stages in association with transcriptional abnormalities. In contrast, if the oocyte genome is not removed and the somatic cell genome is merely added, the resultant triploid cells develop to the blastocyst stage. Stem cell lines derived from these blastocysts differentiate into cell types of all three germ layers, and a pluripotent gene expression program is established on the genome derived from the somatic cell. This result demonstrates the feasibility of reprogramming human cells using oocytes and identifies removal of the oocyte genome as the primary cause of developmental failure after genome exchange.
88
Skelly DA, Johansson M, Madeoy J, Wakefield J, Akey JM. A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data. Genome Res 2011; 21:1728-37. [PMID: 21873452 PMCID: PMC3202289 DOI: 10.1101/gr.119784.110] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.9] [Received: 12/22/2010] [Accepted: 07/12/2011] [Indexed: 11/24/2022]
Abstract
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes.
Affiliation(s)
- Daniel A. Skelly
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Marnie Johansson
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Jennifer Madeoy
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Jon Wakefield
- Department of Biostatistics and Department of Statistics, University of Washington, Seattle, Washington 98195, USA
- Joshua M. Akey
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
89
Haas J, Katus HA, Meder B. Next-generation sequencing entering the clinical arena. Mol Cell Probes 2011; 25:206-11. [PMID: 21914469 DOI: 10.1016/j.mcp.2011.08.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 08/03/2011] [Revised: 08/29/2011] [Accepted: 08/29/2011] [Indexed: 10/17/2022]
Abstract
Over the last decade, the genetic etiology of many heritable diseases has been resolved. For heart muscle diseases, the so-called cardiomyopathies, mutations in more than 40 different genes have been identified. Because of this large genetic heterogeneity and the lack of adequate gene-diagnostic tools, most patients are not genetically characterized, although such characterization would be important for individualized patient care. Currently, next-generation sequencing technologies are revolutionizing genetic and epigenetic research, since they are capable of producing billions of bases of sequence information in a single experiment. Accordingly, this powerful technology can now also open avenues for genetic diagnostics. The scope of this article is to illustrate technical approaches, clinical applications, and as yet unsolved problems of next-generation sequencing entering the clinical arena.
Affiliation(s)
- Jan Haas
- Department of Internal Medicine III, University of Heidelberg, Im Neuenheimer Feld 350, Heidelberg 69120, Germany
90
Abstract
Current methodologies used to synthesize DNA and RNA are reviewed. These focus on using controlled pore glass and microarrays on glass slides.
91
Tang F, Barbacioru C, Nordman E, Bao S, Lee C, Wang X, Tuch BB, Heard E, Lao K, Surani MA. Deterministic and stochastic allele specific gene expression in single mouse blastomeres. PLoS One 2011; 6:e21208. [PMID: 21731673 PMCID: PMC3121735 DOI: 10.1371/journal.pone.0021208] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.1] [Received: 02/26/2011] [Accepted: 05/23/2011] [Indexed: 01/14/2023]
Abstract
Stochastic and deterministic allele-specific gene expression (ASE) might influence single-cell phenotype, but the extent and nature of the phenomenon at the onset of early mouse development is unknown. Here we performed single-cell RNA-Seq analysis of single blastomeres of mouse embryos, which revealed significant changes in the transcriptome. Importantly, over half of the transcripts with detectable genetic polymorphisms exhibit ASE; most notably, individual blastomeres from the same two-cell embryo show a similar pattern of ASE. However, about 6% of them exhibit stochastic expression, indicated by an altered expression ratio between the two alleles. Thus, we demonstrate that ASE is both deterministic and stochastic in early blastomeres. Furthermore, we found that 1,718 genes express two isoforms with different 3'UTR lengths, with the shorter one on average 5-6 times more abundant in early blastomeres than in epiblast cells, suggesting that microRNA-mediated regulation of gene expression acquires increasing importance as development progresses.
Affiliation(s)
- Fuchou Tang
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge, United Kingdom
- Biodynamic Optical Imaging Center, School of Life Sciences, Peking University, Beijing, China
- Catalin Barbacioru
- Genetic Systems, Applied Biosystems, Life Technologies, Foster City, California, United States of America
- Ellen Nordman
- Genetic Systems, Applied Biosystems, Life Technologies, Foster City, California, United States of America
- Siqin Bao
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge, United Kingdom
- Caroline Lee
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge, United Kingdom
- Xiaohui Wang
- Genetic Systems, Applied Biosystems, Life Technologies, Foster City, California, United States of America
- Brian B. Tuch
- Genetic Systems, Applied Biosystems, Life Technologies, Foster City, California, United States of America
- Edith Heard
- CNRS UMR3215, INSERM U934, Institut Curie, Paris, France
- Kaiqin Lao
- Genetic Systems, Applied Biosystems, Life Technologies, Foster City, California, United States of America
- M. Azim Surani
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge, United Kingdom
92
Schulte JH, Bachmann HS, Brockmeyer B, Depreter K, Oberthür A, Ackermann S, Kahlert Y, Pajtler K, Theissen J, Westermann F, Vandesompele J, Speleman F, Berthold F, Eggert A, Brors B, Hero B, Schramm A, Fischer M. High ALK receptor tyrosine kinase expression supersedes ALK mutation as a determining factor of an unfavorable phenotype in primary neuroblastoma. Clin Cancer Res 2011; 17:5082-92. [PMID: 21632861 DOI: 10.1158/1078-0432.ccr-10-2809] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Indexed: 11/16/2022]
Abstract
PURPOSE Genomic alterations of the anaplastic lymphoma kinase (ALK) gene have been postulated to contribute to neuroblastoma pathogenesis. This study aimed to determine the interrelation of ALK mutations, ALK expression levels, and clinical phenotype in primary neuroblastoma. EXPERIMENTAL DESIGN The genomic ALK status and global gene expression patterns were examined in 263 primary neuroblastomas. Allele-specific ALK expression was determined by cDNA cloning and sequencing. Associations of genomic ALK alterations and ALK expression levels with clinical phenotypes and transcriptomic profiles were compared. RESULTS Nonsynonymous point mutations of ALK were detected in 21 of 263 neuroblastomas (8%). Tumors with ALK mutations exhibited about 2-fold elevated median ALK mRNA levels in comparison with tumors with wild-type (WT) ALK. Unexpectedly, the WT allele was preferentially expressed in 12 of 21 mutated tumors. Whereas survival of patients with ALK mutated tumors was significantly worse as compared with the entire cohort of WT ALK patients, it was similarly poor in patients with WT ALK tumors in which ALK expression was as high as in ALK mutated neuroblastomas. Global gene expression patterns of tumors with ALK mutations or with high-level WT ALK expression were highly similar, and suggested that ALK may be involved in cellular proliferation in primary neuroblastoma. CONCLUSIONS Primary neuroblastomas with mutated ALK exhibit high ALK expression levels and strongly resemble neuroblastomas with elevated WT ALK expression levels in both their clinical and molecular phenotypes. These data suggest that high levels of mutated and WT ALK mediate similar molecular functions that may contribute to a malignant phenotype in primary neuroblastoma.
Affiliation(s)
- Johannes H Schulte
- Department of Pediatric Oncology and Hematology, University Children's Hospital, Heidelberg, Germany
93
Abstract
Exponential advances in the quantitation of DNA variation and epigenetic states seem poised to convert much of biological research into a statistical exercise. But these developments also invite us to reimagine well-worn biological concepts on a grander scale. Somatic mosaicism refers to postzygotic mutations persisting in the individual, occasionally conspicuous to dermatologists as Blaschkoid, checkerboard, phylloid and patchy morphologies. A thoughtful examination of cutaneous mosaicism suggests, however, that virtually all of us may be somatic mosaics. Such genetic variability within individuals might explain localized presentations of disease and implies that some tissues literally evolve throughout life. We discuss here (i) the likely ubiquity of somatic mosaicism, (ii) the broad range of possible biological consequences and (iii) how experimentalists and clinicians may begin establishing genotype-to-phenotype correlates.
Affiliation(s)
- Raymond J Cho
- Department of Dermatology, University of California, San Francisco, CA 94115, USA.
94
Abstract
Dissecting the relationship between genotype and phenotype is one of the central goals in developmental biology and medicine. Transcriptome analysis is a powerful strategy to connect genotype to phenotype of a cell. Here we review the history, progress, potential applications and future developments of single-cell transcriptome analysis. In combination with live cell imaging and lineage tracing, it will be possible to decipher the full gene expression network underlying physiological functions of individual cells in embryos and adults, and to study diseases.
Affiliation(s)
- Fuchou Tang
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
- BIOPIC, School of Life Sciences, Peking University, Beijing, 100871, China
- Kaiqin Lao
- Genetic Systems, Applied Biosystems, part of Life Technologies, 850 Lincoln Centre Drive, Foster City, CA 94404, USA
- M. Azim Surani
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
95
Tansey KE, Hill MJ, Cochrane LE, Gill M, Anney RJ, Gallagher L. Functionality of promoter microsatellites of arginine vasopressin receptor 1A (AVPR1A): implications for autism. Mol Autism 2011; 2:3. [PMID: 21453499 PMCID: PMC3080300 DOI: 10.1186/2040-2392-2-3] [Received: 09/07/2010] [Accepted: 03/31/2011]
Abstract
BACKGROUND Arginine vasopressin (AVP) has been hypothesized to play a role in aetiology of autism based on a demonstrated involvement in the regulation of social behaviours. The arginine vasopressin receptor 1A gene (AVPR1A) is widely expressed in the brain and is considered to be a key receptor for regulation of social behaviour. Moreover, genetic variation at AVPR1A has been reported to be associated with autism. Evidence from non-human mammals implicates variation in the 5'-flanking region of AVPR1A in variable gene expression and social behaviour. METHODS We examined four tagging single nucleotide polymorphisms (SNPs) (rs3803107, rs1042615, rs3741865, rs11174815) and three microsatellites (RS3, RS1 and AVR) at the AVPR1A gene for association in an autism cohort from Ireland. Two 5'-flanking region polymorphisms in the human AVPR1A, RS3 and RS1, were also tested for their effect on relative promoter activity. RESULTS The short alleles of RS1 and the SNP rs11174815 show weak association with autism in the Irish population (P = 0.036 and P = 0.008, respectively). Both RS1 and RS3 showed differences in relative promoter activity by length. Shorter repeat alleles of RS1 and RS3 decreased relative promoter activity in the human neuroblastoma cell line SH-SY5Y. CONCLUSIONS These aligning results can be interpreted as a functional route for this association, namely that shorter alleles of RS1 lead to decreased AVPR1A transcription, which may proffer increased susceptibility to the autism phenotype.
Affiliation(s)
- Katherine E Tansey
- Neuropsychiatric Genetics Research Group, Department of Psychiatry, Institute of Molecular Medicine, Trinity College Dublin, Dublin, Ireland.
96
Somatic coding mutations in human induced pluripotent stem cells. Nature 2011; 471:63-7. [PMID: 21368825 PMCID: PMC3074107 DOI: 10.1038/nature09805] [Received: 04/29/2010] [Accepted: 01/12/2011]
Abstract
Defined transcription factors can induce epigenetic reprogramming of adult mammalian cells into induced pluripotent stem cells. Although DNA factors are integrated during some reprogramming methods, it is unknown whether the genome remains unchanged at the single nucleotide level. Here we show that 22 human induced pluripotent stem (hiPS) cell lines reprogrammed using five different methods each contained an average of five protein-coding point mutations in the regions sampled (an estimated six protein coding point mutations per exome). The majority of these mutations were non-synonymous, nonsense, or splice variants, and were enriched in genes mutated or having causative effects in cancers. At least half of these reprogramming-associated mutations pre-existed in fibroblast progenitors at low frequencies, while the rest were newly occurring during or after reprogramming. Thus, hiPS cells acquire genetic modifications in addition to epigenetic modifications. Extensive genetic screening should become a standard procedure to ensure hiPS safety before clinical use.
97
Zhu H, Lensch MW, Cahan P, Daley GQ. Investigating monogenic and complex diseases with pluripotent stem cells. Nat Rev Genet 2011; 12:266-75. [PMID: 21386866 DOI: 10.1038/nrg2951]
Abstract
Human genetic studies have revealed the molecular basis of countless monogenic diseases but have been less successful in associating phenotype to genotype in complex multigenic conditions. Pluripotent stem cells (PSCs), which can differentiate into any cell type, offer promise for defining the functional effects of genetic variation. Here, we recount the advantages and practical limitations of coupling PSCs to genome-wide analyses to probe complex genetics and discuss the ability to investigate epigenetic contributions to disease states. We also describe new ways of using mice and mouse embryonic stem cells (ESCs) in tandem with human stem cells to further define genotype-phenotype relationships.
Affiliation(s)
- Hao Zhu
- Division of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
98
Kiialainen A, Karlberg O, Ahlford A, Sigurdsson S, Lindblad-Toh K, Syvänen AC. Performance of microarray and liquid based capture methods for target enrichment for massively parallel sequencing and SNP discovery. PLoS One 2011; 6:e16486. [PMID: 21347407 PMCID: PMC3036585 DOI: 10.1371/journal.pone.0016486] [Received: 09/28/2010] [Accepted: 12/21/2010]
Abstract
Targeted sequencing is a cost-efficient way to obtain answers to biological questions in many projects, but the choice of the enrichment method to use can be difficult. In this study we compared two hybridization methods for target enrichment for massively parallel sequencing and single nucleotide polymorphism (SNP) discovery, namely Nimblegen sequence capture arrays and the SureSelect liquid-based hybrid capture system. We prepared sequencing libraries from three HapMap samples using both methods, sequenced the libraries on the Illumina Genome Analyzer, mapped the sequencing reads back to the genome, and called variants in the sequences. 74-75% of the sequence reads originated from the targeted region in the SureSelect libraries and 41-67% in the Nimblegen libraries. We could sequence up to 99.9% and 99.5% of the regions targeted by capture probes from the SureSelect libraries and from the Nimblegen libraries, respectively. The Nimblegen probes covered 0.6 Mb more of the original 3.1 Mb target region than the SureSelect probes. In each sample, we called more SNPs and detected more novel SNPs from the libraries that were prepared using the Nimblegen method. Thus the Nimblegen method gave better results when judged by the number of SNPs called, but this came at the cost of more over-sampling.
Affiliation(s)
- Anna Kiialainen
- Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
99
Abstract
The development of next-generation sequencing technologies has enabled the transcriptome to be measured and characterized at a level which was previously unattainable. Shot gun sequencing of RNAs, or RNA-Seq as it is known, is providing the means to simultaneously survey locus activity, transcript-specific expression, sequence content of transcripts and transcriptome discovery. This article discusses the current state of RNA-Seq, its potential for redefining transcriptomics and some of the challenges associated with this revolutionary technology.
Affiliation(s)
- Karin S Kassahn
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, Brisbane, Australia
100
Abstract
In the few years since its initial application, massively parallel cDNA sequencing, or RNA-seq, has allowed many advances in the characterization and quantification of transcriptomes. Recently, several developments in RNA-seq methods have provided an even more complete characterization of RNA transcripts. These developments include improvements in transcription start site mapping, strand-specific measurements, gene fusion detection, small RNA characterization and detection of alternative splicing events. Ongoing developments promise further advances in the application of RNA-seq, particularly direct RNA sequencing and approaches that allow RNA quantification from very small amounts of cellular materials.
Affiliation(s)
- Fatih Ozsolak
- Helicos BioSciences Corporation, One Kendall Square, Cambridge, Massachusetts 02139, USA.