26. Molania R, Gagnon-Bartsch JA, Dobrovic A, Speed TP. A new normalization for Nanostring nCounter gene expression data. Nucleic Acids Res 2019; 47:6073-6083. [PMID: 31114909 PMCID: PMC6614807 DOI: 10.1093/nar/gkz433] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 09/23/2018] [Revised: 04/25/2019] [Accepted: 05/07/2019] [Indexed: 12/18/2022]
Abstract
The Nanostring nCounter gene expression assay uses molecular barcodes and single molecule imaging to detect and count hundreds of unique transcripts in a single reaction. These counts need to be normalized to adjust for the amount of sample, variations in assay efficiency and other factors. Most users adopt the normalization approach described in the nSolver analysis software, which involves background correction based on the observed values of negative control probes, a within-sample normalization using the observed values of positive control probes and normalization across samples using reference (housekeeping) genes. Here we present a new normalization method, Removing Unwanted Variation-III (RUV-III), which makes vital use of technical replicates and suitable control genes. We also propose an approach using pseudo-replicates when technical replicates are not available. The effectiveness of RUV-III is illustrated on four different datasets. We also offer suggestions on the design and analysis of studies involving this technology.
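The core RUV-III step can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code. `Y` is a samples × genes matrix of log-scale counts. `M` is a samples × replicate-sets indicator matrix. `ctl` is a boolean mask of negative-control genes. `k` is the number of unwanted factors. The function name `ruv_iii` is hypothetical. For real analyses, the `RUVIII` function in the R package `ruv` is the established implementation.

```python
import numpy as np


def ruv_iii(Y: np.ndarray, M: np.ndarray, ctl: np.ndarray, k: int) -> np.ndarray:
    """Illustrative RUV-III sketch (hypothetical helper name).

    Y   : log-scale expression, rows = samples, columns = genes
    M   : replicate design, M[i, j] = 1 if sample i is in replicate set j
    ctl : boolean mask over genes marking negative-control genes
    k   : number of unwanted factors (an analyst's choice)
    """
    if k == 0:
        return Y.copy()
    m = Y.shape[0]
    # Residuals after removing replicate-set means: whatever is left varies
    # between technical replicates, so it is treated as unwanted variation.
    hat = M @ np.linalg.solve(M.T @ M, M.T)
    Y0 = Y - hat @ Y
    # Leading left singular vectors of the residuals span the unwanted factors.
    k = min(k, m - M.shape[1], int(ctl.sum()))
    u, _, _ = np.linalg.svd(Y0 @ Y0.T)
    alpha = u[:, :k].T @ Y                                # k x genes loadings
    # Estimate per-sample factors W by regressing control genes on alpha.
    ac = alpha[:, ctl]
    W = np.linalg.solve(ac @ ac.T, ac @ Y[:, ctl].T).T    # samples x k
    return Y - W @ alpha


# Noise-free toy data: 12 samples in 4 replicate sets, 50 genes, 20 controls.
rng = np.random.default_rng(0)
M = np.eye(4)[np.repeat(np.arange(4), 3)]
beta = rng.normal(size=(4, 50))              # biology, shared within a set
ctl = np.zeros(50, dtype=bool)
ctl[:20] = True
beta[:, ctl] = 0.0                           # control genes carry no biology
W_true = rng.normal(size=(12, 2))
alpha_true = rng.normal(size=(2, 50))
Y = M @ beta + W_true @ alpha_true
print(np.allclose(ruv_iii(Y, M, ctl, k=2), M @ beta))    # True by construction
```

The toy data contain no measurement noise and exactly two unwanted factors. The sketch therefore recovers the replicate-set signal exactly. Real counts are noisy, so choosing `k` and the control genes is where the practical judgement lies.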
27. Haupt S, Caramia F, Herschtal A, Soussi T, Lozano G, Chen H, Liang H, Speed TP, Haupt Y. Identification of cancer sex-disparity in the functional integrity of p53 and its X chromosome network. Nat Commun 2019; 10:5385. [PMID: 31772231 PMCID: PMC6879765 DOI: 10.1038/s41467-019-13266-3] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 08/09/2018] [Accepted: 10/31/2019] [Indexed: 12/12/2022]
Abstract
The disproportionately high prevalence of male cancer is poorly understood. We tested for sex-disparity in the functional integrity of the major tumor suppressor p53 in sporadic cancers. Our bioinformatics analyses expose three novel levels of p53 impact on sex-disparity in 12 non-reproductive cancer types. First, TP53 mutation is more frequent in these cancers among US males than among females, and its mutation correlates with the poorest survival. Second, numerous X-linked genes are associated with p53, including vital genomic regulators. Males are at unique risk from alterations of their single copies of these genes. High expression of X-linked negative regulators of p53 in wild-type TP53 cancers corresponds with reduced survival. Third, females exhibit an exceptional incidence of non-expressed mutations among p53-associated X-linked genes. Our data indicate that poor survival in males is contributed by high frequencies of TP53 mutations and an inability to shield against deregulated X-linked genes that engage in p53 networks.
28. Gigante S, Gouil Q, Lucattini A, Keniry A, Beck T, Tinning M, Gordon L, Woodruff C, Speed TP, Blewitt ME, Ritchie ME. Using long-read sequencing to detect imprinted DNA methylation. Nucleic Acids Res 2019; 47:e46. [PMID: 30793194 PMCID: PMC6486641 DOI: 10.1093/nar/gkz107] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 10/21/2018] [Revised: 01/14/2019] [Accepted: 02/08/2019] [Indexed: 02/01/2023]
Abstract
Systematic variation in the methylation of cytosines at CpG sites plays a critical role in early development of humans and other mammals. Of particular interest are regions of differential methylation between parental alleles, as these often dictate monoallelic gene expression, resulting in parent of origin specific control of the embryonic transcriptome and subsequent development, in a phenomenon known as genomic imprinting. Using long-read nanopore sequencing we show that, with an average genomic coverage of ∼10, it is possible to determine both the level of methylation of CpG sites and the haplotype from which each read arises. The long-read property is exploited to characterize, using novel methods, both methylation and haplotype for reads that have reduced basecalling precision compared to Sanger sequencing. We validate the analysis both through comparison of nanopore-derived methylation patterns with those from Reduced Representation Bisulfite Sequencing data and through comparison with previously reported data. Our analysis successfully identifies known imprinting control regions (ICRs) as well as some novel differentially methylated regions which, due to their proximity to hitherto unknown monoallelically expressed genes, may represent new ICRs.
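Suppose each read has already been assigned to a parental haplotype, for example through the heterozygous SNPs it overlaps. Suppose also that each CpG on the read carries a methylated/unmethylated call. Per-haplotype methylation levels then reduce to counting. The sketch below assumes both upstream steps are done. Its input format and function name are hypothetical, and it does not reproduce the paper's methods for handling noisy basecalls.

```python
from collections import defaultdict


def methylation_by_haplotype(reads):
    """Illustrative sketch (hypothetical helper and input format).

    reads: iterable of (haplotype, {cpg_position: is_methylated}) tuples.
    Returns {(haplotype, cpg_position): fraction of reads methylated}.
    """
    counts = defaultdict(lambda: [0, 0])   # key -> [methylated reads, total reads]
    for hap, calls in reads:
        for pos, methylated in calls.items():
            tally = counts[(hap, pos)]
            tally[0] += int(methylated)
            tally[1] += 1
    return {key: meth / total for key, (meth, total) in counts.items()}


reads = [
    ("maternal", {100: True, 130: True}),
    ("maternal", {100: True}),
    ("paternal", {100: False, 130: False}),
    ("paternal", {100: True}),
]
levels = methylation_by_haplotype(reads)
print(levels)
```

Imprinting control regions show allele-specific patterns: a CpG that sits near 1.0 on one haplotype and near 0.0 on the other.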
29. Peters TJ, French HJ, Bradford ST, Pidsley R, Stirzaker C, Varinli H, Nair S, Qu W, Song J, Giles KA, Statham AL, Speirs H, Speed TP, Clark SJ. Evaluation of cross-platform and interlaboratory concordance via consensus modelling of genomic measurements. Bioinformatics 2019; 35:560-570. [PMID: 30084929 PMCID: PMC6378945 DOI: 10.1093/bioinformatics/bty675] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/02/2018] [Revised: 07/10/2018] [Accepted: 07/31/2018] [Indexed: 01/23/2023]
Abstract
Motivation A synoptic view of the human genome benefits chiefly from the application of nucleic acid sequencing and microarray technologies. These platforms allow interrogation of patterns such as gene expression and DNA methylation at the vast majority of canonical loci, allowing granular insights and opportunities for validation of original findings. However, problems arise when validating against a “gold standard” measurement, since this immediately biases all subsequent measurements towards that particular technology or protocol. Since all genomic measurements are estimates, in the absence of a “gold standard” we instead empirically assess the measurement precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. Both cross-platform and cross-locus comparisons can be made across all common loci, allowing identification of technology- and locus-specific tendencies. Results We assess technologies including the Infinium MethylationEPIC BeadChip, whole genome bisulfite sequencing (WGBS), two different RNA-Seq protocols (PolyA+ and Ribo-Zero) and five different gene expression array platforms. Each technology is thus characterised relative to the consensus. We showcase a number of applications of the row-linear model, including correlation with known interfering traits. We demonstrate a clear effect of cross-hybridisation on the sensitivity of Infinium methylation arrays. Additionally, we perform a true interlaboratory test on a set of samples interrogated on the same platform across twenty-one separate testing laboratories.
Availability and implementation A full implementation of the row-linear model, plus extra functions for visualisation, is available in the R package consensus at https://github.com/timpeters82/consensus. Supplementary information Supplementary data are available at Bioinformatics online.
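The row-linear idea, usually attributed to Mandel, works as follows. Each platform's measurements are regressed on the per-locus consensus, meaning the mean across platforms. The slope reflects the platform's sensitivity relative to the consensus, and the residual SD reflects its (lack of) precision. The sketch below is plain least squares on a complete platforms × loci matrix. It is a hypothetical helper, not the API of the consensus package.

```python
import numpy as np


def row_linear_fit(X: np.ndarray):
    """Illustrative row-linear sketch (hypothetical helper).

    X : platforms x loci matrix of measurements on a common scale.
    Returns one (intercept, slope, residual SD) tuple per platform.
    """
    consensus = X.mean(axis=0)                 # per-locus consensus
    c = consensus - consensus.mean()
    fits = []
    for row in X:
        slope = np.dot(c, row - row.mean()) / np.dot(c, c)   # sensitivity
        intercept = row.mean() - slope * consensus.mean()
        resid = row - (intercept + slope * consensus)
        sd = np.sqrt(np.sum(resid ** 2) / (len(row) - 2))    # imprecision
        fits.append((intercept, slope, sd))
    return fits


# Three hypothetical platforms measuring the same 20 loci with different
# offsets and sensitivities but no noise.
truth = np.linspace(0.0, 10.0, 20)
a = np.array([1.0, 0.0, -1.0])
b = np.array([0.5, 1.0, 1.5])
X = a[:, None] + b[:, None] * truth
fits = row_linear_fit(X)
```

The true sensitivities average to 1, so the consensus here equals the underlying truth. Each fitted slope then matches the corresponding value in `b`, and each residual SD is essentially zero.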
30. Kim ML, Martin WJ, Minigo G, Keeble JL, Garnham AL, Pacini G, Smyth GK, Speed TP, Carapetis J, Wicks IP. Dysregulated IL-1β-GM-CSF Axis in Acute Rheumatic Fever That Is Limited by Hydroxychloroquine. Circulation 2018; 138:2648-2661. [PMID: 30571257 DOI: 10.1161/circulationaha.118.033891] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 11/16/2022]
Abstract
BACKGROUND Acute rheumatic fever (ARF) and rheumatic heart disease are autoimmune consequences of group A streptococcus infection and remain major causes of cardiovascular morbidity and mortality around the world. Improved treatment has been stymied by gaps in understanding key steps in the immunopathogenesis of ARF and rheumatic heart disease. This study aimed to identify (1) effector T cell cytokine(s) that might be dysregulated in the autoimmune response of patients with ARF to group A streptococcus, and (2) an immunomodulatory agent that suppresses this response and could be clinically translatable to high-risk patients with ARF. METHODS The immune response to group A streptococcus was analyzed in peripheral blood mononuclear cells from an Australian Aboriginal ARF cohort by a combination of multiplex cytokine array, flow cytometric analysis, and global gene expression analysis by RNA sequencing. The immunomodulatory drug hydroxychloroquine was tested for effects on this response. RESULTS We found a dysregulated interleukin-1β-granulocyte-macrophage colony-stimulating factor (GM-CSF) cytokine axis in ARF peripheral blood mononuclear cells exposed to group A streptococcus in vitro, whereby persistent interleukin-1β production is coupled to overproduction of GM-CSF and selective expansion of CXCR3+CCR4-CCR6- CD4 T cells. CXCR3+CCR4-CCR6- CD4 T cells are the major source of GM-CSF in human CD4 T cells, and CXCL10, a CXCR3 ligand and potent T helper 1 chemoattractant, was elevated in sera from patients with ARF. GM-CSF has recently emerged as a key T cell-derived effector cytokine in numerous autoimmune diseases, including myocarditis, and the production of CXCL10 may explain selective trafficking of these cells to the heart. We provide evidence that interleukin-1β amplifies the expansion of GM-CSF-expressing CD4 T cells, which is effectively suppressed by hydroxychloroquine.
RNA sequencing showed shifts in gene expression profiles and differentially expressed genes in peripheral blood mononuclear cells derived from patients at different clinical stages of ARF. CONCLUSIONS Given the safety profile of hydroxychloroquine and its clinical pedigree in treating autoimmune diseases such as rheumatoid arthritis, where GM-CSF plays a pivotal role, we propose that hydroxychloroquine could be repurposed to reduce the risk of rheumatic heart disease after ARF.
31. Savas P, Virassamy B, Ye C, Salim A, Mintoff CP, Caramia F, Salgado R, Teo ZL, Dushyanthen S, Byrne A, Luen SJ, Fox SB, Speed TP, Mackay LK, Neeson PJ, Loi S. Abstract PD5-03: Characterization of high TIL breast cancers reveals a prognostic and functionally distinct tissue-resident memory subpopulation. Cancer Res 2019. [DOI: 10.1158/1538-7445.sabcs18-pd5-03] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background: Tumor infiltrating lymphocytes (TILs) assessed via light microscopy are prognostic and predictive in the early stage and advanced triple negative and HER2-amplified breast cancer (BC). Higher TILs can also identify patients more likely to benefit from anti-PD-1 therapy. In this study we interrogated T cell subsets that comprise high TILs to determine if distinct subpopulations are key mediators of anti-tumor immunity.
Methods: We characterised TILs with a focus on CD3+ T cells in 129 primary and metastatic BC samples using flow cytometry, bulk RNASeq on flow-sorted T cell populations, multiplex immunohistochemistry and microdroplet-based single cell 3' mRNA sequencing on the 10X Genomics Chromium platform. Cell type specific gene expression signatures were determined from differential expression between putative T cell subpopulations. These signatures were investigated in clinical cohorts, including trial cohorts treated with pembrolizumab.
Results: High-TIL infiltrates consisted primarily of CD3+ T cells, with both CD8 and CD4 populations. Unsupervised clustering of single cell sequencing identified 9 CD8 and CD4 subpopulations with distinct gene expression profiles. In addition to Tregs and CD8 effector memory (TEM) T cells, we found a CD8+ tissue resident memory (TRM) population expressing greater levels of T-cell checkpoints and cytotoxic markers compared to effector memory cells. In 2 primary tumours and 1 liver metastasis, bulk RNASeq of flow-sorted TEM and TRM corroborated the single cell mRNASeq results. T cell receptor (TCR) profiling in the 3 samples found non-overlapping repertoires in the 2 primary tumours but overlap in one metastatic lesion, suggesting divergent developmental origins in the breast but the potential for nascent TRM differentiation in a metastatic niche. Clustering of these TCRs suggested differing antigen specificities between TRM and non-TRM CD8 T cells. Using Metabric data, the CD8 TRM gene expression signature was prognostic for disease free survival (DFS) in primary TNBCs (n = 329, log-rank p = 0.003), and was able to further stratify cases with high and low CD8A expression for DFS (log-rank p = 0.03). The CD8 TRM signature was enriched in baseline tumour samples of responders (n = 9) compared with non-responders (n = 36) in 45 patients with metastatic melanoma treated with T cell checkpoint blockade (p < 0.0001). Additional single cell sequencing data with TCR sequencing will be combined with these initial results, and an independent data set of single cell mRNASeq and TCR Seq on CD3+ BC TILs will be used to confirm our findings. Cell type specific signatures will be explored in additional clinical cohorts including KEYNOTE-086, and presented at the meeting.
Conclusion: Using single cell profiling of the immune microenvironment in BC we demonstrate that high TIL BCs contain multiple T cell subpopulations with different functional and prognostic significance. Our approach identified a CD8 TRM population with a distinct gene expression profile and strong expression of key immune checkpoints likely representing the presence of true tumor specific immunity. This population may be a key target of immune checkpoint blockade.
Citation Format: Savas P, Virassamy B, Ye C, Salim A, Mintoff CP, Caramia F, Salgado R, Teo ZL, Dushyanthen S, Byrne A, Luen SJ, Fox SB, Speed TP, Mackay LK, Neeson PJ, Loi S. Characterization of high TIL breast cancers reveals a prognostic and functionally distinct tissue-resident memory subpopulation [abstract]. In: Proceedings of the 2018 San Antonio Breast Cancer Symposium; 2018 Dec 4-8; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2019;79(4 Suppl):Abstract nr PD5-03.
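The log-rank p-values above compare disease-free survival between signature-defined groups. Below is a compact, standard-library sketch of the standard two-group log-rank test. Events are coded 1 and censoring 0, and tied event times use the usual hypergeometric variance. The helper name is hypothetical, and the abstract does not say which software the authors used.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Illustrative two-group log-rank test (hypothetical helper).

    times_*  : follow-up times
    events_* : 1 if the event (e.g. relapse) was observed, 0 if censored
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0        # hypergeometric variance, summed over event times
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy data: group A relapses early, group B late.
stat, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
```

In this toy example every event in group A precedes every event in group B, so the statistic is large and the p-value is well below 0.01. Two groups with identical follow-up give a statistic of 0 and a p-value of 1.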
32. Hicks DG, Speed TP, Yassin M, Russell SM. Maps of variability in cell lineage trees. PLoS Comput Biol 2019; 15:e1006745. [PMID: 30753182 PMCID: PMC6388934 DOI: 10.1371/journal.pcbi.1006745] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/17/2018] [Revised: 02/25/2019] [Accepted: 01/02/2019] [Indexed: 11/19/2022]
Abstract
New approaches to lineage tracking have allowed the study of differentiation in multicellular organisms over many generations of cells. Understanding the phenotypic variability observed in these lineage trees requires new statistical methods. Whereas an invariant cell lineage, such as that for the nematode Caenorhabditis elegans, can be described by a lineage map, defined as the pattern of phenotypes overlaid onto the binary tree, a traditional lineage map is static and does not describe the variability inherent in the cell lineages of higher organisms. Here, we introduce lineage variability maps which describe the pattern of second-order variation in lineage trees. These maps can be undirected graphs of the partial correlations between every lineal position, or directed graphs showing the dynamics of bifurcated patterns in each subtree. We show how to infer these graphical models for lineages of any depth from sample sizes of only a few pedigrees. This required developing the generalized spectral analysis for a binary tree, the natural framework for describing tree-structured variation. When tested on pedigrees from C. elegans expressing a marker for pharyngeal differentiation potential, the variability maps recover essential features of the known lineage map. When applied to highly-variable pedigrees monitoring cell size in T lymphocytes, the maps show that most of the phenotype is set by the founder naive T cell. Lineage variability maps thus elevate the concept of the lineage map to the population level, addressing questions about the potency and dynamics of cell lineages and providing a way to quantify the progressive restriction of cell fate with increasing depth in the tree.
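An undirected variability map is a graph of partial correlations between lineal positions. The textbook way to obtain partial correlations is to invert a covariance matrix, as sketched below. This is plain precision-matrix inversion, which needs many more pedigrees than positions. It is not the authors' generalized spectral analysis for binary trees, which is what makes small samples workable. The helper name and the three-position example are hypothetical.

```python
import numpy as np


def partial_correlations(cov: np.ndarray) -> np.ndarray:
    """Illustrative sketch (hypothetical helper): partial correlation of
    every pair of variables given all the others, via the precision matrix."""
    prec = np.linalg.inv(cov)
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Founder -> daughter -> granddaughter, with a phenotype passed on down the
# lineage with correlation r at each division (a Markov chain).
r = 0.5
cov = np.array([[1.0, r, r ** 2],
                [r, 1.0, r],
                [r ** 2, r, 1.0]])
pcor = partial_correlations(cov)
```

For this chain the founder and granddaughter are conditionally independent given the daughter. Their partial correlation is therefore zero, so the map has no direct edge between them, even though their ordinary correlation is 0.25.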
33. Colborn KL, Mueller I, Speed TP. Joint Modeling of Mixed Plasmodium Species Infections Using a Bivariate Poisson Lognormal Model. Am J Trop Med Hyg 2018; 98:71-76. [PMID: 29182143 DOI: 10.4269/ajtmh.17-0523] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
Infectious diseases often present as coinfections in which the pathogens may affect each other in positive or negative ways. Understanding the relationship between two coinfecting pathogens is thus important for understanding the risk of infection and the burden of disease caused by each pathogen. Although coinfections with Plasmodium falciparum and Plasmodium vivax are very common outside Africa, it remains unclear whether infections by the two parasite species are positively associated or whether infection by one parasite suppresses the other. In this study, we use bivariate Poisson lognormal models (BPLM) to estimate covariate-adjusted associations between the incidence of infections (as measured by the molecular force of blood-stage infection, molFOI) and clinical episodes caused by both P. falciparum and P. vivax in a cohort of Papua New Guinean children. A BPLM permits estimation of either positive or negative correlation, unlike most other multivariate Poisson models. Our results demonstrated a moderately positive association between P. falciparum and P. vivax infection rates, arguing against the hypothesis that P. vivax infections protect against P. falciparum infections. Our findings also suggest that the BPLM is only useful for counts with suitably large means and overdispersion.
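In a BPLM the correlation lives on the latent log-rate scale, so it can be negative as well as positive. A small simulation shows both signs. It also shows why small means make the association hard to detect: Poisson noise swamps the latent correlation. The helper names and parameter values below are hypothetical, chosen only for illustration.

```python
import numpy as np


def simulate_bplm(n, mu, sigma, rho, seed=0):
    """Illustrative sketch (hypothetical helper): draw n pairs of counts
    from a bivariate Poisson lognormal model.

    Latent log-rates ~ bivariate normal(mu, Sigma) with correlation rho;
    given the rates, the two counts are independent Poisson draws.
    """
    rng = np.random.default_rng(seed)
    s1, s2 = sigma
    cov = np.array([[s1 ** 2, rho * s1 * s2],
                    [rho * s1 * s2, s2 ** 2]])
    log_rates = rng.multivariate_normal(mu, cov, size=n)
    return rng.poisson(np.exp(log_rates))   # n x 2 array of counts


def count_correlation(counts):
    return np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]


big = [np.log(5), np.log(5)]        # mean counts of roughly 5 to 7
small = [np.log(0.05), np.log(0.05)]  # mean counts well below 1
for label, mu, rho in [("large means, rho=+0.8", big, 0.8),
                       ("large means, rho=-0.8", big, -0.8),
                       ("small means, rho=+0.8", small, 0.8)]:
    counts = simulate_bplm(20000, mu, (0.8, 0.8), rho)
    print(label, round(count_correlation(counts), 3))
```

With large means the observed count correlation is clearly positive or clearly negative, following the sign of `rho`. With means well below 1 it is close to zero, even though the latent correlation is 0.8.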
34. Savas P, Virassamy B, Ye C, Salim A, Mintoff CP, Caramia F, Salgado R, Byrne DJ, Teo ZL, Dushyanthen S, Byrne A, Wein L, Luen SJ, Poliness C, Nightingale SS, Skandarajah AS, Gyorki DE, Thornton CM, Beavis PA, Fox SB, Darcy PK, Speed TP, Mackay LK, Neeson PJ, Loi S. Publisher Correction: Single-cell profiling of breast cancer T cells reveals a tissue-resident memory subset associated with improved prognosis. Nat Med 2018; 24:1941. [PMID: 30135555 DOI: 10.1038/s41591-018-0176-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
In the version of this article originally published, the institution in affiliation 10 was missing. Affiliation 10 was originally listed as Department of Surgery, Royal Melbourne Hospital and Royal Women's Hospital, Melbourne, Victoria, Australia. It should have been Department of Surgery, Royal Melbourne Hospital and Royal Women's Hospital, University of Melbourne, Melbourne, Victoria, Australia. The error has been corrected in the HTML and PDF versions of this article.
35. Jacob L, Speed TP. The healthy ageing gene expression signature for Alzheimer's disease diagnosis: a random sampling perspective. Genome Biol 2018; 19:97. [PMID: 30045771 PMCID: PMC6060554 DOI: 10.1186/s13059-018-1481-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/06/2016] [Accepted: 07/06/2018] [Indexed: 11/23/2022]
Abstract
In a recent publication, Sood et al. (Genome Biol 16:185, 2015) presented a set of 150 probe sets that could be used in the diagnosis of Alzheimer’s disease (AD) based on gene expression. We reproduce some of their experiments and show that their signature is indeed able to discriminate between AD and control patients using blood gene expression in two cohorts. We also show that its performance does not stand out compared to randomly sampled sets of 150 probe sets from the same array.
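Comparing a signature against randomly sampled probe sets of the same size amounts to building an empirical null distribution. The sketch below assumes a user-supplied `score_fn`, which is hypothetical: for example, the cross-validated accuracy of a classifier trained on the probe set. The +1 correction keeps the empirical p-value away from exactly zero.

```python
import random


def random_set_pvalue(score_fn, signature, universe, n_draws=1000, seed=0):
    """Illustrative sketch (hypothetical helper).

    Returns (empirical p-value, observed score): how often a random probe
    set of the same size scores at least as well as the signature does.
    """
    rng = random.Random(seed)
    observed = score_fn(signature)
    k = len(signature)
    hits = sum(score_fn(rng.sample(universe, k)) >= observed
               for _ in range(n_draws))
    return (hits + 1) / (n_draws + 1), observed


# Toy universe of 1,000 probes in which only probes 0-9 are informative.
universe = list(range(1000))
p_sig, obs = random_set_pvalue(lambda s: sum(x < 10 for x in s),
                               list(range(10)), universe)
```

A signature that "does not stand out" is one whose score is matched by many random sets, which gives a large p-value. If every set scores the same, the p-value is exactly 1.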
36. Savas P, Virassamy B, Ye C, Salim A, Mintoff CP, Caramia F, Salgado R, Byrne DJ, Teo ZL, Dushyanthen S, Byrne A, Wein L, Luen SJ, Poliness C, Nightingale SS, Skandarajah AS, Gyorki DE, Thornton CM, Beavis PA, Fox SB, Darcy PK, Speed TP, Mackay LK, Neeson PJ, Loi S. Single-cell profiling of breast cancer T cells reveals a tissue-resident memory subset associated with improved prognosis. Nat Med 2018; 24:986-993. [PMID: 29942092 DOI: 10.1038/s41591-018-0078-7] [Citation(s) in RCA: 629] [Impact Index Per Article: 104.8] [Received: 11/28/2017] [Accepted: 04/25/2018] [Indexed: 12/18/2022]
Abstract
The quantity of tumor-infiltrating lymphocytes (TILs) in breast cancer (BC) is a robust prognostic factor for improved patient survival, particularly in triple-negative and HER2-overexpressing BC subtypes [1]. Although T cells are the predominant TIL population [2], the relationship between quantitative and qualitative differences in T cell subpopulations and patient prognosis remains unknown. We performed single-cell RNA sequencing (scRNA-seq) of 6,311 T cells isolated from human BCs and show that significant heterogeneity exists in the infiltrating T cell population. We demonstrate that BCs with a high number of TILs contained CD8+ T cells with features of tissue-resident memory T (TRM) cell differentiation and that these CD8+ TRM cells expressed high levels of immune checkpoint molecules and effector proteins. A CD8+ TRM gene signature developed from the scRNA-seq data was significantly associated with improved patient survival in early-stage triple-negative breast cancer (TNBC) and provided better prognostication than CD8 expression alone. Our data suggest that CD8+ TRM cells contribute to BC immunosurveillance and are the key targets of modulation by immune checkpoint inhibition. Further understanding of the development, maintenance and regulation of TRM cells will be crucial for successful immunotherapeutic development in BC.
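A gene signature such as the CD8+ TRM signature must be turned into one number per patient before it can be related to survival. A common, generic choice is the mean of gene-wise z-scores over the signature genes. The abstract does not say how the signature was scored, so the scheme and helper name below are assumptions for illustration only.

```python
import numpy as np


def signature_score(expr: np.ndarray, genes: list, signature: set) -> np.ndarray:
    """Illustrative sketch (hypothetical helper and scoring scheme).

    expr      : genes x samples matrix (e.g. log-scale expression)
    genes     : gene names matching the rows of expr
    signature : set of signature gene names
    Returns one score per sample: the mean z-score over signature genes.
    """
    idx = [i for i, g in enumerate(genes) if g in signature]
    sub = expr[idx, :]
    z = (sub - sub.mean(axis=1, keepdims=True)) / sub.std(axis=1, ddof=1, keepdims=True)
    return z.mean(axis=0)


expr = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 6.0],
                 [10.0, 10.0, 11.0]])
scores = signature_score(expr, ["A", "B", "C"], {"A", "B"})
```

Z-scoring each gene first stops highly expressed genes from dominating the average. Patients could then be split at a score threshold, such as the median, for a survival comparison.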
37. Rubin AF, Gelman H, Lucas N, Bajjalieh SM, Papenfuss AT, Speed TP, Fowler DM. Correction to: A statistical framework for analyzing deep mutational scanning data. Genome Biol 2018; 19:17. [PMID: 29415752 PMCID: PMC5803959 DOI: 10.1186/s13059-018-1391-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
CORRECTION After publication of our article [1] it was brought to our attention that a line of code was missing from our program to combine the within-replicate variance and between-replicate variance. This led to an overestimation of the standard errors calculated using the Enrich2 random-effects model.
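The correction concerns how within-replicate and between-replicate variance are combined. To illustrate that idea only, the sketch uses the widely used DerSimonian-Laird random-effects estimator. Enrich2's own random-effects model may estimate the between-replicate variance differently, so this is a stand-in, not the corrected Enrich2 code. The helper name is hypothetical.

```python
import math


def combine_replicates(scores, variances):
    """Illustrative DerSimonian-Laird sketch (hypothetical helper).

    scores    : per-replicate scores for one variant
    variances : per-replicate (within-replicate) variances
    Returns (combined score, standard error, between-replicate variance tau^2).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * s for wi, s in zip(w, scores)) / sum(w)
    q = sum(wi * (s - fixed) ** 2 for wi, s in zip(w, scores))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(scores) - 1)) / c) if c > 0 else 0.0
    # Both components enter the weights: within (v) plus between (tau2).
    w_star = [1.0 / (v + tau2) for v in variances]
    combined = sum(wi * s for wi, s in zip(w_star, scores)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return combined, se, tau2


agree = combine_replicates([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
disagree = combine_replicates([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
```

When replicates agree, tau^2 is 0 and the standard error falls back to the fixed-effect value, sqrt(1/3) here. When they disagree, tau^2 grows and widens the standard error accordingly.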
38. Li J, Fu C, Speed TP, Wang W, Symmans WF. Accurate RNA Sequencing From Formalin-Fixed Cancer Tissue To Represent High-Quality Transcriptome From Frozen Tissue. JCO Precis Oncol 2018; 2018:PO.17.00091. [PMID: 29862382 PMCID: PMC5976456 DOI: 10.1200/po.17.00091] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 12/29/2022]
Abstract
PURPOSE Accurate transcriptional sequencing (RNA-seq) from formalin-fixed, paraffin-embedded (FFPE) tumor samples presents an important challenge for translational research and diagnostic development. In addition, there are now several different protocols to prepare a sequencing library from total RNA. We evaluated the accuracy of RNA-seq data generated from FFPE samples in terms of expression profiling. METHODS We designed a biospecimen study to directly compare gene expression results from different protocols to prepare libraries for RNA-seq from human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. The protocols were compared using multiple computational methods to assess the alignment of reads to the reference genome; the uniformity and continuity of coverage; and the variance and correlation of overall gene expression, patterns of measuring coding sequence, phenotypic patterns of gene expression, and measurements from representative multigene signatures. RESULTS The principal determinant of variance in gene expression was the use of exon capture probes, followed by the conditions of preservation (FF versus FFPE) and phenotypic differences between breast cancers. One protocol, with RNase H-based rRNA depletion, exhibited the least variability of gene expression measurements and the strongest correlation between FF and FFPE samples, and was generally representative of the transcriptome from standard FF RNA-seq protocols. CONCLUSION The method of RNA-seq library preparation from FFPE samples had a marked effect on the accuracy of gene expression measurement compared with matched FF samples. Nevertheless, some protocols produced FFPE RNA-seq data that were highly concordant with RNA-seq results from matched frozen samples.
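One simple way to quantify FF-versus-FFPE concordance is to convert counts to log counts-per-million (log-CPM) and correlate matched libraries. The sketch uses a simplified log-CPM in which a prior count is added to the counts only, which is an assumption. The helper names are hypothetical, and the abstract's full set of comparison methods is not reproduced.

```python
import numpy as np


def log_cpm(counts: np.ndarray, prior: float = 1.0) -> np.ndarray:
    """Simplified log2 counts-per-million for a genes x samples count matrix
    (hypothetical helper; prior count added to the counts only)."""
    lib_sizes = counts.sum(axis=0)
    return np.log2((counts + prior) / lib_sizes * 1e6)


def ff_ffpe_correlation(ff_counts: np.ndarray, ffpe_counts: np.ndarray) -> np.ndarray:
    """Pearson correlation of log-CPM between matched FF and FFPE libraries,
    assuming column j of each matrix comes from the same tumour."""
    a, b = log_cpm(ff_counts), log_cpm(ffpe_counts)
    return np.array([np.corrcoef(a[:, j], b[:, j])[0, 1] for j in range(a.shape[1])])


# Hypothetical counts for 4 genes in 2 tumours; the FFPE libraries are
# sequenced twice as deeply but have the same composition.
ff = np.array([[10, 20], [100, 50], [1000, 400], [5, 8]], dtype=float)
ffpe = 2 * ff
corr = ff_ffpe_correlation(ff, ffpe)
```

Scaling to per-million removes differences in sequencing depth, so libraries of identical composition correlate almost perfectly. The small departure from 1 comes only from the prior count.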
39. Rubin AF, Gelman H, Lucas N, Bajjalieh SM, Papenfuss AT, Speed TP, Fowler DM. A statistical framework for analyzing deep mutational scanning data. Genome Biol 2017; 18:150. [PMID: 28784151 PMCID: PMC5547491 DOI: 10.1186/s13059-017-1272-5] [Citation(s) in RCA: 121] [Impact Index Per Article: 17.3] [Received: 04/18/2017] [Accepted: 07/06/2017] [Indexed: 11/10/2022]
Abstract
Deep mutational scanning is a widely used method for multiplex measurement of functional consequences of protein variants. We developed a new deep mutational scanning statistical model that generates error estimates for each measurement, capturing both sampling error and consistency between replicates. We apply our model to one novel and five published datasets comprising 243,732 variants and demonstrate its superiority in removing noisy variants and conducting hypothesis testing. Simulations show our model applies to scans based on cell growth or binding and handles common experimental errors. We implemented our model in Enrich2, software that can empower researchers analyzing deep mutational scanning data.
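For a selection with two time points, a ratio-style score compares a variant's change in frequency with wild type's change. Poisson counting noise then gives a per-variant standard error through the delta method, as a sum of reciprocal counts. The sketch follows that general recipe. The 0.5 pseudocount and the function name are assumptions, not a description of Enrich2's exact code.

```python
import math


def log_ratio_score(var_0, wt_0, var_t, wt_t, pseudo=0.5):
    """Illustrative sketch (hypothetical helper).

    Counts are sequencing reads for the variant and wild type at the input
    (0) and selected (t) time points. Returns (score, standard error), where
    the variance is the delta-method Poisson approximation sum(1 / count).
    """
    counts = [var_0, wt_0, var_t, wt_t]
    v0, w0, vt, wt = (c + pseudo for c in counts)
    score = math.log(vt / wt) - math.log(v0 / w0)
    se = math.sqrt(sum(1.0 / (c + pseudo) for c in counts))
    return score, se


neutral = log_ratio_score(100, 100, 100, 100)      # no change relative to wild type
enriched = log_ratio_score(100, 1000, 400, 1000)   # variant rises fourfold
```

Low-count variants get large standard errors, which is the information needed to flag noisy variants instead of trusting their point scores.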
40. Savas P, Teo ZL, Lefevre C, Flensburg C, Caramia F, Alsop K, Mansour M, Francis PA, Thorne HA, Silva MJ, Kanu N, Dietzen M, Rowan A, Kschischo M, Fox S, Bowtell DD, Dawson SJ, Speed TP, Swanton C, Loi S. Correction: The Subclonal Architecture of Metastatic Breast Cancer: Results from a Prospective Community-Based Rapid Autopsy Program "CASCADE". PLoS Med 2017; 14:e1002302. [PMID: 28430777 PMCID: PMC5400239 DOI: 10.1371/journal.pmed.1002302] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
[This corrects the article DOI: 10.1371/journal.pmed.1002204.].
41. Choi YJ, Lin CP, Risso D, Chen S, Kim TA, Tan MH, Li JB, Wu Y, Chen C, Xuan Z, Macfarlan T, Peng W, Lloyd KCK, Kim SY, Speed TP, He L. Deficiency of microRNA miR-34a expands cell fate potential in pluripotent stem cells. Science 2017; 355:eaag1927. [PMID: 28082412 DOI: 10.1126/science.aag1927] [Citation(s) in RCA: 110] [Impact Index Per Article: 15.7] [Received: 05/27/2016] [Accepted: 12/14/2016] [Indexed: 12/13/2022]
Abstract
Embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs) efficiently generate all embryonic cell lineages but rarely generate extraembryonic cell types. We found that microRNA miR-34a deficiency expands the developmental potential of mouse pluripotent stem cells, yielding both embryonic and extraembryonic lineages and strongly inducing MuERV-L (MERVL) endogenous retroviruses, a pattern reminiscent of totipotent two-cell blastomeres. miR-34a restricts the acquisition of expanded cell fate potential in pluripotent stem cells, and it represses MERVL expression through transcriptional regulation, at least in part by targeting the transcription factor Gata2. Our studies reveal a complex molecular network that defines and restricts pluripotent developmental potential in cultured ESCs and iPSCs.
42. Savas P, Teo ZL, Lefevre C, Flensburg C, Caramia F, Alsop K, Mansour M, Francis PA, Thorne HA, Silva MJ, Kanu N, Dietzen M, Rowan A, Kschischo M, Fox S, Bowtell DD, Dawson SJ, Speed TP, Swanton C, Loi S. The Subclonal Architecture of Metastatic Breast Cancer: Results from a Prospective Community-Based Rapid Autopsy Program "CASCADE". PLoS Med 2016; 13:e1002204. [PMID: 28027312 PMCID: PMC5189956 DOI: 10.1371/journal.pmed.1002204] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 10/19/2016] [Accepted: 11/17/2016] [Indexed: 01/01/2023]
Abstract
BACKGROUND Understanding the cancer genome is seen as a key step in improving outcomes for cancer patients. Genomic assays are emerging as a possible avenue to personalised medicine in breast cancer. However, evolution of the cancer genome during the natural history of breast cancer is largely unknown, as is the profile of disease at death. We sought to study in detail these aspects of advanced breast cancers that have resulted in lethal disease. METHODS AND FINDINGS Three patients with oestrogen-receptor (ER)-positive, human epidermal growth factor receptor 2 (HER2)-negative breast cancer and one patient with triple negative breast cancer underwent rapid autopsy as part of an institutional prospective community-based rapid autopsy program (CASCADE). Cases represented a range of management problems in breast cancer, including late relapse after early stage disease, de novo metastatic disease, discordant disease response, and disease refractory to treatment. Between 5 and 12 metastatic sites were collected at autopsy together with available primary tumours and longitudinal metastatic biopsies taken during life. Samples underwent paired tumour-normal whole exome sequencing and single nucleotide polymorphism (SNP) arrays. Subclonal architectures were inferred by jointly analysing all samples from each patient. Mutations were validated using high depth amplicon sequencing. Between cases, there were significant differences in mutational burden, driver mutations, mutational processes, and copy number variation. Within each case, we found dramatic heterogeneity in subclonal structure from primary to metastatic disease and between metastatic sites, such that no single lesion captured the breadth of disease. Metastatic cross-seeding was found in each case, and treatment drove subclonal diversification. Subclones displayed parallel evolution of treatment resistance in some cases and apparent augmentation of key oncogenic drivers as an alternative resistance mechanism. 
We also observed the role of mutational processes in subclonal evolution. Limitations of this study include the potential for bias introduced by joint analysis of formalin-fixed archival specimens with fresh specimens and the difficulties in resolving subclones with whole exome sequencing. Other alterations that could define subclones, such as structural variants or epigenetic modifications, were not assessed. CONCLUSIONS This study highlights various mechanisms that shape the genome of metastatic breast cancer and the value of studying advanced disease in detail. Treatment drives significant genomic heterogeneity in breast cancers, which has implications for disease monitoring and treatment selection in the personalised medicine paradigm.
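Inferring subclonal architecture usually starts from per-mutation estimates of the fraction of cancer cells carrying each mutation. The abstract does not say which tool was used, so the following is only a minimal sketch of one standard ingredient: converting a variant allele fraction (VAF) into a cancer cell fraction (CCF). It assumes known tumour purity, tumour copy number and mutation multiplicity, plus a diploid normal; the function name and parameters are illustrative, not taken from the study.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn, multiplicity=1, normal_cn=2):
    """Convert a variant allele fraction into a cancer cell fraction (CCF).

    Assumptions (illustrative): purity in (0, 1]; tumour_cn is the total
    copy number of the locus in tumour cells; multiplicity is the number of
    mutated copies per cancer cell carrying the mutation; contaminating
    normal cells carry normal_cn copies of the locus.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Total allele copies in the mixed sample, weighted by cell fraction
    total_copies = purity * tumour_cn + (1 - purity) * normal_cn
    # Expected VAF = purity * multiplicity * CCF / total_copies, solved for CCF
    return vaf * total_copies / (purity * multiplicity)


# A clonal heterozygous mutation in a 50%-pure diploid tumour has VAF 0.25
print(cancer_cell_fraction(0.25, purity=0.5, tumour_cn=2))   # 1.0
print(cancer_cell_fraction(0.125, purity=0.5, tumour_cn=2))  # 0.5
```

Mutations are then typically clustered on their CCFs across all samples from a patient. An estimate well above 1 usually signals a wrong copy-number or multiplicity assumption rather than real biology.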
43
Gerstner JR, Koberstein JN, Watson AJ, Zapero N, Risso D, Speed TP, Frank MG, Peixoto L. Removal of unwanted variation reveals novel patterns of gene expression linked to sleep homeostasis in murine cortex. BMC Genomics 2016; 17:727. [PMID: 27801296 PMCID: PMC5088519 DOI: 10.1186/s12864-016-3065-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 01/19/2023]
Abstract
BACKGROUND Why we sleep is still one of the most perplexing mysteries in biology. Strong evidence indicates that sleep is necessary for normal brain function and that sleep need is a tightly regulated process. Surprisingly, molecular mechanisms that determine sleep need are incompletely described. Moreover, very little is known about transcriptional changes that specifically accompany the accumulation and discharge of sleep need. Several studies have characterized differential gene expression changes following sleep deprivation. Much less is known, however, about changes in gene expression during the compensatory response to sleep deprivation (i.e. recovery sleep). RESULTS In this study we present a comprehensive analysis of the effects of sleep deprivation and subsequent recovery sleep on gene expression in the mouse cortex. We used a non-traditional analytical method for normalization of genome-wide gene expression data, Removal of Unwanted Variation (RUV). RUV improves detection of differential gene expression following sleep deprivation. We also show that RUV normalization is crucial to the discovery of differentially expressed genes associated with recovery sleep. Our analysis indicates that the majority of transcripts upregulated by sleep deprivation require 6 h of recovery sleep to return to baseline levels, while the majority of downregulated transcripts return to baseline levels within 1-3 h. We also find that transcripts that change rapidly during recovery (i.e. within 3 h) do so on average with a time constant that is similar to the time constant for the discharge of sleep need. CONCLUSIONS We demonstrate that proper data normalization is essential to identify changes in gene expression that are specifically linked to sleep deprivation and recovery sleep. Our results provide the first evidence that recovery sleep consists of two waves of transcriptional regulation that occur at different times and affect functionally distinct classes of genes.
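The abstract compares the time constant with which transcripts return to baseline against the time constant for the discharge of sleep need. A minimal sketch of estimating such a constant, assuming a simple exponential return to a known baseline, y(t) = b + (y0 - b)·exp(-t/τ), and noise-free measurements, is to fit a straight line to the log of the deviation from baseline. This is an illustration only, not the fitting procedure used in the study; the function name is hypothetical.

```python
import numpy as np


def fit_time_constant(t, y, baseline):
    """Estimate tau for y(t) = baseline + (y0 - baseline) * exp(-t / tau).

    Assumes every measurement lies on the same side of the baseline, so the
    absolute deviation is positive and its log is linear in t.
    """
    deviation = np.abs(np.asarray(y, dtype=float) - baseline)
    # log|y - b| = log|y0 - b| - t / tau, so the fitted slope is -1 / tau
    slope, _intercept = np.polyfit(np.asarray(t, dtype=float), np.log(deviation), 1)
    return -1.0 / slope


hours = np.array([0.0, 1.0, 3.0, 6.0])
expr = 5.0 + 2.0 * np.exp(-hours / 2.5)  # synthetic recovery curve, tau = 2.5 h
print(round(fit_time_constant(hours, expr, baseline=5.0), 6))
```

Taking the absolute deviation lets the same function handle transcripts returning from above (upregulated) or below (downregulated) baseline. With real, noisy data a nonlinear least-squares fit of the exponential is usually preferable to the log-linear shortcut.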
44
Zhang Y, Feng ZP, Naselli G, Bell F, Wettenhall J, Auyeung P, Ellis JA, Ponsonby AL, Speed TP, Chong MM, Harrison LC. Corrigendum to ‘MicroRNAs in CD4+ T cell subsets are markers of disease risk and T cell dysfunction in individuals at risk for type 1 diabetes’ [J. Autoimmun. 68C (2016) 52–61]. J Autoimmun 2016; 73:130. [DOI: 10.1016/j.jaut.2016.04.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/26/2022]
45
Poplawski SG, Peixoto L, Porcari GS, Wimmer ME, McNally AG, Mizuno K, Giese KP, Chatterjee S, Koberstein JN, Risso D, Speed TP, Abel T. Contextual fear conditioning induces differential alternative splicing. Neurobiol Learn Mem 2016; 134 Pt B:221-35. [PMID: 27451143 DOI: 10.1016/j.nlm.2016.07.018] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/24/2016] [Revised: 07/16/2016] [Accepted: 07/19/2016] [Indexed: 12/20/2022]
Abstract
The process of memory consolidation requires transcription and translation to form long-term memories. Significant effort has been dedicated to understanding changes in hippocampal gene expression after contextual fear conditioning. However, alternative splicing by differential transcript regulation during this time period has received less attention. Here, we use RNA-seq to determine exon-level changes in expression after contextual fear conditioning and retrieval. Our work reveals that a short variant of Homer1, Ania-3, is regulated by contextual fear conditioning. The ribosome biogenesis regulator Las1l, small nucleolar RNA Snord14e, and the RNA-binding protein Rbm3 also change specific transcript usage after fear conditioning. The changes in Ania-3 and Las1l are specific to either the new context or the context-shock association, while the changes in Rbm3 occur after context or shock only. Our analysis uncovered previously undetected changes in transcript regulation after learning, highlighting the importance of high-throughput sequencing approaches in the study of gene expression changes after learning.
46
Lin SJ, Gagnon-Bartsch JA, Tan IB, Earle S, Ruff L, Pettinger K, Ylstra B, van Grieken N, Rha SY, Chung HC, Lee JS, Cheong JH, Noh SH, Aoyama T, Miyagi Y, Tsuburaya A, Yoshikawa T, Ajani JA, Boussioutas A, Yeoh KG, Yong WP, So J, Lee J, Kang WK, Kim S, Kameda Y, Arai T, zur Hausen A, Speed TP, Grabsch HI, Tan P. Signatures of tumour immunity distinguish Asian and non-Asian gastric adenocarcinomas. Gut 2015; 64:1721-31. [PMID: 25385008 PMCID: PMC4680172 DOI: 10.1136/gutjnl-2014-308252] [Citation(s) in RCA: 164] [Impact Index Per Article: 18.2] [Received: 08/13/2014] [Accepted: 09/09/2014] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Differences in gastric cancer (GC) clinical outcomes between patients in Asian and non-Asian countries have historically been attributed to variability in clinical management. However, recent international Phase III trials suggest that even with standardised treatments, GC outcomes differ by geography. Here, we investigated gene expression differences between Asian and non-Asian GCs, and whether these molecular differences might influence clinical outcome. DESIGN We compared gene expression profiles of 1016 GCs from six Asian and three non-Asian GC cohorts, using a two-stage meta-analysis design and a novel biostatistical method (RUV-4) to adjust for technical variation between cohorts. We further validated our findings by computerised immunohistochemical analysis on two independent tissue microarray (TMA) cohorts from Asian and non-Asian localities (n=665). RESULTS Gene signatures differentially expressed between Asian and non-Asian GCs were related to immune function and inflammation. Non-Asian GCs were significantly enriched in signatures related to T-cell biology, including CTLA-4 signalling. Similarly, in the TMA cohorts, non-Asian GCs showed significantly higher expression of T-cell markers (CD3, CD45RO, CD8) and lower expression of the immunosuppressive T-regulatory cell marker FOXP3 compared to Asian GCs (p<0.05). Inflammatory cell markers CD66b and CD68 also exhibited significant cohort differences (p<0.05). Exploratory analyses revealed a significant relationship between tumour immunity factors, geographic locality-specific prognosis, and postchemotherapy outcomes. CONCLUSIONS Analyses of >1600 GCs suggest that Asian and non-Asian GCs exhibit distinct tumour immunity signatures related to T-cell function. These differences may influence geographical differences in clinical outcome and the design of future trials, particularly in immuno-oncology.
47
Huang KT, Mikeska T, Li J, Takano EA, Millar EKA, Graham PH, Boyle SE, Campbell IG, Speed TP, Dobrovic A, Fox SB. Assessment of DNA methylation profiling and copy number variation as indications of clonal relationship in ipsilateral and contralateral breast cancers to distinguish recurrent breast cancer from a second primary tumour. BMC Cancer 2015; 15:669. [PMID: 26452468 PMCID: PMC4600279 DOI: 10.1186/s12885-015-1676-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/12/2014] [Accepted: 10/01/2015] [Indexed: 12/26/2022]
Abstract
BACKGROUND Patients with breast cancer have an increased risk of developing subsequent breast cancers. It is important to distinguish whether these tumours are de novo or recurrences of the primary tumour in order to guide appropriate therapy. Our aim was to investigate the use of DNA methylation profiling and array comparative genomic hybridization (aCGH) to determine whether the second tumour is clonally related to the first tumour. METHODS Methylation-sensitive high-resolution melting was used to screen promoter methylation in a panel of 13 genes reported as methylated in breast cancer (RASSF1A, TWIST1, APC, WIF1, MGMT, MAL, CDH13, RARβ, BRCA1, CDH1, CDKN2A, TP73, and GSTP1) in 29 tumour pairs (16 ipsilateral and 13 contralateral). Using the methylation profiles of these genes, we employed a Bayesian and an empirical statistical approach to estimate clonal relationship. Copy number alterations were analysed using aCGH on the same set of tumour pairs. RESULTS Based on the methylation profile, there is a higher probability of the second tumour being recurrent in ipsilateral tumours compared with contralateral tumours (38% versus 8%; p < 0.05). Using previously reported recurrence rates as Bayesian prior probabilities, we classified 69% of ipsilateral and 15% of contralateral tumours as recurrent. The inferred clonal relationships of the tumour pairs were generally concordant between methylation profiling and aCGH. CONCLUSION Our results show that DNA methylation profiling and aCGH both have potential as diagnostic tools for improving clinical decisions that distinguish recurrences from a second de novo tumour.
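The abstract uses previously reported recurrence rates as Bayesian prior probabilities. The arithmetic of combining such a prior with the evidence from a tumour pair can be sketched with Bayes' rule in odds form. The likelihood ratio below is a hypothetical input standing in for whatever the methylation-based model produces, since the abstract does not describe how those likelihoods were computed, and the priors in the example are illustrative values, not the ones used in the study.

```python
def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form.

    prior: prior probability that the second tumour is a recurrence
        (for example, a published recurrence rate).
    likelihood_ratio: P(data | recurrence) / P(data | new primary);
        a hypothetical input in this sketch.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# The same evidence (likelihood ratio 3) under two illustrative priors
for prior in (0.5, 0.1):
    print(prior, round(posterior_probability(prior, 3.0), 3))
```

Here the same likelihood ratio yields a posterior of 0.75 under a prior of 0.5 but only 0.25 under a prior of 0.1. This is why identical molecular evidence can lead to different classifications for ipsilateral and contralateral pairs when their prior recurrence rates differ.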
48
Freytag S, Gagnon-Bartsch J, Speed TP, Bahlo M. Systematic noise degrades gene co-expression signals but can be corrected. BMC Bioinformatics 2015; 16:309. [PMID: 26403471 PMCID: PMC4583191 DOI: 10.1186/s12859-015-0745-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 05/11/2015] [Accepted: 09/16/2015] [Indexed: 12/31/2022]
Abstract
Background In the past decade, the identification of gene co-expression has become a routine part of the analysis of high-dimensional microarray data. Gene co-expression, which is mostly detected via the Pearson correlation coefficient, has played an important role in the discovery of molecular pathways and networks. Unfortunately, the presence of systematic noise in high-dimensional microarray datasets corrupts estimates of gene co-expression. Removing systematic noise from microarray data is therefore crucial. Many cleaning approaches for microarray data exist; however, these methods are aimed at improving differential expression analysis, and their performance has been tested primarily for this application. To our knowledge, the performance of these approaches has never been systematically compared in the context of gene co-expression estimation. Results Using simulations, we demonstrate that standard cleaning procedures, such as background correction and quantile normalization, fail to adequately remove systematic noise that affects gene co-expression and at times further degrade true gene co-expression. Instead, we show that a global version of removal of unwanted variation (RUV), a data-driven approach, not only removes systematic noise but also allows the estimation of the true underlying gene-gene correlations. We compare the performance of all noise removal methods when applied to five large published datasets on gene expression in the human brain. RUV retrieves the highest gene co-expression values for sets of genes known to interact and also provides the greatest consistency across all five datasets. We apply the method to prioritize epileptic encephalopathy candidate genes. Conclusions Our work raises serious concerns about the quality of many published gene co-expression analyses. RUV provides an efficient and flexible way to remove systematic noise from high-dimensional microarray datasets when the objective is gene co-expression analysis.
The RUV method, as applied to gene-gene correlation estimation, is available as a Bioconductor package: RUVcorr.
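The central point of this abstract is that systematic noise shared across genes inflates Pearson correlations, and that removing it with the help of control genes gives more faithful estimates. This can be illustrated on simulated data. The sketch assumes a single hidden noise factor and uses a simple SVD-based removal of factors estimated from negative-control genes. It is not the RUVcorr implementation, and all names and simulation settings are hypothetical.

```python
import numpy as np


def remove_unwanted_factors(y, control_idx, k=1):
    """Regress out k factors estimated from negative-control genes.

    y: (n_samples, n_genes) log-scale expression matrix.
    control_idx: columns assumed to carry only unwanted variation.
    Returns the gene-centred matrix with the estimated factors removed.
    """
    yc = y - y.mean(axis=0)
    # Leading left singular vectors of the controls estimate the factors
    u, _, _ = np.linalg.svd(yc[:, control_idx], full_matrices=False)
    w = u[:, :k]
    # w has orthonormal columns, so the least-squares loadings are w.T @ yc
    return yc - w @ (w.T @ yc)


rng = np.random.default_rng(1)
n = 100
batch = rng.normal(size=n)  # hidden systematic noise shared by all genes
controls = np.outer(batch, rng.uniform(1, 2, 20)) + rng.normal(0, 0.2, (n, 20))
gene_a = 2 * batch + rng.normal(size=n)  # biologically unrelated to gene_b
gene_b = 2 * batch + rng.normal(size=n)
y = np.column_stack([controls, gene_a, gene_b])

before = np.corrcoef(y[:, 20], y[:, 21])[0, 1]
cleaned = remove_unwanted_factors(y, control_idx=np.arange(20), k=1)
after = np.corrcoef(cleaned[:, 20], cleaned[:, 21])[0, 1]
print(f"correlation before: {before:.2f}, after: {after:.2f}")
```

In this simulation the two genes share no biology, so the strong correlation seen before cleaning is produced entirely by the shared noise factor. After the factor is removed, the correlation falls to roughly zero.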
49
Jacob L, Gagnon-Bartsch JA, Speed TP. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Biostatistics 2015; 17:16-28. [PMID: 26286812 PMCID: PMC4679071 DOI: 10.1093/biostatistics/kxv026] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 11/26/2014] [Accepted: 06/25/2015] [Indexed: 11/13/2022]
Abstract
When dealing with large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset (as opposed to the study of an observed factor of interest), taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package RUVnormalize.
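The abstract describes using negative control genes and replicate samples to estimate unwanted variation without observing the factor of interest. The sketch below illustrates that idea under stated assumptions: log-scale data, replicates that share all of their biology, and control genes unaffected by the factor of interest. It follows the general logic of replicate-based RUV, but it is not the RUVnormalize code, and the function name is hypothetical.

```python
import numpy as np


def replicate_ruv(y, replicate_sets, control_idx, k=1):
    """Remove unwanted variation using technical replicates (sketch).

    y: (n_samples, n_genes) log-scale expression matrix.
    replicate_sets: list of index arrays; samples in a set share the same
        biology, so differences among them are treated as unwanted variation.
    control_idx: negative-control genes used to estimate each sample's factors.
    """
    yc = y - y.mean(axis=0)
    # Within-set deviations contain no biological signal by assumption
    dev = np.vstack([yc[idx] - yc[idx].mean(axis=0) for idx in replicate_sets])
    # Top right singular vectors give the gene loadings alpha (k x n_genes)
    _, _, vt = np.linalg.svd(dev, full_matrices=False)
    alpha = vt[:k]
    # Solve yc[:, controls] ~ W @ alpha[:, controls] for W by least squares
    w = yc[:, control_idx] @ np.linalg.pinv(alpha[:, control_idx])
    return y - w @ alpha


rng = np.random.default_rng(2)
n, g = 12, 40
factor = rng.normal(size=n)             # unwanted; differs even between replicates
group = np.repeat([0.0, 1.0], n // 2)   # biology of interest, never passed in
y = np.outer(factor, rng.normal(0, 2, g)) + rng.normal(0, 0.1, (n, g))
y[:, 30] += 2 * group                   # only gene 30 responds to the biology
pairs = [np.array([i, i + 1]) for i in range(0, n, 2)]  # technical duplicates
out = replicate_ruv(y, pairs, control_idx=np.arange(10))
diff = out[group == 1, 30].mean() - out[group == 0, 30].mean()
print(round(diff, 1))
```

The loadings come only from within-replicate differences, and the per-sample factors come only from control genes. As a result, the group difference on the responsive gene survives the correction, even though the factor of interest is never supplied to the function.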
50
Peixoto L, Risso D, Poplawski SG, Wimmer ME, Speed TP, Wood MA, Abel T. How data analysis affects power, reproducibility and biological insight of RNA-seq studies in complex datasets. Nucleic Acids Res 2015. [PMID: 26202970 PMCID: PMC4652761 DOI: 10.1093/nar/gkv736] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Indexed: 12/21/2022]
Abstract
The sequencing of the full transcriptome (RNA-seq) has become the preferred choice for the measurement of genome-wide gene expression. Despite its widespread use, challenges remain in RNA-seq data analysis. One often-overlooked aspect is normalization. Although a variety of factors or ‘batch effects’ can contribute unwanted variation to the data, commonly used RNA-seq normalization methods only correct for sequencing depth. The study of gene expression is particularly problematic when it is influenced simultaneously by a variety of biological factors in addition to the one of interest. Using examples from experimental neuroscience, we show that batch effects can dominate the signal of interest and that the choice of normalization method affects the power and reproducibility of the results. While commonly used global normalization methods are not able to adequately normalize the data, more recently developed RNA-seq normalization methods can. We focus on one particular method, RUVSeq, and show that it is able to increase the power and biological insight of the results. Finally, we provide a tutorial outlining the implementation of RUVSeq normalization that is applicable to a broad range of studies as well as to meta-analysis of publicly available data.
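RUVSeq's RUVg variant estimates factors of unwanted variation from negative control genes and regresses them out. As a rough illustration only, the sketch below mimics that logic in NumPy on log-scale data. The actual package is written in R, works from counts, and offers further options. The function name, the choice of control genes and the simulated data are all assumptions.

```python
import numpy as np


def ruvg_like(log_expr, control_idx, k=1):
    """RUVg-style correction of a (n_samples, n_genes) log-expression matrix."""
    centred = log_expr - log_expr.mean(axis=0)
    # Factors of unwanted variation: left singular vectors of the control genes
    u, _, _ = np.linalg.svd(centred[:, control_idx], full_matrices=False)
    w = u[:, :k]
    # Least-squares loadings of every gene on the estimated factors
    alpha, *_ = np.linalg.lstsq(w, centred, rcond=None)
    return log_expr - w @ alpha


rng = np.random.default_rng(0)
n_samples, n_genes = 12, 50
group = np.repeat([0.0, 1.0], n_samples // 2)  # condition of interest
batch = np.tile([0.0, 1.0], n_samples // 2)    # processing batch, balanced by group
log_expr = (np.outer(batch, rng.normal(0, 3, n_genes))
            + rng.normal(0, 0.1, (n_samples, n_genes)))
log_expr[:, 20] += 2 * group                   # one truly responsive gene
controls = np.arange(10)                       # assumed unaffected by the condition

corrected = ruvg_like(log_expr, controls, k=1)
effect = corrected[group == 1, 20].mean() - corrected[group == 0, 20].mean()
```

In this simulation, removing the batch component shrinks the within-group spread of every gene, while the balanced design leaves the condition effect of about 2 on the responsive gene intact. Lower residual variance with an unchanged effect is how such normalization can raise power.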