1
|
Machine learning multi-omics analysis reveals cancer driver dysregulation in pan-cancer cell lines compared to primary tumors. Commun Biol 2022; 5:1367. [PMID: 36513728 PMCID: PMC9747808 DOI: 10.1038/s42003-022-04075-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2020] [Accepted: 10/06/2022] [Indexed: 12/15/2022] Open
Abstract
Cancer cell lines have been widely used for decades to study biological processes driving cancer development, and to identify biomarkers of response to therapeutic agents. Advances in genomic sequencing have made possible large-scale genomic characterizations of collections of cancer cell lines and primary tumors, such as the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA). These studies allow for the first time a comprehensive evaluation of the comparability of cancer cell lines and primary tumors on the genomic and proteomic level. Here we employ bulk mRNA and micro-RNA sequencing data from thousands of samples in CCLE and TCGA, and proteomic data from partner studies in the MD Anderson Cell Line Project (MCLP) and The Cancer Proteome Atlas (TCPA), to characterize the extent to which cancer cell lines recapitulate tumors. We identify dysregulation of a long non-coding RNA and microRNA regulatory network in cancer cell lines, associated with differential expression between cell lines and primary tumors in four key cancer driver pathways: KRAS signaling, NFKB signaling, IL2/STAT5 signaling and TP53 signaling. Our results emphasize the necessity for careful interpretation of cancer cell line experiments, particularly with respect to therapeutic treatments targeting these important cancer pathways.
|
2
|
EXTH-96. BIOPROCESSING OF SURGICAL PEDIATRIC BRAIN TUMOR SPECIMENS FOR GENOME-GUIDED PERSONALIZED DRUG TESTING. Neuro Oncol 2022. [PMCID: PMC9661144 DOI: 10.1093/neuonc/noac209.894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Novel treatment approaches for pediatric central nervous system (CNS) tumors are urgently needed. A lack of patient-derived tumor cells impedes progress towards developing such new therapies. We intended to overcome this challenge by establishing methods to create a biorepository of viable single cell suspensions of pediatric brain tumor surgical specimens. Quantitative and qualitative comparisons of tissue processing strategies were performed to preserve viability of heterogeneous tumor and immune cells. Novel drug targets were identified by analyzing pathways affected by RNA transcripts that are highly expressed (outliers) in a patient's tumor; outliers for each patient were determined by comparing RNA-seq data from that patient's tumor with a compendium of 12,747 pediatric and adult samples harmonized by the Treehouse Childhood Cancer Initiative at the UCSC Genomics Institute. The predicted anti-tumor efficacy of small molecule inhibitors of the outlier pathways was tested in cell viability assays against short-term cultured cells from matched patients. Successful tissue collection required obtaining informed consent, standard operating procedures, and sample recording using a Laboratory Inventory Management Software (LIMS). Since 2020 we have banked 67 pediatric CNS brain tumor specimens at Stanford. Amongst those, 51 cases yielded sufficient tissue for RNA-seq and cryoprotection. The most common tumor histology was low-grade glioma (LGG, 26 of 67), the majority of which were pilocytic astrocytoma (18 of 26). The second and third most common tumor types were embryonal tumors (6 medulloblastoma, 3 AT/RT) and ependymoma (4), respectively. We identified significant differences in cell viability with different preservation media. An outlier pathway previously not implicated in LGG was identified and sensitivity to a small molecule inhibitor of this outlier pathway was demonstrated.
Taken together, we established feasibility for validating therapeutic vulnerabilities identified by a genome-guided approach in short-term cultures from surgical specimens. This work facilitates the rapid development of personalized CNS tumor treatment.
|
3
|
Abstract LB059: Subtype classification of pediatric high-grade glioma tumors by comparative transcriptomics. Cancer Res 2022. [DOI: 10.1158/1538-7445.am2022-lb059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Pediatric high-grade glioma (pHGG) is a highly malignant and poorly understood cancer driven by diverse genetic and epigenetic mechanisms. Here, we use comparative RNA sequencing, outlier analysis, and spectral clustering approaches to analyze transcriptomic data of 1,543 pediatric brain tumor specimens from the UCSC Treehouse Childhood Cancer Initiative (Treehouse) and Open Pediatric Brain Tumor Atlas (OpenPBTA) to identify subpopulations of pHGG patients with characteristic gene expression profiles. We find that approximately half (45%) of pHGG tumors from OpenPBTA exist in three subgroups defined by high outlier-level expression either of: mitochondrially-encoded 12S and 16S rRNAs; genes enriched in the HSF1-mediated heat shock response and activation pathways; or six C/D box snoRNA (SNORD) genes originating from the paternally-expressed SNORD116 locus involved in Prader-Willi syndrome, a complex neurodevelopmental disorder. Interestingly, the same set of HSF1-dependent pathway genes is also significantly upregulated in a subset (~11%) of pHGG tumors from Treehouse, validating this finding in two independent compendia with different transcript isolation strategies (Treehouse, polyA selection; OpenPBTA, ribodepletion). Our work identifies distinct classes of tumors with outlier-level expression of genes with previously unknown roles in pHGG and provides a framework for subtyping tumors by comparative transcriptomics that is adaptable to any cancer type. We are currently investigating the molecular roles of HSF1-response genes and the imprinted SNORD116 gene cluster in pHGG. Our ongoing research into the biomolecular signatures and mechanisms of the three major tumor classes of pHGG as defined in our study will contribute to a greater understanding of pHGG disease manifestation and progression, and will inform strategies of tailored therapeutic interventions for children with this devastating disease.
Citation Format: Gina D. Mawla, A. Geoffrey Lyle, Ellen T. Kephart, Katrina Learned, Holly C. Beale, Joshua E. Goldford, Olena M. Vaske. Subtype classification of pediatric high-grade glioma tumors by comparative transcriptomics [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr LB059.
|
4
|
A Functional Precision Medicine Pipeline Combines Comparative Transcriptomics and Tumor Organoid Modeling to Identify Bespoke Treatment Strategies for Glioblastoma. Cells 2021; 10:cells10123400. [PMID: 34943910 PMCID: PMC8699481 DOI: 10.3390/cells10123400] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/14/2021] [Revised: 11/25/2021] [Accepted: 11/29/2021] [Indexed: 12/15/2022] Open
Abstract
Li-Fraumeni syndrome (LFS) is a hereditary cancer predisposition syndrome caused by germline mutations in TP53. TP53 is the most commonly mutated gene in human cancer, with mutations occurring in 30-50% of glioblastomas (GBM). Here, we highlight a precision medicine platform to identify potential targets for a GBM patient with LFS. We used a comparative transcriptomics approach to identify genes that are uniquely overexpressed in the LFS GBM patient relative to a cancer compendium of 12,747 tumor RNA sequencing data sets, including 200 GBMs. STAT1 and STAT2 were identified as being significantly overexpressed in the LFS patient, indicating ruxolitinib, a Janus kinase 1 and 2 inhibitor, as a potential therapy. The LFS patient had the highest level of STAT1 and STAT2 expression in an institutional high-grade glioma cohort of 45 patients, further supporting the cancer compendium results. To empirically validate the comparative transcriptomics pipeline, we used a combination of adherent and organoid cell culture techniques, including ex vivo patient-derived organoids (PDOs) from four patient-derived cell lines, including the LFS patient. STAT1 and STAT2 expression levels in the four patient-derived cells correlated with levels identified in the respective parent tumors. In both adherent and organoid cultures, cells from the LFS patient were among the most sensitive to ruxolitinib compared to patient-derived cells with lower STAT1 and STAT2 expression levels. A spheroid-based drug screening assay (3D-PREDICT) was performed and used to identify further therapeutic targets. Two targeted therapies were selected for the patient of interest and resulted in radiographic disease stability. This manuscript supports the use of comparative transcriptomics to identify personalized therapeutic targets in a functional precision medicine platform for malignant brain tumors.
|
5
|
Metastatic Pediatric Sclerosing Epithelioid Fibrosarcoma. Cold Spring Harb Mol Case Stud 2021; 7:mcs.a006093. [PMID: 34362827 PMCID: PMC8559621 DOI: 10.1101/mcs.a006093] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/01/2021] [Accepted: 07/02/2021] [Indexed: 11/25/2022] Open
Abstract
Sclerosing epithelioid fibrosarcoma (SEF) is a rare and aggressive soft-tissue sarcoma thought to originate in fibroblasts of the tissues comprising tendons, ligaments, and muscles. Minimally responsive to conventional cytotoxic chemotherapies, >50% of SEF patients experience local recurrence and/or metastatic disease. SEF is most commonly discovered in middle-aged and elderly adults, but also rarely in children. A common gene fusion occurring between the EWSR1 and CREB3L1 genes has been observed in 80%–90% of SEF cases. We describe here the youngest SEF patient reported to date (a 3-yr-old Caucasian male) who presented with numerous bony and lung metastases. Additionally, we perform a comprehensive literature review of all SEF-related articles published since the disease was first characterized. Finally, we describe the generation of an SEF primary cell line, the first such culture to be reported. The patient described here experienced persistent disease progression despite aggressive treatment including multiple resections, radiotherapy, and numerous chemotherapies and targeted therapeutics. Untreated and locally recurrent tumor and metastatic tissue were sequenced by whole-genome, whole-exome, and deep-transcriptome next-generation sequencing with comparison to a patient-matched normal blood sample. Consistent across all sequencing analyses was the disease-defining EWSR1–CREB3L1 fusion as a single feature consensus. We provide an analysis of our genomic findings and discuss potential therapeutic strategies for SEF.
|
6
|
Abstract 3033: Molecular classification of pediatric high-risk leukemias using expression profiles of multimodally expressed genes. Cancer Res 2021. [DOI: 10.1158/1538-7445.am2021-3033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Introduction: Leukemia is the most common cancer in children, accounting for approximately one third of all malignancies that occur in the pediatric age group. Acute Lymphoblastic Leukemia (ALL) and Acute Myeloid Leukemia (AML) account for most leukemia diagnosed in this age group. While known markers for poor prognosis include higher age, higher white blood cell count at diagnosis and certain translocations, innovative approaches in tumor RNA sequencing (RNA-Seq) data analysis can discover novel prognostic factors that could be exploited for future therapeutic development in fusion-negative ALL and AML.
Methods: To reveal gene expression signatures among fusion-negative leukemias, we used a novel unsupervised analysis model called Hydra. Hydra uses a Dirichlet process mixture model to detect multimodally expressed genes to use in characterizing clusters within cancer cohorts. This approach can detect subtle yet robust differences in gene expression without the reliance on reference normal RNA-Seq datasets. The Hydra model reveals clusters of the cancer cohort, and differences among these clusters can be investigated by finding enriched pathways via Gene Set Enrichment Analysis (GSEA). The cluster-specific enriched pathways can be used in conjunction with survival data to determine how certain pathways are associated with outcome. This analysis used publicly available data from the National Cancer Institute (NCI) Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database that was uniformly processed by the Treehouse Childhood Cancer Initiative.
Results: First, 202 fusion-negative AML and fusion-negative B-cell precursor ALL samples were run through Hydra and five clusters were identified. These clusters had different enriched pathways, such as high mitochondrial activity, high cell proliferation, and high cell signaling. Though these are characteristics of all cancer cells, each cluster demonstrated that one pathway was most distinctive of those samples. Most clusters were differentiated by disease; however, one cluster with enriched heme metabolism and immunoglobulin pathways contained almost equal numbers of AML and ALL samples, suggesting that specific cohorts of AML and ALL patients had increased inflammatory response. Another cluster contained 72 AML samples and 4 ALL samples. The four ALL samples in this cluster showed lowered expression of CD19, a B-cell lineage immune marker, and elevated expression of CD14, a myeloid lineage immune marker. These ALL patients exhibited genomic characteristics of AML, which may suggest a more specialized treatment regimen.
Discussion: Despite extensive characterization of pediatric high-risk leukemias using genomic approaches, there is ample opportunity to study RNA-Seq-derived gene expression profiles to help accurately diagnose and treat pediatric patients.
Citation Format: Sneha S. Jariwala, Alfred Geoffrey Lyle, Jacob Pfeil, Lauren Sanders, Holly C. Beale, Ellen T. Kephart, Katrina Learned, Allison Cheney, Olena M. Vaske. Molecular classification of pediatric high-risk leukemias using expression profiles of multimodally expressed genes [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 3033.
|
7
|
Abstract 3035: Identifying potential druggable targets for synovial sarcoma using comparative RNA-seq analysis. Cancer Res 2021. [DOI: 10.1158/1538-7445.am2021-3035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Synovial sarcoma (SS) is an aggressive soft-tissue malignancy, accounting for 10% of all soft-tissue sarcomas. These tumors can occur at any age but most often affect young adults and adolescents, developing deep in the distal extremities. The prognosis of SS tumors is poor, with a 5-year survival rate of 36-76%, a high rate of metastasis and few treatments. The purpose of this study was to identify novel overexpressed oncogenes that could serve as druggable targets for treating synovial sarcoma patients. We compared the RNA-Seq expression profiles of a cohort of 36 synovial sarcomas to our compendium of RNA-Seq expression data from 12,236 tumor samples (treehousegenomics.ucsc.edu) from pediatric and adult cancer patients. In comparing gene expression in the synovial sarcoma cohort samples against the compendium samples, gene expression outliers were defined as having expression above the gene-specific outlier threshold determined by Tukey's outlier method. Among the overexpression outliers, pathway enrichment analysis was used to identify common and druggable pathways, with implications for potential therapeutics for patients with SS. Our analysis identified the overexpression of members of the Sonic Hedgehog pathway in the majority of synovial sarcoma samples. For example, GLI1 expression exceeded the outlier threshold in 35 out of 36 samples. This pathway can be targeted by available small molecule inhibitors. Ongoing work focuses on evaluating the role of Sonic Hedgehog signaling in the pathogenesis of SS using pharmacological inhibition, CRISPRi studies in cell line models of the disease and nanopore sequencing. We currently have 4 patient-derived synovial sarcoma cell lines (HSSY-II, SYO-1, YAMATO, and ASKA) that we can grow both in adherent conditions and in 3D cell culture as sarcospheres.
We detected the expression of the SYT-SSX fusion transcript in each of the cell lines by RT-PCR to confirm the cell lines maintained expression of the pathogenic fusion. This work has implications for using comparative tumor RNA-seq derived gene expression data for nominating novel druggable targets specific to synovial sarcoma tumors.
Citation Format: Yvonne A. Vasquez, Jacob Pfeil, Letitia Mueller, Holly Beale, Alfred G. Lyle, Lauren Sanders, Katrina Learned, Ellen Kephart, Anouk van den Bout, Allison Cheney, Sahar Hosseinzadeh, Isabel Bjork, Sofie R. Salama, Olena Vaske. Identifying potential druggable targets for synovial sarcoma using comparative RNA-seq analysis [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 3035.
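The outlier definition in this abstract (expression above a gene-specific threshold from Tukey's method) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the fence multiplier `k=1.5` is the conventional Tukey value and is an assumption here, as are the function names, the quartile interpolation method, and the toy expression values.

```python
from statistics import quantiles

def tukey_upper_threshold(values, k=1.5):
    """Upper Tukey fence for one gene: Q3 + k * IQR.

    k=1.5 is the conventional multiplier (an assumption; the abstract
    does not state which value was used).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)

def up_outliers(sample_expr, compendium_expr, k=1.5):
    """Genes whose expression in one sample exceeds the gene-specific fence."""
    return sorted(
        gene
        for gene, value in sample_expr.items()
        if gene in compendium_expr
        and value > tukey_upper_threshold(compendium_expr[gene], k)
    )

# Hypothetical toy data: per-gene expression across compendium samples.
compendium = {
    "GLI1": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0],
    "ACTB": [9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5],
}
sample = {"GLI1": 9.2, "ACTB": 10.8}

print(up_outliers(sample, compendium))
```

In a real compendium the per-gene thresholds would be computed once and reused for every new sample, since they depend only on the compendium.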
|
8
|
Abstract 261: Long-read sequencing characterization of a patient with bilateral Wilms tumor of unknown etiology. Cancer Res 2021. [DOI: 10.1158/1538-7445.am2021-261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Wilms tumor is the most common childhood kidney cancer. Fifteen percent of patients with Wilms tumor have germline pathogenic variants in genes or regions such as WT1 or the 11p15 region. Variants in these regions can include structural or copy number alterations or alterations in methylation. In the majority of cases of Wilms tumor no known pathogenic variant could be found using state-of-the-art technologies, including comprehensive approaches such as Illumina whole exome sequencing. One explanation for this is that such technologies have difficulties in detecting structural variants (SVs) in areas associated with repeat or low complexity sequence. In addition, Illumina technology does not immediately support direct methylation detection. Therefore, we hypothesized that analysis of bilateral Wilms tumor of unknown etiology using long-read sequencing could reveal molecular events of potential clinical interest. We performed in-depth genomic analysis on a whole blood DNA sample from a patient with a bilateral Wilms tumor. This patient had no significant family history of cancer, and previously tested negative for Beckwith-Wiedemann syndrome by methylation testing of the 11p15 region; clinical exome sequencing of the patient's germline detected no variants associated with Wilms tumors. We sequenced the genome at 40x depth using PromethION Nanopore sequencing. 29,180 SVs (deletions or insertions larger than 30 base pairs) were detected using Sniffles and 26,480 were detected with SVIM. Only SVs that were detected by both methods were considered for downstream analysis. Variants were annotated and filtered using a short-read catalog of SVs (gnomAD-SV), a long-read catalog, the Database of Genomic Variants catalog, and by comparison to 11 in-house genomes. We focused on SVs, copy number variants, and methylation events affecting genes previously associated with Wilms tumor. Our long-read sequencing approach detected compound heterozygotes using phased variant calls.
A heterozygous missense mutation was identified in haplotype one, while a 300 base pair insertion in an ALU element was present in haplotype two. These two compound heterozygous variants overlap an exon of the OVCH2 gene, and the ALU element was not detected by the prior Illumina analysis. Additionally, we determined the frequency of methylation in CpG sites genome-wide using nanopolish. Using a normal blood sample from an unrelated individual as a control, we searched for extreme differences across large and gene promoter regions. Hypermethylation in the promoter regions of genes in the 11p15.5 locus was observed in the patient as compared to the control. Hypomethylation in this region is associated with Beckwith-Wiedemann syndrome. In conclusion, nanopore technology is able to detect variants missed by Illumina sequencing, and has the potential to yield new findings of interest in a case of a child with suspected cancer predisposition syndrome.
Citation Format: Allison R. Cheney, Jean Monlong, Holly C. Beale, Hugh Olsen, Ellen Towle Kephart, Katrina Learned, Shanna White, Julian A. Martinez-Agosto, Noah Federman, Mark Akeson, Miten Jain, Vivian Y. Chang, Olena M. Vaske. Long-read sequencing characterization of a patient with bilateral Wilms tumor of unknown etiology [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 261.
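The step of keeping only SVs detected by both callers can be illustrated with a small sketch. The abstract does not say how calls from Sniffles and SVIM were matched, so the criteria below (same chromosome and type, breakpoints within `max_dist`, similar lengths) and all names, thresholds, and coordinates are hypothetical choices made for illustration only.

```python
from typing import NamedTuple

class SV(NamedTuple):
    chrom: str
    pos: int
    svtype: str   # "INS" or "DEL"
    length: int   # size in base pairs

def is_match(a, b, max_dist=500, min_len_ratio=0.7):
    """Assumed matching rule: same chromosome and type, nearby, similar size."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    shorter, longer = sorted((a.length, b.length))
    return longer > 0 and shorter / longer >= min_len_ratio

def consensus(calls_a, calls_b, max_dist=500, min_len_ratio=0.7):
    """Calls from the first caller that are supported by the second caller."""
    return [
        a for a in calls_a
        if any(is_match(a, b, max_dist, min_len_ratio) for b in calls_b)
    ]

# Hypothetical calls from two callers (coordinates are made up).
caller_one = [
    SV("chr1", 100_100, "INS", 300),
    SV("chr2", 1_000_000, "DEL", 120),
    SV("chr5", 500_000, "DEL", 60),
]
caller_two = [
    SV("chr1", 100_140, "INS", 310),    # close in position and size
    SV("chr2", 1_000_900, "DEL", 120),  # too far away
    SV("chr5", 500_010, "INS", 60),     # different type
]

print(consensus(caller_one, caller_two))
```

The pairwise scan is quadratic; with tens of thousands of calls per caller, sorting by chromosome and position or using an interval index would be the more practical design.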
|
9
|
The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets. Gigascience 2021; 10:6169410. [PMID: 33712853 PMCID: PMC7955155 DOI: 10.1093/gigascience/giab011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/27/2020] [Revised: 12/27/2020] [Accepted: 02/07/2021] [Indexed: 01/22/2023] Open
Abstract
Background: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis.
Findings: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1–77% of all reads (median [IQR], 3% [3–6%]); duplicate reads constitute 3–100% of mapped reads (median [IQR], 27% [13–43%]); and non-exonic reads constitute 4–97% of mapped, non-duplicate reads (median [IQR], 25% [16–37%]). MEND reads constitute 0–79% of total reads (median [IQR], 50% [30–61%]).
Conclusions: Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
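The nested denominators in the Findings (unmapped as a share of all reads, duplicates as a share of mapped reads, non-exonic as a share of mapped non-duplicate reads, MEND as a share of total reads) can be made concrete with a short sketch. It is not the authors' custom script; the class, field names, and read counts are hypothetical, and it assumes the counts have already been obtained from upstream tools.

```python
from dataclasses import dataclass

@dataclass
class ReadCounts:
    total: int          # all sequenced reads
    mapped: int         # reads that map to the genome
    mapped_nondup: int  # mapped reads that are not duplicates
    mend: int           # mapped, exonic, non-duplicate reads

def read_fractions(c):
    """Fractions using the same denominators as the abstract's Findings."""
    return {
        "unmapped_of_total": (c.total - c.mapped) / c.total,
        "duplicate_of_mapped": (c.mapped - c.mapped_nondup) / c.mapped,
        "nonexonic_of_mapped_nondup": (c.mapped_nondup - c.mend) / c.mapped_nondup,
        "mend_of_total": c.mend / c.total,
    }

# Hypothetical sample: 100 million total reads.
counts = ReadCounts(
    total=100_000_000,
    mapped=96_000_000,
    mapped_nondup=72_000_000,
    mend=54_000_000,
)

for name, value in read_fractions(counts).items():
    print(f"{name}: {value:.2%}")
```

In this made-up sample, a nominal depth of 100 million reads corresponds to 54 million MEND reads, which is the kind of gap between total and informative depth the abstract argues should be reported.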
|
10
|
Identification of a differentiation stall in epithelial mesenchymal transition in histone H3-mutant diffuse midline glioma. Gigascience 2020; 9:giaa136. [PMID: 33319914 PMCID: PMC7736793 DOI: 10.1093/gigascience/giaa136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/23/2020] [Revised: 08/17/2020] [Accepted: 11/05/2020] [Indexed: 01/04/2023] Open
Abstract
Background: Diffuse midline gliomas with histone H3 K27M (H3K27M) mutations occur in early childhood and are marked by an invasive phenotype and global decrease in H3K27me3, an epigenetic mark that regulates differentiation and development. H3K27M mutation timing and effect on early embryonic brain development are not fully characterized.
Results: We analyzed multiple publicly available RNA sequencing datasets to identify differentially expressed genes between H3K27M and non-K27M pediatric gliomas. We found that genes involved in the epithelial-mesenchymal transition (EMT) were significantly overrepresented among differentially expressed genes. Overall, the expression of pre-EMT genes was increased in the H3K27M tumors as compared to non-K27M tumors, while the expression of post-EMT genes was decreased. We hypothesized that H3K27M may contribute to gliomagenesis by stalling an EMT required for early brain development, and evaluated this hypothesis by using another publicly available dataset of single-cell and bulk RNA sequencing data from developing cerebral organoids. This analysis revealed similarities between H3K27M tumors and pre-EMT normal brain cells. Finally, a previously published single-cell RNA sequencing dataset of H3K27M and non-K27M gliomas revealed subgroups of cells at different stages of EMT. In particular, H3.1K27M tumors resemble a later EMT stage compared to H3.3K27M tumors.
Conclusions: Our data analyses indicate that this mutation may be associated with a differentiation stall evident from the failure to proceed through the EMT-like developmental processes, and that H3K27M cells preferentially exist in a pre-EMT cell phenotype. This study demonstrates how novel biological insights could be derived from combined analysis of several previously published datasets, highlighting the importance of making genomic data available to the community in a timely manner.
|
11
|
Comparative Transcriptomics to Identify Targeted Therapy Candidates in High Grade Glioma. Neurosurgery 2020. [DOI: 10.1093/neuros/nyaa447_871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
|
12
|
EPCO-23. COMPARATIVE TRANSCRIPTOMICS TO IDENTIFY TARGETED THERAPY CANDIDATES IN HIGH GRADE GLIOMA. Neuro Oncol 2020. [DOI: 10.1093/neuonc/noaa215.302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
Genomic characterization is often used for the identification of therapeutic targets in tumors. Recently, comparative transcriptomics has begun to be utilized for this purpose. In this pilot, we compare the transcriptome of a patient with recurrent high grade glioma (HGG) to our cohort to identify potential therapies. We reviewed transcriptomic profiles from patients who had resection of HGG at our institution over the past year as well as the UCSC cancer compendium. Briefly, tumor RNA was extracted from embedded tumor tissue sections with tumor cellularity higher than 20%. RNA libraries were sequenced to obtain approximately 65 million reads on an Illumina HiSeq 4000 System utilizing patterned flow cell technology. The RNA profile of a 24-year-old male with Li-Fraumeni syndrome and recurrent HGG with leptomeningeal spread was analyzed by comparative transcriptomics to identify targets. A Bayesian statistical framework for gene expression outlier detection was used. These comparisons allowed for the identification of genes and pathways that are significantly overexpressed. Our internal HGG cohort consisted of 44 adult patients and was evenly distributed among the 4 HGG Verhaak subtypes. Our patient of interest had druggable outlier expression in HDAC1, STAT1 and STAT2 in comparison to our internal cohort, indicating vorinostat and ruxolitinib as potential therapies, respectively. We then compared our patient of interest to 12,747 patients in the cancer compendium, and STAT2 expression was high but not an outlier. In comparison to 738 glioma samples, STAT1 and STAT2 were outliers but HDAC1 was not, again indicating ruxolitinib as a potential targeted therapy. The patient did not have outlier expression in Notch transcriptional targets or immune checkpoint biomarkers when compared to all cohorts. In conclusion, comparative transcriptomics can identify therapeutic targets in a patient with recurrent HGG even in small cohorts.
In our pilot, we identified ruxolitinib as a potential candidate to treat leptomeningeal recurrence.
|
13
|
Abstract 5464: Determining accuracy of RNA sequencing data for gene expression profiling of single samples. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-5464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Gene expression analysis of single samples shows increasing promise for clinical applications. However, obtaining high quality RNA from a human tumor sample can be challenging because medical, surgical, and pathological requirements often lead to sparse or degraded RNA. The variability in RNA quality presents challenges for defining input sample requirements, which are needed to calculate the sensitivity, specificity and reference ranges required for a Clinical Laboratory Improvement Amendments (CLIA)-approved test.
Clinical analysis of a single RNA-Seq dataset for the purpose of gene expression profiling involves not only the patient's sample, but a comparison cohort. We use 12,236 total tumor samples and require at least 20 samples for within-disease comparisons. Many of these samples do not have associated metadata about the quality of the sample, and so we have prioritized quality measures that can be derived from the sequence data alone.
To characterize the variability present in RNA-Seq datasets, we analyzed paired-end Illumina RNA sequencing (RNA-Seq) data from 1,088 tumor samples from 29 data providers. We categorized reads based on where and how well they map to the genome, as well as by their PCR duplicate status. We defined reference ranges for five types of reads found in sequencing data: unmapped (0-13%); multi-mapped (2-15%); mapped duplicate (2-66%); mapped non-exonic (0-26%); and mapped, exonic, non-duplicate (MEND, 27-76%). Only 64% of the 1,088 tumor samples had read type fractions within the reference ranges. Of the remainder, most exceeded the reference ranges of more than one type of read.
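A check of one sample against the five reference ranges quoted above might look like the sketch below. The ranges are taken from the abstract; the function name, the dictionary keys, and the example sample are hypothetical, and the sketch assumes each read type is expressed as a fraction of total reads, which the abstract does not state explicitly.

```python
# Reference ranges quoted in the abstract, expressed as fractions.
REFERENCE_RANGES = {
    "unmapped": (0.00, 0.13),
    "multi_mapped": (0.02, 0.15),
    "mapped_duplicate": (0.02, 0.66),
    "mapped_non_exonic": (0.00, 0.26),
    "mend": (0.27, 0.76),
}

def out_of_range(fractions):
    """Return the read types whose fraction falls outside its reference range."""
    flagged = []
    for read_type, (low, high) in REFERENCE_RANGES.items():
        value = fractions[read_type]
        if not (low <= value <= high):
            flagged.append(read_type)
    return flagged

# Hypothetical sample dominated by duplicate reads.
example = {
    "unmapped": 0.05,
    "multi_mapped": 0.04,
    "mapped_duplicate": 0.70,
    "mapped_non_exonic": 0.06,
    "mend": 0.15,
}

print(out_of_range(example))
```

A sample like this one fails two ranges at once, which matches the abstract's observation that most out-of-range samples deviate in more than one read type.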
We then measured the relationship of sensitivity and specificity to input MEND read depth. We subsampled 5 deeply sequenced samples. With each subsample, we identified exceptionally highly expressed genes and samples with similar gene expression profiles. With subsampling to 20 million MEND reads, we detected over-expressed genes (“up-outlier” genes) with a median sensitivity of 96.1% and specificity of 99.8%; sample similarity had 96.6% sensitivity and 100.0% specificity. We estimate that a sample sequenced to a depth of 70 million total reads will typically have sufficient data for the up-outlier and sample-similarity gene expression analysis assays described here.
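The sensitivity and specificity figures above compare calls made on a subsample with calls made on the deeply sequenced parent sample. A minimal sketch of that comparison for up-outlier genes follows; it assumes the full-depth calls are treated as the reference, and the function name and gene identifiers are invented for illustration.

```python
def sensitivity_specificity(reference_calls, subsample_calls, all_genes):
    """Compare subsample outlier calls against full-depth calls.

    All three arguments are sets of gene identifiers; full-depth calls
    are treated as the truth (an assumption of this sketch).
    """
    tp = len(reference_calls & subsample_calls)
    fn = len(reference_calls - subsample_calls)
    fp = len(subsample_calls - reference_calls)
    tn = len(all_genes - reference_calls - subsample_calls)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical example: 100 genes, 10 up-outliers at full depth.
all_genes = {f"g{i}" for i in range(100)}
full_depth = {f"g{i}" for i in range(10)}
subsampled = {f"g{i}" for i in range(9)} | {"g50"}  # misses g9, adds g50

sens, spec = sensitivity_specificity(full_depth, subsampled, all_genes)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because most genes are not outliers, specificity stays close to 100% even with a few false calls, so sensitivity is usually the more informative of the two numbers when choosing a read depth.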
With this analysis, we have identified a conservative approach to measuring the quality of RNA-Seq read data, which can then be used to define the sensitivity and specificity of single-sample assays to support their ultimate clinical adoption.
Citation Format: Holly C. Beale, Jacquelyn M. Roger, Matthew A. Cattle, Liam T. McKay, Katrina Learned, Geoff Lyle, Ellen T. Kephart, Rob Currie, Du Linh Lam, Lauren Sanders, Jacob Pfeil, John Vivian, Isabel Bjork, Sofie R. Salama, David Haussler, Olena M. Vaske. Determining accuracy of RNA sequencing data for gene expression profiling of single samples [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 5464.
|
14
|
Abstract 6154: H3K27M gliomas are characterized by a stall in the epithelial-mesenchymal transition. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-6154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Pediatric diffuse midline gliomas are lethal cancers, the majority of which harbor the H3 p.K27M mutation. Although the mutation, as a disease driver, has potential implications for the treatment of diffuse midline glioma, its timing, cell type of origin, and effect on early embryonic brain development are poorly understood. The purpose of our study is to elucidate the molecular mechanisms by which the histone H3 p.K27M mutation drives tumorigenesis of pediatric diffuse midline gliomas through the analysis of genomic datasets. Here, we performed differential RNA sequencing gene expression analysis of a cohort of H3K27M and H3 wild type (WT) pediatric diffuse midline gliomas, revealing that genes in the epithelial-mesenchymal transition (EMT) pathway were significantly differentially expressed between the mutant and WT tumors. Several EMTs are required for normal brain development. Overall, pre-EMT genes, including the master regulator of EMT SNAI1, were overexpressed in H3K27M tumors compared to the WT tumors, while post-EMT genes were underexpressed. We hypothesized that the H3 p.K27M mutation may lead to gliomagenesis by inducing a stall in the EMT in early brain development. To test this hypothesis, we examined published single-cell RNA sequencing data from pediatric diffuse midline gliomas alongside similar data from organoid models of neural development, collected from multiple developmental timepoints. This analysis revealed transcriptional similarities between H3K27M and pre-EMT neural stem cells. Currently, we are investigating the expression of EMT markers in H3K27M and WT pediatric glioma primary cell lines, using Western blotting, RT-PCR, and CRISPRi screening. In conclusion, we observed aberrant expression of genes involved in EMT in H3K27M pediatric gliomas. Our observations are consistent with a model in which the H3 p.K27M mutation is associated with a pre-EMT cell phenotype, potentially due to an arrest in the EMT pathway or de-differentiation of mature astrocytes.
Citation Format: Allison R. Cheney, Lauren M. Sanders, Lucas Seninge, Holly C. Beale, Ellen Towle Kephart, Jacob Pfeil, Katrina Learned, A. Geoffrey Lyle, Isabel Bjork, David Haussler, Sofie R. Salama, Olena M. Vaske. H3K27M gliomas are characterized by a stall in the epithelial-mesenchymal transition [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 6154.
|
15
|
Abstract B06: Candidate differentiation stall in epithelial mesenchymal transition in H3K27M diffuse midline glioma. Cancer Res 2020. [DOI: 10.1158/1538-7445.pedca19-b06] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The purpose of our study was to elucidate the molecular mechanisms by which the histone H3 K27M mutation drives tumorigenesis of pediatric gliomas. Though it has potential implications on the treatment of diffuse midline glioma as a disease driver, the timing, cell type of origin, and effect of the H3K27M mutation on early embryonic brain development are not fully characterized. Here, we performed differential expression analysis on a cohort of H3K27M and H3WT pediatric gliomas, revealing that genes in the epithelial-mesenchymal transition (EMT) pathway were significantly differentially expressed. SNAI1, the EMT master regulator, was significantly overexpressed in the H3K27M tumor cohort. Overall, pre-EMT genes were overexpressed in H3K27M tumors while post-EMT genes were underexpressed. We hypothesized that H3K27M may lead to gliomagenesis by stalling an EMT in early brain development and employed single-cell and bulk RNA sequencing data from cerebral organoids at multiple developmental timepoints to test this hypothesis. We observed that a long noncoding RNA (lncRNA) signature identified as transiently expressed in early brain development was preferentially expressed in H3K27M tumors. Cell type-specific lncRNA signatures had higher expression in H3K27M tumors for pre-EMT cell types, and higher expression in H3WT tumors for post-EMT cell types. Finally, t-SNE clustering of single-cell glioma RNA sequencing data with single-cell organoid data revealed transcriptional similarities between H3K27M and pre-EMT neural stem cells. In conclusion, we observed aberrant activity of the EMT in H3K27M gliomas. Our data suggest that the H3K27M mutation is associated with a pre-EMT cell phenotype, and that this mutation may cause EMT arrest or de-differentiation.
Citation Format: Allison R. Cheney, Lauren M. Sanders, Lucas Seninge, Holly C. Beale, Ellen Towle Kephart, Jacob Pfeil, Katrina Learned, A. Geoffrey Lyle, Isabel Bjork, David Haussler, Sofie R. Salama, Olena M. Vaske. Candidate differentiation stall in epithelial mesenchymal transition in H3K27M diffuse midline glioma [abstract]. In: Proceedings of the AACR Special Conference on the Advances in Pediatric Cancer Research; 2019 Sep 17-20; Montreal, QC, Canada. Philadelphia (PA): AACR; Cancer Res 2020;80(14 Suppl):Abstract nr B06.
|
16
|
Abstract
IMPORTANCE Pediatric cancers are epigenetic diseases; therefore, considering tumor gene expression information is necessary for a complete understanding of the tumorigenic processes. OBJECTIVE To evaluate the feasibility and utility of incorporating comparative gene expression information into the precision medicine framework for difficult-to-treat pediatric and young adult patients with cancer. DESIGN, SETTING, AND PARTICIPANTS This cohort study was conducted as a consortium between the University of California, Santa Cruz (UCSC) Treehouse Childhood Cancer Initiative and clinical genomic trials. RNA sequencing (RNA-Seq) data were obtained from the following 4 clinical sites and analyzed at UCSC: British Columbia Children's Hospital (n = 31), Lucile Packard Children's Hospital at Stanford University (n = 80), CHOC Children's Hospital and Hyundai Cancer Institute (n = 46), and the Pacific Pediatric Neuro-Oncology Consortium (n = 24). The study dates were January 1, 2016, to March 22, 2017. EXPOSURES Participants underwent tumor RNA-Seq profiling as part of 4 separate clinical trials at partner hospitals. The UCSC either downloaded RNA-Seq data from a partner institution for analysis in the cloud or provided a Docker pipeline that performed the same analysis at a partner institution. The UCSC then compared each participant's tumor RNA-Seq profile with more than 11 000 uniformly analyzed tumor profiles from pediatric and young adult patients with cancer, downloaded from public data repositories. These comparisons were used to identify genes and pathways that are significantly overexpressed in each patient's tumor. Results of the UCSC analysis were presented to clinical partners. 
MAIN OUTCOMES AND MEASURES Feasibility of a third-party institution (UCSC Treehouse Childhood Cancer Initiative) to obtain tumor RNA-Seq data from patients, conduct comparative analysis, and present analysis results to clinicians; and proportion of patients for whom comparative tumor gene expression analysis provided useful clinical and biological information. RESULTS Among 144 samples from children and young adults (median age at diagnosis, 9 years; range, 0-26 years; 72 of 118 [61.0%] male [26 patients sex unknown]) with a relapsed, refractory, or rare cancer treated on precision medicine protocols, RNA-Seq-derived gene expression was potentially useful for 99 of 144 samples (68.8%) compared with DNA mutation information that was potentially useful for only 34 of 74 samples (45.9%). CONCLUSIONS AND RELEVANCE This study's findings suggest that tumor RNA-Seq comparisons may be feasible and highlight the potential clinical utility of incorporating such comparisons into the clinical genomic interpretation framework for difficult-to-treat pediatric and young adult patients with cancer. The study also highlights, for the first time, the potential clinical utility of harmonized publicly available genomic data sets.
|
17
|
Abstract
Although increasingly recognized as critical to genomic research, genomic data sharing is hindered by an absence of standards regarding timing, patient privacy, use agreement standards, and data characterization and quality. Only after months of identifying, permissioning for use, committing to terms restricting use and sharing, downloading, and assessing quality, is it possible to know whether or not a dataset can be used. In this paper, we evaluate the barriers to data sharing based on the Treehouse experience and offer recommendations for use agreement standards, data characterization and metadata standardization to enhance data sharing and outcomes for all pediatric cancer patients.
|
18
|
GENE-11. SHARED LONG NON-CODING RNA DYSREGULATION IN HISTONE H3 K27M GLIOMAS AND PF-A EPENDYMOMAS. Neuro Oncol 2019. [DOI: 10.1093/neuonc/noz036.082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
|
19
|
DIPG-07. GENOMIC ANALYSIS METHODS FOR IDENTIFICATION OF CANCER DRIVER PATHWAYS IN CHILDHOOD BRAIN TUMORS. Neuro Oncol 2018. [DOI: 10.1093/neuonc/noy059.101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
|
20
|
Abstract LB-338: A critical evaluation of genomic data sharing: Barriers to accessing pediatric cancer genomic datasets: a Treehouse Childhood Cancer Initiative experience. Cancer Res 2017. [DOI: 10.1158/1538-7445.am2017-lb-338] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
Genomic data sharing is increasingly recognized as critical to genomic research. The need is acute in pediatric cancer research due to the rarity of pediatric tumor types and paucity of pediatric cancer data, and in translational research to assess the impact of genomic research on human health. However, genomic data sharing is hindered by an absence of standards regarding timing, patient privacy, use agreement standards, and data characterization and quality. At UC Santa Cruz Treehouse Childhood Cancer Initiative (treehousegenomics.soe.ucsc.edu), we examine individual pediatric cancer tumor RNA sequencing profiles against a database of over 11,000 tumor RNA sequencing profiles from public genomic datasets such as The Cancer Genome Atlas, Therapeutically Applicable Research To Generate Effective Treatments, International Cancer Genome Consortium, and Medulloblastoma Advanced Genomics International, and pediatric cancer clinical trials with which we partner, such as those at Stanford University, UC San Francisco, Children’s Hospital of Orange County, and British Columbia Children’s Hospital. For over 18 months, we have worked systematically to enhance the Treehouse dataset by adding pediatric cancer data and presently underrepresented tumor types. The NIH and other leading funding agencies now regularly require grantees to make genomic data generated available to the research community, either post-publication or after an embargo period. We have combed websites and public repositories, searched PubMed, and contacted researchers directly. Finding data requires a mining of literature, often with limited information, and initiating the many different processes for requesting permission for these datasets, with different and often cumbersome data use obligations. The combination of cryptically named datasets, multiple data types and the practice of grouping datasets from multiple papers under a single study accession makes zeroing in on the correct dataset challenging. 
Downloading the genomic data is time-consuming, such that a dataset of fewer than 100 files can take up to a week to download under optimal conditions. Matching metadata is inconsistently available and often vague, sparse or error-ridden. Only after months of identifying, permissioning for use, committing to use- and sharing-restricting terms, and downloading the genomic data and metadata, is it possible to assess the quality, often discovering that data quality is low. We evaluate the barriers to data sharing based on the Treehouse experience and offer guidelines for timing, use agreement standards, and data characterization and quality, to enhance data sharing and outcomes for all pediatric cancer patients.
Citation Format: Katrina Learned, Ann Durbin, Robert Currie, Holly Beale, Du Linh Lam, Theodore Goldstein, Sofie R. Salama, David Haussler, Olena Morozova, Isabel Bjork. A critical evaluation of genomic data sharing: Barriers to accessing pediatric cancer genomic datasets: a Treehouse Childhood Cancer Initiative experience [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr LB-338. doi:10.1158/1538-7445.AM2017-LB-338
|
21
|
The UCSC Genome Browser database: 2016 update. Nucleic Acids Res 2015; 44:D717-25. [PMID: 26590259 PMCID: PMC4702902 DOI: 10.1093/nar/gkv1275] [Citation(s) in RCA: 334] [Impact Index Per Article: 37.1] [Received: 09/11/2015] [Accepted: 11/03/2015] [Indexed: 01/19/2023] Open
Abstract
For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the “Data Integrator”, for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment.
|
22
|
Abstract LB-212: Treehouse Childhood Cancer Project: a resource for sharing and multiple cohort analysis of pediatric cancer genomics data. Cancer Res 2015. [DOI: 10.1158/1538-7445.am2015-lb-212] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Deep sequencing of adult and pediatric tumors revealed that different cancers share common genetic mutations. Aside from sequence mutation, gene expression, copy number, and epigenetic mechanisms contribute to tumorigenesis, and integrating this information may reveal more aberrant signaling pathways than analysis of mutations alone. Significantly, agents targeting specific pathways may be effective against multiple malignancies, regardless of the mechanisms of pathway deregulation. These observations suggest that pediatric cancer patients may benefit from targeted therapies developed for adults. Since the development of pediatric-cancer-specific therapies is hindered by the limited involvement of pharmaceutical companies and small patient cohorts, repositioning drugs designed for adult tumors remains the fastest and most effective way to bring new treatment options to pediatric cancer patients.
While pediatric tumors have been characterized by genome-wide technologies, the data from these studies are typically under-utilized beyond the initial single cohort, single data type analyses. Consequently, we still lack a comprehensive picture of the molecular pathways that contribute to pediatric cancer in each patient, especially those that can be targeted in the clinic. Integrating multiple datasets is essential for assembling large enough patient cohorts to achieve an understanding of cancer-driving molecular aberrations in individual patients.
The Treehouse Childhood Cancer Project consolidates gene expression, mutation and copy number datasets under the UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu), and currently contains data from over 1000 pediatric tumors from TARGET and other studies. Treehouse enables mining these data alongside the data from adult cancers studied by The Cancer Genome Atlas consortium (TCGA). This is accomplished using bioinformatics tools developed for the TCGA Pan-Cancer Analysis Working Group and aimed at identifying situations where a subset of pediatric tumors may be driven by similar molecular pathways as adult tumors. We have assembled a consortium of researchers who plan to both contribute data to the Treehouse platform and apply Treehouse data in their analyses. These include John Maris (Children's Hospital of Philadelphia), Michael Taylor (Hospital for Sick Children, Toronto), Poul Sorensen (University of British Columbia), Timothy Triche (Children's Hospital Los Angeles), Soheil Meshinchi (Fred Hutchinson Cancer Research Center), Doug Hawkins (Seattle Children's Hospital), Javed Khan (NIH Center for Cancer Research), Ching Lao (Texas Children's Hospital), Leonard Sender (UC Irvine, Children's Hospital of Orange County), Alejandro Sweet-Cordero (Stanford School of Medicine), and D.W. Parsons (Baylor College of Medicine).
In this submission, we demonstrate the utility of the Treehouse resource by analyzing the neuroblastoma TARGET cohort in the context of adult TCGA cancers. This work presents a proof of concept that cross-cancer multiple cohort analysis can lead to new insights into pediatric malignancies.
Citation Format: Olena Morozova, Yulia Newton, Melissa Cline, Jingchun Zhu, Katrina Learned, Josh Stuart, Sofie Salama, Robert Arceci, David Haussler. Treehouse Childhood Cancer Project: a resource for sharing and multiple cohort analysis of pediatric cancer genomics data. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr LB-212. doi:10.1158/1538-7445.AM2015-LB-212
|
23
|
A comparative encyclopedia of DNA elements in the mouse genome. Nature 2015; 515:355-64. [PMID: 25409824 PMCID: PMC4266106 DOI: 10.1038/nature13992] [Citation(s) in RCA: 1135] [Impact Index Per Article: 126.1] [Received: 02/03/2014] [Accepted: 10/24/2014] [Indexed: 12/11/2022]
Abstract
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
|
24
|
Abstract
Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.
|
25
|
The UCSC Genome Browser database: 2014 update. Nucleic Acids Res 2014; 42:D764-70. [PMID: 24270787 PMCID: PMC3964947 DOI: 10.1093/nar/gkt1168] [Citation(s) in RCA: 550] [Impact Index Per Article: 55.0] [Received: 09/18/2013] [Revised: 10/30/2013] [Accepted: 10/30/2013] [Indexed: 12/17/2022] Open
Abstract
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.
|
26
|
The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res 2013; 41:D64-9. [PMID: 23155063 PMCID: PMC3531082 DOI: 10.1093/nar/gks1048] [Citation(s) in RCA: 612] [Impact Index Per Article: 55.6] [Received: 09/19/2012] [Accepted: 10/08/2012] [Indexed: 11/14/2022] Open
Abstract
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation 'tracks' are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal.
|
27
|
ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res 2013; 41:D56-63. [PMID: 23193274 PMCID: PMC3531152 DOI: 10.1093/nar/gks1172] [Citation(s) in RCA: 610] [Impact Index Per Article: 55.5] [Received: 09/21/2012] [Revised: 10/26/2012] [Accepted: 10/28/2012] [Indexed: 02/07/2023] Open
Abstract
The Encyclopedia of DNA Elements (ENCODE), http://encodeproject.org, has completed its fifth year of scientific collaboration to create a comprehensive catalog of functional elements in the human genome, and its third year of investigations in the mouse genome. Since the last report in this journal, the ENCODE human data repertoire has grown by 898 new experiments (totaling 2886), accompanied by a major integrative analysis. In the mouse genome, results from 404 new experiments became available this year, increasing the total collected during the course of the project to 583. The University of California, Santa Cruz, makes this data available on the public Genome Browser http://genome.ucsc.edu for visual browsing and data mining. Downloads of both raw and processed data files are supported. The ENCODE portal provides specialized tools and information about the ENCODE data sets.
|
28
|
Abstract
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
|
29
|
ENCODE whole-genome data in the UCSC Genome Browser: update 2012. Nucleic Acids Res 2012; 40:D912-7. [PMID: 22075998 PMCID: PMC3245183 DOI: 10.1093/nar/gkr1012] [Citation(s) in RCA: 207] [Impact Index Per Article: 17.3] [Received: 09/15/2011] [Revised: 10/18/2011] [Accepted: 10/20/2011] [Indexed: 11/23/2022] Open
Abstract
The Encyclopedia of DNA Elements (ENCODE) Consortium is entering its 5th year of production-level effort generating high-quality whole-genome functional annotations of the human genome. The past year has brought the ENCODE compendium of functional elements to critical mass, with a diverse set of 27 biochemical assays now covering 200 distinct human cell types. Within the mouse genome, which has been under study by ENCODE groups for the past 2 years, 37 cell types have been assayed. Over 2000 individual experiments have been completed and submitted to the Data Coordination Center for public use. UCSC makes this data available on the quality-reviewed public Genome Browser (http://genome.ucsc.edu) and on an early-access Preview Browser (http://genome-preview.ucsc.edu). Visual browsing, data mining and download of raw and processed data files are all supported. An ENCODE portal (http://encodeproject.org) provides specialized tools and information about the ENCODE data sets.
|
30
|
The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res 2012; 40:D918-23. [PMID: 22086951 PMCID: PMC3245018 DOI: 10.1093/nar/gkr1055] [Citation(s) in RCA: 273] [Impact Index Per Article: 22.8] [Received: 09/15/2011] [Revised: 10/18/2011] [Accepted: 10/25/2011] [Indexed: 01/05/2023] Open
Abstract
The University of California Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic data sets. In the past year, the local database has been updated with four new species assemblies, and we anticipate another four will be released by the end of 2011. Further, a large number of annotation tracks have been either added, updated by contributors, or remapped to the latest human reference genome. Among these are new phenotype and disease annotations, UCSC genes, and a major dbSNP update, which required new visualization methods. Growing beyond the local database, this year we have introduced 'track data hubs', which allow the Genome Browser to provide access to remotely located sets of annotations. This feature is designed to significantly extend the number and variety of annotation tracks that are publicly available for visualization and analysis from within our site. We have also introduced several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image.
|
31
|
Abstract
The ENCODE project is an international consortium with a goal of cataloguing all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) at the University of California, Santa Cruz serves as the central repository for ENCODE data. In this role, the DCC offers a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. This data helps illuminate transcription factor-binding sites, histone marks, chromatin accessibility, DNA methylation, RNA expression, RNA binding and other cell-state indicators. It includes sequences with quality scores, alignments, signals calculated from the alignments, and in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser (http://genome.ucsc.edu/). ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay. The ENCODE web portal at UCSC (http://encodeproject.org/) provides information about the ENCODE data and links for access.
|
32
|
Abstract
The University of California, Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online access to a database of genomic sequence and annotation data for a wide variety of organisms. The Browser also has many tools for visualizing, comparing and analyzing both publicly available and user-generated genomic data sets, aligning sequences and uploading user data. Among the features released this year are a gene search tool and annotation track drag-reorder functionality as well as support for BAM and BigWig/BigBed file formats. New display enhancements include overlay of multiple wiggle tracks through use of transparent coloring, options for displaying transformed wiggle data, a 'mean+whiskers' windowing function for display of wiggle data at high zoom levels, and more color schemes for microarray data. New data highlights include seven new genome assemblies, a Neandertal genome data portal, phenotype and disease association data, a human RNA editing track, and a zebrafish Conservation track. We also describe updates to existing tracks.
|
33
|
Abstract
The Encyclopedia of DNA Elements (ENCODE) project is an international consortium of investigators funded to analyze the human genome with the goal of producing a comprehensive catalog of functional elements. The ENCODE Data Coordination Center at The University of California, Santa Cruz (UCSC) is the primary repository for experimental results generated by ENCODE investigators. These results are captured in the UCSC Genome Bioinformatics database and download server for visualization and data mining via the UCSC Genome Browser and companion tools (Rhead et al. The UCSC Genome Browser Database: update 2010, in this issue). The ENCODE web portal at UCSC (http://encodeproject.org or http://genome.ucsc.edu/ENCODE) provides information about the ENCODE data and convenient links for access.
|
34
|
Abstract
The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users’ own annotation data. As of September 2009, genomic sequence and a basic set of annotation ‘tracks’ are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools.
|