1
CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024]
Abstract
BACKGROUND The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
2
Federated Analysis for Privacy-Preserving Data Sharing: A Technical and Legal Primer. Annu Rev Genomics Hum Genet 2023; 24:347-368. [PMID: 37253596 PMCID: PMC10846631 DOI: 10.1146/annurev-genom-110122-084756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Continued advances in precision medicine rely on the widespread sharing of data that relate human genetic variation to disease. However, data sharing is severely limited by legal, regulatory, and ethical restrictions that safeguard patient privacy. Federated analysis addresses this problem by transferring the code to the data, providing the technical and legal capability to analyze the data within their secure home environment rather than transferring the data to another institution for analysis. This allows researchers to gain new insights from data that cannot be moved, while respecting patient privacy and the data stewards' legal obligations. Because federated analysis is a technical solution to the legal challenges inherent in data sharing, the technology and policy implications must be evaluated together. Here, we summarize the technical approaches to federated analysis and provide a legal analysis of their policy implications.
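The core idea of federated analysis is that the computation travels to each data holder and only aggregate results come back. The minimal Python sketch below illustrates it. The `Site` class, the record layout, and the small-count suppression threshold (`min_count`) are hypothetical choices made for illustration. They are not the technical approaches surveyed in the article, and real deployments add authentication, auditing, and formal disclosure controls.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical record layout: one dict per patient, held only at its home site.
Record = Dict[str, str]
Query = Callable[[List[Record]], int]


class Site:
    """A data-holding institution that runs submitted code in place."""

    def __init__(self, name: str, records: List[Record], min_count: int = 5):
        self.name = name
        self._records = records     # raw records never leave the site
        self.min_count = min_count  # assumed small-cell suppression threshold

    def run(self, query: Query) -> Optional[int]:
        """Run the query locally and release only an aggregate count.

        Counts below min_count are withheld (None), an assumed
        disclosure-control rule for this sketch.
        """
        count = query(self._records)
        return count if count >= self.min_count else None


def federated_count(sites: List[Site], query: Query) -> int:
    """Send the same query to every site and sum the released counts."""
    released = (site.run(query) for site in sites)
    return sum(c for c in released if c is not None)


def carriers_of(variant: str) -> Query:
    """Build a query that counts records carrying a given (hypothetical) variant."""
    return lambda records: sum(1 for r in records if r.get("variant") == variant)


if __name__ == "__main__":
    site_a = Site("A", [{"variant": "var1"}] * 7)
    site_b = Site("B", [{"variant": "var1"}] * 2 + [{"variant": "var2"}] * 10)
    # Site B's count of 2 falls below the threshold and is withheld.
    print(federated_count([site_a, site_b], carriers_of("var1")))  # 7
```

Suppressing small counts makes the pooled total an underestimate. This is a simple example of the accuracy-versus-privacy trade-offs that any federated design must weigh.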
3
Abstract 1177: Introduction of the GA4GH Variation Representation Specification (VRS) and supporting tools for discovery and exchange of clinical genomic and cytogenomic knowledge in cancers. Cancer Res 2022. [DOI: 10.1158/1538-7445.am2022-1177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Precision oncology is the practice of interpreting the clinical significance of observed molecular changes in patient neoplasms, potentially impacting medical decision making and care. This process is labor-intensive and (among other challenges) involves accurately translating between variation representation conventions from one resource to the next. For example, Copy Number Variation (CNV) may be represented by genomic regions, cytogenomic bands, or gene features, and these differences complicate knowledge matching because no standard covers all of these modalities of observed variation.
The Global Alliance for Genomics and Health (GA4GH; ga4gh.org) is an international collaborative of genomic data sharing initiatives (Driver Projects) developing genomic data sharing standards within a human rights framework. GA4GH recently published the Variation Representation Specification (VRS; pronounced “verse”), a standard for the computational representation of biomolecular variation. VRS comprises a terminology, a schema, and associated conventions for creating uniquely identifiable and federatable representations of molecular variation. VRS has formal data classes well-suited to distinguishing variation on a single molecule (e.g. tandem duplications) from variation measured at a systemic level (e.g. genome-wide copy number variation). In addition to molecular sequence variation, VRS also supports variation on cytogenetic coordinate systems and genes, making it well-suited to representing variation associated with cancer biomarkers.
We demonstrate the use of VRS to model reported gene-associated CNVs from the AACR Project GENIE cohort, to aid in the computational discovery of evidence from clinico-genomic knowledgebases with genomic or cytogenomic CNV representations.
We highlight the use case of knowledge matching to the Atlas of Genetics and Cytogenetics in Oncology and Haematology (“the Atlas”; atlasgeneticsoncology.org), a cytogenetics resource historically driven by user website navigation. Using VRS search tools we developed for the Variant Interpretation for Cancer Consortium (VICC; cancervariants.org) GA4GH Driver Project, we found that 64% of GENIE samples with reported CNVs matched clinically relevant knowledge in the Atlas. This work was enabled by programmatic search tools leveraging standard VRS object structures, demonstrating how VRS enables collection of real-world evidence across more resources without manual interpretation or custom normalization methods. We conclude with a survey of open-source tools supporting this analysis as well as search of other clinico-genomic knowledgebases with VRS, including CIViC (civicdb.org), BRCA Exchange (brcaexchange.org), and the Molecular Oncology Almanac (moalmanac.org).
Citation Format: Matthew Cannon, Kori Kuzma, James Stevenson, Jiachen Liu, Colin O'Sullivan, Bimal P. Chaudhari, Matthew Brush, Robert R. Freimuth, Tristan Nelson, Michael Baudis, Obi L. Griffith, Malachi Griffith, Lawrence Babb, Melissa S. Cline, Xuelu Liu, Brian Walsh, Alex H. Wagner. Introduction of the GA4GH Variation Representation Specification (VRS) and supporting tools for discovery and exchange of clinical genomic and cytogenomic knowledge in cancers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1177.
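VRS makes objects uniquely identifiable by deriving identifiers from their content. As a rough illustration, the GA4GH computed-identifier scheme (`sha512t24u`) takes a SHA-512 digest of an object's serialization, truncates it to 24 bytes, and base64url-encodes the result. The Python sketch below applies that digest to a simplified serialization that uses sorted keys and compact separators. The allele-like structure and its field names are hypothetical and do not follow the VRS schema or its exact serialization rules, so the digests this sketch produces are not conformant VRS identifiers.

```python
import base64
import hashlib
import json
from typing import Any, Dict


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, base64url-encoded (32 chars)."""
    digest = hashlib.sha512(blob).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


def simplified_digest(obj: Dict[str, Any]) -> str:
    """Digest a dict via a simplified canonical JSON form.

    Sorting keys and removing whitespace makes logically equal objects
    serialize identically, so independent resources computing the digest
    arrive at the same value. This is NOT the exact VRS serialization.
    """
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return sha512t24u(blob)


if __name__ == "__main__":
    # Hypothetical allele-like objects; field names are illustrative only.
    a = {"type": "Allele", "sequence_id": "seq1", "start": 100, "end": 101, "alt": "T"}
    b = {"alt": "T", "end": 101, "start": 100, "sequence_id": "seq1", "type": "Allele"}
    assert simplified_digest(a) == simplified_digest(b)  # key order is irrelevant
    print(len(simplified_digest(a)))  # 32
```

Because the digest depends only on content, two resources that describe the same variant in the same form can compute the same identifier without a shared registry. The converse also holds: the same variant written differently yields a different digest, which is why normalization conventions matter.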
4
Abstract
We promote a shared vision and guide for how and when to federate genomic and health-related data sharing, enabling connections and insights across independent, secure databases. The GA4GH encourages a federated approach wherein data providers have the mandate and resources to share, but where data cannot move for legal or technical reasons. We recommend a federated approach to connect national genomics initiatives into a global network and precision medicine resource.
5
Assessment of blind predictions of the clinical significance of BRCA1 and BRCA2 variants. Hum Mutat 2019; 40:1546-1556. [PMID: 31294896 PMCID: PMC6744348 DOI: 10.1002/humu.23861] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/01/2019] [Revised: 07/01/2019] [Accepted: 07/02/2019] [Indexed: 12/31/2022]
Abstract
Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights on the state of the art of variant prediction and identified specific steps to further advance this state of the art.
6
Abstract
The BRCA Challenge is a long-term data-sharing project initiated within the Global Alliance for Genomics and Health (GA4GH) to aggregate BRCA1 and BRCA2 data to support highly collaborative research activities. Its goal is to generate an informed and current understanding of the impact of genetic variation on cancer risk across the iconic cancer predisposition genes, BRCA1 and BRCA2. Initially, reported variants in BRCA1 and BRCA2 available from public databases were integrated into a single, newly created site, www.brcaexchange.org. The purpose of the BRCA Exchange is to provide the community with a reliable and easily accessible record of variants interpreted for a high-penetrance phenotype. More than 20,000 variants have been aggregated, three times the number found in the next-largest public database at the project’s outset, of which approximately 7,250 have expert classifications. The data set is based on shared information from existing clinical databases—Breast Cancer Information Core (BIC), ClinVar, and the Leiden Open Variation Database (LOVD)—as well as population databases, all linked to a single point of access. The BRCA Challenge has brought together the existing international Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium expert panel, along with expert clinicians, diagnosticians, researchers, and database providers, all with a common goal of advancing our understanding of BRCA1 and BRCA2 variation. Ongoing work includes direct contact with national centers with access to BRCA1 and BRCA2 diagnostic data to encourage data sharing, development of methods suitable for extraction of genetic variation at the level of individual laboratory reports, and engagement with participant communities to enable a more comprehensive understanding of the clinical significance of genetic variation in BRCA1 and BRCA2. 
The goal of this study and paper has been to develop an international resource to generate an informed and current understanding of the impact of genetic variation on cancer risk across the cancer predisposition genes, BRCA1 and BRCA2. Reported variants in BRCA1 and BRCA2 available from public databases were integrated into a single, newly created site, www.brcaexchange.org, to provide a reliable and easily accessible record of variants interpreted for a high-penetrance phenotype.
7
Abstract
Purpose Genetic tests of cancer predisposition genes, BRCA1 and BRCA2, inform significant clinical decisions for both physicians and patients. Most uncovered variants are benign, and determining which few are pathogenic (disease causing) is sometimes challenging and can potentially be inconsistent among laboratories. The ClinVar database makes deidentified clinical variant classifications from multiple laboratories publicly available for comparison and review, per recommendations by the American Medical Association, the American College of Medical Genetics, the National Society of Genetic Counselors, and other organizations. Methods Classifications of more than 2,000 BRCA1/2 variants in ClinVar that represent approximately 22,000 patients were dichotomized as clinically actionable or not actionable and compared among as many as seven laboratories. The properties of these variants and classification differences were investigated in detail. Results Per-variant concordance was 98.5% (CI, 97.9% to 99.0%). All discordant variants were rare; thus, per-patient concordance was estimated to be higher (99.7%). ClinVar facilitated resolution of many of the discordant variants, and concordance increased to 99.0% per variant and 99.8% per patient when reclassified, but not yet resubmitted, variants and submission errors were addressed. Most of the remaining discordances seemed to involve either legitimate differences in expert judgment regarding particular scientific evidence or classifications that predated the availability of important scientific evidence. Conclusion Significant classification disagreements among professional clinical laboratories represented in ClinVar are infrequent yet important. Unrestricted sharing of clinical genetic data allows detailed interlaboratory quality control and peer review, as exemplified by this study.
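A short Python sketch makes the difference between per-variant and per-patient concordance concrete. It assumes that "Pathogenic" and "Likely pathogenic" count as clinically actionable and that all other labels do not. It also assumes that each variant's patient count is known. The variant names and counts in the example are invented for illustration and are not data from the study.

```python
from typing import Dict, List, Tuple

# Assumed dichotomization: P/LP are actionable; VUS, LB and B are not.
ACTIONABLE = {"pathogenic", "likely pathogenic"}


def is_actionable(label: str) -> bool:
    return label.strip().lower() in ACTIONABLE


def concordance(classifications: Dict[str, List[str]],
                patients: Dict[str, int]) -> Tuple[float, float]:
    """Return (per-variant, per-patient) concordance.

    classifications maps each variant to the labels submitted by different
    laboratories; patients maps each variant to the number of patients it
    represents. A variant is concordant when every laboratory falls on the
    same side of the actionable / not-actionable line.
    """
    concordant = {
        variant: len({is_actionable(label) for label in labels}) == 1
        for variant, labels in classifications.items()
    }
    per_variant = sum(concordant.values()) / len(concordant)
    total_patients = sum(patients[v] for v in concordant)
    agreeing_patients = sum(patients[v] for v, ok in concordant.items() if ok)
    return per_variant, agreeing_patients / total_patients


if __name__ == "__main__":
    labs = {
        "var1": ["Pathogenic", "Likely pathogenic"],
        "var2": ["Benign", "Uncertain significance"],
        "var3": ["Likely pathogenic", "Uncertain significance"],  # discordant
        "var4": ["Benign", "Likely benign"],
    }
    carriers = {"var1": 50, "var2": 30, "var3": 2, "var4": 18}
    print(concordance(labs, carriers))  # (0.75, 0.98)
```

In this toy example the single discordant variant is rare, so per-patient concordance exceeds per-variant concordance. The abstract reports the same pattern.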
8
Human genetics special issue on computational molecular medicine. Hum Genet 2015; 134:455-7. [PMID: 25805167 DOI: 10.1007/s00439-015-1545-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
9
A comparative encyclopedia of DNA elements in the mouse genome. Nature 2014; 515:355-64. [PMID: 25409824 PMCID: PMC4266106 DOI: 10.1038/nature13992] [Citation(s) in RCA: 1135] [Impact Index Per Article: 126.1] [Received: 02/03/2014] [Accepted: 10/24/2014] [Indexed: 12/11/2022]
Abstract
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
10
The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database (Oxford) 2014; 2014:bau093. [PMID: 25267794 PMCID: PMC4178372 DOI: 10.1093/database/bau093] [Citation(s) in RCA: 127] [Impact Index Per Article: 12.7] [Indexed: 11/29/2022]
Abstract
The Cancer Genomics Hub (CGHub) is the online repository of the sequencing programs of the National Cancer Institute (NCI), including The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects, with data from 25 different types of cancer. The CGHub currently contains >1.4 PB of data, has grown at an average rate of 50 TB a month and serves >100 TB per week. The architecture of CGHub is designed to support bulk searching and downloading through a Web-accessible application programming interface, enforce patient genome confidentiality in data storage and transmission and optimize for efficiency in access and transfer. In this article, we describe the design of these three components, present performance results for our transfer protocol, GeneTorrent, and finally report on the growth of the system in terms of data stored and transferred, including estimated limits on the current architecture. Our experience-based estimates suggest that centralizing storage and computational resources is more efficient than wide distribution across many satellite labs. Database URL: https://cghub.ucsc.edu
11
The UCSC Genome Browser database: 2014 update. Nucleic Acids Res 2014; 42:D764-70. [PMID: 24270787 PMCID: PMC3964947 DOI: 10.1093/nar/gkt1168] [Citation(s) in RCA: 550] [Impact Index Per Article: 55.0] [Received: 09/18/2013] [Revised: 10/30/2013] [Accepted: 10/30/2013] [Indexed: 12/17/2022]
Abstract
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.
12
Exploring TCGA Pan-Cancer data at the UCSC Cancer Genomics Browser. Sci Rep 2013; 3:2652. [PMID: 24084870 PMCID: PMC3788369 DOI: 10.1038/srep02652] [Citation(s) in RCA: 207] [Impact Index Per Article: 18.8] [Received: 07/25/2013] [Accepted: 08/28/2013] [Indexed: 12/31/2022]
Abstract
The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu) offers interactive visualization and exploration of TCGA genomic, phenotypic, and clinical data, as produced by the Cancer Genome Atlas Research Network. Researchers can explore the impact of genomic alterations on phenotypes by visualizing gene and protein expression, copy number, DNA methylation, somatic mutation and pathway inference data alongside clinical features, Pan-Cancer subtype classifications and genomic biomarkers. Integrated Kaplan–Meier survival analysis helps investigators to assess survival stratification by any of the information.
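For readers unfamiliar with the Kaplan–Meier analysis mentioned above, the estimator multiplies together, at each observed event time t_i, the fraction of at-risk patients who survive: S(t) = product over t_i ≤ t of (1 − d_i/n_i). Here d_i is the number of deaths at t_i and n_i is the number still at risk. The Python sketch below is a generic textbook implementation for illustration, not the browser's own code, and the toy survival times are invented.

```python
from collections import Counter
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return the (time, survival) steps of the Kaplan-Meier estimator.

    times[i] is the follow-up time of patient i; events[i] is True if death
    was observed at that time and False if the patient was censored.
    Patients censored at time t are still counted as at risk at t.
    """
    deaths = Counter(t for t, died in zip(times, events) if died)
    leaving = Counter(times)          # patients whose follow-up ends at t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Toy cohort of five patients (invented data).
    times = [1, 2, 2, 3, 4]
    events = [True, True, False, True, False]
    for t, s in kaplan_meier(times, events):
        print(t, round(s, 3))
```

Survival stratification amounts to computing such curves separately for groups defined by a feature, for example mutated versus wild-type samples, and then comparing them. The log-rank test is the usual way to test whether two curves differ.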
13
Quaking and PTB control overlapping splicing regulatory networks during muscle cell differentiation. RNA 2013; 19:627-38. [PMID: 23525800 PMCID: PMC3677278 DOI: 10.1261/rna.038422.113] [Citation(s) in RCA: 119] [Impact Index Per Article: 10.8] [Received: 01/17/2013] [Accepted: 02/20/2013] [Indexed: 05/26/2023]
Abstract
Alternative splicing contributes to muscle development, but a complete set of muscle-splicing factors and their combinatorial interactions are unknown. Previous work identified ACUAA ("STAR" motif) as an enriched intron sequence near muscle-specific alternative exons such as Capzb exon 9. Mass spectrometry of myoblast proteins selected by the Capzb exon 9 intron via RNA affinity chromatography identifies Quaking (QK), a protein known to regulate mRNA function through ACUAA motifs in 3' UTRs. We find that QK promotes inclusion of Capzb exon 9 in opposition to repression by polypyrimidine tract-binding protein (PTB). QK depletion alters inclusion of 406 cassette exons whose adjacent intron sequences are also enriched in ACUAA motifs. During differentiation of myoblasts to myotubes, QK levels increase two- to threefold, suggesting a mechanism for QK-responsive exon regulation. Combined analysis of the PTB- and QK-splicing regulatory networks during myogenesis suggests that 39% of regulated exons are under the control of one or both of these splicing factors. This work provides the first evidence that QK is a global regulator of splicing during muscle development in vertebrates and shows how overlapping splicing regulatory networks contribute to gene expression programs during differentiation.
14
Rbfox1 downregulation and altered calpain 3 splicing by FRG1 in a mouse model of Facioscapulohumeral muscular dystrophy (FSHD). PLoS Genet 2013; 9:e1003186. [PMID: 23300487 PMCID: PMC3536703 DOI: 10.1371/journal.pgen.1003186] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 09/19/2011] [Accepted: 11/06/2012] [Indexed: 01/17/2023]
Abstract
Facioscapulohumeral muscular dystrophy (FSHD) is a common muscle disease whose molecular pathogenesis remains largely unknown. Over-expression of FSHD region gene 1 (FRG1) in mice, frogs, and worms perturbs muscle development and causes FSHD-like phenotypes. FRG1 has been implicated in splicing, and we asked how splicing might be involved in FSHD by conducting a genome-wide analysis in FRG1 mice. We find that splicing perturbations parallel the responses of different muscles to FRG1 over-expression and disease progression. Interestingly, binding sites for the Rbfox family of splicing factors are over-represented in a subset of FRG1-affected splicing events. Rbfox1 knockdown, over-expression, and RNA-IP confirm that these are direct Rbfox1 targets. We find that FRG1 associates with Rbfox1 RNA and decreases its stability. Consistent with this, Rbfox1 expression is down-regulated in mice and cells over-expressing FRG1 as well as in FSHD patients. Among the genes affected is Calpain 3, which is mutated in limb girdle muscular dystrophy, a disease phenotypically similar to FSHD. In FRG1 mice and FSHD patients, the Calpain 3 isoform lacking exon 6 (Capn3 E6–) is increased. Finally, Rbfox1 knockdown and over-expression of Capn3 E6– inhibit muscle differentiation. Collectively, our results suggest that a component of FSHD pathogenesis may arise by over-expression of FRG1, reducing Rbfox1 levels and leading to aberrant expression of an altered Calpain 3 protein through dysregulated splicing.
Alternative splicing is a major contributor to the complexity of human cells, and its disruption can lead to a wide range of human disorders. FSHD is one of the most important muscle diseases. While muscle differentiation defects have been widely reported in the disease, the molecular mechanisms responsible are largely unknown.
We found that the alternative splicing factor Rbfox1 is a direct FRG1 target and that its expression is decreased in the muscles of a mouse model of FSHD and of FSHD patients. Moreover, alternative splicing of Calpain 3, which encodes a protease involved in muscle differentiation, is regulated by Rbfox1 and is altered in the muscles of the mouse model of FSHD and of FSHD patients. Interestingly, we found that Rbfox1 is required for muscle differentiation and that this activity is likely mediated by Calpain 3 alternative splicing. Hence, our results suggest that decreased expression of Rbfox1 and aberrant Calpain 3 splicing contribute to the muscle differentiation defects of FSHD patients.
15
The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res 2013; 41:D64-9. [PMID: 23155063 PMCID: PMC3531082 DOI: 10.1093/nar/gks1048] [Citation(s) in RCA: 612] [Impact Index Per Article: 55.6] [Received: 09/19/2012] [Accepted: 10/08/2012] [Indexed: 11/14/2022]
Abstract
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation 'tracks' are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal.
16
Abstract
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
17
ENCODE whole-genome data in the UCSC Genome Browser: update 2012. Nucleic Acids Res 2012; 40:D912-7. [PMID: 22075998 PMCID: PMC3245183 DOI: 10.1093/nar/gkr1012] [Citation(s) in RCA: 207] [Impact Index Per Article: 17.3] [Received: 09/15/2011] [Revised: 10/18/2011] [Accepted: 10/20/2011] [Indexed: 11/23/2022]
Abstract
The Encyclopedia of DNA Elements (ENCODE) Consortium is entering its 5th year of production-level effort generating high-quality whole-genome functional annotations of the human genome. The past year has brought the ENCODE compendium of functional elements to critical mass, with a diverse set of 27 biochemical assays now covering 200 distinct human cell types. Within the mouse genome, which has been under study by ENCODE groups for the past 2 years, 37 cell types have been assayed. Over 2000 individual experiments have been completed and submitted to the Data Coordination Center for public use. UCSC makes this data available on the quality-reviewed public Genome Browser (http://genome.ucsc.edu) and on an early-access Preview Browser (http://genome-preview.ucsc.edu). Visual browsing, data mining and download of raw and processed data files are all supported. An ENCODE portal (http://encodeproject.org) provides specialized tools and information about the ENCODE data sets.
18
The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res 2012; 40:D918-23. [PMID: 22086951 PMCID: PMC3245018 DOI: 10.1093/nar/gkr1055] [Citation(s) in RCA: 273] [Impact Index Per Article: 22.8] [Received: 09/15/2011] [Revised: 10/18/2011] [Accepted: 10/25/2011] [Indexed: 01/05/2023]
Abstract
The University of California Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic data sets. In the past year, the local database has been updated with four new species assemblies, and we anticipate another four will be released by the end of 2011. Further, a large number of annotation tracks have been either added, updated by contributors, or remapped to the latest human reference genome. Among these are new phenotype and disease annotations, UCSC genes, and a major dbSNP update, which required new visualization methods. Growing beyond the local database, this year we have introduced 'track data hubs', which allow the Genome Browser to provide access to remotely located sets of annotations. This feature is designed to significantly extend the number and variety of annotation tracks that are publicly available for visualization and analysis from within our site. We have also introduced several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image.
19
Using bioinformatics to predict the functional impact of SNVs. Bioinformatics 2011; 27:441-8. [PMID: 21159622 PMCID: PMC3105482 DOI: 10.1093/bioinformatics/btq695] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 10/20/2010] [Revised: 11/21/2010] [Accepted: 12/12/2010] [Indexed: 11/14/2022]
Abstract
MOTIVATION The past decade has seen the introduction of fast and relatively inexpensive methods to detect genetic variation across the genome and exponential growth in the number of known single nucleotide variants (SNVs). There is increasing interest in bioinformatics approaches to identify variants that are functionally important from millions of candidate variants. Here, we describe the essential components of bioinformatics tools that predict functional SNVs. RESULTS Bioinformatics tools have great potential to identify functional SNVs, but the black box nature of many tools can be a pitfall for researchers. Understanding the underlying methods, assumptions and biases of these tools is essential to their intelligent application.
20
Abstract
The ENCODE project is an international consortium with a goal of cataloguing all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) at the University of California, Santa Cruz serves as the central repository for ENCODE data. In this role, the DCC offers a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. This data helps illuminate transcription factor-binding sites, histone marks, chromatin accessibility, DNA methylation, RNA expression, RNA binding and other cell-state indicators. It includes sequences with quality scores, alignments, signals calculated from the alignments, and in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser (http://genome.ucsc.edu/). ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay. The ENCODE web portal at UCSC (http://encodeproject.org/) provides information about the ENCODE data and links for access.
21
Abstract
The University of California, Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online access to a database of genomic sequence and annotation data for a wide variety of organisms. The Browser also has many tools for visualizing, comparing and analyzing both publicly available and user-generated genomic data sets, aligning sequences and uploading user data. Among the features released this year are a gene search tool and annotation track drag-reorder functionality as well as support for BAM and BigWig/BigBed file formats. New display enhancements include overlay of multiple wiggle tracks through use of transparent coloring, options for displaying transformed wiggle data, a 'mean+whiskers' windowing function for display of wiggle data at high zoom levels, and more color schemes for microarray data. New data highlights include seven new genome assemblies, a Neandertal genome data portal, phenotype and disease association data, a human RNA editing track, and a zebrafish Conservation track. We also describe updates to existing tracks.
22
Aberrant alternative splicing and extracellular matrix gene expression in mouse models of myotonic dystrophy. Nat Struct Mol Biol 2010; 17:187-93. [PMID: 20098426 DOI: 10.1038/nsmb.1720] [Citation(s) in RCA: 250] [Impact Index Per Article: 17.9] [Received: 07/16/2009] [Accepted: 10/14/2009] [Indexed: 01/08/2023]
Abstract
The common form of myotonic dystrophy (DM1) is associated with the expression of expanded CTG DNA repeats as RNA (CUG(exp) RNA). To test whether CUG(exp) RNA creates a global splicing defect, we compared the skeletal muscle of two mouse models of DM1, one expressing a CTG(exp) transgene and another homozygous for a defective muscleblind 1 (Mbnl1) gene. Strong correlation in splicing changes for approximately 100 new Mbnl1-regulated exons indicates that loss of Mbnl1 explains >80% of the splicing pathology due to CUG(exp) RNA. In contrast, only about half of mRNA-level changes can be attributed to loss of Mbnl1, indicating that CUG(exp) RNA has Mbnl1-independent effects, particularly on mRNAs for extracellular matrix proteins. We propose that CUG(exp) RNA causes two separate effects: loss of Mbnl1 function (disrupting splicing) and loss of another function that disrupts extracellular matrix mRNA regulation, possibly mediated by Mbnl2. These findings reveal unanticipated similarities between DM1 and other muscular dystrophies.
23
Abstract
SUMMARY Most genes in human, mouse and rat produce more than one transcript isoform. The Affymetrix Exon Array is a tool for studying the many processes that regulate RNA production, with separate probesets measuring RNA levels at known and putative exons. For insight into how exon levels vary between normal tissues, we constructed the Affy Exon Tissues track from tissue data published by Affymetrix. This track reports exon probeset intensities as log ratios relative to median values across the dataset and renders them as colored heat maps, to yield quick visual identification of exons with intensities that vary between normal tissues. AVAILABILITY The Affy Exon Tissues track is freely available under the UCSC Genome Browser (http://genome.ucsc.edu/) for human (hg18), mouse (mm8 and mm9), and rat (rn4).
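The track's log ratios can be illustrated with a short computation. This is a sketch under two assumptions not stated in the abstract: the median is taken per probeset across all tissue samples, and the logarithm is base 2. The function name `log_ratio_to_median` is hypothetical.

```python
import numpy as np


def log_ratio_to_median(intensities, base=2.0):
    """Convert a (probesets x samples) intensity matrix into log ratios.

    Each value is divided by the median of its own probeset (row) across
    all samples and then log-transformed. Base 2 is an assumption.
    Positive values mean above-median signal in that sample.
    """
    x = np.asarray(intensities, dtype=float)
    if np.any(x <= 0):
        raise ValueError("intensities must be positive to take logs")
    medians = np.median(x, axis=1, keepdims=True)  # one median per probeset
    return np.log(x / medians) / np.log(base)
```

A probeset with intensities 1, 2 and 4 in three tissues has median 2, so its log2 ratios are -1, 0 and 1; a heat map would color these from below-median to above-median.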
24
Abstract
Cytoscape is a free software package for visualizing, modeling, and analyzing molecular and genetic interaction networks. As a key feature, Cytoscape enables biologists to determine and analyze the interconnectivity of a list of genes or proteins. This unit explains how to use Cytoscape to load and navigate biological network information and view mRNA expression profiles and other functional genomics and proteomics data in the context of the network obtained for genes of interest. Additional analyses that can be performed with Cytoscape are also discussed.
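One simple way to get network information into Cytoscape is a Simple Interaction Format (SIF) file, where each line lists a source node, an interaction type and one or more target nodes, and a line with a single name declares an isolated node. The parser here is a hypothetical standalone sketch, not Cytoscape code. It follows our reading of the Cytoscape documentation: if the file contains any tab, tabs are the field delimiter (so node names may contain spaces); otherwise any whitespace separates fields. Rejecting two-field lines is our own choice.

```python
def parse_sif(text):
    """Parse SIF text into (nodes, edges).

    nodes: set of node names
    edges: list of (source, interaction, target) tuples
    """
    use_tabs = "\t" in text  # file-wide delimiter choice
    nodes, edges = set(), []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        raw = line.split("\t") if use_tabs else line.split()
        fields = [f.strip() for f in raw if f.strip()]
        source = fields[0]
        nodes.add(source)
        if len(fields) == 1:
            continue  # isolated node
        if len(fields) == 2:
            raise ValueError(f"missing target in line: {line!r}")
        interaction = fields[1]
        for target in fields[2:]:
            nodes.add(target)
            edges.append((source, interaction, target))
    return nodes, edges
```

For example, the line `A pp B C` yields two protein-protein (`pp`) edges, A to B and A to C.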
25
Abstract
Summary: Recent studies have revealed that alternative splicing plays an important role in the observed protein and interaction diversity. Special microarrays allow for measuring gene expression at the exon level and thus for studying alternative transcripts and their corresponding protein domain architecture. We have developed the Cytoscape plugin DomainGraph that enables the visualization and detailed study of domain–domain interactions forming protein interaction networks. In addition, the integration of exon expression data supports the analysis of alternative splicing events and the characterization of their effects on the protein and domain interaction network. Different expression patterns between human tissues or cells can be identified by comparing the generated domain graphs. Availability: The plugin DomainGraph and the online documentation are available at http://domaingraph.bioinf.mpi-inf.mpg.de. Contact: mario.albrecht@mpi-inf.mpg.de
26
Integrative Visual Analysis of the Effects of Alternative Splicing on Protein Domain Interaction Networks. J Integr Bioinform 2008. [DOI: 10.1515/jib-2008-101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
SUMMARY Proteins and their interactions are essential for the functioning of all organisms and for understanding biological processes. Alternative splicing is an important molecular mechanism for increasing the protein diversity in eukaryotic cells. Splicing events that alter the protein structure and the domain composition can be responsible for the regulation of protein interactions and the functional diversity of different tissues. Discovering the occurrence of splicing events and studying protein isoforms have become feasible using Affymetrix Exon Arrays. Therefore, we have developed the versatile Cytoscape plugin DomainGraph that allows for the visual analysis of protein domain interaction networks and their integration with exon expression data. Protein domains affected by alternative splicing are highlighted and splicing patterns can be compared.
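The link between domain–domain interactions and protein interaction networks can be sketched in a few lines: two proteins (or isoforms) are predicted to interact when one carries a domain known to interact with a domain of the other, so an isoform that skips the exon encoding a domain can lose edges. This illustrates the general idea only; it is not DomainGraph's implementation, and the function name and identifiers are hypothetical.

```python
def isoform_interactions(isoform_domains, ddi):
    """Infer isoform-isoform interactions from domain-domain interactions.

    isoform_domains: dict mapping isoform ID -> set of domain IDs
    ddi: iterable of (domain_a, domain_b) pairs known to interact
    Returns the set of isoform pairs (alphabetically ordered tuples)
    linked by at least one interacting domain pair.
    """
    # Unordered pairs; a homotypic pair (d, d) becomes frozenset({d}).
    ddi_set = {frozenset(pair) for pair in ddi}
    ids = sorted(isoform_domains)
    edges = set()
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if any(frozenset((da, db)) in ddi_set
                   for da in isoform_domains[a]
                   for db in isoform_domains[b]):
                edges.add((a, b))
    return edges
```

With isoform A1 carrying domains D1 and D2, isoform A2 carrying only D2, protein B carrying D3, and a single known D1–D3 interaction, only the A1–B edge is predicted: the splice variant lacking D1 drops out of the network.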
27
Abstract
MOTIVATION Many or most mammalian genes undergo alternative splicing, generating a variety of transcripts from a single gene. New information on splice variation is becoming available through technology for measuring expression levels of several exons or splice junctions per gene. We have developed a statistical method, ANalysis Of Splice VAriation (ANOSVA), to detect alternative splicing from expression data. Since ANOSVA requires no transcript information, it can be applied when the level of annotation is poor. When validated against spiked clone data, it generated no false positives and few false negatives. We demonstrated ANOSVA with data from a prototype mouse alternative splicing array, run against normal adult tissues, yielding a set of genes with evidence of tissue-specific splice variation. AVAILABILITY AND SUPPLEMENTARY INFORMATION The results are available at the supplementary information site https://bioinfo.affymetrix.com/Papers/ANOSVA/
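The abstract does not spell out ANOSVA's model. A related idea that needs no transcript information is a two-way ANOVA with probe and tissue as factors: if a gene's probes all shift in parallel across tissues, only the main effects matter, whereas a large probe-by-tissue interaction suggests that some exons or junctions change relative to others. The sketch computes that interaction F statistic for a balanced design. It is an illustration under this assumption, not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np


def interaction_f(data):
    """Probe-by-tissue interaction F statistic for a balanced two-way ANOVA.

    data: array of shape (probes, tissues, replicates) of log-scale signals.
    Returns (F, df_interaction, df_error).
    """
    x = np.asarray(data, dtype=float)
    a, b, r = x.shape
    if a < 2 or b < 2 or r < 2:
        raise ValueError("need >= 2 probes, >= 2 tissues, >= 2 replicates")
    cell = x.mean(axis=2)                      # probe x tissue cell means
    probe = cell.mean(axis=1, keepdims=True)   # probe marginal means
    tissue = cell.mean(axis=0, keepdims=True)  # tissue marginal means
    grand = cell.mean()
    resid = cell - probe - tissue + grand      # interaction residuals
    ss_int = r * np.sum(resid ** 2)
    ss_err = np.sum((x - cell[:, :, None]) ** 2)  # within-cell scatter
    df_int = (a - 1) * (b - 1)
    df_err = a * b * (r - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err
```

If SciPy is available, a p-value follows from `scipy.stats.f.sf(F, df_int, df_err)`. Purely additive probe and tissue effects give an F near zero, while raising one probe in one tissue only produces a large F.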
28
Unusual intron conservation near tissue-regulated exons found by splicing microarrays. PLoS Comput Biol 2006; 2:e4. [PMID: 16424921 PMCID: PMC1331982 DOI: 10.1371/journal.pcbi.0020004] [Citation(s) in RCA: 163] [Impact Index Per Article: 9.1] [Received: 08/04/2005] [Accepted: 12/14/2005] [Indexed: 01/27/2023]
Abstract
Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5' splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families.
29
Exploring alternative transcript structure in the human genome using blocks and InterPro. J Bioinform Comput Biol 2003; 1:289-306. [PMID: 15290774 DOI: 10.1142/s0219720003000113] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 09/15/2002] [Revised: 12/07/2002] [Accepted: 01/15/2003] [Indexed: 11/18/2022]
Abstract
Understanding how alternative splicing affects gene function is an important challenge facing modern-day molecular biology. Using homology-based, protein sequence analysis methods, it should be possible to investigate how transcript diversity impacts protein function. To test this, high-quality exon-intron structures were deduced for over 8000 human genes, including over 1300 (17 percent) that produce multiple transcript variants. A data mining technique (DiffMotif) was developed to identify genes in which transcript variation coincides with changes in conserved motifs between variants. Applying this method, we found that 30 percent of the multi-variant genes in our test set exhibited a differential profile of conserved InterPro and/or BLOCKS motifs across different mRNA variants. To investigate these, a visualization tool (ProtAnnot) that displays amino acid motifs in the context of genomic sequence was developed. Using this tool, genes revealed by the DiffMotif method were analyzed and, when possible, hypotheses were developed regarding the potential role of alternative transcript structure in modulating gene function. Examples are presented, including MEOX1, a homeobox-containing protein; AIRE, involved in autoimmune disease; PLAT, tissue-type plasminogen activator; and CD79b, a component of the B-cell receptor complex. These results demonstrate that amino acid motif databases like BLOCKS and InterPro are useful tools for investigating how alternative transcript structure affects gene function.
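The core comparison behind DiffMotif, finding genes whose transcript variants differ in their conserved motifs, can be sketched as a set comparison. This is a minimal illustration, not the published method: it assumes each variant has already been annotated with a set of motif identifiers, and the function name and the `IPR…`-style placeholder IDs in the example are hypothetical.

```python
def differential_motif_genes(gene_variants):
    """Flag genes whose transcript variants carry different motif sets.

    gene_variants: dict gene -> dict variant ID -> iterable of motif IDs
    Returns dict gene -> set of motifs present in some but not all
    variants, for genes with at least two variants and a non-empty
    difference.
    """
    flagged = {}
    for gene, variants in gene_variants.items():
        if len(variants) < 2:
            continue  # single-variant genes cannot differ
        sets = [set(motifs) for motifs in variants.values()]
        union = set().union(*sets)
        shared = set.intersection(*sets)
        diff = union - shared
        if diff:
            flagged[gene] = diff
    return flagged
```

In the test data, gene G1 is flagged because one variant lacks motif IPR2, while G2 (identical variants) and G3 (one variant) are not.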
30
Abstract
SUMMARY The NetAffx Gene Ontology (GO) Mining Tool is a web-based, interactive tool that permits traversal of the GO graph in the context of microarray data. It accepts a list of Affymetrix probe sets and renders a GO graph as a heat map colored according to significance measurements. The rendered graph is interactive, with nodes linked to public web sites and to lists of the relevant probe sets. The GO Mining Tool provides visualization combining biological annotation with expression data, encompassing thousands of genes in one interactive view. AVAILABILITY GO Mining Tool is freely available at http://www.affymetrix.com/analysis/query/go_analysis.affx
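The abstract says GO nodes are colored by significance but does not name the test. A common choice for this kind of over-representation question is a one-sided hypergeometric test; the sketch assumes that choice and uses hypothetical argument names. Node colors could then be derived from, for example, -log10 of the p-value.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: probe sets on the array
    K: probe sets on the array annotated to the GO term
    n: probe sets in the user's list
    k: probe sets in the list annotated to the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("invalid counts")
    total = comb(N, n)
    upper = min(n, K)
    # Sum the upper tail; it is empty (p = 0) if k exceeds min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For instance, on an array of 10 probe sets with 5 annotated to a term, drawing a list of 5 that are all annotated has probability 1/252 under random selection.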
31
The effects of alternative splicing on transmembrane proteins in the mouse genome. Pac Symp Biocomput 2004:17-28. [PMID: 14992489 DOI: 10.1142/9789812704856_0003] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 04/29/2023]
Abstract
Alternative splicing is a major source of variety in mammalian mRNAs, yet many questions remain on its downstream effects on protein function. To this end, we assessed the impact of gene structure and splice variation on signal peptide and transmembrane regions in proteins. Transmembrane proteins perform several key functions in cell signaling and transport, with their function tied closely to their transmembrane architecture. Signal peptides and transmembrane regions both provide key information on protein localization. Thus, any modification to such regions will likely alter protein destination and function. We applied TMHMM and SignalP to a nonredundant set of proteins, and assessed the effects of gene structure and alternative splicing on predicted transmembrane and signal peptide regions. These regions were altered by alternative splicing in roughly half of the cases studied. Transmembrane regions are divided by introns slightly less often than expected given gene structure and transmembrane region size. However, the transmembrane regions in single-pass transmembrane proteins are divided substantially less often than expected. This suggests that intron placement might be subject to some evolutionary pressure to preserve function in these signaling proteins. The data described in this paper are available online at http://www.affymetrix.com/community/publications/affymetrix/tmsplice/.
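The comparison with "expected given gene structure and transmembrane region size" needs a null model, which the abstract does not specify. One simple choice, assumed here, places a gene's introns uniformly at random, without replacement, among the internal boundaries of its coding sequence; the probability that a region is split then follows from a ratio of binomial coefficients. The function and coordinate convention are hypothetical.

```python
from math import comb


def split_probability(cds_len, n_introns, start, end):
    """Probability that region [start, end) is split by >= 1 intron.

    Null model (an assumption): n_introns are placed uniformly at random,
    without replacement, among the cds_len - 1 internal boundaries of a
    coding sequence. Boundary p lies between positions p - 1 and p; the
    region is split if an intron sits at a boundary with start < p < end.
    """
    boundaries = cds_len - 1
    inside = max(0, end - start - 1)  # boundaries strictly inside region
    if n_introns > boundaries:
        raise ValueError("more introns than boundaries")
    # P(no intron inside) = C(boundaries - inside, n) / C(boundaries, n)
    return 1 - comb(boundaries - inside, n_introns) / comb(boundaries, n_introns)
```

Summing these probabilities over all predicted transmembrane regions gives an expected count of split regions, which can be compared with the observed count.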
32
Abstract
Pairwise contact potentials have a long, successful history in protein structure prediction. They provide an easily estimated representation of many attributes of protein structures, such as the hydrophobic effect. In order to improve on existing potentials, one should develop a clear understanding of precisely what information they convey. Here, using mutual information, we quantified the information in amino acid potentials, and the importance of hydropathy, charge, disulfide bonding, and burial. Sampling error in mutual information was controlled for by estimating how much information cannot be attributed to sampling bias. We found the information in amino acid contacts to be modest: 0.04 bits per contact. Of that, only 0.01 bits of information could not be attributed to hydropathy, charge, disulfide bonding, or burial.
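The "bits per contact" figure is a mutual information between the residue types at the two ends of a contact: I = Σ p(i,j) log2[p(i,j) / (p(i) p(j))]. A plug-in estimate from a contact count matrix is sketched here. It does not include the paper's correction for sampling bias, and how contacts are counted (ordered or symmetrized pairs) is left to the caller as an assumption.

```python
import numpy as np


def mutual_information_bits(counts):
    """Plug-in mutual information (bits) from a contact count matrix.

    counts[i][j]: number of observed contacts between residue types i and j.
    No sampling-bias correction is applied; zero cells contribute nothing.
    """
    c = np.asarray(counts, dtype=float)
    total = c.sum()
    if total <= 0:
        raise ValueError("need at least one contact")
    p = c / total                         # joint distribution p(i, j)
    pi = p.sum(axis=1, keepdims=True)     # marginal of first residue
    pj = p.sum(axis=0, keepdims=True)     # marginal of second residue
    mask = p > 0
    expected = (pi * pj)[mask]            # p(i) p(j) under independence
    return float(np.sum(p[mask] * np.log2(p[mask] / expected)))
```

A table whose counts factor into independent marginals gives 0 bits, while two residue types that only ever contact themselves give exactly 1 bit.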
33
Protein-based analysis of alternative splicing in the human genome. Proc IEEE Comput Soc Bioinform Conf 2002; 1:118-24. [PMID: 15838129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
Understanding the functional significance of alternative splicing and other mechanisms that generate RNA transcript diversity is an important challenge facing modern-day molecular biology. Using homology-based, protein sequence analysis methods, it should be possible to investigate how transcript diversity impacts protein structure and function. To test this, a data mining technique ("DiffHit") was developed to identify and catalog genes producing protein isoforms which exhibit distinct profiles of conserved protein motifs. We found that out of a test set of over 1,300 alternatively spliced genes with solved genomic structure, over 30% exhibited a differential profile of conserved InterPro and/or Blocks protein motifs across distinct isoforms. These results suggest that motif databases such as Blocks and InterPro are potentially useful tools for investigating how alternative transcript structure affects gene function.
34
Abstract
A case of bacillary angiomatosis presenting as a skin nodule in a renal transplant recipient is reported. At presentation, the patient was taking cyclosporine, prednisone, and mycophenolate mofetil. The bacillary angiomatosis responded to 6 months of oral erythromycin therapy.
35
Fatal and non-fatal farm injuries. Inj Prev 1998; 4:79. [PMID: 9595339 PMCID: PMC1730329 DOI: 10.1136/ip.4.1.79] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
36
Abstract
Methemoglobinemia should always be considered in the differential diagnosis of cyanosis in patients with normal arterial oxygen tension. Cyanosis that is brownish and not relieved by oxygen administration should lead to consideration of the diagnosis. A normal or elevated PaO2 value further suggests the presence of methemoglobinemia. Mild cases can be treated by removing the offending agent. In more severe cases, intravenous methylene blue and, if necessary, packed red cells or exchange transfusions may be given.