1
Sant DW, Sinclair M, Mungall CJ, Schulz S, Zerbino D, Lovering RC, Logie C, Eilbeck K. Sequence ontology terminology for gene regulation. Biochim Biophys Acta Gene Regul Mech 2021; 1864:194745. [PMID: 34389511 DOI: 10.1016/j.bbagrm.2021.194745] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/30/2021] [Revised: 07/17/2021] [Accepted: 08/05/2021] [Indexed: 01/12/2023]
Abstract
The Sequence Ontology (SO) is a structured, controlled vocabulary that provides terms and definitions for genomic annotation. The Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC) initiative has gathered input from many groups of researchers, including the SO, the Gene Ontology (GO), and gene regulation experts, with the goal of curating information about how gene expression is regulated at the molecular level. Here we discuss recent updates to the SO reflecting current knowledge. We have developed more accurate human-readable terms (also known as classes), including new definitions, and relationships related to the expression of genes. New findings continue to give us insight into the biology of gene regulation, including the order of events, and participants in those events. These updates to the SO support logical reasoning with the current understanding of gene expression regulation at the molecular level.
Affiliation(s)
- David W Sant
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
  - Department of Biomedical Sciences, Noorda College of Osteopathic Medicine, Provo, UT, USA
- Michael Sinclair
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Christopher J Mungall
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, US
- Stefan Schulz
  - Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria
- Daniel Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Ruth C Lovering
  - Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London, UK
- Colin Logie
  - Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 28, 6525 GA Nijmegen, Netherlands
- Karen Eilbeck
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
2
Harrison PW, Sokolov A, Nayak A, Fan J, Zerbino D, Cochrane G, Flicek P. The FAANG Data Portal: Global, Open-Access, "FAIR", and Richly Validated Genotype to Phenotype Data for High-Quality Functional Annotation of Animal Genomes. Front Genet 2021; 12:639238. [PMID: 34220930 PMCID: PMC8248360 DOI: 10.3389/fgene.2021.639238] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/08/2020] [Accepted: 05/04/2021] [Indexed: 11/13/2022]
Abstract
The Functional Annotation of ANimal Genomes (FAANG) project is a worldwide coordinated action creating high-quality functional annotation of farmed and companion animal genomes. The generation of a rich genome-to-phenome resource and supporting informatic infrastructure advances the scope of comparative genomics and furthers the understanding of functional elements. The project also provides the terrestrial and aquatic animal agriculture community with powerful resources for supporting improvements to farmed animal production, disease resistance, and genetic diversity. The FAANG Data Portal (https://data.faang.org) ensures Findable, Accessible, Interoperable and Reusable (FAIR) open access to the wealth of sample, sequencing, and analysis data produced by an ever-growing number of FAANG consortia. It is developed and maintained by the FAANG Data Coordination Centre (DCC) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). FAANG projects produce a standardised set of multi-omic assays with resulting data placed into a range of specialised open data archives. To ensure this data is easily findable and accessible by the community, the portal automatically identifies and collates all submitted FAANG data into a single easily searchable resource. The Data Portal supports direct download from the multiple underlying archives to enable seamless access to all FAANG data from within the portal itself. The portal provides a range of predefined filters, powerful predictive search, and a catalogue of sampling and analysis protocols, and automatically identifies publications associated with any dataset. To ensure all FAANG data submissions are high-quality, the portal includes powerful contextual metadata validation and data submissions brokering to the underlying EMBL-EBI archives.
The portal will incorporate extensive new technical infrastructure to effectively deliver and standardise FAANG's shift to single-cellomics, cell atlases, pangenomes, and novel phenotypic prediction models. The Data Portal plays a key role for FAANG by supporting high-quality functional annotation of animal genomes, through open FAIR sharing of data, complete with standardised rich metadata. Future Data Portal features developed by the DCC will support new technological developments for continued improvement for FAANG projects.
Affiliation(s)
- Peter W Harrison
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
- Alexey Sokolov
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
- Akshatha Nayak
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
- Jun Fan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
- Daniel Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
- Guy Cochrane
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
3
Gundersen S, Boddu S, Capella-Gutierrez S, Drabløs F, Fernández JM, Kompova R, Taylor K, Titov D, Zerbino D, Hovig E. Recommendations for the FAIRification of genomic track metadata. F1000Res 2021; 10. [PMID: 34249331 PMCID: PMC8226415 DOI: 10.12688/f1000research.28449.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 03/17/2021] [Indexed: 01/25/2023]
Abstract
Background: Many types of data from genomic analyses can be represented as genomic tracks, i.e. features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, as well as RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information.
Description of work: We propose to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable) to produce searchable metadata for genomic tracks. Findability and Accessibility of metadata can then be ensured by a track search service that integrates globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories. Interoperability and Reusability need to be ensured by the specification and implementation of a basic set of recommendations for metadata. We have tested this concept by developing such a specification in a JSON Schema, called FAIRtracks, and have integrated it into a novel track search service, called TrackFind. We demonstrate practical usage by importing datasets through TrackFind into existing examples of relevant analytical tools for genomic tracks: EPICO and the GSuite HyperBrowser.
Conclusion: We here provide a first iteration of a draft standard for genomic track metadata, as well as the accompanying software ecosystem. It can easily be adapted or extended to future needs of the research community regarding data, methods and tools, balancing the requirements of both data submitters and analytical end-users.
Affiliation(s)
- Sanjay Boddu
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Finn Drabløs
  - Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- José M Fernández
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Radmila Kompova
  - Center for Bioinformatics, University of Oslo (UiO), Oslo, Norway
- Kieron Taylor
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Dmytro Titov
  - Center for Bioinformatics, University of Oslo (UiO), Oslo, Norway
- Daniel Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Eivind Hovig
  - Center for Bioinformatics, University of Oslo (UiO), Oslo, Norway
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital (OUH), Oslo, Norway
4
Frankish A, Diekhans M, Jungreis I, Lagarde J, Loveland JE, Mudge JM, Sisu C, Wright JC, Armstrong J, Barnes I, Berry A, Bignell A, Boix C, Carbonell Sala S, Cunningham F, Di Domenico T, Donaldson S, Fiddes IT, García Girón C, Gonzalez JM, Grego T, Hardy M, Hourlier T, Howe KL, Hunt T, Izuogu OG, Johnson R, Martin FJ, Martínez L, Mohanan S, Muir P, Navarro FCP, Parker A, Pei B, Pozo F, Riera FC, Ruffier M, Schmitt BM, Stapleton E, Suner MM, Sycheva I, Uszczynska-Ratajczak B, Wolf MY, Xu J, Yang YT, Yates A, Zerbino D, Zhang Y, Choudhary JS, Gerstein M, Guigó R, Hubbard TJP, Kellis M, Paten B, Tress ML, Flicek P. GENCODE 2021. Nucleic Acids Res 2021; 49:D916-D923. [PMID: 33270111 PMCID: PMC7778937 DOI: 10.1093/nar/gkaa1087] [Citation(s) in RCA: 475] [Impact Index Per Article: 158.3] [Received: 09/21/2020] [Revised: 10/21/2020] [Accepted: 10/24/2020] [Indexed: 12/14/2022]
Abstract
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Irwin Jungreis
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Julien Lagarde
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Jane E Loveland
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cristina Sisu
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
- James C Wright
  - Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Joel Armstrong
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- If Barnes
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Berry
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alexandra Bignell
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carles Boix
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
  - Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA, USA
- Silvia Carbonell Sala
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Fiona Cunningham
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tomás Di Domenico
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Sarah Donaldson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ian T Fiddes
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Carlos García Girón
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jose Manuel Gonzalez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tiago Grego
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Hardy
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kevin L Howe
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Toby Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Osagie G Izuogu
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rory Johnson
  - Department of Medical Oncology, Inselspital, University Hospital, University of Bern, Bern, Switzerland
  - Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Laura Martínez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Shamika Mohanan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Muir
  - Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06520, USA
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Fabio C P Navarro
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Anne Parker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Baikang Pei
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Fernando Pozo
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Ferriol Calvet Riera
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Magali Ruffier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bianca M Schmitt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Eloise Stapleton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marie-Marthe Suner
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Irina Sycheva
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Maxim Y Wolf
  - Department of Biomedical Informatics at Harvard Medical School, 10 Shattuck Street, Suite 514, Boston, MA 02115, USA
- Jinuri Xu
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yucheng T Yang
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Andrew Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Yan Zhang
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Jyoti S Choudhary
  - Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Mark Gerstein
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
  - Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
- Tim J P Hubbard
  - Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
- Manolis Kellis
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Michael L Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
5
Peat G, Jones W, Nuhn M, Marugán JC, Newell W, Dunham I, Zerbino D. The open targets post-GWAS analysis pipeline. Bioinformatics 2020; 36:2936-2937. [PMID: 31930349 PMCID: PMC7203748 DOI: 10.1093/bioinformatics/btaa020] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/12/2019] [Revised: 12/19/2019] [Accepted: 01/09/2020] [Indexed: 11/17/2022]
Abstract
Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data.
Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource.
Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.
Affiliation(s)
- Gareth Peat
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Open Targets, EBI South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- William Jones
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Michael Nuhn
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Open Targets, EBI South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- José Carlos Marugán
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Open Targets, EBI South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- William Newell
  - Open Targets, EBI South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - GSK, Medicines Research Center, Stevenage SG1 2NY, UK
- Ian Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Open Targets, EBI South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Open Targets, EBI South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
6
Besh D, Sokolov M, Zerbino D, Kyyak Y. P3586 Influence of morphological features of intracoronary thrombi on ST segment resolution in patients with STEMI after primary PCI. Eur Heart J 2019. [DOI: 10.1093/eurheartj/ehz745.0446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Background
Coronary artery thrombosis is a key element in the onset of STEMI and its further course. Coronary clots vary significantly in their morphological features and types. This variation may reflect individual differences in thrombus formation that could strongly influence the course and prognosis of STEMI.
Purpose
To elucidate the influence of morphological features of intracoronary thrombi (IT) on ST-segment resolution (the outcome) after primary PCI in patients with STEMI.
Methods
The study included 100 patients with STEMI (female 22%, male 78%) aged 57.81±10.26 years, who underwent PCI with manual thromboaspiration within the first 12 hours (mean 7.22±3.74 h) after the onset of symptoms. The obtained ITs were examined morphologically after hematoxylin & eosin staining by the standard method and by the method proposed by Prof. Zerbino, which allows the age of fibrin to be determined from its color. Each IT was classified by five morphological features: 1) “old” or “fresh” (according to Zerbino's staining); 2) with or without layered structure; and presence or absence of 3) microchannels, 4) peripheral infiltration with neutrophil leukocytes, 5) elements of atherosclerotic plaque. Correlations between these features and the incidence of ST-segment resolution of more than 50% from baseline at 60 minutes after primary PCI were analyzed.
Results
Neither IT age nor the presence of atherosclerotic plaque elements correlated significantly with ST-segment resolution. A significant positive correlation was found with peripheral neutrophil infiltration (r=0.42, p<0.05), and negative correlations were found with layered arrangement of the IT (r=−0.31, p<0.05) and presence of microchannels (r=−0.56, p<0.05). A statistically significant prognostic model of ST-segment resolution was obtained by logistic regression analysis of the joint influence of all five morphological features on the outcome. It included the three IT features that correlated significantly with ST-segment resolution:
Z = 1.13 − 0.51*V1 − 0.57*V2 − 0.94*V3
where V1 – IT with layered arrangement of fibrin, V2 – IT with microchannels' formation, V3 – IT with peripheral areas of infiltration by the neutrophil leukocytes (1 for the presence of the feature, 0 for its absence).
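As a worked illustration of the model above, the following minimal Python sketch evaluates Z for each combination of the three binary indicators. The coefficients are copied as printed. The abstract does not state how Z is converted to a probability or which outcome is coded as 1, so the standard logistic transform used here, and the function names `linear_score` and `logistic`, are assumptions made for illustration only.

```python
import math

# Coefficients copied as printed in the abstract:
# Z = 1.13 - 0.51*V1 - 0.57*V2 - 0.94*V3
INTERCEPT = 1.13
B_LAYERED = -0.51        # V1: layered arrangement of fibrin
B_MICROCHANNELS = -0.57  # V2: microchannel formation
B_NEUTROPHILS = -0.94    # V3: peripheral neutrophil infiltration


def linear_score(v1: int, v2: int, v3: int) -> float:
    """Return Z for binary indicators (1 = feature present, 0 = absent)."""
    for v in (v1, v2, v3):
        if v not in (0, 1):
            raise ValueError("each indicator must be 0 or 1")
    return INTERCEPT + B_LAYERED * v1 + B_MICROCHANNELS * v2 + B_NEUTROPHILS * v3


def logistic(z: float) -> float:
    """Standard logistic function. ASSUMPTION: the abstract does not
    state the link function or which outcome is coded as 1."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Tabulate Z and the assumed logistic transform for all 8 feature patterns.
    for v1 in (0, 1):
        for v2 in (0, 1):
            for v3 in (0, 1):
                z = linear_score(v1, v2, v3)
                print(f"V1={v1} V2={v2} V3={v3}  Z={z:+.2f}  logistic(Z)={logistic(z):.3f}")
```

With no features present Z equals the intercept, 1.13, and each feature that is present lowers Z by the magnitude of its coefficient. How these values map onto the clinical outcome depends on the outcome coding, which the abstract leaves unspecified.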
[Figure: “Old” IT with layered structure]
Conclusions
Morphological features of IT in patients with acute STEMI appear to have a significant influence on primary PCI outcomes. They may be important predictors of the disease course and treatment efficacy. Further investigation of IT characteristics and their influence on the course of STEMI may help to improve therapy.
Acknowledgement/Funding
The study was funded by the authors.
Affiliation(s)
- D Besh
  - Danylo Halytsky Lviv National Medical University, Lviv, Ukraine
- M Sokolov
  - NSC Institute of Cardiology M.D. Strazhesko, Kiev, Ukraine
- D Zerbino
  - Danylo Halytsky Lviv National Medical University, Lviv, Ukraine
- Y Kyyak
  - Danylo Halytsky Lviv National Medical University, Lviv, Ukraine
7
Frankish A, Diekhans M, Ferreira AM, Johnson R, Jungreis I, Loveland J, Mudge JM, Sisu C, Wright J, Armstrong J, Barnes I, Berry A, Bignell A, Carbonell Sala S, Chrast J, Cunningham F, Di Domenico T, Donaldson S, Fiddes IT, García Girón C, Gonzalez JM, Grego T, Hardy M, Hourlier T, Hunt T, Izuogu OG, Lagarde J, Martin FJ, Martínez L, Mohanan S, Muir P, Navarro FC, Parker A, Pei B, Pozo F, Ruffier M, Schmitt BM, Stapleton E, Suner MM, Sycheva I, Uszczynska-Ratajczak B, Xu J, Yates A, Zerbino D, Zhang Y, Aken B, Choudhary JS, Gerstein M, Guigó R, Hubbard TJ, Kellis M, Paten B, Reymond A, Tress ML, Flicek P. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 2019; 47:D766-D773. [PMID: 30357393 PMCID: PMC6323946 DOI: 10.1093/nar/gky955] [Citation(s) in RCA: 1713] [Impact Index Per Article: 342.6] [Received: 08/15/2018] [Revised: 09/20/2018] [Accepted: 10/08/2018] [Indexed: 02/06/2023]
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Anne-Maud Ferreira
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Rory Johnson
  - Department of Medical Oncology, Inselspital, University Hospital, University of Bern, Bern, Switzerland
  - Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
- Irwin Jungreis
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Jane Loveland
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cristina Sisu
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
- James Wright
  - Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
- Joel Armstrong
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- If Barnes
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Berry
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alexandra Bignell
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Silvia Carbonell Sala
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Jacqueline Chrast
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Fiona Cunningham
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tomás Di Domenico
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Sarah Donaldson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ian T Fiddes
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Carlos García Girón
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jose Manuel Gonzalez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tiago Grego
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Hardy
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Toby Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Osagie G Izuogu
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Julien Lagarde
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Laura Martínez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Shamika Mohanan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Muir
- Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06520, USA
- Systems Biology Institute, Yale University, West Haven, CT 06516, USA
| | - Fabio C P Navarro
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Baikang Pei
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Eloise Stapleton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irina Sycheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Jinuri Xu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yan Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Bronwen Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
| | - Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
| | - Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
8
|
Harrison PW, Fan J, Richardson D, Clarke L, Zerbino D, Cochrane G, Archibald AL, Schmidt CJ, Flicek P. FAANG, establishing metadata standards, validation and best practices for the farmed and companion animal community. Anim Genet 2018; 49:520-526. [PMID: 30311252 PMCID: PMC6334167 DOI: 10.1111/age.12736] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Accepted: 09/11/2018] [Indexed: 12/30/2022]
Abstract
The Functional Annotation of ANimal Genomes (FAANG) project aims, through a coordinated international effort, to provide high quality functional annotation of animal genomes with an initial focus on farmed and companion animals. A key goal of the initiative is to ensure high quality and rich supporting metadata to describe the project's animals, specimens, cell cultures and experimental assays. By defining rich sample and experimental metadata standards and promoting best practices in data descriptions, deposition and openness, FAANG champions higher quality and reusability of published datasets. FAANG has established a Data Coordination Centre, which sits at the heart of the Metadata and Data Sharing Committee. It continues to evolve the metadata standards, support submissions and, crucially, create powerful and accessible tools to support deposition and validation of metadata. FAANG conforms to the findable, accessible, interoperable, and reusable (FAIR) data principles, with high quality, open access and functionally interlinked data. In addition to data generated by FAANG members and specific FAANG projects, existing datasets that meet the main—or more permissive legacy—standards are incorporated into a central, focused, functional data resource portal for the entire farmed and companion animal community. Through clear and effective metadata standards, validation and conversion software, combined with promotion of best practices in metadata implementation, FAANG aims to maximise effectiveness and inter‐comparability of assay data. This supports the community to create a rich genome‐to‐phenotype resource and promotes continuing improvements in animal data standards as a whole.
Affiliation(s)
- P W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - J Fan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - D Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - L Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - D Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - G Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - A L Archibald
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, UK
| | - C J Schmidt
- Department of Animal and Food Sciences, College of Agriculture and Natural Resources, University of Delaware, Newark, DE, 19716, USA
| | - P Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
9
|
Ruffier M, Kähäri A, Komorowska M, Keenan S, Laird M, Longden I, Proctor G, Searle S, Staines D, Taylor K, Vullo A, Yates A, Zerbino D, Flicek P. Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation. Database (Oxford) 2017; 2017:3074789. [PMID: 28365736 PMCID: PMC5467575 DOI: 10.1093/database/bax020] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 10/28/2016] [Revised: 02/07/2017] [Accepted: 02/20/2017] [Indexed: 01/09/2023]
Abstract
The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Initially intended to provide annotation for the reference human genome, we have extended our framework to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework storing a large diversity of genome annotations in one location serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list ( http://www.ensembl.org/info/about/contact/index.html ). Database URL http://www.ensembl.org.
Affiliation(s)
- Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andreas Kähäri
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Monika Komorowska
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stephen Keenan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Laird
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ian Longden
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Glenn Proctor
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Steve Searle
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Daniel Staines
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kieron Taylor
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alessandro Vullo
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
|
10
|
Cairns J, Freire-Pritchett P, Wingett SW, Várnai C, Dimond A, Plagnol V, Zerbino D, Schoenfelder S, Javierre BM, Osborne C, Fraser P, Spivakov M. CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol 2016; 17:127. [PMID: 27306882 PMCID: PMC4908757 DOI: 10.1186/s13059-016-0992-2] [Citation(s) in RCA: 242] [Impact Index Per Article: 30.3] [Received: 04/01/2016] [Accepted: 05/25/2016] [Indexed: 12/14/2022] Open
Abstract
Capture Hi-C (CHi-C) is a method for profiling chromosomal interactions involving targeted regions of interest, such as gene promoters, globally and at high resolution. Signal detection in CHi-C data involves a number of statistical challenges that are not observed when using other Hi-C-like techniques. We present a background model and algorithms for normalisation and multiple testing that are specifically adapted to CHi-C experiments. We implement these procedures in CHiCAGO ( http://regulatorygenomicsgroup.org/chicago ), an open-source package for robust interaction detection in CHi-C. We validate CHiCAGO by showing that promoter-interacting regions detected with this method are enriched for regulatory features and disease-associated SNPs.
Affiliation(s)
- Jonathan Cairns
- Nuclear Dynamics Programme, Babraham Institute, Cambridge, UK
| | | | - Steven W Wingett
- Nuclear Dynamics Programme, Babraham Institute, Cambridge, UK
- Bioinformatics Group, Babraham Institute, Cambridge, UK
| | - Csilla Várnai
- Nuclear Dynamics Programme, Babraham Institute, Cambridge, UK
| | - Andrew Dimond
- Nuclear Dynamics Programme, Babraham Institute, Cambridge, UK
| | | | - Daniel Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | | | | | - Cameron Osborne
- Department of Medical and Molecular Genetics, King's College, London, UK
| | - Peter Fraser
- Nuclear Dynamics Programme, Babraham Institute, Cambridge, UK
|
11
|
Li JW, Bolser D, Manske M, Giorgi FM, Vyahhi N, Usadel B, Clavijo BJ, Chan TF, Wong N, Zerbino D, Schneider MV. The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability. Brief Bioinform 2013; 14:548-55. [PMID: 23793381 PMCID: PMC3771235 DOI: 10.1093/bib/bbt045] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023] Open
Abstract
Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will be reached by increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in quantity of sequence data. The current resources for NGS informatics are extremely fragmented. Finding a centralized synthesis is difficult. A multitude of tools exist for NGS data analysis; however, none of these satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods in customized pipelines, an approach helped by the open-source nature of many NGS programmes. Drawing from community spirit and with the use of the Wikipedia framework, we have initiated a collaborative NGS resource: The NGS WikiBook. We have collected a sufficient amount of text to incentivize a broader community to contribute to it. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style for this dynamic material is designed for the bench biologists and non-bioinformaticians. The flexibility of online material allows the readers to ignore details in a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so readers may familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.
Affiliation(s)
- Jing-Woei Li
- School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR. Tel.: +852-39431302;
|
12
|
Abstract
MOTIVATION: Large multiple genome alignments and inferred ancestral genomes are ideal resources for comparative studies of molecular evolution, and advances in sequencing and computing technology are making them increasingly obtainable. These structures can provide a rich understanding of the genetic relationships between all subsets of species they contain. Current formats for storing genomic alignments, such as XMFA and MAF, are all indexed or ordered using a single reference genome, however, which limits the information that can be queried with respect to other species and clades. This loss of information grows with the number of species under comparison, as well as their phylogenetic distance. RESULTS: We present HAL, a compressed, graph-based hierarchical alignment format for storing multiple genome alignments and ancestral reconstructions. HAL graphs are indexed on all genomes they contain. Furthermore, they are organized phylogenetically, which allows for modular and parallel access to arbitrary subclades without fragmentation because of rearrangements that have occurred in other lineages. HAL graphs can be created or read with a comprehensive C++ API. A set of tools is also provided to perform basic operations, such as importing and exporting data, identifying mutations and coordinate mapping (liftover). AVAILABILITY: All documentation and source code for the HAL API and tools are freely available at http://github.com/glennhickey/hal. CONTACT: hickey@soe.ucsc.edu or haussler@soe.ucsc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Glenn Hickey
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz CA 95064, USA.
|
13
|
Abstract
Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new "Cactus" alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.
Affiliation(s)
- Benedict Paten
- Center for Biomolecular Science and Engineering, University of California-Santa Cruz, CA 95064, USA.
|
14
|
Radenbaugh A, Sanborn JZ, Zerbino D, Wilks C, Stuart JM, Haussler D. Abstract 59: Identification of RNA editing events in cancer using high-throughput sequencing data. Cancer Res 2011. [DOI: 10.1158/1538-7445.am2011-59] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
RNA editing is a post-transcriptional modification of pre-mRNA that has recently been identified as an additional epigenetic mechanism relevant to cancer development and progression. With projects like The Cancer Genome Atlas (TCGA) providing high-throughput sequencing datasets measuring both DNA and RNA from the same patients across multiple cancers, it is now possible to search for RNA editing events at a genome-wide scale.
Using fully sequenced tumor and matched-normal genomes and RNA-Seq data from the TCGA project, we will identify RNA editing events in acute myeloid leukemia (AML) patients. We have analyzed the tumor and matched-normal genomes to identify SNPs, mutations (both germline and somatic), and heterozygosity across the entire genome. By comparing the patient's genomic data to the RNA transcripts assembled by the UCSC RNA-Seq pipeline, we can identify any bases that were transcribed abnormally. All putative RNA editing events will be assessed according to the most common types of RNA editing, such as the deamination of adenosine into inosine (A-to-I) or the conversion of cytosine into uracil (C-to-U). Local phasing information inferred from the genomic sequence will be used to disambiguate potential RNA editing events found at heterozygous locations.
As a positive control, we will confirm RNA-editing events previously discovered experimentally in AML patients, such as an A-to-I conversion in the protein tyrosine phosphatase PTPN6 gene. The PTPN6 gene is recognized as a tumor suppressor gene and is important for the down-regulation of growth-promoting receptors. The A-to-I conversion of adenosine 7866 causes the splicing mechanism to ignore a splicing junction, leading to a non-functional PTPN6 protein via the inclusion of an intron in the mature RNA transcript. Using the TCGA sequencing data, we will identify all AML samples that exhibit this particular A-to-I conversion as well as the inclusion of the intron with the RNA-Seq data. In addition, we will report novel RNA editing events in AML and other cancer types and look for patterns that may be cancer specific or globally relevant to cancer development.
Citation Format: Radenbaugh A, Sanborn JZ, Zerbino D, Wilks C, Stuart JM, Haussler D. Identification of RNA editing events in cancer using high-throughput sequencing data [abstract]. In: Proceedings of the 102nd Annual Meeting of the American Association for Cancer Research; 2011 Apr 2-6; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2011;71(8 Suppl):Abstract nr 59. doi:10.1158/1538-7445.AM2011-59
Affiliation(s)
| | | | | | - Chris Wilks
- University of California, Santa Cruz, Santa Cruz, CA
|
15
|
Young AL, Abaan HO, Zerbino D, Mullikin JC, Birney E, Margulies EH. A new strategy for genome assembly using short sequence reads and reduced representation libraries. Genome Res 2010; 20:249-56. [PMID: 20123915 DOI: 10.1101/gr.097956.109] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/24/2022]
Abstract
We have developed a novel approach for using massively parallel short-read sequencing to generate fast and inexpensive de novo genomic assemblies comparable to those generated by capillary-based methods. The ultrashort (<100 base) sequences generated by this technology pose specific biological and computational challenges for de novo assembly of large genomes. To account for this, we devised a method for experimentally partitioning the genome using reduced representation (RR) libraries prior to assembly. We use two restriction enzymes independently to create a series of overlapping fragment libraries, each containing a tractable subset of the genome. Together, these libraries allow us to reassemble the entire genome without the need of a reference sequence. As proof of concept, we applied this approach to sequence and assemble the majority of the 125-Mb Drosophila melanogaster genome. We subsequently demonstrate the accuracy of our assembly method with meaningful comparisons against the currently available D. melanogaster reference genome (dm3). The ease of assembly and accuracy for comparative genomics suggest that our approach will scale to future mammalian genome-sequencing efforts, saving both time and money without sacrificing quality.
Affiliation(s)
- Andrew L Young
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
|
16
|
Leinonen R, Akhtar R, Birney E, Bonfield J, Bower L, Corbett M, Cheng Y, Demiralp F, Faruque N, Goodgame N, Gibson R, Hoad G, Hunter C, Jang M, Leonard S, Lin Q, Lopez R, Maguire M, McWilliam H, Plaister S, Radhakrishnan R, Sobhany S, Slater G, Ten Hoopen P, Valentin F, Vaughan R, Zalunin V, Zerbino D, Cochrane G. Improvements to services at the European Nucleotide Archive. Nucleic Acids Res 2009; 38:D39-45. [PMID: 19906712 PMCID: PMC2808951 DOI: 10.1093/nar/gkp998] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Indexed: 12/29/2022] Open
Abstract
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL–EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.
Affiliation(s)
- Rasko Leinonen
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
17
|
Han R, Leo-Macias A, Zerbino D, Bastolla U, Contreras-Moreira B, Ortiz AR. An efficient conformational sampling method for homology modeling. Proteins 2008; 71:175-88. [PMID: 17985353 DOI: 10.1002/prot.21672] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
Abstract
The structural refinement of protein models is a challenging problem in protein structure prediction (Moult et al., Proteins 2003;53(Suppl 6):334-339). Most attempts to refine comparative models lead to degradation rather than improvement in model quality, so most current comparative modeling procedures omit the refinement step. However, it has been shown that even in the absence of alignment errors and using optimal templates, methods based on a single template have intrinsic limitations, and that refinement is needed to improve model accuracy. It is thought that failure of current methods originates on one hand from the inaccuracy of the effective free energy functions adopted, which do not properly represent the energetic balance in the native state, and on the other hand from the difficulty of sampling the high dimensional and rugged free energy landscape of protein folding, in the search for the global minimum. Here, we address this second issue. We define the evolutionary and vibrational armonics subspace (EVA), a reduced sampling subspace that consists of a combination of evolutionarily favored directions, defined by the principal components of the structural variation within a homologous family, plus topologically favored directions, derived from the low frequency normal modes of the vibrational dynamics, up to 50 dimensions. This subspace is accurate enough so that the cores of most proteins can be represented within 1 Å accuracy, and reduced enough so that Replica Exchange Monte Carlo (Hukushima and Nemoto, J Phys Soc Jpn 1996;65:1604-1608; Hukushima et al., Int J Mod Phys C: Phys Comput 1996;7:337-344; Mitsutake et al., J Chem Phys 2003;118:6664-6675; Mitsutake et al., J Chem Phys 2003;118:6676-6688) (REMC) can be applied. REMC is one of the best sampling methods currently available, but its applicability is restricted to spaces of small dimensionality.
We show that the combination of the EVA subspace and REMC can essentially solve the optimization problem for backbone atoms in the reduced sampling subspace, even for rather rugged free energy landscapes. Applications and limitations of this methodology are finally discussed.
Affiliation(s)
- Rongsheng Han
- Bioinformatics Unit, Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, Cantoblanco, Madrid, Spain
|
18
|
Leo-Macias A, Lopez-Romero P, Lupyan D, Zerbino D, Ortiz AR. Core deformations in protein families: a physical perspective. Biophys Chem 2005; 115:125-8. [PMID: 15752593 DOI: 10.1016/j.bpc.2004.12.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 06/30/2004] [Revised: 11/10/2004] [Accepted: 12/10/2004] [Indexed: 11/18/2022]
Abstract
An analysis is presented on how structural cores change shape within protein families, and whether or not there is a relationship between these structural changes and the vibrational modes that proteins experience due to topological constraints. A set of 13 representative and well-populated protein families is studied. The evolutionary directions of deformation are obtained by applying a new multiple structural alignment technique to superimpose the structures and extract a conserved core, together with Principal Components Analysis (PCA) to extract the main deformation modes. A low-resolution Normal Mode Analysis (NMA) technique is used in parallel to study the properties of the mechanical core plasticity of the same proteins. We find that the evolutionary deformations span a low-dimensional space. A statistically significant correspondence exists between these principal deformations and the vibrational modes accessible to a particular topology. We conclude that, to a significant extent, the structures of evolving proteins seem to respond to sequence changes by collective deformations along combinations of low-frequency modes. The findings have implications for structure prediction by homology modeling.
Affiliation(s)
- Alejandra Leo-Macias
- Bioinformatics Unit, Centro de Biologia Molecular Severo Ochoa, CSIC-UAM, Universidad Autonoma de Madrid, Cantoblanco 28049, Madrid, Spain
|
19
|
Abstract
An analysis is presented on how structural cores modify their shape across homologous proteins, and whether or not a relationship exists between these structural changes and the vibrational normal modes that proteins experience as a result of the topological constraints imposed by the fold. A set of 35 representative, well-populated protein families is studied. The evolutionary directions of deformation are obtained by using multiple structural alignments to superimpose the structures and extract a conserved core, together with principal components analysis to extract the main deformation modes from the three-dimensional superimposition. In parallel, a low-resolution normal mode analysis technique is employed to study the properties of the mechanical core plasticity of these same families. We show that the evolutionary deformations span a low dimensional space of 4-5 dimensions on average. A statistically significant correspondence exists between these principal deformations and the approximately 20 slowest vibrational modes accessible to a particular topology. We conclude that, to a significant extent, the structural response of a protein topology to sequence changes takes place by means of collective deformations along combinations of a small number of low-frequency modes. The findings have implications in structure prediction by homology modeling.
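One simple way to quantify a correspondence of the kind described in this abstract is to ask what fraction of an evolutionary deformation direction lies in the subspace spanned by a set of low-frequency modes. The sketch below is an illustrative assumption, not the statistic used in the paper: the function names and the toy three-dimensional vectors are hypothetical, and real inputs would be principal components and normal modes defined over the core atom coordinates.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def orthonormalize(vectors):
    """Gram-Schmidt orthonormalization; linearly dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        if math.sqrt(dot(w, w)) > 1e-12:
            basis.append(normalize(w))
    return basis


def captured_fraction(deformation, modes):
    """Squared length of the unit deformation vector projected onto span(modes).

    Returns a value in [0, 1]: 1 means the deformation lies entirely within
    the mode subspace, 0 means it is orthogonal to it.
    """
    d = normalize(deformation)
    basis = orthonormalize(modes)
    return sum(dot(d, b) ** 2 for b in basis)


if __name__ == "__main__":
    toy_modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical modes
    print(captured_fraction([1.0, 1.0, 0.0], toy_modes))
    print(captured_fraction([1.0, 0.0, 1.0], [[1.0, 0.0, 0.0]]))
```

Judging whether such an overlap is statistically significant would additionally require a null model, for example comparison against randomly oriented vectors of the same dimension, which this sketch does not attempt.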
Affiliation(s)
- Alejandra Leo-Macias
- Bioinformatics Unit, Centro de Biología Molecular Severo Ochoa, CSIC-UAM, Cantoblanco, Madrid, Spain
|
20
|
Olszewski W, Machowski Z, Sokolowski J, Sawicki Z, Zerbino D, Nielubowicz J. [Primary lymphatic edema of lower limbs. I. Lymphographic and histologic examination of vessels and lymph nodes in primary lymphedema]. Pol Przegl Chir 1972; 44:657-65. [PMID: 5026138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
|