701
|
Characterization and deep sequencing analysis of exosomal and non-exosomal miRNA in human urine. Kidney Int 2013; 86:433-44. [PMID: 24352158 DOI: 10.1038/ki.2013.502] [Citation(s) in RCA: 265] [Impact Index Per Article: 24.1] [Received: 08/02/2013] [Revised: 09/09/2013] [Accepted: 10/03/2013] [Indexed: 02/07/2023]
Abstract
MicroRNAs (miRNAs) have been shown to circulate in biological fluids and are enclosed in vesicles such as exosomes; they are present in urine and represent a noninvasive methodology to detect biomarkers for diagnostic testing. The low abundance of RNA in urine, of which exosomal miRNA is only a small fraction, makes its isolation difficult and downstream RNA assays challenging. Here, we investigate methods to maximize exosomal isolation and RNA yield for next-generation deep sequencing. Upon characterizing exosomal proteins and total RNA content in urine, several commercially available kits were tested for their RNA extraction efficiency. We subsequently used the methods with the highest miRNA content to profile baseline miRNA expression using next-generation deep sequencing. Comparisons of miRNA profiles were also made with exosomes isolated by differential ultracentrifugation methodology and a commercially available column-based protocol. Overall, miRNAs were found to be significantly enriched and intact in urine-derived exosomes compared with cell-free urine. The presence of other noncoding RNAs such as small nuclear and small nucleolar RNA in the exosomes, in addition to coding sequences related to kidney and bladder conditions, was also detected. Our study extensively characterizes the RNA content of exosomes isolated from urine, providing the potential to identify miRNA biomarkers in human urine.
|
702
|
Abascal F, Irisarri I, Zardoya R. Diversity and evolution of membrane intrinsic proteins. Biochim Biophys Acta Gen Subj 2013; 1840:1468-81. [PMID: 24355433 DOI: 10.1016/j.bbagen.2013.12.001] [Citation(s) in RCA: 152] [Impact Index Per Article: 13.8] [Received: 09/18/2013] [Revised: 12/05/2013] [Accepted: 12/09/2013] [Indexed: 12/11/2022]
Abstract
BACKGROUND Membrane intrinsic proteins (MIPs) are the proteins in charge of regulating water transport into cells. Because of this essential function, the MIP family is ancient, widespread, and highly diverse. SCOPE OF REVIEW The rapidly accumulating genomic and transcriptomic data from previously poorly known groups such as unicellular eukaryotes, fungi, green algae, mosses, and non-vertebrate animals are helping to expand our view of MIP evolution throughout the diversity of life. Here, by analyzing more than 1700 sequences, we provide an updated and comprehensive phylogeny of MIPs. MAJOR CONCLUSIONS The reconstructed phylogeny supports (i) deep orthology of X intrinsic proteins (XIPs; present from unicellular eukaryotes to plants); (ii) that the origin of small intrinsic proteins (SIPs) traces back to the common ancestor of all plants; and (iii) the expansion of aquaglyceroporins (GLPs) in Oomycetes, as well as their loss in vascular plants and in the ancestor of endopterygote insects. Additionally, conserved positions in the protein and residues involved in glycerol selectivity are reviewed within a phylogenetic framework. Furthermore, the functional diversification of human and Arabidopsis paralogs is analyzed in an evolutionary genomic context. GENERAL SIGNIFICANCE Our results show that while bacteria and archaea generally function with one copy each of a water channel (aquaporin or AQP) and a GLP, recurrent independent expansions have greatly diversified the structures and functions of the different members of both MIP paralog subfamilies throughout eukaryote evolution (and not only in flowering plants and vertebrates, as previously thought). This article is part of a Special Issue entitled Aquaporins.
Affiliation(s)
- Federico Abascal
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain
- Iker Irisarri
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales-CSIC (MNCN-CSIC), José Gutiérrez Abascal 2, 28006 Madrid, Spain
- Rafael Zardoya
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales-CSIC (MNCN-CSIC), José Gutiérrez Abascal 2, 28006 Madrid, Spain
|
703
|
Ponomarenko EA, Kopylov AT, Lisitsa AV, Radko SP, Kiseleva YY, Kurbatov LK, Ptitsyn KG, Tikhonova OV, Moisa AA, Novikova SE, Poverennaya EV, Ilgisonis EV, Filimonov AD, Bogolubova NA, Averchuk VV, Karalkin PA, Vakhrushev IV, Yarygin KN, Moshkovskii SA, Zgoda VG, Sokolov AS, Mazur AM, Prokhortchouck EB, Skryabin KG, Ilina EN, Kostrjukova ES, Alexeev DG, Tyakht AV, Gorbachev AY, Govorun VM, Archakov AI. Chromosome 18 transcriptoproteome of liver tissue and HepG2 cells and targeted proteome mapping in depleted plasma: update 2013. J Proteome Res 2013; 13:183-90. [PMID: 24328317 DOI: 10.1021/pr400883x] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 12/28/2022]
Abstract
We report the results obtained in 2012-2013 by the Russian Consortium for the Chromosome-centric Human Proteome Project (C-HPP). The main scope of this work was the transcriptome profiling of genes on human chromosome 18 (Chr 18), as well as their encoded proteome, from three types of biomaterials: liver tissue, the hepatocellular carcinoma-derived cell line HepG2, and blood plasma. The transcriptome profiling for liver tissue was independently performed using two RNaseq platforms (SOLiD and Illumina) and also by droplet digital PCR (ddPCR) and quantitative RT-PCR. The proteome profiling of Chr 18 was accomplished by quantitatively measuring protein copy numbers in the three types of biomaterial (the lowest protein concentration measured was 10^-13 M) using selected reaction monitoring (SRM). In total, protein copy numbers were estimated for 228 master proteins, including quantitative data on 164 proteins in plasma, 171 in the HepG2 cell line, and 186 in liver tissue. Most proteins were present in plasma at 10^8 copies/μL, while the median abundance was 10^4 and 10^5 protein copies per cell in HepG2 cells and liver tissue, respectively. In summary, for liver tissue and HepG2 cells a "transcriptoproteome" was produced that reflects the relationship between transcript and protein copy numbers of the genes on Chr 18. The quantitative data acquired by RNaseq, PCR, and SRM were uploaded into the "Update_2013" data set of our knowledgebase (www.kb18.ru) and investigated for linear correlations.
Affiliation(s)
- Elena A Ponomarenko
- Orekhovich Institute of Biomedical Chemistry of the Russian Academy of Medical Sciences , 10 Pogodinskaya Street, Moscow 119121, Russia
|
704
|
Abstract
In addition to environmental factors and intrinsic variations in base substitution rates, specific genome-destabilizing mutations can shape the mutational trajectory of genomes. How specific alleles influence the nature and position of accumulated mutations in a genomic context is largely unknown. Understanding the impact of genome-destabilizing alleles is particularly relevant to cancer genomes where biased mutational signatures are identifiable. We first created a more complete picture of cellular pathways that impact mutation rate using a primary screen to identify essential Saccharomyces cerevisiae gene mutations that cause mutator phenotypes. Drawing primarily on new alleles identified in this resource, we measure the impact of diverse mutator alleles on mutation patterns directly by whole-genome sequencing of 68 mutation-accumulation strains derived from wild-type and 11 parental mutator genotypes. The accumulated mutations differ across mutator strains, displaying base-substitution biases, allele-specific mutation hotspots, and break-associated mutation clustering. For example, in mutants of POLα and the Cdc13–Stn1–Ten1 complex, we find a distinct subtelomeric bias for mutations that we show is independent of the target sequence. Together our data suggest that specific genome-instability mutations are sufficient to drive discrete mutational signatures, some of which share properties with mutation patterns seen in tumors. Thus, in a population of cells, genome-instability mutations could influence clonal evolution by establishing discrete mutational trajectories for genomes.
|
705
|
Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods 2013; 10:723-9. [PMID: 23900255 DOI: 10.1038/nmeth.2562] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Received: 01/25/2013] [Accepted: 06/07/2013] [Indexed: 12/13/2022]
Abstract
The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor but only a minority of these drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
|
706
|
Zhang F, Wang M, Michael T, Drabier R. Novel alternative splicing isoform biomarkers identification from high-throughput plasma proteomics profiling of breast cancer. BMC Syst Biol 2013; 7 Suppl 5:S8. [PMID: 24565027 PMCID: PMC4028860 DOI: 10.1186/1752-0509-7-s5-s8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
BACKGROUND In the biopharmaceutical industry, biomarkers define molecular taxonomies of patients and diseases and serve as surrogate endpoints in early-phase drug trials. Molecular biomarkers can be much more sensitive than traditional lab tests. Discriminating disease biomarkers by traditional methods such as DNA microarrays has proved challenging. Alternative splicing isoforms represent a new class of diagnostic biomarkers. Recent evidence demonstrates that the differentiation and quantification of individual alternative splicing isoforms could improve insights into disease diagnosis and management. Identifying and characterizing alternative splicing isoforms is essential to the study of molecular mechanisms and early detection of complex diseases such as breast cancer. However, traditional methods for alternative splicing isoform determination have limitations: they operate at the transcriptome level, provide low coverage, and focus poorly on alternative splicing. RESULTS Therefore, we present a peptidomics approach to searching for novel alternative splicing isoforms in clinical proteomics. Our results show that the approach has significant potential to enable discovery of new types of high-quality alternative splicing isoform biomarkers. CONCLUSIONS We developed a peptidomics approach for the proteomics community to analyze, identify, and characterize alternative splicing isoforms from MS-based proteomics experiments with more coverage and an exclusive focus on alternative splicing. The approach can help generate novel hypotheses on molecular risk factors and molecular mechanisms of early-stage cancer, leading to identification of potentially highly specific alternative splicing isoform biomarkers for early detection of cancer.
|
707
|
Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Girón CG, Gordon L, Hourlier T, Hunt S, Johnson N, Juettemann T, Kähäri AK, Keenan S, Kulesha E, Martin FJ, Maurel T, McLaren WM, Murphy DN, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, Ruffier M, Sheppard D, Taylor K, Thormann A, Trevanion SJ, Vullo A, Wilder SP, Wilson M, Zadissa A, Aken BL, Birney E, Cunningham F, Harrow J, Herrero J, Hubbard TJ, Kinsella R, Muffato M, Parker A, Spudich G, Yates A, Zerbino DR, Searle SM. Ensembl 2014. Nucleic Acids Res 2013; 42:D749-55. [PMID: 24316576 PMCID: PMC3964975 DOI: 10.1093/nar/gkt1196] [Citation(s) in RCA: 1059] [Impact Index Per Article: 96.3] [Indexed: 02/06/2023]
Abstract
Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.
Affiliation(s)
- Paul Flicek, M. Ridwan Amode, Daniel Barrell, Kathryn Beal, Konstantinos Billis, Simon Brent, Denise Carvalho-Silva, Peter Clapham, Guy Coates, Stephen Fitzgerald, Laurent Gil, Carlos García Girón, Leo Gordon, Thibaut Hourlier, Sarah Hunt, Nathan Johnson, Thomas Juettemann, Andreas K. Kähäri, Stephen Keenan, Eugene Kulesha, Fergal J. Martin, Thomas Maurel, William M. McLaren, Daniel N. Murphy, Rishi Nag, Bert Overduin, Miguel Pignatelli, Bethan Pritchard, Emily Pritchard, Harpreet S. Riat, Magali Ruffier, Daniel Sheppard, Kieron Taylor, Anja Thormann, Stephen J. Trevanion, Alessandro Vullo, Steven P. Wilder, Mark Wilson, Amonida Zadissa, Bronwen L. Aken, Ewan Birney, Fiona Cunningham, Jennifer Harrow, Javier Herrero, Tim J.P. Hubbard, Rhoda Kinsella, Matthieu Muffato, Anne Parker, Giulietta Spudich, Andy Yates, Daniel R. Zerbino, Stephen M.J. Searle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- *To whom correspondence should be addressed (Paul Flicek). Tel: +44 1223 492 581; Fax: +44 1223 494 494
|
708
|
Harrow JL, Steward CA, Frankish A, Gilbert JG, Gonzalez JM, Loveland JE, Mudge J, Sheppard D, Thomas M, Trevanion S, Wilming LG. The Vertebrate Genome Annotation browser 10 years on. Nucleic Acids Res 2013; 42:D771-9. [PMID: 24316575 PMCID: PMC3964964 DOI: 10.1093/nar/gkt1241] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 01/07/2023]
Abstract
The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).
Affiliation(s)
- Jennifer L Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, UK
|
709
|
Farrah T, Deutsch EW, Omenn GS, Sun Z, Watts JD, Yamamoto T, Shteynberg D, Harris MM, Moritz RL. State of the human proteome in 2013 as viewed through PeptideAtlas: comparing the kidney, urine, and plasma proteomes for the biology- and disease-driven Human Proteome Project. J Proteome Res 2013; 13:60-75. [PMID: 24261998 DOI: 10.1021/pr4010037] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.0] [Indexed: 12/20/2022]
Abstract
The kidney, urine, and plasma proteomes are intimately related: proteins and metabolic waste products are filtered from the plasma by the kidney and excreted via the urine, while kidney proteins may be secreted into the circulation or released into the urine. Shotgun proteomics data sets derived from human kidney, urine, and plasma samples were collated and processed using a uniform software pipeline, and relative protein abundances were estimated by spectral counting. The resulting PeptideAtlas builds yielded 4005, 2491, and 3553 nonredundant proteins at 1% FDR for the kidney, urine, and plasma proteomes, respectively - for kidney and plasma, the largest high-confidence protein sets to date. The same pipeline applied to all available human data yielded a 2013 Human PeptideAtlas build containing 12,644 nonredundant proteins and at least one peptide for each of ∼14,000 Swiss-Prot entries, an increase over 2012 of ∼7.5% of the predicted human proteome. We demonstrate that abundances are correlated between plasma and urine, examine the most abundant urine proteins not derived from either plasma or kidney, and consider the biomarker potential of proteins associated with renal decline. This analysis forms part of the Biology and Disease-driven Human Proteome Project (B/D-HPP) and is a contribution to the Chromosome-centric Human Proteome Project (C-HPP) special issue.
|
710
|
Fagerberg L, Hallström BM, Oksvold P, Kampf C, Djureinovic D, Odeberg J, Habuka M, Tahmasebpoor S, Danielsson A, Edlund K, Asplund A, Sjöstedt E, Lundberg E, Szigyarto CAK, Skogs M, Takanen JO, Berling H, Tegel H, Mulder J, Nilsson P, Schwenk JM, Lindskog C, Danielsson F, Mardinoglu A, Sivertsson A, von Feilitzen K, Forsberg M, Zwahlen M, Olsson I, Navani S, Huss M, Nielsen J, Ponten F, Uhlén M. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics 2013; 13:397-406. [PMID: 24309898 DOI: 10.1074/mcp.m113.035600] [Citation(s) in RCA: 2507] [Impact Index Per Article: 227.9] [Indexed: 12/21/2022]
Abstract
Global classification of the human proteins with regard to spatial expression patterns across organs and tissues is important for studies of human biology and disease. Here, we used a quantitative transcriptomics analysis (RNA-Seq) to classify the tissue-specific expression of genes across a representative set of all major human organs and tissues and combined this analysis with antibody-based profiling of the same tissues. To present the data, we launch a new version of the Human Protein Atlas that integrates RNA and protein expression data corresponding to ∼80% of the human protein-coding genes with access to the primary data for both the RNA and the protein analysis on an individual gene level. We present a classification of all human protein-coding genes with regard to tissue-specificity and spatial expression pattern. The integrative human expression map can be used as a starting point to explore the molecular constituents of the human body.
Affiliation(s)
- Linn Fagerberg
- Science for Life Laboratory, KTH - Royal Institute of Technology, SE-171 21 Stockholm, Sweden
|
711
|
Gordon C, Petit F, Kroisel P, Jakobsen L, Zechi-Ceide R, Oufadem M, Bole-Feysot C, Pruvost S, Masson C, Tores F, Hieu T, Nitschké P, Lindholm P, Pellerin P, Guion-Almeida M, Kokitsu-Nakata N, Vendramini-Pittoli S, Munnich A, Lyonnet S, Holder-Espinasse M, Amiel J. Mutations in endothelin 1 cause recessive auriculocondylar syndrome and dominant isolated question-mark ears. Am J Hum Genet 2013; 93:1118-25. [PMID: 24268655 DOI: 10.1016/j.ajhg.2013.10.023] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Received: 09/12/2013] [Revised: 10/11/2013] [Accepted: 10/22/2013] [Indexed: 10/26/2022] Open
Abstract
Auriculocondylar syndrome (ACS) is a rare craniofacial disorder with mandibular hypoplasia and question-mark ears (QMEs) as major features. QMEs, consisting of a specific defect at the lobe-helix junction, can also occur as an isolated anomaly. Studies in animal models have indicated the essential role of endothelin 1 (EDN1) signaling through the endothelin receptor type A (EDNRA) in patterning the mandibular portion of the first pharyngeal arch. Mutations in the genes coding for phospholipase C, beta 4 (PLCB4) and guanine nucleotide binding protein (G protein), alpha inhibiting activity polypeptide 3 (GNAI3), predicted to function as signal transducers downstream of EDNRA, have recently been reported in ACS. By whole-exome sequencing (WES), we identified a homozygous substitution in a furin cleavage site of the EDN1 proprotein in ACS-affected siblings born to consanguineous parents. WES of two cases with vertical transmission of isolated QMEs revealed a stop mutation in EDN1 in one family and a missense substitution of a highly conserved residue in the mature EDN1 peptide in the other. Targeted sequencing of EDN1 in an ACS individual with related parents identified a fourth, homozygous mutation falling close to the site of cleavage by endothelin-converting enzyme. The different modes of inheritance suggest that the degree of residual EDN1 activity differs depending on the mutation. These findings provide further support for the hypothesis that ACS and QMEs are uniquely caused by disruption of the EDN1-EDNRA signaling pathway.
|
712
|
Mallinjoud P, Villemin JP, Mortada H, Polay Espinoza M, Desmet FO, Samaan S, Chautard E, Tranchevent LC, Auboeuf D. Endothelial, epithelial, and fibroblast cells exhibit specific splicing programs independently of their tissue of origin. Genome Res 2013; 24:511-21. [PMID: 24307554 PMCID: PMC3941115 DOI: 10.1101/gr.162933.113] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Indexed: 12/13/2022]
Abstract
Alternative splicing is the main mechanism of increasing the proteome diversity coded by a limited number of genes. It is well established that different tissues or organs express different splicing variants. However, organs are composed of common major cell types, including fibroblasts, epithelial, and endothelial cells. By analyzing large-scale data sets generated by The ENCODE Project Consortium and after extensive RT-PCR validation, we demonstrate that each of the three major cell types expresses a specific splicing program independently of its organ origin. Furthermore, by analyzing splicing factor expression across samples, publicly available splicing factor binding site data sets (CLIP-seq), and exon array data sets after splicing factor depletion, we identified several splicing factors, including ESRP1 and 2, MBNL1, NOVA1, PTBP1, and RBFOX2, that contribute to establishing these cell type–specific splicing programs. All of the analyzed data sets are freely available in a user-friendly web interface named FasterDB, which describes all known splicing variants of human and mouse genes and their splicing patterns across several dozens of normal and cancer cells as well as across tissues. Information regarding splicing factors that potentially contribute to individual exon regulation is also provided via a dedicated CLIP-seq and exon array data visualization interface. To the best of our knowledge, FasterDB is the first database integrating such a variety of large-scale data sets to enable functional genomics analyses at exon-level resolution.
Affiliation(s)
- Pierre Mallinjoud
- Inserm UMR-S1052, Centre de Recherche en Cancérologie de Lyon, 69008 Lyon, France
|
713
|
Whelan FJ, Yap NVL, Surette MG, Golding GB, Bowdish DME. A guide to bioinformatics for immunologists. Front Immunol 2013; 4:416. [PMID: 24363654 PMCID: PMC3849744 DOI: 10.3389/fimmu.2013.00416] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/10/2013] [Accepted: 11/13/2013] [Indexed: 12/31/2022] Open
Abstract
Bioinformatics includes a suite of methods that are cheap and approachable, many of which are easily accessible without any specialized bioinformatic training. Yet, despite this, bioinformatic tools are under-utilized by immunologists. Herein, we review a representative set of publicly available, easy-to-use bioinformatic tools using our own research on an under-annotated human gene, SCARA3, as an example. SCARA3 shares an evolutionary relationship with the class A scavenger receptors, but preliminary research showed that it was divergent enough that its function remained unclear. In our quest for more information about this gene (did it share gene sequence similarities with other scavenger receptors? Did it contain conserved protein domains? Where was it expressed in the human body?) we discovered the power and informative potential of publicly available bioinformatic tools designed with the novice in mind, which allowed us to hypothesize on the regulation, structure, and function of this protein. We argue that these tools are largely applicable to many facets of immunology research.
Affiliation(s)
- Fiona J. Whelan
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
- Michael G. Surette
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
- G. Brian Golding
- Department of Biology, McMaster University, Hamilton, ON, Canada
- Dawn M. E. Bowdish
- Department of Pathology and Molecular Medicine, McMaster University, Hamilton, ON, Canada
|
714
|
Dunn JG, Foo CK, Belletier NG, Gavis ER, Weissman JS. Ribosome profiling reveals pervasive and regulated stop codon readthrough in Drosophila melanogaster. eLife 2013; 2:e01179. [PMID: 24302569 PMCID: PMC3840789 DOI: 10.7554/elife.01179] [Citation(s) in RCA: 265] [Impact Index Per Article: 24.1] [Indexed: 12/23/2022] Open
Abstract
Ribosomes can read through stop codons in a regulated manner, elongating rather than terminating the nascent peptide. Stop codon readthrough is essential to diverse viruses, and phylogenetically predicted to occur in a few hundred genes in Drosophila melanogaster, but the importance of regulated readthrough in eukaryotes remains largely unexplored. Here, we present a ribosome profiling assay (deep sequencing of ribosome-protected mRNA fragments) for Drosophila melanogaster, and provide the first genome-wide experimental analysis of readthrough. Readthrough is far more pervasive than expected: the vast majority of readthrough events evolved within D. melanogaster and were not predicted phylogenetically. The resulting C-terminal protein extensions show evidence of selection, contain functional subcellular localization signals, and their readthrough is regulated, arguing for their importance. We further demonstrate that readthrough occurs in yeast and humans. Readthrough thus provides general mechanisms both to regulate gene expression and function, and to add plasticity to the proteome during evolution. DOI: http://dx.doi.org/10.7554/eLife.01179.001.
Affiliation(s)
- Joshua G Dunn
- California Institute of Quantitative Biosciences, San Francisco, United States
|
715
|
The Burmese python genome reveals the molecular basis for extreme adaptation in snakes. Proc Natl Acad Sci U S A 2013; 110:20645-50. [PMID: 24297902 DOI: 10.1073/pnas.1314475110] [Citation(s) in RCA: 205] [Impact Index Per Article: 18.6] [Indexed: 11/18/2022] Open
Abstract
Snakes possess many extreme morphological and physiological adaptations. Identification of the molecular basis of these traits can provide novel understanding for vertebrate biology and medicine. Here, we study snake biology using the genome sequence of the Burmese python (Python molurus bivittatus), a model of extreme physiological and metabolic adaptation. We compare the python and king cobra genomes along with genomic samples from other snakes and perform transcriptome analysis to gain insights into the extreme phenotypes of the python. We discovered rapid and massive transcriptional responses in multiple organ systems that occur on feeding and coordinate major changes in organ size and function. Intriguingly, the homologs of these genes in humans are associated with metabolism, development, and pathology. We also found that many snake metabolic genes have undergone positive selection, which together with the rapid evolution of mitochondrial proteins, provides evidence for extensive adaptive redesign of snake metabolic pathways. Additional evidence for molecular adaptation and gene family expansions and contractions is associated with major physiological and phenotypic adaptations in snakes; genes involved are related to cell cycle, development, lungs, eyes, heart, intestine, and skeletal structure, including GRB2-associated binding protein 1, SSH, WNT16, and bone morphogenetic protein 7. Finally, changes in repetitive DNA content, guanine-cytosine isochore structure, and nucleotide substitution rates indicate major shifts in the structure and evolution of snake genomes compared with other amniotes. Phenotypic and physiological novelty in snakes seems to be driven by system-wide coordination of protein adaptation, gene expression, and changes in the structure of the genome.
|
716
|
Gong S, Ware JS, Walsh R, Cook SA. NECTAR: a database of codon-centric missense variant annotations. Nucleic Acids Res 2013; 42:D1013-9. [PMID: 24297257 PMCID: PMC3965063 DOI: 10.1093/nar/gkt1245] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Open
Abstract
NECTAR (Non-synonymous Enriched Coding muTation ARchive; http://nectarmutation.org) is a database and web application to annotate disease-related and functionally important amino acids in human proteins. A number of tools are available to facilitate the interpretation of DNA variants identified in diagnostic or research sequencing. These typically identify previous reports of DNA variation at a given genomic location, predict its effects on transcript and protein sequence and may predict downstream functional consequences. Previous reports and functional annotations are typically linked by the genomic location of the variant observed. NECTAR collates disease-causing variants and functionally important amino acid residues from a number of sources. Importantly, rather than simply linking annotations by a shared genomic location, NECTAR annotates variants of interest with details of previously reported variation affecting the same codon. This provides a much richer data set for the interpretation of a novel DNA variant. NECTAR also identifies functionally equivalent amino acid residues in evolutionarily related proteins (paralogues) and, where appropriate, transfers annotations between them. As well as accessing these data through a web interface, users can upload batches of variants in variant call format (VCF) for annotation on-the-fly. The database is freely available to download from the ftp site: ftp://ftp.nectarmutation.org.
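The difference between linking annotations by genomic location and linking them by codon can be sketched as follows. This is a minimal illustration and not NECTAR's implementation: the gene name, coding-sequence positions, and notes are hypothetical, and positions are assumed to be 1-based offsets within the coding sequence.

```python
from collections import defaultdict


def codon_number(cds_pos):
    """Map a 1-based coding-sequence position to its 1-based codon number."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_pos - 1) // 3 + 1


def index_by_codon(known_variants):
    """Group previously reported variants by (gene, codon)."""
    index = defaultdict(list)
    for gene, cds_pos, note in known_variants:
        index[(gene, codon_number(cds_pos))].append(note)
    return index


def annotate(index, gene, cds_pos):
    """Return all reports affecting the same codon as the query variant."""
    return index.get((gene, codon_number(cds_pos)), [])


# Hypothetical prior reports at CDS positions 10, 12 and 13 of GENE1.
known = [
    ("GENE1", 10, "report at c.10"),
    ("GENE1", 12, "report at c.12"),
    ("GENE1", 13, "report at c.13"),
]
index = index_by_codon(known)

# A novel variant at c.11 has no report at its exact position, but
# positions 10-12 all fall in codon 4, so two reports are retrieved.
hits = annotate(index, "GENE1", 11)
```

A lookup keyed on exact position would return nothing for the query at c.11, which is the gap that codon-level linking is meant to close.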
Affiliation(s)
- Sungsam Gong
- NIHR Cardiovascular Biomedical Research Unit, Royal Brompton and Harefield NHS Foundation Trust and Imperial College London, London SW3 6NP, UK, National Heart and Lung Institute, Imperial College, London SW3 6LY, UK, National Heart Centre Singapore, Singapore 168752, Singapore and Cardiovascular & Metabolic Disorders, Duke National University of Singapore, Singapore 169857, Singapore
|
717
|
Seitan VC, Faure AJ, Zhan Y, McCord RP, Lajoie BR, Ing-Simmons E, Lenhard B, Giorgetti L, Heard E, Fisher AG, Flicek P, Dekker J, Merkenschlager M. Cohesin-based chromatin interactions enable regulated gene expression within preexisting architectural compartments. Genome Res 2013; 23:2066-77. [PMID: 24002784 PMCID: PMC3847776 DOI: 10.1101/gr.161620.113] [Citation(s) in RCA: 252] [Impact Index Per Article: 22.9] [Received: 06/05/2013] [Accepted: 08/28/2013] [Indexed: 01/09/2023]
Abstract
Chromosome conformation capture approaches have shown that interphase chromatin is partitioned into spatially segregated Mb-sized compartments and sub-Mb-sized topological domains. This compartmentalization is thought to facilitate the matching of genes and regulatory elements, but its precise function and mechanistic basis remain unknown. Cohesin controls chromosome topology to enable DNA repair and chromosome segregation in cycling cells. In addition, cohesin associates with active enhancers and promoters and with CTCF to form long-range interactions important for gene regulation. Although these findings suggest an important role for cohesin in genome organization, this role has not been assessed on a global scale. Unexpectedly, we find that architectural compartments are maintained in noncycling mouse thymocytes after genetic depletion of cohesin in vivo. Cohesin was, however, required for specific long-range interactions within compartments where cohesin-regulated genes reside. Cohesin depletion diminished interactions between cohesin-bound sites, whereas alternative interactions between chromatin features associated with transcriptional activation and repression became more prominent, with corresponding changes in gene expression. Our findings indicate that cohesin-mediated long-range interactions facilitate discrete gene expression states within preexisting chromosomal compartments.
Affiliation(s)
- Vlad C. Seitan
- Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom
- Andre J. Faure
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Ye Zhan
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Rachel Patton McCord
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Bryan R. Lajoie
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Elizabeth Ing-Simmons
- Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom
- Computational Regulatory Genomics Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom
- Boris Lenhard
- Computational Regulatory Genomics Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom
- Amanda G. Fisher
- Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom
- Job Dekker
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Matthias Merkenschlager
- Lymphocyte Development Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom
|
718
|
Marques AC, Hughes J, Graham B, Kowalczyk MS, Higgs DR, Ponting CP. Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs. Genome Biol 2013; 14:R131. [PMID: 24289259 PMCID: PMC4054604 DOI: 10.1186/gb-2013-14-11-r131] [Citation(s) in RCA: 145] [Impact Index Per Article: 13.2] [Received: 07/17/2013] [Accepted: 11/29/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Mammalian transcriptomes contain thousands of long noncoding RNAs (lncRNAs). Some lncRNAs originate from intragenic enhancers which, when active, behave as alternative promoters producing transcripts that are processed using the canonical signals of their host gene. We have followed up this observation by analyzing intergenic lncRNAs to determine the extent to which they might also originate from intergenic enhancers. RESULTS We integrated high-resolution maps of transcriptional initiation and transcription to annotate a conservative set of intergenic lncRNAs expressed in mouse erythroblasts. We subclassified intergenic lncRNAs according to chromatin status at transcriptional initiation regions, defined by relative levels of histone H3K4 mono- and trimethylation. These transcripts are almost evenly divided between those arising from enhancer-associated (elncRNA) or promoter-associated (plncRNA) elements. These two classes of 5' capped and polyadenylated RNA transcripts are indistinguishable with regard to their length, number of exons or transcriptional orientation relative to their closest neighboring gene. Nevertheless, elncRNAs are more tissue-restricted, less highly expressed and less well conserved during evolution. Of considerable interest, we found that expression of elncRNAs, but not plncRNAs, is associated with enhanced expression of neighboring protein-coding genes during erythropoiesis. CONCLUSIONS We have determined globally the sites of initiation of intergenic lncRNAs in erythroid cells, allowing us to distinguish two similarly abundant classes of transcripts. Different correlations between the levels of elncRNAs, plncRNAs and expression of neighboring genes suggest that functional lncRNAs from the two classes may play contrasting roles in regulating the transcript abundance of local or distal loci.
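The subclassification by relative H3K4 mono- and trimethylation can be pictured with a small sketch. It assumes that enhancer-like initiation regions carry more H3K4me1 than H3K4me3 and promoter-like regions the reverse; the log-ratio cutoff of zero, the pseudocount, and the signal values are illustrative assumptions, not the thresholds used in the study.

```python
import math


def classify_tss(h3k4me1, h3k4me3, pseudocount=1.0):
    """Label a transcription initiation region by its H3K4me1/H3K4me3 ratio.

    Returns "elncRNA" (enhancer-associated) when monomethylation dominates,
    otherwise "plncRNA" (promoter-associated). Ties fall to "plncRNA".
    """
    if h3k4me1 < 0 or h3k4me3 < 0:
        raise ValueError("signal values must be non-negative")
    # The pseudocount avoids division by zero for regions with no signal.
    log_ratio = math.log2((h3k4me1 + pseudocount) / (h3k4me3 + pseudocount))
    return "elncRNA" if log_ratio > 0 else "plncRNA"


# Hypothetical read counts at two initiation regions.
enhancer_like = classify_tss(h3k4me1=30, h3k4me3=5)
promoter_like = classify_tss(h3k4me1=4, h3k4me3=60)
```

In practice a margin around zero would probably be left unclassified, since regions with near-equal signal are ambiguous.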
|
719
|
Abstract
In this review, we present an overview of the recent advances of genomic technologies applied to studies of fish species belonging to the superclass of Osteichthyes (bony fish) with a major emphasis on the infraclass of Teleostei, also called teleosts. This superclass, which represents more than 50% of all known vertebrate species, has gained considerable attention from genome researchers in the last decade. We discuss many examples that demonstrate that this highly deserved attention is currently leading to new opportunities for answering important biological questions on gene function and evolutionary processes. In addition to giving an overview of the technologies that have been applied for studying various fish species, we put the recent advances in genome research on the model species zebrafish and medaka in the context of their impact on studies of all fish of the superclass of Osteichthyes. We thereby want to illustrate how the combined value of research on model species together with a broad-angle perspective on all bony fish species will have a huge impact on research in all fields of fundamental science and will speed up applications in many societally important areas such as the development of new medicines, toxicology test systems, environmental sensing systems and sustainable aquaculture strategies.
|
720
|
Xie C, Yuan J, Li H, Li M, Zhao G, Bu D, Zhu W, Wu W, Chen R, Zhao Y. NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Res 2013; 42:D98-103. [PMID: 24285305 PMCID: PMC3965073 DOI: 10.1093/nar/gkt1222] [Citation(s) in RCA: 338] [Impact Index Per Article: 30.7] [Indexed: 02/01/2023] Open
Abstract
NONCODE (http://www.bioinfo.org/noncode/) is an integrated knowledge database dedicated to non-coding RNAs (excluding tRNAs and rRNAs). Non-coding RNAs (ncRNAs) have been implicated in diseases and identified to play important roles in various biological processes. Since NONCODE version 3.0 was released 2 years ago, discovery of novel ncRNAs has been promoted by high-throughput RNA sequencing (RNA-Seq). In this update of NONCODE, we expand the ncRNA data set by collection of newly identified ncRNAs from literature published in the last 2 years and integration of the latest version of RefSeq and Ensembl. Particularly, the number of long non-coding RNAs (lncRNAs) has increased sharply from 73 327 to 210 831. Because lncRNAs show alternative splicing patterns similar to those of mRNAs, the concept of lncRNA genes was put forward to help systematic understanding of lncRNAs. The 56 018 and 46 475 lncRNA genes were generated from 95 135 and 67 628 lncRNAs for human and mouse, respectively. Additionally, we present expression profiles of lncRNA genes by graphs based on public RNA-seq data for human and mouse, as well as predict functions of these lncRNA genes. The improvements brought to the database also include an ID conversion tool from RefSeq or Ensembl ID to NONCODE ID and a service of lncRNA identification. NONCODE is also accessible through http://www.noncode.org/.
Affiliation(s)
- Chaoyong Xie
- Bioinformatics Research Group, Advanced Computing Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, University of Chinese Academy of Sciences, Beijing 100049, China, Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China and Taicang Institute of Life Sciences Information, Suzhou 215400, China
|
721
|
MacArthur JAL, Morales J, Tully RE, Astashyn A, Gil L, Bruford EA, Larsson P, Flicek P, Dalgleish R, Maglott DR, Cunningham F. Locus Reference Genomic: reference sequences for the reporting of clinically relevant sequence variants. Nucleic Acids Res 2013; 42:D873-8. [PMID: 24285302 PMCID: PMC3965024 DOI: 10.1093/nar/gkt1198] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 02/06/2023] Open
Abstract
Locus Reference Genomic (LRG; http://www.lrg-sequence.org/) records contain internationally recognized stable reference sequences designed specifically for reporting clinically relevant sequence variants. Each LRG is contained within a single file consisting of a stable ‘fixed’ section and a regularly updated ‘updatable’ section. The fixed section contains stable genomic DNA sequence for a genomic region, essential transcripts and proteins for variant reporting and an exon numbering system. The updatable section contains mapping information, annotation of all transcripts and overlapping genes in the region and legacy exon and amino acid numbering systems. LRGs provide a stable framework that is vital for reporting variants, according to Human Genome Variation Society (HGVS) conventions, in genomic DNA, transcript or protein coordinates. To enable translation of information between LRG and genomic coordinates, LRGs include mapping to the human genome assembly. LRGs are compiled and maintained by the National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). LRG reference sequences are selected in collaboration with the diagnostic and research communities, locus-specific database curators and mutation consortia. Currently >700 LRGs have been created, of which >400 are publicly available. The aim is to create an LRG for every locus with clinical implications.
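Translating between LRG and genomic coordinates can be sketched for the simplest case of a single ungapped mapping span. The `Span` class, its field names, and the coordinates below are hypothetical and do not reflect the LRG file format; real mappings may involve several spans or sequence differences, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One ungapped alignment between an LRG and a chromosome (1-based, inclusive)."""
    lrg_start: int
    lrg_end: int
    chrom_start: int
    chrom_end: int
    strand: int  # +1 if the LRG runs along the forward strand, -1 if reverse


def lrg_to_genomic(span, lrg_pos):
    """Convert an LRG position to a chromosome position within one span."""
    if not span.lrg_start <= lrg_pos <= span.lrg_end:
        raise ValueError("position lies outside the mapped span")
    offset = lrg_pos - span.lrg_start
    if span.strand == 1:
        return span.chrom_start + offset
    # On the reverse strand the LRG start corresponds to the chromosome end.
    return span.chrom_end - offset


# Hypothetical 1000-base spans on the forward and reverse strands.
forward = Span(1, 1000, 5001, 6000, 1)
reverse = Span(1, 1000, 5001, 6000, -1)
```

Keeping the fixed LRG coordinates separate from the mapping is what lets a reported variant stay stable while the mapping is updated for a new genome assembly.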
Affiliation(s)
- Jacqueline A L MacArthur
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, National Center for Biotechnology Information, Bethesda, MD 20894, USA, and Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
|
722
|
Sulakhe D, Balasubramanian S, Xie B, Feng B, Taylor A, Wang S, Berrocal E, Dave U, Xu J, Börnigen D, Gilliam TC, Maltsev N. Lynx: a database and knowledge extraction engine for integrative medicine. Nucleic Acids Res 2013; 42:D1007-12. [PMID: 24270788 PMCID: PMC3965040 DOI: 10.1093/nar/gkt1166] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 01/17/2023] Open
Abstract
We have developed Lynx (http://lynx.ci.uchicago.edu)—a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.
Affiliation(s)
- Dinanath Sulakhe
- Computation Institute, the University of Chicago, Chicago, IL 60637, USA, Department of Human Genetics, the University of Chicago, Chicago, IL 60637, USA, Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, USA and Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
|
723
|
Faulconbridge A, Burdett T, Brandizi M, Gostev M, Pereira R, Vasant D, Sarkans U, Brazma A, Parkinson H. Updates to BioSamples database at European Bioinformatics Institute. Nucleic Acids Res 2013; 42:D50-2. [PMID: 24265224 PMCID: PMC3965081 DOI: 10.1093/nar/gkt1081] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 01/08/2023] Open
Abstract
The BioSamples database at the EBI (http://www.ebi.ac.uk/biosamples) provides an integration point for BioSamples information between technology specific databases at the EBI, projects such as ENCODE and reference collections such as cell lines. The database delivers a unified query interface and API to query sample information across EBI's databases and provides links back to assay databases. Sample groups are used to manage related samples, e.g. those from an experimental submission, or a single reference collection. Infrastructural improvements include a new user interface with ontological and key word queries, a new query API, a new data submission API, complete RDF data download and a supporting SPARQL endpoint, accessioning at the point of submission to the European Nucleotide Archive and European Genotype Phenotype Archives and improved query response times.
Affiliation(s)
- Adam Faulconbridge
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
724
|
Abstract
The Mouse Phenome Database (MPD; phenome.jax.org) was launched in 2001 as the data coordination center for the international Mouse Phenome Project. MPD integrates quantitative phenotype, gene expression and genotype data into a common annotated framework to facilitate query and analysis. MPD contains >3500 phenotype measurements or traits relevant to human health, including cancer, aging, cardiovascular disorders, obesity, infectious disease susceptibility, blood disorders, neurosensory disorders, drug addiction and toxicity. Since our 2012 NAR report, we have added >70 new data sets, including data from Collaborative Cross lines and Diversity Outbred mice. During this time we have completely revamped our homepage, improved search and navigational aspects of the MPD application, developed several web-enabled data analysis and visualization tools, annotated phenotype data to public ontologies, developed an ontology browser and released new single nucleotide polymorphism query functionality with much higher density coverage than before. Here, we summarize recent data acquisitions and describe our latest improvements.
Affiliation(s)
- Stephen C Grubb
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 USA
|
725
|
Abstract
Ever-growing interest in microRNAs has greatly increased the number of resources and research papers devoted to the field and, as a result, finding miRNA data of interest has become increasingly demanding. To mitigate this problem, we created the miRNEST database (http://mirnest.amu.edu.pl), an integrative microRNA resource. In its updated version, named miRNEST 2.0, the database is complemented with our extensive miRNA predictions from deep sequencing libraries, data from plant degradome analyses, results of pre-miRNA classification with HuntMi and miRNA splice site information. We also added download and upload options and improved the user interface to make it easier to browse through miRNA records.
Affiliation(s)
- Michał W. Szcześniak
- *To whom correspondence should be addressed. Tel: +48 61 829 5836; Fax: +48 61 829 5949;
- Izabela Makałowska
- Correspondence may also be addressed to Izabela Makałowska. Tel: +48 61 829 5835; Fax: +48 61 829 5949;
|
726
|
Pawson AJ, Sharman JL, Benson HE, Faccenda E, Alexander SPH, Buneman OP, Davenport AP, McGrath JC, Peters JA, Southan C, Spedding M, Yu W, Harmar AJ. The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Res 2013; 42:D1098-106. [PMID: 24234439 PMCID: PMC3965070 DOI: 10.1093/nar/gkt1143] [Citation(s) in RCA: 792] [Impact Index Per Article: 72.0] [Indexed: 11/28/2022] Open
Abstract
The International Union of Basic and Clinical Pharmacology/British Pharmacological Society (IUPHAR/BPS) Guide to PHARMACOLOGY (http://www.guidetopharmacology.org) is a new open access resource providing pharmacological, chemical, genetic, functional and pathophysiological data on the targets of approved and experimental drugs. Created under the auspices of the IUPHAR and the BPS, the portal provides concise, peer-reviewed overviews of the key properties of a wide range of established and potential drug targets, with in-depth information for a subset of important targets. The resource is the result of curation and integration of data from the IUPHAR Database (IUPHAR-DB) and the published BPS ‘Guide to Receptors and Channels’ (GRAC) compendium. The data are derived from a global network of expert contributors, and the information is extensively linked to relevant databases, including ChEMBL, DrugBank, Ensembl, PubChem, UniProt and PubMed. Each of the ∼6000 small molecule and peptide ligands is annotated with manually curated 2D chemical structures or amino acid sequences, nomenclature and database links. Future expansion of the resource will complete the coverage of all the targets of currently approved drugs and future candidate targets, alongside educational resources to guide scientists and students in pharmacological principles and techniques.
Affiliation(s)
- Adam J Pawson
- The University/BHF Centre for Cardiovascular Science, The Queen's Medical Research Institute, University of Edinburgh, Edinburgh EH16 4TJ, UK, School of Biomedical Sciences, Life Sciences E Floor, University of Nottingham Medical School, Queen's Medical Centre, Nottingham NG7 2UH, UK, Laboratory for Foundations of Computer Science, School of Informatics, 10 Crichton Street, University of Edinburgh, Edinburgh EH8 9AB, UK, Clinical Pharmacology Unit, Level 6, Centre for Clinical Investigation, Box 110, Addenbrooke's Hospital, University of Cambridge, Cambridge CB2 0QQ, UK, School of Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK, Neuroscience Division, Medical Education Institute, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK and Spedding Research Solutions SARL, 6 Rue Ampere, Le Vésinet 78110, France
|
727
|
Guruceaga E, Segura V. Functional interpretation of microRNA-mRNA association in biological systems using R. Comput Biol Med 2013; 44:124-31. [PMID: 24377695 DOI: 10.1016/j.compbiomed.2013.11.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/25/2013] [Revised: 10/30/2013] [Accepted: 11/03/2013] [Indexed: 12/24/2022]
Abstract
The prediction of microRNA targets is a challenging task that has given rise to several prediction algorithms. Databases of predicted targets can be used in a microRNA target enrichment analysis, enhancing our capacity to extract functional information from gene lists. However, the available tools in this field analyze gene sets one by one, limiting their use in a meta-analysis. Here, we present an R system for miRNA enrichment analysis that is suitable for systems biology. This collection of R scripts and embedded data allows the use of predicted targets from public databases or a custom integration of them. As a proof-of-principle, we have successfully performed the challenging analysis of 2158 tumoral samples at a time. The obtained results have been summarized in a network where each cancer disease is linked to enriched miRNAs and overrepresented functions. These network connections have proven to be an invaluable resource for the study of biological and pathological causes and effects of the expression of miRNAs.
Affiliation(s)
- Elizabeth Guruceaga
- Unit of Proteomics, Genomics and Bioinformatics, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain.
| | - Victor Segura
- Unit of Proteomics, Genomics and Bioinformatics, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain.
|
728
|
Raney BJ, Dreszer TR, Barber GP, Clawson H, Fujita PA, Wang T, Nguyen N, Paten B, Zweig AS, Karolchik D, Kent WJ. Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics 2013; 30:1003-5. [PMID: 24227676 PMCID: PMC3967101 DOI: 10.1093/bioinformatics/btt637] [Citation(s) in RCA: 286] [Impact Index Per Article: 26.0] [Indexed: 12/28/2022] Open
Abstract
SUMMARY Track data hubs provide an efficient mechanism for visualizing remotely hosted Internet-accessible collections of genome annotations. Hub datasets can be organized, configured and fully integrated into the University of California Santa Cruz (UCSC) Genome Browser and accessed through the familiar browser interface. For the first time, individuals can use the complete browser feature set to view custom datasets without the overhead of setting up and maintaining a mirror. AVAILABILITY AND IMPLEMENTATION Source code for the BigWig, BigBed and Genome Browser software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/. Binary Alignment/Map (BAM) and Variant Call Format (VCF)/tabix utilities are available from http://samtools.sourceforge.net/ and http://vcftools.sourceforge.net/. The UCSC Genome Browser is publicly accessible at http://genome.ucsc.edu.
Affiliation(s)
- Brian J Raney
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63108, USA
|
729
|
Moretti S, Laurenczy B, Gharib WH, Castella B, Kuzniar A, Schabauer H, Studer RA, Valle M, Salamin N, Stockinger H, Robinson-Rechavi M. Selectome update: quality control and computational improvements to a database of positive selection. Nucleic Acids Res 2013; 42:D917-21. [PMID: 24225318 PMCID: PMC3964977 DOI: 10.1093/nar/gkt1065] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Indexed: 01/25/2023] Open
Abstract
Selectome (http://selectome.unil.ch/) is a database of positive selection, based on a branch-site likelihood test. This model estimates the number of nonsynonymous substitutions (dN) and synonymous substitutions (dS) to evaluate the variation in selective pressure (dN/dS ratio) over branches and over sites. Since the original release of Selectome, we have benchmarked and implemented a thorough quality control procedure on multiple sequence alignments, aiming to provide minimum false-positive results. We have also improved the computational efficiency of the branch-site test implementation, allowing larger data sets and more frequent updates. Release 6 of Selectome includes all gene trees from Ensembl for Primates and Glires, as well as a large set of vertebrate gene trees. A total of 6810 gene trees have some evidence of positive selection. Finally, the web interface has been improved to be more responsive and to facilitate searches and browsing.
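As a rough illustration of how the dN/dS ratio mentioned in the abstract is conventionally read, the sketch below computes the ratio from two rates and labels the result. It is not the branch-site likelihood test used by Selectome; the function names, the tolerance and the input values are assumptions made for this example only.

```python
def omega(dn: float, ds: float) -> float:
    """Return the dN/dS ratio; the ratio is undefined when dS is not positive."""
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    return dn / ds


def selection_regime(w: float, tol: float = 1e-9) -> str:
    """Label a dN/dS value using the conventional reading of the ratio."""
    if w > 1 + tol:
        return "positive"   # excess of nonsynonymous change
    if w < 1 - tol:
        return "purifying"  # nonsynonymous change is suppressed
    return "neutral"        # ratio indistinguishable from 1


# Hypothetical rates, chosen only to exercise each branch.
examples = {"siteA": (0.30, 0.10), "siteB": (0.02, 0.10), "siteC": (0.10, 0.10)}
labels = {name: selection_regime(omega(dn, ds)) for name, (dn, ds) in examples.items()}
```

A point estimate like this carries no statistical support on its own, which is why a likelihood-based test is used in practice.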
Affiliation(s)
- Sébastien Moretti
- Department of Ecology and Evolution, Biophore, University of Lausanne, CH-1015 Lausanne, Switzerland, Evolutionary Bioinformatics group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland, Vital-IT group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland, Computational Phylogenetics group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland, Division of Biosciences, Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK and Swiss National Supercomputing Centre (CSCS), CH-6900, Lugano, Switzerland
|
730
|
Farrell CM, O'Leary NA, Harte RA, Loveland JE, Wilming LG, Wallin C, Diekhans M, Barrell D, Searle SMJ, Aken B, Hiatt SM, Frankish A, Suner MM, Rajput B, Steward CA, Brown GR, Bennett R, Murphy M, Wu W, Kay MP, Hart J, Rajan J, Weber J, Snow C, Riddick LD, Hunt T, Webb D, Thomas M, Tamez P, Rangwala SH, McGarvey KM, Pujar S, Shkeda A, Mudge JM, Gonzalez JM, Gilbert JGR, Trevanion SJ, Baertsch R, Harrow JL, Hubbard T, Ostell JM, Haussler D, Pruitt KD. Current status and new features of the Consensus Coding Sequence database. Nucleic Acids Res 2013; 42:D865-72. [PMID: 24217909 PMCID: PMC3965069 DOI: 10.1093/nar/gkt1059] [Citation(s) in RCA: 112] [Impact Index Per Article: 10.2] [Indexed: 12/31/2022] Open
Abstract
The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
Affiliation(s)
- Catherine M Farrell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA, Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
|
731
|
Monaco MK, Stein J, Naithani S, Wei S, Dharmawardhana P, Kumari S, Amarasinghe V, Youens-Clark K, Thomason J, Preece J, Pasternak S, Olson A, Jiao Y, Lu Z, Bolser D, Kerhornou A, Staines D, Walts B, Wu G, D'Eustachio P, Haw R, Croft D, Kersey PJ, Stein L, Jaiswal P, Ware D. Gramene 2013: comparative plant genomics resources. Nucleic Acids Res 2013; 42:D1193-9. [PMID: 24217918 PMCID: PMC3964986 DOI: 10.1093/nar/gkt1110] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.9] [Indexed: 12/29/2022] Open
Abstract
Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
Affiliation(s)
- Marcela K Monaco
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA, Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK, Informatics and Bio-computing Program, Ontario Institute of Cancer Research, Toronto M5G 1L7, Canada, Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA and NAA Plant, Soil & Nutrition Laboratory Research Unit, USDA-ARS, Ithaca, NY 14853, USA
|
732
|
Assembly errors cause false tandem duplicate regions in the chicken (Gallus gallus) genome sequence. Chromosoma 2013; 123:165-8. [PMID: 24213641 DOI: 10.1007/s00412-013-0443-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/06/2013] [Revised: 10/25/2013] [Accepted: 10/28/2013] [Indexed: 10/26/2022]
Abstract
The complexity of eukaryote genomes makes assembly errors inevitable in the process of constructing reference genomes. Next-generation sequencing (NGS) could provide an efficient way to validate previously assembled genomes. Here, we exploited NGS data to interrogate the chicken reference genome and identified 35 pairs of nearly identical regions with >99.5% sequence similarity and a median size of 109 kb. Several lines of evidence, including read depth, the composition of junction sequences, and sequence similarity, suggest that these regions represent genome assembly errors and should be excluded from forthcoming genomic studies.
|
733
|
Wang Y, Liu Z, Cheng H, Gao T, Pan Z, Yang Q, Guo A, Xue Y. EKPD: a hierarchical database of eukaryotic protein kinases and protein phosphatases. Nucleic Acids Res 2013; 42:D496-502. [PMID: 24214991 PMCID: PMC3965077 DOI: 10.1093/nar/gkt1121] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 11/14/2022] Open
Abstract
We present here EKPD (http://ekpd.biocuckoo.org), a hierarchical database of eukaryotic protein kinases (PKs) and protein phosphatases (PPs), the key molecules responsible for the reversible phosphorylation of proteins that are involved in almost all aspects of biological processes. As extensive experimental and computational efforts have been carried out to identify PKs and PPs, an integrative resource with detailed classification and annotation information would be of great value for both experimentalists and computational biologists. In this work, we first collected 1855 PKs and 347 PPs from the scientific literature and various public databases. Based on previously established rationales, we classified all of the known PKs and PPs into a hierarchical structure with three levels, i.e. group, family and individual PK/PP. There are 10 groups with 149 families for the PKs and 10 groups with 33 families for the PPs. We constructed 139 and 27 Hidden Markov Model profiles for PK and PP families, respectively. Then we systematically characterized ∼50,000 PKs and >10,000 PPs in eukaryotes. In addition, >500 PKs and >400 PPs were computationally identified by ortholog search. Finally, the online service of the EKPD database was implemented in PHP + MySQL + JavaScript.
Affiliation(s)
- Yongbo Wang
- Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
|
734
|
Major E, Rigó K, Hague T, Bérces A, Juhos S. HLA typing from 1000 genomes whole genome and whole exome illumina data. PLoS One 2013; 8:e78410. [PMID: 24223151 PMCID: PMC3819389 DOI: 10.1371/journal.pone.0078410] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 06/05/2013] [Accepted: 09/12/2013] [Indexed: 11/23/2022] Open
Abstract
Specific HLA genotypes are known to be linked to either resistance or susceptibility to certain diseases or sensitivity to certain drugs. In addition, high accuracy HLA typing is crucial for organ and bone marrow transplantation. The most widespread high resolution HLA typing method used to date is Sanger sequencing based typing (SBT), and next generation sequencing (NGS) based HLA typing is just starting to be adopted as a higher throughput, lower cost alternative. By HLA typing the HapMap subset of the public 1000 Genomes paired Illumina data, we demonstrate that HLA-A, B and C typing is possible from exome sequencing samples with higher than 90% accuracy. The older 1000 Genomes whole genome sequencing read sets are less reliable and generally unsuitable for the purpose of HLA typing. We also propose using coverage % (the extent of exons covered) as a quality check (QC) measure to increase reliability.
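The coverage % quality check described above (the extent of exons covered) can be sketched as follows. The abstract does not give the exact definition, so this assumes coverage means the share of exon bases with read depth at or above a threshold; the function name, the `min_depth` parameter and the depth values are hypothetical.

```python
from typing import Sequence


def exon_coverage_percent(depths: Sequence[int], min_depth: int = 1) -> float:
    """Percentage of exon bases whose read depth is at least min_depth.

    `depths` holds one read-depth value per exon base (an assumed input layout).
    """
    if not depths:
        raise ValueError("no exon bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Hypothetical per-base depths for a five-base exon.
pct = exon_coverage_percent([0, 3, 5, 0, 2])
passes_qc = pct >= 90.0  # the 90.0 cut-off is illustrative, not taken from the paper
```

A typing call made from an exon with low coverage would then be flagged instead of reported.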
Affiliation(s)
| | | | - Tim Hague
- Omixon Biocomputing, Budapest, Hungary
|
735
|
Romagné F, Santesmasses D, White L, Sarangi GK, Mariotti M, Hübler R, Weihmann A, Parra G, Gladyshev VN, Guigó R, Castellano S. SelenoDB 2.0: annotation of selenoprotein genes in animals and their genetic diversity in humans. Nucleic Acids Res 2013; 42:D437-43. [PMID: 24194593 PMCID: PMC3965025 DOI: 10.1093/nar/gkt1045] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 12/20/2022] Open
Abstract
SelenoDB (http://www.selenodb.org) aims to provide high-quality annotations of selenoprotein genes, proteins and SECIS elements. Selenoproteins are proteins that contain the amino acid selenocysteine (Sec) and the first release of the database included annotations for eight species. Since the release of SelenoDB 1.0 many new animal genomes have been sequenced. The annotations of selenoproteins in new genomes usually contain many errors in major databases. For this reason, we have now fully annotated selenoprotein genes in 58 animal genomes. We provide manually curated annotations for human selenoproteins, whereas we use an automatic annotation pipeline to annotate selenoprotein genes in other animal genomes. In addition, we annotate the homologous genes containing cysteine (Cys) instead of Sec. Finally, we have surveyed genetic variation in the annotated genes in humans. We use exon capture and resequencing approaches to identify single-nucleotide polymorphisms in more than 50 human populations around the world. We thus present a detailed view of the genetic divergence of Sec- and Cys-containing genes in animals and their diversity in humans. The addition of these datasets into the second release of the database provides a valuable resource for addressing medical and evolutionary questions in selenium biology.
Affiliation(s)
- Frédéric Romagné
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain and Department of Medicine, Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
|
736
|
Rescue of a primary myelofibrosis model by retinoid-antagonist therapy. Proc Natl Acad Sci U S A 2013; 110:18820-5. [PMID: 24191050 DOI: 10.1073/pnas.1318974110] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022] Open
Abstract
Molecular targeting of the two receptor interaction domains of the epigenetic repressor silencing mediator of retinoid and thyroid hormone receptors (SMRT(mRID)) produced a transplantable skeletal syndrome that reduced radial bone growth, increased numbers of bone-resorbing periosteal osteoclasts, and increased bone fracture risk. Furthermore, SMRT(mRID) mice develop spontaneous primary myelofibrosis, a chronic, usually idiopathic disorder characterized by progressive bone marrow fibrosis. Frequently linked to polycythemia vera and chronic myeloid leukemia, myelofibrosis displays high patient morbidity and mortality, and current treatment is mostly palliative. To decipher the etiology of this disease, we identified the thrombopoietin (Tpo) gene as a target of the SMRT-retinoic acid receptor signaling pathway in bone marrow stromal cells. Chronic induction of Tpo in SMRT(mRID) mice results in up-regulation of TGF-β and PDGF in megakaryocytes, uncontrolled proliferation of bone marrow reticular cells, and fibrosis of the marrow compartment. Of therapeutic relevance, we show that this syndrome can be rescued by retinoid antagonists, demonstrating that the physical interface between SMRT and retinoic acid receptor can be a potential therapeutic target to block primary myelofibrosis disease progression.
|
737
|
Coonrod EM, Margraf RL, Russell A, Voelkerding KV, Reese MG. Clinical analysis of genome next-generation sequencing data using the Omicia platform. Expert Rev Mol Diagn 2013; 13:529-40. [PMID: 23895124 DOI: 10.1586/14737159.2013.811907] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/18/2022]
Abstract
AIMS Next-generation sequencing is being implemented in the clinical laboratory environment for the purposes of candidate causal variant discovery in patients affected with a variety of genetic disorders. The successful implementation of this technology for diagnosing genetic disorders requires a rapid, user-friendly method to annotate variants and generate short lists of clinically relevant variants of interest. This report describes Omicia's Opal platform, a new software tool designed for variant discovery and interpretation in a clinical laboratory environment. The software allows clinical scientists to process, analyze, interpret and report on personal genome files. MATERIALS & METHODS To demonstrate the software, the authors describe the interactive use of the system for the rapid discovery of disease-causing variants using three cases. RESULTS & CONCLUSION Here, the authors show the features of the Opal system and their use in uncovering variants of clinical significance.
Affiliation(s)
- Emily M Coonrod
- ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA.
|
738
|
Kumar A, Bhandari A, Sarde SJ, Goswami C. Sequence, phylogenetic and variant analyses of antithrombin III. Biochem Biophys Res Commun 2013; 440:714-24. [DOI: 10.1016/j.bbrc.2013.09.134] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 09/21/2013] [Accepted: 09/29/2013] [Indexed: 10/26/2022]
|
739
|
Gallego-Ortega D, Oakes SR, Lee HJ, Piggin CL, Ormandy CJ. ELF5, normal mammary development and the heterogeneous phenotypes of breast cancer. Breast Cancer Management 2013. [DOI: 10.2217/bmt.13.50] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022] Open
Abstract
SUMMARY The ETS transcription factor ELF5 specifies the formation of the secretory cell lineage of the mammary gland during pregnancy, by directing cell fate decisions of the mammary progenitor cells. The decision-making activity continues in breast cancer, where in luminal breast cancer cells forced ELF5 expression suppresses estrogen sensitivity and shifts gene expression toward the basal molecular subtype. The development of anti-estrogen resistance in luminal breast cancer is accompanied by increased expression of ELF5 and acquired dependence on ELF5 for continued proliferation, providing a potential new therapeutic target or prognostic marker to improve the treatment of this stage of the disease. Forced ELF5 expression suppresses the mesenchymal phenotype, making cells more epithelial and producing lower rates of invasion and motility. Conversely, loss of ELF5 promotes metastasis, with a clear corollary in the claudin-low subtype of breast cancer, which does not express ELF5 and is highly metastatic, or during the final stages of tumor progression, where loss of ELF5 expression may be involved in the acquisition of the lethal phenotype. In circumstances where ELF5 expression increases in parallel with metastatic potential, such as anti-estrogen resistant luminal breast cancers and basal breast cancer, there is much more to be understood about ELF5 and metastasis.
Affiliation(s)
- David Gallego-Ortega
- Cancer Research Program, Garvan Institute of Medical Research & The Kinghorn Cancer Centre, 384 Victoria Street, Darlinghurst, NSW 2010, Australia
- St Vincent’s Clinical School, St Vincent’s Hospital Faculty of Medicine, University of New South Wales, NSW, Australia
| | - Samantha R Oakes
- Cancer Research Program, Garvan Institute of Medical Research & The Kinghorn Cancer Centre, 384 Victoria Street, Darlinghurst, NSW 2010, Australia
- St Vincent’s Clinical School, St Vincent’s Hospital Faculty of Medicine, University of New South Wales, NSW, Australia
| | - Heather J Lee
- Cancer Research Program, Garvan Institute of Medical Research & The Kinghorn Cancer Centre, 384 Victoria Street, Darlinghurst, NSW 2010, Australia
- St Vincent’s Clinical School, St Vincent’s Hospital Faculty of Medicine, University of New South Wales, NSW, Australia
| | - Catherine L Piggin
- Cancer Research Program, Garvan Institute of Medical Research & The Kinghorn Cancer Centre, 384 Victoria Street, Darlinghurst, NSW 2010, Australia
- St Vincent’s Clinical School, St Vincent’s Hospital Faculty of Medicine, University of New South Wales, NSW, Australia
| | - Christopher J Ormandy
- Cancer Research Program, Garvan Institute of Medical Research & The Kinghorn Cancer Centre, 384 Victoria Street, Darlinghurst, NSW 2010, Australia
|
740
|
Rhee JK, Shin SY, Zhang BT. Construction of microRNA functional families by a mixture model of position weight matrices. PeerJ 2013; 1:e199. [PMID: 24255813 PMCID: PMC3817585 DOI: 10.7717/peerj.199] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/13/2013] [Accepted: 10/10/2013] [Indexed: 12/23/2022] Open
Abstract
MicroRNAs (miRNAs) are small regulatory molecules that repress the translational processes of their target genes by binding to their 3′ untranslated regions (3′ UTRs). Because the target genes are predominantly determined by their sequence complementarity to the miRNA seed regions (nucleotides 2–7) which are evolutionarily conserved, it is inferred that the target relationships and functions of the miRNA family members are conserved across many species. Therefore, detecting the relevant miRNA families with confidence would help to clarify the conserved miRNA functions, and elucidate miRNA-mediated biological processes. We present a mixture model of position weight matrices for constructing miRNA functional families. This model systematically finds not only evolutionarily conserved miRNA family members but also functionally related miRNAs, as it simultaneously generates position weight matrices representing the conserved sequences. Using mammalian miRNA sequences, in our experiments, we identified potential miRNA groups characterized by similar sequence patterns that have common functions. We validated our results using score measures and by the analysis of the conserved targets. Our method would provide a way to comprehensively identify conserved miRNA functions.
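To make the seed-region idea concrete, the sketch below extracts nucleotides 2–7 of each mature sequence and groups miRNAs that share an identical seed. This is plain exact-match grouping, a much simpler stand-in for the mixture model of position weight matrices that the abstract describes; the function names, miRNA names and sequences are toy placeholders, not real records.

```python
from collections import defaultdict


def seed(mirna: str) -> str:
    """Return nucleotides 2-7 (1-based) of a mature miRNA sequence, as RNA."""
    if len(mirna) < 7:
        raise ValueError("sequence too short to contain a seed region")
    return mirna[1:7].upper().replace("T", "U")


def group_by_seed(mirnas: dict[str, str]) -> dict[str, list[str]]:
    """Group miRNA names by identical seed (exact match, no mismatches allowed)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, sequence in mirnas.items():
        groups[seed(sequence)].append(name)
    return dict(groups)


# Toy sequences, invented for illustration only.
toy = {
    "mirA": "UGAGGUAGUAGGUUGU",
    "mirB": "UGAGGUAGGACCUUAA",
    "mirC": "UAGCUUAUCAGACUGA",
}
families = group_by_seed(toy)
```

Here `mirA` and `mirB` fall into one group because their seeds match, even though the rest of their sequences differ.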
Affiliation(s)
- Je-Keun Rhee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
|
741
|
Michel AM, Fox G, M Kiran A, De Bo C, O'Connor PBF, Heaphy SM, Mullan JPA, Donohue CA, Higgins DG, Baranov PV. GWIPS-viz: development of a ribo-seq genome browser. Nucleic Acids Res 2013; 42:D859-64. [PMID: 24185699 PMCID: PMC3965066 DOI: 10.1093/nar/gkt1035] [Citation(s) in RCA: 175] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023] Open
Abstract
We describe the development of GWIPS-viz (http://gwips.ucc.ie), an online genome browser for viewing ribosome profiling data. Ribosome profiling (ribo-seq) is a recently developed technique that provides genome-wide information on protein synthesis (GWIPS) in vivo. It is based on the deep sequencing of ribosome-protected messenger RNA (mRNA) fragments, which allows the ribosome density along all mRNA transcripts present in the cell to be quantified. Since its inception, ribo-seq has been carried out in a number of eukaryotic and prokaryotic organisms. Owing to the increasing interest in ribo-seq, there is a pertinent demand for a dedicated ribo-seq genome browser. GWIPS-viz is based on The University of California Santa Cruz (UCSC) Genome Browser. Ribo-seq tracks, coupled with mRNA-seq tracks, are currently available for several genomes: human, mouse, zebrafish, nematode, yeast, bacteria (Escherichia coli K12, Bacillus subtilis), human cytomegalovirus and bacteriophage lambda. Our objective is to continue incorporating published ribo-seq data sets so that the wider community can readily view ribosome profiling information from multiple studies without the need to carry out computational processing.
Affiliation(s)
- Audrey M Michel
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland, School of Medicine & Medical Science, Conway Institute, University College Dublin, Dublin 4, Ireland and Howest, University College West Flanders, Rijselstraat 5, 8200 Bruges, Belgium
|
742
|
Abstract
The last decade has seen tremendous effort committed to the annotation of the human genome sequence, most notably perhaps in the form of the ENCODE project. One of the major findings of ENCODE, and other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-coding genes, as well as in the discovery of thousands of long noncoding RNAs. It is also possible that significant numbers of human transcripts have not yet been described by annotation projects, while existing transcript models are frequently incomplete. The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists. In this article, we will discuss the current state of human transcriptome annotation, drawing on our experience gained in generating the GENCODE gene annotation set. We highlight the gaps in our knowledge of transcript functionality that remain, and consider the potential computational and experimental strategies that can be used to help close them. We propose that an understanding of the true overlap between transcriptional complexity and functionality will not be gained in the short term. However, significant steps toward obtaining this knowledge can now be taken by using an integrated strategy, combining all of the experimental resources at our disposal.
Affiliation(s)
- Jonathan M Mudge
- Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom
|
743
|
Halbritter F, Kousa AI, Tomlinson SR. GeneProf data: a resource of curated, integrated and reusable high-throughput genomics experiments. Nucleic Acids Res 2013; 42:D851-8. [PMID: 24174536 PMCID: PMC3965072 DOI: 10.1093/nar/gkt966] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/20/2023] Open
Abstract
GeneProf Data (http://www.geneprof.org) is an open web resource for analysed functional genomics experiments. We have built up a large collection of completely processed RNA-seq and ChIP-seq studies by carefully and transparently reanalysing and annotating high-profile public data sets. GeneProf makes these data instantly accessible in an easily interpretable, searchable and reusable manner and thus opens up the path to the advantages and insights gained from genome-scale experiments to a broader scientific audience. Moreover, GeneProf supports programmatic access to these data via web services to further facilitate the reuse of experimental data across tools and laboratories.
Affiliation(s)
- Florian Halbritter
- Institute for Stem Cell Research, Centre for Regenerative Medicine, School of Biological Sciences, University of Edinburgh, SCRM Building, 5 Little France Drive, Edinburgh EH16 4UU, UK
|
744
|
Varadi M, Kosol S, Lebrun P, Valentini E, Blackledge M, Dunker AK, Felli IC, Forman-Kay JD, Kriwacki RW, Pierattelli R, Sussman J, Svergun DI, Uversky VN, Vendruscolo M, Wishart D, Wright PE, Tompa P. pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins. Nucleic Acids Res 2013; 42:D326-35. [PMID: 24174539 PMCID: PMC3964940 DOI: 10.1093/nar/gkt960] [Citation(s) in RCA: 171] [Impact Index Per Article: 15.5] [Indexed: 01/12/2023] Open
Abstract
The goal of pE-DB (http://pedb.vib.be) is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent the IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states.
Affiliation(s)
- Mihaly Varadi
- VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, European Molecular Biology Laboratory, Hamburg Unit, EMBL c/o DESY, Hamburg, Germany, CEA, CNRS, UJF-Grenoble 1, Protein Dynamics and Flexibility, Institut de Biologie Structurale Jean-Pierre Ebel, 41 Rue Jules Horowitz, Grenoble 38027, France, Indiana University School of Medicine; Indianapolis, IN, USA, Department of Chemistry, Center of Magnetic Resonance (CERM), University of Florence, Sesto Fiorentino, Italy, Molecular Structure and Function Program, Hospital for Sick Children, Toronto, Ontario, Canada, Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada, Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA, Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel, Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russia, Department of Chemistry, University of Cambridge, Cambridge, UK, Departments of Biological Sciences and Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada, Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest
|
745
|
MacDonald JR, Ziman R, Yuen RKC, Feuk L, Scherer SW. The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res 2013; 42:D986-92. [PMID: 24174537 PMCID: PMC3965079 DOI: 10.1093/nar/gkt958] [Citation(s) in RCA: 872] [Impact Index Per Article: 79.3] [Indexed: 12/11/2022] Open
Abstract
Over the past decade, the Database of Genomic Variants (DGV; http://dgv.tcag.ca/) has provided a publicly accessible, comprehensive curated catalogue of structural variation (SV) found in the genomes of control individuals from worldwide populations. Here, we describe updates and new features, which have expanded the utility of DGV for both the basic research and clinical diagnostic communities. The current version of DGV consists of 55 published studies, comprising >2.5 million entries identified in >22,300 genomes. Studies included in DGV are selected from the accessioned data sets in the archival SV databases dbVar (NCBI) and DGVa (EBI), and then further curated for accuracy and validity. The core visualization tool (gbrowse) has been upgraded with additional functions to facilitate data analysis and comparison, and a new query tool has been developed to provide flexible and interactive access to the data. The content from DGV is regularly incorporated into other large-scale genome reference databases and represents a standard data resource for new product and database development, in particular for copy number variation testing in clinical labs. The accurate cataloguing of variants in DGV will continue to enable medical genetics and genome sequencing research.
Affiliation(s)
- Jeffrey R MacDonald
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada, Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala SE-751 08, Sweden and Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
|
746
|
Arnold R, Goldenberg F, Mewes HW, Rattei T. SIMAP--the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage. Nucleic Acids Res 2013; 42:D279-84. [PMID: 24165881 PMCID: PMC3965014 DOI: 10.1093/nar/gkt970] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/21/2022] Open
Abstract
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads.
Affiliation(s)
- Roland Arnold
- Terrence Donnelly Centre for Cellular and Biomolecular Research, Kim Lab, University of Toronto, Toronto, ON M5S 3E1, Canada, CUBE-Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1090 Vienna, Austria and Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85764 Neuherberg, Germany
|
747
|
Smith CM, Finger JH, Hayamizu TF, McCright IJ, Xu J, Berghout J, Campbell J, Corbani LE, Forthofer KL, Frost PJ, Miers D, Shaw DR, Stone KR, Eppig JT, Kadin JA, Richardson JE, Ringwald M. The mouse Gene Expression Database (GXD): 2014 update. Nucleic Acids Res 2013; 42:D818-24. [PMID: 24163257 PMCID: PMC3965015 DOI: 10.1093/nar/gkt954] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022] Open
Abstract
The Gene Expression Database (GXD; http://www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental expression information. GXD collects different types of expression data from studies of wild-type and mutant mice, covering all developmental stages and including data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot and western blot experiments. The data are acquired from the scientific literature and from researchers, including groups doing large-scale expression studies. Integration with the other data in Mouse Genome Informatics (MGI) and interconnections with other databases place GXD's gene expression information in the larger biological and biomedical context. Since the last report, the utility of GXD has been greatly enhanced by the addition of new data and by the implementation of more powerful and versatile search and display features. Web interface enhancements include the capability to search for expression data for genes associated with specific phenotypes and/or human diseases; new, more interactive data summaries; easy downloading of data; direct searches of expression images via associated metadata; and new displays that combine image data and their associated annotations. At present, GXD includes >1.4 million expression results and 250,000 images that are accessible to our search tools.
|
748
|
Dayem Ullah AZ, Cutts RJ, Ghetia M, Gadaleta E, Hahn SA, Crnogorac-Jurcevic T, Lemoine NR, Chelala C. The pancreatic expression database: recent extensions and updates. Nucleic Acids Res 2013; 42:D944-9. [PMID: 24163255 PMCID: PMC3965100 DOI: 10.1093/nar/gkt959] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/31/2022] Open
Abstract
The Pancreatic Expression Database (PED, http://www.pancreasexpression.org) is the only device currently available for mining of pancreatic cancer literature data. It brings together the largest collection of multidimensional pancreatic data from the literature including genomic, proteomic, microRNA, methylomic and transcriptomic profiles. PED allows the user to ask specific questions on the observed levels of deregulation among a broad range of specimen/experimental types including healthy/patient tissue and body fluid specimens, cell lines and murine models as well as related treatments/drugs data. Here we provide an update to PED, which has been previously featured in the Database issue of this journal. Briefly, PED data content has been substantially increased and expanded to cover methylomics studies. We introduced an extensive controlled vocabulary that records specific details on the samples and added data from large-scale meta-analysis studies. The web interface has been improved/redesigned with a quick search option to rapidly extract information about a gene/protein of interest and an upload option allowing users to add their own data to PED. We added a user guide and implemented integrated graphical tools to overlay and visualize retrieved information. Interoperability with biomart-compatible data sets was significantly improved to allow integrative queries with pancreatic cancer data.
Affiliation(s)
- Abu Z Dayem Ullah
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK and Molecular GI-Onkologie (MGO), University of Bochum, Germany
|
749
|
Abstract
When the human genome project started, the major challenge was how to sequence a 3 billion letter code in an organized and cost-effective manner. When completed, the project had laid the foundation for a huge variety of biomedical fields through the production of a complete human genome sequence, but also had driven the development of laboratory and analytical methods that could produce large amounts of sequencing data cheaply. These technological developments made possible the sequencing of many more vertebrate genomes, which have been necessary for the interpretation of the human genome. They have also enabled large-scale studies of vertebrate genome evolution, as well as comparative and human medicine. In this review, we give examples of evolutionary analysis using a wide variety of time frames—from the comparison of populations within a species to the comparison of species separated by at least 300 million years. Furthermore, we anticipate discoveries related to evolutionary mechanisms, adaptation, and disease to quickly accelerate in the coming years.
Affiliation(s)
- Jessica Alföldi
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
|
750
|
Abstract
PDBsum, http://www.ebi.ac.uk/pdbsum, is a website providing numerous pictorial analyses of each entry in the Protein Data Bank. It portrays the structural features of all proteins, DNA and ligands in the entry, as well as depicting the interactions between them. The latest features, described here, include annotation of human protein sequences with their naturally occurring amino acid variants, dynamic graphs showing the relationships between related protein domain architectures, analyses of ligand binding clusters across different experimental determinations of the same protein, analyses of tunnels in proteins and new search options.
Affiliation(s)
- Tjaart A P de Beer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and Department of Physical Chemistry, Regional Centre of Advanced Technologies and Materials, Faculty of Science, Palacký University Olomouc, tř. 17. listopadu 12, 771 46 Olomouc, Czech Republic
|